Results 1 -
4 of
4
The grid file: An adaptable, symmetric multikey file structure
- ACM Transactions on Database Systems
, 1984
"... Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficien-cies in particular for multikey access to highly dynamic files. We study the dynamic aspects o ..."
Abstract
-
Cited by 362 (4 self)
- Add to MetaCart
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficien-cies in particular for multikey access to highly dynamic files. We study the dynamic aspects of tile structures that treat all keys symmetrically, that is, file structures which avoid the distinction between primary and secondary keys. We start from a bitmap approach and treat the problem of file design as one of data compression of a large sparse matrix. This leads to the notions of a grid partition of the search space and of a grid directory, which are the keys to a dynamic file structure called the grid file. This tile system adapts gracefully to its contents under insertions and deletions, and thus achieves an upper hound of two disk accesses for single record retrieval; it also handles range queries and partially specified queries efficiently. We discuss in detail the design decisions that led to the grid file, present simulation results of its behavior, and compare it to other multikey access file structures.
Query Processing in Tertiary Memory Databases
- IN PROC. OF THE 21ST INT. CONF. ON VERY LARGE DATA BASES
, 1996
"... ..."
A parallel algorithm for record clustering
- ACM Trans. on Database Systems
, 1990
"... We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries.
Automated Design of Multidimensional Clustering Tables for Relational Databases
, 2004
"... The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2 Universal Database^TM (DB2 UDB) pro ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2 Universal Database^TM (DB2 UDB) product, called multidimensional clustering (MDC), which co-exists with other classical forms of data storage and indexing methods, was described in VLDB 2003. This paper describes the first published model for automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. For any significant dimensionality (3 or more), the possible solution space is combinatorially complex. The automated MDC design model is based on whatif query cost modeling, data sampling, and a search algorithm for evaluating a large constellation of possible combinations. The model is effective at trading the benefits of potential combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which dimensions should be used for clustering (such as week of year versus month of year). We show results from experiments indicating that the model provides design recommendations of comparable quality to those made by human experts. The model has been implemented in the IBM DB2 UDB for Linux, UNIX and Windows Version 8.2 release.

