Results 1 -
4 of
4
The grid file: An adaptable, symmetric multikey file structure
- ACM Transactions on Database Systems
, 1984
"... Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficien-cies in particular for multikey access to highly dynamic files. We study the dynamic aspects o ..."
Abstract
-
Cited by 362 (4 self)
- Add to MetaCart
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. They manifest various deficien-cies in particular for multikey access to highly dynamic files. We study the dynamic aspects of tile structures that treat all keys symmetrically, that is, file structures which avoid the distinction between primary and secondary keys. We start from a bitmap approach and treat the problem of file design as one of data compression of a large sparse matrix. This leads to the notions of a grid partition of the search space and of a grid directory, which are the keys to a dynamic file structure called the grid file. This tile system adapts gracefully to its contents under insertions and deletions, and thus achieves an upper hound of two disk accesses for single record retrieval; it also handles range queries and partially specified queries efficiently. We discuss in detail the design decisions that led to the grid file, present simulation results of its behavior, and compare it to other multikey access file structures.
Universal Hashing of
"... Introduction Main memories of a computer usually have random access time. Since their size is limited with the cost requirements, databases are usually stored on external memory with a rather slow access. The typical searching problem is as follows : given a pattern, nd addresses of all records th ..."
Abstract
- Add to MetaCart
Introduction Main memories of a computer usually have random access time. Since their size is limited with the cost requirements, databases are usually stored on external memory with a rather slow access. The typical searching problem is as follows : given a pattern, nd addresses of all records that are close to the pattern in some sense. We will understand records and a pattern as binary vectors of the same length and assume that two vectors are close to each other if the Hamming distance between them does not exceed a given value T: The problem can be also introduced as list decoding for a code whose codewords are records when the pattern is the received word. The designer of a searching procedure usually has only some partial information about the database. In particular, records can consist of several \dense" subsets of vectors, where each subset contains records that are close to each other (these subsets are called clusters). A common approach to solving the problem is known a
Cache-Oblivious Hashing ∗
"... The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average tq = 1+1/2 ..."
Abstract
- Add to MetaCart
The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average tq = 1+1/2 Ω(b) disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cacheoblivious but it only achieves tq = 1 + O(α/b). Then we demonstrate that it is possible to obtain tq = 1 + 1/2 Ω(b), thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is tq = 1 + O(α/b), which is exactly what linear probing achieves.
Cache-ObliviousHashing ∗
"... The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average tq = 1+1/2 ..."
Abstract
- Add to MetaCart
The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average tq = 1+1/2 Ω(b) disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cacheoblivious but it only achieves tq = 1 + O(α/b). Then we demonstrate that it is possible to obtain tq = 1 + 1/2 Ω(b), thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is tq = 1 + O(α/b), which is exactly what linear probing achieves.

