Results 1 
3 of
3
Using Hashing to Solve the Dictionary Problem (In External Memory)
, 2011
"... We consider the dictionary problem in external memory and improve the update time of the wellknown buffer tree by roughly a logarithmic factor. For any λ ≥ max{lg lg n, log M/B(n/B)}, we can support updates in time O ( λ B) and queries in time O(log λ n). We also present a lower bound in the cellpr ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
We consider the dictionary problem in external memory and improve the update time of the wellknown buffer tree by roughly a logarithmic factor. For any λ ≥ max{lg lg n, log M/B(n/B)}, we can support updates in time O ( λ B) and queries in time O(log λ n). We also present a lower bound in the cellprobe model showing that our data structure is optimal. In the RAM, hash tables have been use to solve the dictionary problem faster than binary search for more than half a century. By contrast, our data structure is the first to beat the comparison barrier in external memory. Ours is also the first data structure to depart convincingly from the indivisibility paradigm. 1
CacheOblivious Hashing
, 2010
"... The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average tq = 1+1/2 ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average tq = 1+1/2 Ω(b) disk accesses for any load factor α bounded away from 1. However, such nearperfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cacheoblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardwarespecific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cacheoblivious but it only achieves tq = 1 + O(α/b). Then we demonstrate that it is possible to obtain tq = 1 + 1/2 Ω(b), thus matching the cacheaware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Both conditions hold on a real machine, although they are not stated in the cacheoblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is tq = 1 + O(α/b), which is exactly what linear probing achieves.
Externalmemory Multimaps
, 2011
"... Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of keyvalue pairs. A multimap is generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of keyvalue pairs. A multimap is generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key. Our externalmemory results are for the standard twolevel memory model.