Results 1 - 4 of 4
Lower Bound Techniques for Data Structures, 2008
Abstract

Cited by 8 (0 self)
We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences:
• the first Ω(lg n) lower bound for any dynamic problem, improving on a bound that had been standing since 1989;
• for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show Ω(lg n / lg lg n) bounds when the space is O(n · polylog n).
Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following:
• the partial-sums problem (a fundamental application of augmented binary search trees);
• the predecessor problem (which is equivalent to IP lookup in Internet routers);
• dynamic trees and dynamic connectivity;
• orthogonal range stabbing;
• orthogonal range counting, and orthogonal range reporting;
• the partial match problem (searching with wildcards);
• (1 + ε)-approximate near neighbor on the hypercube;
• approximate nearest neighbor in the ℓ∞ metric.
Our new techniques lead to surprisingly nontechnical proofs. For several problems, we obtain simpler proofs for bounds that were already known.
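For context on the first problem in the list above: the partial-sums problem asks to maintain an array under point updates and prefix-sum queries. A standard solution achieving O(lg n) per operation, and thus matching the Ω(lg n) dynamic lower bound of the abstract, is a Fenwick (binary indexed) tree; a minimal sketch, with all names illustrative:

```python
class FenwickTree:
    """Partial sums: point update and prefix-sum query, each in O(lg n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed internal array

    def update(self, i, delta):
        """Add delta to element i (1-indexed)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # jump to the next node responsible for i

    def prefix_sum(self, i):
        """Return the sum of elements 1..i."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)  # strip the lowest set bit
        return s

ft = FenwickTree(8)
ft.update(3, 5)
ft.update(7, 2)
print(ft.prefix_sum(6))  # 5
print(ft.prefix_sum(8))  # 7
```

The lower bound in the abstract says this logarithmic cost is not an artifact of the tree structure: no data structure can do asymptotically better.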
Google Research Award Proposal: Data Structures
Abstract
Data structures are essential components of computer systems in general and Google in particular. We believe this area of research is in an auspicious position where practical and theoretical goals are well aligned, implying that deep algorithmic ideas can also have significant practical impact. We exemplify with a few examples from our past research, which address problems of universal value, and should have important applications in real systems.

Cache-oblivious B-trees: B-trees are a fundamental tool for representing large sets of data in external memory. But what is "external memory"? Modern computers have complicated memory hierarchies, including L1 cache, L2 cache, main memory, disk, and often network storage. Even if one decides to concentrate on one level of the hierarchy, choosing the optimal branching factor involves nontrivial tuning. A surprising, clean alternative is to design a B-tree which works in the optimal O(log_B n) time without knowing the memory block size B! Then the B-tree will work optimally on all levels of the memory hierarchy simultaneously. Our initial paper [BDFC05] showing that this is possible has been very influential in the further study of cache-obliviousness.

Bloomier filters: Suppose we want to represent a set S of items, and answer queries of the form
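Bloomier filters generalize the classic Bloom filter from approximate set membership to arbitrary function values. As background for the abstract's truncated discussion, here is a minimal sketch of an ordinary Bloom filter; the class and parameter names are illustrative, not from the proposal:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic membership, no false negatives."""

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions from one strong hash (an illustrative choice;
        # real implementations often use double hashing instead).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield chunk % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        """True for every added item; occasionally True for absent items."""
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add("routing-table-entry")
print(bf.might_contain("routing-table-entry"))  # True
```

A Bloomier filter replaces the one-bit "present or not" answer with a stored value per key, while keeping space close to the information-theoretic minimum.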
On The I/O Complexity of Dynamic Distinct Counting∗
Abstract
In dynamic distinct counting, we want to maintain a multiset S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains S in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O((N/B) log_{M/B}(N/B)) I/Os by sorting, where N = |S|, B is the block size, and M is the memory size. In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B), then it must incur Ω(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps from 0 dramatically to almost linearity as soon as the insertion cost drops slightly below Ω(1).
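The two naive solutions the abstract describes can be sketched in-memory (ignoring the I/O accounting, which is the whole point of the paper's external-memory analysis); class names are illustrative:

```python
class HashCounter:
    """Solution 1: hash structure; count maintained incrementally."""

    def __init__(self):
        self.seen = set()
        self.count = 0

    def insert(self, x):   # O(1) expected per insertion
        if x not in self.seen:
            self.seen.add(x)
            self.count += 1

    def distinct(self):    # answered "for free"
        return self.count

class ListCounter:
    """Solution 2: plain list; cheap buffered inserts, expensive query."""

    def __init__(self):
        self.items = []

    def insert(self, x):   # a cheap append (O(1/B) amortized I/Os externally)
        self.items.append(x)

    def distinct(self):    # sort, then count positions where the value changes
        s = sorted(self.items)
        return sum(1 for i, v in enumerate(s) if i == 0 or v != s[i - 1])

for cls in (HashCounter, ListCounter):
    c = cls()
    for x in [5, 3, 5, 9, 3]:
        c.insert(x)
    print(c.distinct())  # 3, both times
```

The paper's lower bound shows there is essentially no middle ground between these two extremes: cheaper-than-constant insertions force near-linear queries.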
Succincter
Abstract
We can represent an array of n values from {0, 1, 2} using ⌈n log₂ 3⌉ bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using ⌈t log₂ 3⌉ bits, and bound the retrieval time by t. This gives a linear tradeoff between the redundancy of the representation and the query time. In fact, this type of linear tradeoff is ubiquitous in known succinct data structures, and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is to read the entire block and unpack it. We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size. Our technique allows us to revisit classic problems in succinct data structures, and give surprising new upper bounds. We also construct a locally-decodable version of arithmetic coding.