Results 1–10 of 14
On the optimality of the dimensionality reduction method
 In Proc. 47th IEEE Symposium on Foundations of Computer Science (FOCS)
Abstract

Cited by 20 (4 self)
We investigate the optimality of (1+ɛ)-approximation algorithms obtained via the dimensionality reduction method. We show that: • Any data structure for the (1+ɛ)-approximate nearest neighbor problem in Hamming space that uses a constant number of probes to answer each query must use n^{Ω(1/ɛ²)} space. • Any algorithm for the (1+ɛ)-approximate closest substring problem must run in time exponential in 1/ɛ^{2−γ} for any γ > 0 (unless 3SAT can be solved in subexponential time). Both lower bounds are (essentially) tight.
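As a point of reference for the problem being lower-bounded, exact nearest neighbor search in Hamming space is a one-line linear scan. This is a hypothetical brute-force sketch, not the paper's data structure; the lower bound concerns structures that beat this scan with a constant number of probes.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(ca != cb for ca, cb in zip(a, b))

def nearest_neighbor(points, q):
    """Exact nearest neighbor of q by linear scan over the point set.

    Brute-force reference only: the abstract says any (1+eps)-approximate
    structure answering queries in O(1) probes must pay n^{Omega(1/eps^2)}
    space to avoid this Theta(n) scan.
    """
    return min(points, key=lambda p: hamming(p, q))
```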
Range Selection and Median: Tight Cell Probe Lower Bounds and Adaptive Data Structures
Abstract

Cited by 12 (5 self)
Range selection is the problem of preprocessing an input array A of n unique integers, such that given a query (i, j, k), one can report the k'th smallest integer in the subarray A[i], A[i+1],..., A[j]. In this paper we consider static data structures in the word-RAM for range selection and several natural special cases thereof. The first special case is known as range median, which arises when k is fixed to ⌊(j − i + 1)/2⌋. The second case, denoted prefix selection, arises when i is fixed to 0. Finally, we also consider the bounded rank prefix selection problem and the fixed rank range selection problem. In the former, data structures must support prefix selection queries under the assumption that k ≤ κ for some value κ ≤ n given at construction time, while in the latter, data structures must support range selection queries where k is fixed beforehand for all queries. We prove cell probe lower bounds for range selection, prefix selection and range median, stating that any data structure that uses S words of space needs Ω(log n / log(Sw/n)) time to answer a query. In particular, any data structure that uses n log^{O(1)} n space needs Ω(log n / log log n) time to answer a query, and any data structure that supports queries in constant time needs n^{1+Ω(1)} space. For data structures that use n log^{O(1)} n space this matches the best known upper bound. Additionally, we present a linear space data structure that supports range selection queries in O(log k / log log n + log log n) time. Finally, we prove that any data structure that uses S space needs Ω(log κ / log(Sw/n)) time to answer a bounded rank prefix selection query and Ω(log k / log(Sw/n)) time to answer a fixed rank range selection query. This shows that our data structure is optimal except for small values of k.
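The query semantics above can be pinned down with a brute-force reference implementation (a sketch for illustration only; the paper's contribution is how much faster preprocessing-based structures can answer the same queries):

```python
def range_select(A, i, j, k):
    """Report the k'th smallest integer (1-indexed) in A[i..j], inclusive.

    Brute force in O((j-i) log(j-i)) per query; the paper's static
    structures answer this in O(log n / log log n) time, proved optimal
    for n polylog(n) space.
    """
    return sorted(A[i:j + 1])[k - 1]

def range_median(A, i, j):
    """Range median: range selection with k fixed to floor((j-i+1)/2)."""
    return range_select(A, i, j, (j - i + 1) // 2)
```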
UNIFYING THE LANDSCAPE OF CELL-PROBE LOWER BOUNDS
, 2008
Abstract

Cited by 11 (0 self)
We show that a large fraction of the data-structure lower bounds known today in fact follow by reduction from the communication complexity of lopsided (asymmetric) set disjointness. This includes lower bounds for: • high-dimensional problems, where the goal is to show large space lower bounds. • constant-dimensional geometric problems, where the goal is to bound the query time for space O(n · polylog n). • dynamic problems, where we are looking for a tradeoff between query and update time. (In this case, our bounds are slightly weaker than the originals, losing a lg lg n factor.) Our reductions also imply the following new results: • an Ω(lg n / lg lg n) bound for 4-dimensional range reporting, given space O(n · polylog n). This is quite timely, since a recent result [39] solved 3D reporting in O(lg² lg n) time, raising the prospect that higher dimensions could also be easy. • a tight space lower bound for the partial match problem, for constant query time. • the first lower bound for reachability oracles. In the process, we prove optimal randomized lower bounds for lopsided set disjointness.
A geometric approach to lower bounds for approximate near-neighbor search and partial match
 In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS), 2008
Abstract

Cited by 10 (1 self)
This work investigates a geometric approach to proving cell probe lower bounds for data structure problems. We consider the approximate nearest neighbor search problem on the Boolean hypercube ({0, 1}^d, ‖·‖₁) with d = Θ(log n). We show that any (randomized) data structure for the problem that answers c-approximate nearest neighbor search queries using t probes must use space at least n^{1+Ω(1/ct)}. In particular, our bound implies that any data structure that uses space Õ(n) with polylogarithmic word size, and with constant probability gives a constant approximation to nearest neighbor search queries, must be probed Ω(log n / log log n) times. This improves on the lower bound of Ω(log log d / log log log d) probes shown by Chakrabarti and Regev [8] for any polynomial space data structure, and the Ω(log log d) lower bound in Pătraşcu and Thorup [26] for linear space data structures. Our lower bound holds for the near neighbor problem, where the algorithm knows in advance a good approximation to the distance to the nearest neighbor. Additionally, it is an average case lower bound for the natural distribution for the problem. Our approach also gives the same bound for (2 − 1/c)-approximation to the farthest neighbor problem. For the case of non-adaptive algorithms we can improve the bound slightly and show an Ω(log n) lower bound on the time complexity of data structures with O(n) space and logarithmic word size. We also show similar lower bounds for the partial match problem: any randomized t-probe data structure that solves the partial match problem on {0, 1, ⋆}^d for d = Θ(log n) must use space n^{1+Ω(1/t)}. This implies an Ω(log n / log log n) lower bound for the time complexity of near linear space data structures, slightly improving the Ω(log n / (log log n)²) lower bound from [25],[16] for this range of d. Recently and independently Pătraşcu achieved similar bounds [24]. Our results also generalize to approximate partial match, improving on the bounds of [4, 25].
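For concreteness, the partial match query that this abstract lower-bounds can be stated as a trivial linear scan (a hypothetical brute-force sketch; the paper's bounds concern structures that avoid this scan with few probes):

```python
def partial_match(database, q):
    """Does any string in the database match query q over {0, 1, '*'}?

    A '*' position matches either bit. This linear scan is the trivial
    solution; the abstract shows any t-probe structure that avoids it
    must use n^{1+Omega(1/t)} space.
    """
    return any(all(qc == '*' or qc == xc for qc, xc in zip(q, x))
               for x in database)
```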
Cell probe lower bounds and approximations for range mode
 In Proc. 37th International Colloquium on Automata, Languages, and Programming
, 2010
Abstract

Cited by 10 (6 self)
Abstract. The mode of a multiset of labels is a label that occurs at least as often as any other label. The input to the range mode problem is an array A of size n. A range query [i, j] must return the mode of the subarray A[i], A[i+1],..., A[j]. We prove that any data structure that uses S memory cells of w bits needs Ω(log n / log(Sw/n)) time to answer a range mode query. Secondly, we consider the related range k-frequency problem. The input to this problem is an array A of size n, and a query [i, j] must return whether there exists a label that occurs precisely k times in the subarray A[i], A[i+1],..., A[j]. We show that for any constant k > 1, this problem is equivalent to 2D orthogonal rectangle stabbing, and that for k = 1 this is no harder than four-sided 3D orthogonal range emptiness. Finally, we consider approximate range mode queries. A c-approximate range mode query must return a label that occurs at least 1/c times as often as the mode. We describe a linear space data structure that supports 3-approximate range mode queries in constant time, and a data structure that uses O(n/ε) space and supports (1+ε)-approximate range mode queries in O(log(1/ε)) time.
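Both query types in this abstract have one-line brute-force definitions, sketched below for reference (an illustration of the problem statements, not the paper's data structures):

```python
from collections import Counter

def range_mode(A, i, j):
    """Return a mode of A[i..j]: a label at least as frequent as any other.

    Brute-force reference for the query being lower-bounded; the paper
    shows any S-cell structure needs Omega(log n / log(Sw/n)) query time.
    """
    counts = Counter(A[i:j + 1])
    return max(counts, key=counts.get)

def range_k_frequency(A, i, j, k):
    """Does some label occur exactly k times in A[i..j]?"""
    return k in Counter(A[i:j + 1]).values()
```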
Mikkel Thorup: Randomization does not help searching predecessors
 Symposium on Discrete Algorithms
Abstract

Cited by 7 (0 self)
At STOC’06, we presented a new technique for proving cell-probe lower bounds for static data structures with deterministic queries. This was the first technique which could prove a bound higher than communication complexity, and it gave the first separation between data structures with linear and polynomial space. The new technique was, however, heavily tuned for the deterministic worst case, demonstrating long query times only for an exponentially small fraction of the inputs. In this paper, we extend the technique to give lower bounds for randomized query algorithms with constant error probability. Our main application is the problem of searching predecessors in a static set of n integers, each contained in an ℓ-bit word. Our tradeoff lower bounds are tight for any combination of parameters. For small space, i.e. n^{1+o(1)}, proving such lower bounds was inherently impossible through known techniques. An interesting new consequence is that for near linear space, the classic van Emde Boas search time of O(lg ℓ) cannot be improved, even if we allow randomization. This is a separation from polynomial space, since Beame and Fich [STOC’02] give a predecessor search time of O(lg ℓ / lg lg ℓ) using quadratic space. We also show a tight Ω(lg lg n) lower bound for 2-dimensional range queries, via a new reduction. This holds even in rank space, where no super-constant lower bound was known, neither randomized nor worst-case. We also slightly improve the best lower bound for the approximate nearest neighbor problem when small space is available.
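As a baseline for the predecessor problem discussed above, a sorted array plus binary search already answers queries (a minimal sketch of the problem, not of the paper's techniques):

```python
from bisect import bisect_right

def predecessor(sorted_keys, x):
    """Largest key <= x in a sorted list, or None if every key exceeds x.

    Binary search costs O(lg n) probes; van Emde Boas achieves O(lg l)
    for l-bit keys, which the paper proves cannot be improved with
    near-linear space, even by randomized structures.
    """
    pos = bisect_right(sorted_keys, x)
    return sorted_keys[pos - 1] if pos else None
```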
Higher cell probe lower bounds for evaluating polynomials
 In Proc. 53rd IEEE Symposium on Foundations of Computer Science
, 2012
Abstract

Cited by 2 (2 self)
Abstract—In this paper, we study the cell probe complexity of evaluating a degree-n polynomial P over a finite field F of size at least n^{1+Ω(1)}. More specifically, we show that any static data structure for evaluating P(x), where x ∈ F, must use Ω(lg |F| / lg(Sw/(n lg |F|))) cell probes to answer a query, where S denotes the space of the data structure in number of cells and w the cell size in bits. This bound holds in expectation for randomized data structures with any constant error probability δ < 1/2. Our lower bound not only improves over the Ω(lg |F| / lg S) lower bound of Miltersen [TCS’95], but is in fact the highest static cell probe lower bound to date: for linear space (i.e. S = O(n lg |F| / w)), our query time lower bound simplifies to Ω(lg |F|), whereas the highest previous lower bound for any static data structure problem having d different queries is Ω(lg d / lg lg d), which was first achieved by Pătraşcu and Thorup [SICOMP’10]. We also use the recent technique of Larsen [STOC’12] to show a lower bound of t_q = Ω(lg |F| · lg n / (lg(w t_u / lg |F|) · lg(w t_u))) for dynamic data structures for polynomial evaluation over a finite field F of size Ω(n²). Here t_q denotes the expected query time and t_u the worst case update time. This lower bound holds for randomized data structures with any constant error probability δ < 1/2. This is only the second time a lower bound beyond max{t_u, t_q} = Ω(max{lg n, lg d / lg lg d}) has been achieved for dynamic data structures, where d denotes the number of different queries and updates to the problem. Furthermore, it is the first such lower bound that holds for randomized data structures with a constant probability of error. Keywords: cell probe model, lower bounds, data structures, polynomials
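The trivial solution to polynomial evaluation is to store the n+1 coefficients and apply Horner's rule per query, sketched here over a prime field Z_p for illustration (the abstract lower-bounds how few probes any preprocessed table can achieve instead):

```python
def poly_eval_mod(coeffs, x, p):
    """Evaluate P(x) over Z_p by Horner's rule; coeffs are high-to-low.

    The trivial 'data structure' stores the n+1 coefficients and probes
    all of them per query in O(n) time; the abstract's bounds quantify
    how much any table-based structure can improve on this.
    """
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % p
    return acc
```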
Optimal direct sum and privacy tradeoff results for quantum and classical communication complexity
 CoRR
Abstract

Cited by 2 (0 self)
Abstract. We show an optimal Direct Sum result for the one-way entanglement-assisted quantum communication complexity of any relation f ⊆ X × Y × Z. We show: Q^{1,pub}(f^{⊕m}) = Ω(m · Q^{1,pub}(f)), where Q^{1,pub}(f) represents the one-way entanglement-assisted quantum communication complexity of f with error at most 1/3 and f^{⊕m} represents m copies of f. Similarly, for the one-way public-coin classical communication complexity we show: R^{1,pub}(f^{⊕m}) = Ω(m · R^{1,pub}(f)), where R^{1,pub}(f) represents the one-way public-coin classical communication complexity of f with error at most 1/3. We show similar optimal Direct Sum results for the Simultaneous Message Passing (SMP) quantum and classical models. For two-party two-way protocols we present optimal Privacy Tradeoff results leading to a Weak Direct Sum result for such protocols. We show our Direct Sum and Privacy Tradeoff results via message compression arguments. These arguments also imply a new round elimination lemma in quantum communication, which allows us to extend classical lower bounds on the cell probe complexity of some data structure problems, e.g. Approximate Nearest Neighbor Searching (ANN) on the Hamming cube {0, 1}^n and Predecessor Search, to the quantum setting. In a separate result we show that Newman’s [New91] technique of reducing the number of public coins in a classical protocol cannot be lifted to the quantum setting. We do this by defining a general notion of black-box reduction of prior entanglement that subsumes Newman’s technique. We prove that such a black-box reduction is impossible for quantum protocols by exhibiting a particular one-round quantum protocol for the Equality function where the black-box technique fails to reduce the amount of prior entanglement by more than a constant factor. In the final result in the theme of message compression, we provide an upper bound on the problem of Exact Remote State Preparation (ERSP).
Lower Bound Techniques for Data Structures
, 2008
Abstract

Cited by 1 (0 self)
We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences:
• the first Ω(lg n) lower bound for any dynamic problem, improving on a bound that had been standing since 1989;
• for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show Ω(lg n / lg lg n) bounds when the space is O(n · polylog n).
Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following:
• the partial-sums problem (a fundamental application of augmented binary search trees);
• the predecessor problem (which is equivalent to IP lookup in Internet routers);
• dynamic trees and dynamic connectivity;
• orthogonal range stabbing;
• orthogonal range counting, and orthogonal range reporting;
• the partial match problem (searching with wildcards);
• (1 + ε)-approximate near neighbor on the hypercube;
• approximate nearest neighbor in the ℓ∞ metric.
Our new techniques lead to surprisingly nontechnical proofs. For several problems, we obtain simpler proofs for bounds that were already known.
Google Research Award Proposal: Data Structures
Abstract
Data structures are essential components of computer systems in general and Google in particular. We believe this area of research is in an auspicious position where practical and theoretical goals are well aligned, implying that deep algorithmic ideas can also have significant practical impact. We exemplify with a few examples from our past research, which address problems of universal value, and should have important applications in real systems. Cache-oblivious B-trees: B-trees are a fundamental tool for representing large sets of data in external memory. But what is “external memory”? Modern computers have complicated memory hierarchies, including L1 cache, L2 cache, main memory, disk, and often network storage. Even if one decides to concentrate on one level of the hierarchy, choosing the optimal branching factor involves non-trivial tuning. A surprising, clean alternative is to design a B-tree which works in the optimal O(log_B n) time without knowing the memory block size B! Then the B-tree will work optimally on all levels of the memory hierarchy simultaneously. Our initial paper [BDFC05] showing that this is possible has been very influential in the further study of cache-obliviousness. Bloomier filters: Suppose we want to represent a set S of items, and answer queries of the form