Results 1–10 of 34
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Abstract

Cited by 443 (7 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
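One family of efficient approximate nearest neighbor algorithms covered by overviews of this kind is locality-sensitive hashing (LSH). The sketch below is our own minimal illustration for Hamming space using bit-sampling hash functions; the parameter choices and function names are ours, not taken from the article.

```python
import random

def make_hash(dim, k, rng):
    """One bit-sampling LSH function: read k random coordinates."""
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda v: tuple(v[i] for i in coords)

def build_index(points, dim, k=8, n_tables=10, seed=0):
    """Hash every point into n_tables independent hash tables."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        h = make_hash(dim, k, rng)
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(h(p), []).append(idx)
        tables.append((h, buckets))
    return tables

def query(tables, points, q):
    """Scan only the buckets q falls into; return the closest candidate."""
    cand = {i for h, buckets in tables for i in buckets.get(h(q), [])}
    if not cand:
        return None
    return min(cand, key=lambda i: sum(a != b for a, b in zip(points[i], q)))
```

Nearby points collide in some table with good probability, so a query inspects far fewer points than a linear scan; the actual guarantees depend on tuning k and the number of tables to the distance scale, which is the kind of analysis such overviews discuss.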
Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries
, 2010
Abstract

Cited by 31 (4 self)
An important question in the study of constraint satisfaction problems (CSP) is understanding how the graph or hypergraph describing the incidence structure of the constraints influences the complexity of the problem. For binary CSP instances (i.e., where each constraint involves only two variables), the situation is well understood: the complexity of the problem essentially depends on the treewidth of the graph of the constraints [19, 24]. However, this is not the correct answer if constraints with an unbounded number of variables are allowed, and in particular, for CSP instances arising from query evaluation problems in database theory. Formally, if H is a class of hypergraphs, then let CSP(H) be CSP restricted to instances whose hypergraph is in H. Our goal is to characterize those classes of hypergraphs for which CSP(H) is polynomial-time solvable or fixed-parameter tractable, parameterized by the number of variables. In the applications related to database query evaluation, we usually assume that the number of variables is much smaller than the size of the instance, thus parameterization by the number of variables is a meaningful question. The most general known property of H that makes CSP(H) polynomial-time solvable is bounded fractional hypertree width. Here we introduce a new hypergraph measure called submodular width, and show that bounded submodular width of H (which is a strictly more general property than bounded fractional hypertree width) implies that CSP(H) is fixed-parameter tractable. In a matching hardness result, we show that if H has unbounded submodular width, then CSP(H) is not fixed-parameter tractable (and hence not polynomial-time solvable), unless the Exponential Time Hypothesis (ETH) fails. The algorithmic result uses tree decompositions in a novel way: instead of using a single decomposition depending on the hypergraph, the instance is split into a set of …
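To make the parameterization concrete: with n variables of domain size D, even exhaustive search runs in O(D^n · poly(instance size)) time, so when the number of variables is small relative to the instance (the database setting above), the parameter n dominates. A toy brute-force solver, with names of our own choosing:

```python
from itertools import product

def solve_csp(variables, domains, constraints):
    """Brute-force CSP solver, exponential only in the number of variables.

    constraints: list of (scope_tuple, allowed_set_of_value_tuples).
    Returns a satisfying assignment as a dict, or None.
    """
    for assignment in product(*(domains[v] for v in variables)):
        val = dict(zip(variables, assignment))
        if all(tuple(val[v] for v in scope) in allowed
               for scope, allowed in constraints):
            return val
    return None

# Example: 2-coloring a path a-b-c (satisfiable) vs. a triangle (not).
neq = {(0, 1), (1, 0)}
path = [(("a", "b"), neq), (("b", "c"), neq)]
```

The structural results in the abstract ask when this exponential dependence on the hypergraph structure can be avoided; the sketch only illustrates the baseline being improved upon.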
An optimal randomised cell probe lower bound for approximate nearest neighbour searching
 In Proceedings of the Symposium on Foundations of Computer Science
Abstract

Cited by 27 (2 self)
Abstract. We consider the approximate nearest neighbour search problem on the Hamming cube {0, 1}^d. We show that a randomised cell probe algorithm that uses polynomial storage and word size d^{O(1)} requires a worst-case query time of Ω(log log d / log log log d). The approximation factor may be as loose as 2^{log^{1−η} d} for any fixed η > 0. This generalises an earlier result [6] on the deterministic complexity of the same problem and, more importantly, fills a major gap in the study of this problem, since all earlier lower bounds either did not allow randomisation [6, 19] or did not allow approximation [5, 2, 16]. We also give a cell probe algorithm which proves that our lower bound is optimal. Our proof uses a lower bound on the round complexity of the related communication problem. We show, additionally, that considerations of bit complexity alone cannot prove any nontrivial cell probe lower bound for the problem. This shows that the Richness Technique [20], used in a lot of recent research around this problem, would not have helped here.
UNIFYING THE LANDSCAPE OF CELL-PROBE LOWER BOUNDS
, 2008
Abstract

Cited by 27 (1 self)
We show that a large fraction of the data-structure lower bounds known today in fact follow by reduction from the communication complexity of lopsided (asymmetric) set disjointness. This includes lower bounds for:
• high-dimensional problems, where the goal is to show large space lower bounds.
• constant-dimensional geometric problems, where the goal is to bound the query time for space O(n · polylog n).
• dynamic problems, where we are looking for a tradeoff between query and update time. (In this case, our bounds are slightly weaker than the originals, losing a lg lg n factor.)
Our reductions also imply the following new results:
• an Ω(lg n / lg lg n) bound for 4-dimensional range reporting, given space O(n · polylog n). This is quite timely, since a recent result [39] solved 3-dimensional reporting in O(lg^2 lg n) time, raising the prospect that higher dimensions could also be easy.
• a tight space lower bound for the partial match problem, for constant query time.
• the first lower bound for reachability oracles.
In the process, we prove optimal randomized lower bounds for lopsided set disjointness.
Distance oracles for sparse graphs
 In Proceedings of the 50th IEEE Symposium on Foundations of Computer Science (FOCS)
Abstract

Cited by 26 (4 self)
Abstract — Thorup and Zwick, in their seminal work, introduced the approximate distance oracle, which is a data structure that answers distance queries in a graph. For any integer k, they showed an efficient algorithm to construct an approximate distance oracle using space O(k n^{1+1/k}) that can answer queries in time O(k) with a distance estimate that is at most α = 2k − 1 times larger than the actual shortest distance (α is called the stretch). They proved that, under a combinatorial conjecture, their data structure is optimal in terms of space: if a stretch of at most 2k − 1 is desired, then the space complexity is at least n^{1+1/k}. Their proof holds even if infinite query time is allowed: it is essentially an "incompressibility" result. Also, the proof only holds for dense graphs, and the best bound it can prove only implies that the size of the data structure is lower bounded by the number of edges of the graph. Naturally, the following question arises: what happens for sparse graphs? In this paper we give a new lower bound for approximate distance oracles in the cell-probe model. This lower bound holds even for sparse (polylog(n)-degree) graphs, and it is not an "incompressibility" bound: we prove a three-way tradeoff between space, stretch and query time. We show that, when the query time is t and the stretch is α, then the space S must satisfy S ≥ n^{1+Ω(1/(tα))} / lg n. (1) This lower bound follows by a reduction from lopsided set disjointness to distance oracles, based on and motivated by recent work of Pătrașcu. Our results in fact show that for any high-girth regular graph G, an approximate distance oracle that supports efficient queries for all subgraphs of G must obey Eq. (1). We also prove some lemmas that count sets of paths in high-girth regular graphs and high-girth regular expanders, which might be of independent interest. Keywords: distance oracle; data structures; lower bounds; cell-probe model; lopsided set disjointness.
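The space/stretch trade-off quoted above is easy to evaluate numerically. The sketch below (the function name is ours) tabulates the Thorup–Zwick space bound k · n^{1+1/k} words against the stretch 2k − 1, ignoring constant factors:

```python
# Illustrative arithmetic only: space grows as k * n^(1 + 1/k) words
# while the stretch guarantee is 2k - 1 (constants ignored).
def tz_space_words(n: int, k: int) -> float:
    """Approximate space, in words, of a stretch-(2k-1) distance oracle."""
    return k * n ** (1 + 1 / k)

n = 10**6
for k in (1, 2, 3, 5):
    print(f"k={k}: stretch {2 * k - 1}, ~{tz_space_words(n, k):.3e} words")
```

For n = 10^6, k = 1 gives exact distances at ~10^12 words, while k = 2 already drops the space to ~2 × 10^9 words at the cost of stretch 3, which is the trade-off the lower bound in this paper constrains for sparse graphs.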
Higher lower bounds for near-neighbor and further rich problems
 in Proc. 47th IEEE Symposium on Foundations of Computer Science (FOCS)
Abstract

Cited by 25 (2 self)
We convert cell-probe lower bounds for polynomial space into stronger lower bounds for near-linear space. Our technique applies to any lower bound proved through the richness method. For example, it applies to partial match, and to near-neighbor problems, either for randomized exact search, or for deterministic approximate search (which are thought to exhibit the curse of dimensionality). These problems are motivated by search in large databases, so near-linear space is the most relevant regime. Typically, richness has been used to imply Ω(d / lg n) lower bounds for polynomial-space data structures, where d is the number of bits of a query. This is the highest lower bound provable through the classic reduction to communication complexity. However, for space n lg^{O(1)} n, we now obtain bounds of Ω(d / lg d). This is a significant improvement for natural values of d, such as lg^{O(1)} n. In the most important case of d = Θ(lg n), we have the first super-constant lower bound. From a complexity-theoretic perspective, our lower bounds are the highest known for any static data structure problem, significantly improving on previous records.
More Efficient Algorithms for Closest String and Substring Problems
Abstract

Cited by 17 (2 self)
Abstract. The closest string and substring problems find applications in PCR primer design, genetic probe design, motif finding, and antisense drug design. Because of their importance, the two problems have been extensively studied in computational biology. Unfortunately, both problems are NP-complete. Researchers have developed both fixed-parameter algorithms and approximation algorithms for the two problems. In terms of fixed-parameter tractability, when the radius d is the parameter, the best-known fixed-parameter algorithm for closest string has time complexity O(nd^{d+1}), which is still superpolynomial even if d = O(log n). In this paper we provide an O(n|Σ|^{O(d)}) algorithm, where Σ is the alphabet. This gives a polynomial-time algorithm when d = O(log n) and Σ has constant size. Using the same technique, we additionally provide a more efficient subexponential-time algorithm for the closest substring problem. In terms of approximation, both the closest string and closest substring problems admit polynomial-time approximation schemes (PTAS). The best known time complexity of such a PTAS is O(n^{O(ε^{-2} log(1/ε))}). In this paper we present a PTAS with time complexity O(n^{O(ε^{-2})}). Finally, we prove that a restricted version of closest substring has the same parameterized complexity as closest substring, answering an open question in the literature.
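As a concrete statement of the problem the abstract discusses: given strings s_1, …, s_m of equal length, closest string asks for a center string minimizing the maximum Hamming distance (the radius d). The exhaustive search below is our own illustration of the definition; it is exponential in the string length, unlike the parameterized algorithms above.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions where the two strings differ."""
    return sum(x != y for x, y in zip(a, b))

def closest_string(strings, alphabet):
    """Exhaustively find a center minimizing the maximum Hamming
    distance (the radius) to the inputs. For illustration only:
    runs in |alphabet|**L time for strings of length L."""
    L = len(strings[0])
    best, best_d = None, L + 1
    for cand in product(alphabet, repeat=L):
        s = "".join(cand)
        d = max(hamming(s, t) for t in strings)
        if d < best_d:
            best, best_d = s, d
    return best, best_d

center, radius = closest_string(["ACGT", "ACGA", "TCGT"], "ACGT")
```

For this toy DNA instance the unique optimum is the input string "ACGT" itself, at radius 1; the fixed-parameter algorithms in the paper reach such answers without enumerating all |Σ|^L candidates.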
On the optimality of planar and geometric approximation schemes
Abstract

Cited by 17 (5 self)
We show for several planar and geometric problems that the best known approximation schemes are essentially optimal with respect to the dependence on ε. For example, we show that the 2^{O(1/ε)} · n time approximation schemes for planar MAXIMUM INDEPENDENT SET and for TSP on a metric defined by a planar graph are essentially optimal: if there is a δ > 0 such that any of these problems admits a 2^{O((1/ε)^{1−δ})} n^{O(1)} time PTAS, then the Exponential Time Hypothesis (ETH) fails. It is known that MAXIMUM INDEPENDENT SET on unit disk graphs and the planar logic problems MPSAT, TMIN, TMAX admit n^{O(1/ε)} time approximation schemes. We show that they are optimal in the sense that if there is a δ > 0 such that any of these problems admits a 2^{(1/ε)^{O(1)}} n^{O((1/ε)^{1−δ})} time PTAS, then ETH fails.
A geometric approach to lower bounds for approximate near-neighbor search and partial match
 In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS)
, 2008
Abstract

Cited by 15 (2 self)
This work investigates a geometric approach to proving cell probe lower bounds for data structure problems. We consider the approximate nearest neighbor search problem on the Boolean hypercube ({0, 1}^d, ‖·‖_1) with d = Θ(log n). We show that any (randomized) data structure for the problem that answers c-approximate nearest neighbor search queries using t probes must use space at least n^{1+Ω(1/(ct))}. In particular, our bound implies that any data structure that uses space Õ(n) with polylogarithmic word size, and with constant probability gives a constant approximation to nearest neighbor search queries, must be probed Ω(log n / log log n) times. This improves on the lower bound of Ω(log log d / log log log d) probes shown by Chakrabarti and Regev [8] for any polynomial-space data structure, and the Ω(log log d) lower bound of Pătrașcu and Thorup [26] for linear-space data structures. Our lower bound holds for the near neighbor problem, where the algorithm knows in advance a good approximation to the distance to the nearest neighbor. Additionally, it is an average-case lower bound for the natural distribution for the problem. Our approach also gives the same bound for (2 − 1/c)-approximation to the farthest neighbor problem. For the case of non-adaptive algorithms we can improve the bound slightly and show an Ω(log n) lower bound on the time complexity of data structures with O(n) space and logarithmic word size. We also show similar lower bounds for the partial match problem: any randomized t-probe data structure that solves the partial match problem on {0, 1, ⋆}^d for d = Θ(log n) must use space n^{1+Ω(1/t)}. This implies an Ω(log n / log log n) lower bound on the time complexity of near-linear-space data structures, slightly improving the Ω(log n/(log log n)^2) lower bound from [25], [16] for this range of d. Recently and independently, Pătrașcu achieved similar bounds [24]. Our results also generalize to approximate partial match, improving on the bounds of [4, 25].
Lower bounds on near neighbor search via metric expansion
 CoRR
Abstract

Cited by 15 (1 self)
In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space, we look at the graph obtained by connecting every pair of points within a certain distance r. We then look at various notions of expansion in this graph, relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example, if the graph has node expansion Φ, then we show that any deterministic t-probe data structure for n points must use space S where (St/n)^t > Φ. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in the well-known metric spaces such as ℓ1, ℓ2, ℓ∞ by simply computing their expansion. In the process, we strengthen and generalize our previous results [19]. Additionally, we unify the approach in [19] and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds for near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on t is weak; that is, the bound drops exponentially in t. We show a much stronger (tight) time-space tradeoff for the class of dynamic low-contention data structures. These are data structures that support updates to the data set and do not look up any single cell too often.
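Read numerically, the deterministic bound above turns an expansion estimate directly into a space lower bound: rearranging (St/n)^t > Φ gives S > (n/t) · Φ^{1/t}. A small sketch under our reconstruction of that inequality (the function name is ours):

```python
# Our reconstruction of the trade-off: a deterministic t-probe data
# structure with space S must satisfy (S*t/n)**t > phi, hence
# S > (n / t) * phi ** (1 / t).
def space_lower_bound(n: float, t: int, phi: float) -> float:
    """Smallest space S (in cells) consistent with (S*t/n)**t > phi."""
    return (n / t) * phi ** (1 / t)

# e.g. n = 2**20 points, expansion phi = 2**10, t = 2 probes:
bound = space_lower_bound(2**20, 2, 2**10)  # 2**24 cells
```

This also makes the weakness in t visible: the Φ^{1/t} factor, and hence the bound, decays exponentially as the number of probes grows, exactly as the abstract notes.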