Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). arXiv
, 2014
Fast Subspace Search via Grassmannian Based Hashing
Abstract

Cited by 3 (1 self)
The problem of efficiently deciding which of a database of models is most similar to a given input query arises throughout modern computer vision. Motivated by applications in recognition, image retrieval and optimization, there has been significant recent interest in the variant of this problem in which the database models are linear subspaces and the input is either a point or a subspace. Current approaches to this problem have poor scaling in high dimensions, and may not guarantee sublinear query complexity. We present a new approach to approximate nearest subspace search, based on a simple, new locality-sensitive hash for subspaces. Our approach allows point-to-subspace queries for a database of subspaces of arbitrary dimension d, in a time that depends sublinearly on the number of subspaces in the database. The query complexity of our algorithm is linear in the ambient dimension D, allowing it to be directly applied to high-dimensional imagery data. Numerical experiments on model problems in image repatching and automatic face recognition confirm the advantages of our algorithm in terms of both speed and accuracy.
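The core primitive the abstract relies on is the distance from a query point to a linear subspace, obtained by orthogonal projection onto an orthonormal basis of that subspace. A minimal sketch of that primitive and the O(n) linear-scan baseline (the function and variable names are illustrative, not from the paper; the paper's hashing scheme is what replaces the scan):

```python
import numpy as np

def point_to_subspace_dist(q, U):
    """Distance from a point q in R^D to the subspace spanned by the
    orthonormal columns of U (a D x d matrix): the norm of the residual
    left after orthogonally projecting q onto span(U)."""
    residual = q - U @ (U.T @ q)
    return np.linalg.norm(residual)

def nearest_subspace_linear_scan(q, bases):
    """O(n) baseline over a database of subspace bases; the paper's
    locality-sensitive hash is designed to answer the same query in
    time sublinear in len(bases)."""
    return min(range(len(bases)),
               key=lambda i: point_to_subspace_dist(q, bases[i]))
```

For example, with the xy-plane and the z-axis in R^3 as the database, the query (0, 0, 2) is at distance 2 from the plane and distance 0 from the axis.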
Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space
, 2013
Abstract

Cited by 2 (1 self)
For a set of n points in R^d, and parameters k and ε, we present a data structure that answers (1 + ε, k)-ANN queries in logarithmic time. Surprisingly, the space used by the data structure is Õ(n/k); that is, the space used is sublinear in the input size if k is sufficiently large. Our approach provides a novel way to summarize geometric data, such that meaningful proximity queries on the data can be carried out using this sketch. Using this, we provide a sublinear-space data structure that can estimate the density of a point set under various measures, including: (i) the sum of distances of the k closest points to the query point, and (ii) the sum of squared distances of the k closest points to the query point. Our approach generalizes to other distance-based density estimates of a similar flavor. We also study the problem of approximating some of these quantities when using sampling. In particular, we show that a sample of size Õ(n/k) is sufficient, in some restricted cases, to estimate the above quantities. Remarkably, the sample size has only linear dependency on the dimension.
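The two density measures named in items (i) and (ii) are simple to state exactly; the paper's contribution is estimating them from an Õ(n/k)-space sketch. A brute-force reference implementation of the exact quantities (names are illustrative, not the paper's API):

```python
import numpy as np

def knn_density_measures(points, q, k):
    """Exact values of the two density measures the sketch approximates:
    (i) the sum of distances of the k closest points to q, and
    (ii) the sum of squared distances of the k closest points to q.
    This O(n) scan is the ground truth the sublinear-space estimator
    is measured against."""
    dists = np.linalg.norm(points - q, axis=1)
    knn = np.sort(dists)[:k]
    return knn.sum(), (knn ** 2).sum()
```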
Parallel algorithms for geometric graph problems
 In STOC
, 2014
Abstract

Cited by 1 (0 self)
We give algorithms for geometric graph problems in modern parallel models such as MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in two-dimensional space, our algorithm computes a (1 + ε)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear-space and near-linear-time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, n^{1+o(1)}.
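For concreteness, the quantity being (1 + ε)-approximated is the cost of the geometric MST. A sequential Kruskal baseline over 2-D points (all names are illustrative; the paper's constant-round MapReduce algorithm approximates this value, it does not run this code):

```python
import itertools
import math

def mst_cost(points):
    """Exact MST cost over 2-D points via Kruskal with union-find.
    Quadratic in n; serves only as the sequential ground truth that a
    (1 + eps)-approximate parallel algorithm is compared against."""
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2))
    cost = 0.0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:        # edge joins two components: take it
            parent[ri] = rj
            cost += w
    return cost
```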
Approximating minimization diagrams and generalized proximity search
 In Proc. 54th Annu. IEEE Sympos. Found. Comput. Sci. (FOCS)
, 2013
Abstract

Cited by 1 (1 self)
We investigate the classes of functions whose minimization diagrams can be approximated efficiently in R^d. We present a general framework and a data structure that can be used to approximate the minimization diagram of such functions. The resulting data structure has near-linear size and can answer queries in logarithmic time. Applications include approximating the Voronoi diagram of (additively or multiplicatively) weighted points. Our technique also works for more general distance functions, such as metrics induced by convex bodies, and the nearest furthest-neighbor distance to a set of point sets. Interestingly, our framework also works for distance functions that do not comply with the triangle inequality. For many of these functions no near-linear-size approximation was known before.
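As a concrete instance of a minimization diagram, consider additively weighted points: each site p_i carries a weight w_i and induces f_i(q) = ‖q − p_i‖ + w_i, and a query asks which f_i attains the minimum at q. A brute-force evaluation of such a query (names are illustrative; the paper's data structure answers this approximately in logarithmic time instead of O(n)):

```python
import math

def min_diagram_query(q, sites):
    """Evaluate the minimization diagram of additively weighted distance
    functions f_i(q) = dist(q, p_i) + w_i by brute force, returning the
    index of the minimizing site. `sites` is a list of (point, weight)
    pairs; the weight shifts each site's entire distance function."""
    return min(range(len(sites)),
               key=lambda i: math.dist(q, sites[i][0]) + sites[i][1])
```

Note that the weight can make a geometrically farther site the winner, which is exactly why such diagrams are richer than ordinary Voronoi diagrams.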
Approximate Nearest Neighbor And Its Many Variants
, 2013
Abstract
This thesis investigates two variants of the approximate nearest neighbor problem. First, motivated by the recent research on diversity-aware search, we investigate the k-diverse near neighbor reporting problem. The problem is defined as follows: given a query point q, report the maximum-diversity set S of k points in the ball of radius r around q. The diversity of a set S is measured by the minimum distance between any pair of points in S (the higher, the better). We present two approximation algorithms for the case where the points live in a d-dimensional Hamming space. Our algorithms guarantee query times that are sublinear in n and only polynomial in the diversity parameter k, as well as the dimension d. For low values of k, our algorithms achieve sublinear query times even if the number of points within distance r from a query q is linear in n. To the best of our knowledge, these are the first known algorithms of this type that offer provable guarantees. In the other variant, we consider the approximate line near neighbor (LNN) problem. Here, the database consists of a set of lines instead of points, but the query is a point.
Diverse Near Neighbor Problem [Extended Abstract]
, 2013
Abstract
Motivated by the recent research on diversity-aware search, we investigate the k-diverse near neighbor reporting problem. The problem is defined as follows: given a query point q, report the maximum-diversity set S of k points in the ball of radius r around q. The diversity of a set S is measured by the minimum distance between any pair of points in S (the higher, the better). We present two approximation algorithms for the case where the points live in a d-dimensional Hamming space. Our algorithms guarantee query times that are sublinear in n and only polynomial in the diversity parameter k, as well as the dimension d. For low values of k, our algorithms achieve sublinear query times even if the number of points within distance r from a query q is linear in n. To the best of our knowledge, these are the first known algorithms of this type that offer provable guarantees.
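The objective in this abstract is easy to state exactly, even though optimizing it is hard. A brute-force reference for the diversity measure and the reporting problem (Euclidean distance is used here for illustration; the paper works in Hamming space, and its algorithms are the sublinear approximate replacements for this exponential-in-k scan):

```python
import itertools
import math

def diversity(S):
    """Diversity of a point set, as defined in the abstract: the
    minimum pairwise distance (the higher, the better)."""
    return min(math.dist(a, b) for a, b in itertools.combinations(S, 2))

def diverse_near_neighbors_brute(points, q, r, k):
    """Brute-force k-diverse near neighbor reporting: among all
    k-subsets of the points within distance r of q, return one that
    maximizes diversity."""
    ball = [p for p in points if math.dist(p, q) <= r]
    return max(itertools.combinations(ball, k), key=diversity)
```

Note the tension the problem captures: the two points nearest to q may be nearly identical, so the best answer can include a point farther from q in exchange for a larger minimum pairwise gap.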
Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search
, 2015
Abstract
We introduce a new variant of the nearest neighbor search problem, which allows some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of n points P = {x1, ..., xn} in high dimensions, and a parameter k, the goal is to preprocess the dataset such that, given a query point q, one can quickly compute a point x ∈ P whose distance to the query is minimized when the “optimal” k coordinates are ignored. Note that the coordinates being ignored are a function of both the query point and the point returned. We present a general reduction from this problem to answering ANN queries, which is similar in spirit to LSH (locality-sensitive hashing) [IM98]. Specifically, we give a sampling technique which achieves a bicriterion approximation for this problem. If the distance to the nearest neighbor after ignoring k coordinates is r, the data structure returns a point that is within a distance of O(r) after ignoring O(k) coordinates. We also present other applications and further extensions and refinements of the above result. The new data structures are simple and (arguably) elegant, and should be practical – specifically, all bounds are polynomial in all relevant parameters (including the dimension of the space and the robustness parameter k).
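The distance being minimized here is the ordinary Euclidean distance after discarding, for each candidate point, the k coordinates that contribute most to the gap. A brute-force sketch of that robust distance and the O(n) scan the paper's reduction accelerates (names are illustrative, not the paper's notation):

```python
import numpy as np

def robust_dist(x, q, k):
    """Distance between x and q when the 'optimal' k coordinates are
    ignored: drop the k largest squared coordinate differences and take
    the norm over the remaining ones."""
    diff2 = np.sort((np.asarray(x, float) - np.asarray(q, float)) ** 2)
    return float(np.sqrt(diff2[:len(diff2) - k].sum()))

def robust_nn_linear_scan(points, q, k):
    """O(n) baseline; note the dropped coordinates are chosen per
    candidate point, which is what makes the problem non-trivial."""
    return min(range(len(points)),
               key=lambda i: robust_dist(points[i], q, k))
```

For example, with k = 1 a point matching the query everywhere except one wildly corrupted coordinate has robust distance 0, even though its ordinary distance is huge.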