Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Cited by 533 (28 self)
The nearest neighbor problem is the following: Given a set of n points $P = \{p_1, \ldots, p_n\}$ in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point $q \in X$. We focus on the particularly interesting case of the d-dimensional Euclidean space where $X = \mathbb{R}^d$ under some $l_p$ norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point $p \in P$ that is an $\epsilon$-approximate nearest neighbor of the query q in that for all $p' \in P$, $d(p, q) \le (1 + \epsilon)\, d(p', q)$. We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
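To make the two definitions in this abstract concrete, here is a minimal Python sketch of the brute-force baseline and of the approximation condition. It is not the paper's algorithm: the function names, the choice of the Euclidean ($l_2$) norm, and the sample points are all assumptions made for illustration.

```python
import math

def brute_force_nn(points, q):
    """Exact nearest neighbor: compare q to every data point (l_2 norm assumed)."""
    return min(points, key=lambda p: math.dist(p, q))

def is_eps_approx_nn(p, points, q, eps):
    """Check d(p, q) <= (1 + eps) * d(p', q) for all p' in points.

    It suffices to compare against the smallest d(p', q).
    """
    best = min(math.dist(x, q) for x in points)
    return math.dist(p, q) <= (1 + eps) * best

# Hypothetical data, chosen only to exercise the definitions.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 0.9)
nearest = brute_force_nn(points, q)
```

The brute-force scan costs one distance evaluation per data point for every query, which is the baseline the abstract says existing methods barely improve on when d is large.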
The earth mover’s distance as a metric for image retrieval
- International Journal of Computer Vision
, 2000
Cited by 301 (2 self)
Multidimensional distributions are often used in computer vision to describe and summarize different features of an image. For example, the one-dimensional distribution of image intensities describes the overall brightness content of a gray-scale image, and a three-dimensional distribution can play a similar role for color images. The texture content of an image can be described by a distribution of local signal energy over frequency. These descriptors can be used in a variety of applications including, for example, image retrieval.
Searching in Metric Spaces
, 1999
Cited by 285 (34 self)
The problem of searching the elements of a set which are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. A large number of solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reinvented several times, and very different presentations have been given for the same approaches. We
Geometric Range Searching and Its Relatives
- CONTEMPORARY MATHEMATICS
Cited by 223 (35 self)
... process a set S of points in $\mathbb{R}^d$ so that the points of S lying inside a query region can be reported or counted quickly. We survey the known techniques and data structures for range searching and describe their application to other related searching problems.
Distributed Object Location in a Dynamic Network
, 2004
Cited by 155 (16 self)
Modern networking applications replicate data and services widely, leading to a need for location-independent routing---the ability to route queries to objects using names independent of the objects' physical locations. Two important properties of such a routing infrastructure are routing locality and rapid adaptation to arriving and departing nodes. We show how these two properties can be efficiently achieved for certain network topologies. To do this, we present a new distributed algorithm that can solve the nearest-neighbor problem for these networks. We describe our solution in the context of Tapestry, an overlay network infrastructure that employs techniques proposed by Plaxton et al. [24].
Bounded geometries, fractals, and low-distortion embeddings
Cited by 130 (24 self)
The doubling constant of a metric space $(X, d)$ is the smallest value $\lambda$ such that every ball in X can be covered by $\lambda$ balls of half the radius. The doubling dimension of X is then defined as $\dim(X) = \log_2 \lambda$. A metric (or sequence of metrics) is called doubling precisely when its doubling dimension is bounded. This is a robust class of metric spaces which contains many families of metrics that occur in applied settings. We give tight bounds for embedding doubling metrics into (low-dimensional) normed spaces. We consider both general doubling metrics, as well as more restricted families such as those arising from trees, from graphs excluding a fixed minor, and from snowflaked metrics. Our techniques include decomposition theorems for doubling metrics, and an analysis of a fractal in the plane due to Laakso [21]. Finally, we discuss some applications and point out a central open question regarding dimensionality reduction in $L_2$.
Finding Nearest Neighbors in Growth-restricted Metrics
- In 34th Annual ACM Symposium on the Theory of Computing
, 2002
Cited by 123 (0 self)
Most research on nearest neighbor algorithms in the literature has been focused on the Euclidean case. In many practical search problems however, the underlying metric is non-Euclidean. Nearest neighbor algorithms for general metric spaces are quite weak, which motivates a search for other classes of metric spaces that can be tractably searched.
Navigating nets: Simple algorithms for proximity search (Extended Abstract)
, 2004
Cited by 105 (9 self)
Robert Krauthgamer, James R. Lee. We present a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates to S (insertions and deletions). Our data structure consists of a sequence of progressively finer $\epsilon$-nets of S, with pointers that allow us to navigate easily from one scale to the next.
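The idea of progressively finer nets can be illustrated with a short Python sketch. It assumes the common definition of an r-net (net points are pairwise at least r apart, and every point of S lies within r of some net point) and builds the nets greedily, coarse to fine, seeding each level with the previous one so the nets are nested. This is not the paper's data structure: it has no cross-scale pointers and no support for insertions or deletions, and the names `greedy_net` and `nested_nets` are hypothetical.

```python
def greedy_net(points, r, dist, seed=()):
    """Greedy r-net of `points`, extending the already-chosen points in `seed`.

    A point is added only if it is at least r away from every chosen point,
    so any skipped point is within distance < r of the net.
    """
    net = list(seed)
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net

def nested_nets(points, r_max, levels, dist):
    """Return [(r, net), ...] with r halving at each level and nets nested."""
    nets, net, r = [], [], r_max
    for _ in range(levels):
        net = greedy_net(points, r, dist, seed=net)
        nets.append((r, net))
        r /= 2
    return nets

# Hypothetical example: nine points on a line under absolute difference.
pts = list(range(9))
dist = lambda a, b: abs(a - b)
nets = nested_nets(pts, r_max=4, levels=3, dist=dist)
```

Seeding is valid because points that are pairwise at least 2r apart are also at least r apart, so a coarser net never violates the separation requirement of a finer one.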
Fast construction of nets in low dimensional metrics, and their applications
- SIAM J. Comput
, 2005
Cited by 75 (7 self)
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear.
Nearest-neighbor searching and metric space dimensions
- In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
Cited by 63 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.

