Results 1-10 of 102
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Cited by 242 (4 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Distributed Object Location in a Dynamic Network
, 2004
Cited by 168 (16 self)
Modern networking applications replicate data and services widely, leading to a need for location-independent routing: the ability to route queries to objects using names independent of the objects' physical locations. Two important properties of such a routing infrastructure are routing locality and rapid adaptation to arriving and departing nodes. We show how these two properties can be efficiently achieved for certain network topologies. To do this, we present a new distributed algorithm that can solve the nearest-neighbor problem for these networks. We describe our solution in the context of Tapestry, an overlay network infrastructure that employs techniques proposed by Plaxton et al. [24].
Bounded geometries, fractals, and low-distortion embeddings
Cited by 153 (31 self)
The doubling constant of a metric space (X, d) is the smallest value λ such that every ball in X can be covered by λ balls of half the radius. The doubling dimension of X is then defined as dim(X) = log_2 λ. A metric (or sequence of metrics) is called doubling precisely when its doubling dimension is bounded. This is a robust class of metric spaces which contains many families of metrics that occur in applied settings. We give tight bounds for embedding doubling metrics into (low-dimensional) normed spaces. We consider both general doubling metrics, as well as more restricted families such as those arising from trees, from graphs excluding a fixed minor, and from snowflaked metrics. Our techniques include decomposition theorems for doubling metrics, and an analysis of a fractal in the plane due to Laakso [21]. Finally, we discuss some applications and point out a central open question regarding dimensionality reduction in L_2.
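To make the definition above concrete, the following Python sketch estimates the doubling constant of a small finite metric space. It is not taken from the paper. The function names are hypothetical, ball centers are restricted to the given points, and the greedy cover only gives an upper bound on λ under that restriction, since finding a minimum cover is a set-cover problem.

```python
import math


def greedy_half_cover(points, dist, center, r):
    """Count balls of radius r/2 (centered at data points) that a greedy
    pass uses to cover the ball of radius r around `center`."""
    uncovered = [p for p in points if dist(center, p) <= r]
    count = 0
    while uncovered:
        c = uncovered[0]  # any uncovered point becomes a new half-radius center
        count += 1
        uncovered = [p for p in uncovered if dist(c, p) > r / 2]
    return count


def doubling_upper_bound(points, dist):
    """Return (lam, log2(lam)), where lam is the largest greedy cover size
    over all centers and all radii equal to a pairwise distance."""
    radii = {dist(p, q) for p in points for q in points if p != q}
    lam = max(
        (greedy_half_cover(points, dist, c, r) for c in points for r in radii),
        default=1,  # a single point needs exactly one ball
    )
    return lam, math.log2(lam)
```

For eight equally spaced points on a line, `doubling_upper_bound(list(range(8)), lambda a, b: abs(a - b))` reports a greedy bound of 4, that is, a dimension estimate of 2. The true doubling dimension of the line is smaller; the gap comes from the greedy choice of centers.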
Meridian: A Lightweight Network Location Service without Virtual Coordinates
 In SIGCOMM
, 2005
Cited by 141 (7 self)
This paper introduces a lightweight, scalable and accurate framework, called Meridian, for performing node selection based on network location. The framework consists of an overlay network structured around multi-resolution rings, query routing with direct measurements, and gossip protocols for dissemination. We show how this framework can be used to address three commonly encountered problems, namely, closest node discovery, central leader election, and locating nodes that satisfy target latency constraints in large-scale distributed systems without having to compute absolute coordinates. We show analytically that the framework is scalable with logarithmic convergence when Internet latencies are modeled as a growth-constrained metric, a low-dimensional Euclidean metric, or a metric of low doubling dimension. Large-scale simulations, based on latency measurements from 6.25 million node pairs, as well as an implementation deployed on PlanetLab, show that the framework is accurate and effective.
Cover trees for nearest neighbor
 In Proceedings of the 23rd international conference on Machine learning
, 2006
Cited by 139 (0 self)
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces. The data structure requires O(n) space regardless of the metric’s structure. If the point set has an expansion constant c in the sense of Karger and Ruhl [KR02], the data structure can be constructed in O(c^6 n log n) time. Nearest neighbor queries obeying the expansion bound require O(c^12 log n) time. In addition, the nearest neighbor of O(n) points can be queried in O(c^16 n) time. We experimentally test the algorithm, showing speedups over brute-force search varying between 1 and 2000 on natural machine learning datasets.
Fast construction of nets in low-dimensional metrics and their applications
 SIAM Journal on Computing
, 2006
Cited by 98 (11 self)
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear.
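The idea of a hierarchical net can be sketched in a few lines of Python. This is a naive quadratic-time greedy construction, not the near-linear algorithm of the paper, and the names `greedy_net` and `net_hierarchy` are hypothetical. Here an r-net is assumed to mean a subset whose members are pairwise more than r apart and which leaves every input point within distance r of some member.

```python
def greedy_net(points, dist, r):
    """Greedy r-net: keep a point only if it is farther than r from
    every point kept so far (packing), so every discarded point lies
    within r of some kept point (covering)."""
    net = []
    for p in points:
        if all(dist(p, c) > r for c in net):
            net.append(p)
    return net


def net_hierarchy(points, dist, r0, levels):
    """Nested nets at radii r0, 2*r0, 4*r0, ...; each level is a net
    of the level below it, so coarser nets are subsets of finer ones."""
    nets, current, r = [], list(points), r0
    for _ in range(levels):
        current = greedy_net(current, dist, r)
        nets.append(current)
        r *= 2
    return nets
```

With the integers 0 to 9 on a line and `r0 = 2`, the first level keeps 0, 3, 6, 9 and the second level (radius 4) keeps 0 and 6.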
Nearest-neighbor searching and metric space dimensions
 In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
Cited by 87 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
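The “black box” setting can be illustrated with a small Python sketch. It does not reproduce the paper's data structure. Instead it swaps in a much simpler single-pivot filter that uses only the triangle inequality to skip distance evaluations. The class `PivotIndex` and its methods are hypothetical names, and Hamming distance on equal-length strings is just one possible choice of metric.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Exact nearest-neighbor search that treats `dist` as a black box.

    Preprocessing stores d(site, pivot) for one pivot. At query time,
    |d(q, pivot) - d(site, pivot)| is a lower bound on d(q, site) by the
    triangle inequality, so sites whose bound cannot beat the current
    best are skipped without calling `dist`.
    """

    def __init__(self, sites, dist):
        self.sites = list(sites)
        if not self.sites:
            raise ValueError("need at least one site")
        self.dist = dist
        self.pivot = self.sites[0]
        self.to_pivot = [dist(s, self.pivot) for s in self.sites]

    def nearest(self, q):
        dq = self.dist(q, self.pivot)
        best, best_d = None, float("inf")
        for site, ds in zip(self.sites, self.to_pivot):
            if abs(dq - ds) >= best_d:
                continue  # lower bound already no better than best
            d = self.dist(q, site)
            if d < best_d:
                best, best_d = site, d
        return best, best_d
```

For example, `PivotIndex(["0000", "1111", "0011", "1100"], hamming).nearest("1110")` finds `"1111"` at distance 1 and skips the distance computation for the last two sites.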
Distance Estimation and Object Location via Rings of Neighbors
 In 24th Annual ACM Symposium on Principles of Distributed Computing (PODC)
, 2005
Cited by 64 (4 self)
We consider four problems on distance estimation and object location which share the common flavor of capturing global information via informative node labels: low-stretch routing schemes [47], distance labeling [24], searchable small worlds [30], and triangulation-based distance estimation [33]. Focusing on metrics of low doubling dimension, we approach these problems with a common technique called rings of neighbors, which refers to a sparse distributed data structure that underlies all our constructions. Apart from improving the previously known bounds for these problems, our contributions include extending Kleinberg’s small world model to doubling metrics, and a short proof of the main result in Chan et al. [14]. Doubling dimension is a notion of dimensionality for general metrics that has recently become a useful algorithmic concept in the theoretical computer science literature.
Routing in networks with low doubling dimension
 In 26th International Conference on Distributed Computing Systems (ICDCS). IEEE Computer
, 2006
Cited by 63 (8 self)
This paper studies compact routing schemes for networks with low doubling dimension. Two variants are explored, name-independent routing and labeled routing. The key results obtained for this model are the following. First, we provide the first name-independent solution. Specifically, we achieve constant stretch and polylogarithmic storage. Second, we obtain the first truly scale-free solutions, namely, the network’s aspect ratio is not a factor in the stretch. Scale-free schemes are given for three problem models: name-independent routing on graphs, labeled routing on metric spaces, and labeled routing on graphs. Third, we prove a lower bound requiring linear storage for stretch < 3 schemes. This has the important ramification of separating for the first time the name-independent problem model from the labeled model for these networks, since compact stretch-(1+ε) labeled schemes are known to be possible.
Multi-Probe LSH: Efficient indexing for high-dimensional similarity search
 in Proc. 33rd Int. Conf. Very Large Data Bases
Cited by 51 (3 self)
Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multi-probe LSH that overcomes this drawback. Multi-probe LSH is built on the well-known LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropy-based LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. Our evaluation shows that the multi-probe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multi-probe LSH has time efficiency similar to that of the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropy-based LSH method, to achieve the same search quality, multi-probe LSH uses less query time and 5 to 8 times fewer hash tables.
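The core idea of probing several buckets in one table, rather than adding more tables, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' method: it swaps in sign-of-random-projection hashing and a naive probing sequence that flips one bit of the query's hash key at a time, whereas the paper describes a more careful choice of which buckets to probe. The class name `MultiProbeTable` and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class MultiProbeTable:
    """One LSH table keyed by the signs of random projections."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def _key(self, v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                     for plane in self.planes)

    def add(self, label, v):
        self.buckets[self._key(v)].append((label, v))

    def _probes(self, key):
        yield key  # home bucket first
        for i in range(len(key)):  # then every bucket one bit away
            yield key[:i] + (1 - key[i],) + key[i + 1:]

    def query(self, q, multiprobe=True):
        key = self._key(q)
        keys = self._probes(key) if multiprobe else [key]
        candidates = [item for k in keys for item in self.buckets.get(k, [])]
        if not candidates:
            return None
        # Rank the candidates by true distance to the query.
        return min(candidates, key=lambda item: sq_dist(item[1], q))[0]
```

With `multiprobe=True`, a query inspects `n_bits + 1` buckets of the same table, which is how a single table can recover neighbors that fall just across a hash boundary.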