Results 1  10
of
576
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
 ACMSIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 1994
"... Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any po ..."
Abstract

Cited by 786 (31 self)
 Add to MetaCart
Consider a set S of n data points in real ddimensional space, R d , where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q 2 R d , the closest point of S to q can be reported quickly. Given any positive real ffl, a data point p is a (1 + ffl)approximate nearest neighbor of q if its distance from q is within a factor of (1 + ffl) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in R d in O(dn log n) time and O(dn) space, so that given a query point q 2 R d , and ffl ? 0, a (1 + ffl)approximate nearest neighbor of q can be computed in O(c d;ffl log n) time, where c d;ffl d d1 + 6d=ffle d is a factor depending only on dimension and ffl. In general, we show that given an integer k 1, (1 + ffl)approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
Scalable Recognition with a Vocabulary Tree
 IN CVPR
, 2006
"... A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CDcovers from a database of 40000 images of popular music CD's. The scheme ..."
Abstract

Cited by 623 (0 self)
 Add to MetaCart
A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CDcovers from a database of 40000 images of popular music CD's. The scheme
Similarity search in high dimensions via hashing
, 1999
"... The nearest or nearneighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over highdimensional data, e.g., image dat ..."
Abstract

Cited by 415 (12 self)
 Add to MetaCart
The nearest or nearneighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over highdimensional data, e.g., image databases, document collections, timeseries databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the \curse of dimensionality. " That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in kd trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than bruteforce linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should su ce for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points
A Simple Proof of the Restricted Isometry Property for Random Matrices
 CONSTR APPROX
, 2008
"... We give a simple technique for verifying the Restricted Isometry Property (as introduced by Candès and Tao) for random matrices that underlies Compressed Sensing. Our approach has two main ingredients: (i) concentration inequalities for random inner products that have recently provided algorithmical ..."
Abstract

Cited by 301 (56 self)
 Add to MetaCart
We give a simple technique for verifying the Restricted Isometry Property (as introduced by Candès and Tao) for random matrices that underlies Compressed Sensing. Our approach has two main ingredients: (i) concentration inequalities for random inner products that have recently provided algorithmically simple proofs of the Johnson–Lindenstrauss lemma; and (ii) covering numbers for finitedimensional balls in Euclidean space. This leads to an elementary proof of the Restricted Isometry Property and brings out connections between Compressed Sensing and the Johnson–Lindenstrauss lemma. As a result, we obtain simple and direct proofs of Kashin’s theorems on widths of finite balls in Euclidean space (and their improvements due to Gluskin) and proofs of the existence of optimal Compressed Sensing measurement matrices. In the process, we also prove that these measurements have a certain universality with respect to the sparsityinducing basis.
Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation
, 2000
"... In this paper we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular: ffl we show how to maintain (using only O(log n=ffl 2 ) words of storage) a sketch C(p) of a point p 2 l n 1 under dynamic updates of its coo ..."
Abstract

Cited by 263 (15 self)
 Add to MetaCart
In this paper we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular: ffl we show how to maintain (using only O(log n=ffl 2 ) words of storage) a sketch C(p) of a point p 2 l n 1 under dynamic updates of its coordinates, such that given sketches C(p) and C(q) one can estimate jp \Gamma qj 1 up to a factor of (1 + ffl) with large probability. This solves the main open problem of [10]. ffl we obtain another sketch function C 0 which maps l n 1 into a normed space l m 1 (as opposed to C), such that m = m(n) is much smaller than n; to our knowledge this is the first dimensionality reduction lemma for l 1 norm ffl we give an explicit embedding of l n 2 into l n O(log n) 1 with distortion (1 + 1=n \Theta(1) ) and a nonconstructive embedding of l n 2 into l O(n) 1 with distortion (1 + ffl) such that the embedding can be represented using only O(n log 2 n) bits (as opposed to at least...
Nearoptimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
"... In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The ..."
Abstract

Cited by 237 (4 self)
 Add to MetaCart
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Similarity estimation techniques from rounding algorithms
 In Proc. of 34th STOC
, 2002
"... A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, Prh∈F[h(x) = h(y)] = sim(x,y), where sim(x,y) ∈ [0, 1] is some similarity function defined on the collection of objects. Such a scheme leads ..."
Abstract

Cited by 230 (6 self)
 Add to MetaCart
A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, Prh∈F[h(x) = h(y)] = sim(x,y), where sim(x,y) ∈ [0, 1] is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. Minwise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure sim(A, B) = A∩B A∪B . We show that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Based on this insight, we construct new locality sensitive hashing schemes for: 1. A collection of vectors with the distance between ⃗u and ⃗v measured by θ(⃗u,⃗v)/π, where θ(⃗u,⃗v) is the angle between ⃗u and ⃗v. This yields a sketching scheme for estimating the cosine similarity measure between two vectors, as well as a simple alternative to minwise independent permutations for estimating set similarity. 2. A collection of distributions on n points in a metric space, with distance between distributions measured by the Earth Mover Distance (EMD), (a popular distance measure in graphics and vision). Our hash functions map distributions to points in the metric space such that, for distributions P and Q,
A search engine for 3d models
 ACM Transactions on Graphics
, 2003
"... As the number of 3D models available on the Web grows, there is an increasing need for a search engine to help people find them. Unfortunately, traditional textbased search techniques are not always effective for 3D data. In this paper, we investigate new shapebased search methods. The key challen ..."
Abstract

Cited by 228 (21 self)
 Add to MetaCart
As the number of 3D models available on the Web grows, there is an increasing need for a search engine to help people find them. Unfortunately, traditional textbased search techniques are not always effective for 3D data. In this paper, we investigate new shapebased search methods. The key challenges are to develop query methods simple enough for novice users and matching algorithms robust enough to work for arbitrary polygonal models. We present a webbased search engine system that supports queries based on 3D sketches, 2D sketches, 3D
RapidlyExploring Random Trees: Progress and Prospects
 Algorithmic and Computational Robotics: New Directions
, 2000
"... this paper, which presents randomized, algorithmic techniques for path planning that are particular suited for problems that involve dierential constraints. ..."
Abstract

Cited by 228 (25 self)
 Add to MetaCart
this paper, which presents randomized, algorithmic techniques for path planning that are particular suited for problems that involve dierential constraints.
Finding motifs using random projections
, 2001
"... Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif Å of length 15, where each planted instance differs from Å in 4 positions. Whereas previous algorithms all failed to solve this (15,4)motif problem, Pevz ..."
Abstract

Cited by 211 (5 self)
 Add to MetaCart
Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif Å of length 15, where each planted instance differs from Å in 4 positions. Whereas previous algorithms all failed to solve this (15,4)motif problem, Pevzner and Sze introduced algorithms that succeeded. However, their algorithms failed to solve the considerably more difficult (14,4), (16,5), and (18,6)motif problems. We introduce a novel motif discovery algorithm based on the use of random projections of the input’s substrings. Experiments on simulated data demonstrate that this algorithm performs better than existing algorithms and, in particular, typically solves the difficult (14,4), (16,5), and (18,6)motif problems quite efficiently. A probabilistic estimate shows that the small values of � for which the algorithm fails to recover the planted Ð � �motif are in all likelihood inherently impossible to solve. We also present experimental results on realistic biological data by identifying ribosome binding sites in prokaryotes as well as a number of known transcriptional regulatory motifs in eukaryotes. 1. CHALLENGING MOTIF PROBLEMS Pevzner and Sze [23] considered a very precise version of the motif discovery problem of computational biology, which had also been considered by Sagot [26]. Based on this formulation, they issued an algorithmic challenge: Planted Ð � �Motif Problem: Suppose there is a fixed but unknown nucleotide sequence Å (the motif) of length Ð. The problem is to determine Å, givenØ nucleotide sequences each of length Ò, and each containing a planted variant of Å. More precisely, each such planted variant is a substring that is Å with exactly � point substitutions. One instantiation that they labeled “The Challenge Problem ” was parameterized as finding a planted (15,4)motif in Ø � sequences each of length Ò � �. These values of Ò, Ø, andÐ are