Results 1-10 of 33
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
Cited by 191 (9 self)
Abstract:
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high-dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L_1 norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database).
Two Algorithms for Nearest-Neighbor Search in High Dimensions
, 1997
Cited by 172 (0 self)
Abstract:
Representing data as points in a high-dimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearest-neighbor problem for d-dimensional Euclidean space: we wish to preprocess a database of n points so that given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the average-case analysis of heuristics such as k-d trees reveals an exponential dependence on d in the query time. In this work ...
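The baseline against which the preprocessing/query trade-off above is measured is a plain linear scan: no preprocessing, O(n*d) work per query. A minimal Python sketch of that baseline (illustrative only; not the data structures the paper proposes, and the names are ours):

```python
def nearest_neighbor(database, query):
    # Brute-force linear scan: O(n * d) per query, no preprocessing.
    # Sophisticated structures trade preprocessing space for query time.
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(database, key=lambda p: dist2(p, query))

db = [(0.0, 0.0), (1.0, 1.0), (5.0, 2.0)]
print(nearest_neighbor(db, (0.9, 1.2)))  # -> (1.0, 1.0)
```

Any sublinear-query-time scheme must beat this scan while keeping space polynomial in n.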
Computation in Noisy Radio Networks
 in Proc. 9th Ann. ACM-SIAM Symp. on Discrete Algorithms
Cited by 29 (0 self)
Abstract:
In this paper we examine noisy radio (broadcast) networks in which every bit transmitted has a certain probability of being flipped. Each processor has some initial input bit, and the goal is to compute a function of the initial inputs. In this model we show a protocol to compute any threshold function using only a linear number of transmissions.

1 Introduction

The influence of noise (or faults) on the complexity of computation has been studied in many contexts; random noise in particular has received much attention. In a typical such scenario, it is assumed that the outcome of each operation is noisy with some fixed probability p and all the faults are independent. Usually, if t is the number of operations performed by the computation, then by repeating each operation O(log t) times and taking the majority of the results, one can ensure a constant probability of error at the cost of O(t log t) operations. It is desirable, however, to obtain a cost of O(t), i.e., an increase by only a constant fa...
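The standard repetition-plus-majority trick mentioned above is easy to see in miniature. A hedged Python sketch (our own illustration, not the paper's protocol; `noisy_op` and the parameters are hypothetical):

```python
import random

def noisy_op(true_bit, p, rng):
    # A single noisy operation: returns the true bit, flipped with probability p.
    return true_bit ^ (rng.random() < p)

def reliable_op(true_bit, p, repeats, rng):
    # Repeat the noisy operation `repeats` times and take the majority vote;
    # the failure probability drops exponentially in `repeats`.
    ones = sum(noisy_op(true_bit, p, rng) for _ in range(repeats))
    return int(ones > repeats // 2)

rng = random.Random(0)
p = 0.2
# With enough repetitions, the majority vote recovers the bit w.h.p.
results = [reliable_op(1, p, repeats=41, rng=rng) for _ in range(200)]
```

The point of the paper is precisely to avoid paying this O(log t) repetition factor on every transmission.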
The Communication Complexity of Threshold Gates
 In Proceedings of “Combinatorics, Paul Erdős is Eighty”
, 1994
Cited by 28 (1 self)
Abstract:
We prove upper bounds on the randomized communication complexity of evaluating a threshold gate (with arbitrary weights). For linear threshold gates this is done in the usual two-party communication model, and for degree-d threshold gates this is done in the multiparty model. We then use these upper bounds together with known lower bounds for communication complexity to give very easy proofs of lower bounds in various models of computation involving threshold gates. This generalizes several known bounds and answers several open problems.
Noisy sorting without resampling
 In SODA ’08: Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms
, 2008
Cited by 24 (0 self)
Abstract:
In this paper we study noisy sorting without resampling. In this problem there is an unknown order a_{π(1)} < ... < a_{π(n)}, where π is a permutation on n elements. The input is the status of (n choose 2) queries of the form q(a_i, a_j), where q(a_i, a_j) = + with probability at least 1/2 + γ if π(i) > π(j), for all pairs i ≠ j, where γ > 0 is a constant and q(a_i, a_j) = −q(a_j, a_i) for all i and j. It is assumed that the errors are independent. Given the status of the queries, the goal is to find the maximum-likelihood order; in other words, the goal is to find a permutation σ that minimizes the number of pairs with σ(i) > σ(j) for which q(σ(i), σ(j)) = −. The problem so defined is the feedback arc set problem on a distribution of inputs, each of which is a tournament obtained as a noisy perturbation of a linear order. Note that when γ < 1/2 and n is large, it is impossible to recover the original order π. It is known that the weighted feedback arc set problem on tournaments is NP-hard in general. Here we present an algorithm with running time n^{O(γ^{−4})} and sampling complexity O_γ(n log n) that with high probability solves the noisy sorting without resampling problem. We also show that if a_{σ(1)}, a_{σ(2)}, ..., a_{σ(n)} is an optimal solution of the problem, then it is “close” to the original order: with high probability it holds that ∑_i |σ(i) − π(i)| = Θ(n) and max_i |σ(i) − π(i)| = Θ(log n). Our results are of interest in applications to ranking, such as ranking in sports, or ranking of search items based on comparisons by experts.
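The objective the abstract describes — minimize disagreements with the noisy query outcomes, i.e. feedback arc set on a noisy tournament — can be made concrete with a tiny brute-force Python sketch (ours, for intuition only; the paper's algorithm is polynomial-time, not this exhaustive search):

```python
import itertools, random

def noisy_queries(order, gamma, rng):
    # q[(i, j)] = +1 w.p. 1/2 + gamma if i precedes j in `order`, else -1;
    # antisymmetric by construction, errors independent across pairs.
    pos = {x: k for k, x in enumerate(order)}
    q = {}
    for i, j in itertools.combinations(order, 2):
        correct = 1 if pos[i] < pos[j] else -1
        q[(i, j)] = correct if rng.random() < 0.5 + gamma else -correct
        q[(j, i)] = -q[(i, j)]
    return q

def disagreements(perm, q):
    # The feedback-arc-set objective: pairs the permutation orders
    # against the observed query outcome.
    return sum(1 for a, b in itertools.combinations(perm, 2) if q[(a, b)] == -1)

rng = random.Random(1)
true_order = list(range(7))
q = noisy_queries(true_order, gamma=0.4, rng=rng)
# Maximum-likelihood order = permutation minimizing disagreements.
best = min(itertools.permutations(true_order), key=lambda p: disagreements(p, q))
```

With a strong bias (here 0.9 correct per pair), the minimizer lands at or very near the true order, matching the abstract's "close to the original order" guarantee.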
The fast Johnson-Lindenstrauss transform and approximate nearest neighbors
 SIAM J. Comput
, 2009
Cited by 24 (0 self)
Abstract:
We introduce a new low-distortion embedding of ℓ_2^d into ℓ_p^{O(log n)} (p = 1, 2) called the fast Johnson–Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in ℓ_1 and ℓ_2. We consider the case of approximate nearest neighbors in ℓ_2^d. We provide a faster algorithm using classical projections, which we then speed up further by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
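The "standard random projections" the FJLT improves upon are dense Gaussian projections that preserve pairwise distances up to small distortion. A minimal Python sketch of that classical baseline (our illustration; the FJLT itself replaces the dense matrix with a sparse one preconditioned by a randomized Fourier transform, which is not shown here):

```python
import random, math

def random_projection(points, k, rng):
    # Classical dense Gaussian JL projection onto k dimensions.
    # Scaling by 1/sqrt(k) makes squared lengths correct in expectation.
    d = len(points[0])
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(d)] for _ in range(k)]
    return [[sum(R[r][c] * p[c] for c in range(d)) for r in range(k)]
            for p in points]

rng = random.Random(0)
d, k = 200, 64
x = [rng.gauss(0, 1) for _ in range(d)]
y = [rng.gauss(0, 1) for _ in range(d)]
px, py = random_projection([x, y], k, rng)

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The projected distance concentrates around the original distance.
ratio = dist(px, py) / dist(x, y)
```

The multiplication by a dense k×d matrix costs O(kd) per point; the FJLT's point is to get the same distortion guarantee faster.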
Three Thresholds for a Liar
 Combinatorics, Probability and Computing
, 1992
Cited by 20 (1 self)
Abstract:
Motivated by the problem of making correct computations from partly false information, we study a corruption of the classic game "Twenty Questions" in which the player who answers the yes-or-no questions is permitted to lie up to a fixed fraction r of the time. The other player is allowed q arbitrary questions with which to try to determine, with certainty, which of n objects his opponent has in mind; he "wins" if he can always do so, and "wins quickly" if he can do so using only O(log n) questions. It turns out that there is a threshold value for r below which the querier can win quickly, and above which he cannot win at all. However, the threshold value varies according to the precise rules of the game. Our "three thresholds theorem" says that when the answerer is forbidden at any point to have answered more than a fraction r of the questions incorrectly, then the threshold value is r = 1/2; when the requirement is merely that the total number of lies cannot exceed rq, the threshol...
Mixing times of the biased card shuffling and the asymmetric exclusion process
 Trans. Amer. Math. Soc
, 2005
Cited by 17 (2 self)
Abstract:
Consider the following method of card shuffling. Start with a deck of N cards numbered 1 through N. Fix a parameter p between 0 and 1. In this model a “shuffle” consists of uniformly selecting a pair of adjacent cards and then flipping a coin that is heads with probability p. If the coin comes up heads, then we arrange the two cards so that the lower-numbered card comes before the higher-numbered card. If the coin comes up tails, then we arrange the cards with the higher-numbered card first. In this paper we prove that for all p ≠ 1/2, the mixing time of this card shuffling is O(N^2), as conjectured by Diaconis and Ram (2000). Our result is a rare case of an exact estimate for the convergence rate of the Metropolis algorithm. A novel feature of our proof is that the analysis of an infinite (asymmetric exclusion) process plays an essential role in bounding the mixing time of a finite process.
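The shuffle described above is simple to simulate. A short Python sketch (ours; parameter values are illustrative, and the O(N^2) step count mirrors the mixing-time bound rather than proving it):

```python
import random

def biased_shuffle_step(deck, p, rng):
    # Pick a uniformly random adjacent pair; with probability p put the
    # lower-numbered card first, otherwise the higher-numbered card first.
    i = rng.randrange(len(deck) - 1)
    lo, hi = sorted(deck[i:i + 2])
    if rng.random() < p:
        deck[i], deck[i + 1] = lo, hi
    else:
        deck[i], deck[i + 1] = hi, lo

rng = random.Random(0)
N, p = 20, 0.9
deck = list(range(N, 0, -1))      # start from the fully reversed deck
for _ in range(8 * N * N):        # O(N^2) steps, matching the theorem's scale
    biased_shuffle_step(deck, p, rng)

# With p well above 1/2 the chain's stationary distribution concentrates
# near the sorted deck; count remaining inversions as a sanity check.
inversions = sum(1 for a in range(N) for b in range(a + 1, N)
                 if deck[a] > deck[b])
```

For p = 1/2 this is the (much slower to order) symmetric adjacent-transposition shuffle; the bias is what makes the O(N^2) bound possible.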
ErrorResilient DNA Computation
, 1997
Cited by 16 (0 self)
Abstract:
The DNA model of computation, with test tubes of DNA molecules encoding bit sequences, is based on three primitives: Extract-A-Bit, which splits a test tube into two test tubes according to the value of a particular bit x; Merge-Two-Tubes; and Detect-Emptiness. Perfect operations can test the satisfiability of any boolean formula in linear time. However, in reality the Extract operation is faulty; it misclassifies a certain proportion of the strands. We consider the following problem: given an algorithm based on perfect Extract, Merge, and Detect operations, convert it to one that works correctly with high probability when the Extract operation is faulty. The fundamental problem in such a conversion is to construct a sequence of faulty Extracts and perfect Merges that simulates a highly reliable Extract operation. We first determine (up to a small constant factor) the minimum number of faulty Extract operations inherently required to simulate a highly reliable Extract operation. We then ...
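The faulty-Extract problem can be emulated in software. A hedged Python sketch (our abstraction, not the paper's construction: we model "boosting" a faulty Extract as a per-strand majority vote over independent noisy classifications, which stands in for the paper's sequence of faulty Extracts and perfect Merges):

```python
import random

def faulty_extract(tube, bit, err, rng):
    # One noisy Extract: split strands (bit-strings) on position `bit`,
    # misclassifying each strand independently with probability `err`.
    yes, no = [], []
    for strand in tube:
        observed = (strand[bit] == '1') ^ (rng.random() < err)
        (yes if observed else no).append(strand)
    return yes, no

def reliable_extract(tube, bit, err, rounds, rng):
    # Boosted Extract: classify each strand by the majority of `rounds`
    # independent faulty classifications; error drops exponentially in rounds.
    yes, no = [], []
    for strand in tube:
        votes = sum((strand[bit] == '1') ^ (rng.random() < err)
                    for _ in range(rounds))
        (yes if votes > rounds // 2 else no).append(strand)
    return yes, no

rng = random.Random(0)
tube = ['101', '011', '110', '000', '111', '001']
yes, no = reliable_extract(tube, 0, err=0.05, rounds=9, rng=rng)
```

The paper's contribution is the tight bound on how many faulty Extracts such a reliable simulation inherently needs; this sketch only illustrates why repetition helps.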