## A Randomized Approximate Nearest Neighbors Algorithm (2010)

Citations: 2 (0 self)

### BibTeX

```bibtex
@MISC{Jones10arandomized,
  author = {Peter W. Jones and Andrei Osipov and Vladimir Rokhlin},
  title  = {A Randomized Approximate Nearest Neighbors Algorithm},
  year   = {2010}
}
```

### Abstract

We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x_j} in R^d, the algorithm attempts to find the k nearest neighbors of each x_j, where k is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to T·N·(d·(log d) + k·(log k)·(log N)) + N·k²·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A byproduct of the scheme is a data structure permitting a rapid search for the k nearest neighbors among {x_j} of an arbitrary point x ∈ R^d. The cost of each such query is proportional to T·(d·(log d) + log(N/k) + k²·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N·k). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions.
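The pipeline described in the abstract — random rotation, divide-and-conquer splitting, then re-ranking of candidate neighbors — can be sketched in a few lines. Everything below is illustrative rather than the authors' code: the leaf size, the function names, and the dense QR-based rotation (standing in for the paper's fast O(d·log d) rotation) are all choices of mine.

```python
import numpy as np

def ann_sketch(points, k, T=4, seed=0):
    """Toy sketch: T rounds of (random rotation + median splits); each point
    keeps the k nearest among all points it ever shared a small leaf with."""
    rng = np.random.default_rng(seed)
    N, d = points.shape
    candidates = [set() for _ in range(N)]

    def split(idx, rotated, depth):
        if len(idx) <= 2 * k:                 # leaf: all pairs become candidates
            for i in idx:
                candidates[i].update(int(j) for j in idx if j != i)
            return
        coord = depth % d                     # split on successive coordinates
        order = idx[np.argsort(rotated[idx, coord])]
        half = len(order) // 2
        split(order[:half], rotated, depth + 1)
        split(order[half:], rotated, depth + 1)

    for _ in range(T):
        # dense random rotation; the paper uses a fast O(d log d) one instead
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        split(np.arange(N), points @ Q, 0)

    result = []
    for i in range(N):                        # re-rank candidates by true distance
        cand = np.fromiter(candidates[i], dtype=int)
        dist = np.linalg.norm(points[cand] - points[i], axis=1)
        result.append(cand[np.argsort(dist)[:k]])
    return result
```

Each iteration uses a fresh rotation, so points separated by one split have further chances to land in a common leaf, which is what drives the accuracy up with T.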

### Citations

3011 |
Probability and measure
- Billingsley
- 2008
Citation Context: "...the average of f over Q by the formula Avg_Q(f) = (∫_Q f)/(∫_Q 1) (18). 2.3 Probability: In this section, we summarize some well-known facts from probability theory. These facts can be found in [1], **[6]**, [7], [8]. We say that the discrete random variable X has binomial distribution Bin(N, p) with integer parameter N > 0 and real parameter 0 < p < 1, if for all integers k = 1, ..., N the probability ..."
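The binomial fact recalled in this snippet — X ~ Bin(N, p) with P(X = k) = C(N, k)·p^k·(1 − p)^(N − k) — is easy to check numerically; this tiny sketch (function name mine) verifies that the pmf sums to 1 and has mean N·p.

```python
from math import comb

def binomial_pmf(N, p, k):
    # P(X = k) for X ~ Bin(N, p)
    return comb(N, k) * p**k * (1 - p)**(N - k)

total = sum(binomial_pmf(10, 0.3, k) for k in range(11))      # sums to 1
mean = sum(k * binomial_pmf(10, 0.3, k) for k in range(11))   # equals N*p = 3
```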

1451 |
An Introduction to Probability Theory
- Feller
- 1966
Citation Context: "...average of f over Q by the formula Avg_Q(f) = (∫_Q f)/(∫_Q 1) (18). 2.3 Probability: In this section, we summarize some well-known facts from probability theory. These facts can be found in [1], [6], **[7]**, [8]. We say that the discrete random variable X has binomial distribution Bin(N, p) with integer parameter N > 0 and real parameter 0 < p < 1, if for all integers k = 1, ..., N the probability that ..."

1360 |
Real and Complex Analysis
- Rudin
- 1987
Citation Context: "...ates coincide with the corresponding symbols of the word σ. 2.2 Analysis: In this section, we summarize some well-known facts from real and complex analysis. These facts can be found in [1], [11], **[15]**, [16]. Suppose that x > 0 is a positive real number. In agreement with standard practice, we define the real gamma function by the formula Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt (7). Suppose that d > 1 is ..."
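The gamma-function definition quoted here, Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt, can be sanity-checked against Γ(n) = (n − 1)! with a crude quadrature; the truncation point and step count below are arbitrary choices of mine.

```python
import math

def gamma_quad(x, upper=50.0, n=100_000):
    # midpoint-rule approximation of the truncated integral
    # ∫_0^upper t^(x-1) e^(-t) dt; the tail beyond `upper` is negligible
    h = upper / n
    return sum(((i + 0.5) * h) ** (x - 1) * math.exp(-(i + 0.5) * h) * h
               for i in range(n))

approx = gamma_quad(5.0)   # should be close to Γ(5) = 4! = 24
```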

773 | An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
- Arya, Mount, et al.
- 1998
Citation Context: "...s log₂(N/k). This means that we can use the T trees generated, and then pass to Step 7 (see Sections 4.2.4, 4.3.4). Almost all known techniques for solving ANN problems use tree structures (see e.g. **[5]**, [9]). Two apparently novel features of our method are the use of fast random rotations (Step 1) and the local graph search (Step 7), which dramatically increases the accuracy of the scheme. We use ..."

405 |
Extensions of Lipschitz mappings into a Hilbert space
- Johnson, Lindenstrauss
- 1984
Citation Context: "...ying phenomenon, namely the Johnson-Lindenstrauss Lemma. (The JL Lemma roughly states that the projection of N points on a random subspace of dimension C(ε)·(log N) has expected distortion 1 + ε, see e.g. **[10]**.) We have chosen to use random rotations in place of the usual random projections generated by selecting random Gaussian vectors. The fast random rotations require O(d·(log d)) operations, which is ..."
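The JL statement quoted in this snippet is easy to see empirically: push N points through a random Gaussian map into m ~ C(ε)·(log N) dimensions and pairwise distances survive up to small distortion. The dimensions and seed below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 50, 1000, 300                        # m chosen generously for a clear demo
X = rng.standard_normal((N, d))
P = rng.standard_normal((d, m)) / np.sqrt(m)   # classic Gaussian JL map
Y = X @ P

# distortion of one pairwise distance (close to 1 with high probability)
ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
```

The paper replaces this dense Gaussian map with fast random rotations, cutting the per-point cost to O(d·(log d)).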

235 | Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
- Andoni, Indyk
Citation Context: "...f the k chosen suspected nearest neighbors), for different types of randomly generated data sets consisting of N points in R^d. Unlike other ANN algorithms that have been recently proposed (see e.g. **[9]**), the method of this paper does not use locality-sensitive hashing. Instead we use a simple randomized divide-and-conquer approach. The basic algorithm is iterated several times, and then followed by ..."

221 | An introduction to Harmonic Analysis
- Katznelson
- 1976
Citation Context: "...oordinates coincide with the corresponding symbols of the word σ. 2.2 Analysis: In this section, we summarize some well-known facts from real and complex analysis. These facts can be found in [1], **[11]**, [15], [16]. Suppose that x > 0 is a positive real number. In agreement with standard practice, we define the real gamma function by the formula Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt (7). Suppose that d > ..."

31 | Almost optimal unrestricted fast Johnson-Lindenstrauss transform. Available at http://arxiv.org/abs/1005.5513
- Ailon, Liberty
- 2010
Citation Context: "...of application of Θ to a vector x ∈ R^d is also given by the formula (62). Remark 1. The use of the Hadamard matrix (without 2×2 rotations) appears in a related problem studied by Ailon and Liberty **[4]**. 3 Analytical Apparatus: The purpose of this section is to provide the analytical apparatus to be used in the rest of the paper. The following theorem generalizes Theorem 2 in Section 2.3. Its proo..."
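A fast "rotation" in the spirit discussed in this snippet — a Hadamard-type mix costing O(d·(log d)) — can be sketched as random sign flips, a fast Walsh-Hadamard transform, and a random permutation. This is a simplification (the paper also composes 2×2 rotations, per the remark above), and all names are mine.

```python
import numpy as np

def fwht(x):
    # iterative fast Walsh-Hadamard transform; len(x) must be a power of two
    y = x.copy()
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y

def fast_random_rotation(x, rng):
    # signs, then Hadamard mix, then a random permutation: each factor is
    # orthogonal, so the composition preserves norms, in O(d log d) arithmetic
    d = len(x)
    signs = rng.choice([-1.0, 1.0], size=d)
    y = fwht(signs * x) / np.sqrt(d)   # 1/sqrt(d) normalizes H to orthogonal
    return y[rng.permutation(d)]
```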

22 | The fast Johnson-Lindenstrauss transform and approximate nearest neighbors - Ailon, Chazelle

22 |
Seminumerical Algorithms, vol. 2 of The Art of Computer Programming
- Knuth
- 1998
Citation Context: "...ed by (59), (60), respectively, preserve the norm of any vector x ∈ R^d. Therefore, F^(d) is a real orthogonal transformation R^d → R^d. The cost of the generation of a random permutation (see e.g. **[12]**) is O(d) operations. The cost of the application of each P_j^(d) to a vector x ∈ R^d is obviously d operations due to (54). The cost of generation of d − 1 uniform random variables is O(d) operations...."
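The O(d)-cost random permutation cited to Knuth in this snippet is the classical Fisher-Yates shuffle; a minimal sketch (function name mine):

```python
import random

def fisher_yates(d, rng=random):
    # uniform random permutation of 0..d-1 in O(d) time and O(d) space
    p = list(range(d))
    for i in range(d - 1, 0, -1):
        j = rng.randrange(i + 1)   # j uniform in 0..i
        p[i], p[j] = p[j], p[i]
    return p
```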

1 |
Randomized algorithms for the low-rank approximation of matrices
- Liberty, Woolfe, Martinsson, Rokhlin, Tygert
- 2007
Citation Context: "...andom projections generated by selecting random Gaussian vectors. The fast random rotations require O(d·(log d)) operations, which is an improvement over methods using random projections (see [13], **[14]**). The N × k lookup table arising in Step 7 is the adjacency matrix of a graph whose vertices are the points {x_j}. In Step 7 we perform a depth-one search on this graph, and obtain ≤ k + k² "candidat..."
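The depth-one graph search described in this snippet can be sketched directly: treat the N×k table of suspected neighbors as adjacency lists, score each point's suspects plus suspects-of-suspects (at most k + k² candidates), and keep the best k. The function name and list-of-lists layout are mine.

```python
import numpy as np

def refine_once(points, suspects):
    # suspects: k suspected-neighbor indices per point (the N x k lookup table)
    k = len(suspects[0])
    refined = []
    for i, row in enumerate(suspects):
        cand = set(row)
        for j in row:              # depth-one search: suspects of suspects
            cand.update(suspects[j])
        cand.discard(i)
        cand = np.fromiter(cand, dtype=int)
        dist = np.linalg.norm(points[cand] - points[i], axis=1)
        refined.append(cand[np.argsort(dist)[:k]].tolist())
    return refined
```

Because each candidate set contains the original suspects, one refinement pass can only improve (or preserve) the k selected distances.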

1 |
Real and Complex Analysis
- Rudin
- 1970
Citation Context: "...oincide with the corresponding symbols of the word σ. 2.2 Analysis: In this section, we summarize some well-known facts from real and complex analysis. These facts can be found in [1], [11], [15], **[16]**. Suppose that x > 0 is a positive real number. In agreement with standard practice, we define the real gamma function by the formula Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt (7). Suppose that d > 1 is an int..."