
## Quality and Efficiency in High Dimensional Nearest Neighbor Search

Citations: 32 (1 self)

### Citations

1265 | The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
- Beckmann
- 1990
Citation Context: ...ential scan. Research on high-dimensional NN search can be divided into exact and approximate retrieval. In the exact category, Lin et al. [32] propose the TV-tree which improves conventional R-trees [5] by creating MBRs only in selected subspaces. Weber et al. [36] design the VA-file, which compresses the dataset to minimize the cost of sequential scan. Also based on the idea of compression, Berchto...

1034 | Approximate nearest neighbors: Towards removing the curse of dimensionality
- Indyk, Motwani
- 1998
Citation Context: ...query is one where the query point q is much closer to its NN than to most data points. This is true in many applications involving high-dimensional data, as supported by a large body of recent works [1, 3, 14, 15, 16, 18, 21, 22, 23, 25, 26, 31, 33, 34]. Sequential scan trivially solves a NN query by examining the entire dataset D, but its cost grows linearly with the cardinality of D. Ideally, a practical solution should satisfy two requirements: (...

985 | An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
- Arya, Mount, et al.
- 1998

717 | Optimal aggregation algorithms for middleware
- Fagin, Lotem, et al.
- 2001

686 | Multidimensional access methods
- Gaede, Gunther
- 1998
Citation Context: ...herefore, there are U/w cells per dimension, and (U/w)^m cells in total in the whole grid. Given the grid, calculating the Z-order value z(o) of G(o) is a standard process well-known in the literature [20]. Let u = log2(U/w). Each z(o) is thus a binary string with um bits. Example. To illustrate the conversion, assume that the dataset D consists of 4 two-dimensional points o1, o2, ..., o4 as shown in F...
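The grid-to-Z-order conversion mentioned in this context is plain bit interleaving: each of the m cell coordinates contributes u bits, taken most-significant first. A minimal sketch (the function name and argument layout are illustrative, not from the paper):

```python
def z_order(coords, u):
    """Interleave the bits of m grid coordinates (each u bits wide)
    into a single Z-order (Morton) value of u*m bits."""
    z = 0
    for bit in range(u - 1, -1, -1):      # most significant bit first
        for c in coords:                   # one bit from each dimension per round
            z = (z << 1) | ((c >> bit) & 1)
    return z

# With U = 8 and cell width w = 1, u = log2(8/1) = 3 bits per dimension.
# Cell (5, 3) = (101, 011) interleaves to 100111 = 39.
print(z_order([5, 3], 3))  # 39
```

Points whose cells are close in space tend to receive nearby Z-order values, which is what makes the values indexable by a one-dimensional structure such as a B-tree.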

642 | Similarity search in high dimensions via hashing
- Gionis, Indyk, et al.
- 1999
Citation Context: ...query is one where the query point q is much closer to its NN than to most data points. This is true in many applications involving high-dimensional data, as supported by a large body of recent works [1, 3, 14, 15, 16, 18, 21, 22, 23, 25, 26, 31, 33, 34]. Sequential scan trivially solves a NN query by examining the entire dataset D, but its cost grows linearly with the cardinality of D. Ideally, a practical solution should satisfy two requirements: (...

625 | A quantitative analysis and performance study for similarity-search methods in high dimensional spaces
- Weber, Schek, et al.
- 1998
Citation Context: ...high dimensional space. Many algorithms (e.g., those based on data or space partitioning indexes [20]) that perform nicely on low dimensional data, deteriorate rapidly as the dimensionality increases [10, 36], and are eventually outperformed even by sequential scan. Research on high-dimensional NN search can be divided into exact and approximate retrieval. In the exact category, Lin et al. [32] propose th...

592 | Nearest Neighbor Queries
- Roussopoulos, Kelley, et al.
- 1995
Citation Context: ...ecrease as c increases. This provides a graceful tradeoff between quality and efficiency. We leave the details to the full paper. 7. RELATED WORK NN search is well understood in low dimensional space [24, 35]. This problem, however, becomes much more difficult in high dimensional space. Many algorithms (e.g., those based on data or space partitioning indexes [20]) that perform nicely on low dimensional da...

523 | Locality-sensitive hashing scheme based on p-stable distributions
- Datar, Immorlica, et al.
- 2004
Citation Context: ...query is one where the query point q is much closer to its NN than to most data points. This is true in many applications involving high-dimensional data, as supported by a large body of recent works [1, 3, 14, 15, 16, 18, 21, 22, 23, 25, 26, 31, 33, 34]. Sequential scan trivially solves a NN query by examining the entire dataset D, but its cost grows linearly with the cardinality of D. Ideally, a practical solution should satisfy two requirements: (...
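The p-stable scheme of Datar et al. cited here hashes a point v with functions of the form h(v) = ⌊(a·v + b)/w⌋, where a is drawn from a p-stable distribution. A rough sketch for the l2 norm (the helper name, parameters, and defaults are illustrative assumptions):

```python
import random

def make_p_stable_hash(d, w, seed=None):
    """Build one LSH function h(v) = floor((a . v + b) / w): the vector a
    has i.i.d. N(0, 1) entries (the 2-stable distribution, matching l2),
    and the offset b is uniform in [0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        # floor division floors correctly for negative projections too
        return int((sum(ai * vi for ai, vi in zip(a, v)) + b) // w)
    return h

h = make_p_stable_hash(d=4, w=4.0, seed=7)
bucket = h([0.5, 1.0, -0.2, 3.0])  # integer bucket id; equal buckets => candidate pair
```

In a full LSH index, several such functions are concatenated per hash table, and points landing in the query's bucket become NN candidates; nearby points collide with higher probability than distant ones.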

520 | LOF: Identifying density-based local outliers
- Breunig, Kriegel, et al.
- 2000
Citation Context: ...er than if q falls in cluster 2. Hence, it is impossible to choose an rm that closely captures the NN distances of all queries. Note that clusters with different densities are common in real datasets [11]. Figure 4: No good rm exists if clusters have different densities. Recently, Lv et al. [33] present a variation of adhoc-LSH with less space consumption. This variation, however, suffers from the same...

460 | Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
- Andoni, Indyk
- 2008

409 | When is "nearest neighbor" meaningful?
- Beyer, Goldstein, et al.
- 1999
Citation Context: ...t to q. Formally, there is no other point o ∈ D satisfying ‖o, q‖ < ‖o*, q‖, where ‖·, ·‖ denotes the distance of two points. In this paper, we consider high-dimensional NN search. Some studies argue [9] that high-dimensional NN queries may not be meaningful. On the other hand, there is also evidence [6] that such an argument is based on restrictive assumptions. Intuitively, a meaningful query is one...

391 | Distance browsing in spatial databases
- Hjaltason, Samet
- 1999
Citation Context: ...ecrease as c increases. This provides a graceful tradeoff between quality and efficiency. We leave the details to the full paper. 7. RELATED WORK NN search is well understood in low dimensional space [24, 35]. This problem, however, becomes much more difficult in high dimensional space. Many algorithms (e.g., those based on data or space partitioning indexes [20]) that perform nicely on low dimensional da...

218 | The TV-tree: an index structure for high dimensional data
- Lin, Jagadish, et al.
- 1995
Citation Context: ...on 7), with a single exception to be explained shortly, all the existing solutions violate at least one of the above requirements. Specifically, the majority of them (e.g., those based on new indexes [2, 22, 23, 25, 32]) demand non-relational features, and thus cannot be incorporated in a commercial system. There also exist relational solutions (such as iDistance [27] and MedRank [16]), which are experimentally show...

163 | Evaluating Top-k Selection Queries
- Chaudhuri, Gravano
- 1999
Citation Context: ...e dataset to minimize the cost of sequential scan. Also based on the idea of compression, Berchtold et al. [7] develop the IQ-tree, combining features of the R-tree and VA-file. Chaudhuri and Gravano [12] perform NN search by converting it to range queries. In [8] Berchtold et al. provide a solution leveraging high-dimensional Voronoi diagrams, whereas Korn et al. [28] tackle the problem by utilizing ...

153 | Efficient similarity search and classification via rank aggregation
- Fagin, Kumar, et al.
- 2003
Citation Context: ...t clusters according to their radii. Houle and Sakuma [25] develop SASH, which is designed for memory-resident data, but is not suitable for disk-oriented data due to severe I/O thrashing. Fagin et al. [16] develop the MedRank technique that converts the dataset to several sorted lists by projecting the data onto different vectors. To answer a query, MedRank traverses these lists in a way similar to the...

150 | Navigating nets: Simple algorithms for proximity search
- Krauthgamer, Lee
- 2004
Citation Context: ... distance from q to its exact NN o*, namely, ‖o, q‖ ≤ c‖o*, q‖, where c ≥ 1 is the approximation ratio. It is widely recognized that approximate NNs already fulfill the needs of many applications [1, 2, 3, 15, 18, 21, 23, 25, 26, 30, 31, 33, 34]. LSH is originally proposed as a theoretical method [26] with attractive asymptotical space and query performance. As elaborated in Section 3, its practical implementation can be either rigorous or a...
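The c-approximate guarantee quoted in this context (‖o, q‖ ≤ c‖o*, q‖ with c ≥ 1) can be checked directly against a dataset. A small sketch with made-up points (function and variable names are illustrative):

```python
import math

def dist(p, q):
    """Euclidean (l2) distance between two points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def is_c_approximate(o, q, data, c):
    """True iff o is a c-approximate NN of q over `data`, i.e. its distance
    to q is within factor c of the exact nearest-neighbor distance."""
    exact_nn_dist = min(dist(p, q) for p in data)
    return dist(o, q) <= c * exact_nn_dist

data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 1.0)
# exact NN is (0, 0) at distance 1; (3, 4) is at distance sqrt(18) ~ 4.24
print(is_c_approximate((3.0, 4.0), q, data, c=5))  # True
print(is_c_approximate((3.0, 4.0), q, data, c=2))  # False
```

With c = 1 the test degenerates to exact NN; larger c admits farther answers, which is the quality/efficiency tradeoff the surrounding text discusses.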

137 | The string B-tree: a new data structure for string search in external memory and its applications
- Ferragina, Grossi
- 1999
Citation Context: ...fetches at most 4Bl/d leaf entries. The cost of (i) is bounded by O(lE). As a leaf entry consumes O(d) words, 4Bl/d of them occupy at most O(l) pages. By implementing each LSB-tree as a string B-tree [19], the height E is bounded by O(log_B(n/B)), resulting in query complexity O(log_B(n/B) · √(dn/B)). 5.3 Comparison with rigorous-LSH As discussed in Section 3, for 4-approximate NN search, rigorous-LSH ...

117 | Multi-probe LSH: Efficient indexing for high-dimensional similarity search
- Lv, Josephson, et al.
- 2007

97 | A replacement for Voronoi diagrams of near linear size
- Har-Peled
- 2001

93 | iDistance: An adaptive B+-tree based indexing method for nearest neighbor search
- Jagadish, Ooi, et al.
- 2005
Citation Context: ...ms, whereas Korn et al. [28] tackle the problem by utilizing the fractal dimensionality of the dataset. Koudas et al. [29] give a bitmap-based approach. The state of the art is due to Jagadish et al. [27]. They develop the iDistance technique that converts high-dimensional points to 1D values, which are indexed using a B-tree for NN processing. We will compare our solution to iDistance experimentally...

91 | Independent quantization: An index compression technique for high-dimensional data spaces
- Berchtold, Bohm, et al.
- 2000
Citation Context: ...ng MBRs only in selected subspaces. Weber et al. [36] design the VA-file, which compresses the dataset to minimize the cost of sequential scan. Also based on the idea of compression, Berchtold et al. [7] develop the IQ-tree, combining features of the R-tree and VA-file. Chaudhuri and Gravano [12] perform NN search by converting it to range queries. In [8] Berchtold et al. provide a solution leveragin...

59 | LSH forest: self-tuning indexes for similarity search
- Bawa, Condie, et al.
- 2005
Citation Context: ...mory, but as discussed in Section 3.2, their method loses the guarantee on the approximation ratio. The locality-sensitive hash functions for lp norms are discovered by Datar et al. [15]. Bawa et al. [4] propose a method to tune the parameters of LSH automatically. Their method, however, no longer ensures the same query performance as LSH unless the adopted hash function has a so-called "(ε, f(ε...

53 | A Cost Model for Query Processing in High Dimensional Data Spaces
- Bohm
- 2000
Citation Context: ...high dimensional space. Many algorithms (e.g., those based on data or space partitioning indexes [20]) that perform nicely on low dimensional data, deteriorate rapidly as the dimensionality increases [10, 36], and are eventually outperformed even by sequential scan. Research on high-dimensional NN search can be divided into exact and approximate retrieval. In the exact category, Lin et al. [32] propose th...

53 | Approximate nearest neighbor searching in multimedia databases
- Ferhatosmanoglu, Tuncel, et al.
- 2001

52 | Clustering for approximate similarity search in high-dimensional spaces
- Li, Chang, et al.
- 2002

51 | Entropy based nearest neighbor search in high dimensions (SODA)
- Panigrahy
- 2006

45 | PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces
- Ciaccia, Patella
- 2000

45 | On the 'dimensionality curse' and the 'self-similarity blessing'
- Korn, Pagel, et al.
- 2001
Citation Context: ...d VA-file. Chaudhuri and Gravano [12] perform NN search by converting it to range queries. In [8] Berchtold et al. provide a solution leveraging high-dimensional Voronoi diagrams, whereas Korn et al. [28] tackle the problem by utilizing the fractal dimensionality of the dataset. Koudas et al. [29] give a bitmap-based approach. The state of the art is due to Jagadish et al. [27]. They develop the iDist...

37 | Fast approximate similarity search in extremely high-dimensional data sets
- Houle, Sakuma
- 2005

36 | Indexing the solution space: A new technique for nearest neighbor search in high-dimensional space
- Berchtold, Keim, et al.
Citation Context: ... on the idea of compression, Berchtold et al. [7] develop the IQ-tree, combining features of the R-tree and VA-file. Chaudhuri and Gravano [12] perform NN search by converting it to range queries. In [8] Berchtold et al. provide a solution leveraging high-dimensional Voronoi diagrams, whereas Korn et al. [28] tackle the problem by utilizing the fractal dimensionality of the dataset. Koudas et al. [29...

35 | Density-based indexing for approximate nearest-neighbor queries
- Bennett, Fayyad, et al.
- 1999
Citation Context: ...distance of two points. In this paper, we consider high-dimensional NN search. Some studies argue [9] that high-dimensional NN queries may not be meaningful. On the other hand, there is also evidence [6] that such an argument is based on restrictive assumptions. Intuitively, a meaningful query is one where the query point q is much closer to its NN than to most data points. This is true in many appli...

29 | Contrast plots and p-sphere trees: Space vs. time in nearest neighbour searches
- Goldstein, Ramakrishnan
- 2000
Citation Context: ...earch, a majority of the query cost is spent on verifying a point as a real NN [6, 14]. Approximate retrieval improves efficiency by relaxing the precision of verification. Goldstein and Ramakrishnan [22] leverage the knowledge of the query distribution to balance the efficiency and result quality. Ferhatosmanoglu et al. [18] find NNs by examining only the interesting subspaces. Chen and Lin [13] comb...

27 | Nearest Neighbor Retrieval Using Distance-Based Hashing
- Athitsos, Potamias, et al.
- 2008

22 | A sampling-based estimator for top-k query
- Chen, Ling
- 2002
Citation Context: ...hnan [22] leverage the knowledge of the query distribution to balance the efficiency and result quality. Ferhatosmanoglu et al. [18] find NNs by examining only the interesting subspaces. Chen and Lin [13] combine sampling with a reduction [12] to range search. Li et al. [31] first partition the dataset into clusters, and then prunes the irrelevant clusters according to their radii. Houle and Sakuma [2...

18 | LDC: Enabling search by partial distance in a hyper-dimensional space
- Koudas, Ooi, et al.
- 2004
Citation Context: ...[8] Berchtold et al. provide a solution leveraging high-dimensional Voronoi diagrams, whereas Korn et al. [28] tackle the problem by utilizing the fractal dimensionality of the dataset. Koudas et al. [29] give a bitmap-based approach. The state of the art is due to Jagadish et al. [27]. They develop the iDistance technique that converts high-dimensional points to 1D values, which are indexed using a B...