## Density-based indexing for approximate nearest-neighbor queries (1999)

Venue: In Proc. KDD

Citations: 33 (2 self)

### BibTeX

@INPROCEEDINGS{Bennett99density-basedindexing,
    author = {Kristin P. Bennett and Usama Fayyad and Dan Geiger},
    title = {Density-based indexing for approximate nearest-neighbor queries},
    booktitle = {In Proc. KDD},
    year = {1999}
}

### Abstract

We consider the problem of performing nearest-neighbor queries efficiently over large high-dimensional databases. To avoid a full database scan, we aim to construct a multidimensional index structure. It is well accepted that traditional database indexing algorithms fail for high-dimensional data (say, d > 10 or 20, depending on the scheme). Some arguments have advocated that nearest-neighbor queries do not even make sense for high-dimensional data. We show that these arguments are based on over-restrictive assumptions, and that in the general case it is meaningful and possible to build an index for such queries. Our approach, called DBIN, scales to high-dimensional databases by exploiting statistical properties of the data. The approach is based on statistically modeling the density of the content of the data table. DBIN uses the density model to derive a single index over the data table and requires physically rewriting the data in a new table sorted by the newly created index (i.e., creating a clustered index). The indexing scheme produces a mapping between a query point (a data record) and an ordering on the clustered index values. Data is then scanned according to the index. We present theoretical and empirical justification for DBIN. The scheme supports a family of distance functions that includes the traditional Euclidean distance measure.
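The pipeline described in the abstract (fit a density model, rewrite the table sorted by the derived index, map a query to an ordering over index values, scan in that order) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which density model or stopping rule DBIN uses, so the sketch substitutes k-means centers for the density model and a fixed budget of clusters for the stopping rule. All function names (`fit_centers`, `build_clustered_table`, `approx_nn`) and the `max_clusters` parameter are hypothetical.

```python
import math
import random
from bisect import bisect_left, bisect_right


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_centers(points, k, iters=10, seed=0):
    """Stand-in density model (assumption): k centers from Lloyd iterations."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist(p, centers[c]))].append(p)
        # Keep the old center when a group ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def build_clustered_table(points, centers):
    """Rewrite the data sorted by cluster id, mimicking a clustered index."""
    k = len(centers)
    keyed = [(min(range(k), key=lambda c: dist(p, centers[c])), p) for p in points]
    keyed.sort(key=lambda row: row[0])
    ids = [cid for cid, _ in keyed]
    rows = [p for _, p in keyed]
    return ids, rows


def approx_nn(query, centers, ids, rows, max_clusters):
    """Scan clusters in order of center distance; stop after max_clusters."""
    order = sorted(range(len(centers)), key=lambda c: dist(query, centers[c]))
    best, best_d = None, math.inf
    for cid in order[:max_clusters]:
        # Rows of one cluster are contiguous because the table is sorted by id.
        lo, hi = bisect_left(ids, cid), bisect_right(ids, cid)
        for p in rows[lo:hi]:
            d = dist(query, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d
```

Because the rewritten table is sorted by cluster id, each cluster is one contiguous range, which is what makes scanning "according to the index" cheap. With `max_clusters` equal to the number of centers the scan covers the whole table and the answer is exact; smaller budgets trade accuracy for fewer rows read.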