## Dynamic VP-Tree Indexing for N-Nearest Neighbor Search Given Pair-Wise Distances (2000)

Venue: VLDB Journal

Citations: 22 (0 self)

### BibTeX

@ARTICLE{Fu00dynamicvp-tree,

author = {Ada Wai-chee Fu and Polly M. S. Chan and Yin-ling Cheung and Y. S. Moon},

title = {Dynamic VP-Tree Indexing for N-Nearest Neighbor Search Given Pair-Wise Distances},

journal = {VLDB Journal},

year = {2000},

volume = {9},

pages = {154--173}

}

### Abstract

For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Existing spatial access index methods such as R-trees and KD-trees can then support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach, since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. Two important problems for the vp-tree method have not been studied in depth: n-nearest neighbor search and update mechanisms. We propose an n-nearest neighbors search algorithm, which is shown by exper...
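To make the idea in the abstract concrete, the following is a minimal sketch of a static vp-tree with a depth-first n-nearest neighbor search that starts from an infinite search radius. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the names `Node`, `build`, and `n_nearest` are hypothetical, the vantage point is chosen at random, each node splits at the median distance, and the only input is a pair-wise distance function over object ids that is assumed to satisfy the triangle inequality.

```python
import heapq
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

Dist = Callable[[int, int], float]  # pair-wise distance between two object ids


@dataclass
class Node:
    vantage: int                      # id of the vantage point
    radius: float = 0.0               # median distance from the vantage point
    inside: Optional["Node"] = None   # objects with d(vantage, x) <= radius
    outside: Optional["Node"] = None  # objects with d(vantage, x) > radius


def build(ids: Iterable[int], dist: Dist, rng: random.Random) -> Optional[Node]:
    """Build a vp-tree over `ids` (assumption: random vantage point, median split)."""
    ids = list(ids)
    if not ids:
        return None
    node = Node(ids.pop(rng.randrange(len(ids))))
    if ids:
        pairs = [(dist(node.vantage, i), i) for i in ids]
        node.radius = sorted(d for d, _ in pairs)[(len(pairs) - 1) // 2]
        node.inside = build([i for d, i in pairs if d <= node.radius], dist, rng)
        node.outside = build([i for d, i in pairs if d > node.radius], dist, rng)
    return node


def n_nearest(root: Optional[Node], query: int, n: int, dist: Dist) -> List[Tuple[float, int]]:
    """Return up to n (distance, id) pairs closest to `query`, nearest first."""
    if n <= 0:
        return []
    best: List[Tuple[float, int]] = []  # max-heap via negated distances, size <= n

    def tau() -> float:
        # Current search radius: infinite until n candidates have been found.
        return -best[0][0] if len(best) == n else float("inf")

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = dist(query, node.vantage)
        if len(best) < n:
            heapq.heappush(best, (-d, node.vantage))
        elif d < tau():
            heapq.heapreplace(best, (-d, node.vantage))
        # Depth-first: descend into the side that contains the query first.
        if d <= node.radius:
            first, second = node.inside, node.outside
        else:
            first, second = node.outside, node.inside
        visit(first)
        # Triangle inequality: every object on the other side is at least
        # |d - radius| away from the query, so skip it if that exceeds tau.
        if abs(d - node.radius) <= tau():
            visit(second)

    visit(root)
    return sorted((-neg_d, i) for neg_d, i in best)
```

A caller would supply `dist` as a lookup into the given pair-wise distance table, build the tree once with `build(ids, dist, random.Random(0))`, and then call `n_nearest(tree, q, n, dist)` for each query id `q`. The sketch covers search only; insertion and deletion, which the paper also addresses, are not modeled here.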

### Citations

2384 | R-trees: A dynamic index structure for spatial searching
- Guttman
- 1984
Citation Context ...ts, for which there are many fine-tuned indexing methods available. In general, these methods are called multidimensional indexing or Spatial Access Methods (SAMs) [30]. Some of the previous works are =-=[2, 3, 4, 12, 13, 15, 19, 21, 25, 26, 28, 35, 38, 34]-=-. It has been found that the above setting cannot be applied to certain applications. For example, [24] cites the example of typed English words, where the similarity function is defined by the minimu... |

1248 | The Design and Analysis of Spatial Data Structures
- Samet
- 1989
Citation Context ...ing and retrieving k-dimensional points, for which there are many fine-tuned indexing methods available. In general, these methods are called multidimensional indexing or Spatial Access Methods (SAMs) =-=[30]-=-. Some of the previous works are [2, 3, 4, 12, 13, 15, 19, 21, 25, 26, 28, 35, 38, 34]. It has been found that the above setting cannot be applied to certain applications. For example, [24] cites the ... |

638 | An algorithm for finding best matches in logarithmic expected time
- Friedman, Bentley, et al.
- 1977
Citation Context ...ts, for which there are many fine-tuned indexing methods available. In general, these methods are called multidimensional indexing or Spatial Access Methods (SAMs) [30]. Some of the previous works are =-=[2, 3, 4, 12, 13, 15, 19, 21, 25, 26, 28, 35, 38, 34]-=-. It has been found that the above setting cannot be applied to certain applications. For example, [24] cites the example of typed English words, where the similarity function is defined by the minimu... |

558 | M-tree: An efficient access method for similarity search in metric spaces
- Ciaccia, Patella, et al.
- 1997
Citation Context ...e but our experimental results show that setting the infinite initial value can achieve better performance. Note that instead of a depth first traversal, we could adopt the method that is proposed in =-=[11]-=- where the best traversed node and its subtree is searched in each iteration. This could lead to better pruning power. However, the current method is simpler and our experimental results show that it ... |

554 | The QBIC project: Querying images by content using color, texture and shape
- Niblack, Barber, et al.
- 1993
Citation Context ...f matching digitized voice excerpts, which include the consideration of time warping [31, 29]. The time warping problem occurs also in the similarity search in time series [5, 1]. For other examples, =-=[27]-=- describes a method of measuring the similarity between color images based on a color similarity matrix which takes into account the perceptual distance between different pairs of colors. For shape si... |

545 | The X-Tree: An Index Structure for High-Dimensional Data
- Berchtold, Keim, et al.
- 1996
Citation Context ...ts, for which there are many fine-tuned indexing methods available. In general, these methods are called multidimensional indexing or Spatial Access Methods (SAMs) [30]. Some of the previous works are =-=[2, 3, 4, 12, 13, 15, 19, 21, 25, 26, 28, 35, 38, 34]-=-. It has been found that the above setting cannot be applied to certain applications. For example, [24] cites the example of typed English words, where the similarity function is defined by the minimu... |

520 | Nearest Neighbor Queries
- Roussopoulos, Kelley, et al.
- 1995

500 | Dynamic programming algorithm optimization for spoken word recognition
- Sakoe, Chiba
- 1978
Citation Context ...e minimum number of insertions, deletions and substitutions to transform one string to another, and the example of matching digitized voice excerpts, which include the consideration of time warping =-=[31, 29]-=-. The time warping problem occurs also in the similarity search in time series [5, 1]. For other examples, [27] describes a method of measuring the similarity between color images based on a color sim... |

498 | The R*-tree: An efficient and robust access method for points and rectangles
- Beckmann, Kriegel, et al.
- 1990

468 | Efficient and effective querying by image content
- Faloutsos, Barber, et al.
- 1994
Citation Context ...er performance than the M -tree and the R -tree. The result also provide further support for the findings made by previous work that R -trees stop being efficient for dimensionalities greater than 20 =-=[14, 20, 6]-=-. [Figure residue: plots against Dataset Size (K), with series labeled Vp-tree and R*-tree] ... |

434 | FastMap: A fast algorithm for indexing, data mining and visualization of traditional and multimedia datasets
- Faloutsos, Lin
- 1995
Citation Context ...ods (SAMs) [30]. Some of the previous works are [2, 3, 4, 12, 13, 15, 19, 21, 25, 26, 28, 35, 38, 34]. It has been found that the above setting cannot be applied to certain applications. For example, =-=[24]-=- cites the example of typed English words, where the similarity function is defined by the minimum number of insertions, deletions and substitutions to transform one string to another, and the example ... |

421 | Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983
Citation Context ...e minimum number of insertions, deletions and substitutions to transform one string to another, and the example of matching digitized voice excerpts, which include the consideration of time warping =-=[31, 29]-=-. The time warping problem occurs also in the similarity search in time series [5, 1]. For other examples, [27] describes a method of measuring the similarity between color images based on a color sim... |

401 | The SR-tree: An index structure for high-dimensional nearest neighbor queries
- Katayama, Satoh
- 1997

371 | Multidimensional scaling
- Kruskal, Wish
- 1978
Citation Context ...ts so that we can subsequently make use of Vector Space Model (VSM) [37] multidimensional indexing methods such as the R-tree and its variants. The FastMap algorithm [24] and Multidimensional Scaling =-=[23]-=- fall in this category. The main challenge for this approach is to preserve distances as well as possible. It is difficult to decide on a value of k and then map each domain object into a k-dimensiona... |

315 | Similarity Indexing with the SS-tree
- White, Jain
- 1996

304 | The R+-tree: A dynamic index for multi-dimensional objects
- Sellis, Roussopoulos, et al.
- 1987
Citation Context ...ved sufficiently, some of the information that distinguishes the objects cannot be maintained. After the mapping, any one of a number of highly fine-tuned spatial access methods like the R-tree, etc. =-=[2, 4, 19, 25, 28, 33]-=- can be employed to provide fast searching for range queries and n-nearest neighbor queries. [Figure 1: Partitioning mechanism of the vantage-point tree method.] FastMap ... |

293 | Data structures and algorithms for nearest neighbor search in general metric spaces
- Yianilos
- 1993
Citation Context ...hbor search. Therefore we study an alternative approach that uses distance-based indexing, known as Metric Space Model (MSM) indexing. In particular we examine the Vantage-Point tree (vp-tree) method =-=[36, 39, 10]-=-. This approach can obviously save the overhead of inferring points in a multidimensional space, and can also avoid the difficulty in preserving distances. Our main contributions are the following. 1 ... |

273 | Efficient Color Histogram Indexing for Quadratic Distance Functions
- Hafner, Sawhney, et al.
- 1995
Citation Context ...er performance than the M -tree and the R -tree. The result also provide further support for the findings made by previous work that R -trees stop being efficient for dimensionalities greater than 20 =-=[14, 20, 6]-=-. [Figure residue: plots against Dataset Size (K), with series labeled Vp-tree and R*-tree] ... |

208 | The TV-Tree: An Index Structure for High-Dimensional Data
- Lin, Jagadish, et al.
- 1994

191 | The hB-tree: A multiattribute indexing method with good guaranteed performance
- Lomet, Salzberg
- 1990

189 | Near neighbor search in large metric spaces
- Brin
- 1995
Citation Context ... results in terms of reduction in distance computation are similar for the different methods. For future work, we may investigate some other methods of vantage point selection. The major criticism of =-=[7]-=- about vp-trees is that the region inside the median sphere and the region outside the median sphere are extremely asymmetric, and since volume grows rapidly as the radius of a sphere increases, the o... |

180 | Satisfying general proximity/similarity queries with metric trees
- Uhlmann
- 1991
Citation Context ...hbor search. Therefore we study an alternative approach that uses distance-based indexing, known as Metric Space Model (MSM) indexing. In particular we examine the Vantage-Point tree (vp-tree) method =-=[36, 39, 10]-=-. This approach can obviously save the overhead of inferring points in a multidimensional space, and can also avoid the difficulty in preserving distances. Our main contributions are the following. 1 ... |

176 | Optimal Multi-Step k-Nearest Neighbor Search
- Seidl, Kriegel
- 1998
Citation Context ...between objects and the preservation problem gets worse when k is getting smaller. There are enhancements proposed on the R-tree based methods, such as the multi-step k-Nearest Neighbor Search method =-=[32, 22]-=-. However, these are enhancement on the efficiency of the method and would not help in enhancing the accuracy of the method when the data is subjected to the above information loss. 3.1 Discussion Fas... |

154 | A model for the prediction of R-tree performance
- Theodoridis, Sellis
- 1996

131 | Some approaches to best-match file searching
- Burkhard, Keller
- 1973
Citation Context ...sed Index Structures Quite a number of distance-based indexing structures have been proposed. A summary of some of these methods can be found in [6, 7]. Previous work includes techniques suggested in =-=[8]-=-, which contains some of the basic ideas for later methods, the generalized hyperplane tree (gh-tree) [36], the vantage point tree (vp-tree) [36, 39, 10], the Geometric Near-neighbor Access Tree (GNAT)... |

127 | Content-based image retrieval systems
- Gudivada, Raghavan
- 1995
Citation Context ...-tree, M -tree, vp-tree. 1 Introduction. With the advent of large-scale multimedia database systems, there is a need to efficiently answer user queries. Content-based retrieval is typically required =-=[18]-=-. One advantage of such an approach is that it bypasses the difficult problem of specifying the desired multimedia objects in terms of formal query languages. A popular form of content-based queries e... |

122 | Distance-Based Indexing for High-Dimensional Metric Spaces
- Bozkaya, Ozsoyoglu
- 1997
Citation Context ...hbor search with R -tree and M -tree by experiments, and show that the search of vp-tree is considerably more efficient. 2. The update problem has also been left open for the vp-tree and its variants =-=[6]-=-. We propose mechanisms for update operations on the vp-tree. We investigate two alternatives in the insert operation: split-first and redistribute-first techniques; and two alternatives in the delete ... |

116 | Finding Patterns in Time Series: A Dynamic Programming Approach, in Advances in knowledge discovery and data
- Berndt, Clifford
- 1996

105 | Fast Nearest Neighbor Search in Medical Image Databases
- Korn, Sidiropoulos, et al.
- 1996
Citation Context ...between objects and the preservation problem gets worse when k is getting smaller. There are enhancements proposed on the R-tree based methods, such as the multi-step k-Nearest Neighbor Search method =-=[32, 22]-=-. However, these are enhancement on the efficiency of the method and would not help in enhancing the accuracy of the method when the data is subjected to the above information loss. 3.1 Discussion Fas... |

75 | Efficient user-adaptable similarity search in large multimedia databases
- Seidl, Kriegel
- 1997

74 | Content-Based Image Indexing
- Chiueh
- 1994
Citation Context ...hbor search. Therefore we study an alternative approach that uses distance-based indexing, known as Metric Space Model (MSM) indexing. In particular we examine the Vantage-Point tree (vp-tree) method =-=[36, 39, 10]-=-. This approach can obviously save the overhead of inferring points in a multidimensional space, and can also avoid the difficulty in preserving distances. Our main contributions are the following. 1 ... |

71 | Algorithms and strategies for similarity retrieval
- White, Jain
- 1996
Citation Context ...er [17]. In still other applications, it may be relatively easier for a domain expert to assess the similarity or distance between two objects rather than giving a computable definition of similarity =-=[37]-=-. In all of the above applications, we are given the similarities or distances between pairs of data. In most cases, the distances are metric, meaning that the triangle inequality property applies, an... |

63 | Fast parallel similarity search in multimedia databases
- Berchtold, Böhm, et al.
- 1997

37 | Enhanced nearest neighbour search on the R-tree
- Cheung, Fu
- 1998
Citation Context ...ree and the R -tree. We implemented the algorithms for the R -tree by Berchtold, Keim and Kriegel [4], which is able to support n-nearest neighbor queries, and enhanced it with the method proposed in =-=[9]-=-. We also implemented the n-nearest neighbor search algorithms for the M -tree. Next we describe the setup, as well as our results and observations. 4.1.1 Experimental Setup We used 2 synthetic datase... |

27 | Index-based object recognition in pictorial data management
- Grosky, Mehrotra
- 1990
Citation Context ...es. Another method represents a shape in terms of its boundaries, and a string-edit-distance-based similarity measure is employed based on the number of changes required to transform one to the other =-=[17]-=-. In still other applications, it may be relatively easier for a domain expert to assess the similarity or distance ... |

15 | Indexed-Based Object Recognition
- Grosky, Mehrotra
- 1990
Citation Context ...es. Another method represents a shape in terms of its boundaries, and a string-edit-distance-based similarity measure is employed based on the number of changes required to transform one to the other =-=[17]-=-. In still other applications, it may be relatively easier for a domain expert to assess the similarity or distance between two objects rather than giving a computable definition of similarity [37]. I... |

8 | The hB-Pi-Tree: A multi-attribute index supporting concurrency, recovery and node consolidation
- Evangelidis, Lomet, et al.
- 1997

7 | Speech Recognition by Machine
- Ainsworth
- 1988
Citation Context ...nother, and the example of matching digitized voice excerpts, which include the consideration of time warping [31, 29]. The time warping problem occurs also in the similarity search in time series =-=[5, 1]-=-. For other examples, [27] describes a method of measuring the similarity between color images based on a color similarity matrix which takes into account the perceptual distance between different pai... |

5 | An Algorithm for Finding Best Matches in Logarithmic Expected Time
- Friedman, Bentley, et al.
- 1977
Citation Context ...s, for which there are many fine-tuned indexing methods available. In general, these methods are called multidimensional indexing or spatial access methods (SAMs) [30]. Some of the previous works are =-=[2, 3, 4, 12, 13, 15, 19, 21, 25, 26, 28, 35, 38, 34]-=-. It has been found that the above setting cannot be applied to certain applications. For example, [24] cites the example of typed English words, where the similarity function is defined by the minimu... |

1 | The hB-Pi-tree: a modified hB-tree supporting concurrency, recovery and node consolidation
- Evangelidis, Lomet, et al.
- 1995

1 | Enhancements on the R-tree to support efficient similarity search in high-dimensional space
- Fu, Cheung
- 1997
Citation Context ...rther investigation to see if this is true. Finally, we would like to investigate improvements on the vp-tree in the future. There have been enhancements on the R -tree such as introducing redundancy =-=[16]-=- and the introduction of some flavor of linear search as in the X-tree [4]. We can try to capture the ideas behind such enhancements and apply them to the vp-tree. Acknowledgments We are very thankful... |