## The TV-tree -- an index structure for high-dimensional data (1994)

Venue: VLDB Journal

Citations: 195 (7 self)

### BibTeX

@ARTICLE{Lin94thetv-tree,

author = {King-ip Lin and H. V. Jagadish and Christos Faloutsos},

title = {The TV-tree -- an index structure for high-dimensional data},

journal = {VLDB Journal},

year = {1994},

volume = {3},

pages = {517--542}

}

### Abstract

We propose a file structure to index high-dimensionality data, typically points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such 'varying length' feature vectors. Finally, we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.

Type of Contribution: New Index Structure, for high-dimensionality feature spaces. Algorithms and performance measurements.

Keywords: Spatial Index, Similarity Retrieval, Query by Content

1 Introduction

Many applications require enhanced indexing, capable of performing similarity searching on several, non-traditional ('exotic') data types. The targ...

### Citations

3779 | Basic local alignment search tool
- Altschul, Gish, et al.
- 1990
Citation Context ... databases where there is a large collection of strings from a four-letter alphabet (A,G,C,T); a new string has to be matched against the old strings, to find the best candidates. The BLAST algorithm [2] uses successive overlapping n-grams to index on. Regarding n-grams as features, we need 4^n features or 1,024 features for n=5. • Searching for names or addresses, say in a customer (mailing) list,...
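
The feature count quoted in this excerpt (4^n, or 1,024 for n=5) can be checked with a small sketch. The function name `ngram_features` and the choice of raw occurrence counts as feature values are assumptions made for illustration; this is not the BLAST algorithm or the paper's own feature extractor.

```python
from itertools import product

ALPHABET = "ACGT"  # four-letter DNA alphabet from the excerpt


def ngram_features(seq, n):
    """Count overlapping n-grams of seq.

    Returns a vector with one slot per possible n-gram, so its
    length is 4**n. Hypothetical helper, for illustration only;
    seq is assumed to contain only the letters in ALPHABET.
    """
    # Map every possible n-gram to a fixed position in the vector.
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=n))}
    vec = [0] * len(index)
    # Slide a window of width n over the string, one position at a time.
    for i in range(len(seq) - n + 1):
        vec[index[seq[i:i + n]]] += 1
    return vec


features = ngram_features("ACGTACGTAC", 5)
print(len(features))  # number of features for n = 5
```

Most slots of such a vector are zero for any single string, which is one reason an index that looks at only a few features at a time is attractive for this kind of data.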

2649 | Introduction to Statistical Pattern Recognition, 2nd edition
- Fukunaga
- 1990
Citation Context ...ower. Figure 1 gives a 2-d example, where the vectors k1 and k2 are the results of the Karhunen Loeve transform on the illustrated set of points. For more details on the Karhunen Loeve transform, see [11]. The Karhunen Loeve transform is optimal if the set of data is known in advance, that is, the transform is data-dependent. Sets of data with rare or no updates appear in real applications: for exampl...

2219 | R-Trees: A Dynamic Index Structure for Spatial Searching
- Guttman
- 1984
Citation Context ...er of features per object may be of the order of 10 or 100. The spatial access methods of the past have mainly concentrated on 2-dimensional and 3-dimensional spaces, such as the R-tree based methods [14], and the linear-quadtree based ones (e.g., the z-ordering [28]). Although conceptually they can be extended to higher dimensionalities, they usually require time and/or space that grows exponentially ...

1500 | A k-means clustering algorithm
- Hartigan, Wong
Citation Context ...iral search' method by Bentley and Weide [7] also has a complexity that grows exponentially with the dimensionality. Relevant to our work is a wide variety of clustering algorithms (see, for example, [15, 30, 24] for surveys). However, their main goals are to detect patterns in the data, and/or to assess the quality of the clustering scheme using the 'precision' and 'recall' measures; there is usually littl...

1174 | The Design and Analysis of Spatial Data Structures
- Samet
- 1990
Citation Context ... in feature space; these points must be stored in a spatial access method. The prevailing methods form three classes: (a) R*-trees [6] and the rest of the R-tree family [14, 18]; (b) linear quadtrees [31]; and (c) grid-files [27]. Different kinds of queries arise; the most typical ones are listed below: • Exact match queries. Find if a given query object is in the database. For example, check if a c...

980 | The R*-tree: An efficient and robust access method for points and rectangles
- Beckmann, Kriegel, et al.
- 1990
Citation Context ...of applications, feature extraction functions map objects into points in feature space; these points must be stored in a spatial access method. The prevailing methods form three classes: (a) R*-trees [6] and the rest of the R-tree family [14, 18]; (b) linear quadtrees [31]; and (c) grid-files [27]. Different kinds of queries arise; the most typical ones are listed below: • Exact match queries. Find...

871 | The JPEG Still Picture Compression Standard
- Wallace
- 1991
Citation Context ...vance, and if the new data have the same statistical characteristics as the old ones. In a completely dynamic case, we have to resort to data-independent transforms, such as the Discrete Cosine (DCT) [33], the Discrete Fourier (DFT), the Hadamard, the wavelet [29] transform, etc. Fortunately, many data-independent transforms will perform as well as the Karhunen Loeve if the data follows specific statist...

566 | Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure
- Aurenhammer
- 1991
Citation Context ...ed list. Similar problems with high dimensionality have been reported for methods that mainly focus on nearest-neighbor queries: Voronoi diagrams do not work at all for dimensionalities higher than 3 [5]. The method by Friedman et al. [10] does almost as much work as linear scanning for dimensionalities > 9. The 'spiral search' method by Bentley and Weide [7] also has a complexity that grows exponent...

535 | The QBIC project: Querying images by content using color, texture, and shape
- Niblack, Barber, et al.
- 1993
Citation Context ... such a system follows: • Image databases: In [19] we showed how to query for similar shapes, describing each shape by the co-ordinates of a few rectangles that cover it (20 features per shape). In [26] we supported queries on color, shape and texture. For colors, we used the color histograms (64-256 attributes per image) as feature vectors; for shapes we used the first 20 moments. • Medical datab...

414 | Efficient similarity search in sequence databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context ...ng and research purposes. • Time series, such as financial databases with stock-price movements. The goal is to aid forecasting, by examining similar patterns that may have appeared in the past. In [1] we used as features the co-efficients of the Discrete Fourier Transform (DFT). • Multimedia databases, with audio (voice, music), video etc. [25]. Users might want to retrieve, e.g., similar music s...
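
As a rough illustration of using DFT coefficients as features for time series, the sketch below keeps only the first `k` coefficients of each series. The names `dft_features` and `feature_distance`, the orthonormal 1/sqrt(n) scaling, and the sample series are assumptions chosen for this example and are not taken from the cited work.

```python
import cmath


def dft_features(series, k):
    """First k DFT coefficients of series, with 1/sqrt(n) scaling.

    With this scaling the full transform preserves Euclidean
    distance (Parseval's theorem). Hypothetical helper.
    """
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(series)) / n ** 0.5
        for f in range(k)
    ]


def feature_distance(a, b):
    """Euclidean distance between two complex feature vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b)) ** 0.5


s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [2.0, 2.0, 2.0, 2.0]
true_distance = sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5
approx = feature_distance(dft_features(s1, 2), dft_features(s2, 2))
print(approx <= true_distance + 1e-9)
```

Because dropping coefficients can only remove non-negative terms from the squared distance, the distance in the truncated feature space never exceeds the true distance, so a search in feature space may return false alarms but should not miss true matches.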

383 | The grid file: an adaptable, symmetric multikey file structure
- Nievergelt, Hinterberger, et al.
- 1984
Citation Context ...points must be stored in a spatial access method. The prevailing methods form three classes: (a) R*-trees [6] and the rest of the R-tree family [14, 18]; (b) linear quadtrees [31]; and (c) grid-files [27]. Different kinds of queries arise; the most typical ones are listed below: • Exact match queries. Find if a given query object is in the database. For example, check if a certain inventory item exi...

352 | Automatically correcting words in text
- Kukich
Citation Context ...e are partially specified or have errors. For example "1234 Springs Road" instead of "1235 Spring Rd", or "Mr. John Smith" instead of "Dr. J. Smith, Jr." Similar applications include spelling, typing [22] and OCR error correction [20]. There, given a wrong string, we should search a dictionary to find the closest strings to it. Triplets of letters are often used [3] to assess the similarity of two wor...

324 | Efficient processing of spatial joins using R-trees
- Brinkhoff, Kriegel, et al.
- 1993
Citation Context ... to each other (e.g., closer than a tolerance ε). Again, a recursive algorithm that prunes out remote branches of the tree can be used; efficient improvements on this algorithm have recently appeared [8]. Similarly, nearest-neighbor queries can be handled with a branch-and-bound algorithm, as in [12]. The algorithm works as follows: given a (query) point, examine the top-level branches, and com...

202 | The Analysis of Time Series: An Introduction
- Chatfield
Citation Context ...for indexing). Specifically, the amplitude spectrum is approximately O(f^-1), where f is the frequency). Stock movements and exchange rates have been successfully modeled as random walks (e.g., [9, 23]). Birkhoff's theory [32] claims that 'interesting' signals, such as musical scores and other works of art, consist of pink noise, whose spectrum is similarly skewed (O(f^-0.5)). In general, if ...

198 | The Fractal Geometry of Nature (W.H
- Mandelbrot
- 1982
Citation Context ...for indexing). Specifically, the amplitude spectrum is approximately O(f^-1), where f is the frequency). Stock movements and exchange rates have been successfully modeled as random walks (e.g., [9, 23]). Birkhoff's theory [32] claims that 'interesting' signals, such as musical scores and other works of art, consist of pink noise, whose spectrum is similarly skewed (O(f^-0.5)). In general, if ...

185 | Hilbert R-tree: An improved R-tree using fractals
- Kamel, Faloutsos
- 1994
Citation Context ...amonds. Ordering can be done in a few different ways. We have implemented one that sorts the vectors lexicographically. Other orderings, like some form of space-filling curves (e.g., the Hilbert curve [21]) can also be used. Algorithm 3 Splitting by ordering begin /* assume N is an internal node; similar for leaf nodes */ /* min_fill is the minimum percentage (in bytes) of the node to be occupied */ Pr...

129 | A Retrieval Technique for Similar Shapes
- Jagadish
- 1991
Citation Context ...le object. We rely on a domain expert to provide the appropriate similarity/distance functions between two objects. A list of potential applications for such a system follows: • Image databases: In [19] we showed how to query for similar shapes, describing each shape by the co-ordinates of a few rectangles that cover it (20 features per shape). In [26] we supported queries on color, shape and textur...

118 | A branch and bound algorithm for computing k-nearest neighbors
- Fukunaga, Narendra
- 1975
Citation Context ...mote branches of the tree can be used; efficient improvements on this algorithm have recently appeared [8]. Similarly, nearest-neighbor queries can be handled with a branch-and-bound algorithm, as in [12]. The algorithm works as follows: given a (query) point, examine the top-level branches, and compute upper and lower bounds for the distance; descend the most promising branch, disregarding bran...
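
The branch-and-bound idea in this excerpt can be sketched as a best-first search over a tree of bounding regions. The sketch below is a simplification under stated assumptions: nodes carry axis-aligned bounding boxes (not the TV-tree's own regions), distance is Euclidean, only a lower bound per branch is used for pruning, and the `Node`, `min_dist`, and `nearest` names are hypothetical.

```python
import heapq
import math


class Node:
    """Hypothetical tree node: a bounding box plus child nodes or points."""

    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        # Box corners come from the stored points, or from the child boxes.
        pts = self.points or [c for ch in self.children for c in (ch.lo, ch.hi)]
        dims = range(len(pts[0]))
        self.lo = tuple(min(p[d] for p in pts) for d in dims)
        self.hi = tuple(max(p[d] for p in pts) for d in dims)


def min_dist(q, node):
    """Lower bound on the distance from q to anything inside node's box."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, node.lo, node.hi)))


def nearest(root, q):
    """Return (distance, point) for the nearest neighbor of q."""
    best = (math.inf, None)
    heap = [(min_dist(q, root), 0, root)]  # counter breaks ties between nodes
    counter = 1
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best[0]:
            break  # every remaining branch is at least this far away
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        for child in node.children:
            b = min_dist(q, child)
            if b < best[0]:  # disregard branches that cannot improve the answer
                heapq.heappush(heap, (b, counter, child))
                counter += 1
    return best


left = Node(points=[(0, 0), (1, 1)])
right = Node(points=[(5, 5), (6, 7)])
print(nearest(Node(children=[left, right]), (4, 4)))
```

In the usage above the search descends into the right-hand branch first because its lower bound is smaller, and the left-hand branch is never opened once a closer point has been found.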

106 | A survey of recent advances in hierarchical clustering algorithms
- Murtagh
- 1983
Citation Context ...iral search' method by Bentley and Weide [7] also has a complexity that grows exponentially with the dimensionality. Relevant to our work is a wide variety of clustering algorithms (see, for example, [15, 30, 24] for surveys). However, their main goals are to detect patterns in the data, and/or to assess the quality of the clustering scheme using the 'precision' and 'recall' measures; there is usually littl...

104 | PROBE Spatial Data Modeling and Query Processing in an Image Database Application
- Orenstein, Manola
- 1988
Citation Context ...e spatial access methods of the past have mainly concentrated on 2-dimensional and 3-dimensional spaces, such as the R-tree based methods [14], and the linear-quadtree based ones (e.g., the z-ordering [28]). Although conceptually they can be extended to higher dimensionalities, they usually require time and/or space that grows exponentially with the dimensionality. In this paper we propose a tree-struc...

88 | Optimal expected-time algorithms for closest point problems
- Bentley, Weide, Yao
- 1980
Citation Context ...at all for dimensionalities higher than 3 [5]. The method by Friedman et al. [10] does almost as much work as linear scanning for dimensionalities > 9. The 'spiral search' method by Bentley and Weide [7] also has a complexity that grows exponentially with the dimensionality. Relevant to our work is a wide variety of clustering algorithms (see, for example, [15, 30, 24] for surveys). However, their ma...

63 | An Algorithm for Finding Nearest Neighbors
- Friedman, Baskett, et al.
- 1975
Citation Context ... dimensionality have been reported for methods that mainly focus on nearest-neighbor queries: Voronoi diagrams do not work at all for dimensionalities higher than 3 [5]. The method by Friedman et al. [10] does almost as much work as linear scanning for dimensionalities > 9. The 'spiral search' method by Bentley and Weide [7] also has a complexity that grows exponentially with the dimensionality. Relev...

63 | An implementation and performance analysis of spatial data access methods
- Greene
- 1989
Citation Context ...at contain a certain pattern; or we want to find all x-ray images that contain tissue with tumor-like texture. Previous work comparing the performance of different spatial data structures appeared in [13, 16]. [13] compared the R-tree, R+-tree and the K-D-B tree and the 2-D ISAM and it concluded that the R-tree and the R+-tree give the better performance. In [16] the PMR-quadtree is compared to the R-tre...

61 | Operations on images using quad trees
- Hunter, Steiglitz
- 1979
Citation Context ...exing methods explode exponentially with the dimensionality, eventually reducing to sequential scanning. For linear quadtrees, the effort is proportional to the hypersurface bounding the query region [17]; the hypersurface grows exponentially with the dimensionality. Grid files face similar problems, since they require a directory that grows exponentially with the dimensionality. The R-tree and its va...

55 | Automatic spelling correction using a trigram similarity measure
- Willett, Angell
- 1983
Citation Context ...lications include spelling, typing [22] and OCR error correction [20]. There, given a wrong string, we should search a dictionary to find the closest strings to it. Triplets of letters are often used [3] to assess the similarity of two words, in which case we have at least 26^3 = 17,576 features per word (assuming that words consist exclusively of the 26 English letters, ignoring digits, upper-case l...

52 | Spatial Search with Polyhedra
- Jagadish
- 1990
Citation Context ...nctions map objects into points in feature space; these points must be stored in a spatial access method. The prevailing methods form three classes: (a) R*-trees [6] and the rest of the R-tree family [14, 18]; (b) linear quadtrees [31]; and (c) grid-files [27]. Different kinds of queries arise; the most typical ones are listed below: • Exact match queries. Find if a given query object is in the database...

46 | A Qualitative Comparison Study of Data Structures for
- Hoel, Samet
- 1992
Citation Context ...at contain a certain pattern; or we want to find all x-ray images that contain tissue with tumor-like texture. Previous work comparing the performance of different spatial data structures appeared in [13, 16]. [13] compared the R-tree, R+-tree and the K-D-B tree and the 2-D ISAM and it concluded that the R-tree and the R+-tree give the better performance. In [16] the PMR-quadtree is compared to the R-tre...

42 | Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise
- Schroeder
- 1991
Citation Context ... the amplitude spectrum is approximately O(f^-1), where f is the frequency). Stock movements and exchange rates have been successfully modeled as random walks (e.g., [9, 23]). Birkhoff's theory [32] claims that 'interesting' signals, such as musical scores and other works of art, consist of pink noise, whose spectrum is similarly skewed (O(f^-0.5)). In general, if the statistical propertie...

29 | Generation and search of clustered files
- Salton, Wong
- 1978
Citation Context ...iral search' method by Bentley and Weide [7] also has a complexity that grows exponentially with the dimensionality. Relevant to our work is a wide variety of clustering algorithms (see, for example, [15, 30, 24] for surveys). However, their main goals are to detect patterns in the data, and/or to assess the quality of the clustering scheme using the 'precision' and 'recall' measures; there is usually littl...

22 | QBISM: A Prototype 3-D Medical Image Database System
- Arya, Cody, et al.
- 1993
Citation Context ...attributes per image) as feature vectors; for shapes we used the first 20 moments. • Medical databases, where 1-d objects (e.g., ECGs), 2-d images (e.g., X-rays) and 3-d images (e.g., MRI brain scans) [4] are stored. Ability to retrieve quickly past cases with similar symptoms would be valuable for diagnosis, as well as for medical teaching and research purposes. • Time series, such as financial dat...

18 | Integrating multiple knowledge sources in a Bayesian OCR post-processor
- Jones, Story, et al.
- 1991
Citation Context ...ave errors. For example "1234 Springs Road" instead of "1235 Spring Rd", or "Mr. John Smith" instead of "Dr. J. Smith, Jr." Similar applications include spelling, typing [22] and OCR error correction [20]. There, given a wrong string, we should search a dictionary to find the closest strings to it. Triplets of letters are often used [3] to assess the similarity of two words, in which case we have at l...

6 | Multimedia information systems: the unfolding of a reality
- Narasimhalu, Christodoulakis
- 1991
Citation Context ...ilar patterns that may have appeared in the past. In [1] we used as features the co-efficients of the Discrete Fourier Transform (DFT). • Multimedia databases, with audio (voice, music), video etc. [25]. Users might want to retrieve, e.g., similar music scores, or video clips. • DNA databases where there is a large collection of strings from a four-letter alphabet (A,G,C,T); a new string has to be ...