## FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets (1995)

Citations: 493 (23 self)

### Citations

3944 |
Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context ...Finally, our work could be beneficial to research on clustering algorithms, where several approaches have been proposed. See, e.g., [Mur83], [Har75] for surveys, [NH94] for a recent application in GIS, [SM83] [VR79] for applications in Information Retrieval. 3 Proposed Method In the first part, we describe the proposed algorithm, which achieves a fast mapping of objects into points, so that distances are pr... |

3703 |
Introduction to Statistical Pattern Recognition
- Fukunaga
- 1990
Citation Context ...tensively in statistical pattern recognition and matrix algebra. The optimal way to map n-dimensional points to k-dimensional points (k ≤ n) is the Karhunen-Loeve ('K-L') transform (e.g., see [DH73], [Fuk90]). K-L is optimal in the sense that it minimizes the mean square error, where the error is the distance between each n-d point and its k-d image. Figure 1 shows a set of 2-d points, and the correspond... |

2713 | R-trees: a dynamic index structure for spatial searching
- Guttman
- 1984
Citation Context ...which, by definition, is a method that can handle k-dimensional points, rectangles, or even more complicated shapes. The most popular methods form three classes: (a) tree-based methods like the R-tree [Gut84], and its variants (R+-tree [SRF87], hB-tree [LS90], P-tree [Jag90a], R*-tree [BKSS90], Hilbert R-trees [KF94] Symbols and Definitions. N: number of objects in database; n: dimensionality of original sp... |

2207 |
Clustering algorithms
- Hartigan
- 1975
Citation Context ... in 'target space', nor to provide a tool for visualization. Finally, our work could be beneficial to research on clustering algorithms, where several approaches have been proposed. See, e.g., [Mur83], [Har75] for surveys, [NH94] for a recent application in GIS, [SM83] [VR79] for applications in Information Retrieval. 3 Proposed Method In the first part, we describe the proposed algorithm, which achieves a f... |

998 | Linear Algebra and Its Applications - Strang - 1988 |

549 |
Multidimensional scaling
- Kruskal, Wish
- 1978
Citation Context ...Several generalizations and extensions have been proposed to the above basic algorithm: Kruskal [KW78] proposed a method that automatically determines a good value for k; Shepard [She62], and Kruskal [Kru64] proposed the non-metric MDS where the distance between items are specified qualitatively; Young [You87] describes the individual difference MDS, which incorporates multiple distance measures, correspon... |

469 |
Time Warps, String Edits and Macromolecules: the Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983
Citation Context ...stitutions to transform one string to the other). It is not clear which the features should be in this case. Similarly, in matching digitized voice excerpts, we typically have to do some time-warping [SK83], which makes it difficult to design feature-extraction functions. Overcoming these difficulties is exactly the motivation behind this work. Generalizing the approach by Jagadish, we try to map objects in... |

458 |
Techniques for automatically correcting words in text
- Kukich
- 1992
Citation Context ...ast. In [AFS93] we used the Euclidean distance (sum of squared errors) as the distance function between two time series. Similarity searching in string databases, as in the case of spelling, typing [Kuk92] and OCR error correction [JSB91]. There, given a wrong string, we should search a dictionary to find the closest strings to it. Conceptually identical is the case of approximate matching in DNA databas... |

395 |
The analysis of proximities: multidimensional scaling with an unknown distance function
- Shepard
- 1962
Citation Context ...re given as numbers. Several generalizations and extensions have been proposed to the above basic algorithm: Kruskal [KW78] proposed a method that automatically determines a good value for k; Shepard [She62], and Kruskal [Kru64] proposed the non-metric MDS where the distance between items are specified qualitatively; Young [You87] describes the individual difference MDS, which incorporates multiple distanc... |

364 | Efficient processing of spatial joins using R-trees - Brinkhoff, Kriegel, et al. - 1993 |

310 |
Multidimensional scaling: I. theory and method
- Torgerson
- 1952
Citation Context ...otential clusters, correlations among attributes and other regularities that data-mining is looking for. We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS) [Tor52]; although unsuitable for indexing, we use it as yardstick for our method. Then, we propose a much faster algorithm to solve the problem in hand, while in addition it allows for indexing. Experiments ... |

306 | The R+-Tree: A Dynamic Index for Multi-Dimensional Objects
- Sellis, Roussopoulos, Faloutsos
- 1987
Citation Context ... can handle k-dimensional points, rectangles, or even more complicated shapes. The most popular methods form three classes: (a) tree-based methods like the R-tree [Gut84], and its variants (R+-tree [SRF87], hB-tree [LS90], P-tree [Jag90a], R*-tree [BKSS90], Hilbert R-trees [KF94] Symbols and Definitions. N: number of objects in database; n: dimensionality of original space ('features' case only); k: dimensio... |

276 |
Personalized information delivery: An analysis of information filtering methods
- Foltz, Dumais
- 1992
Citation Context ...ll on the 'distance' case; even in the 'features' case, it may be slow for large databases (N ≫ 1) with many attributes (n ≫ 1). The latter situation appears, e.g., in information retrieval and filtering [FD92], [Dum94], where documents correspond to V-dimensional vectors (V being the vocabulary size of the collection, typically in the tens of thousands). Sub-section 3.3 provides such an example. 2.3 Retri... |

231 |
Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ...udied extensively in statistical pattern recognition and matrix algebra. The optimal way to map n-dimensional points to k-dimensional points (k ≤ n) is the Karhunen-Loeve ('K-L') transform (e.g., see [DH73], [Fuk90]). K-L is optimal in the sense that it minimizes the mean square error, where the error is the distance between each n-d point and its k-d image. Figure 1 shows a set of 2-d points, and the c... |

221 | Hilbert R-tree: An Improved R-tree using Fractals
- Kamel, Faloutsos
- 1994
Citation Context ... The most popular methods form three classes: (a) tree-based methods like the R-tree [Gut84], and its variants (R+-tree [SRF87], hB-tree [LS90], P-tree [Jag90a], R*-tree [BKSS90], Hilbert R-trees [KF94] Symbols and Definitions. N: number of objects in database; n: dimensionality of original space ('features' case only); k: dimensionality of 'target space'; D(*, *): the distance function between two objects; ||... |

216 | Linear clustering of objects with multiple attributes - Jagadish - 1990 |

188 | The hB-tree: a multiattribute indexing method with good guaranteed performance
- Lomet, Salzberg
- 1990
Citation Context ...ensional points, rectangles, or even more complicated shapes. The most popular methods form three classes: (a) tree-based methods like the R-tree [Gut84], and its variants (R+-tree [SRF87], hB-tree [LS90], P-tree [Jag90a], R*-tree [BKSS90], Hilbert R-trees [KF94] Symbols and Definitions. N: number of objects in database; n: dimensionality of original space ('features' case only); k: dimensionality of 'targe... |

176 |
Spatial Query Processing in an Object-Oriented Database System
- Orenstein
- 1986
Citation Context ...es two major benefits: 1. It can accelerate the search time for queries. The reason is that we can employ highly fine-tuned Spatial Access Methods (SAMs), like the R*-trees [BKSS90] and the z-ordering [Ore86]. These methods provide fast searching for range queries as well as spatial joins [BKSS94]. 2. It can help with visualization, clustering and data-mining: Plotting objects as points in k=2 or 3 dimens... |

170 |
An effective way to represent quadtrees
- Gargantini
- 1982
Citation Context ...e function between two objects; ||x||_2: the length (= L2 norm) of vector x; (AB): the length of the line segment AB. Table 1: Summary of Symbols and Definitions. etc.) (b) methods using linear quadtrees [Gar82] or, equivalently, the z-ordering [Ore86, Ore90], or other space-filling curves [FR89, Jag90b] and finally (c) methods that use grid-files [NHS84, HN83]. There are also retrieval methods for the case wher... |

167 |
A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4):354–359
- Murtagh
- 1983
Citation Context ...to points in 'target space', nor to provide a tool for visualization. Finally, our work could be beneficial to research on clustering algorithms, where several approaches have been proposed. See, e.g., [Mur83], [Har75] for surveys, [NH94] for a recent application in GIS, [SM83] [VR79] for applications in Information Retrieval. 3 Proposed Method In the first part, we describe the proposed algorithm, which ach... |

163 | Fractals for secondary key retrieval - Faloutsos, Roseman - 1989 |

149 |
Mining association rules between sets of items in large databases
- Agrawal, Imielinski, Swami
- 1993
Citation Context ...ications, the distance is typically the editing distance, i.e., minimum number of insertions, deletions or substitutions that are needed to transform the first string to the second. Data mining [AS94], [AIS93] and visualization applications. For example, given records of patients (with attributes like gender, age, blood-pressure etc.), we would like to help the physician detect any clusters, or correlation... |

138 | Multi-Step Processing of Spatial Joins
- Brinkhoff, Kriegel, et al.
- 1994
Citation Context ...spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two obje... |

138 |
A Retrieval Technique for Similar Shapes
- Jagadish
- 1991
Citation Context ...Abstract A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all p... |

118 | Latent semantic indexing (LSI) and TREC-2
- Dumais
- 1994
Citation Context ...e 'distance' case; even in the 'features' case, it may be slow for large databases (N ≫ 1) with many attributes (n ≫ 1). The latter situation appears, e.g., in information retrieval and filtering [FD92], [Dum94], where documents correspond to V-dimensional vectors (V being the vocabulary size of the collection, typically in the tens of thousands). Sub-section 3.3 provides such an example. 2.3 Retrieval and ... |

106 |
Identifying high level features of texture perception. CVGIP: Graphical Models and Image Processing, 55:218–233
- Rao, Lohse
- 1993
Citation Context ...; physics (nuclear gamma-ray spectra pattern recognition, recognizing the different type of spins and their relationships); political science (determining ideological shifts) [You87]; texture analysis [RL92]. However, for our applications, MDS suffers from two drawbacks: It requires O(N^2) time, where N is the number of items. Thus, it is impractical for large datasets. In the applications presented ab... |

86 |
Computer programs for detecting and correcting spelling errors
- Peterson
- 1980
Citation Context ...he collection. For the English language, we can expect V to range from 2,000 up to and exceeding 100,000 (the vocabulary of every-day English, and the size of a very detailed dictionary, respectively [Pet80]). The coordinates of such vectors are called term weights and can be binary ('1' if the term appears in the document; '0' if not) or real-valued, with values increasing with the importance (e.g., occu... |

76 |
Multidimensional Scaling: History, Theory and Applications. Lawrence Erlbaum Associates
- Young, Hamer
- 1987
Citation Context ... older attempts to solve the problem. First we discuss the Multidimensional Scaling (MDS) method that has been used in several diverse fields (e.g., social sciences, psychology, market research, physics [You87]) to solve the 'distance' case problem. Then, we present the Karhunen-Loeve transform and the closely related Singular Value Decomposition that has been used for dimensionality reduction ('features' ... |

59 | A comparison of spatial query processing techniques for native and parameter spaces - Orenstein - 1990 |

57 | New techniques for best-match retrieval
- Shasha, Wang
- 1990
Citation Context ...er space-filling curves [FR89, Jag90b] and finally (c) methods that use grid-files [NHS84, HN83]. There are also retrieval methods for the case where only the triangular inequality holds [BK73], [Sha77], [SW90], [BYCMW94]. All these methods try to exploit the triangular inequality in order to prune the search space on a range query. However, none of them tries to map objects into points in 'target space', n... |

52 |
Spatial Search with Polyhedra
- Jagadish
- 1990
Citation Context ..., rectangles, or even more complicated shapes. The most popular methods form three classes: (a) tree-based methods like the R-tree [Gut84], and its variants (R+-tree [SRF87], hB-tree [LS90], P-tree [Jag90a], R*-tree [BKSS90], Hilbert R-trees [KF94] Symbols and Definitions. N: number of objects in database; n: dimensionality of original space ('features' case only); k: dimensionality of 'target space'; D(*, *) ... |

49 |
Efficient and effective clustering method for spatial data mining
- Ng, Han
- 1994
Citation Context ...or to provide a tool for visualization. Finally, our work could be beneficial to research on clustering algorithms, where several approaches have been proposed. See, e.g., [Mur83], [Har75] for surveys, [NH94] for a recent application in GIS, [SM83] [VR79] for applications in Information Retrieval. 3 Proposed Method In the first part, we describe the proposed algorithm, which achieves a fast mapping of objec... |

47 |
Information Retrieval. Butterworths
- Van-Rijsbergen
- 1979
Citation Context ..., our work could be beneficial to research on clustering algorithms, where several approaches have been proposed. See, e.g., [Mur83], [Har75] for surveys, [NH94] for a recent application in GIS, [SM83] [VR79] for applications in Information Retrieval. 3 Proposed Method In the first part, we describe the proposed algorithm, which achieves a fast mapping of objects into points, so that distances are preserved... |

36 |
The R*-tree: an efficient and robust access method for points and rectangles
- Beckmann, Kriegel, et al.
- 1990
Citation Context ...space. Such a mapping provides two major benefits: 1. It can accelerate the search time for queries. The reason is that we can employ highly fine-tuned Spatial Access Methods (SAMs), like the R*-trees [BKSS90] and the z-ordering [Ore86]. These methods provide fast searching for range queries as well as spatial joins [BKSS94]. 2. It can help with visualization, clustering and data-mining: Plotting objects a... |

23 | Qbism: a prototype 3-d medical image database system - Arya, Cody, et al. - 1993 |

22 |
Integrating multiple knowledge sources in a bayesian OCR post-processor
- Jones, Story, et al.
- 1991
Citation Context ...dean distance (sum of squared errors) as the distance function between two time series. Similarity searching in string databases, as in the case of spelling, typing [Kuk92] and OCR error correction [JSB91]. There, given a wrong string, we should search a dictionary to find the closest strings to it. Conceptually identical is the case of approximate matching in DNA databases, where there is a large collec... |

21 |
Efficient similarity search in sequence databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context ...nd past days in which the solar magnetic wind showed patterns similar to today's pattern' [Vas93]. The goal is to aid forecasting, by examining similar patterns that may have appeared in the past. In [AFS93] we used the Euclidean distance (sum of squared errors) as the distance function between two time series. Similarity searching in string databases, as in the case of spelling, typing [Kuk92] and OCR... |

18 |
Multidimensional scaling: Theory and applications in the behavioral sciences. Volume I—Theory
- Shepard, Kimball, et al.
- 1972
Citation Context ...vers' perception of the data's difference. MDS has been used in numerous, diverse applications, including the following: semantic structure analysis of words; perceived personality trait relationships [RSN72], operating on 60 different personality traits and people's perception of what goes together (like 'warm' and 'trusting'); physics (nuclear gamma-ray spectra pattern recognition, recognizing the differe... |

14 |
The input-state space approach to the prediction of auroral geomagnetic activity from solar wind variables
- Vassiliadis
- 1993
Citation Context ... Time series, with, e.g., financial data, such as stock prices, sales numbers etc., or scientific databases, with time series of sensor data, weather [CoPES92], geological, environmental, astrophysics [Vas93] data, etc. In such databases, typical queries would be 'find companies whose stock prices move similarly', or 'find past days in which the solar magnetic wind showed patterns similar to today's patter... |

11 |
Some approaches to best-match searching
- Burkhard, Keller
- 1973
Citation Context ...6, Ore90], or other space-filling curves [FR89, Jag90b] and finally (c) methods that use grid-files [NHS84, HN83]. There are also retrieval methods for the case where only the triangular inequality holds [BK73], [Sha77], [SW90], [BYCMW94]. All these methods try to exploit the triangular inequality in order to prune the search space on a range query. However, none of them tries to map objects into points in ... |

11 | The grid file: an adaptable, symmetric multikey file structure - Nievergelt, Hinterberger - 1984 |

11 |
The choice of reference points in best-match searching
- Shapiro
- 1977
Citation Context ...], or other space-filling curves [FR89, Jag90b] and finally (c) methods that use grid-files [NHS84, HN83]. There are also retrieval methods for the case where only the triangular inequality holds [BK73], [Sha77], [SW90], [BYCMW94]. All these methods try to exploit the triangular inequality in order to prune the search space on a range query. However, none of them tries to map objects into points in 'target s... |

10 | The grid file: a data structure to support proximity queries on spatial objects - Hinrichs, Nievergelt - 1983 |

7 |
Multimedia information systems: the unfolding of a reality
- Narasimhalu, Christodoulakis
- 1991
Citation Context ...ween appropriately selected feature vectors (color attributes, moments of inertia for shape, etc.) Search-by-content is highly desirable in multimedia databases, with audio (voice, music), video etc. [NC91]. For example, users might want to retrieve, music scores, or video clips that are similar to a target music score or video clip. Once the similarity (or dis-similarity) function has been determined, ... |

5 |
Warping 3d models for interbrain comparisons
- Toga, Banerjee, et al.
- 1990
Citation Context ...istance functions are complicated, typically requiring some warping of the two images, to make sure that the anatomical structures (e.g., bones) are properly aligned, before we consider the differences [TBS90]. This warping makes it difficult to find features that would adequately describe each image (and therefore, map it into a point in feature space). Time series, with, e.g., financial data, such as stock p... |

1 |
Proximity matching using fixed-queries trees
- Baeza-Yates, Cunto, et al.
Citation Context ...-filling curves [FR89, Jag90b] and finally (c) methods that use grid-files [NHS84, HN83]. There are also retrieval methods for the case where only the triangular inequality holds [BK73], [Sha77], [SW90], [BYCMW94]. All these methods try to exploit the triangular inequality in order to prune the search space on a range query. However, none of them tries to map objects into points in 'target space', nor to provi... |
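Several of the citation contexts above mention retrieval methods that exploit the triangular inequality to prune the search space on a range query. The following is a minimal sketch of that general idea only, not the method of any cited paper. It assumes a single precomputed reference object; the names `build_pivot_table`, `range_query`, and `abs_diff` are hypothetical and chosen for illustration.

```python
def build_pivot_table(objects, pivot, dist):
    """Precompute the distance from every object to one reference object."""
    return [dist(obj, pivot) for obj in objects]


def range_query(objects, pivot_dists, pivot, dist, query, radius):
    """Return (hits, computed): objects within `radius` of `query`, and the
    number of query-to-object distances that actually had to be computed."""
    d_query_pivot = dist(query, pivot)
    hits = []
    computed = 0
    for obj, d_obj_pivot in zip(objects, pivot_dists):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o).
        # If this lower bound already exceeds the radius, skip the object.
        if abs(d_query_pivot - d_obj_pivot) > radius:
            continue
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed


def abs_diff(a, b):
    """Toy metric on numbers (assumption: any metric would work here)."""
    return abs(a - b)


objects = [1, 4, 9, 15, 22, 30]
pivot = 12  # arbitrary reference object, an assumption of this sketch
table = build_pivot_table(objects, pivot, abs_diff)
hits, computed = range_query(objects, table, pivot, abs_diff, query=10, radius=2)
print(hits, computed)  # prints "[9] 2"
```

In this toy run only two of the six objects (9 and 15) survive the lower-bound filter, so only two distances to the query are computed. The filter never discards a true answer, but it can let through false candidates such as 15.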