Results 1-10 of 58
Multidimensional Access Methods
, 1998
Cited by 607 (3 self)
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
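A minimal sketch of the two query types named above, assuming the objects are axis-aligned rectangles and using a plain linear scan in place of any index structure. The names `Rect`, `point_query`, and `region_query` are illustrative and do not come from the paper.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with closed boundaries (an assumption of this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains_point(r: Rect, x: float, y: float) -> bool:
    return r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax


def overlaps(a: Rect, b: Rect) -> bool:
    # Two rectangles overlap when their extents intersect on both axes.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def point_query(objects: dict, x: float, y: float) -> list:
    """Names of all objects that contain the search point."""
    return [name for name, r in objects.items() if contains_point(r, x, y)]


def region_query(objects: dict, region: Rect) -> list:
    """Names of all objects that overlap the search region."""
    return [name for name, r in objects.items() if overlaps(r, region)]


objects = {
    "a": Rect(0, 0, 2, 2),
    "b": Rect(1, 1, 3, 3),
    "c": Rect(5, 5, 6, 6),
}
```

An access method aims to return the same answers as these scans while inspecting far fewer objects.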
Efficient and Effective Querying by Image Content
 Journal of Intelligent Information Systems
, 1994
Cited by 468 (13 self)
In the QBIC (Query By Image Content) project we are studying methods to query large online image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical ("Give me other images that contain a tumor with a texture like this one"), photojournalism ("Give me images that have blue at the top and red at the bottom"), and many others in art, fashion, cataloging, retailing, and industry. We describe a set of novel features and similarity measures allowing query by color, texture, and shape of image object. We demonstrate the effectiveness of the QBIC system with normalized precision and recall experiments on test databases containing over 1000 images and 1000 objects populated from commercially available photo clip art images, and of images of airplane silhouettes. We also consider the efficient indexing of these features, specifically addre...
Fast Subsequence Matching in Time-Series Databases
 SIGMOD 94
, 1994
Cited by 447 (22 self)
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
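The sliding-window pipeline described above can be sketched roughly as follows. Two choices here are assumptions and not the paper's: the feature of each window is taken to be the real and imaginary parts of its first non-constant DFT coefficient, and the trail is cut into fixed-size groups, whereas the abstract describes a dedicated algorithm for dividing trails. All function names are hypothetical.

```python
import cmath


def window_feature(window):
    """Map one window to a 2-D point: (Re, Im) of DFT coefficient 1 (illustrative choice)."""
    n = len(window)
    c = sum(v * cmath.exp(-2j * cmath.pi * t / n) for t, v in enumerate(window)) / n
    return (c.real, c.imag)


def trail(seq, w):
    """Slide a window of length w over seq; the feature points form the trail."""
    return [window_feature(seq[i:i + w]) for i in range(len(seq) - w + 1)]


def sub_trail_mbrs(points, group_size):
    """Cut the trail into consecutive groups and bound each with an MBR (low, high)."""
    boxes = []
    for start in range(0, len(points), group_size):
        chunk = points[start:start + group_size]
        dims = range(len(chunk[0]))
        low = tuple(min(p[d] for p in chunk) for d in dims)
        high = tuple(max(p[d] for p in chunk) for d in dims)
        boxes.append((low, high))
    return boxes


# Made-up sample values standing in for a price sequence.
prices = [10.0, 10.5, 11.0, 10.8, 10.2, 9.9, 10.1, 10.7, 11.3, 11.0]
points = trail(prices, w=4)
boxes = sub_trail_mbrs(points, group_size=3)
```

The resulting rectangles are what a spatial access method would then index; a query window maps to a point (or small region) that is tested against them.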
Efficient similarity search in sequence databases
, 1994
Cited by 443 (21 self)
We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1-3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
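A small sketch of why truncating the DFT is safe for filtering, assuming an orthonormal (1/sqrt(n)) scaling of the transform. With that scaling Parseval's theorem makes the full frequency-domain distance equal to the time-domain distance, so a distance computed on only the first k coefficients can never exceed the true distance. The function names are illustrative and not taken from the paper.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT with orthonormal scaling (assumed here so distances carry over)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(n)
    ]


def euclid(a, b):
    """Euclidean distance; works for real or complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def feature_distance(a, b, k):
    """Distance using only the first k DFT coefficients (the indexed features)."""
    return euclid(dft(a)[:k], dft(b)[:k])


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0, 2.0, 4.0, 4.0, 5.0, 7.0, 7.0, 9.0]

true_distance = euclid(a, b)
full_frequency_distance = euclid(dft(a), dft(b))
lower_bound = feature_distance(a, b, k=3)
```

Because the feature distance only underestimates, a range query on the index can return false alarms but no false dismissals; candidates are then checked against the full sequences.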
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
, 1995
Cited by 434 (22 self)
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [8]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient ret...
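A rough sketch of a distance-only embedding in the spirit described above, following the commonly presented form of FastMap: pick two far-apart pivot objects, place every object on the line through them using the cosine law, then repeat on the residual distances. The pivot heuristic, the clipping of negative residuals to zero, and all names are assumptions of this sketch, not details confirmed by the abstract.

```python
import math


def fastmap(objects, dist, k):
    """Embed objects into k-d points using only the pairwise distance function dist."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates are accounted for.
        rest = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            rest -= (coords[i][c] - coords[j][c]) ** 2
        return max(rest, 0.0)  # clip tiny negative values from rounding

    for col in range(k):
        # Pivot heuristic (assumed): farthest from object 0, then farthest from that one.
        b = max(range(n), key=lambda j: d2(0, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # all residual distances are zero; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: projection of object i onto the line through pivots a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2.0 * dab)
    return coords


def euclid(p, q):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


# Sample objects are 2-D points here only so the result is easy to check.
pts = [(0, 0), (3, 0), (0, 4), (3, 4), (1, 1)]
embedded = fastmap(pts, euclid, k=2)
```

For data that really is Euclidean with at most k dimensions, as in this toy input, the embedding reproduces the pairwise distances; for general distance functions it is only an approximation.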
On Packing R-trees
 In ACM CIKM
, 1993
Cited by 232 (16 self)
– main idea; file structure
– algorithms: insertion/split
– deletion
– search: range, nn, spatial joins
– performance analysis
– variations (packed; hilbert;...)
15721 Copyright: C. Faloutsos (2001)
Problem
• Given a collection of geometric objects (points, lines, polygons,...)
• organize them on disk, to answer spatial queries (range, nn, etc)
(Who cares?)
The TV-tree: an index structure for high-dimensional data
 VLDB Journal
, 1994
Cited by 208 (7 self)
We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such 'varying length' feature vectors. Finally we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses. Type of Contribution: New Index Structure, for high-dimensionality feature spaces. Algorithms and performance measurements. Keywords: Spatial Index, Similarity Retrieval, Query by Content 1 Introduction Many applications require enhanced indexing, capable of performing similarity searching on several, non-traditional ('exotic') data types. The targ...
Hilbert R-tree: An improved R-tree using fractals
, 1994
Cited by 191 (11 self)
We propose a new R-tree structure that outperforms all the older ones. The heart of the idea is to facilitate the deferred splitting approach in R-trees. This is done by proposing an ordering on the R-tree nodes. This ordering has to be 'good', in the sense that it should group 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Following [19] we have chosen the so-called '2D-c' method, which sorts rectangles according to the Hilbert value of the center of the rectangles. Given the ordering, every node has a well-defined set of sibling nodes; thus, we can use deferred splitting. By adjusting the split policy, the Hilbert R-tree can achieve as high utilization as desired. In contrast, the R*-tree has no control over the space utilization, typically achieving up to 70%. We designed the manipulation algorithms in detail, and we did a full implementation of the Hilbert R-tree. Our experiments show that the '2to...
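The ordering step described above, sorting rectangles by the Hilbert value of their centers, can be sketched as follows. The Hilbert index uses the widely published iterative bit-manipulation formulation on an n-by-n grid with n a power of two. Integer grid coordinates, the simple chunking into leaves, and all names are assumptions of this sketch; it illustrates a static packing and does not implement deferred splitting.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve of an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def center(rect):
    """Integer center of a rectangle given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return ((xmin + xmax) // 2, (ymin + ymax) // 2)


def hilbert_sort(rects, n):
    """Sort rectangles by the Hilbert value of their centers."""
    return sorted(rects, key=lambda r: hilbert_index(n, *center(r)))


def hilbert_pack(rects, n, capacity):
    """Fill leaves with consecutive rectangles in Hilbert order (static packing sketch)."""
    ordered = hilbert_sort(rects, n)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


rects = [(0, 0, 0, 0), (3, 0, 3, 0), (0, 3, 0, 3), (1, 1, 1, 1)]
leaves = hilbert_pack(rects, n=4, capacity=2)
```

Because cells that are consecutive along the curve are also adjacent in the grid, rectangles that land in the same leaf tend to be close together, which keeps the leaf MBRs small.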
Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension
 In Proc. PODS
, 1994
Cited by 164 (19 self)
We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA's Infrared-Ultraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, non-integer fractal dimension. Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the R-trees. We provide the first analysis of R-trees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5%, and it rarely exceeds 10%. We believe that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for any spatial access method, as well as better estimates for query optimization on multi-attribute queries.
Analysis of the clustering properties of the Hilbert space-filling curve
 IEEE Transactions on Knowledge and Data Engineering
, 2001
Cited by 153 (11 self)
Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties from such linear mappings is clustering, which means that the locality between objects in the multidimensional space is preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering [1], [14]. In this paper, we analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work [14]. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z-curve. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.
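To make the notion of a cluster concrete, the sketch below counts clusters empirically for one small query: a cluster is taken to be a maximal run of consecutive curve positions inside the query region. It is a brute-force count on a 4-by-4 grid, not the paper's closed-form formulas, and the bit-interleaving convention chosen for the z-curve and all names are assumptions.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) on the Hilbert curve of an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def z_index(x, y, bits):
    """Z-curve position by bit interleaving (x in even bit positions, y in odd; assumed)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def count_clusters(keys):
    """Number of maximal runs of consecutive integers among the given curve positions."""
    ordered = sorted(keys)
    if not ordered:
        return 0
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)


# Query region: the 2-by-2 square of cells with 1 <= x <= 2 and 1 <= y <= 2.
query_cells = [(x, y) for x in (1, 2) for y in (1, 2)]

hilbert_clusters = count_clusters(hilbert_index(4, x, y) for x, y in query_cells)
z_clusters = count_clusters(z_index(x, y, bits=2) for x, y in query_cells)
```

Each cluster corresponds to one contiguous run on disk, so fewer clusters suggests fewer seeks. For this single query the Hilbert ordering yields fewer runs than the z-ordering, which is consistent with the comparison reported in the abstract but is of course only one data point.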