Multidimensional Access Methods
, 1998
Cited by 686 (3 self)
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
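The two query types named above can be illustrated with a brute-force sketch over axis-aligned rectangles. The `Rect` type and both function names are hypothetical, chosen only for this illustration; an actual access method would replace the linear scan with an index structure.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle (hypothetical type for this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def point_query(objects: List[Rect], p: Tuple[float, float]) -> List[Rect]:
    """Return all objects that contain the search point p."""
    px, py = p
    return [r for r in objects
            if r.xmin <= px <= r.xmax and r.ymin <= py <= r.ymax]


def region_query(objects: List[Rect], q: Rect) -> List[Rect]:
    """Return all objects that overlap the search region q."""
    return [r for r in objects
            if r.xmin <= q.xmax and q.xmin <= r.xmax
            and r.ymin <= q.ymax and q.ymin <= r.ymax]


objs = [Rect(0, 0, 2, 2), Rect(1, 1, 3, 3), Rect(5, 5, 6, 6)]
inside = point_query(objs, (1.5, 1.5))
overlapping = region_query(objs, Rect(2.5, 2.5, 5.5, 5.5))
```

Both functions cost time linear in the number of objects, which is the baseline that multidimensional access methods aim to beat.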
Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets
, 1995
Cited by 502 (22 self)
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: ...
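The abstract does not spell out the mapping itself. The sketch below follows the pivot-projection idea commonly associated with FastMap: pick two far-apart pivot objects, project every object onto the line through them using the cosine law, then repeat on the residual distances. The function name `fastmap`, the pivot heuristic, and the clamping of negative residuals to zero are assumptions of this sketch, not details confirmed by the text above.

```python
import math
from typing import Callable, List, Sequence


def fastmap(objects: Sequence, dist: Callable, k: int) -> List[List[float]]:
    """Map each object to a k-dimensional point using only pairwise distances."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i: int, j: int, col: int) -> float:
        # Squared distance left over after removing the first `col` coordinates.
        total = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            total -= (coords[i][c] - coords[j][c]) ** 2
        return max(total, 0.0)  # clamp rounding noise (assumption of this sketch)

    for col in range(k):
        # Pivot heuristic: start at object 0, jump to the farthest object, then
        # to the object farthest from that one.
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # all residual distances are zero; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
embedded = fastmap(points, math.dist, k=2)
```

For objects that really are Euclidean points, as in the usage lines, k equal to the true dimensionality recovers the pairwise distances up to rounding; for general distance functions the mapping is only approximate.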
Geometric Range Searching and Its Relatives
- CONTEMPORARY MATHEMATICS
Cited by 266 (39 self)
... process a set S of points in R^d so that the points of S lying inside a query region can be reported or counted quickly. We survey the known techniques and data structures for range searching and describe their application to other related searching problems.
Hilbert R-tree: An Improved R-tree Using Fractals
- Proceedings 20th VLDB Conference
, 1994
Cited by 223 (11 self)
We propose a new R-tree structure that outperforms all the older ones. The heart of the idea is to facilitate the deferred splitting approach in R-trees. This is done by proposing an ordering on the R-tree nodes. This ordering has to be 'good', in the sense that it should group 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Following [19] we have chosen the so-called '2D-c' method, which sorts rectangles according to the Hilbert value of the center of the rectangles. Given the ordering, every node has a well-defined set of sibling nodes; thus, we can use deferred splitting. By adjusting the split policy, the Hilbert R-tree can achieve as high utilization as desired. To the contrary, the R-tree has no control over the space utilization, typically achieving up to 70%. We designed the manipulation algorithms in detail, and we did a full implementation of the Hilbert R-tree. Our experiments show that the '2-to-3' split policy provides a compromise between the insertion complexity and the search cost, giving up to 28% savings over the R*-tree [3] on real data.
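The ordering step described above (sort rectangles by the Hilbert value of their centers) can be sketched as follows. This is a minimal illustration on an integer n x n grid with n a power of two; `hilbert_index` uses the standard bit-manipulation conversion from cell coordinates to curve position, and `sort_by_hilbert_center` and `mbr` are hypothetical helper names. It does not implement the tree itself or the deferred-splitting policy.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), integer grid coordinates


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curve has the right orientation.
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def sort_by_hilbert_center(rects: List[Rect], n: int) -> List[Rect]:
    """Order rectangles by the Hilbert value of their (integer) center."""
    def key(r: Rect) -> int:
        return hilbert_index(n, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2)
    return sorted(rects, key=key)


def mbr(group: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a group of rectangles."""
    return (min(r[0] for r in group), min(r[1] for r in group),
            max(r[2] for r in group), max(r[3] for r in group))


rects = [(3, 0, 3, 0), (0, 0, 0, 0), (0, 3, 0, 3), (2, 2, 2, 2)]
ordered = sort_by_hilbert_center(rects, n=4)
```

Rectangles that end up adjacent in `ordered` tend to be close in space, which is why grouping consecutive entries keeps the resulting MBRs small.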
Analysis of the clustering properties of the Hilbert space-filling curve
- IEEE Transactions on Knowledge and Data Engineering
, 2001
Cited by 192 (12 self)
Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties from such linear mappings is clustering, which means that the locality between objects in the multidimensional space is preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering [1], [14]. In this paper, we analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work [14]. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z-curve. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.
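To make the notion of a cluster concrete, the sketch below counts clusters empirically for a rectangular query on a 2-D grid: a cluster is taken here to be a maximal run of grid cells inside the query whose Hilbert positions are consecutive. This is a brute-force illustration, not the closed-form formulas of the paper; `hilbert_index` and `count_clusters` are names assumed for this sketch, and n must be a power of two.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def count_clusters(n: int, x0: int, y0: int, x1: int, y1: int) -> int:
    """Number of runs of consecutive Hilbert positions inside [x0, x1] x [y0, y1]."""
    positions = sorted(hilbert_index(n, x, y)
                       for x in range(x0, x1 + 1)
                       for y in range(y0, y1 + 1))
    clusters = 0
    previous = None
    for pos in positions:
        if previous is None or pos != previous + 1:
            clusters += 1  # a gap in the linear order starts a new cluster
        previous = pos
    return clusters


aligned = count_clusters(4, 0, 0, 1, 1)   # 2 x 2 query aligned with a quadrant
centered = count_clusters(4, 1, 1, 2, 2)  # 2 x 2 query straddling all quadrants
```

Each cluster corresponds to one contiguous range in the linear order, so the count is a rough proxy for the number of separate sequential reads a query would need.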
STR: a simple and efficient algorithm for R-tree packing.
- In Proceedings of the Thirteenth International Conference on Data Engineering (ICDE)
, 1997
Similarity Searching in Medical Image DataBases
, 1997
Cited by 107 (8 self)
We propose a method to handle approximate searching by image content in medical image databases. Image content is represented by attributed relational graphs holding features of objects and relationships between objects. The method relies on the assumption that a fixed number of "labeled" or "expected" objects (e.g., "heart", "lungs", etc.) are common in all images of a given application domain, in addition to a variable number of "unexpected" or "unlabeled" objects (e.g., "tumor", "hematoma", etc.). The method can answer queries by example such as "find all X-rays that are similar to Smith's X-ray". The stored images are mapped to points in a multidimensional space and are indexed using state-of-the-art database methods (R-trees). The proposed method has several desirable properties: (a) database search is approximate, so that all images up to a prespecified degree of similarity (tolerance) are retrieved; (b) it has no "false dismissals" (i.e., all images qualifying query selection criteria are retrieved); and (c) it is much faster than sequential scanning for searching in the main memory and on the disk (i.e., by up to an order of magnitude), thus scaling up well for large databases.
Query and Update Efficient B+-Tree Based Indexing of Moving Objects
- In VLDB
, 2004
Cited by 97 (17 self)
A number of emerging applications of data management technology involve the monitoring and querying of large quantities of continuous variables, e.g., the positions of mobile service users, termed moving objects. In such applications, large quantities of state samples obtained via sensors are streamed to a database. Indexes for moving objects must support queries efficiently, but must also support frequent updates. Indexes based on minimum bounding regions (MBRs) such as the R-tree exhibit high concurrency overheads during node splitting, and each individual update is known to be quite costly. This motivates the design of a solution that enables the B+-tree to manage moving objects. We represent moving-object locations as vectors that are timestamped based on their update time. By applying a novel linearization technique to these values, it is possible to index the resulting values using a single B+-tree that partitions values according to their timestamp and otherwise preserves spatial proximity. We develop algorithms for range and nearest neighbor queries, as well as continuous queries. The proposal can be grafted into existing database systems cost effectively. An extensive experimental study explores the performance characteristics of the proposal and also shows that it is capable of substantially outperforming the R-tree based TPR-tree for both single and concurrent access scenarios.
Declustering Using Fractals
- In Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems
, 1993
Cited by 93 (1 self)
We propose a method to achieve declustering for cartesian product files on M units. The focus is on range queries, as opposed to partial match queries that older declustering methods have examined. Our method uses a distance-preserving mapping, namely, the Hilbert curve, to impose a linear ordering on the multidimensional points (buckets); then, it traverses the buckets according to this ordering, assigning buckets to disks in a round-robin fashion. Thanks to the good distance-preserving properties of the Hilbert curve, the end result is that each disk contains buckets that are far away in the linear ordering, and, most probably, far away in the k-d address space. This is exactly the goal of declustering. Experiments show that these intuitive arguments indeed lead to good performance: the proposed method performs at least as well as, or better than, older declustering schemes. Categories and Subject Descriptors: E.1 [Data Structures]; E.5 [Files]; H.2.2 [Data Base Management]: Physical Des...
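The assignment rule described above (order the buckets along the Hilbert curve, then deal them out round-robin) can be sketched for a 2-D grid of buckets. The names `hilbert_index` and `decluster` are assumptions of this sketch, the grid side n must be a power of two, and the paper's treatment of general cartesian product files is not reproduced here.

```python
from typing import Dict, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decluster(n: int, m: int) -> Dict[Tuple[int, int], int]:
    """Assign every bucket of an n x n grid to one of m disks, round-robin in Hilbert order."""
    buckets = sorted(((x, y) for x in range(n) for y in range(n)),
                     key=lambda b: hilbert_index(n, b[0], b[1]))
    return {bucket: rank % m for rank, bucket in enumerate(buckets)}


disks = decluster(n=4, m=4)
# Disks touched by a small 2 x 2 range query in the lower-left corner.
touched = {disks[(x, y)] for x in range(2) for y in range(2)}
```

In this small case the four buckets of the query land on four different disks, so they could be fetched in parallel.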
Parallel R-trees
, 1992
Cited by 81 (1 self)
We consider the problem of exploiting parallelism to accelerate the performance of spatial access methods and, specifically, R-trees [11]. Our goal is to design a server for spatial data so as to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) engaging as few disks as possible on point queries [22]. We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditional R-tree, with cross-disk pointers ('Multiplexed' R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide which disk a newly created R-tree node should be stored in. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed 'proximity index' or PI, estimates the similarity of the new node with the other R-tree nodes already o...