External Memory Algorithms and Data Structures
1998
Cited by 286 (24 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
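PDM measures cost primarily in I/O operations. As a rough illustration of that kind of accounting (our own choice of example, not taken from the abstract), the sketch below estimates the block transfers of a textbook external multiway merge sort on a single disk (D = 1): memory-sized runs are sorted internally, then merged M/B - 1 at a time until one run remains. The function name and the simplified cost model, which ignores CPU time, disk space and parallel disks, are assumptions for illustration. The number of merge passes grows like log_{M/B}(N/B), the shape of the well-known PDM sorting bound Θ((N/(DB)) log_{M/B}(N/B)).

```python
import math


def merge_sort_ios(n, m, b):
    """Estimate (merge passes, total block I/Os) for an external multiway merge sort.

    Hypothetical cost model: single disk (D = 1); run formation and every merge
    pass read and write each block exactly once.
    n: number of records, m: records that fit in memory, b: records per block.
    """
    if m < 3 * b:
        raise ValueError("memory must hold at least two input buffers and one output buffer")
    blocks = math.ceil(n / b)
    runs = math.ceil(n / m)        # run formation: sort memory-sized chunks internally
    fan_in = m // b - 1            # one block buffer per input run, one for the output
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / fan_in)  # each pass merges up to fan_in runs into one
        passes += 1
    return passes, 2 * blocks * (1 + passes)
```

For example, `merge_sort_ios(10**6, 10**4, 100)` forms 100 initial runs, needs two merge passes with a fan-in of 99, and performs 60,000 block I/Os in total.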
External Memory Data Structures
2001
Cited by 78 (34 self)
In many massive dataset applications, the data must be stored in space- and query-efficient data structures on external storage devices. Often the data need to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Discovering Spatial Co-location Patterns: A Summary of Results
Lecture Notes in Computer Science, 2001
Cited by 44 (5 self)
Given a collection of Boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though Boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This makes it difficult to use traditional measures (e.g., support and confidence) and to apply association rule mining algorithms that use support-based pruning. We propose a notion of user-specified neighborhoods in place of transactions to specify groups of items. We also propose new interest measures for spatial co-location patterns that are robust in the face of potentially infinite overlapping neighborhoods. Finally, we propose an algorithm to mine frequent spatial co-location patterns and analyze its correctness and completeness. We plan to carry out experimental evaluations and performance tuning in the near future.
Efficient Bulk Operations on Dynamic R-Trees
Algorithmica, 2002
Cited by 37 (10 self)
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to efficiently manipulate massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it important to update existing indexes (bulk updating) efficiently. In this paper we present a simple yet efficient technique for performing bulk update and query operations on multidimensional indexes. We present our technique in terms of the so-called R-tree and its variants, as they have emerged as practically efficient indexing methods for spatial data. Our method uses ideas from the lazy buffering technique of the buffer tree and fully utilizes the available internal memory and the page size of the operating system. We give a theoretical analysis of our technique, showing that it is efficient in terms of I/O communication, disk storage, and internal computation time. We also present the results of an extensive set of experiments showing that in practice our approach performs better than the previously best known bulk update methods with respect to update time, and that it produces a better quality index in terms of query performance. One important novel feature of our technique is that in most cases it allows us to perform a batch of updates and queries simultaneously. This ability is essential in environments where queries have to be answered even while the index is being updated and reorganized.
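The lazy buffering idea borrowed from the buffer tree can be sketched in one dimension: instead of pushing each update down the index immediately, updates collect in a buffer and are distributed to the children in one batch when the buffer fills. The hypothetical `BufferedNode` below has a single level, plain lists as children, and 1-D keys; it is not an R-tree and not the paper's algorithm, only an illustration of why batching reduces the number of child accesses.

```python
import bisect


class BufferedNode:
    """Toy one-level node with lazy buffering (hypothetical; 1-D keys, not an R-tree)."""

    def __init__(self, pivots, capacity):
        self.pivots = sorted(pivots)  # child i holds keys in [pivots[i-1], pivots[i])
        self.children = [[] for _ in range(len(self.pivots) + 1)]
        self.buffer = []              # updates waiting to be pushed down
        self.capacity = capacity
        self.child_writes = 0         # number of batched writes to children

    def insert(self, key):
        self.buffer.append(key)       # cheap: no child is touched yet
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Group the buffered keys by destination child, then write each group once.
        batches = {}
        for key in self.buffer:
            i = bisect.bisect_right(self.pivots, key)
            batches.setdefault(i, []).append(key)
        for i, keys in batches.items():
            self.children[i].extend(keys)
            self.child_writes += 1
        self.buffer.clear()
```

With pivots [10, 20] and a buffer capacity of 4, inserting the eight keys 5, 15, 25, 7, 12, 30, 1, 18 triggers two flushes and six batched child writes instead of eight individual ones.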
Discovering colocation patterns from spatial data sets: a general approach
IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 30 (11 self)
Given a collection of Boolean spatial features, the colocation pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology data set may reveal symbiotic species. The spatial colocation rule problem is different from the association rule problem since there is no natural notion of transactions in spatial data sets which are embedded in continuous geographic space. In this paper, we provide a transaction-free approach to mine colocation patterns by using the concept of proximity neighborhood. A new interest measure, a participation index, is also proposed for spatial colocation patterns. The participation index is used as the measure of prevalence of a colocation for two reasons. First, this measure is closely related to the cross-K function, which is often used as a statistical measure of interaction among pairs of spatial features. Second, it also possesses an antimonotone property which can be exploited for computational efficiency. Furthermore, we design an algorithm to discover colocation patterns. This algorithm includes a novel multiresolution pruning technique. Finally, experimental results are provided to show the strength of the algorithm and design decisions related to performance tuning.
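The participation index described above can be computed directly for a size-2 colocation {fa, fb}. The sketch assumes the usual formulation: the participation ratio of a feature is the fraction of its instances that appear in at least one neighboring pair (row instance), and the participation index is the minimum ratio over the pattern's features. The neighbor relation (Euclidean distance at most d), the input format, and the function name are assumptions for illustration; the paper's multiresolution pruning is not shown.

```python
import math
from itertools import product


def participation_index(instances, fa, fb, d):
    """Participation index of the size-2 colocation {fa, fb}.

    instances: dict mapping feature name -> non-empty list of (x, y) points
    (hypothetical input format). Two instances are neighbors if their
    Euclidean distance is at most d.
    """
    a_pts, b_pts = instances[fa], instances[fb]
    a_part, b_part = set(), set()
    for (i, p), (j, q) in product(enumerate(a_pts), enumerate(b_pts)):
        if math.dist(p, q) <= d:  # (p, q) is a row instance of {fa, fb}
            a_part.add(i)
            b_part.add(j)
    pr_a = len(a_part) / len(a_pts)  # participation ratio of fa
    pr_b = len(b_part) / len(b_pts)  # participation ratio of fb
    return min(pr_a, pr_b)
```

For instance, with two fire instances, one of which has two needle-vegetation neighbors within distance 2, and three needle instances, the ratios are 1/2 and 2/3, so the index is 0.5. Because every row instance of a larger pattern contains a row instance of each of its subpatterns, adding a feature can never raise the index; this is the antimonotone property that makes Apriori-style pruning possible.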
Multiway Spatial Joins
ACM Transactions on Database Systems (TODS), 2001
Cited by 28 (6 self)
Due to the evolution of Geographical Information Systems, large collections of spatial data with various thematic contents are now available. As a result, users' interest is no longer limited to simple spatial selections and joins; complex query types that involve numerous spatial inputs are becoming more common. Although several algorithms have been proposed for computing pairwise spatial joins, limited work exists on processing and optimizing multiway spatial joins. In this article, we review pairwise spatial join algorithms and show how they can be combined for multiple inputs. In addition, we explore the application of synchronous traversal (ST), a methodology that processes all inputs synchronously without producing intermediate results. Then, we integrate the two approaches in an engine that includes ST and pairwise algorithms, using dynamic programming to determine the optimal execution plan. The results show that, in most cases, multiway spatial joins are best processed by combining ST with pairwise methods. Finally, we study the optimization of very large queries by employing randomized search algorithms.
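As an illustration of combining pairwise joins for multiple inputs, the sketch below uses one classic pairwise method, a sort-based plane sweep over minimum bounding rectangles (our choice, not necessarily one the article reviews), and then evaluates a three-way chain query A-B-C from two pairwise joins. Synchronous traversal is not shown, and the function names are hypothetical.

```python
def sweep_join(rs, ss):
    """Pairwise rectangle-intersection join by a plane sweep along x.

    rs, ss: lists of (xmin, ymin, xmax, ymax). Returns index pairs (i, j) such
    that rs[i] and ss[j] intersect (closed boundaries), each pair exactly once.
    """
    r_ord = sorted(range(len(rs)), key=lambda i: rs[i][0])
    s_ord = sorted(range(len(ss)), key=lambda j: ss[j][0])
    out, ri, si = [], 0, 0
    while ri < len(r_ord) and si < len(s_ord):
        i, j = r_ord[ri], s_ord[si]
        if rs[i][0] <= ss[j][0]:
            # rs[i] starts first: scan S rectangles that start before rs[i] ends in x
            k = si
            while k < len(s_ord) and ss[s_ord[k]][0] <= rs[i][2]:
                jj = s_ord[k]
                if ss[jj][1] <= rs[i][3] and rs[i][1] <= ss[jj][3]:  # y-overlap
                    out.append((i, jj))
                k += 1
            ri += 1
        else:
            # ss[j] starts first: symmetric scan over R
            k = ri
            while k < len(r_ord) and rs[r_ord[k]][0] <= ss[j][2]:
                ii = r_ord[k]
                if rs[ii][1] <= ss[j][3] and ss[j][1] <= rs[ii][3]:
                    out.append((ii, j))
                k += 1
            si += 1
    return out


def chain_join(a, b, c):
    """Three-way chain query A-B-C evaluated as two pairwise joins (A-B, then B-C)."""
    ab = sweep_join(a, b)
    bc = {}
    for j, k in sweep_join(b, c):  # intermediate result, materialized in memory
        bc.setdefault(j, []).append(k)
    return [(i, j, k) for i, j in ab for k in bc.get(j, [])]
```

In `chain_join`, the pairwise B-C result is materialized before it is combined with the A-B result; avoiding such intermediate results is what synchronous traversal is for.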
PBiTree coding and efficient processing of containment joins
2003
Cited by 22 (5 self)
This paper addresses issues related to containment join processing in tree-structured data such as XML documents. A containment join takes two sets of XML node elements as input and returns pairs of elements such that the containment relationship holds between them. While there are previous algorithms for processing containment joins, they require both element sets to be either sorted or indexed. This paper proposes a novel and complete containment query processing framework based on a new coding scheme, the PBiTree code. The PBiTree code allows us to efficiently determine the ancestor-descendant relationship between two elements from their PBiTree-based codes. We present algorithms in the framework that are optimized for various combinations of settings. In particular, the newly proposed partitioning-based algorithms can process containment joins efficiently without sorting or indexes. Experimental results indicate that the containment join processing algorithms based on the proposed coding scheme significantly outperform existing algorithms.
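The abstract does not spell out the code itself. One way such a code can work, and our assumption here, is to number the nodes of a perfect binary tree in in-order (1 to 2^H - 1): the number of trailing zero bits of a code is the node's height, and a node's ancestor at any height can be computed with bit operations, so an ancestor-descendant test needs no tree traversal. The sketch assumes the XML tree has already been embedded in such a perfect binary tree (not shown), and its hash-based join is our own illustration of joining without sorting or indexes, not the paper's partitioning algorithms.

```python
def height(code):
    """Height of a node in an in-order-numbered perfect binary tree (leaves have height 0)."""
    return (code & -code).bit_length() - 1  # trailing zero bits; code must be > 0


def ancestor_at(code, h):
    """Code of the ancestor at height h of the node with this code (assumes h >= height(code))."""
    return ((code >> (h + 1)) << (h + 1)) | (1 << h)


def is_ancestor(a, d):
    """True if node a is a proper ancestor of node d."""
    return height(a) > height(d) and ancestor_at(d, height(a)) == a


def containment_join(ancestors, descendants):
    """Hash-based containment join: neither input needs to be sorted or indexed."""
    a_set = set(ancestors)
    heights = sorted({height(a) for a in a_set})
    out = []
    for d in descendants:
        hd = height(d)
        for h in heights:
            if h > hd:
                cand = ancestor_at(d, h)  # the only possible ancestor of d at height h
                if cand in a_set:
                    out.append((cand, d))
    return out
```

For H = 3 (codes 1 to 7, root 4), `is_ancestor(6, 5)` is true while `is_ancestor(2, 5)` is false, since node 5 lies in the subtree of 6, not of 2.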
Scalable spatio-temporal continuous query processing for location-aware services
In 16th International Conf. on Scientific and Statistical Database Management (SSDBM’04), IEEE Computer Society, 2004
Cited by 21 (4 self)
Real-time spatio-temporal query processing needs to effectively handle a large number of moving objects and continuous spatio-temporal queries. In this paper, we use shared execution as a mechanism to support scalability in location-aware servers. Our main idea is to maintain a query table that stores information about continuous spatio-temporal queries. Answering spatio-temporal queries is then abstracted as a spatial join between the moving objects and the queries. We propose three query join policies that aim to minimize the cost of the join operation under the shared execution paradigm: the Clock-triggered Join Policy, the Incremental Join Policy, and the Hot Join Policy. We introduce the concept of a No-Action Region, which is used in conjunction with the Hot Join Policy, and propose algorithms that calculate the No-Action Region for objects and queries. Experimental results demonstrate that the No-Action Region is more efficient than other approaches when used along with the Hot Join Policy. Experiments also demonstrate that the Hot Join Policy outperforms the Clock-triggered Join Policy and the Incremental Join Policy in terms of both I/O and CPU costs.
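The shared-execution idea can be sketched under simplifying assumptions of our own: all continuous range queries live in one hypothetical query table, and each location update triggers an incremental join of just that object against the table, reporting which query answers gained or lost it. The sketch scans the queries linearly instead of using a spatial index and does not model the Clock-triggered or Hot Join policies or the No-Action Region; names such as `QueryTable` and `RangeQuery` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class RangeQuery:
    """Hypothetical continuous rectangular range query with its current answer set."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    answer: set = field(default_factory=set)  # ids of objects currently inside

    def contains(self, p):
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


class QueryTable:
    """All continuous queries share one join against the stream of object updates."""

    def __init__(self, queries):
        self.queries = queries
        self.positions = {}  # last known position of each object

    def update(self, oid, pos):
        """Join only the moving object against the table; return (query index, '+'/'-', oid) changes."""
        changes = []
        old = self.positions.get(oid)
        for qi, q in enumerate(self.queries):
            was_in = old is not None and q.contains(old)
            now_in = q.contains(pos)
            if was_in and not now_in:
                q.answer.discard(oid)
                changes.append((qi, "-", oid))
            elif now_in and not was_in:
                q.answer.add(oid)
                changes.append((qi, "+", oid))
        self.positions[oid] = pos
        return changes
```

Only answer changes are reported, so an object that moves within the same set of queries produces no output.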
Efficient Temporal Join Processing Using Indices
Proc. of ICDE, 2002
Cited by 20 (2 self)
We examine the problem of processing temporal joins in the presence of indexing schemes. Previous work on temporal joins has concentrated on non-indexed relations, which were fully scanned. Given the large data volumes created by the ever-increasing time dimension, sequential scanning is prohibitive. This is especially true when the temporal join involves only parts of the joining relations (e.g., a given time interval instead of the whole timeline). Utilizing an index then becomes beneficial, as it directs the join to the data of interest. We consider temporal join algorithms for three representative indexing schemes, namely a B+-tree, an R*-tree, and a temporal index, the Multiversion B+-tree (MVBT). Both the B+-tree and the R*-tree result in simple but inefficient join algorithms, because neither index achieves good temporal data clustering. The MVBT maintains better clustering through record copying. Nevertheless, copies can greatly affect the correctness and effectiveness of the join algorithms. We identify these problems and propose efficient solutions and optimizations. An extensive comparison of all index-based temporal joins, using a variety of datasets and query characteristics, shows that the MVBT-based join algorithms are consistently faster. In particular, the link-based algorithm has the most robust behavior. In our experiments, it showed a tenfold improvement over the R*-tree joins and was between six and thirty times faster than the B+-tree joins.
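To make the join predicate concrete, the sketch below assumes that a temporal join pairs tuples whose half-open validity intervals [start, end) overlap, with no extra equality condition on other attributes, optionally restricted to a query time window. It is deliberately the naive nested-loop baseline that scans both relations in full, which is the cost the index-based algorithms above aim to avoid; the function name and tuple layout are hypothetical.

```python
def temporal_join(r, s, window=None):
    """Naive temporal join on interval overlap (hypothetical baseline, full scans).

    r, s: lists of (key, start, end) with half-open validity intervals [start, end).
    window: optional (ws, we) restricting the join to part of the timeline.
    Returns (key_r, key_s, overlap_start, overlap_end) for every overlapping pair.
    """
    out = []
    for ka, sa, ea in r:
        for kb, sb, eb in s:
            lo, hi = max(sa, sb), min(ea, eb)  # intersection of the two intervals
            if window is not None:
                lo, hi = max(lo, window[0]), min(hi, window[1])
            if lo < hi:  # non-empty overlap
                out.append((ka, kb, lo, hi))
    return out
```

With the window (0, 15), a pair whose overlap lies entirely in [20, 25) is dropped even though the full timeline would report it; an index lets a real algorithm skip such tuples without reading them at all.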
Adaptive Multi-Stage Distance Join Processing
In SIGMOD, 1999
Cited by 20 (1 self)
A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in online query processing or Internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spatial distance join queries. In this paper, we first present an efficient k-distance join algorithm that uses spatial indexes such as R-trees. Bi-directional node expansion and plane-sweeping techniques are used for fast pruning of distant pairs, and the plane sweeping is further optimized by novel strategies for selecting a sweeping axis and direction. Furthermore, we propose adaptive multi-stage algorithms for k-distance join and incremental distance join operations. Our performance study shows that the proposed adaptive multi-stage algorithms outperform previous work by up to an order of magnitude ...
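The pruning idea behind plane sweeping in a k-distance join can be shown on plain point lists. The sketch below is a simplification of our own: it uses no R-tree, no bi-directional node expansion, and a fixed sweep axis (x), so it illustrates only how the current k-th best distance cuts the sweep short. The function name is hypothetical.

```python
import bisect
import heapq
import math


def k_closest_pairs(a, b, k):
    """Return the k closest pairs (distance, i, j), sorted by distance, with a[i] in A and b[j] in B.

    Points are (x, y) tuples. B is sorted by x; for each point of A the sweep moves
    right and then left from its x-position and stops once the x-gap alone exceeds
    the current k-th best distance.
    """
    if k <= 0:
        return []
    order = sorted(range(len(b)), key=lambda j: b[j][0])
    xs = [b[j][0] for j in order]
    heap = []  # max-heap of the best k pairs, stored as (-distance, i, j)

    def kth():
        return -heap[0][0] if len(heap) == k else math.inf

    for i, p in enumerate(a):
        start = bisect.bisect_left(xs, p[0])
        for positions in (range(start, len(xs)), range(start - 1, -1, -1)):
            for pos in positions:
                if abs(xs[pos] - p[0]) > kth():
                    break  # points further along are even further away in x
                j = order[pos]
                d = math.dist(p, b[j])
                if d < kth():
                    heapq.heappush(heap, (-d, i, j))
                    if len(heap) > k:
                        heapq.heappop(heap)  # drop the current farthest pair
    return sorted((-neg_d, i, j) for neg_d, i, j in heap)
```

The pruning test is safe because the x-gap is a lower bound on the Euclidean distance and the k-th best distance can only shrink as the sweep proceeds.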

