External Memory Data Structures
, 2001
Cited by 78 (34 self)
In many massive-dataset applications, the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Efficient Computation of Temporal Aggregates with Range Predicates
, 2001
Cited by 39 (8 self)
A temporal aggregation query is an important but costly operation for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). Due to the large volume of such data, performance improvements for temporal aggregation queries are critical. In this paper we examine techniques to compute temporal aggregates that include key-range predicates (range temporal aggregates). In particular we concentrate on SUM, COUNT and AVG aggregates. This problem is novel; to handle arbitrary key ranges, previous methods would need to keep a separate index for every possible key range. We propose an approach based on a new index structure called the Multiversion SB-Tree, which incorporates features from both the SB-Tree and the Multiversion B-Tree, to handle arbitrary key-range temporal SUM, COUNT and AVG queries. We analyze the performance of our approach and present experimental results that show its efficiency.
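To make the query type concrete, the following is a minimal brute-force sketch of a range temporal aggregate. It is not the Multiversion SB-Tree described in the abstract; it only shows what such a query returns. The `Record` layout, the half-open interval convention, and the function name are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Record(NamedTuple):
    """Hypothetical temporal record: valid during [start, end)."""
    key: int
    start: int    # inclusive
    end: int      # exclusive
    value: float


def range_temporal_aggregate(
    records: Sequence[Record],
    key_lo: int, key_hi: int,
    t_lo: int, t_hi: int,
) -> Tuple[float, int, Optional[float]]:
    """Return (SUM, COUNT, AVG) over records whose key lies in
    [key_lo, key_hi] and whose interval overlaps [t_lo, t_hi).

    This scans every record, which is exactly the cost a
    specialized index is meant to avoid.
    """
    total = 0.0
    count = 0
    for r in records:
        in_key_range = key_lo <= r.key <= key_hi
        overlaps = r.start < t_hi and t_lo < r.end
        if in_key_range and overlaps:
            total += r.value
            count += 1
    avg = total / count if count else None
    return total, count, avg
```

The scan touches all records regardless of the key range, which illustrates why an index that answers arbitrary key ranges directly is attractive.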
Implementing I/O-Efficient Data Structures Using TPIE
- In Proc. European Symposium on Algorithms
, 2002
Cited by 28 (6 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy to use C++ programming environment for efficiently implementing I/O-algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting on-line I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
Efficient Temporal Join Processing Using Indices
- Proc. of ICDE
, 2002
Cited by 20 (2 self)
We examine the problem of processing temporal joins in the presence of indexing schemes. Previous work on temporal joins has concentrated on non-indexed relations which were fully scanned. Given the large data volumes created by the ever-increasing time dimension, sequential scanning is prohibitive. This is especially true when the temporal join involves only parts of the joining relations (e.g., a given time interval instead of the whole timeline). Utilizing an index then becomes beneficial, as it directs the join to the data of interest. We consider temporal join algorithms for three representative indexing schemes, namely a B+-tree, an R*-tree and a temporal index, the Multiversion B+-tree (MVBT). Both the B+-tree and R*-tree result in simple but inefficient join algorithms because neither index achieves good temporal data clustering. Better clustering is maintained by the MVBT through record copying. Nevertheless, copies can greatly affect the correctness and effectiveness of the join algorithms. We identify these problems and propose efficient solutions and optimizations. An extensive comparison of all index-based temporal joins, using a variety of datasets and query characteristics, shows that the MVBT-based join algorithms are consistently faster. In particular, the link-based algorithm has the most robust behavior. In our experiments it showed a ten-fold improvement over the R*-tree joins, while it was between six and thirty times faster than the B+-tree joins.
Joining interval data in relational databases
- In Proceedings of the ACM SIGMOD Conference
, 2004
Cited by 11 (0 self)
The increasing use of temporal and spatial data in present-day relational systems necessitates efficient support for joins on interval-valued attributes. Standard join algorithms do not support those data types adequately, whereas special approaches for interval joins usually require an augmentation of the internal access methods, which is not supported by existing relational systems. To overcome these problems we introduce new join algorithms for interval data. Based on the Relational Interval Tree, these algorithms can easily be implemented on top of any relational database system while providing excellent performance on joining intervals. As experimental results on an Oracle9i server show, the new techniques significantly outperform existing relational methods for joining intervals.
Spatial Join Techniques
Cited by 9 (3 self)
A variety of techniques for performing a spatial join are reviewed. Instead of just summarizing the literature and presenting each technique in its entirety, distinct components of the different techniques are described and each is decomposed into an overall framework for performing a spatial join. A typical spatial join technique consists of the following components: partitioning the data, performing internal-memory spatial joins on subsets of the data, and checking if the full polygons intersect. Each technique is decomposed into these components and each component addressed in a separate section so as to compare and contrast similar aspects of each technique. The goal of this survey is to describe the algorithms within each component in detail, comparing and contrasting competing methods, thereby enabling further analysis and experimentation with each component and allowing the best algorithms for a particular situation to be built piecemeal, or, even better, enabling an optimizer to choose which algorithms to use.
Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query processing; H.2.8 [Database Management]: Database Applications—Spatial databases and GIS
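The three components named in the abstract (partitioning, an internal-memory join on subsets, and a final check on the full geometry) can be sketched as follows. This is one possible arrangement, assuming a uniform grid for partitioning and a nested-loop join on bounding boxes inside each cell; the survey covers many alternatives. The function names, the `cell` size, and the pluggable `refine` predicate are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def mbr_intersects(a, b):
    """Boxes are (xmin, ymin, xmax, ymax); touching boxes count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_for(box, cell):
    """Yield every grid cell (ix, iy) that the box overlaps."""
    xmin, ymin, xmax, ymax = box
    xs = range(int(xmin // cell), int(xmax // cell) + 1)
    ys = range(int(ymin // cell), int(ymax // cell) + 1)
    return product(xs, ys)


def spatial_join(r_boxes, s_boxes, cell=10.0, refine=lambda i, j: True):
    """Return sorted (i, j) index pairs whose objects pass both steps.

    refine(i, j) stands in for the exact polygon-intersection test;
    the default accepts every candidate, so only bounding boxes are used.
    """
    # Component 1: partition both inputs over a uniform grid.
    grid = defaultdict(lambda: ([], []))
    for i, box in enumerate(r_boxes):
        for c in cells_for(box, cell):
            grid[c][0].append(i)
    for j, box in enumerate(s_boxes):
        for c in cells_for(box, cell):
            grid[c][1].append(j)

    # Component 2: in-memory join per cell on bounding boxes.
    # A set removes duplicates from objects that span several cells.
    candidates = set()
    for r_ids, s_ids in grid.values():
        for i in r_ids:
            for j in s_ids:
                if mbr_intersects(r_boxes[i], s_boxes[j]):
                    candidates.add((i, j))

    # Component 3: check the full geometry of each candidate pair.
    return sorted(pair for pair in candidates if refine(*pair))
```

Keeping `refine` as a parameter mirrors the decomposition in the abstract: each component can be swapped out independently of the other two.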
Maintenance of k-nn and spatial join queries on continuously moving points
- TODS
, 2006
Cited by 8 (0 self)
Cars, aircraft, mobile cell phones, ships, tanks, and mobile robots all have the common property that they are moving objects. A kinematic representation can be used to describe the location of these objects as a function of time. For example, a moving point can be represented by the function $p(t) = \vec{x}_0 + (t - t_0)\vec{v}$, where $\vec{x}_0$ is the start location, $t_0$ is the start time, and $\vec{v}$ is its velocity vector. Instead of storing the location of the object at a given time in a database, the coefficients of the function are stored. When an object’s behavior changes enough so that the function describing its location is no longer accurate, the function coefficients for the object are updated. Because the location of each object is represented as a function of time, spatial query results can change even when no transactions update the database. We present efficient algorithms to maintain k-nearest neighbor and spatial join queries in this domain as time advances and updates occur. We assume no previous knowledge of what the updates will be before they occur. We experimentally compare these new algorithms with more straightforward adaptations of previous work to support updates. Experiments are conducted using synthetic uniformly distributed data and real aircraft flight data. The primary metric of comparison is the number of I/O disk accesses needed to maintain the query results and the supporting data
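The kinematic representation in this abstract can be illustrated with a small sketch that stores only the coefficients (start location, start time, velocity) and evaluates the position on demand. The class name and the `update` policy, which re-anchors the start location at the update time, are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MovingPoint:
    """Stores the coefficients of p(t) = x0 + (t - t0) * v."""
    x0: Tuple[float, ...]   # start location
    t0: float               # start time
    v: Tuple[float, ...]    # velocity vector

    def position(self, t: float) -> Tuple[float, ...]:
        """Evaluate p(t) component-wise."""
        dt = t - self.t0
        return tuple(x + dt * vi for x, vi in zip(self.x0, self.v))

    def update(self, t: float, new_v: Tuple[float, ...]) -> None:
        """Assumed update policy: restart the function at the current
        position and time, then apply the new velocity."""
        self.x0 = self.position(t)
        self.t0 = t
        self.v = new_v
```

Because `position` depends on `t`, the answer to a query such as "nearest neighbors of q" can change between two evaluations even when no `update` call happens in between, which is the effect the abstract points out.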
Aggregation Computation over Complex Objects
, 2002
Cited by 4 (0 self)
The aggregation query is an important but costly operation in database management systems. While aggregation in relational databases has been well studied, recently there has been a growing interest in improving the performance of computing aggregates over complex objects. Each such object may have a time interval, a spatial location/region, or both, as appears in temporal, spatial, and spatio-temporal databases. An aggregation query over these objects typically involves some selection condition on their time and/or spatial attributes, e.g. to aggregate over temporal records whose time intervals intersect a given time interval. A straightforward approach is to locate, with the aid of some index structures, the objects that satisfy the selection condition and aggregate their values on the fly. Such indices are general in the sense that they can be utilized not only to compute aggregates, but also to perform selection queries. However, the aggregation query performance is proportional to the number of objects satisfying the selection condition. In the worst case, to compute an aggregate, all objects in a database need to be examined. Nevertheless, in many applications (on-line analysis, etc.), we need to compute the aggregates very fast, and scanning through all the existing objects might be too time-consuming. In this thesis, we focus on devising specialized indices for aggregation over complex objects. Our research shows that our newly designed structures have much better query performance than the existing general-index-based solutions, sometimes over a hundred times faster. In...
An Experimental Performance Evaluation of Spatio-Temporal Join Strategies
, 2004
Cited by 2 (1 self)
Many applications capture, or make use of, spatial data that changes over time. This requirement for effective and efficient spatio-temporal data management has given rise to a range of research activities relating to spatio-temporal data management. Such work has sought to understand, for example, the requirements of different categories of application, and the modelling facilities that are most effective for these applications. However, at present, there are few systems with fully integrated support for spatio-temporal data, and thus developers must often construct custom solutions for their applications.
Index Based Processing of Semi-Restrictive Temporal Joins
- TIME
Cited by 1 (0 self)
Temporal joins are important but very costly operations. While a temporal join can involve the whole time (and/or key) domain, we consider the more general case where the join is defined by some time-key rectangle from the whole space (i.e., when the user is interested in joining portions of the, usually large, temporal data). In the most restrictive join, objects (within this rectangle) are joined together based on key equality and interval intersection. This paper concentrates on semi-restrictive joins, i.e., when either the key equality (equi-join) or the interval intersection (time-join) predicate is used. Given the large relations created by the ever-increasing time dimension, we assume that each temporal relation is indexed and examine efficient ways to process semi-restrictive temporal joins. Utilizing an index is helpful since it directs the join towards the objects that are within the time-key rectangle. A straightforward approach is to perform an unsynchronized join: an index selection query on each relation identifies all objects within the time-key rectangle, which are then joined. Although simple, this approach ignores the data distribution in the other relation. Instead, in a synchronized join, both indices are concurrently traversed as the join is computed. Synchronized semi-restrictive join algorithms can be performed utilizing traditional indices like B+-trees or R-trees. The drawback of this approach is that traditional indices do not achieve good temporal data clustering. Better clustering is achieved by temporal indices through record copying. Nevertheless, record copies can greatly affect the correctness and effectiveness of join performance. In this paper we introduce correct and efficient algorithms for performing semi-restrictive temporal joins using temporal indices. An extensive experimental comparison shows that the newly proposed algorithms have the best performance. While the paper concentrates on using the Multiversion B+-tree, our algorithms apply equally to other efficient tree-based temporal indices.
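The unsynchronized approach described in this abstract can be sketched as two independent selections followed by a join. In this illustration a linear scan stands in for the index selection query, intervals are assumed to be half-open, and all names are hypothetical; it shows the baseline the abstract describes, not the paper's proposed algorithms.

```python
from collections import defaultdict
from typing import List, NamedTuple, Sequence, Tuple


class TemporalRecord(NamedTuple):
    """Hypothetical temporal tuple, valid during [start, end)."""
    key: int
    start: int   # inclusive
    end: int     # exclusive


Pair = Tuple[TemporalRecord, TemporalRecord]


def select(relation: Sequence[TemporalRecord],
           key_lo: int, key_hi: int, t_lo: int, t_hi: int) -> List[TemporalRecord]:
    """Stand-in for an index selection on the time-key rectangle."""
    return [r for r in relation
            if key_lo <= r.key <= key_hi and r.start < t_hi and t_lo < r.end]


def unsynchronized_equi_join(rel_a, rel_b, key_lo, key_hi, t_lo, t_hi) -> List[Pair]:
    """Semi-restrictive join on key equality only."""
    a = select(rel_a, key_lo, key_hi, t_lo, t_hi)
    b = select(rel_b, key_lo, key_hi, t_lo, t_hi)
    by_key = defaultdict(list)
    for r in b:
        by_key[r.key].append(r)
    return [(x, y) for x in a for y in by_key.get(x.key, [])]


def unsynchronized_time_join(rel_a, rel_b, key_lo, key_hi, t_lo, t_hi) -> List[Pair]:
    """Semi-restrictive join on interval intersection only."""
    a = select(rel_a, key_lo, key_hi, t_lo, t_hi)
    b = select(rel_b, key_lo, key_hi, t_lo, t_hi)
    return [(x, y) for x in a for y in b
            if x.start < y.end and y.start < x.end]
```

Each selection runs without any knowledge of the other relation, which is the weakness the abstract attributes to the unsynchronized approach.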

