Multidimensional Access Methods
, 1998
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
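As an illustration of the two query types named in this abstract, the following minimal sketch evaluates a point query and a region query by linear scan over axis-aligned rectangles. The `Rect` type, the function names, and the sample data are assumptions made for this example; an access method would replace the scan with an index lookup.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical axis-aligned rectangle with inclusive bounds."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains_point(r, x, y):
    # True if (x, y) lies inside r or on its boundary.
    return r.xmin <= x <= r.xmax and r.ymin <= y <= r.ymax


def overlaps(a, b):
    # Two rectangles overlap if their extents intersect on both axes.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def point_query(objects, x, y):
    # Find all objects that contain the search point.
    return [name for name, r in objects.items() if contains_point(r, x, y)]


def region_query(objects, region):
    # Find all objects that overlap the search region.
    return [name for name, r in objects.items() if overlaps(r, region)]


objects = {
    "a": Rect(0, 0, 2, 2),
    "b": Rect(1, 1, 4, 3),
    "c": Rect(5, 5, 6, 6),
}
```

With this data, `point_query(objects, 1.5, 1.5)` selects the two rectangles covering that point, and `region_query(objects, Rect(3, 2, 5.5, 5.5))` selects those that intersect the search region.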
Indexing the Positions of Continuously Moving Objects
, 2000
The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree-based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three-dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. A comprehensive performance study is reported.
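To make "projected future positions" concrete, here is a minimal sketch that assumes each object moves linearly from a reference position with constant velocity. The linear-motion model, the `MovingPoint` class, and `timeslice_query` are assumptions for illustration, and the scan stands in for the index the abstract describes.

```python
from dataclasses import dataclass


@dataclass
class MovingPoint:
    """Hypothetical moving object: position at ref_time plus a constant velocity."""
    ref_time: float
    pos: tuple
    vel: tuple

    def position_at(self, t):
        # Extrapolate linearly from the reference position.
        dt = t - self.ref_time
        return tuple(p + v * dt for p, v in zip(self.pos, self.vel))


def timeslice_query(objects, lo, hi, t):
    # Return the objects whose projected position at time t lies in the box [lo, hi].
    hits = []
    for name, obj in objects.items():
        p = obj.position_at(t)
        if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
            hits.append(name)
    return hits


fleet = {
    "car": MovingPoint(0.0, (0.0, 0.0), (1.0, 0.5)),
    "bus": MovingPoint(0.0, (10.0, 10.0), (-1.0, 0.0)),
}
```

The same code works in one, two, or three dimensions because positions and velocities are plain tuples; an update would simply replace an object's `ref_time`, `pos`, and `vel`.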
Generalized Search Trees for Database Systems
 In Proc. 21st International Conference on VLDB
, 1995
This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to discussion on the nature of tree indices and how they behave for various datasets.
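The idea of one search routine parameterized by key-specific methods can be sketched as follows. This is an illustrative toy, not the paper's code: the `IntervalKey` class, the `Node` layout, and the two-level tree are assumptions, and only a consistency test and a union method are shown.

```python
class IntervalKey:
    """Hypothetical key type: a closed interval [lo, hi] over ordered values."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def consistent(self, query):
        # True if the subtree under this key may hold matches for the query.
        return self.lo <= query.hi and query.lo <= self.hi

    @staticmethod
    def union(keys):
        # Smallest interval covering all given keys.
        return IntervalKey(min(k.lo for k in keys), max(k.hi for k in keys))


class Node:
    def __init__(self, entries, leaf):
        self.entries = entries  # list of (key, child node or stored value)
        self.leaf = leaf


def search(node, query):
    # Generic descent: the tree logic only ever calls key.consistent().
    results = []
    for key, item in node.entries:
        if key.consistent(query):
            if node.leaf:
                results.append(item)
            else:
                results.extend(search(item, query))
    return results


def make_leaf(values):
    # Leaf entries use degenerate intervals [v, v] as keys.
    return Node([(IntervalKey(v, v), v) for v in values], leaf=True)


leaves = [make_leaf([1, 3, 5]), make_leaf([7, 9, 11])]
root = Node(
    [(IntervalKey.union([k for k, _ in leaf.entries]), leaf) for leaf in leaves],
    leaf=False,
)
```

Swapping `IntervalKey` for a rectangle key with an overlap test would give R-tree-like behavior without touching `search`, which is the kind of extensibility the abstract describes.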
On Indexing Mobile Objects
, 1999
We show how to index mobile objects in one and two dimensions using efficient dynamic external memory data structures. The problem is motivated by real-life applications in traffic monitoring, intelligent navigation and mobile communications domains. For the 1-dimensional case, we give (i) a dynamic, external memory algorithm with guaranteed worst-case performance and linear space and (ii) a practical approximation algorithm also in the dynamic, external memory setting, which has linear space and expected logarithmic query time. We also give an algorithm with guaranteed logarithmic query time for a restricted version of the problem. We present extensions of our techniques to two dimensions. In addition we give a lower bound on the number of I/Os needed to answer the d-dimensional problem. Initial experimental results and comparisons to traditional indexing approaches are also included. 1 Introduction Traditional database management systems assume that data stored in the database rem...
Top-k selection queries over relational databases: Mapping strategies and performance evaluation
 TODS
, 2002
In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the “top k” tuples that best match the given attribute values. In this paper, we study the advantages and limitations of processing a top-k query by translating it into a single range query that a traditional relational database management system (RDBMS) can process efficiently. In particular, we study how to determine a range query to evaluate a top-k query by exploiting the statistics available to an RDBMS, and the impact of the quality of these statistics on the retrieval efficiency of the resulting scheme. We also report the first experimental evaluation of the mapping strategies over a real RDBMS, namely over Microsoft’s SQL Server 7.0. The experiments show that our new techniques are robust and significantly more efficient than previously known strategies requiring at least one sequential scan of the data sets.
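The mapping from a top-k query to a range query can be sketched as below. This is a simplified stand-in, not the paper's strategy: it assumes a maximum-coordinate (L-infinity) distance, takes an initial radius guess as a parameter instead of deriving it from database statistics, and doubles the radius whenever the range query returns fewer than k tuples. All function names are hypothetical.

```python
def range_query(rows, target, radius):
    # Stand-in for an RDBMS range predicate: every attribute within +/- radius.
    return [row for row in rows
            if all(abs(a - t) <= radius for a, t in zip(row, target))]


def max_distance(row, target):
    # L-infinity distance, so a box of radius r holds exactly the rows within r.
    return max(abs(a - t) for a, t in zip(row, target))


def top_k(rows, target, k, initial_radius):
    if initial_radius <= 0:
        raise ValueError("initial_radius must be positive")
    radius = initial_radius
    while True:
        candidates = range_query(rows, target, radius)
        # Stop once the box holds k rows, or the whole table if it has fewer.
        if len(candidates) >= k or len(candidates) == len(rows):
            break
        radius *= 2  # assumed restart policy: widen the range and retry
    return sorted(candidates, key=lambda row: max_distance(row, target))[:k]


rows = [(1, 1), (2, 2), (5, 5), (9, 9)]
```

A well-chosen initial radius answers the query with one range scan, while a poor guess triggers restarts, which is why the quality of the statistics used to pick the range matters for efficiency.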
Indexing multidimensional uncertain data with arbitrary probability density functions
 In Proc. VLDB
, 2005
In an “uncertain database”, an object o is associated with a multidimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the “probabilistic range search” which, given a value pq and a rectangular area rq, retrieves the objects that appear in rq with probabilities at least pq. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multidimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.
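The probabilistic range search defined above can be illustrated with a deliberately restricted sketch. It assumes each object's pdf is uniform over a rectangular uncertainty region, so the appearance probability is the overlap area divided by the region area; the abstract itself allows arbitrary pdfs, and the function names here are hypothetical.

```python
def appearance_probability(region, rq):
    # region and rq are (xmin, ymin, xmax, ymax); the pdf is assumed uniform on region.
    x1, y1, x2, y2 = region
    qx1, qy1, qx2, qy2 = rq
    width = max(0.0, min(x2, qx2) - max(x1, qx1))
    height = max(0.0, min(y2, qy2) - max(y1, qy1))
    return (width * height) / ((x2 - x1) * (y2 - y1))


def probabilistic_range_search(objects, rq, pq):
    # Retrieve the objects that appear in rq with probability at least pq.
    return [name for name, region in objects.items()
            if appearance_probability(region, rq) >= pq]


uncertain = {
    "o1": (0.0, 0.0, 4.0, 4.0),
    "o2": (0.0, 0.0, 1.0, 1.0),
    "o3": (3.0, 1.0, 5.0, 3.0),
}
```

For non-uniform pdfs the area ratio would be replaced by an integral of the pdf over the overlap, which is typically the expensive step an access method tries to avoid.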
On the Analysis of Indexing Schemes
 In Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
, 1997
We consider the problem of indexing general database workloads (combinations of data sets and sets of potential queries). We define a framework for measuring the efficiency of an indexing scheme for a workload based on two characterizations: storage redundancy (how many times each item in the data set is stored), and access overhead (how many times more blocks than necessary a query retrieves). Using this framework we present some initial results, showing upper and lower bounds and tradeoffs between them in the case of multidimensional range queries and set queries. 1 Introduction The success and ubiquity of the relational data model arguably owes much to the B-tree, the access method breakthrough that accompanied it with superb timing [2]. It seems likely that access methods will continue to play an important role in, and largely determine the viability of, the novel data models currently under intense scrutiny in the database research community. The B-tree is widely recognized...
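The two measures named in this abstract can be computed for a toy indexing scheme as in the sketch below. The concrete definitions are assumptions read off the parenthetical descriptions: redundancy is taken as the average number of stored copies per item, and access overhead as the smallest number of blocks covering a query divided by the ideal count ceil(|Q| / B). The brute-force cover search is only meant for tiny examples.

```python
from itertools import combinations
from math import ceil


def average_redundancy(blocks):
    # Total stored copies divided by the number of distinct items.
    items = set().union(*blocks)
    return sum(len(b) for b in blocks) / len(items)


def access_overhead(blocks, query, block_size):
    # Ratio of blocks actually needed to the ideal ceil(|query| / block_size).
    query = set(query)
    if not query:
        raise ValueError("query must be non-empty")
    ideal = ceil(len(query) / block_size)
    # Brute-force the smallest set of blocks that covers the query.
    for n in range(1, len(blocks) + 1):
        for combo in combinations(blocks, n):
            if query <= set().union(*combo):
                return n / ideal
    raise ValueError("query contains items that no block stores")


scheme = [{1, 2}, {3, 4}, {2, 3}]  # three blocks of size 2 over items 1..4
```

In this toy scheme, storing items 2 and 3 twice raises the redundancy but lets the query {2, 3} be answered from a single block, which is the kind of tradeoff the framework is meant to quantify.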
Efficient Indexing of Spatiotemporal Objects
, 2002
Spatiotemporal objects, i.e., objects which change their position and/or extent over time, appear in many applications. In this paper we examine the problem of indexing large volumes of such data. Important in this environment is how the spatiotemporal objects move and/or change. We consider a rather general case where object movements/changes are defined by combinations of polynomial functions. We further concentrate on "snapshot" as well as small "interval" queries as these are quite common when examining the history of the gathered data. The obvious approach that approximates each spatiotemporal object by an MBR and uses a traditional multidimensional access method to index them is inefficient. Objects that "live" for long time intervals have large MBRs which introduce a lot of empty space. Clustering long intervals has been dealt with in temporal databases by the use of partially persistent indices. What differentiates this problem from traditional temporal indexing is that objects are allowed to move/change during their lifetime. Better ways are thus needed to approximate general spatiotemporal objects. One obvious solution is to introduce artificial splits: the lifetime of a long-lived object is split into smaller consecutive pieces. This decreases the empty space but increases the number of indexed MBRs. We first give an optimal algorithm and a heuristic for splitting a given spatiotemporal object into a predefined number of pieces. Then, given an upper bound on the total number of possible splits, we present three algorithms that decide how the splits are distributed among all the objects so that the total empty space is minimized. The number of splits cannot be increased indefinitely since the extra objects will eventually affect query performance. Usi...
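The effect of artificial splits on MBR size can be seen in a small sketch. It assumes a point moving in one spatial dimension, sampled at discrete times, and splits the lifetime into pieces of equal duration; that equal-duration rule is a naive assumption for illustration and is neither the optimal algorithm nor the heuristic mentioned in the abstract.

```python
def mbr_area(samples):
    # Area of the bounding rectangle of (time, position) samples.
    times = [t for t, _ in samples]
    positions = [x for _, x in samples]
    return (max(times) - min(times)) * (max(positions) - min(positions))


def split_area(samples, pieces):
    # Total MBR area when the lifetime is cut into consecutive pieces that
    # share their endpoint samples (naive equal-duration splitting).
    segments = len(samples) - 1
    bounds = [round(i * segments / pieces) for i in range(pieces + 1)]
    return sum(mbr_area(samples[a:b + 1]) for a, b in zip(bounds, bounds[1:]))


# A point moving at constant speed 2 for 8 time units.
trajectory = [(t, 2.0 * t) for t in range(9)]
```

For this linear trajectory the total area halves each time the number of pieces doubles, while the number of indexed rectangles grows, which mirrors the tradeoff between empty space and index size described above.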
The Effect of Buffering on the Performance of R-Trees
, 1996
Past R-tree studies have focused on the number of nodes visited as a metric of query performance. Since database systems usually include a buffering mechanism we propose that the number of disk accesses is a more realistic measure of performance. We develop a buffer model to analyze the number of disk accesses required for spatial queries using R-trees. The model can be used to evaluate the quality of R-tree update operations, such as various node splitting and tree restructuring policies, as measured by query performance on the resulting tree. We use our model to study the performance of three well known R-tree packing algorithms. We show that ignoring buffer behavior and using number of nodes accessed as a performance metric can lead to incorrect conclusions, not only quantitatively, but also qualitatively. In addition, we consider the problem of how many levels of the R-tree should be pinned in the buffer. This research was supported in part by the National Aeronautics and Space A...
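The gap between nodes visited and disk accesses can be shown with a small simulation. This is not the analytic buffer model the abstract refers to: it assumes an LRU replacement policy and replays a hypothetical trace of node visits, counting only buffer misses as disk accesses.

```python
from collections import OrderedDict


def count_disk_accesses(node_accesses, buffer_size):
    # Replay a trace of node visits through an LRU buffer and count the misses.
    buffer = OrderedDict()
    misses = 0
    for node in node_accesses:
        if node in buffer:
            buffer.move_to_end(node)  # hit: mark as most recently used
        else:
            misses += 1  # miss: the node must be read from disk
            buffer[node] = True
            if len(buffer) > buffer_size:
                buffer.popitem(last=False)  # evict the least recently used node
    return misses


# Three queries, each descending from the root to a leaf (hypothetical node names).
trace = ["root", "A", "a1", "root", "A", "a2", "root", "B", "b1"]
```

The trace visits nine nodes, yet a three-page buffer turns that into six disk reads while a two-page buffer saves nothing, so the two metrics can rank the same tree differently depending on buffer size.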