Results 1 to 10 of 172
Multidimensional Access Methods
1998
Cited by 689 (3 self)
Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region).
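The two query types named above can be pinned down with a brute-force sketch over axis-aligned boxes. This is a plain linear scan for illustration only; the access methods surveyed exist precisely to avoid such a scan. All helper names are hypothetical.

```python
from typing import List, Tuple

# An axis-aligned box given as (low corner, high corner) in d dimensions.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]

def contains_point(box: Box, p: Tuple[float, ...]) -> bool:
    """True if point p lies inside (or on the boundary of) box."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))

def overlaps(a: Box, b: Box) -> bool:
    """True if boxes a and b intersect in every dimension."""
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

def point_query(objects: List[Box], p: Tuple[float, ...]) -> List[int]:
    """Point query: indices of all objects that contain the search point p."""
    return [i for i, o in enumerate(objects) if contains_point(o, p)]

def region_query(objects: List[Box], region: Box) -> List[int]:
    """Region query: indices of all objects that overlap the search region."""
    return [i for i, o in enumerate(objects) if overlaps(o, region)]

objects = [((0, 0), (2, 2)), ((1, 1), (3, 3)), ((5, 5), (6, 6))]
print(point_query(objects, (1.5, 1.5)))                   # [0, 1]
print(region_query(objects, ((2.5, 2.5), (5.5, 5.5))))    # [1, 2]
```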
Evaluating Probabilistic Queries over Imprecise Data
In SIGMOD, 2003
Cited by 277 (46 self)
Sensors are often employed to monitor continuously changing entities like locations of moving objects and temperature. The sensor readings are reported to a database system, and are subsequently used to answer queries. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), the database may not be able to keep track of the actual values of the entities. Queries that use these old values may produce incorrect answers. However, if the degree of uncertainty between the actual data value and the database value is limited, one can place more confidence in the answers to the queries. More generally, query answers can be augmented with probabilistic guarantees of the validity of the answers. In this paper, we study probabilistic query evaluation based on uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers, and provide efficient indexing and numeric solutions. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments ...
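To make "probabilistic answers" concrete, here is a minimal sketch of a probabilistic range query over uncertain readings. It assumes, purely for illustration, that each sensor's true value is uniformly distributed over a known uncertainty interval; the function names and sample data are hypothetical, and none of the paper's query classes, indexing or numeric methods are reproduced.

```python
from typing import Dict, Tuple

def prob_in_range(interval: Tuple[float, float], a: float, b: float) -> float:
    """P(value in [a, b]) assuming the true value is uniform on interval = (l, u)."""
    l, u = interval
    if u == l:                       # no uncertainty: treat as an exact value
        return 1.0 if a <= l <= b else 0.0
    overlap = max(0.0, min(u, b) - max(l, a))
    return overlap / (u - l)

def probabilistic_range_query(data: Dict[str, Tuple[float, float]],
                              a: float, b: float,
                              threshold: float = 0.0) -> Dict[str, float]:
    """Map each object to its probability of lying in [a, b], keeping those above threshold."""
    result = {}
    for obj, interval in data.items():
        p = prob_in_range(interval, a, b)
        if p > threshold:
            result[obj] = p
    return result

# Hypothetical sensors whose last reading is known only to within an interval.
sensors = {"s1": (10.0, 20.0), "s2": (18.0, 22.0), "s3": (30.0, 40.0)}
print(probabilistic_range_query(sensors, 15.0, 20.0))  # {'s1': 0.5, 's2': 0.5}
```

Each answer carries a probability rather than a yes/no verdict, which is the kind of augmented answer the abstract describes.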
Similarity-Based Queries for Time Series Data
Proc. 1997 ACM-SIGMOD Conf., 1997
Cited by 156 (7 self)
We study a set of linear transformations on the Fourier series representation of a sequence that can be used as the basis for similarity queries on time-series data. We show that our set of transformations is rich enough to formulate operations such as moving average and time warping. We present a query processing algorithm that uses the underlying R-tree index of a multidimensional data set to answer similarity queries efficiently. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We relate our transformations to the general framework for similarity queries of Jagadish et al.
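Why a moving average fits the Fourier framework can be checked with a small sketch: a circular moving average is a convolution, so by the convolution theorem it becomes an element-wise multiplication of DFT coefficients. The naive O(n^2) DFT, the circular (wrap-around) boundary and all function names are assumptions made for illustration; the paper's exact transformation set and its R-tree query algorithm are not reproduced here.

```python
import cmath
from typing import List

def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X: List[complex]) -> List[float]:
    """Inverse DFT; returns real parts because the signals here are real."""
    n = len(X)
    return [(sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n).real
            for t in range(n)]

def moving_average_time(x: List[float], w: int) -> List[float]:
    """Circular w-point moving average computed directly in the time domain."""
    n = len(x)
    return [sum(x[(t - j) % n] for j in range(w)) / w for t in range(n)]

def moving_average_freq(x: List[float], w: int) -> List[float]:
    """The same moving average as a point-wise product of Fourier coefficients."""
    n = len(x)
    kernel = [1.0 / w if j < w else 0.0 for j in range(n)]
    X, H = dft(x), dft(kernel)
    return idft([xf * hf for xf, hf in zip(X, H)])

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
direct = moving_average_time(x, 3)
via_fourier = moving_average_freq(x, 3)
print(all(abs(a - b) < 1e-9 for a, b in zip(direct, via_fourier)))  # True
```

Because the transformation acts on each coefficient independently, it can be applied to the Fourier features stored in an index rather than to the raw sequences.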
Indexing multidimensional uncertain data with arbitrary probability density functions
In Proc. VLDB, 2005
Cited by 117 (16 self)
In an “uncertain database”, an object o is associated with a multidimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the “probabilistic range search”, which, given a value p_q and a rectangular area r_q, retrieves the objects that appear in r_q with probabilities at least p_q. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multidimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.
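The semantics of the probabilistic range search can be illustrated with a brute-force sketch. Here each object's pdf is approximated, as an assumption for illustration, by a few weighted sample positions; the names db, rq and pq are hypothetical stand-ins for the database, r_q and p_q. The U-tree's pruning is not reproduced; this only shows which objects a correct answer must contain.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]               # ((x_lo, y_lo), (x_hi, y_hi))
DiscretePdf = List[Tuple[Point, float]]  # (position, probability mass) pairs

def appearance_prob(pdf: DiscretePdf, rq: Rect) -> float:
    """Probability mass of a discrete pdf that falls inside rectangle rq."""
    (xl, yl), (xh, yh) = rq
    return sum(w for (x, y), w in pdf if xl <= x <= xh and yl <= y <= yh)

def prob_range_search(db: Dict[str, DiscretePdf], rq: Rect, pq: float) -> List[str]:
    """Objects that appear in rq with probability at least pq (plain linear scan)."""
    return sorted(o for o, pdf in db.items() if appearance_prob(pdf, rq) >= pq)

db = {
    "a": [((1, 1), 0.5), ((7, 7), 0.5)],
    "b": [((2, 2), 0.9), ((9, 9), 0.1)],
    "c": [((8, 8), 1.0)],
}
rq = ((0, 0), (5, 5))
print(prob_range_search(db, rq, 0.6))  # ['b']
print(prob_range_search(db, rq, 0.5))  # ['a', 'b']
```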
Spatiotemporal indexing for large multimedia applications
In International Conference on Multimedia Computing and Systems, 1996
Cited by 88 (20 self)
Multimedia applications usually involve a large number of multimedia objects (texts, images, sounds, etc.). Spatial and temporal relationships among these objects should be efficiently supported and retrieved within a multimedia authoring tool. In this paper we present several spatial, temporal and spatiotemporal relationships of interest and propose efficient indexing schemes, based on multidimensional (spatial) data structures, for large multimedia applications that involve thousands of objects. Evaluation models of the proposed schemes are also presented, as well as hints for selecting the most appropriate one according to the multimedia author’s requirements.
Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes
2000
Cited by 86 (9 self)
Finding approximate answers to multidimensional range queries over real-valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them to the multidimensional query approxim...
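The kernel-density idea mentioned at the end of the abstract can be sketched as follows: each sampled record contributes the fraction of its kernel's mass that falls inside the query box, and the total is scaled up to the table size. The random sample, the product Gaussian kernel and the single hand-picked bandwidth h are all assumptions for illustration, not the paper's estimator or bandwidth choice; the overlapping-bucket histogram itself is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kde_range_count(sample: List[Sequence[float]], n_total: int,
                    ranges: List[Tuple[float, float]], h: float) -> float:
    """Estimate how many of n_total records fall in the query box `ranges`.

    Each sample point contributes the mass of a product Gaussian kernel
    (bandwidth h in every dimension) lying inside the box; the sum is then
    scaled from the sample size up to the table size.
    """
    total = 0.0
    for point in sample:
        mass = 1.0
        for x, (lo, hi) in zip(point, ranges):
            mass *= norm_cdf((hi - x) / h) - norm_cdf((lo - x) / h)
        total += mass
    return total * n_total / len(sample)

sample = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.9), (0.1, 0.9)]
box = [(0.0, 0.4), (0.0, 1.0)]
print(kde_range_count(sample, 1000, box, h=1e-3))  # 500.0
```

With a tiny bandwidth the estimate degenerates to the scaled sample count (2 of 4 points, times 1000/4); larger bandwidths smooth each record's contribution across the box boundary.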
Reverse kNN Search in Arbitrary Dimensionality
In VLDB, 2004
Cited by 76 (4 self)
Given a point q, a reverse k nearest neighbor (RkNN) query retrieves all the data points that have q as one of their k nearest neighbors. Existing methods for processing such queries have at least one of the following deficiencies: (i) they do not support arbitrary values of k, (ii) they cannot deal efficiently with database updates, (iii) they are applicable only to 2D data (but not to higher dimensionality), and (iv) they retrieve only approximate results. Motivated by these shortcomings, we develop algorithms for exact processing of RkNN with arbitrary values of k on dynamic multidimensional datasets. Our methods utilize a conventional data-partitioning index on the dataset and do not require any precomputation. In addition to their flexibility, we experimentally verify that the proposed algorithms outperform the existing ones even in their restricted focus.
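The RkNN definition in the first sentence can be stated directly as a quadratic brute-force check: p qualifies if fewer than k other data points are strictly closer to p than q is. This sketch only fixes the semantics (with ties resolved in q's favor, an assumption); it is not the paper's index-based algorithm, and the function name is hypothetical.

```python
import math
from typing import List, Sequence

def rknn_brute_force(points: List[Sequence[float]], q: Sequence[float], k: int) -> List[int]:
    """Indices of data points that have q among their k nearest neighbors.

    O(n^2) scan: for each p, count the other data points strictly closer
    to p than q; p is a reverse k-NN of q when that count is below k.
    """
    result = []
    for i, p in enumerate(points):
        d_pq = math.dist(p, q)
        closer = sum(1 for j, o in enumerate(points)
                     if j != i and math.dist(p, o) < d_pq)
        if closer < k:
            result.append(i)
    return result

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(rknn_brute_force(pts, (2.0, 0.0), k=1))  # [1, 2]
print(rknn_brute_force(pts, (2.0, 0.0), k=2))  # [0, 1, 2]
```

Note that the result is not symmetric with ordinary k-NN: point 2 is a reverse 1-NN of q even though it is q's farthest point, because nothing lies between them.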
Efficient Indexing of Spatiotemporal Objects
2002
Cited by 71 (11 self)
Spatiotemporal objects, i.e., objects which change their position and/or extent over time, appear in many applications. In this paper we examine the problem of indexing large volumes of such data. Important in this environment is how the spatiotemporal objects move and/or change. We consider a rather general case where object movements/changes are defined by combinations of polynomial functions. We further concentrate on "snapshot" as well as small "interval" queries as these are quite common when examining the history of the gathered data. The obvious approach that approximates each spatiotemporal object by an MBR and uses a traditional multidimensional access method to index them is inefficient. Objects that "live" for long time intervals have large MBRs which introduce a lot of empty space. Clustering long intervals has been dealt with in temporal databases by the use of partially persistent indices. What differentiates this problem from traditional temporal indexing is that objects are allowed to move/change during their lifetime. Better ways are thus needed to approximate general spatiotemporal objects. One obvious solution is to introduce artificial splits: the lifetime of a long-lived object is split into smaller consecutive pieces. This decreases the empty space but increases the number of indexed MBRs. We first give an optimal algorithm and a heuristic for splitting a given spatiotemporal object into a predefined number of pieces. Then, given an upper bound on the total number of possible splits, we present three algorithms that decide how the splits are distributed among all the objects so that the total empty space is minimized. The number of splits cannot be increased indefinitely since the extra objects will eventually affect query performance. Usi...
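The split-distribution problem can be made concrete under a deliberately simple model, assumed here for illustration: each object moves linearly in one dimension with speed v over a lifetime L, and is cut into equal-length pieces. In (time, position) space the trajectory is a segment of zero area, so every MBR is pure empty space and m pieces leave |v| L^2 / m of it. A greedy rule then gives each split to the object whose empty space drops most. The paper handles polynomial movements with its own algorithms; the model and names below are hypothetical.

```python
import heapq
from typing import List, Tuple

def empty_space(length: float, speed: float, pieces: int) -> float:
    """Total MBR area of a linearly moving 1D object cut into equal pieces.

    Each piece spans (L/m) in time and |v| * (L/m) in position, and the
    trajectory inside it has zero area, so the whole MBR is empty space.
    """
    piece = length / pieces
    return pieces * piece * abs(speed) * piece

def allocate_splits(objects: List[Tuple[float, float]], budget: int) -> List[int]:
    """Greedily distribute `budget` splits; returns the resulting piece count per object."""
    pieces = [1] * len(objects)
    # Max-heap (via negated keys) on the gain of giving an object one more piece.
    heap = [(-(empty_space(L, v, 1) - empty_space(L, v, 2)), i)
            for i, (L, v) in enumerate(objects)]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        pieces[i] += 1
        L, v = objects[i]
        gain = empty_space(L, v, pieces[i]) - empty_space(L, v, pieces[i] + 1)
        heapq.heappush(heap, (-gain, i))
    return pieces

objects = [(10.0, 1.0), (2.0, 1.0)]   # (lifetime L, speed v)
print(allocate_splits(objects, 3))    # [4, 1]
```

Under this model the gain from each extra piece, |v| L^2 (1/m - 1/(m+1)), shrinks as m grows, which is why the greedy allocation is optimal here; that guarantee need not carry over to other movement models.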
A Cost Model for Similarity Queries in Metric Spaces
1998
Cited by 63 (12 self)
We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbor queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multidimensional access methods, in a generic metric space there is no such possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method [10]. Our results rely on the assumption that the indexed dataset comes from a metric space which is “homogeneous” enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure [8].
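One consequence of using the overall distance distribution F can be sketched briefly. An M-tree node with covering radius r_N cannot be pruned for a range query of radius r_Q when d(center, Q) <= r_N + r_Q; under the homogeneity assumption that happens with probability F(r_N + r_Q), so the expected number of nodes read is the sum of that quantity over all nodes. Estimating F from all pairs of a sample and supplying node radii as a plain list are illustrative assumptions, and this is not claimed to be the paper's exact cost formula.

```python
import bisect
from typing import Callable, List, Sequence

def empirical_distance_cdf(sample: Sequence, dist: Callable) -> Callable[[float], float]:
    """Estimate F(x) = P(d(a, b) <= x) from all pairs of a data sample."""
    ds = sorted(dist(a, b) for i, a in enumerate(sample) for b in sample[i + 1:])
    return lambda x: bisect.bisect_right(ds, x) / len(ds)

def expected_node_accesses(radii: List[float], r_q: float,
                           F: Callable[[float], float]) -> float:
    """Sum over nodes of P(node cannot be pruned) = F(r_N + r_q)."""
    return sum(F(r_n + r_q) for r_n in radii)

sample = [0.0, 1.0, 2.0, 3.0]
F = empirical_distance_cdf(sample, lambda a, b: abs(a - b))
print(F(1.5))                                       # 0.5
print(expected_node_accesses([0.5, 1.0], 0.5, F))   # 1.0
```

Only the metric d is needed, with no coordinates, which matches the abstract's point that vector-space cost models do not apply here.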
The effect of buffering on the performance of R-trees
Knowledge and Data Engineering
Cited by 60 (7 self)
Past R-tree studies have focused on the number of nodes visited as a metric of query performance. Since database systems usually include a buffering mechanism, we propose that the number of disk accesses is a more realistic measure of performance. We develop a buffer model to analyze the number of disk accesses required for spatial queries using R-trees. The model can be used to evaluate the quality of R-tree update operations, such as various node splitting and tree restructuring policies, as measured by query performance on the resulting tree. We use our model to study the performance of three well-known R-tree loading algorithms. We show that ignoring buffer behavior and using number of nodes accessed as a performance metric can lead to incorrect conclusions, not only quantitatively, but also qualitatively. In addition, we consider the problem of how many levels of the R-tree should be pinned in the buffer.
Index Terms: Analytical model, buffer model, multidimensional indexing, performance evaluation, R-tree.
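The gap between node accesses and disk accesses can be seen with a small simulation. This sketch replays a trace of node visits through an LRU buffer, optionally with some nodes (such as the top levels) pinned in memory. It is a simulation under assumed LRU replacement and a made-up trace, not the paper's analytical buffer model, and the function name is hypothetical.

```python
from collections import OrderedDict
from typing import AbstractSet, Iterable

def count_disk_accesses(node_trace: Iterable[int], buffer_pages: int,
                        pinned: AbstractSet[int] = frozenset()) -> int:
    """Replay R-tree node visits through an LRU buffer and count misses.

    Pinned nodes are assumed always resident and never cost a disk access;
    the remaining nodes share an LRU buffer of `buffer_pages` pages.
    """
    lru = OrderedDict()
    misses = 0
    for node in node_trace:
        if node in pinned:
            continue
        if node in lru:
            lru.move_to_end(node)        # hit: mark as most recently used
            continue
        misses += 1                      # miss: the page is read from disk
        if buffer_pages > 0:
            if len(lru) >= buffer_pages:
                lru.popitem(last=False)  # evict the least recently used page
            lru[node] = None
    return misses

# Hypothetical trace of three queries: root 0, inner nodes 1-2, leaves 3-5.
trace = [0, 1, 3, 0, 1, 4, 0, 2, 5]
print(len(trace))                                  # 9 node accesses
print(count_disk_accesses(trace, 2))               # 9
print(count_disk_accesses(trace, 2, pinned={0}))   # 5
print(count_disk_accesses(trace, 3))               # 6
```

Even on this tiny trace, pinning the root or adding one buffer page changes the disk-access count substantially while the node-access count stays at 9.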