Surfing wavelets on streams: One-pass summaries for approximate aggregate queries
In VLDB, 2001
Cited by 215 (16 self)
We present techniques for computing small space representations of massive data streams. These are inspired by traditional wavelet-based approximations that consist of specific linear projections of the underlying data. We present general "sketch"-based methods for capturing various linear projections of the data and use them to provide pointwise and rangesum estimation of data streams. These methods use small amounts of space and per-item time while streaming through the data, and provide accurate representation as our experiments with real data streams show.
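As a rough illustration of the "linear projections" this abstract mentions, the following Python sketch computes Haar wavelet coefficients as inner products with basis vectors, keeps the largest few, and answers a range-sum query from the reconstruction. This is the traditional offline approximation that the abstract cites as its inspiration, not the paper's streaming sketch method. The function names and the choice of the Haar basis are assumptions made for this example.

```python
import math


def haar_basis(n):
    """Return the n orthonormal Haar basis vectors for length n (n a power of two)."""
    basis = [[1.0 / math.sqrt(n)] * n]  # vector that captures the overall average
    width = n
    while width >= 2:
        half = width // 2
        scale = 1.0 / math.sqrt(width)
        for start in range(0, n, width):
            vec = [0.0] * n
            for i in range(start, start + half):
                vec[i] = scale       # left half of the support is positive
            for i in range(start + half, start + width):
                vec[i] = -scale      # right half of the support is negative
            basis.append(vec)
        width = half
    return basis


def dot(u, v):
    """Inner product of two equal-length vectors (one linear projection)."""
    return sum(a * b for a, b in zip(u, v))


def top_b_approximation(data, b):
    """Approximate data using only its b largest-magnitude Haar coefficients."""
    basis = haar_basis(len(data))
    coeffs = [dot(data, vec) for vec in basis]
    keep = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:b]
    approx = [0.0] * len(data)
    for i in keep:
        for j, x in enumerate(basis[i]):
            approx[j] += coeffs[i] * x
    return approx


def range_sum(values, lo, hi):
    """Sum of values[lo..hi], inclusive."""
    return sum(values[lo:hi + 1])
```

Keeping all coefficients reproduces the input exactly, while keeping fewer trades accuracy for space. A streaming method would need to estimate such projections in one pass, which is the problem the paper addresses.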
K-Nearest Neighbor Search for Moving Query Point
In SSTD, 2001
Cited by 153 (0 self)
This paper addresses the problem of finding k nearest neighbors for moving query point (we call it k-NNMP). It is an important issue in both mobile computing research and real-life applications. The problem assumes that the query point is not static, as in k-nearest neighbor problem, but varies its position over time. In this paper, four different methods are proposed for solving the problem. Discussion about the parameters affecting the performance of the algorithms is also presented. A sequence of experiments with both synthetic and real point data sets are studied. In the experiments, our algorithms always outperform the existing ones by fetching 70% less disk pages. In some settings, the saving can be as much as one order of magnitude.
Minimal Probing: Supporting Expensive Predicates for Top-k Queries
In SIGMOD, 2002
Cited by 140 (7 self)
This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.
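The idea of probing an object only when it is necessary can be sketched with a priority queue ordered by upper-bound score. The Python below is a minimal illustration in the spirit of the abstract, not the paper's Algorithm MPro itself. It assumes scores lie in [0, 1], that each object has one cheap score known up front, that the combining function is monotone (`min` here), and that predicates are probed in a fixed order. All names are hypothetical.

```python
import heapq


def top_k_minimal_probing(objects, predicates, k, combine=min):
    """Return (top-k list of (id, score), number of expensive probes performed).

    objects:    dict mapping object id -> cheap, already-known score in [0, 1]
    predicates: list of expensive functions, each mapping object id -> score in [0, 1]
    combine:    monotone scoring function applied to a list of scores (assumption)
    """
    probes = 0
    heap = []
    for oid, score in objects.items():
        # Unevaluated predicates are assumed to take their maximum value 1.0,
        # which gives an upper bound on the final score.
        bound = combine([score] + [1.0] * len(predicates))
        heap.append((-bound, oid, [score]))
    heapq.heapify(heap)

    results = []
    while heap and len(results) < k:
        neg_bound, oid, scores = heapq.heappop(heap)
        evaluated = len(scores) - 1
        if evaluated == len(predicates):
            # Fully probed and still on top: no other object can beat it.
            results.append((oid, -neg_bound))
            continue
        # The object with the highest upper bound must be probed further.
        scores = scores + [predicates[evaluated](oid)]
        probes += 1
        remaining = len(predicates) - (evaluated + 1)
        bound = combine(scores + [1.0] * remaining)
        heapq.heappush(heap, (-bound, oid, scores))
    return results, probes
```

With cheap scores `{"a": 0.9, "b": 0.8, "c": 0.3}` and one expensive predicate, a top-1 query never needs to probe `c`, because its upper bound of 0.3 cannot exceed the final score of a better object.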
Monitoring k-Nearest Neighbor Queries Over Moving Objects
Cited by 127 (0 self)
Many location-based applications require constant monitoring of k-nearest neighbor (k-NN) queries over moving objects within a geographic area. Existing approaches to this problem have focused on predictive queries, and relied on the assumption that the trajectories of the objects are fully predictable at query processing time. We relax this
Top-k selection queries over relational databases: Mapping strategies and performance evaluation
TODS, 2002
Cited by 113 (7 self)
In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the “top k” tuples that best match the given attribute values. In this paper, we study the advantages and limitations of processing a top-k query by translating it into a single range query that a traditional relational database management system (RDBMS) can process efficiently. In particular, we study how to determine a range query to evaluate a top-k query by exploiting the statistics available to an RDBMS, and the impact of the quality of these statistics on the retrieval efficiency of the resulting scheme. We also report the first experimental evaluation of the mapping strategies over a real RDBMS, namely over Microsoft’s SQL Server 7.0. The experiments show that our new techniques are robust and significantly more efficient than previously known strategies requiring at least one sequential scan of the data sets.
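To make the mapping from a top-k query to a range query concrete, here is a small Python sketch. It stands in for the database with an in-memory list, uses Euclidean distance as the matching function, and takes the initial search radius as a caller-supplied guess. The paper derives the range from RDBMS statistics, which this example does not attempt. The restart rule and all names are assumptions for illustration.

```python
import math


def range_query(rows, target, radius):
    """Stand-in for an RDBMS range query: an axis-aligned box around target."""
    return [r for r in rows
            if all(abs(a - t) <= radius for a, t in zip(r, target))]


def top_k_via_range(rows, target, k, radius):
    """Return (k closest rows to target, number of restarts).

    A row outside the box differs from target by more than radius in some
    attribute, so its Euclidean distance also exceeds radius. The answer is
    therefore safe once the k-th candidate lies within radius of target.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    k = min(k, len(rows))
    restarts = 0
    while True:
        candidates = range_query(rows, target, radius)
        ranked = sorted(candidates, key=lambda r: math.dist(r, target))
        if k == 0:
            return [], restarts
        if len(ranked) >= k and math.dist(ranked[k - 1], target) <= radius:
            return ranked[:k], restarts
        if len(ranked) >= k:
            # A box this wide is guaranteed to contain the true top k.
            radius = math.dist(ranked[k - 1], target)
        else:
            radius *= 2  # too few candidates: widen the range and retry
        restarts += 1
```

A good initial radius returns the answer with a single range query, while a poor one costs restarts, which is why the quality of the statistics used to pick the range matters.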
RankSQL: Query algebra and optimization for relational top-k queries
In SIGMOD, 2005
Cited by 110 (17 self)
This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluations of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing is studied in the middleware scenario or in RDBMS in a “piecemeal” fashion, i.e., focusing on specific operator or sitting outside the core of query engines. In contrast, we aim to support ranking as a first-class database construct. As a key insight, the new ranking relationship can be viewed as another logical property of data, parallel to the “membership” property of relational data model. While membership is essentially supported in RDBMS, the same support for ranking is clearly lacking. We address the fundamental integration of ranking in RDBMS in a way similar to how membership, i.e., Boolean filtering, is supported. We extend relational algebra by proposing a rank-relational model to capture the ranking property, and introducing new and extended operators to support ranking as a first-class construct. Enabled by the extended algebra, we present a pipelined and incremental execution model of ranking query plans (that cannot be expressed traditionally) based on a fundamental ranking principle. To optimize top-k queries, we propose a dimensional enumeration algorithm to explore the extended plan space by enumerating plans along two dual dimensions: ranking and membership. We also propose a sampling-based method to estimate the cardinality of rank-aware operators, for costing plans. Our experiments show the validity of our framework and the accuracy of the proposed estimation model.
Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes
2000
Cited by 85 (9 self)
Finding approximate answers to multi-dimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multi-dimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multi-dimensional query approxim...
Time-Parameterized Queries in Spatio-Temporal Databases
2002
Cited by 81 (4 self)
Time-parameterized queries (TP queries for short) retrieve (i) the actual result at the time that the query is issued, (ii) the validity period of the result given the current motion of the query and the database objects, and (iii) the change that causes the expiration of the result. Due to the highly dynamic nature of several spatio-temporal applications, TP queries are important both as standalone methods, as well as building blocks of more complex operations. However, little work has been done towards their efficient processing. In this paper, we propose a general framework that covers time-parameterized variations of the most common spatial queries, namely window queries, k-nearest neighbors and spatial joins. In particular, each of these TP queries is reduced to nearest neighbor search where the distance functions are defined according to the query type. This reduction allows the application and extension of well-known branch and bound techniques to the current problem. The proposed methods can be applied with mobile queries, mobile objects or both, given a suitable indexing method. Our experimental evaluation is based on R-trees and their extensions for dynamic objects.
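The three parts of a TP answer, (i) result, (ii) validity period, and (iii) the change that ends it, can be illustrated for a single nearest neighbor. The Python sketch below assumes static data points and a query moving linearly with constant velocity, and it scans all points by brute force rather than using the R-tree branch-and-bound search of the paper. For a candidate point p and the current nearest neighbor n, the two squared distances to the query q0 + v·t become equal at t = (|q0 − p|² − |q0 − n|²) / (2 v·(p − n)). The function name is hypothetical.

```python
def tp_nearest_neighbor(query, velocity, points):
    """Return (nearest point, expiry time, point that replaces it).

    query, velocity: coordinate tuples; the query position at time t is
    query + velocity * t. points are static (an assumption of this sketch).
    The expiry time is float('inf') and the successor None when the
    current nearest neighbor never changes.
    """
    def sqdist(p):
        return sum((a - b) ** 2 for a, b in zip(query, p))

    nn = min(points, key=sqdist)
    expiry, successor = float("inf"), None
    for p in points:
        if p == nn:
            continue
        # Rate at which p gains on nn as the query moves: 2 * v . (p - nn)
        denom = 2 * sum(v * (a - b) for v, a, b in zip(velocity, p, nn))
        if denom <= 0:
            continue  # the query never becomes closer to p than to nn
        t = (sqdist(p) - sqdist(nn)) / denom
        if t < expiry:
            expiry, successor = t, p
    return nn, expiry, successor
```

For a query at (0, 0) moving with velocity (1, 0) among the points (1, 0), (5, 0), and (0, 3), the nearest neighbor is (1, 0), valid until t = 3, when (5, 0) becomes equally close and then takes over.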
Efficient Top-K Query Calculation in Distributed Networks
In PODC, 2004
Cited by 79 (0 self)
This paper presents a new algorithm to answer top-k queries (e.g. “find the k objects with the highest aggregate values”) in a distributed network. Existing algorithms such as the Threshold Algorithm [FLN01] consume an excessive amount of bandwidth when the number of nodes, m, is high. We propose a new algorithm called “Three-Phase Uniform Threshold” (TPUT). TPUT reduces network bandwidth consumption by pruning away ineligible objects, and terminates in three round-trips regardless of data input. The paper presents two sets of results about TPUT. First, trace-driven simulations show that, depending on the size of the network, TPUT reduces network traffic by one to two orders of magnitude compared to existing algorithms. Second, TPUT is proven to be instance-optimal on data series that satisfy a lower bound on the slope of decreases in values. In particular, analysis shows that by using a pruning parameter α < 1, TPUT achieves a qualitative reduction in network traffic, for example, lowering the optimality ratio from O(m ∗ m) to O(m ∗ √m) for data series following Zipf distribution.
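The abstract states only that TPUT prunes ineligible objects and finishes in three round trips. The Python sketch below shows one way such a three-phase scheme can work, following the phases as TPUT is commonly described: collect local top-k lists, broadcast a uniform threshold equal to the k-th partial sum divided by m, then fetch exact values for the surviving candidates. It simulates the network with in-memory dictionaries, assumes non-negative values and sum as the aggregate, and does not model the pruning parameter α. Treat the details as assumptions, not as the paper's exact procedure.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """k-th largest value, or 0.0 when fewer than k values exist."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0.0


def partial_sums(seen):
    """Sum, per object, the values reported so far by each node."""
    totals = defaultdict(float)
    for reported in seen:
        for obj, value in reported.items():
            totals[obj] += value
    return totals


def tput(nodes, k):
    """nodes: list of dicts mapping object -> non-negative local value.

    Returns the k objects with the highest summed value as (object, total).
    """
    m = len(nodes)
    # Phase 1: every node reports its local top-k entries.
    seen = [dict(heapq.nlargest(k, n.items(), key=lambda kv: kv[1]))
            for n in nodes]
    tau1 = kth_largest(partial_sums(seen).values(), k)
    threshold = tau1 / m

    # Phase 2: every node reports all entries at or above the uniform threshold.
    for node, reported in zip(nodes, seen):
        reported.update({o: v for o, v in node.items() if v >= threshold})
    sums = partial_sums(seen)
    tau2 = kth_largest(sums.values(), k)
    # An unreported local value is below the threshold, so substituting the
    # threshold for it yields an upper bound on the object's true total.
    candidates = [o for o in sums
                  if sum(r.get(o, threshold) for r in seen) >= tau2]

    # Phase 3: fetch exact values for the surviving candidates only.
    exact = {o: sum(n.get(o, 0.0) for n in nodes) for o in candidates}
    return heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])
```

Objects that no node reports in the first two phases have a total below m times the threshold, so they cannot reach the k-th partial sum and never need to be transmitted.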
Tree Pattern Relaxation
In Proc. of the International Conference on Extending Database Technology (EDBT), 2002
Cited by 70 (5 self)
Tree patterns are fundamental to querying tree-structured data like XML. Because of the heterogeneity of XML data, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of Information Retrieval, than to return only exact answers. In this paper, we study the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns. We consider weighted tree patterns, where exact and relaxed weights, associated with nodes and edges of the tree pattern, are used to compute the scores of query answers. We are interested in the problem of finding answers whose scores are at least as large as a given threshold. We design data pruning algorithms where intermediate query results are filtered dynamically during the evaluation process. We develop an optimization that exploits scores of intermediate results to improve query evaluation efficiency. Finally, we show experimentally that our techniques outperform rewriting-based and post-pruning strategies.