Load shedding for aggregation queries over data streams (full version)
- In preparation
Cited by 76 (1 self)
Systems for processing continuous monitoring queries over data streams must be adaptive because data streams are often bursty and data characteristics may vary over time. In this paper, we focus on one particular type of adaptivity: the ability to gracefully degrade performance via “load shedding” (dropping unprocessed tuples to reduce system load) when the demands placed on the system cannot be met in full given available resources. Focusing on aggregation queries, we present algorithms that determine at what points in a query plan load shedding should be performed and what amount of load should be shed at each point in order to minimize the degree of inaccuracy introduced into query answers. We report the results of experiments that validate our analytical conclusions.
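The basic idea of shedding load in front of an aggregate can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a single SUM query and one random-drop point, and the names `shed_and_sum` and `keep_rate` are hypothetical. The paper's contribution is deciding where in a query plan to shed and how much at each point, which this sketch does not attempt.

```python
import random


def shed_and_sum(stream, keep_rate, rng):
    """Estimate SUM over `stream` while processing only a random fraction of tuples.

    Assumption for illustration: each tuple is kept independently with
    probability `keep_rate`, and the partial sum is scaled by 1 / keep_rate
    to compensate for the dropped tuples.
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    kept_total = 0.0
    kept_count = 0
    for value in stream:
        # Tuples that fail this check are dropped unprocessed (load shedding).
        if rng.random() < keep_rate:
            kept_total += value
            kept_count += 1
    # Scale up so the estimate is unbiased for the full-stream SUM.
    return kept_total / keep_rate, kept_count


if __name__ == "__main__":
    estimate, processed = shed_and_sum([1.0] * 10000, 0.5, random.Random(0))
    print(f"processed {processed} of 10000 tuples, estimated SUM = {estimate}")
```

A lower `keep_rate` reduces the number of tuples processed at the cost of a noisier estimate, which is the accuracy trade-off the abstract refers to.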
Adaptive Query Processing: Technology in Evolution
- IEEE DATA ENGINEERING BULLETIN
, 2000
Cited by 73 (9 self)
As query engines are scaled and federated, they must cope with highly unpredictable and changeable environments. In the Telegraph project, we are attempting to architect and implement a continuously adaptive query engine suitable for global-area systems, massive parallelism, and sensor networks. To set the stage for our research, we present a survey of prior work on adaptive query processing, focusing on three characterizations of adaptivity: the frequency of adaptivity, the effects of adaptivity, and the extent of adaptivity. Given this survey, we sketch directions for research in the Telegraph project.
Congressional samples for approximate answering of group-by queries
- In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data
, 2000
Cited by 72 (5 self)
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into groups and then aggregate the information in each group (group-by queries). Depending on the data, there can be a wide disparity between the number of data items in each group. As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very few (often zero) tuples. In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries. These techniques rely on precomputed non-uniform (biased) samples of the data. In particular, we propose congressional samples, a hybrid union of uniform and biased samples. Given a fixed amount of space, congressional samples seek to maximize the accuracy for all possible group-by queries on a set of columns. We present a one-pass algorithm for constructing a congressional sample and use this technique to also incrementally maintain the sample up-to-date without accessing the base relation. We also evaluate query rewriting strategies for providing approximate answers from congressional samples. Finally, we conduct an extensive set of experiments on the TPC-D database, which demonstrates the efficacy of the techniques proposed.
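The hybrid of uniform and biased sampling described above can be sketched for a single grouping column. The allocation rule used here (give each group the larger of its proportional share and an equal share, then rescale to the space budget) is an assumption made for illustration and is not claimed to be the paper's exact algorithm; the names `hybrid_allocation` and `draw_sample` are hypothetical, and this version is not one-pass.

```python
import random
from collections import defaultdict


def hybrid_allocation(group_sizes, budget):
    """Split `budget` sample slots across groups (illustrative rule, see lead-in)."""
    total = sum(group_sizes.values())
    equal_share = budget / len(group_sizes)
    raw = {}
    for group, size in group_sizes.items():
        proportional_share = budget * size / total   # what a uniform sample gives
        # Small groups get at least the equal share, capped by the group size.
        raw[group] = min(size, max(proportional_share, equal_share))
    scale = min(1.0, budget / sum(raw.values()))     # shrink to fit the budget
    return {group: share * scale for group, share in raw.items()}


def draw_sample(rows, key, budget, rng):
    """Return {group: (sampled_rows, weight)}; weight scales a sampled row to the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sizes = {group: len(members) for group, members in groups.items()}
    allocation = hybrid_allocation(sizes, budget)
    sample = {}
    for group, members in groups.items():
        k = min(len(members), max(1, int(allocation[group])))
        sample[group] = (rng.sample(members, k), len(members) / k)
    return sample


if __name__ == "__main__":
    rows = [("a", i) for i in range(9000)] + [("b", i) for i in range(900)]
    rows += [("c", i) for i in range(100)]
    for group, (picked, weight) in draw_sample(rows, lambda r: r[0], 300, random.Random(0)).items():
        print(group, len(picked), round(weight, 2))
```

With a budget of 300, a purely uniform sample would give group `c` about 3 tuples; the hybrid allocation gives it 63, which is the effect the abstract motivates for small groups.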
Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes
, 2000
Cited by 70 (8 self)
Finding approximate answers to multi-dimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multi-dimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multi-dimensional query approxim...
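How a histogram with overlapping buckets can answer a range-count query is sketched below. This is a generic illustration under stated assumptions, not the paper's technique: each bucket is an axis-aligned box with a record count, records are assumed uniformly spread inside a bucket, and overlapping buckets simply add their contributions. The names `overlap_fraction` and `estimate_count` are hypothetical, and the sketch does not show how buckets are chosen.

```python
def overlap_fraction(bucket_lo, bucket_hi, query_lo, query_hi):
    """Fraction of a bucket's volume that falls inside the query box."""
    fraction = 1.0
    for b_lo, b_hi, q_lo, q_hi in zip(bucket_lo, bucket_hi, query_lo, query_hi):
        width = b_hi - b_lo
        overlap = min(b_hi, q_hi) - max(b_lo, q_lo)
        if width <= 0 or overlap <= 0:
            return 0.0          # no intersection in this dimension
        fraction *= overlap / width
    return fraction


def estimate_count(buckets, query_lo, query_hi):
    """Estimate the number of records in the query box.

    `buckets` is a list of (lo, hi, count) triples; lo and hi are coordinate
    tuples. Each bucket contributes its count times the overlapping fraction
    of its volume (uniformity assumption inside the bucket).
    """
    return sum(
        count * overlap_fraction(lo, hi, query_lo, query_hi)
        for lo, hi, count in buckets
    )


if __name__ == "__main__":
    # A coarse bucket over the whole space plus a small dense bucket inside it.
    buckets = [((0.0, 0.0), (10.0, 10.0), 100), ((2.0, 2.0), (4.0, 4.0), 40)]
    print(estimate_count(buckets, (0.0, 0.0), (5.0, 5.0)))
```

Letting the small bucket sit on top of the coarse one captures a dense region without splitting the surrounding space into many extra buckets.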
Histogram-Based Approximation of Set-Valued Query Answers
- In Proceedings of the 25th VLDB Conference
, 1999
Cited by 67 (2 self)
Answering queries approximately has recently been proposed as a way to reduce query response times in on-line decision support systems, when the precise answer is not necessary or early feedback is helpful. Most of the work in this area uses sampling-based techniques and handles aggregate queries, ignoring queries that return relations as answers. In this paper, we extend the scope of approximate query answering to general queries. We propose a novel and intuitive error measure for quantifying the error in an approximate query answer, which can be a multiset in general.
The History of Histograms (abridged)
- PROC. OF VLDB CONFERENCE
, 2003
Cited by 67 (0 self)
The history of histograms is long and rich, full of detailed information in every step. It includes the course of histograms in different scientific fields, the successes and failures of histograms in approximating and compressing information, their adoption by industry, and solutions that have been given on a great variety of histogram-related problems. In this paper and in the same spirit of the histogram techniques themselves, we compress their entire history (including their "future history" as currently anticipated) in the given/fixed space budget, mostly recording details for the periods, events, and results with the highest (personally-biased) interest. In a limited set of experiments, the semantic distance between the compressed and the full form of the history was found relatively small!
Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure
, 2001
Cited by 64 (8 self)
Answering aggregate queries like SUM, COUNT, MIN, MAX, AVG in an approximate manner is often desirable when the exact answer is not needed or too costly to compute. We present an algorithm for answering such queries in multi-dimensional databases, using selective traversal of a Multi-Resolution Aggregate (MRA) tree structure storing point data. Our approach provides 100% confidence intervals on the value of the aggregate and works iteratively, producing answers of improving quality until some error requirement is satisfied or a time constraint is reached. Using the same technique we can also answer aggregate queries exactly, and our experiments indicate that even for exact answering the proposed data structure and algorithm are very fast. 1 Introduction. We deal with the problem of answering aggregate queries in a multi-dimensional space containing point data items. The data space is $R_{space} \subseteq \mathbb{R}^d$, where d is the dimensionality. Data items are pairs (loc, values) where $loc \in$ R spa...
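The progressive traversal with guaranteed bounds can be sketched in one dimension for COUNT. This is a simplified illustration, not the paper's MRA tree: it assumes a binary tree over half-open intervals in which every node stores the number of points below it, and the names `Node`, `build`, and `progressive_count` are hypothetical. Nodes contained in the query contribute exactly, disjoint nodes contribute nothing, and partially overlapping nodes contribute between 0 and their count until they are expanded.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: float                      # node covers the half-open interval [lo, hi)
    hi: float
    count: int                     # number of points stored below this node
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)   # filled for leaves only


def build(points, lo, hi, leaf_size=2, depth=0, max_depth=16):
    inside = [p for p in points if lo <= p < hi]
    node = Node(lo, hi, len(inside))
    if len(inside) <= leaf_size or depth >= max_depth:
        node.points = inside
    else:
        mid = (lo + hi) / 2
        node.children = [
            build(inside, lo, mid, leaf_size, depth + 1, max_depth),
            build(inside, mid, hi, leaf_size, depth + 1, max_depth),
        ]
    return node


def progressive_count(root, q_lo, q_hi, max_uncertainty=0):
    """Return (lower, upper) bounds on the number of points in [q_lo, q_hi)."""
    certain = 0
    frontier = []                  # internal nodes that partially overlap the query

    def visit(node):
        nonlocal certain
        if node.count == 0 or node.hi <= q_lo or node.lo >= q_hi:
            return                                 # disjoint: contributes nothing
        if q_lo <= node.lo and node.hi <= q_hi:
            certain += node.count                  # contained: contributes exactly
        elif node.children:
            frontier.append(node)                  # partial overlap: uncertain for now
        else:
            certain += sum(1 for p in node.points if q_lo <= p < q_hi)

    visit(root)
    # Refine the most uncertain node first until the bound is tight enough.
    while frontier and sum(n.count for n in frontier) > max_uncertainty:
        node = max(frontier, key=lambda n: n.count)
        frontier.remove(node)
        for child in node.children:
            visit(child)
    uncertain = sum(n.count for n in frontier)
    return certain, certain + uncertain


if __name__ == "__main__":
    root = build([1, 2, 3, 5, 8, 13, 21, 34], 0, 64)
    print(progressive_count(root, 2, 14, max_uncertainty=100))  # coarse bounds
    print(progressive_count(root, 2, 14))                       # refined to exact
```

With `max_uncertainty=0` the loop runs until no partially overlapping node remains, so the same traversal also yields the exact answer, matching the abstract's remark about exact answering.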
Semantically-Smart Disk Systems
, 2003
Cited by 64 (14 self)
We propose and evaluate the concept of a semantically-smart disk system (SDS). As opposed to a traditional "smart" disk, an SDS has detailed knowledge of how the file system above is using the disk system, including information about the on-disk data structures of the file system. An SDS exploits this knowledge to transparently improve performance or enhance functionality beneath a standard block read/write interface. To automatically acquire this knowledge, we introduce a tool (EOF) that can discover file-system structure for certain types of file systems, and then show how an SDS can exploit this knowledge on-line to understand file-system behavior. We quantify the space and time overheads that are common in an SDS, showing that they are not excessive. We then study the issues surrounding SDS construction by designing and implementing a number of prototypes as case studies; each case study exploits knowledge of some aspect of the file system to implement powerful functionality beneath the standard SCSI interface. Overall, we find that a surprising amount of functionality can be embedded within an SDS, hinting at a future where disk manufacturers can compete on enhanced functionality and not simply cost-per-byte and performance.

