The History of Histograms (abridged)
- PROC. OF VLDB CONFERENCE, 2003
Cited by 67 (0 self)
Abstract
The history of histograms is long and rich, full of detailed information in every step. It includes the course of histograms in different scientific fields, the successes and failures of histograms in approximating and compressing information, their adoption by industry, and solutions that have been given to a great variety of histogram-related problems. In this paper and in the same spirit of the histogram techniques themselves, we compress their entire history (including their "future history" as currently anticipated) in the given/fixed space budget, mostly recording details for the periods, events, and results with the highest (personally-biased) interest. In a limited set of experiments, the semantic distance between the compressed and the full form of the history was found relatively small!
Multi-dimensional Selectivity Estimation Using Compressed Histogram
- In SIGMOD, 1999
Cited by 51 (1 self)
Abstract
The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent on each other, because the selectivity is determined by the joint data distribution of the attributes. Additionally, multimedia databases have intrinsic requirements for multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the one-dimensional case, a histogram is in practice the most preferable technique. In the multi-dimensional case, however, a histogram is not adequate because of its high storage overhead and high error rates. In this paper, we propose a novel approach to multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This ena...
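The core idea in the abstract above (keep only the low-frequency discrete cosine transform coefficients of a fine-grained grid of histogram buckets, then answer range queries from the inverse transform) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's algorithm: the 4 x 4 grid, the triangular coefficient zone `u + v < keep`, and all function names are hypothetical, and the transform is written in pure Python only to keep the example self-contained.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def dct2(grid):
    """Forward 2-D DCT of an n x n grid of bucket counts: F = C G C^T."""
    n = len(grid)
    c = dct_matrix(n)
    return [[sum(c[u][i] * grid[i][j] * c[v][j] for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]

def idct2(coef):
    """Inverse 2-D DCT: G = C^T F C (valid because C is orthonormal)."""
    n = len(coef)
    c = dct_matrix(n)
    return [[sum(c[u][i] * coef[u][v] * c[v][j] for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]

def compress(coef, keep):
    """Keep only low-frequency coefficients with u + v < keep; zero the rest."""
    n = len(coef)
    return [[coef[u][v] if u + v < keep else 0.0 for v in range(n)] for u in range(n)]

def estimate_range(approx, x_lo, x_hi, y_lo, y_hi):
    """Approximate tuple count for the inclusive box of bucket indices."""
    return sum(approx[i][j] for i in range(x_lo, x_hi + 1) for j in range(y_lo, y_hi + 1))

# Example: bucket counts over two correlated attributes (mass near the diagonal).
grid = [
    [9, 4, 1, 0],
    [4, 8, 3, 1],
    [1, 3, 7, 4],
    [0, 1, 4, 10],
]
coef = dct2(grid)
approx = idct2(compress(coef, keep=3))  # stores 6 of the 16 coefficients
exact = sum(grid[i][j] for i in range(2) for j in range(2))
print(exact, round(estimate_range(approx, 0, 1, 0, 1), 2))
```

Because every non-constant cosine basis vector sums to zero, keeping coefficient (0, 0) preserves the total count exactly; the compression error shows up only in how that count is distributed across buckets.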
Materialized Views and Data Warehouses
- SIGMOD Record, 1998
Cited by 37 (8 self)
Abstract
A data warehouse is a redundant collection of data replicated from several possibly distributed and loosely coupled source databases, organized to answer OLAP queries. Relational views are used both as a specification technique and as an execution plan for the derivation of the warehouse data. In this position paper, we summarize the versatility of relational views and their potential.
Adapting to Source Properties in Processing Data Integration Queries
- In Proc. of the 2004 ACM SIGMOD Intl. Conf. on Management of Data, 2004
Cited by 35 (3 self)
Abstract
An effective query optimizer finds a query plan that exploits the characteristics of the source data. In data integration, little is known in advance about sources' properties, which necessitates the use of adaptive query processing techniques to adjust query processing on-the-fly. Prior work in adaptive query processing has focused on compensating for delays and adjusting for mis-estimated cardinality or selectivity values. In this paper, we present a generalized architecture for adaptive query processing and introduce a new technique, called adaptive data partitioning (ADP), which is based on the idea of dividing the source data into regions, each executed by different, complementary plans. We show how this model can be applied in novel ways not only to correct for underestimated selectivity and cardinality values, but also to discover and exploit order in the source data, and to detect and exploit source data that can be effectively pre-aggregated. We experimentally compare a number of alternative strategies and show that our approach is effective.
Recovering Information from Summary Data
- IN PROC. 23RD INTERNATIONAL CONF. ON VERY LARGE DATA BASES, 1997
Cited by 34 (1 self)
Abstract
Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVeraGes) over specified ranges. We study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function that has to be optimized under constraints. We show that our formulation includes the uniformity and independence assumptions as special cases, and that it can achieve better reconstruction results if we maximize the smoothness as opposed to the uniformity. In our experiments on real and synthetic datasets, the proposed method almost consistently outperforms its competitor, improving the root-mean-square error by up to 20 per cent for stock price data, and up to 90 per cent for smoother data sets. Finally, we show how to apply this theory to a variety of database problems that involve partial information, such as OLAP, data warehousing and histograms in query optimization.
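To make the contrast between the uniformity assumption and a smoothness criterion concrete, the sketch below reconstructs per-value counts from bucket COUNTs in two ways: spreading each bucket total evenly, and minimizing the sum of squared first differences subject to every bucket total being reproduced exactly. The squared-first-difference penalty is our own assumption for illustration; the paper's actual cost function may differ. The equality-constrained least-squares problem is solved through its KKT system with a small hand-written Gaussian elimination, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def uniform_reconstruct(buckets, n):
    """Uniformity assumption: spread each bucket total evenly over its values."""
    x = [0.0] * n
    for lo, hi, total in buckets:
        for i in range(lo, hi + 1):
            x[i] = total / (hi - lo + 1)
    return x

def smooth_reconstruct(buckets, n):
    """Minimize sum((x[i+1] - x[i])^2) subject to sum(x[lo..hi]) == total per bucket."""
    k = len(buckets)
    size = n + k
    kkt = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    # Top-left block: 2 * D^T D, where D is the first-difference operator.
    for i in range(n - 1):
        kkt[i][i] += 2.0
        kkt[i + 1][i + 1] += 2.0
        kkt[i][i + 1] -= 2.0
        kkt[i + 1][i] -= 2.0
    # Off-diagonal blocks: bucket-sum constraint matrix A and its transpose.
    for j, (lo, hi, total) in enumerate(buckets):
        for i in range(lo, hi + 1):
            kkt[n + j][i] = 1.0
            kkt[i][n + j] = 1.0
        rhs[n + j] = total
    return solve(kkt, rhs)[:n]  # drop the Lagrange multipliers

buckets = [(0, 3, 4.0), (4, 7, 12.0)]  # (first value, last value, COUNT)
uniform = uniform_reconstruct(buckets, 8)
print(uniform)  # [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
smooth = smooth_reconstruct(buckets, 8)
print([round(v, 2) for v in smooth])
```

Both reconstructions reproduce the stored bucket totals; the smooth one replaces the step at the bucket boundary with a gradual transition. This sketch does not enforce non-negativity, which a practical version would likely need.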
Spatio-Temporal Aggregation Using Sketches
- In ICDE, 2004
Cited by 34 (2 self)
Abstract
Several spatio-temporal applications require the retrieval of summarized information about moving objects that lie in a query region during a query interval (e.g., the number of mobile users covered by a cell, traffic volume in a district, etc.). Existing solutions suffer from the distinct counting problem: if an object remains in the query region for several timestamps during the query interval, it will be counted multiple times in the result. This paper solves the problem by integrating spatio-temporal indexes with sketches, traditionally used for approximate query processing. The proposed techniques can also be applied to reduce the space requirements of conventional spatio-temporal data and to mine spatio-temporal association rules.
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-Size Estimation
- VLDB CONFERENCE, 1999
Cited by 30 (1 self)
Abstract
This paper aims to improve the accuracy of query result-size estimations in query optimizers by leveraging the dynamic feedback obtained from observations on the executed query workload. To this end, an approximate "synopsis" of data-value distributions is devised that combines histograms with parametric curve fitting, leading to a specific class of linear splines. The approach reconciles the benefits of histograms (simplicity and versatility) with those of parametric techniques, especially adaptivity to statistically biased and dynamically evolving query workloads. The paper ...
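As a rough illustration of fitting a parametric curve inside a histogram bucket from query feedback, the sketch below assumes a linear density f(x) = alpha + beta * x within one bucket, so that piecewise over buckets the synopsis is a linear spline. Each executed range query [a, b] contributes an observed result size, and (alpha, beta) are fitted by ordinary least squares. This is a simplified, hypothetical sketch and not the paper's algorithm: the per-bucket linear density, the least-squares criterion, and the function names are our assumptions, and the paper's splines may impose additional constraints (for example continuity across buckets).

```python
def range_count(alpha, beta, a, b):
    """Tuples in [a, b] under the linear density f(x) = alpha + beta * x."""
    return alpha * (b - a) + beta * (b * b - a * a) / 2.0

def fit_linear_density(feedback):
    """Least-squares fit of (alpha, beta) to feedback tuples (a, b, observed_count)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for a, b, count in feedback:
        f1 = b - a                  # coefficient of alpha in range_count
        f2 = (b * b - a * a) / 2.0  # coefficient of beta in range_count
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        t1 += f1 * count
        t2 += f2 * count
    # Solve the 2 x 2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("feedback ranges do not determine both parameters")
    alpha = (t1 * s22 - s12 * t2) / det
    beta = (s11 * t2 - s12 * t1) / det
    return alpha, beta

# Hypothetical feedback: observed result sizes of four executed range queries.
true_alpha, true_beta = 2.0, 0.5
ranges = [(0.0, 10.0), (2.0, 5.0), (4.0, 9.0), (1.0, 3.0)]
feedback = [(a, b, range_count(true_alpha, true_beta, a, b)) for a, b in ranges]
alpha, beta = fit_linear_density(feedback)
print(round(alpha, 6), round(beta, 6))  # 2.0 0.5
print(range_count(alpha, beta, 6.0, 8.0))  # estimate for a new, unseen range
```

With noise-free feedback the fit recovers the generating parameters; with real, noisy feedback the least-squares fit instead averages over the observed queries, which is where workload bias enters.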
A Comparison of Selectivity Estimators for Range Queries on Metric Attributes
- In Proceedings of the ACM SIGMOD Conference, 1999
Cited by 23 (1 self)
Abstract
In this paper, we present a comparison of nonparametric estimation methods for computing approximations of the selectivities of queries, in particular range queries. In contrast to previous studies, the focus of our comparison is on metric attributes with large domains, which occur, for example, in spatial and temporal databases. We also assume that only small sample sets of the required relations are available for estimating the selectivity. In addition to the popular histogram estimators, our comparison includes so-called kernel estimation methods. Although these methods have been proven to be among the most accurate estimators known in statistics, they have so far not been considered for the selectivity estimation of database queries. We first show how to generate kernel estimators that deliver accurate approximate selectivities of queries. Thereafter, we reveal that two parameters, the number of samples and the so-called smoothing parameter, are important for the accuracy of both kernel estimators and histogram estimators. For histogram estimators, the smoothing parameter determines the number of bins (histogram classes). We also present the optimal smoothing parameter as a function of the number of samples and show how to compute approximations of the optimal parameter. Moreover, we propose a new selectivity estimator that can be viewed as a hybrid of histogram and kernel estimators. Experimental results show the performance of different estimators in practice. We found in our experiments that kernel estimators are most efficient for continuously distributed data sets, whereas for our real data sets the hybrid technique is most promising.
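The two estimator families compared above can be sketched side by side for a one-dimensional range query [a, b] computed from a small sample. The kernel estimator places an Epanechnikov kernel of bandwidth h (the smoothing parameter) on each sample and integrates it over the range in closed form; the histogram estimator uses equi-width bins (the number of bins plays the role of the smoothing parameter) and assumes uniformity within each bin. Our assumptions: the Epanechnikov kernel, the normal-reference rule of thumb h = 2.345 * sigma * n^(-1/5) for choosing h, the synthetic Gaussian sample, and all function names; the paper's own bandwidth computation may differ.

```python
import random
import statistics

def epanechnikov_cdf(u):
    """CDF of the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    u = max(-1.0, min(1.0, u))
    return 0.5 + 0.75 * u - 0.25 * u ** 3

def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth for the Epanechnikov kernel (an assumption here)."""
    return 2.345 * statistics.stdev(samples) * len(samples) ** (-0.2)

def kernel_selectivity(samples, a, b, h):
    """Fraction of kernel mass, averaged over samples, that falls inside [a, b]."""
    total = sum(epanechnikov_cdf((b - s) / h) - epanechnikov_cdf((a - s) / h) for s in samples)
    return total / len(samples)

def histogram_selectivity(samples, a, b, lo, hi, bins):
    """Equi-width histogram over [lo, hi] with uniformity assumed within each bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:  # samples are assumed to lie inside [lo, hi]
        counts[min(int((s - lo) / width), bins - 1)] += 1
    sel = 0.0
    for k, c in enumerate(counts):
        left, right = lo + k * width, lo + (k + 1) * width
        overlap = max(0.0, min(b, right) - max(a, left))
        sel += c * overlap / width
    return sel / len(samples)

# Hypothetical sample of 200 values from a metric attribute with domain [0, 1000].
rng = random.Random(7)
samples = [min(1000.0, max(0.0, rng.gauss(500.0, 100.0))) for _ in range(200)]
h = rule_of_thumb_bandwidth(samples)
kernel_sel = kernel_selectivity(samples, 400.0, 600.0, h)
hist_sel = histogram_selectivity(samples, 400.0, 600.0, 0.0, 1000.0, bins=10)
print(round(h, 1), round(kernel_sel, 3), round(hist_sel, 3))
```

Both estimators are driven by the same two quantities the abstract highlights: the sample size and a smoothing parameter (h for the kernel estimator, the bin count for the histogram).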
Dynamic Query Re-Optimization
- In SSDBM, 1999
Cited by 20 (1 self)
Abstract
Very long-running queries in database systems are not uncommon in non-traditional application domains such as image processing or data warehousing analysis. Query optimization is therefore important. However, estimates of the query characteristics made before query execution are usually inaccurate. Further, system configuration and resource availability may change during a long evaluation period. As a result, queries are often evaluated with sub-optimal plan configurations. To remedy this situation, we have designed a novel approach to re-optimize sub-optimal query plan configurations on-the-fly with Conquest, an extensible and distributed query processing system. A dynamic optimizer considers reconfiguration cost as well as execution cost in determining the best query plan configuration. Experimental results are presented.
1 Introduction
Parallelism is important in today's database query processing. Very long-running queries require parallel processing to deliver reasonable performance ...
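The trade-off in the abstract above, where a dynamic optimizer weighs reconfiguration cost against execution cost, reduces in its simplest form to switching plans only when the cost of switching plus the cost of finishing under the new configuration is lower than the cost of finishing under the current one. The abstract does not describe Conquest's actual cost model, so the additive model, the cost fields, and the names below are hypothetical and serve only to illustrate that decision rule.

```python
from dataclasses import dataclass

@dataclass
class PlanOption:
    """A candidate plan configuration (hypothetical cost fields, arbitrary units)."""
    name: str
    remaining_cost: float  # estimated cost to finish the query under this configuration
    reconfig_cost: float   # one-time cost to switch to it (0.0 for the running plan)

def choose_plan(options):
    """Pick the configuration with the lowest reconfiguration-plus-execution cost."""
    return min(options, key=lambda p: p.reconfig_cost + p.remaining_cost)

current = PlanOption("current", remaining_cost=100.0, reconfig_cost=0.0)
cheap_switch = PlanOption("repartitioned", remaining_cost=60.0, reconfig_cost=30.0)
costly_switch = PlanOption("rebuilt", remaining_cost=80.0, reconfig_cost=30.0)
print(choose_plan([current, cheap_switch]).name)   # repartitioned (90.0 < 100.0)
print(choose_plan([current, costly_switch]).name)  # current (100.0 < 110.0)
```

Ignoring the reconfiguration term would pick "rebuilt" in the second case even though switching to it costs more overall, which is why the optimizer has to consider both costs.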
Self-tuning database systems: A decade of progress
- In VLDB, 2007
Cited by 18 (0 self)
Abstract
In this paper we discuss advances in self-tuning database systems over the past decade, based on our experience in the AutoAdmin project at Microsoft Research. This paper primarily focuses on the problem of automated physical database design. We also highlight other areas where research on self-tuning database technology has made significant progress. We conclude with our thoughts on opportunities and open issues.
1. HISTORY OF AUTOADMIN PROJECT
Our VLDB 1997 paper [26] reported our first technical results from the AutoAdmin project, which was started at Microsoft Research in the summer of 1996. The SQL Server product group at that time had taken on the ambitious task of redesigning the SQL Server code for their next release (SQL Server 7.0). Ease of use and elimination of knobs was a driving force for their design ...

