Results 1 - 3 of 3
Lightweight graphical models for selectivity estimation without independence assumptions
- PVLDB, 2011
Cited by 1 (0 self)
As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintain one-dimensional statistical summaries and make the attribute value independence and join uniformity assumptions for efficiently estimating selectivities. Therefore, selectivity estimation errors in today’s optimizers are frequently caused by missed correlations between attributes. We present a selectivity estimation approach that does not make the independence assumptions. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution of all the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL’s query optimizer. Experimental results indicate an order of magnitude better selectivity estimates, while keeping optimization time in the range of tens of milliseconds.
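To make the abstract's central point concrete, the following is a minimal sketch, not the paper's implementation, of how the attribute value independence assumption can misestimate the selectivity of a conjunctive predicate, and how a two-dimensional summary avoids the error. The toy table, the attribute names (make, model), and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy table with two strongly correlated attributes (make, model).
rows = (
    [("Honda", "Civic")] * 40
    + [("Honda", "Accord")] * 10
    + [("Toyota", "Corolla")] * 50
)
n = len(rows)

# One-dimensional summaries: one histogram per attribute.
make_hist = Counter(make for make, _ in rows)
model_hist = Counter(model for _, model in rows)

# Two-dimensional summary: a joint histogram over the attribute pair.
joint_hist = Counter(rows)


def independent_estimate(make, model):
    """Selectivity of make AND model, assuming the attributes are independent."""
    return (make_hist[make] / n) * (model_hist[model] / n)


def joint_estimate(make, model):
    """Selectivity of make AND model, read from the two-dimensional summary."""
    return joint_hist[(make, model)] / n


ind = independent_estimate("Honda", "Civic")
joint = joint_estimate("Honda", "Civic")
print(f"independence assumption: {ind:.2f}, joint summary: {joint:.2f}")
```

In this toy data every Civic is a Honda, so the product of the two marginal selectivities underestimates the true fraction of matching rows by a factor of two. The joint histogram captures the correlation exactly.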
Partitioning Techniques for Fine-grained Indexing
Many data-intensive websites use databases that grow much faster than the rate that users access the data. Such growing datasets lead to ever-increasing space and performance overheads for maintaining and accessing indexes. Furthermore, there is often considerable skew with popular users and recent data accessed much more frequently. These observations led us to design Shinobi, a system which uses horizontal partitioning as a mechanism for improving query performance to cluster the physical data, and increasing insert performance by only indexing data that is frequently accessed. We present database design algorithms that optimally partition tables, drop indexes from partitions that are infrequently queried, and maintain these partitions as workloads change. We show a 60× performance improvement over traditionally indexed tables using a real-world query workload derived from a traffic monitoring application.
Data Generation for Application-Specific Benchmarking
The Transaction Processing Council (TPC) has played a pivotal role in the database industry’s growth over the last twenty-five years. However, its handful of domain-specific benchmarks are increasingly irrelevant to the multitude of data-centric applications, and its top-down process is slow. This mismatch calls for a paradigm shift to a bottom-up community effort to develop tools for application-specific benchmarking. Such a development program would center around techniques for synthetically scaling (up or down) an empirical dataset. This engineering effort in turn requires the development of a database theory on attribute value correlation.

