## Workload-based wavelet synopses (2003)

Citations: 7 (3 self)

### BibTeX

@TECHREPORT{Matias03workload-basedwavelet,
  author = {Yossi Matias and Leon Portman},
  title = {Workload-based wavelet synopses},
  institution = {},
  year = {2003}
}

### Abstract

This paper introduces workload-based wavelet synopses, which exploit query workload information to significantly boost accuracy in approximate query processing. We show that wavelet synopses can adapt effectively to workload information, and that they have significant advantages over previous approaches. An important aspect of our approach is optimizing synopsis construction toward error metrics defined by workload information, rather than toward uniform metrics. We present an adaptive greedy algorithm that is simple and efficient. Its run time is competitive with that of previous, non-workload-based algorithms, and it constructs workload-based wavelet synopses that are significantly more accurate than previous synopses. The algorithm also obtains improved accuracy in the non-workload case when the error metric is the mean relative error. We also present a self-tuning algorithm that adapts the workload-based synopses to changes in the workload. All algorithms are extended to workload-based multidimensional wavelet synopses, with improved performance over previous algorithms. Experimental results demonstrate the effectiveness of workload-based wavelet synopses for different types of data sets and query workloads, and show significant improvement in accuracy even with very small training sets.
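
As a rough illustration of the idea in the abstract, the following Python sketch builds a Haar wavelet synopsis by greedy backward elimination under a workload-weighted squared error. It is not the paper's adaptive greedy algorithm: the function names (`haar_decompose`, `haar_reconstruct`, `weighted_error`, `greedy_synopsis`), the per-position weights standing in for a workload of point queries, and the naive re-evaluation loop are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def haar_decompose(data: Sequence[float]) -> List[float]:
    """Unnormalized Haar transform (pairwise averages and half-differences).

    Assumes len(data) is a power of two. Returns [overall average, details...],
    ordered from the coarsest detail level to the finest.
    """
    current = [float(x) for x in data]
    details: List[float] = []
    while len(current) > 1:
        avgs = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        diffs = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        details = diffs + details  # coarser levels end up in front
        current = avgs
    return current + details


def haar_reconstruct(coeffs: Sequence[float]) -> List[float]:
    """Inverse of haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        nxt: List[float] = []
        for avg, diff in zip(current, diffs):
            nxt.extend([avg + diff, avg - diff])
        pos += len(current)
        current = nxt
    return current


def weighted_error(data: Sequence[float], approx: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Hypothetical workload metric: weights[i] models how often position i is queried."""
    return sum(w * (d - a) ** 2 for d, a, w in zip(data, approx, weights))


def greedy_synopsis(data: Sequence[float], weights: Sequence[float],
                    budget: int) -> Dict[int, float]:
    """Repeatedly drop the coefficient whose removal hurts the weighted error least.

    Naive illustration only: every candidate removal triggers a full reconstruction.
    Ties are broken in favor of the lowest coefficient index.
    """
    current = haar_decompose(data)
    kept = set(range(len(current)))
    while len(kept) > budget:
        best_idx, best_err = -1, float("inf")
        for idx in sorted(kept):
            trial = list(current)
            trial[idx] = 0.0
            err = weighted_error(data, haar_reconstruct(trial), weights)
            if err < best_err:
                best_idx, best_err = idx, err
        current[best_idx] = 0.0
        kept.remove(best_idx)
    return {i: current[i] for i in sorted(kept)}
```

With all of the workload weight placed on position 5 of the sample data `[2, 2, 0, 2, 3, 5, 4, 4]` and a budget of three coefficients, this sketch keeps the overall average and the two nonzero detail coefficients that contribute to that position, so the queried value is reconstructed exactly. A practical implementation would avoid the repeated full reconstructions, which make this version expensive on large inputs.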

### Citations

700 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context ...research in data synopses, including random samples, various histograms, and wavelet synopses. Precomputed samples are based on the use of random samples as synopses for large data sets (see, e.g., [13, 3, 1, 2, 8, 7] and references therein). Histograms are commonly used in database applications to capture the distribution of the data stored in a database relation, and are used to guide selectivity estimation as w...

236 | Improved Histograms for Selectivity Estimation of Range Predicates
- Poosala, Ioannidis, et al.
- 1996
Citation Context ...e and some have been deployed in commercial RDBMSs. Some examples are equi-depth histograms (e.g., [1]), compressed histograms, v-optimal histograms, and maxdiff histograms; see the taxonomy given in [25]. Recently, wavelet-based histograms were introduced as a means for improved histogram accuracy [21, 30]. Wavelets are a mathematical tool for hierarchical decomposition of functions, whose adaptive n...

210 | Wavelet-Based Histograms for Selectivity Estimation
- Matias, Vitter, et al.
- 1998
Citation Context ...1]), compressed histograms, v-optimal histograms, and maxdiff histograms; see the taxonomy given in [25]. Recently, wavelet-based histograms were introduced as a means for improved histogram accuracy [21, 30]. Wavelets are a mathematical tool for hierarchical decomposition of functions, whose adaptive nature makes them a good candidate for a “lossy” data representation. Wavelets represent functions in terms ...

205 | New sampling-based summary statistics for improving approximate query answers
- Gibbons, Matias
- 1998
Citation Context ...research in data synopses, including random samples, various histograms, and wavelet synopses. Precomputed samples are based on the use of random samples as synopses for large data sets (see, e.g., [13, 3, 1, 2, 8, 7] and references therein). Histograms are commonly used in database applications to capture the distribution of the data stored in a database relation, and are used to guide selectivity estimation as w...

175 | Approximate Query Processing Using Wavelets
- Chakrabarti, Garofalakis, et al.
- 2000
Citation Context ...to the nature of the underlying data sets. The use of wavelet-based synopses in databases has drawn increasing attention, with recent works on OLAP applications [29, 28], approximate query processing [6], probabilistic wavelet synopses [11], and extensions to multiple measures [9]. While the above works deal primarily with building data synopses for given data sets, they do not consider the query dis...

170 | Approximate Computation of Multidimensional Aggregates on Sparse Data Using Wavelets
- Vitter, Wang
- 1999
Citation Context ...tograms by adapting their construction to the nature of the underlying data sets. The use of wavelet-based synopses in databases has drawn increasing attention, with recent works on OLAP applications [29, 28], approximate query processing [6], probabilistic wavelet synopses [11], and extensions to multiple measures [9]. While the above works deal primarily with building data synopses for given data sets, ...

143 | Join Synopses for Approximate Query Answering
- Acharya, Gibbons, et al.
- 1999

131 | Self-tuning histograms: Building histograms without looking at data
- Aboulnaga, Chaudhuri
- 1999
Citation Context ...research in data synopses, including random samples, various histograms, and wavelet synopses. Precomputed samples are based on the use of random samples as synopses for large data sets (see, e.g., [13, 3, 1, 2, 8, 7] and references therein). Histograms are commonly used in database applications to capture the distribution of the data stored in a database relation, and are used to guide selectivity estimation as w...

108 | Synopsis data structures for massive data sets
- Gibbons, Matias
- 1999
Citation Context ...eans to address performance issues in massive data sets. Data synopses are concise representations of data sets that are meant to effectively support approximate queries to the represented data sets [14]. A primary constraint of a data synopsis is its size, and its effectiveness is measured by the accuracy of the answers it provides, and by its build time and response time. Several different synopses...

105 | STHoles: a multidimensional workload-aware histogram
- Bruno, Chaudhuri, et al.
Citation Context ...rimarily with building data synopses for given data sets, they do not consider the query distribution with which the synopses will be used. In typical real-world scenarios, queries are heavily biased [10, 1, 5]. There have been several recent works that deal with workload-based samples and basic histograms [1, 5]. As argued in [8], identifying an appropriate precomputed sample that avoids large errors for a...

92 | Data Cube Approximation and Histograms via Wavelets
- Vitter, Wang, et al.
- 1998
Citation Context ...tograms by adapting their construction to the nature of the underlying data sets. The use of wavelet-based synopses in databases has drawn increasing attention, with recent works on OLAP applications [29, 28], approximate query processing [6], probabilistic wavelet synopses [11], and extensions to multiple measures [9]. While the above works deal primarily with building data synopses for given data sets, ...

91 | Wavelets for computer graphics: a primer, part 2
- Stollnitz, DeRose, et al.
- 1995
Citation Context ...ctions, whose adaptive nature makes them a good candidate for a “lossy” data representation. Wavelets represent functions in terms of a coarse overall shape, plus details that range from broad to narrow [27], thus offering an elegant technique for representing the various levels of function details in a space-efficient manner. Wavelet-based histograms exhibit significant improvement over basic histograms...

83 | Dynamic maintenance of wavelet-based histograms
- Matias, Vitter, et al.
- 2000
Citation Context ...rithms through extensive experimentation using synthetic and real-life data sets and a variety of synthetic query workloads. Two natural extensions are to account for dynamic updates to the data sets [22] and to employ probabilistic methods [11]. The algorithms proposed in this paper are formulated with respect to a model that defines a workload-based error metric. This model presents important theoretical and o...

69 | Wavelet synopses with error guarantees
- Garofalakis, Gibbons
- 2004
Citation Context ... sets. The use of wavelet-based synopses in databases has drawn increasing attention, with recent works on OLAP applications [29, 28], approximate query processing [6], probabilistic wavelet synopses [11], and extensions to multiple measures [9]. While the above works deal primarily with building data synopses for given data sets, they do not consider the query distribution with which the synopses wil...

48 | ICICLES: Self-Tuning Samples for Approximate Query Answering
- Ganti, Lee, et al.
Citation Context ...rimarily with building data synopses for given data sets, they do not consider the query distribution with which the synopses will be used. In typical real-world scenarios, queries are heavily biased [10, 1, 5]. There have been several recent works that deal with workload-based samples and basic histograms [1, 5]. As argued in [8], identifying an appropriate precomputed sample that avoids large errors for a...

44 | A robust, optimization-based approach for approximate answering of aggregate queries
- Chaudhuri, Das, et al.
- 2001

43 | Overcoming limitations of sampling for aggregation queries
- Chaudhuri, Das, et al.
- 1999

41 | Dynamic Sample Selection for Approximate Query Processing
- Babcock, Chaudhuri, et al.
- 2003
Citation Context ...orkload. Chaudhuri et al. [8] formulate the problem of precomputing a sample as an optimization problem, whose goal is to select a sample that minimizes the error for the given workload. Babcock et al. [4] argue that for many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample. They also point out that optimal type o...

37 | Extended wavelets for multiple measures
- Deligiannakis, Roussopoulos
- 2003
Citation Context ...n databases has drawn increasing attention, with recent works on OLAP applications [29, 28], approximate query processing [6], probabilistic wavelet synopses [11], and extensions to multiple measures [9]. While the above works deal primarily with building data synopses for given data sets, they do not consider the query distribution with which the synopses will be used. In typical real-world scenario...

36 | Deterministic wavelet thresholding for maximum error metric
- Garofalakis, Kumar
- 2004
Citation Context ...l for range queries. Indeed, while the basic greedy-based synopsis is known to be optimal for point queries in the (non-workload) MSE metric, it is not optimal for range queries. Similarly, while [12] presented a synopsis that is optimal in the MAXRE metric for point queries, it was shown in [18] that this synopsis could be significantly less accurate than the synopses built using our adaptive algorith...

15 | Optimal workload-based weighted wavelet synopses
- Matias, Urieli
- 2007
Citation Context ...ot the case is when the Haar basis is orthonormal with respect to the inner product defined by the error metric of interest, where Parseval’s theorem implies that the greedy heuristic is optimal (see [20, 19] for an elaborate discussion). This is the case for the MSE metric and the uniform workload case, but is not the case for either other workloads or other error metrics. In the next section we address...

12 | Approximate data structures with applications
- Matias, Vitter, et al.
- 1994
Citation Context ... coefficient to be selected has the error contribution that is within a small ɛ approximation to the minimal coefficient at that stage. This can be implemented by using an approximated priority queue [23, 17], for which every operation takes constant, or near constant time (for very small ɛ). The notion of approximation is as follows [23]: the operations are guaranteed to be consistent with the behavior o...

12 | Subquadratic algorithms for workload-aware Haar wavelet synopses
- Muthukrishnan
Citation Context ...e the introduction of workload-based wavelet synopses in an earlier version of this work [15, 26] there has been some interesting progress in the research regarding workload-based wavelet synopses [20, 24]. In [20] an efficient algorithm is presented for finding a weighted Haar basis and a synopsis that is optimal with respect to that basis, for a given workload of point queries. More recently, an opti...

7 | τ-synopses: a system for run-time management of remote synopses
- Matias, Portman
- 2004
Citation Context ... used for experimental results in previous methods and can provide a good reference point for comparison. The framework. All experimental results in this paper are obtained from the τ-Synopses system [16]. τ-Synopses is a system designed to provide a run-time environment for remote execution of various synopses. It enables easy registration of new synopses from remote platforms, after which the system...

7 | Performance evaluation of approximate priority queues. Presented at DIMACS Fifth Implementation Challenge: Priority Queues, Dictionaries, and Point Sets, organized by
- Matias, Sahinalp, et al.
- 1996
Citation Context ... coefficient to be selected has the error contribution that is within a small ɛ approximation to the minimal coefficient at that stage. This can be implemented by using an approximated priority queue [23, 17], for which every operation takes constant, or near constant time (for very small ɛ). The notion of approximation is as follows [23]: the operations are guaranteed to be consistent with the behavior o...

5 | Approximation and Learning Techniques in Database Systems
- Wang
- 1999
Citation Context ...1]), compressed histograms, v-optimal histograms, and maxdiff histograms; see the taxonomy given in [25]. Recently, wavelet-based histograms were introduced as a means for improved histogram accuracy [21, 30]. Wavelets are a mathematical tool for hierarchical decomposition of functions, whose adaptive nature makes them a good candidate for a “lossy” data representation. Wavelets represent functions in terms ...

3 | On the optimality of the greedy heuristic in wavelet synopses for range queries
- Matias, Urieli
- 2005
Citation Context ...ot the case is when the Haar basis is orthonormal with respect to the inner product defined by the error metric of interest, where Parseval’s theorem implies that the greedy heuristic is optimal (see [20, 19] for an elaborate discussion). This is the case for the MSE metric and the uniform workload case, but is not the case for either other workloads or other error metrics. In the next section we address...

1 | Improved implementation and experimental evaluation of the max-error optimized wavelet synopses
- Matias, Urieli
- 2004
Citation Context ... point queries in the (non-workload) MSE metric, it is not optimal for range queries. Similarly, while [12] presented a synopsis that is optimal in the MAXRE metric for point queries, it was shown in [18] that this synopsis could be significantly less accurate than the synopses built using our adaptive algorithm in the MAXRE metric, when tested for range queries. Thus, the main open problem remains to find...
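
Several of the citation contexts above note that, for the MSE metric under a uniform workload, Parseval's theorem makes the greedy choice of coefficients optimal when the Haar basis is orthonormal. The Python sketch below checks the underlying identity numerically: with an orthonormal Haar transform, the squared reconstruction error equals the sum of squares of the dropped coefficients, so keeping the largest-magnitude coefficients minimizes it. The function names and the sample data are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence

SQRT2 = math.sqrt(2.0)


def orthonormal_haar(data: Sequence[float]) -> List[float]:
    """Orthonormal Haar transform; assumes len(data) is a power of two."""
    current = [float(x) for x in data]
    details: List[float] = []
    while len(current) > 1:
        sums = [(current[i] + current[i + 1]) / SQRT2 for i in range(0, len(current), 2)]
        diffs = [(current[i] - current[i + 1]) / SQRT2 for i in range(0, len(current), 2)]
        details = diffs + details  # coarser levels end up in front
        current = sums
    return current + details


def inverse_orthonormal_haar(coeffs: Sequence[float]) -> List[float]:
    """Inverse of orthonormal_haar."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        nxt: List[float] = []
        for s, d in zip(current, diffs):
            nxt.extend([(s + d) / SQRT2, (s - d) / SQRT2])
        pos += len(current)
        current = nxt
    return current


def keep_largest(coeffs: Sequence[float], budget: int) -> List[float]:
    """Zero out all but the `budget` largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Comparing `squared_error(data, inverse_orthonormal_haar(truncated))` with `squared_error(coeffs, truncated)` for any truncation shows the two agree up to floating-point rounding. The identity does not carry over to weighted workloads or to other error metrics, which is the gap the cited contexts describe.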