Abstract:
At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, such as combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performanc...
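The idea that a CUBE is the union of group-bys over every subset of the attributes, and that one group-by can be computed from an already-computed one, can be illustrated with a small sketch. This is a simplified hash-based illustration, not the paper's algorithms: it assumes rows given as Python dicts, a distributive aggregate (SUM), and a hypothetical helper named `cube`. The raw data is scanned once to build the finest group-by; every coarser group-by is then re-aggregated from an already-computed parent that has exactly one more attribute, choosing the parent with the fewest groups.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Hypothetical helper: compute SUM(measure) for every subset of dims.

    Returns {grouping attributes (tuple): {group key (tuple): sum}}.
    Assumes SUM, which is distributive, so partial sums can be re-summed.
    """
    results = {}

    # Finest group-by (all dimensions): one hash-based pass over the raw rows.
    top = tuple(dims)
    agg = defaultdict(int)
    for r in rows:
        agg[tuple(r[d] for d in top)] += r[measure]
    results[top] = dict(agg)

    # Walk the group-by hierarchy from k = n-1 attributes down to k = 0.
    for k in range(len(dims) - 1, -1, -1):
        for attrs in combinations(dims, k):
            # Candidate parents: computed group-bys with exactly one extra attribute.
            parents = [p for p in results
                       if len(p) == k + 1 and set(attrs) <= set(p)]
            # Re-aggregate from the parent with the fewest groups (smallest input).
            parent = min(parents, key=lambda p: len(results[p]))
            positions = [parent.index(a) for a in attrs]
            agg = defaultdict(int)
            for key, total in results[parent].items():
                agg[tuple(key[i] for i in positions)] += total
            results[attrs] = dict(agg)
    return results


sales = [
    {"product": "p1", "store": "s1", "year": 1996, "amount": 10},
    {"product": "p1", "store": "s2", "year": 1996, "amount": 5},
    {"product": "p2", "store": "s1", "year": 1995, "amount": 7},
]
cube_result = cube(sales, ["product", "store", "year"], "amount")
print(cube_result[("product",)])  # {('p1',): 15, ('p2',): 7}
print(cube_result[()])            # {(): 22}
print(len(cube_result))           # 8
```

With three attributes the CUBE contains 2^3 = 8 group-bys. Here `("product",)` is computed from the two-group parent `("product", "year")` rather than from the raw rows. Re-aggregating from a parent works for distributive aggregates such as SUM, COUNT, MIN and MAX; a holistic aggregate such as the median cannot be combined from partial results this way.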