## On the computation of multidimensional aggregates (1996)


Venue: | IN PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES |

Citations: | 218 - 18 self |

### BibTeX

@INPROCEEDINGS{Agarwal96onthe,

author = {Sameet Agarwal and Rakesh Agrawal and Prasad M. Deshpande and Ashish Gupta and Jeffrey F. Naughton and Raghu Ramakrishnan and Sunita Sarawagi},

title = {On the computation of multidimensional aggregates},

booktitle = {IN PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES},

year = {1996},

pages = {506--521},

publisher = {}

}


### Abstract

At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem -- computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods. This paper combines work done concurrently on computing the data cube by two different teams as reported in [SAG96] and [DANR96].
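To make the abstract's description concrete, here is a minimal Python sketch of the CUBE as a union of group-bys, together with the idea of deriving one group-by from another that has already been computed. It is not the paper's PipeSort or PipeHash algorithm. The function names (`group_by_sum`, `naive_cube`, `child_from_parent`), the toy `sales` rows, and the choice of `sum` as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def group_by_sum(rows, attrs, measure):
    """Sum `measure` over rows grouped by the tuple of `attrs` values."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        totals[key] += row[measure]
    return dict(totals)


def naive_cube(rows, attrs, measure):
    """Straightforward method: compute each of the 2^N group-bys
    independently from the base data."""
    cube = {}
    for k in range(len(attrs), -1, -1):
        for subset in combinations(attrs, k):
            cube[subset] = group_by_sum(rows, subset, measure)
    return cube


def child_from_parent(parent_attrs, parent_result, child_attrs):
    """Derive a coarser group-by from a finer one that is already computed.

    This works for sum because partial sums can be combined; it would not
    work for an aggregate such as median.
    """
    positions = [parent_attrs.index(a) for a in child_attrs]
    totals = defaultdict(int)
    for key, value in parent_result.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)


# Hypothetical base data: one row per sale.
sales = [
    {"product": "p1", "date": "d1", "customer": "c1", "sales": 10},
    {"product": "p1", "date": "d2", "customer": "c1", "sales": 5},
    {"product": "p2", "date": "d1", "customer": "c2", "sales": 7},
    {"product": "p2", "date": "d1", "customer": "c1", "sales": 3},
]
attrs = ("product", "date", "customer")
cube = naive_cube(sales, attrs, "sales")

# The (product) group-by computed from the smaller (product, date) result
# matches the one computed from the base data.
by_product = child_from_parent(
    ("product", "date"), cube[("product", "date")], ("product",)
)
```

With three attributes the cube contains 2^3 = 8 group-bys, including the empty group-by that holds the grand total. The `child_from_parent` step is the kind of reuse the abstract refers to: the parent result usually has far fewer rows than the base data, so aggregating it is cheaper than rescanning the input.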

### Citations

2365 |
An introduction to probability theory and its applications. Vol. II. Second edition
- Feller
- 1971
Citation Context ... distinct values. A number of statistical procedures (e.g., [HNSS95]) can be used for this purpose. The input to the algorithm is the search lattice defined as follows. Search Lattice A search lattice [HRU96] for a data cube is a graph where a vertex represents a group-by of the cube. A directed edge connects group-by i to group-by j whenever j can be generated from i and j has exactly one attribute less ... |

1746 | An Introduction to Probability Theory - Feller - 1966 |

1372 |
Combinatorial Optimization: algorithms and complexity
- Papadimitriou, Steiglitz
- 1982
Citation Context ...N - 1, where N is the total number of attributes. For each level k, it finds the best way of computing level k from level k + 1 by reducing the problem to a weighted bipartite matching problem [PS82] as follows. Footnote 2: The weighted bipartite matching problem is defined as follows: We are given a graph with two disjoint sets of vertices V1 and V2 and a set of edges E that connect vertices in set V1 ... |

699 | Query evaluation techniques for large databases
- Graefe
- 1993
Citation Context ...e some aggregate functions (holistic functions of [GBLP96]) e.g., median, that cannot be computed in parts and combined. Related Work Methods of computing single group-bys have been well-studied (see [Gra93] for a survey), but little work has been done on optimizing a collection of related aggregates. [GBLP96] gives some rules of thumb to be used in an efficient implementation of the cube operator. These... |

503 | Implementing data cube efficiently
- Harinarayan, Rajaraman, et al.
- 1996
Citation Context ...distinct values. A number of statistical procedures (e.g., [HNSS95]) can be used for this purpose. The input to the algorithm is the search lattice defined as follows. Search Lattice A search lattice [HRU96] for a data cube is a graph where a vertex represents a group-by of the cube. A directed edge connects group-by i to group-by j whenever j can be generated from i and j has exactly one attribute less ... |

123 | Sampling-based estimation of the number of distinct values of an attribute
- Haas, Naughton, et al.
- 1995
Citation Context ...only one tuple per group-by in the pipeline in memory. Algorithm PipeSort Assume that for each group-by we have an estimate of the number of distinct values. A number of statistical procedures (e.g., [HNSS95]) can be used for this purpose. The input to the algorithm is the search lattice defined as follows. Search Lattice A search lattice [HRU96] for a data cube is a graph where a vertex represents a grou... |

71 | Abbadi. The dynamic data cube
- Geffner, Agrawal, et al.
Citation Context ... the resulting algorithms give much better performance compared to straightforward methods. This paper combines work done concurrently on computing the data cube by two different teams as reported in [SAG96] and [DANR96]. 1 Introduction The group-by operator in SQL is typically used to compute aggregates on a set of attributes. For busi... |

68 |
Statistical databases: Characteristics, problems and some solutions
- Shoshani
- 1982
Citation Context ...oup-bys to pre-compute and index; [SR96] and [JS96] discuss methods for indexing pre-computed summaries to allow efficient querying. Aggregate pre-computation is quite common in statistical databases [Sho82]. Research in this area has considered various aspects of the problem starting from developing a model for aggregate computation [CM89], indexing pre-computed aggregates [STL89] and incrementally main... |

50 | Naughton: Adaptive Parallel Aggregation Algorithms - Shatdal, Jeffrey - 1995 |

41 |
Data cube: A relational operator generalizing group-by, cross-tab and sub-totals
- Gray, Bosworth, et al.
- 1996
Citation Context ...ttributes. Speed is critical for this precomputation as well, since the cost and speed of precomputation influences how frequently the aggregates are brought up-to-date. 1.1 What is a CUBE? Recently, [GBLP96] introduced the CUBE operator for conveniently supporting multiple aggregates in OLAP databases. The CUBE operator is the n-dimensional generalization of the group-by operator. It computes group-bys co... |

26 | Hierarchically Split Cube Forests for Decision Support: Description and Tuned Design, working paper
- Johnson, Shasha
- 1996
Citation Context ...ere are reports of on-going research related to the data cube in directions complementary to ours: [HRU96, GHRU96] presents algorithms for deciding what group-bys to pre-compute and index; [SR96] and [JS96] discuss methods for indexing pre-computed summaries to allow efficient querying. Aggregate pre-computation is quite common in statistical databases [Sho82]. Research in this area has considered vario... |

23 |
The Data Model and Access Method of Summary Data Management
- Chen, McNamee
- 1989
Citation Context ...regate pre-computation is quite common in statistical databases [Sho82]. Research in this area has considered various aspects of the problem starting from developing a model for aggregate computation [CM89], indexing pre-computed aggregates [STL89] and incrementally maintaining them [Mic92]. However, to the best of our knowledge, there is no published work in the statistical database literature on metho... |

17 |
Sort versus hash revisited
- Graefe, Linville, et al.
- 1994
Citation Context ...and PipeHash For the datasets in Table 1, the sort-based method performs better than the hash-based method. For Dataset-D, PipeSort is almost a factor of two better than PipeHash. Based on results in [GLS94], we had expected the hash-based method to be comparable or better than the sort-based method. Careful scrutiny of the performance data revealed that this deviation is because after some parent group-... |

16 |
TBSAM: An access method for efficient processing of statistical queries
- Srivastava, Tan, et al.
- 1989
Citation Context ... statistical databases [Sho82]. Research in this area has considered various aspects of the problem starting from developing a model for aggregate computation [CM89], indexing pre-computed aggregates [STL89] and incrementally maintaining them [Mic92]. However, to the best of our knowledge, there is no published work in the statistical database literature on methods for optimizing the computation of relat... |

13 | Statistical and Scientific Databases - Michalewicz - 1992 |

9 | Indexing for aggregation
- Salzberg, Reuter
- 1996
Citation Context ...GBLP96]. There are reports of on-going research related to the data cube in directions complementary to ours: [HRU96, GHRU96] presents algorithms for deciding what group-bys to pre-compute and index; [SR96] and [JS96] discuss methods for indexing pre-computed summaries to allow efficient querying. Aggregate pre-computation is quite common in statistical databases [Sho82]. Research in this area has consi... |

8 | Understanding the Need for On-Line Analytical Servers - Finkelstein - 1995 |

6 | Managing Multidimensional Data: Harnessing the Power - Weldon - 1995 |

4 | Statistical and Scientific Databases - Michalewicz - 1992 |

4 |
TBSAM: An access method for efficient processing of statistical queries
- Srivastava, Tan, et al.
- 1989
Citation Context ... statistical databases [Sho82]. Research in this area has considered various aspects of the problem starting from developing a model for aggregate computation [CM89], indexing pre-computed aggregates [STL89] and incrementally maintaining them [Mic92]. However, to the best of our knowledge, there is no published work in the statistical database literature on methods for optimizing the computation of relat... |

3 |
Ramakrishnan “Computation of Multidimensional Aggregates
- Agarwal, R
Citation Context ...erlap method. Computations of different cuboids are overlapped and all cuboids are computed in sorted order. In this paper we give only a short description of our method. More details can be found in [DANR96]. We first define some terms which will be used frequently. Sorted Runs: Consider a cuboid on j attributes $\{A_1, A_2, \ldots, A_j\}$. We use $(A_1, A_2, \ldots, A_j)$ to denote the cuboid sorted on... |

2 |
Providing OLAP: An
- Codd
- 1993
Citation Context ...akdown of sales by customer. • sum of sales by P: For each product, give total sales. Speed is a primary goal in this class of applications called On-Line Analytical Processing (OLAP) applications [CODD93]. To make interactive analysis (response time in seconds) possible, OLAP databases often precompute aggregates at various levels of detail and on various combinations of attributes. Speed is critical ... |

1 | Techniques for Processing of Aggregates in Relational Database Systems - Epstein - 1979 |

1 | Naughton and Karthik Ramasamy. Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies - Shukla, Deshpande, et al. - 1996 |

1 |
Rajaraman and Jeff Ullman. Implementing Data Cubes Efficiently
- Harinarayan, Anand
- 1996
Citation Context ...f distinct values. A number of statistical procedures (e.g., [HNSS95]) can be used for this purpose. The input to the algorithm is the search lattice defined as follows. Search Lattice A search lattice [HRU96] for a data cube is a graph where a vertex represents a group-by of the cube. A directed edge connects group-by i to group-by j whenever j can be generated from i and j has exactly one attribute less ... |

1 | Venky Harinarayan, Anand Rajaraman and Jeff - Gupta - 1996 |

1 | Jeffrey F. Naughton and Karthik Ramasamy. Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies - Shukla, Deshpande - 1996 |