Abstract:
A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains at each point an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage. The typical approach to reducing storage cost is to materialize parts of the cube on demand. Unfortunately, this lazy evaluation can be time-consuming. In this paper, we propose an approximation technique that reduces the storage cost of the cube without incurring the run-time cost of lazy evaluation. The idea is to characterize regions of the cube with statistical models whose descriptions take less space than the data itself. The model parameters can then be used to estimate the cube cells with a certain level of accuracy. To increase accuracy, some of the "outliers," i.e., the cells that incur the largest errors when estimated, can be retained. The storage taken by the mod...
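The model-plus-outliers idea can be illustrated with a small sketch. It assumes a two-way additive model (cell ≈ μ + row effect + column effect) fitted by least squares over one rectangular region of a 2-D cube. This model and the names `RegionModel`, `fit_region`, and `k_outliers` are hypothetical choices made for illustration and are not necessarily the statistical model the paper uses. An r×c region is summarized by 1 + r + c parameters, and the k cells with the largest absolute estimation errors are retained exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegionModel:
    """Hypothetical compact summary of one rectangular region of a 2-D cube."""
    mu: float                 # grand mean of the region
    row_eff: list[float]      # additive effect of each row
    col_eff: list[float]      # additive effect of each column
    outliers: dict[tuple[int, int], float] = field(default_factory=dict)

    def estimate(self, i: int, j: int) -> float:
        # Retained outliers are answered exactly; all other cells come from the model.
        if (i, j) in self.outliers:
            return self.outliers[(i, j)]
        return self.mu + self.row_eff[i] + self.col_eff[j]

    def stored_values(self) -> int:
        # 1 + r + c model parameters, plus (row, col, value) for each outlier.
        return 1 + len(self.row_eff) + len(self.col_eff) + 3 * len(self.outliers)


def fit_region(cells: list[list[float]], k_outliers: int) -> RegionModel:
    """Fit the additive model by least squares, then keep the k worst-estimated cells."""
    r, c = len(cells), len(cells[0])
    mu = sum(sum(row) for row in cells) / (r * c)
    # For a complete grid, least-squares row/column effects are deviations of means.
    row_eff = [sum(row) / c - mu for row in cells]
    col_eff = [sum(cells[i][j] for i in range(r)) / r - mu for j in range(c)]
    model = RegionModel(mu, row_eff, col_eff)

    # Rank every cell by absolute estimation error (computed before any outlier is stored).
    errors = sorted(
        ((abs(cells[i][j] - model.estimate(i, j)), i, j)
         for i in range(r) for j in range(c)),
        reverse=True,
    )
    for _, i, j in errors[:k_outliers]:
        model.outliers[(i, j)] = cells[i][j]
    return model


cells = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
    [30.0, 32.0, 90.0],  # 90 breaks the additive pattern
]
model = fit_region(cells, k_outliers=1)
print(sorted(model.outliers))  # [(2, 2)]
print(model.estimate(2, 2))    # 90.0
print(model.stored_values())   # 10
```

In this toy example, the cell at (2, 2) deviates most from the additive pattern and is kept exactly. A region this small saves nothing (10 stored values for 9 cells). The savings appear only when a region is large relative to its 1 + r + c parameters plus the retained outliers.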