Results 1 - 10 of 23
Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, 1996
Cited by 630 (6 self)
Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, crosstabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
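As a rough illustration of the idea in this abstract, the sketch below computes a SUM cube over a tiny in-memory table by taking the union of group-bys over every subset of the dimensions. The names `cube`, `ALL`, and the sample `sales` rows are assumptions made for this example; they are not taken from the paper, and the paper's own SQL syntax and algorithms are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical placeholder marking a dimension that has been aggregated away.
ALL = "ALL"


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; dropped dimensions show as ALL."""
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Each cell is a point in N-space; super-aggregates use ALL.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Made-up sample data, for illustration only.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
]

result = cube(sales, ["model", "year"], "units")
```

Because the result is a flat mapping from dimension tuples to aggregates, it behaves like a relation: `result[("Chevy", ALL)]` is the sub-total for one model and `result[(ALL, ALL)]` is the grand total.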
On the computation of multidimensional aggregates - In Proceedings of the International Conference on Very Large Databases, 1996
Cited by 189 (18 self)
At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods. This paper combines work done concurrently on computing the data cube by two different teams as reported in [SAG96] and [DANR96].
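One optimization named in this abstract, using a pre-computed group-by to compute another group-by, can be sketched as follows. This is a minimal hash-based illustration for SUM only, with hypothetical helper names (`group_by`, `roll_up`) and made-up data; it is not the paper's algorithm.

```python
from collections import defaultdict


def group_by(rows, dims, measure):
    """Plain hash-based SUM group-by over the raw rows."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[d] for d in dims)] += row[measure]
    return dict(out)


def roll_up(parent, parent_dims, child_dims):
    """Derive a coarser SUM group-by from a finer, already computed one."""
    idx = [parent_dims.index(d) for d in child_dims]
    out = defaultdict(int)
    for key, total in parent.items():
        out[tuple(key[i] for i in idx)] += total
    return dict(out)


# Made-up sample data, for illustration only.
facts = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
]

by_model_year = group_by(facts, ["model", "year"], "units")
# Computed from the (model, year) result instead of rescanning the raw rows.
by_model = roll_up(by_model_year, ["model", "year"], ["model"])
grand_total = roll_up(by_model, ["model"], [])
```

The shortcut works here because SUM can be combined from partial sums; the finer group-by is usually much smaller than the raw data, so the coarser ones are cheaper to derive from it.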
Power-Conserving Computation of Order-Statistics over Sensor Networks - In PODS, 2004
Cited by 54 (0 self)
We study the problem of power-conserving computation of order statistics in sensor networks. Significant power-reducing optimizations have been devised for computing simple aggregate queries such as count, average, or max over sensor networks. In contrast, aggregate queries such as median have seen little progress over the brute-force approach of forwarding all data to a central server. Moreover, battery life of current sensors seems largely determined by communication costs; we therefore aim to minimize the number of bytes transmitted. Unoptimized aggregate queries typically impose extremely high power consumption on a subset of sensors located near the server. Metrics such as total communication cost underestimate the penalty of such imbalance: network lifetime may be dominated by the worst-case replacement time for depleted batteries.
Maintaining Data Cubes under Dimension Updates, 1999
Cited by 39 (8 self)
OLAP systems support data analysis through a multidimensional data model, according to which data facts are viewed as points in a space of application-related "dimensions", organized into levels that form a hierarchy. The usual assumption is that the data points reflect the dynamic aspect of the data warehouse, while dimensions are relatively static. However, in practice, dimension updates are often necessary to adapt the multidimensional database to changing requirements. Structural updates can also take place, such as the addition of categories or modification of the hierarchical structure. When these updates are performed, the materialized aggregate views that are typically stored in OLAP systems must be efficiently maintained. These updates are poorly supported (or not supported at all) in current commercial systems, and have received little attention in the research literature. We present a formal model of dimension updates in a multidimensional model, a collection of primitive opera...
Reasoning about Summarizability in Heterogeneous Multidimensional Schemas - In IEEE ICDT, 2001
Cited by 16 (2 self)
In OLAP applications, data are modeled as points in a multidimensional space. Dimensions themselves have structure, described by a schema and an instance; the schema is basically a directed acyclic graph of granularity levels, and the instance consists of a set of elements for each level and mappings between these elements, usually called rollup functions. Current dimension models restrict dimensions in various ways; for example, rollup functions are restricted to be total. We relax these restrictions, yielding what we call heterogeneous schemas, which describe more naturally and cleanly many practical situations. In the context of heterogeneous schemas, the notion of summarizability becomes more complex. An aggregate view defined at some granularity level is summarizable from a set of precomputed views defined at other levels if the rollup functions can be used to compute the first view from the set of views. In order to study summarizability in heterogeneous schemas, ...
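To make the role of rollup functions concrete, the sketch below derives a coarser SUM view from a finer precomputed one, first with a total rollup function and then with a partial one. The levels (city, country), the function `roll_up_view`, and all data are assumptions invented for this example; the paper's formal definition of summarizability is not reproduced and may differ in detail.

```python
from collections import defaultdict


def roll_up_view(view, rollup):
    """Aggregate a SUM view to the parent level via a rollup mapping.

    Elements with no parent under `rollup` contribute nothing.
    """
    out = defaultdict(int)
    for element, value in view.items():
        parent = rollup.get(element)
        if parent is not None:
            out[parent] += value
    return dict(out)


# Hypothetical precomputed view at the city level.
sales_by_city = {"Toronto": 10, "Ottawa": 5, "Santiago": 7}

# Total rollup function: every city maps to a country.
total_rollup = {"Toronto": "Canada", "Ottawa": "Canada", "Santiago": "Chile"}
sales_by_country = roll_up_view(sales_by_city, total_rollup)

# Partial rollup function: one city has no country at the next level.
partial_rollup = {"Toronto": "Canada", "Ottawa": "Canada"}
lossy_by_country = roll_up_view(sales_by_city, partial_rollup)
```

With the total mapping, the country totals account for every city-level fact. With the partial mapping, Santiago's sales disappear from the country view, which illustrates why deriving one view from another needs more careful reasoning once rollup functions are no longer required to be total.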
Cardinality-based Inference Control in Sum-only Data Cubes - In Proceedings of the 7th European Symposium on Research in Computer Security (ESORICS 2002), 2002
Cited by 9 (4 self)
This paper deals with the inference problems in data warehouses and decision support systems such as on-line analytical processing (OLAP) systems.
Design and Implementation of On-Line Analytical Processing (OLAP) of Spatial Data, 1997
Cited by 8 (2 self)
On-line analytical processing (OLAP) has gained popularity in the database industry. With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study the methods for spatial data warehousing and on-line analytical processing of spatial data. This thesis investigates methods for spatial OLAP by integrating nonspatial OLAP methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies proposed, including approximation and partial materialization of the spatial objects resulting from spatial OLAP operations. Some techniques for selective materialization of the spatial computation results are worked out, a...
Modeling and Querying Multidimensional Databases: An Overview, 1999
Cited by 7 (2 self)
This paper presents some highlights about the concept of multidimensional database and On-Line Analytical Processing (OLAP), a technology used in the context of decision support. It mainly focuses on multidimensional data models and manipulations. We propose both an inventory and a classification of the elementary operations underlying OLAP treatments. We describe several typical complex manipulations based on these elementary operations. Throughout the paper, we present the informal concepts stemming from users' needs and the formal proposals of research works. Hence it provides an entry point in the domain of OLAP modeling and querying.
Integrating Hierarchical Navigation and Querying: A User Customizable Solution - Proceedings of ACM Workshop on Effective Abstractions in Multimedia Layout, Presentation, and Interaction, 1995
Cited by 5 (1 self)
This work is part of a joint project between IBM Almaden Research Center and the Ohio State University. The goal of the project is to facilitate the querying and browsing of multimedia databases over a wide spectrum of networks and media. In this paper, we focus on the related issues of data abstraction, navigation and presentation. We are proposing an integrated form of querying and hierarchical navigation that is similar to the methods used by data analysis systems to allow "drill down" operations on statistical data. We have applied our solution to hypertext style database publishing and developed a user customizable database visualization paradigm. The data visualization is interactive and enables users, even ones who have minimal familiarity with the database contents and schema, to easily pinpoint data in which they are interested. A pilot implementation for the project, using a web interface, is already operational. It provides access to tourist related data of the Gree...
Intelligent Support for Multidimensional Data Analysis in Environmental Epidemiology - Advances in Intelligent Data Analysis - Reasoning about Data. Second International Symposium, IDA-97, volume 1280 of Lecture Notes in Computer Science, 1997
Cited by 4 (3 self)
Within the scope of the project CARLOS (Cancer Registry Lower-Saxony), a software system, CARESS (CARLOS Epidemiological and Statistical Data Exploration System), was developed to support modeling and conducting of descriptive epidemiologic studies. The fundamental idea was to implement a powerful core of a system for statistical analysis, which is easily extensible with regard to both data types and algorithms for processing the data. We followed a knowledge-based approach, i.e., a strict separation of data and knowledge on the one hand and the control cycle processing this knowledge on the other. The main concepts concerning data structures, methods, and data processing are presented. Special emphasis is put on the underlying data analysis model and the user interface, namely a visual workbench providing easy access to the whole trail of a study and all relevant data and knowledge. CARESS aims at novel techniques for analysing cancer clustering using advanced database technolog...

