Results 1-10 of 40
Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals
, 1996
Cited by 754 (8 self)
Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and rollup operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
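As a rough illustration of the idea that a cube is itself a relation, the sketch below computes SUM over every subset of the grouping attributes and stores each result as a row keyed by attribute values. The `ALL` placeholder, the `cube` function name, and the sample `sales` rows are assumptions made for this example; this is a naive one-pass-per-subset sketch, not the efficient techniques the abstract refers to.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # assumed placeholder for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Return {key tuple: SUM(measure)} for every subset of dims.

    Each key has one slot per dimension; a slot holds either the
    attribute value or ALL when that dimension is aggregated away.
    """
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sample data.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
]
result = cube(sales, ["model", "year"], "units")
for key in sorted(result, key=str):
    print(key, result[key])
```

The rows keyed with `ALL` in one slot correspond to sub-totals, and the row keyed `(ALL, ALL)` is the grand total, so the whole result can be queried like any other table.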
Research Problems in Data Warehousing
, 1995
Cited by 265 (9 self)
The topic of data warehousing encompasses architectures, algorithms, and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis. In recent years data warehousing has become a prominent buzzword in the database industry, but attention from the database research community has been limited. In this paper we motivate the concept of a data warehouse, we outline a general data warehousing architecture, and we propose a number of technical issues arising from the architecture that we believe are suitable topics for exploratory research. 1 Introduction Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading issues in database research and industry [6]. In the research community, most approaches to the data integration problem are based on the following very general two-step process...
On the computation of multidimensional aggregates
 In Proceedings of the International Conference on Very Large Databases
, 1996
Cited by 218 (18 self)
At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using precomputed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods. This paper combines work done concurrently on computing the data cube by two different teams as reported in [SAG96] and [DANR96].
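The hierarchy of group-bys mentioned above can be sketched as follows: the finest group-by is computed once from the base rows, and every coarser group-by is then derived from an already computed group-by with one more attribute, picking the candidate with the fewest groups. This is a minimal hash-based sketch under the assumption of a distributive aggregate (SUM); the function names and sample rows are hypothetical, and it does not reproduce the paper's sort-based or hash-based algorithms.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(parent_dims, parent_table, child_dims):
    """Derive the group-by on child_dims from a finer group-by on parent_dims."""
    idx = [parent_dims.index(d) for d in child_dims]
    out = defaultdict(int)
    for key, total in parent_table.items():
        out[tuple(key[i] for i in idx)] += total
    return dict(out)


def compute_cube(rows, dims, measure):
    """Return {attribute tuple: {group key: SUM(measure)}} for all 2^n group-bys."""
    dims = tuple(dims)
    base = defaultdict(int)
    for row in rows:  # the only pass over the base data
        base[tuple(row[d] for d in dims)] += row[measure]
    results = {dims: dict(base)}
    for k in range(len(dims) - 1, -1, -1):
        for child in combinations(dims, k):
            # Candidate parents: computed group-bys with exactly one extra attribute.
            parents = [p for p in results
                       if len(p) == k + 1 and set(child) <= set(p)]
            # Reuse the parent with the fewest groups to minimise work.
            parent = min(parents, key=lambda p: len(results[p]))
            results[child] = aggregate(parent, results[parent], child)
    return results


# Hypothetical sample data.
rows = [
    {"a": 1, "b": "x", "c": "p", "m": 10},
    {"a": 1, "b": "y", "c": "p", "m": 20},
    {"a": 2, "b": "x", "c": "q", "m": 5},
]
results = compute_cube(rows, ["a", "b", "c"], "m")
```

Reusing a smaller parent only works because SUM can be recombined from partial sums; an aggregate such as MEDIAN would need the base data or extra bookkeeping.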
Power-conserving computation of order-statistics over sensor networks
 in Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
, 2004
Cited by 75 (1 self)
We study the problem of power-conserving computation of order statistics in sensor networks. Significant power-reducing optimizations have been devised for computing simple aggregate queries such as COUNT, AVERAGE, or MAX over sensor networks. In contrast, aggregate queries such as MEDIAN have seen little progress over the brute force approach of forwarding all data to a central server. Moreover, battery life of current sensors seems largely determined by communication costs; therefore, we aim to minimize the number of bytes transmitted. Unoptimized aggregate queries typically impose extremely high power consumption on a subset of sensors located near the server. Metrics such as total communication cost underestimate the penalty of such imbalance: network lifetime may be dominated by the worst-case replacement time for depleted batteries. In this paper, we design the first algorithms for computing order-statistics such that power consumption is balanced across the entire network. Our first main result is a distributed algorithm to compute an ε-approximate quantile summary of the sensor data such that each sensor transmits only O(log² n/ε) data values, irrespective of the network topology, an improvement over the current worst-case behavior of Ω(n). Second, we show an improved result when the height, h, of the network is significantly smaller than n. Our third result is that we
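The imbalance described above can be made concrete with a small sketch that counts how many values each sensor transmits under the brute-force approach of forwarding every reading to the server. The routing-tree encoding (`parent[i]` is the next hop of sensor `i`, `-1` means the sensor talks to the server directly), the one-reading-per-sensor assumption, and the function names are all assumptions for illustration; this is not the paper's quantile-summary algorithm.

```python
def brute_force_load(parent):
    """Values transmitted per sensor when all readings are forwarded to the server.

    Every reading is relayed by its origin and by each ancestor on the
    path to the server, so a sensor's load equals its subtree size.
    """
    load = [0] * len(parent)
    for sensor in range(len(parent)):
        hop = sensor
        while hop != -1:
            load[hop] += 1  # this hop transmits the reading once
            hop = parent[hop]
    return load


def in_network_load(parent):
    """Load for a decomposable aggregate such as MAX: one partial result per sensor."""
    return [1] * len(parent)


# Hypothetical chain of five sensors; sensor 0 is adjacent to the server.
chain = [-1, 0, 1, 2, 3]
forwarded = brute_force_load(chain)
merged = in_network_load(chain)
print("brute force:", forwarded, "max per sensor:", max(forwarded))
print("in-network :", merged, "max per sensor:", max(merged))
```

In the chain example the sensor next to the server transmits n values, which is the Ω(n) worst case, while the total cost alone would hide how unevenly that cost is spread.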
Maintaining Data Cubes under Dimension Updates
, 1999
Cited by 50 (9 self)
OLAP systems support data analysis through a multidimensional data model, according to which data facts are viewed as points in a space of application-related "dimensions", organized into levels which form a hierarchy. The usual assumption is that the data points reflect the dynamic aspect of the data warehouse, while dimensions are relatively static. However, in practice, dimension updates are often necessary to adapt the multidimensional database to changing requirements. Structural updates can also take place, like addition of categories or modification of the hierarchical structure. When these updates are performed, the materialized aggregate views that are typically stored in OLAP systems must be efficiently maintained. These updates are poorly supported (or not supported at all) in current commercial systems, and have received little attention in the research literature. We present a formal model of dimension updates in a multidimensional model, a collection of primitive opera...
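To illustrate what maintaining a materialized aggregate view under a dimension update can involve, the sketch below moves one product to a different category and patches the stored per-category totals instead of recomputing them. The product/category schema, the `materialize` and `reclassify` helpers, and the sample data are hypothetical; this is a simple illustration and not the paper's formal model or its primitive operators.

```python
def materialize(facts, rollup):
    """Recompute the per-category SUM view from scratch."""
    view = {category: 0 for category in set(rollup.values())}
    for product, amount in facts:
        view[rollup[product]] += amount
    return view


def reclassify(facts, rollup, view, product, new_category):
    """Move product to new_category and patch the view incrementally."""
    moved = sum(amount for p, amount in facts if p == product)
    old_category = rollup[product]
    rollup[product] = new_category
    view[old_category] -= moved
    view[new_category] = view.get(new_category, 0) + moved
    # Drop a category that no longer has any member.
    if old_category not in rollup.values():
        del view[old_category]
    return view


# Hypothetical fact table and dimension hierarchy (product -> category).
facts = [("cola", 10), ("chips", 4), ("cola", 6), ("water", 3)]
rollup = {"cola": "drinks", "water": "drinks", "chips": "snacks"}
view = materialize(facts, rollup)

# Dimension update: a new category "soda" is introduced and cola moves into it.
view = reclassify(facts, rollup, view, "cola", "soda")
print(view)
```

Only the two affected categories are touched, whereas a full recomputation would rescan every fact.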
Beyond basic faceted search
 In Proceedings of the international conference on Web search and web data mining, WSDM ’08
, 2008
Cited by 25 (2 self)
This paper extends traditional faceted search to support richer information discovery tasks over more complex data models. Our first extension adds flexible, dynamic business intelligence aggregations to the faceted application, enabling users to gain insight into their data that is far richer than just knowing the quantities of documents belonging to each facet. We see this capability as a step toward bringing OLAP capabilities, traditionally supported by databases over relational data, to the domain of free-text queries over metadata-rich content. Our second extension shows how one can efficiently extend a faceted search engine to support correlated facets, a more complex information model in which the values associated with a document across multiple facets are not independent. We show that by reducing the problem to a recently solved tree-indexing scenario, data with correlated facets can be efficiently indexed and retrieved.
Reasoning about Summarizability in Heterogeneous Multidimensional Schemas
 In IEEE ICDT
, 2001
Cited by 21 (2 self)
In OLAP applications, data are modeled as points in a multidimensional space. Dimensions themselves have structure, described by a schema and an instance; the schema is basically a directed acyclic graph of granularity levels, and the instance consists of a set of elements for each level and mappings between these elements, usually called rollup functions. Current dimension models restrict dimensions in various ways; for example, rollup functions are restricted to be total. We relax these restrictions, yielding what we call heterogeneous schemas, which describe more naturally and cleanly many practical situations. In the context of heterogeneous schemas, the notion of summarizability becomes more complex. An aggregate view defined at some granularity level is summarizable from a set of precomputed views defined at other levels if the rollup functions can be used to compute the first view from the set of views. In order to study summarizability in heterogeneous schemas, ...
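A small sketch shows why summarizability stops being automatic once rollup functions may be partial. In the hypothetical dimension below, one store has no city, so the country-level view computed from the city-level view misses that store's sales. The level names, mappings, and the `aggregate` helper are assumptions for illustration and do not reproduce the paper's formal framework.

```python
from collections import defaultdict


def aggregate(values, rollup):
    """Sum values along a rollup mapping.

    Elements with no image under rollup are dropped, which models a
    partial rollup function.
    """
    out = defaultdict(int)
    for element, amount in values.items():
        if element in rollup:
            out[rollup[element]] += amount
    return dict(out)


# Hypothetical heterogeneous dimension: store s3 belongs to no city.
store_sales = {"s1": 10, "s2": 20, "s3": 5}
store_to_city = {"s1": "Paris", "s2": "Lyon"}  # partial rollup
city_to_country = {"Paris": "FR", "Lyon": "FR"}
store_to_country = {"s1": "FR", "s2": "FR", "s3": "FR"}  # total rollup

direct = aggregate(store_sales, store_to_country)
via_city = aggregate(aggregate(store_sales, store_to_city), city_to_country)
print("direct:", direct)
print("via city view:", via_city)
```

Here the country view is not summarizable from the city view alone; it would be if the city view were combined with a view covering the stores that roll up to the country without passing through a city.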
Cardinality-based Inference Control in Sum-only Data Cubes
 In Proceedings of the 7th European Symposium on Research in Computer Security (ESORICS 2002)
, 2002
Cited by 13 (7 self)
This paper deals with the inference problems in data warehouses and decision support systems such as online analytical processing (OLAP) systems.
Modeling and Querying Multidimensional Databases: An Overview
, 1999
Cited by 12 (3 self)
This paper presents some highlights of the concepts of multidimensional databases and On-Line Analytical Processing (OLAP), a technology used in the context of decision support. It mainly focuses on multidimensional data models and manipulations. We propose both an inventory and a classification of the elementary operations underlying OLAP treatments. We describe several typical complex manipulations based on these elementary operations. Throughout the paper, we present the informal concepts stemming from users' needs and the formal proposals of research works. Hence it provides an entry point into the domain of OLAP modeling and querying.