Results 1 - 10
of
78
Implementing data cubes efficiently
- In SIGMOD
, 1996
"... Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total ..."
Abstract
-
Cited by 425 (1 self)
- Add to MetaCart
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube..A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query. 1
Optimizing Queries with Materialized Views
, 1995
"... While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialized views has not been resolved. In this paper, we analyze the optimization question and provide a comprehensive and efficient solution. Our solution ..."
Abstract
-
Cited by 217 (4 self)
- Add to MetaCart
While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialized views has not been resolved. In this paper, we analyze the optimization question and provide a comprehensive and efficient solution. Our solution has the desirable property that it is a simple generalization of the traditional query optimization algorithm. 1 Introduction The idea of using materialized views for the benefit of improved query processing has been proposed in the literature more than a decade ago. In this context, problems such as definition of views, composition of views, maintenance of views [BC79, KP81, SI84, BLT86, CW91, Rou91, GMS93] have been researched but one topic has been conspicuous by its absence. This concerns the problem of the judicious use of materialized views in answering a query. It may seem that materialized views should be used to evaluate a query whenever they are applicable. In fact, blind applicat...
Aggregate-Query Processing in Data Warehousing Environments
- In Proceedings of the International Conference on Very Large Databases
, 1995
"... In this paper we introduce generalized projections (GPs), an extension of duplicate eliminating projections, that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well known and simpl ..."
Abstract
-
Cited by 105 (1 self)
- Add to MetaCart
In this paper we introduce generalized projections (GPs), an extension of duplicate eliminating projections, that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well known and simple algorithms for SQL queries that use distinct projections to derive algorithms for queries using aggregations like sum, max, min, count, and avg. We develop powerful query rewrite rules for aggregate queries that unify and extend rewrite rules previously known in the literature. We then illustrate the power of our approach by solving a very practical and important problem in data warehousing: how to answer an aggregate query on base tables using materialized aggregate views (summary tables).
An overview of query optimization in relational systems
- In PODS
, 1998
"... There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased an ..."
Abstract
-
Cited by 99 (1 self)
- Add to MetaCart
There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased and incomplete view of this field. The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area. I would like to apologize to the many contributors in this area whose work I have failed to explicitly acknowledge due to oversight or lack of space. I take the liberty of trading technical precision for ease of presentation. 2.
The Dimensional Fact Model: A Conceptual Model For Data Warehouses
- International Journal of Cooperative Information Systems
, 1998
"... this paper we formalize a graphical conceptual model for data warehouses, called Dimensional Fact model, and propose a semi-automated methodology to build it from the pre-existing (conceptual or logical) schemes describing the enterprise relational database. The representation o ..."
Abstract
-
Cited by 99 (17 self)
- Add to MetaCart
this paper we<E-382> formalize a graphical conceptual model for data warehouses, called Dimensional Fact model, and<E-380> propose a semi-automated methodology to build it from the pre-existing (conceptual or logical)<E-366> schemes describing the enterprise relational database. The representation of reality built using our<E-381> conceptual model consists of a set of fact schemes whose basic elements are facts, measures,<E-358> attributes, dimensions and hierarchies; other features which may be represented on fact schemes are<E-382> the additivity of fact attributes along dimensions, the optionality of dimension attributes and the<E-381> existence of non-dimension attributes. Compatible fact schemes may be overlapped in order to relate<E-373> and compare data for drill-across queries. Fact schemes should be integrated with information of the<E-382> conjectured workload, to be used as the input of logical and physical design phases; to this end, we<E-382> propose a simple language to denote data warehouse queries in terms of sets of fact instances.<E-334>
Algorithms for materialized view design in data warehousing environment
, 1997
"... Selecting views to materialize is one of the most important decisions in designing a data warehouse. In this paper, we present a frame-work for analyzing the issues in selecting views to materialize so as to achieve the best com-bination of good query performance and low view maintenance. We first d ..."
Abstract
-
Cited by 85 (10 self)
- Add to MetaCart
Selecting views to materialize is one of the most important decisions in designing a data warehouse. In this paper, we present a frame-work for analyzing the issues in selecting views to materialize so as to achieve the best com-bination of good query performance and low view maintenance. We first develop a heuristic algorithm which can provide a feasible solu-tion based on individual optimal query plans. We also map the materialized view design problem as O-l integer programming problem, whose solution can guarantee an optimal so-lution. 1
Optimization of queries with user-defined predicates
- TODS
, 1999
"... Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the traditional method of evaluating predicates as early as possible is no longer a sound heuristic. There ..."
Abstract
-
Cited by 73 (5 self)
- Add to MetaCart
Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the traditional method of evaluating predicates as early as possible is no longer a sound heuristic. There are two previous approaches for optimizing such queries. However, neither is able to guarantee the optimal plan over the desired execution space. We present efficient techniques that are able to guarantee the choice of an optimal plan over the desired execution space. The naive optimization algorithm is very general, and therefore is most widely applicable. The optimization algorithm with complete rank-ordering improves upon the naive optimization algorithm by exploiting the nature of the cost formulas for join methods and is polynomial in the number of user-defined predicates (for a given number of relations). We also propose pruning rules that significantly reduce the cost of searching the execution space for both the naive algorithm as well as for the optimization algorithm with complete rank-ordering, without compromising optimality. We also propose a conservative local heuristic that is simpler and has low optimization overhead. Although it is not always guaranteed to find the optimal plans, it produces close to optimal plans in most cases. We discuss how, depending on application requirements, to determine the algorithm of choice. It should be emphasized that our optimization algorithms handle user-defined selections as well as user-defined join predicates uniformly. We present complexity analysis and experimental comparison of the algorithms.
Congressional samples for approximate answering of group-by queries
- In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data
, 2000
"... In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into groups and then aggregate the information in each ..."
Abstract
-
Cited by 72 (5 self)
- Add to MetaCart
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into groups and then aggregate the information in each group (group-by queries). Depending on the data, there can be a wide disparity between the number of data items in each group. As a result, approximate answers based on uniform random samples of the data can result in poor accuracy for groups with very few data items, since such groups will be represented in the sample by very few (often zero) tuples. In this paper, we propose a general class of techniques for obtaining fast, highly-accurate answers for group-by queries. These techniques rely on precomputed non-uniform (biased) samples of the data. In particular, we proposecongressional samples, ahybrid union of uniform and biased samples. Given a xed amount of space, congressional samples seek to maximize the accuracy for all possible group-by queries on a set of columns. We present a one pass algorithm for constructing a congressional sample and use this technique to also incrementally maintain the sample up-to-date without accessing the base relation. We also evaluate query rewriting strategies for providing approximate answers from congressional samples. Finally, we conduct an extensive set of experiments on the TPC-D database, which demonstrates the e cacy of the techniques proposed. 1
Maintenance of Data Cubes and Summary Tables in a Warehouse
- IN SIGMOD
, 1997
"... Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decisionsupport functions in a warehouse, such as on-line analytical processing (OLAP), involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute ..."
Abstract
-
Cited by 72 (3 self)
- Add to MetaCart
Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decisionsupport functions in a warehouse, such as on-line analytical processing (OLAP), involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a large number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, usually once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the su...
Answering Queries with Aggregation Using Views
, 1996
"... We present novel algorithms for the problem of using materialized views to compute answers to SQL queries with grouping and aggregation, in the presence of multiset tables. In addition to its obvious potential in query optimization, this problem is important in many applications, such as data wareho ..."
Abstract
-
Cited by 70 (2 self)
- Add to MetaCart
We present novel algorithms for the problem of using materialized views to compute answers to SQL queries with grouping and aggregation, in the presence of multiset tables. In addition to its obvious potential in query optimization, this problem is important in many applications, such as data warehousing, very large transaction recording systems, global information systems and mobile computing, where access to local or cached materialized views may be cheaper than access to the underlying database. Our contributions are the following: First, we show that in the case where the query has grouping and aggregation but the views do not, a view is usable in answering a query only if there is an isomorphism between the view and a portion of the query. Second, when the views also have grouping and aggregation we identify conditions under which the aggregation information present in a view is sufficient to perform the aggregation computations required in the query. The algorithms we describe fo...

