Answering Queries Using Views: A Survey
2000
Cited by 395 (27 self)
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.
Bottom-Up Computation of Sparse and Iceberg CUBEs
Proceedings of the 1999 ACM SIGMOD Conference, 1999
Cited by 132 (3 self)
We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum sup...
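The bottom-up pruning idea in this abstract can be illustrated with a small sketch. This is not the authors' BUC implementation: the function name `iceberg_cube`, the tuple-based row format, the use of `None` to mean "all values", and the depth-first recursion order are all assumptions made for illustration. It only shows how a partition that falls below the minimum support is never refined further.

```python
from collections import defaultdict

def iceberg_cube(rows, num_dims, minsup):
    """Return {cell: count} for every group-by cell with count >= minsup.

    rows is a list of tuples with num_dims entries each. A cell is a tuple
    with one entry per dimension; None stands for "all values" (hypothetical
    encoding chosen for this sketch).
    """
    result = {}

    def recurse(part, start, cell):
        # The caller guarantees len(part) >= minsup, so this cell qualifies.
        result[tuple(cell)] = len(part)
        for d in range(start, num_dims):
            # Partition the current rows on dimension d.
            buckets = defaultdict(list)
            for row in part:
                buckets[row[d]].append(row)
            for value, bucket in buckets.items():
                # Prune: a partition below minsup cannot contain any
                # finer group-by cell that meets minsup.
                if len(bucket) >= minsup:
                    cell[d] = value
                    recurse(bucket, d + 1, cell)
            cell[d] = None  # restore "all values" before the next dimension

    if len(rows) >= minsup:
        recurse(rows, 0, [None] * num_dims)
    return result

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
cube = iceberg_cube(rows, 2, minsup=2)
```

With a count threshold of 2, the cells for `"b"` and `"y"` are dropped as soon as their partitions are seen, so no group-by that refines them is ever computed.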
The Dimensional Fact Model: A Conceptual Model For Data Warehouses
International Journal of Cooperative Information Systems, 1998
Cited by 99 (17 self)
In this paper we formalize a graphical conceptual model for data warehouses, called Dimensional Fact model, and propose a semi-automated methodology to build it from the pre-existing (conceptual or logical) schemes describing the enterprise relational database. The representation of reality built using our conceptual model consists of a set of fact schemes whose basic elements are facts, measures, attributes, dimensions and hierarchies; other features which may be represented on fact schemes are the additivity of fact attributes along dimensions, the optionality of dimension attributes and the existence of non-dimension attributes. Compatible fact schemes may be overlapped in order to relate and compare data for drill-across queries. Fact schemes should be integrated with information of the conjectured workload, to be used as the input of logical and physical design phases; to this end, we propose a simple language to denote data warehouse queries in terms of sets of fact instances.
Caching Multidimensional Queries Using Chunks
In Proceedings of the ACM SIGMOD Conference on Management of Data, 1998
Cited by 64 (1 self)
Caching has been proposed (and implemented) by OLAP systems in order to reduce response times for multidimensional queries. Previous work on such caching has considered table level caching and query level caching. Table level caching is suitable for static schemes. On the other hand, Query level caching can be used in dynamic schemes, but is too coarse for "large" query results. Query level caching has the further drawback for small query results in that it is only effective when a new query is subsumed by a cached previous query. In this paper, we propose caching small regions of the multidimensional space called "chunks". Chunk-based caching allows fine granularity caching, and also allows queries to partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks required by a query but not found in the cache, we propose a new organization for relational tables, which we call a "chunked file." Our experiments show that for wor...
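To make the partial-reuse idea concrete, here is a minimal sketch of a chunk-keyed cache. It is an illustration under stated assumptions, not the system described in the paper: the class `ChunkCache`, the uniform `chunk_size`, the half-open range format, and the `compute_chunk` backend callback are all hypothetical. A query is mapped to the chunk ids it overlaps, and only chunks missing from the cache are computed.

```python
from itertools import product

class ChunkCache:
    def __init__(self, chunk_size, compute_chunk):
        self.chunk_size = chunk_size        # cells per chunk along each dimension
        self.compute_chunk = compute_chunk  # hypothetical backend: chunk id -> data
        self.store = {}                     # chunk id -> cached data
        self.misses = 0                     # number of backend computations

    def chunks_for(self, ranges):
        """Map per-dimension half-open ranges (lo, hi) to overlapping chunk ids."""
        per_dim = [
            range(lo // self.chunk_size, (hi - 1) // self.chunk_size + 1)
            for lo, hi in ranges
        ]
        return list(product(*per_dim))

    def query(self, ranges):
        """Return {chunk id: data}, computing only the chunks not yet cached."""
        out = {}
        for cid in self.chunks_for(ranges):
            if cid not in self.store:
                self.misses += 1
                self.store[cid] = self.compute_chunk(cid)
            out[cid] = self.store[cid]
        return out

cache = ChunkCache(10, lambda cid: sum(cid))
first = cache.query([(0, 20), (0, 10)])    # touches chunks (0, 0) and (1, 0)
second = cache.query([(10, 30), (0, 10)])  # reuses (1, 0), computes only (2, 0)
```

The second query is not subsumed by the first, yet it still reuses the overlapping chunk, which is the case that query-level caching cannot exploit.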
Selection of Views to Materialize Under a Maintenance Cost Constraint
1999
Cited by 61 (0 self)
A data warehouse stores materialized views derived from one or more sources for the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and/or the cost of maintaining the selected views, given a limited amount of resource such as materialization time, storage space, or total view maintenance time. In this article, we develop algorithms to select a set of views to materialize in a data warehouse in order to minimize the total query response time under the constraint of a given total view maintenance time. As the above maintenance-cost view-selection problem is extremely intractable, we tackle some special cases and design approximation algorithms. First, we design an approximation greedy algorithm for the maintenance-cost view-selection prob...
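The abstract is truncated before it describes the greedy algorithm, so the following is only a generic sketch of greedy selection under a maintenance budget, not the paper's method. The function `greedy_select`, the fixed per-view `(benefit, maintenance_cost)` pairs, and the benefit-per-cost ranking are assumptions; in particular, a real view-selection setting would recompute each view's benefit relative to the views already chosen.

```python
def greedy_select(views, budget):
    """Pick views by benefit per unit of maintenance cost within a budget.

    views maps a view name to (benefit, maintenance_cost), with cost > 0.
    Benefits are treated as independent constants (a simplifying assumption).
    """
    chosen, spent = [], 0.0
    remaining = dict(views)
    while remaining:
        # Keep only the views that still fit in the maintenance budget.
        affordable = {
            name: (benefit, cost)
            for name, (benefit, cost) in remaining.items()
            if spent + cost <= budget
        }
        if not affordable:
            break
        # Choose the affordable view with the best benefit-to-cost ratio.
        best = max(affordable, key=lambda n: affordable[n][0] / affordable[n][1])
        chosen.append(best)
        spent += affordable[best][1]
        del remaining[best]
    return chosen

views = {"v1": (100, 10), "v2": (60, 5), "v3": (30, 10)}
picked = greedy_select(views, budget=15)
```

Here `v2` is taken first because it has the highest ratio, `v1` still fits in the remaining budget, and `v3` is skipped because adding it would exceed the maintenance limit.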
A Survey on Logical Models for OLAP Databases
SIGMOD Record, 1999
Cited by 60 (4 self)
In this paper we provided a categorization of the work in the area of OLAP logical models by surveying some major efforts, from commercial tools, benchmarks and standards, and academic efforts. We have also attempted a comparison of the various models along several dimensions, including representation and querying aspects.
A Methodological Framework for Data Warehouse Design
In Proc. DOLAP, 1998
Cited by 51 (4 self)
Though designing a data warehouse requires techniques completely different from those adopted for operational systems, no significant effort has been made so far to develop a complete and consistent design methodology for data warehouses. In this paper we outline a general methodological framework for data warehouse design, based on our Dimensional Fact Model (DFM). After analyzing the existing information system and collecting the user requirements, conceptual design is carried out semi-automatically starting from the operational database scheme. A workload is then characterized in terms of data volumes and expected queries, to be used as the input of the logical and physical design phases whose output is the final scheme for the data warehouse.
Keywords: data warehouse, design methodology, conceptual model.
DynaMat: A Dynamic View Management System for Data Warehouses
In SIGMOD, 1999
Cited by 50 (10 self)
Pre-computation and materialization of views with aggregate functions is a common technique in Data Warehouses. Due to the complex structure of the warehouse and the different profiles of the users who submit queries, there is a need for tools that will automate the selection and management of the materialized data. In this paper we present DynaMat, a system that dynamically materializes information at multiple levels of granularity in order to match the demand (workload) but also takes into account the maintenance restrictions for the warehouse, such as down time to update the views and space availability. DynaMat unifies the view selection and the view maintenance problems under a single framework using a novel “goodness” measure for the materialized views. DynaMat constantly monitors incoming queries and materializes the best set of views subject to the space constraints. During updates, DynaMat reconciles the current materialized view selection and refreshes the most beneficial subset of it within a given maintenance window. We compare DynaMat against a system that is given all queries in advance and the pre-computed optimal static view selection. The comparison is made based on a new metric, the Detailed Cost Savings Ratio, introduced for quantifying the benefits of view materialization against incoming queries. These experiments show that DynaMat’s dynamic view selection outperforms the optimal static view selection and thus any sub-optimal static algorithm that has appeared in the literature.
Modeling Multidimensional Databases, Cubes and Cube Operations
In Proc. of the 10th SSDBM Conference, 1998
Cited by 44 (5 self)
On-Line Analytical Processing (OLAP) is a trend in database technology, which was recently introduced and has attracted the interest of a lot of research work. OLAP is based on the multidimensional view of data, supported either by multidimensional databases (MOLAP) or relational engines (ROLAP).
A Formal Perspective on the View Selection Problem
The VLDB Journal
Cited by 40 (12 self)
The view selection problem is to choose a set of views to materialize over a database schema, such that the cost of evaluating a set of workload queries is minimized and such that the views fit into a prespecified storage constraint. The two main applications of the view selection problem are materializing views in a database to speed up query processing, and selecting views to materialize in a data warehouse to answer decision support queries. We describe several fundamental results concerning the view selection problem. We consider the problem for views and workloads that consist of equality-selection, project and join queries, and show that the complexity of the problem depends crucially on the quality of the estimates that a query optimizer has on the size of the views it is considering to materialize. When a query optimizer has good estimates of the sizes of the views, we show that an optimal choice of views may involve a number of views that is exponential in the size of the database schema. On the other hand, when an optimizer uses standard estimation heuristics, we show that the number of necessary views and the expression size of each view are polynomially bounded.

