Results 1 - 10
of
87
Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
"... Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On ..."
Abstract
-
Cited by 592 (7 self)
- Add to MetaCart
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Maintenance of Materialized Views: Problems, Techniques, and Applications
, 1995
"... In this paper we motivate and describe materialized views, their applications, and the problems and techniques for their maintenance. We present a taxonomy of view maintenanceproblems basedupon the class of views considered, upon the resources used to maintain the view, upon the types of modi#cati ..."
Abstract
-
Cited by 255 (9 self)
- Add to MetaCart
In this paper we motivate and describe materialized views, their applications, and the problems and techniques for their maintenance. We present a taxonomy of view maintenanceproblems basedupon the class of views considered, upon the resources used to maintain the view, upon the types of modi#cations to the base data that areconsidered during maintenance, and whether the technique works for all instances of databases and modi#cations. We describe some of the view maintenancetechniques proposed in the literature in terms of our taxonomy. Finally, we consider new and promising application domains that are likely to drive work in materialized views and view maintenance. 1 Introduction What is a view? A view is a derived relation de#ned in terms of base #stored# relations. A view thus de#nes a function from a set of base tables to a derived table; this function is typically recomputed every time the view is referenced. What is a materialized view? A view can be materialized by storin...
View Maintenance in a Warehousing Environment
- IN PROCEEDINGS OF SIGMOD
, 1995
"... A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition an ..."
Abstract
-
Cited by 231 (19 self)
- Add to MetaCart
A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. Weintroduce a new algorithm, ECA (for "Eager Compensating Algorithm"), that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.
Incremental Maintenance of Views with Duplicates
"... We study the problem of efficient maintenance of materialized views that may contain duplicates. This problem is particularly important when queries against such views involve aggregate functions, which need duplicates to produce correct results. Unlike most work on the view maintenance problem that ..."
Abstract
-
Cited by 154 (8 self)
- Add to MetaCart
We study the problem of efficient maintenance of materialized views that may contain duplicates. This problem is particularly important when queries against such views involve aggregate functions, which need duplicates to produce correct results. Unlike most work on the view maintenance problem that is based on an algorithmic approach, our approach is algebraic and based on equational reasoning. This approach has a number of advantages: it is robust and easily extendible to new language constructs, it produces output that can be used by query optimizers, and it simpli es correctness proofs. We use a natural extension of the relational algebra operations to bags (multisets) as our basic language. We present an algorithm that propagates changes from base relations to materialized views. This algorithm is based on reasoning about equivalence of bag-valued expressions. We prove that it is correct and preserves a certain notion of minimality that ensures that no unnecessary tuples are computed. Although it is generally only a heuristic that computing changes to the view rather than recomputing the view from scratch is more efficient, we prove results saying that under normal circumstances one should expect the change propagation algorithm to be significantly faster and more space efficient than complete recomputing of the view. We also show that our approach interacts nicely with aggregate functions, allowing their correct evaluation on views that change.
The Strobe Algorithms for Multi-Source Warehouse Consistency
- In International Conference on Parallel and Distributed Information Systems
, 1996
"... A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing mu ..."
Abstract
-
Cited by 91 (8 self)
- Add to MetaCart
A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios. 1 Introduction A data warehouse is a rep...
Materialized view maintenance and integrity constraint checking: Trading space for time
, 1996
"... Abstract We investigate the problem of incremental maintenance of an SQL view in the face of database updates, and show that it is possible to reduce the total time cost of view maintenance by materializing (and maintaining) additional views. We formulate the problem of determining the optimal set o ..."
Abstract
-
Cited by 86 (5 self)
- Add to MetaCart
Abstract We investigate the problem of incremental maintenance of an SQL view in the face of database updates, and show that it is possible to reduce the total time cost of view maintenance by materializing (and maintaining) additional views. We formulate the problem of determining the optimal set of additional views to materialize as an optimization problem over the space of possible view sets (which includes the empty set). The optimization problem is harder than query optimization since it has to deal with multiple view sets, updates of multiple relations, and multiple ways of maintaining each view set for each updated relation. We develop a memoing solution for the problem; the solution can be implemented using the expression DAG representation used in rule-based optimizers such as Volcano. We demonstrate that global optimization cannot, in general, be achieved by locally optimizing each materialized subview, because common subexpressions between different materialized subviews can allow nonoptimal local plans to be combined into an optimal global plan. We identify conditions on materialized subviews in the expression DAG when local optimization is possible. Finally, we provide a systematic space of heuristics that can be used to efficiently determine a useful set of additional views to materialize. Our results are particularly important for the efficient checking of assertions (complex integrity constraints) in the SQL-92 standard, since the incremental checking of such integrity constraints is known to be essentially equivalent to the view maintenance problem.
Maintenance of Data Cubes and Summary Tables in a Warehouse
- IN SIGMOD
, 1997
"... Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decisionsupport functions in a warehouse, such as on-line analytical processing (OLAP), involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute ..."
Abstract
-
Cited by 72 (3 self)
- Add to MetaCart
Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decisionsupport functions in a warehouse, such as on-line analytical processing (OLAP), involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a large number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, usually once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the su...
Expiring data in a warehouse
- In Proc. of the 1998 Intl. Conf. on Very Large Data Bases
, 1998
"... Data warehouses collect data into materi-alized views for analysis. After some time, some of the data may no longer be needed or may not be of interest. In this pa-per, we handle this by expiring or remov-ing unneeded materialized view tuples. A framework supporting such expiration is presented. Wit ..."
Abstract
-
Cited by 62 (4 self)
- Add to MetaCart
Data warehouses collect data into materi-alized views for analysis. After some time, some of the data may no longer be needed or may not be of interest. In this pa-per, we handle this by expiring or remov-ing unneeded materialized view tuples. A framework supporting such expiration is presented. Within it, a user or adminis-trator can declaratively request expirations and can specify what type of modifications are expected from external sources. The lat-ter can significantly increase the amount of data that can be expired. We present effi-cient algorithms for determining what data can be expired (data not needed for main-tenance of other views), taking into account the types of updates that may occur. 1
Data Warehouse Configuration
, 1997
"... In the data warehousing approach to the integration of data from multiple information sources, selected information is extracted in advance and stored in a repository. A data warehouse (DW) can therefore be seen as a set of materialized views defined over the sources. When a query is posed, it is ev ..."
Abstract
-
Cited by 60 (4 self)
- Add to MetaCart
In the data warehousing approach to the integration of data from multiple information sources, selected information is extracted in advance and stored in a repository. A data warehouse (DW) can therefore be seen as a set of materialized views defined over the sources. When a query is posed, it is evaluated locally, using the materialized views, without accessing the original information sources. The applications using DWs require high query performance. This requirement is in conflict with the need to maintain in the DW updated information. The DW configuration problem is the problem of selecting a set of views to materialize in the DW that answers all the queries of interest while minimizing the total query evaluation and view maintenance cost. In this paper we provide a theoretical framework for this problem in terms of the relational model. We develop a method for dealing with it by formulating it as a state space optimization problem and then solving it using an exh...
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
, 2000
"... The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with t ..."
Abstract
-
Cited by 52 (4 self)
- Add to MetaCart
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.

