Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS, 1993
- Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithmic and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
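The "duality of sort- and hash-based set matching" mentioned in this abstract can be illustrated with a small sketch. The code below is not taken from the survey; it is a minimal in-memory illustration, assuming rows are Python dicts and that the function names `hash_join` and `sort_merge_join` are hypothetical. Both functions compute the same equi-join, one by building and probing a hash table and the other by sorting and merging.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Build a hash table on `left`, then probe it with each row of `right`."""
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    return [(l, r) for r in right for l in table.get(r[key], [])]


def sort_merge_join(left, right, key):
    """Sort both inputs on `key`, then merge runs of equal keys."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            out.extend((l, r) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out
```

The two functions may return matching pairs in a different order, so results should be compared as sets or after sorting.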
View Maintenance in a Warehousing Environment
- IN PROCEEDINGS OF SIGMOD, 1995
- Cited by 231 (19 self)
A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. We introduce a new algorithm, ECA (for "Eager Compensating Algorithm"), that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.
Data Caching Issues in an Information Retrieval System
- ACM Transactions on Database Systems, 1990
- Cited by 191 (6 self)
Currently, a variety of information retrieval systems are available to potential users. Some of these services are provided by commercial enterprises (such as Dow Jones [6] and The Source [7]), while others are research efforts (the Boston Community Information System [8]). While in many cases these systems are accessed from personal computers, typically no advantage is taken of the computing resources of those machines (such as local processing and storage). In this paper we explore the possibility of using the user’s local storage capabilities to cache data at the user’s site. This would improve the response time of user queries albeit at the cost of incurring the overhead required in maintaining multiple copies. In order to reduce this overhead it may be appropriate to allow copies to diverge in a controlled fashion. This would not only make caching less costly, but would also make it possible to propagate updates to the copies more efficiently, for example, when the system is lightly loaded, when communication tariffs are lower, or by batching updates together. Just as importantly, it also makes it possible to access the copies even when the communication lines or the central site are down. Thus, we introduce the notion of quasi-copies, which embodies the ideas sketched above. We also define the types of deviations that seem useful, and discuss the available implementation strategies.
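The idea of letting cached copies "diverge in a controlled fashion" can be made concrete with a small sketch. The abstract does not specify the deviation types, so the two bounds used here (`max_age` on staleness and `max_delta` on numeric divergence) and the class name `QuasiCopy` are assumptions chosen for illustration, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class QuasiCopy:
    value: float         # value currently held at the user's site
    refreshed_at: float  # time of the last refresh, in seconds
    max_age: float       # assumed bound on staleness, in seconds
    max_delta: float     # assumed bound on numeric divergence

    def needs_refresh(self, central_value: float, now: float) -> bool:
        """True if the copy has drifted outside either allowed bound."""
        too_old = now - self.refreshed_at > self.max_age
        too_far = abs(central_value - self.value) > self.max_delta
        return too_old or too_far

    def refresh(self, central_value: float, now: float) -> None:
        """Bring the copy back in line with the central site."""
        self.value = central_value
        self.refreshed_at = now
```

Under these assumptions the central site only needs to propagate an update when `needs_refresh` is true, which is what allows updates to be deferred or batched.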
Query Caching and Optimization in Distributed Mediator Systems
- In Proc. of ACM SIGMOD Conf. on Management of Data, 1996
- Cited by 176 (10 self)
Query processing and optimization in mediator systems that access distributed non-proprietary sources pose many novel problems. Cost-based query optimization is hard because the mediator does not have access to source statistics information and furthermore it may not be easy to model the source's performance. At the same time, querying remote sources may be very expensive because of high connection overhead, long computation time, financial charges, and temporary unavailability. We propose a cost-based optimization technique that caches statistics of actual calls to the sources and consequently estimates the cost of the possible execution plans based on the statistics cache. We investigate issues pertaining to the design of the statistics cache and experimentally analyze various tradeoffs. We also present a query result caching mechanism that allows us to effectively use results of prior queries when the source is not readily available. We employ the novel invariants mechanism, which s...
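A statistics cache of the kind described here might be sketched as follows. This is only an illustration: the abstract does not say how observed calls are aggregated, so averaging the recorded costs per source, the fallback `default_cost`, and all names (`StatisticsCache`, `record`, `estimate`, `plan_cost`, `cheapest_plan`) are assumptions.

```python
from collections import defaultdict


class StatisticsCache:
    """Remembers the observed cost of past source calls (assumed design)."""

    def __init__(self, default_cost):
        # Cost assumed for a source that has never been called.
        self.default_cost = default_cost
        self._observed = defaultdict(list)

    def record(self, source, cost):
        """Store the measured cost of one actual call to `source`."""
        self._observed[source].append(cost)

    def estimate(self, source):
        """Estimate the cost of a call as the mean of the observed costs."""
        costs = self._observed.get(source)
        if not costs:
            return self.default_cost
        return sum(costs) / len(costs)

    def plan_cost(self, plan):
        """A plan is modeled here as a list of source names to call."""
        return sum(self.estimate(source) for source in plan)


def cheapest_plan(cache, plans):
    """Pick the candidate plan with the lowest estimated cost."""
    return min(plans, key=cache.plan_cost)
```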
Continual Queries for Internet Scale Event-Driven Information Delivery
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1999
- Cited by 153 (13 self)
In this paper we introduce the concept of continual queries, describe the design of a distributed event-driven continual query system -- OpenCQ, and outline the initial implementation of OpenCQ on top of the distributed interoperable information mediation system DIOM [21, 19]. Continual queries are standing queries that monitor updates of interest and return results whenever the updates reach specified thresholds. In OpenCQ, users may specify to the system the information they would like to monitor (such as the events or the update thresholds they are interested in). Whenever the information of interest becomes available, the system immediately delivers it to the relevant users; otherwise, the system continually monitors the arrival of the desired information and pushes it to the relevant users as it meets the specified update thresholds. In contrast to conventional pull-based data management systems such as DBMSs and Web search engines, OpenCQ exhibits two important featu...
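The notion of a standing query that pushes results once an update threshold is met can be sketched in a few lines. This is not OpenCQ's interface; the class `ContinualQuery`, its parameters, and the choice to express the threshold as a count of matching updates are all assumptions made for illustration.

```python
class ContinualQuery:
    """A standing query that pushes results once enough updates match."""

    def __init__(self, predicate, threshold, deliver):
        self.predicate = predicate  # which updates are of interest
        self.threshold = threshold  # assumed: number of matches before delivery
        self.deliver = deliver      # callback that pushes results to the user
        self._pending = []

    def on_update(self, item):
        """Called by the system for every incoming update (event-driven)."""
        if self.predicate(item):
            self._pending.append(item)
        if len(self._pending) >= self.threshold:
            # Push the accumulated matches and start a new batch.
            self.deliver(list(self._pending))
            self._pending.clear()
```

A user would register the query once and then receive results through `deliver` without polling, for example `ContinualQuery(lambda price: price > 100, 2, print)`.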
Maintenance of Data Cubes and Summary Tables in a Warehouse
- IN SIGMOD, 1997
- Cited by 72 (3 self)
Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decision-support functions in a warehouse, such as on-line analytical processing (OLAP), involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a large number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, typically once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the su...
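To show why a summary table need not be recomputed by rescanning the data, here is a minimal sketch of applying one batch of source changes to a SUM/COUNT summary. The abstract is truncated before it names the paper's method, so this is a generic illustration rather than that method; the function `apply_batch` and the table layout (group mapped to a `(total, count)` pair) are assumptions.

```python
def apply_batch(summary, inserted, deleted=()):
    """Update a {group: (total, count)} summary table from one batch of changes.

    `inserted` and `deleted` are iterables of (group, amount) rows.
    Deleted rows are assumed to have been counted in the summary before.
    """
    for group, amount in inserted:
        total, count = summary.get(group, (0, 0))
        summary[group] = (total + amount, count + 1)
    for group, amount in deleted:
        total, count = summary[group]
        if count == 1:
            # The last row of the group is gone, so the group disappears.
            del summary[group]
        else:
            summary[group] = (total - amount, count - 1)
    return summary
```

Keeping the count alongside the sum is what makes deletions safe: without it there is no way to tell an empty group from a group whose values sum to zero.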
Adapting materialized views after redefinitions
- In Proceedings of ACM SIGMOD International Conference on Management of Data, 1995
- Cited by 70 (6 self)
We consider a variant of the view maintenance problem: How does one keep a materialized view up-to-date when the view definition itself changes? Can one do better than recomputing the view from the base relations? Traditional view maintenance tries to maintain the materialized view in response to modifications to the base relations; we try to “adapt” the view in response to changes in the view definition. Such techniques are needed for applications where the user can change queries dynamically and see the changes in the results fast. Data archaeology, data visualization, and dynamic queries are examples of such applications. We consider all possible redefinitions of SQL SELECT-FROM-WHERE-GROUPBY, UNION, and EXCEPT views, and show how these views can be adapted using the old materialization for the cases where it is possible to do so. We identify extra information that can be kept with a materialization to facilitate redefinition. Multiple simultaneous changes to a view can be handled without necessarily materializing intermediate results. We identify guidelines for users and database administrators that can be used to facilitate efficient view adaptation.
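One simple case of adapting a view from its old materialization, rather than recomputing it, is a WHERE condition that becomes stricter. The sketch below illustrates only that case and is not drawn from the paper's algorithms; the function names are hypothetical, and the shortcut is valid only under the stated assumptions that the new predicate implies the old one and that the view retains every column the new predicate tests.

```python
def materialize(base, predicate):
    """Compute a selection view from scratch by scanning the base relation."""
    return [row for row in base if predicate(row)]


def adapt_to_stricter_where(old_view, new_predicate):
    """Adapt a selection view when its WHERE clause is tightened.

    Assumes every row satisfying `new_predicate` also satisfied the old
    predicate, so the new view is a subset of the old materialization
    and the base relation does not need to be read again.
    """
    return [row for row in old_view if new_predicate(row)]
```

When the old materialization is much smaller than the base relation, filtering it is cheaper than a rescan. A loosened predicate would not qualify, because the rows it newly admits are not in the old view.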
The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching
- In Proceedings of the International Conference on Extending Database Technology, 1994
- Cited by 69 (8 self)
In this paper, we describe the design and implementation of the ADMS query optimizer. This optimizer integrates query matching into optimization and generates more efficient query plans using cached results. It features data caching and pointer caching, alternative cache replacement strategies, and different cache update methods. A comprehensive set of experiments was conducted using a benchmark database and synthetic queries. The results showed that pointer caching and dynamic cache update strategies substantially saved query execution time and, thus, increased query throughput under situations with fair query correlation and update load. The requirement of the disk cache space is relatively small, and the extra optimization overhead introduced is more than offset by the time saved in query evaluation.
Subsumption between Queries to Object-Oriented Databases
- 1994
- Cited by 66 (8 self)
Most work on query optimization in relational and object-oriented databases has concentrated on tuning algebraic expressions and the physical access to the database contents. The attention to semantic query optimization, however, has been restricted due to its inherent complexity. We take a second look at the problem for queries in object-oriented databases and find that reasoning techniques for concept languages developed in Artificial Intelligence apply for the following reasons: concept languages have been tailored for efficiency and their semantics is compatible with class and query definitions in object-oriented databases. We propose a query optimizer which decides subset relationships between a query and a view (a simpler query whose answer is stored) in polynomial time. This work was supported in part by the Commission of the European Communities under ESPRIT Basic Research Action 6810 (Compulog 2), by the German Ministry of Research and Technology under grant ITW 92-01 (TACOS...

