Results 1 - 10
of
74
NiagaraCQ: A Scalable Continuous Query System for Internet Databases
- In SIGMOD
, 2000
"... Continuous queries are persistent queries that allow users to receive new results when they become available. While continuous query systems can transform a passive web into an active environment, they need to be able to support millions of queries due to the scale of the Internet. No existing syste ..."
Abstract
-
Cited by 441 (7 self)
- Add to MetaCart
Continuous queries are persistent queries that allow users to receive new results when they become available. While continuous query systems can transform a passive web into an active environment, they need to be able to support millions of queries due to the scale of the Internet. No existing systems have achieved this level of scalability. NiagaraCQ addresses this problem by grouping continuous queries based on the observation that many web queries share similar structures. Grouped queries can share the common computation, tend to fit in memory and can reduce the I/O cost significantly. Furthermore, grouping on selection predicates can eliminate a large number of unnecessary query invocations. Our grouping technique is distinguished from previous group optimization approaches in the following ways. First, we use an incremental group optimization strategy with dynamic re-grouping. New queries are added to existing query groups, without having to regroup already installed queries. Second, we use a query-split scheme that requires minimal changes to a general-purpose query engine. Third, NiagaraCQ groups both change-based and timer-based queries in a uniform way. To insure that NiagaraCQ is scalable, we have also employed other techniques including incremental evaluation of continuous queries, use of both pull and push models for detecting heterogeneous data source changes, and memory caching. This paper presents the design of NiagaraCQ system and gives some experimental results on the system’s performance and scalability. 1.
The Implementation Of Postgres
- IEEE Transactions on Knowledge and Data Engineering
, 1990
"... Currently, POSTGRES is about 90,000 lines of code in C and is being used by assorted "bold and brave" early users. The system has been constructed by a team of 5 part time students led by a full time chief programmer over the last three years. During this period, we have made a large number of desig ..."
Abstract
-
Cited by 371 (22 self)
- Add to MetaCart
Currently, POSTGRES is about 90,000 lines of code in C and is being used by assorted "bold and brave" early users. The system has been constructed by a team of 5 part time students led by a full time chief programmer over the last three years. During this period, we have made a large number of design and implementation choices. Moreover, in some areas we would do things quite differently if we were to start from scratch again. The purpose of this paper is to reflect on the design and implementation decisions we made and to offer advice to implementors who might follow some of our paths. In this paper we restrict our attention to the DBMS "backend" functions. In another paper some of us treat PICASSO, the application development environment that is being built on top of POSTGRES. 1. INTRODUCTION Current relational DBMSs are oriented toward efficient support for business data processing applications where large numbers of instances of fixed format records must be stored and accessed. Tr...
TelegraphCQ: Continuous Dataflow Processing for an Uncertan World
, 2003
"... Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, qu ..."
Abstract
-
Cited by 329 (18 self)
- Add to MetaCart
Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. Instead, query processors based on adaptive dataflow will be necessary. The Telegraph project has developed a suite of novel technologies for continuously adaptive query processing. The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams. In this paper, we describe the system architecture and its underlying technology, and report on our ongoing implementation effort, which leverages the PostgreSQL open source code base. We also discuss open issues and our research agenda.
Continuously Adaptive Continuous Queries over Streams
- In SIGMOD
, 2002
"... We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because o ..."
Abstract
-
Cited by 204 (6 self)
- Add to MetaCart
We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because of the aggressive crossquery sharing of work and space that it enables. By breaking the abstraction of shared relational algebra expressions, our Telegraph CACQ implementation is able to share physical operators -- both selections and join state -- at a very fine grain. We augment these features with a grouped-filter index to simultaneously evaluate multiple selection predicates. We include measurements of the performance of our core system, along with a comparison to existing continuous query approaches.
Selection of Views to Materialize in a Data Warehouse
, 1997
"... . A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decisionsupport or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The ..."
Abstract
-
Cited by 161 (5 self)
- Add to MetaCart
. A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decisionsupport or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space etc. In this article, we develop a theoretical framework for the general problem of selection of views in a data warehouse. We present competitive polynomial-time heuristics for selection of views to optimize total query response time, for some important special cases of the general data warehouse scenario, viz.: (i) an AND view graph, where each query/view has a unique evaluation, and (ii) an OR view graph, in which any view can be computed from any one of its related views, e.g.,...
Temporal and Real-Time Databases: A Survey
- IEEE Transactions on Knowledge and Data Engineering
, 1995
"... A temporal database contains time-varying data. In a real-time database transactions have deadlines or timing constraints. In this paper we review the substantial research in these two heretofore separate research areas. We first characterize the time domain, then investigate temporal and real-time ..."
Abstract
-
Cited by 154 (9 self)
- Add to MetaCart
A temporal database contains time-varying data. In a real-time database transactions have deadlines or timing constraints. In this paper we review the substantial research in these two heretofore separate research areas. We first characterize the time domain, then investigate temporal and real-time data models. We evaluate temporal and real-time query languages along several dimensions. Temporal and real-time DBMS implementation is examined. We conclude with a summary of the major accomplishments of the research to date, and list several research questions that should be addressed next. Keywords: object-oriented database, relational databases, query language, temporal data model, time-constrained database, transaction time, user-defined time, valid time 1 Introduction Time is an important aspect of all real-world phenomena. Events occur at specific points in time; objects and the relationships among objects exist over time. The ability to model this temporal dimension of the real worl...
Continual Queries for Internet Scale Event-Driven Information Delivery
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 1999
"... In this paper we introduce the concept of continual queries, describe the design of a distributed event-driven continual query system -- OpenCQ, and outline the initial implementation of OpenCQ on top of the distributed interoperable information mediation system DIOM [21, 19]. Continual queries a ..."
Abstract
-
Cited by 153 (13 self)
- Add to MetaCart
In this paper we introduce the concept of continual queries, describe the design of a distributed event-driven continual query system -- OpenCQ, and outline the initial implementation of OpenCQ on top of the distributed interoperable information mediation system DIOM [21, 19]. Continual queries are standing queries that monitor update of interest and return results whenever the update reaches specified thresholds. In OpenCQ, users may specify to the system the information they would like to monitor (such as the events or the update thresholds they are interested in). Whenever the information of interest becomes available, the system immediately delivers it to the relevant users; otherwise, the system continually monitors the arrival of the desired information and pushes it to the relevant users as it meets the specified update thresholds. In contrast to conventional pull-based data management systems such as DBMSs and Web search engines, OpenCQ exhibits two important featu...
Implementing Declarative Overlays
, 2005
"... Overlay networks are used today in a variety of distributed systems ranging from file-sharing and storage systems to communication infrastructures. However, designing, building and adapting these overlays to the intended application and the target environment is a di#cult and time consuming process. ..."
Abstract
-
Cited by 128 (46 self)
- Add to MetaCart
Overlay networks are used today in a variety of distributed systems ranging from file-sharing and storage systems to communication infrastructures. However, designing, building and adapting these overlays to the intended application and the target environment is a di#cult and time consuming process.
Query Optimization
, 1996
"... Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the typ ..."
Abstract
-
Cited by 102 (2 self)
- Add to MetaCart
Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the type of problem that query optimizers are called to solve. Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run. What is the plan that needs the least amount of time? Such query optimization is absolutely necessary in a DBMS. The cost difference between two alternatives can be enormous. For example, consider the following database schema, which will be...
Efficient and Extensible Algorithms for Multi Query Optimization
, 2000
"... Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multi-query optimization aims at exploiting common subexpressi ..."
Abstract
-
Cited by 99 (4 self)
- Add to MetaCart
Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multi-query optimization aims at exploiting common subexpressions to reduce evaluation cost. Multi-query optimization has hither-to been viewed as impractical, since earlier algorithms were exhaustive, and explore a doubly exponential search space. In this paper we demonstrate that multi-query optimization using heuristics is practical, and provides significant benefits. We propose three cost-based heuristic algorithms: Volcano-SH and Volcano-RU, which are based on simple modifications to the Volcano search strategy, and a greedy heuristic. Our greedy heuristic incorporates novel optimizations that improve efficiency greatly. Our algorithms are designed to be easily added to existing optimizers. We present a performance study comparing the algo...

