Results 1 - 10
of
96
Models and issues in data stream systems
- In PODS
, 2002
"... In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work releva ..."
Abstract
-
Cited by 520 (18 self)
- Add to MetaCart
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues. 1
Gossip-Based Computation of Aggregate Information
, 2003
"... between computers, and a resulting paradigm shift from centralized to highly distributed systems. With massive scale also comes massive instability, as node and link failures become the norm rather than the exception. For such highly volatile systems, decentralized gossip-based protocols are emergin ..."
Abstract
-
Cited by 215 (1 self)
- Add to MetaCart
between computers, and a resulting paradigm shift from centralized to highly distributed systems. With massive scale also comes massive instability, as node and link failures become the norm rather than the exception. For such highly volatile systems, decentralized gossip-based protocols are emerging as an approach to maintaining simplicity and scalability while achieving fault-tolerant information dissemination.
The CQL Continuous Query Language: Semantic Foundations and Query Execution
- VLDB Journal
, 2003
"... CQL, a Continuous Query Language, is supported by the STREAM prototype Data Stream Management System at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and updatable relations. We begin by presenting an abstract semantics that relie ..."
Abstract
-
Cited by 185 (4 self)
- Add to MetaCart
CQL, a Continuous Query Language, is supported by the STREAM prototype Data Stream Management System at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and updatable relations. We begin by presenting an abstract semantics that relies only on "black box" mappings among streams and relations.
Issues in Data Stream Management
, 2003
"... Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require sup ..."
Abstract
-
Cited by 105 (5 self)
- Add to MetaCart
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for online analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
Characterizing Memory Requirements for Queries over Continuous Data
- In PODS
, 2002
"... This paper deals with continuous conjunctive queries with arithmetic comparisons and optional aggregation over multiple data streams. An algorithm is presented for determining whether or not any given query can be evaluated using a bounded amount of memory for all possible instances of the data stre ..."
Abstract
-
Cited by 79 (9 self)
- Add to MetaCart
This paper deals with continuous conjunctive queries with arithmetic comparisons and optional aggregation over multiple data streams. An algorithm is presented for determining whether or not any given query can be evaluated using a bounded amount of memory for all possible instances of the data streams. For queries that can be evaluated using bounded memory, an execution strategy based on constant-sized synopses of the data streams is proposed. For queries that cannot be evaluated using bounded memory, data stream scenarios are identified in which evaluating the queries requires memory linear in the size of the unbounded streams
Processing sliding window multi-joins in continuous queries over data streams
- Proceedings of the 29th international
, 2003
"... We study sliding window multi-join processing in continuous queries over data streams. Several algorithms are reported for performing continuous, incremental joins, under the assumption that all the sliding windows fit in main memory. The algorithms include multiway incremental nested loop joins (NL ..."
Abstract
-
Cited by 79 (9 self)
- Add to MetaCart
We study sliding window multi-join processing in continuous queries over data streams. Several algorithms are reported for performing continuous, incremental joins, under the assumption that all the sliding windows fit in main memory. The algorithms include multiway incremental nested loop joins (NLJs) and multi-way incremental hash joins. We also propose join ordering heuristics to minimize the processing cost per unit time. We test a possible implementation of these algorithms and show that, as expected, hash joins are faster than NLJs for performing equi-joins, and that the overall processing cost is influenced by the strategies used to remove expired tuples from the sliding windows. 1
A Transducer-Based XML Query Processor
, 2002
"... The XML Stream Machine (XSM) system is a novel XQuery processing paradigm that is tuned to the efficient processing of sequentially accessed XML data (streams). The system compiles a given XQuery into an XSM, which is an XML stream transducer, i.e., an abstract device that takes as input one o ..."
Abstract
-
Cited by 66 (1 self)
- Add to MetaCart
The XML Stream Machine (XSM) system is a novel XQuery processing paradigm that is tuned to the efficient processing of sequentially accessed XML data (streams). The system compiles a given XQuery into an XSM, which is an XML stream transducer, i.e., an abstract device that takes as input one or more XML data streams and produces one or more output streams, potentially using internal buffers. We present a systematic way to translate XQueries into efficient XSMs: First the XQuery is translated into a network of XSMs that correspond to the basic operators of the XQuery language and exchange streams. The network is reduced to a single XSM by repeated application of an XSM composition operation that is optimized to reduce the number of tests and actions that the XSM performs as well as the number of intermediate buffers that it uses. Finally, the optimized XSM is compiled into a C program. First empirical results illustrate the performance benefits of the XSM-based processor.
Clock synchronization for wireless sensor networks: A Survey
- Ad Hoc Networks (Elsevier
, 2005
"... ..."
Power-Conserving Computation of Order-Statistics over Sensor Networks
- In PODS
, 2004
"... We study the problem of power-conserving computation of order statistics in sensor networks. Significant power-reducing optimizations have been devised for computing simple aggregate queries such as count, average, or max over sensor networks. In contrast, aggregate queries such as median have seen ..."
Abstract
-
Cited by 54 (0 self)
- Add to MetaCart
We study the problem of power-conserving computation of order statistics in sensor networks. Significant power-reducing optimizations have been devised for computing simple aggregate queries such as count, average, or max over sensor networks. In contrast, aggregate queries such as median have seen little progress over the brute force approach of forwarding all data to a central server. Moreover, battery life of current sensors seems largely determined by communication costs --- therefore we aim to minimize the number of bytes transmitted. Unoptimized aggregate queries typically impose extremely high power consumption on a subset of sensors located near the server. Metrics such as total communication cost underestimate the penalty of such imbalance: network lifetime may be dominated by the worst-case replacement time for depleted batteries.
Sketching streams through the net: Distributed approximate query tracking
- In VLDB
, 2005
"... While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex dataanalysis queries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously s ..."
Abstract
-
Cited by 50 (16 self)
- Add to MetaCart
While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex dataanalysis queries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking a broad class of complex aggregate queries in such a distributed-streams setting. Our tracking schemes maintain approximate query answers with provable error guarantees, while simultaneously optimizing the storage space and processing time at each remote site, as well as the communication cost across the network. In a nutshell, our algorithms rely on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication- and space/time-efficient solutions. The end result is a powerful approximate query tracking framework that readily incorporates several complex analysis queries (including distributed join and multi-join aggregates, and approximate wavelet representations), thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model. Experiments with real data validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees. 1

