Results 1 - 10 of 86
Issues in Data Stream Management
, 2003
Cited by 105 (5 self)
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for online analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
Processing sliding window multi-joins in continuous queries over data streams
- Proceedings of the 29th international
, 2003
Cited by 79 (9 self)
We study sliding window multi-join processing in continuous queries over data streams. Several algorithms are reported for performing continuous, incremental joins, under the assumption that all the sliding windows fit in main memory. The algorithms include multi-way incremental nested loop joins (NLJs) and multi-way incremental hash joins. We also propose join ordering heuristics to minimize the processing cost per unit time. We test a possible implementation of these algorithms and show that, as expected, hash joins are faster than NLJs for performing equi-joins, and that the overall processing cost is influenced by the strategies used to remove expired tuples from the sliding windows.
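As a rough illustration of the kind of operator this abstract describes, the following is a minimal sketch of an incremental hash equi-join over two time-based sliding windows held in main memory. It is not the paper's algorithm: the class name `SlidingWindowHashJoin`, the two-stream restriction, the eager expiration policy, and the assumption that timestamps arrive in non-decreasing order are all choices made here for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowHashJoin:
    """Hypothetical two-stream incremental hash equi-join over time-based windows."""

    def __init__(self, window):
        # Window length, in the same units as the tuple timestamps.
        self.window = window
        # Per stream: a hash table (key -> tuples in arrival order) ...
        self.tables = (defaultdict(deque), defaultdict(deque))
        # ... and an arrival-ordered queue used to find expired tuples.
        self.arrivals = (deque(), deque())

    def _expire(self, side, now):
        # Eager expiration: remove every tuple with ts <= now - window.
        queue, table = self.arrivals[side], self.tables[side]
        while queue and queue[0][0] <= now - self.window:
            _, key = queue.popleft()
            bucket = table[key]
            bucket.popleft()  # oldest tuple in this bucket is the expiring one
            if not bucket:
                del table[key]

    def insert(self, side, ts, key, payload):
        """Add a tuple to stream `side` (0 or 1) and return the new join results."""
        other = 1 - side
        self._expire(side, ts)
        self._expire(other, ts)
        # Probe the opposite window; .get avoids creating empty buckets.
        partners = [p for _, p in self.tables[other].get(key, ())]
        if side == 0:
            results = [(payload, p) for p in partners]
        else:
            results = [(p, payload) for p in partners]
        self.tables[side][key].append((ts, payload))
        self.arrivals[side].append((ts, key))
        return results


join = SlidingWindowHashJoin(window=10)
join.insert(0, 1, "a", "r1")         # no partner yet
print(join.insert(1, 5, "a", "s1"))  # joins with r1
print(join.insert(1, 12, "a", "s2")) # r1 has expired by time 12
```

Expiring tuples on every arrival is only one possible strategy; the abstract notes that the choice of expiration strategy affects overall processing cost, so a lazy or batched variant may behave quite differently.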
Approximate Join Processing Over Data Streams
, 2003
Cited by 79 (2 self)
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data we show the efficacy of our solutions. For applications with demand for exact results we introduce a new Archive-metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.
Load shedding for aggregation queries over data streams (full version)
- In preparation
Cited by 76 (1 self)
Systems for processing continuous monitoring queries over data streams must be adaptive because data streams are often bursty and data characteristics may vary over time. In this paper, we focus on one particular type of adaptivity: the ability to gracefully degrade performance via “load shedding” (dropping unprocessed tuples to reduce system load) when the demands placed on the system cannot be met in full given available resources. Focusing on aggregation queries, we present algorithms that determine at which points in a query plan load shedding should be performed and how much load should be shed at each point in order to minimize the degree of inaccuracy introduced into query answers. We report the results of experiments that validate our analytical conclusions.
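To make the idea of load shedding for an aggregate concrete, here is a minimal sketch that drops tuples uniformly at random at a single point and rescales a SUM to compensate. It does not reproduce the paper's algorithms for choosing where and how much to shed; the function name `shed_and_sum`, the `keep_prob` parameter, and the single-drop-point setup are assumptions made for illustration.

```python
import random


def shed_and_sum(values, keep_prob, rng=None):
    """Estimate SUM(values) while processing only a random fraction of tuples.

    Each tuple is kept independently with probability keep_prob; the sum of
    the kept tuples is divided by keep_prob so the estimate is unbiased.
    Returns (estimated_sum, number_of_tuples_processed).
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = rng or random.Random()
    kept_total = 0.0
    kept_count = 0
    for value in values:
        # rng.random() is in [0, 1), so keep_prob == 1.0 keeps every tuple.
        if rng.random() < keep_prob:
            kept_total += value
            kept_count += 1
    return kept_total / keep_prob, kept_count


stream = [5, 3, 8, 1, 9, 2, 7, 4]
estimate, processed = shed_and_sum(stream, keep_prob=0.5, rng=random.Random(0))
print(f"exact={sum(stream)} estimate={estimate} processed={processed}")
```

Lowering `keep_prob` reduces the number of tuples processed at the cost of a higher-variance estimate, which is the accuracy trade-off the abstract refers to.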
Scheduling for Shared Window Joins Over Data Streams
- In VLDB
, 2003
Cited by 58 (13 self)
Continuous Query (CQ) systems typically exploit commonality among query expressions to achieve improved efficiency through shared processing. Recently proposed CQ systems have introduced window specifications in order to support unbounded data streams. There has been, however, little investigation of sharing for windowed query operators.
Memory-Limited Execution of Windowed Stream Joins
, 2004
Cited by 48 (0 self)
We address the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, e.g., if the output of the join is being aggregated.
Operator Placement for In-Network Stream Query Processing
- In Proc. of the 24th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS)
, 2005
Cited by 48 (0 self)
In sensor networks, data acquisition frequently takes place at low-capability devices. The acquired data is then transmitted through a hierarchy of nodes having progressively increasing network bandwidth and computational power. We consider the problem of executing queries over these data streams, posed at the root of the hierarchy. To minimize data transmission, it is desirable to perform “in-network” query processing: do some part of the work at intermediate nodes as the data travels to the root. Most previous work on in-network query processing has focused on aggregation and inexpensive filters. In this paper, we address in-network processing for queries involving possibly expensive conjunctive filters and joins. We consider the problem of placing operators along the nodes of the hierarchy so that the overall cost of computation and data transmission is minimized. We show that the problem is tractable, give an optimal algorithm, and demonstrate that a simpler greedy operator placement algorithm can fail to find the optimal solution. Finally, we define a number of interesting variations of the basic operator placement problem and demonstrate their hardness.
Dynamic plan migration for continuous queries over data streams
- In ACM SIGMOD
, 2004
Cited by 44 (4 self)
Dynamic plan migration is concerned with the on-the-fly transition from one continuous query plan to a semantically equivalent yet more efficient plan. Migration is important for stream monitoring systems where long-running queries may have to withstand fluctuations in stream workloads and data characteristics. Existing migration methods generally adopt a pause-drain-resume strategy that pauses the processing of new data and purges all old data in the existing plan before the new plan is plugged into the system. However, these existing strategies do not address the problem of migrating query plans that contain stateful operators, such as joins. We develop solutions for online plan migration for continuous stateful plans. In particular, we propose two alternative strategies, the moving state strategy and the parallel track strategy; the first exploits reusability and the second employs parallelism to migrate seamlessly between continuous join plans without affecting the results of the query. We develop cost models for both migration strategies to compare them analytically. We embed these migration strategies into CAPE [7], a prototype stream query engine, and conduct a comparative experimental study to evaluate the two strategies for window-based join plans. Our experimental results illustrate that the two strategies can vary significantly in terms of output rates and intermediate storage space given distinct system configurations and stream workloads.
Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams
- ACM Transactions on Database Systems, TODS
, 2004
Cited by 41 (9 self)
We consider the problem of efficiently processing continuous queries over multiple continuous data streams in the presence of constraints on the data streams. We specify several types of constraints, and for each constraint type we identify an “adherence parameter” that captures how closely a given stream or joining pair of streams adheres to a constraint of that type. We then present a query execution algorithm that takes k-constraints over streams into account in order to reduce memory overhead. In general, the tighter the adherence parameters are in the k-constraints, the less memory is required. Furthermore, if input streams do not adhere to constraints within the specified adherence parameters, our algorithm automatically degrades gracefully to provide continuous approximate answers. We have implemented our approach in a testbed continuous query processor and preliminary experimental results are reported.
Stream Window Join: Tracking Moving Objects in Sensor-Network Databases
- In SSDBM
, 2003
Cited by 38 (11 self)
The widespread use of sensor networks presents revolutionary opportunities for life and environmental science applications. Many of these applications involve continuous queries that require the tracking, monitoring, and correlation of multi-sensor data that represent moving objects. We propose to answer these queries using a multi-way stream window join operator. This form of join over multi-sensor data must cope with the infinite nature of sensor data streams and the delays in network transmission. This paper introduces a class of join algorithms, termed W-join, for joining multiple infinite data streams. W-join addresses the infinite nature of the data streams by joining stream data items that lie within a sliding window and that match a certain join condition. W-join can be used to track the motion of a moving object or detect the propagation of clouds of hazardous material or pollution spills over time in a sensor network environment. We describe two new algorithms for W-join, and address variations and local/global optimizations related to specifying the nature of the window constraints to fulfill the posed queries. The performance of the proposed algorithms is studied experimentally in a prototype stream database system, using synthetic data streams and real time-series data. Tradeoffs of the proposed algorithms and their advantages and disadvantages are highlighted, given variations in the aggregate arrival rates of the input data streams and the desired response times per query.

