Adaptive load shedding for windowed stream joins
In Proc. Int. Conf. on Information and Knowledge Management (CIKM), 2005
Cited by 11 (5 self)
We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding. We allow stream tuples to be stored in the windows and shed excessive CPU load by performing the join operations, not on the entire set of tuples within the windows, but on a dynamically changing subset of tuples that are learned to be highly beneficial. We support such dynamic selective processing through three forms of runtime adaptations: adaptation to input stream rates, adaptation to time correlation between the streams and adaptation to join directions. Our load shedding approach enables us to integrate utility-based load shedding with time correlation-based load shedding. Indexes are used to further speed up the execution of stream joins. Experiments are conducted to evaluate our adaptive load shedding in terms of output rate and utility. The results show that our selective processing approach to load shedding is very effective and significantly outperforms the approach that drops tuples from the input streams.
Duplicate detection in click streams
In WWW ’05: Proceedings of the 14th International Conference on World Wide Web, 2005
Cited by 10 (1 self)
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications including fraud detection. We develop a solution based on Bloom Filters [9], and discuss the space and time requirements for running the proposed algorithm in the contexts of both sliding and landmark stream windows. We run a comprehensive set of experiments, using both real and synthetic click streams, to evaluate the performance of the proposed solution. The results demonstrate that the proposed solution yields extremely low error rates.
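As a rough illustration of the Bloom-filter idea in the abstract above, the following Python sketch flags repeated click identifiers within a single landmark window (the filter is never reset). The class and function names, the bit-array size, and the SHA-256-based hashing are assumptions made for this example, not details taken from the paper. The sliding-window case the paper discusses needs extra machinery for expiring old clicks and is not shown.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


def flag_duplicates(clicks, num_bits=1024, num_hashes=3):
    """Return one boolean per click: True if the click was probably seen before."""
    seen = BloomFilter(num_bits, num_hashes)
    flags = []
    for click in clicks:
        flags.append(click in seen)
        seen.add(click)
    return flags
```

A filter of this kind never misses a true repeat, but it may occasionally flag a first-time click as a duplicate; enlarging `num_bits` lowers that false-positive rate at the cost of more memory.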
Continuous Query Processing in Spatiotemporal Databases
In Proceedings of the ICDE/EDBT PhD Workshop, 2004
Cited by 8 (1 self)
The tremendous increase in cellular phones, GPS-like devices, and RFIDs results in highly dynamic environments where objects as well as queries are continuously moving. In this paper, we present a continuous query processor designed specifically for highly dynamic environments (e.g., location-aware environments). We implemented the proposed continuous query processor inside the PLACE server (Pervasive Location-Aware Computing Environments), a scalable location-aware database server currently being developed at Purdue University. The PLACE server extends data streaming management systems to support location-aware environments. Such environments are characterized by a wide variety of continuous spatio-temporal queries and by unbounded spatio-temporal streams. The proposed continuous query processor mainly includes: (1) developing new incremental spatio-temporal operators to support a wide variety of continuous spatio-temporal queries, (2) extending the semantics of sliding window queries to deal with spatial sliding windows as well as temporal sliding windows, and (3) providing a shared execution framework for scalable execution of a set of concurrent continuous spatio-temporal queries. Preliminary experimental evaluation shows the promising performance of the continuous query processor of the PLACE server.
Sliding Window Query Processing over Data Streams
2006
Cited by 4 (0 self)
Database management systems (DBMSs) have been used successfully in traditional business applications that require persistent data storage and an efficient querying mechanism. Typically, it is assumed that the data are static, unless explicitly modified or deleted by a user or application. Database queries are executed when issued and their answers reflect the current state of the data. However, emerging applications, such as sensor networks, real-time Internet traffic analysis, and on-line financial trading, require support for processing of unbounded data streams. The fundamental assumption of a data stream management system (DSMS) is that new data are generated continually, making it infeasible to store a stream in its entirety. At best, a sliding window of recently arrived data may be maintained, meaning that old data must be removed as time goes on. Furthermore, as the contents of the sliding windows evolve over time, it makes
On concurrency control in sliding window queries over data streams
In Proc. Int. Conf. on Extending Database Technology (EDBT), 2006
Cited by 4 (3 self)
Work done while the author was visiting the University of Waterloo. Data stream systems execute a dynamic workload of long-running and one-time queries, with the streaming inputs typically bounded by sliding windows. For efficiency, windows may be advanced periodically by replacing the oldest part of the window with a batch of newly arrived data. Existing work on stream processing assumes that a window cannot be advanced while it is being accessed by a query. In this paper, we argue that concurrent processing of queries (reads) and window-slides (writes) is required by data stream systems in order to allow prioritized query scheduling and improve the freshness of answers. We prove that the traditional notion of conflict serializability is insufficient in this context and define stronger isolation levels that restrict the allowed serialization orders. We also design and experimentally evaluate a transaction scheduler that efficiently enforces the new isolation levels by taking advantage of the access patterns of sliding window queries.
A load shedding framework and optimizations for M-way windowed stream joins
In Proc. of IEEE ICDE, 2007
Cited by 4 (2 self)
Tuple dropping, though commonly used for load shedding in most stream operations, is inadequate for m-way, windowed stream joins. The join output rate can be overly reduced because it fails to exploit the time correlations likely to exist among interrelated streams. In this paper, we introduce GrubJoin: an adaptive, m-way, windowed stream join that effectively performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting, which picks only the most profitable window segments for the join. Due to combinatorial explosion of possible m-way join sequences involving window segments, m-way, windowed stream joins pose several unique challenges. We focus on addressing two of them: (1) How can we quickly determine the optimal window harvesting configuration for any m-way, windowed stream join? (2) How can we monitor and learn the time correlations among the streams with high accuracy and minimal overhead? To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations. Our experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist and is equally effective when time correlations are nonexistent.
Estreams: Towards an Integrated Model for Event and Stream Processing
2004
Cited by 3 (2 self)
In this paper, we analyze the similarities and differences between the event and stream processing models. Although research on the two models seems to address different types of applications, the models have a number of similarities as well as differences. We argue that for most of the applications considered for stream processing, an event component is needed and is not currently supported. By synthesizing these two models and combining their strengths, we argue that the integrated model will be better than the sum of its parts. We then propose our integrated model and its computation to combine the capabilities of both models for applications that need not only to monitor changes through continuous queries, but also to express and process complex events generated by continuous queries. We introduce the notion of a semantic window, which significantly extends the current window concept for continuous queries, and stream modifiers, which extend the current stream computation model for complicated change detection. We further discuss the extension of event specification to include continuous queries. Finally, we discuss the implementation of our integrated model using the MavStream system and the Local Event Detector of Sentinel.
GRUBJOIN: An Adaptive Multi-Way Windowed Stream Join with Time Correlation-Aware CPU Load Shedding
2005
Cited by 2 (1 self)
Dropping tuples has been commonly used for load shedding. However, tuple dropping generally is inadequate to shed load for multi-way windowed stream joins. The output rate can be unnecessarily and severely degraded because tuple dropping does not recognize time correlations likely to exist among the streams. This paper introduces GrubJoin: an adaptive multi-way windowed stream join that efficiently performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting within an operator throttling framework, i.e., regulating the fractions of the join windows that are processed by the multi-way join. Window harvesting performs the join using only certain more useful segments of the join windows. Due mainly to the combinatorial explosion of possible multi-way join sequences involving various segments of individual join windows, GrubJoin faces a set of unique challenges, such as determining the optimal window harvesting configuration and learning the time correlations among the streams. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations among the streams. Experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist among the streams and is as effective as tuple dropping in the absence of time correlations.
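The window harvesting idea described above, joining against only the most useful segments of a window, can be sketched roughly as follows. This is a simplified single-window illustration and not GrubJoin's actual algorithm: the function name, the per-segment scores (standing in for learned time-correlation estimates), and the throttle fraction are all hypothetical.

```python
def harvest_segments(segment_scores, fraction):
    """Choose which window segments to process under a throttle fraction.

    segment_scores: hypothetical usefulness estimate per segment, e.g. the
        expected share of join output produced by tuples in that segment.
    fraction: share of the window (0.0 to 1.0) that may be processed.
    Returns the indexes of the selected segments in window order.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    budget = int(len(segment_scores) * fraction)
    # Greedy choice: rank segments by score and keep the best ones.
    ranked = sorted(
        range(len(segment_scores)),
        key=lambda i: segment_scores[i],
        reverse=True,
    )
    return sorted(ranked[:budget])


# With half of a four-segment window allowed, the two best segments are kept.
selected = harvest_segments([0.1, 0.5, 0.3, 0.05], 0.5)
```

In contrast to dropping input tuples, every tuple stays in the window here; only the portion of the window that is scanned per probe shrinks when the throttle fraction is lowered.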
Querying Sliding Windows over On-Line Data Streams
Cited by 1 (0 self)
A data stream is a real-time, continuous, ordered sequence of items generated by sources such as sensor networks, Internet traffic flow, credit card transaction logs, and on-line financial tickers. Processing continuous queries over data streams introduces a number of research problems, one of which concerns evaluating queries over sliding windows defined on the inputs. In this paper, we describe our research on sliding window query processing, with an emphasis on query models and algebras, physical and logical optimization, efficient processing of multiple windowed queries, and generating approximate answers. We outline previous work in streaming query processing and sliding window algorithms, summarize our contributions to date, and identify directions for future work.
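To make the notion of a sliding window over an ordered stream concrete, here is a minimal Python sketch of a time-based window that evicts items once they fall out of range. The class name, the `(timestamp, value)` representation, and the convention that the window covers the interval (now - size, now] are assumptions for illustration, not definitions from the paper.

```python
from collections import deque


class SlidingWindow:
    """Time-based sliding window over items that arrive in timestamp order."""

    def __init__(self, size):
        self.size = size      # window length, in the same units as timestamps
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def insert(self, timestamp, value):
        # Append the new item, then slide the window forward to its timestamp.
        self.items.append((timestamp, value))
        self.expire(timestamp)

    def expire(self, now):
        # Drop every item whose timestamp is at or before now - size.
        while self.items and self.items[0][0] <= now - self.size:
            self.items.popleft()

    def values(self):
        return [value for _, value in self.items]


window = SlidingWindow(size=10)
for ts, val in [(0, "a"), (5, "b"), (10, "c"), (16, "d")]:
    window.insert(ts, val)
```

Because arrivals are assumed to be ordered, expired items are always at the front of the deque, so each eviction is a constant-time operation.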
A SURVEY OF JOIN PROCESSING IN DATA STREAMS
Given the fundamental role played by joins in querying relational databases, it is not surprising that stream join has also been the focus of much research on streams. Recall that relational (theta) join between two non-streaming relations R1 and R2, denoted R1 ⋈θ R2, returns the set of all pairs 〈r1, r2〉, where r1 ∈ R1,

