Results 1 - 10 of 13
SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases
- In SIGMOD
, 2004
Cited by 84 (8 self)
This paper introduces the Scalable INcremental hash-based Algorithm (SINA, for short), a new algorithm for evaluating a set of concurrent continuous spatio-temporal queries. SINA is designed with two goals in mind: (1) scalability in terms of the number of concurrent continuous spatio-temporal queries, and (2) incremental evaluation of continuous spatio-temporal queries. SINA achieves scalability by employing a shared execution paradigm where the execution of continuous spatio-temporal queries is abstracted as a spatial join between a set of moving objects and a set of moving queries. Incremental evaluation is achieved by computing only the updates of the previously reported answer. We introduce two types of updates, namely positive and negative updates. A positive update indicates that a certain object should be added to the previously reported answer; a negative update indicates that it should be removed. SINA manages the computation of positive and negative updates via three phases: the hashing phase, the invalidation phase, and the joining phase. The hashing phase employs an in-memory hash-based join algorithm that results in a set of positive updates. The invalidation phase is triggered every T seconds or when the memory is fully occupied, and produces a set of negative updates. Finally, the joining phase is triggered at the end of the invalidation phase to produce a set of both positive and negative updates that result from joining in-memory data with in-disk data. Experimental results show that SINA is scalable and is more efficient than other index-based spatio-temporal algorithms.
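The meaning of positive and negative updates can be illustrated with a small sketch. It is not SINA itself, which derives the updates through its hashing, invalidation, and joining phases without recomputing full answers; it only shows the update semantics by diffing two answer sets. The function names `incremental_updates` and `apply_updates` and the object IDs are hypothetical.

```python
def incremental_updates(previous_answer, current_answer):
    """Return (positive, negative) updates between two answers of one query.

    Illustration only: a real incremental algorithm would avoid
    materializing the full current answer.
    """
    previous_answer = set(previous_answer)
    current_answer = set(current_answer)
    positive = current_answer - previous_answer  # objects to add
    negative = previous_answer - current_answer  # objects to remove
    return positive, negative


def apply_updates(answer, positive, negative):
    """Client side: patch the previously reported answer with the updates."""
    return (set(answer) - set(negative)) | set(positive)


# Hypothetical object IDs inside a query region at two evaluation times.
previous = {"obj1", "obj2", "obj3"}
current = {"obj2", "obj3", "obj4"}
positive, negative = incremental_updates(previous, current)
patched = apply_updates(previous, positive, negative)
```

Only the two small update sets need to be sent to the client, which then reconstructs the current answer locally.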
Detection and tracking of discrete phenomena in sensor-network databases
- In SSDBM
, 2005
Cited by 10 (5 self)
This paper introduces a framework for Phenomena Detection and Tracking (PDT, for short) in sensor network databases. Examples of detectable phenomena include the propagation over time of a pollution cloud or an oil spill region. We provide a crisp definition of a phenomenon that takes into consideration both the strength and the time span of the phenomenon. We focus on discrete phenomena where sensor readings are drawn from a discrete set of values, e.g., item numbers or pollutant IDs, and we point out how our work can be extended to handle continuous phenomena. The challenge for the proposed PDT framework is to detect as many phenomena as possible, given the large number of sensors, the overall high arrival rates of sensor data, and the limited system resources. Our proposed PDT framework uses continuous SQL queries to detect and track phenomena. Execution of these continuous queries is performed in three phases: the joining phase, the candidate selection phase, and the grouping/output phase. The joining phase employs an in-memory multi-way join algorithm that produces a set of sensor pairs with similar readings. The candidate selection phase filters the output of the joining phase to select candidate join pairs, with enough strength and time span, as specified by the phenomenon definition. The grouping/output phase constructs the overall phenomenon from the candidate join pairs. We introduce two optimizations to increase the likelihood of phenomena detection while using fewer system resources. Experimental studies illustrate the performance gains of both the proposed PDT framework and the proposed optimizations.
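The three phases named in this abstract can be sketched roughly as follows. This is a toy batch version under stated assumptions, not the paper's continuous-query implementation: "strength" is assumed to be the number of matching reading pairs, "time span" the gap between the first and last match, and the grouping step is assumed to be a connected-components merge. All function names, parameters, and sample readings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def joining_phase(readings, window):
    """Pair up distinct sensors that report the same value close in time."""
    by_value = defaultdict(list)
    for sensor, ts, value in readings:
        by_value[value].append((sensor, ts))
    matches = []
    for value, items in by_value.items():
        for (s1, t1), (s2, t2) in combinations(items, 2):
            if s1 != s2 and abs(t1 - t2) <= window:
                matches.append((tuple(sorted((s1, s2))), value, max(t1, t2)))
    return matches


def candidate_selection(matches, min_strength, min_span):
    """Keep (pair, value) keys that match often enough and for long enough."""
    times_by_key = defaultdict(list)
    for pair, value, ts in matches:
        times_by_key[(pair, value)].append(ts)
    return [key for key, times in times_by_key.items()
            if len(times) >= min_strength
            and max(times) - min(times) >= min_span]


def grouping_phase(candidates):
    """Merge candidate pairs that share a sensor and a value (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (a, b), value in candidates:
        parent[find((a, value))] = find((b, value))
    groups = defaultdict(set)
    for sensor, value in list(parent):
        groups[find((sensor, value))].add(sensor)
    return sorted((root[1], sorted(members)) for root, members in groups.items())


# Hypothetical readings: (sensor_id, timestamp, pollutant_id).
readings = [
    ("s1", 0, "CO"), ("s2", 1, "CO"), ("s3", 1, "CO"),
    ("s1", 5, "CO"), ("s2", 5, "CO"), ("s3", 6, "CO"),
    ("s4", 5, "NO2"),
]
matches = joining_phase(readings, window=1)
candidates = candidate_selection(matches, min_strength=2, min_span=3)
phenomena = grouping_phase(candidates)
```

With these sample readings, sensors s1, s2, and s3 repeatedly agree on the value "CO" and are merged into one phenomenon, while s4 has no partner and is dropped in the joining phase.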
Efficient Execution of Sliding-Window Queries Over Data Streams
, 2003
Cited by 8 (3 self)
Emerging data stream processing systems rely on windowing to enable on-the-fly processing of continuous queries over unbounded streams. As a result, several recent efforts have developed window-aware implementations of query operators such as joins and aggregates. This focus on individual operators, however, ignores the larger issue of how to coordinate the pipelined execution of such operators when combined into a full windowed query plan. In this paper, we first show how the straightforward application of traditional pipelined query processing techniques to sliding window queries can result in inefficient and incorrect behavior. We then present three alternative execution techniques that guarantee correct behavior for pipelined sliding window queries and develop new algorithms for correctly evaluating window-based duplicate-elimination, Group-By and Set operators in this context. We implemented all of these techniques in a prototype data stream system and report the results of a detailed performance study of the system.
Early Hash Join: A Configurable Algorithm for the Efficient and Early Production of Join Results
, 2005
Cited by 5 (0 self)
Minimizing both the response time to produce the first few thousand results and the overall execution time is important for interactive querying. Current join algorithms either minimize the execution time at the expense of response time or minimize response time by producing results early without optimizing the total time. We present a hash-based join algorithm, called early hash join, which can be dynamically configured at any point during join processing to trade off faster production of results against overall execution time. We demonstrate that varying how inputs are read has a major effect on these two factors and provide formulas that allow an optimizer to calculate the expected rate of join output and the number of I/O operations performed using different input reading strategies. Experimental results show that early hash join performs significantly fewer I/O operations and executes faster than other early join algorithms, especially for one-to-many joins. Its overall execution time is comparable to standard hybrid hash join, but its response time is an order of magnitude faster. Thus, early hash join can replace hybrid hash join in any situation where a fast initial response time is beneficial without the penalty in overall execution time exhibited by other early join algorithms.
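The idea of producing join results early while varying how the inputs are read can be sketched with a plain in-memory symmetric hash join. This is not the early hash join algorithm of the paper (it has no disk partitions, memory limit, or cost formulas); it only shows how a reading strategy, here the assumed parameter `left_per_right`, controls which tuples are seen first. All names and sample rows are hypothetical.

```python
from collections import defaultdict

_END = object()  # sentinel marking an exhausted input


def early_join(left, right, key_left, key_right, left_per_right=1):
    """Yield (left_row, right_row) matches as soon as both rows were read.

    Reads `left_per_right` rows from `left` for every row read from `right`.
    """
    if left_per_right < 1:
        raise ValueError("left_per_right must be at least 1")
    left_table, right_table = defaultdict(list), defaultdict(list)
    left_iter, right_iter = iter(left), iter(right)
    left_done = right_done = False
    while not (left_done and right_done):
        for _ in range(left_per_right):
            row = next(left_iter, _END)
            if row is _END:
                left_done = True
                break
            key = key_left(row)
            left_table[key].append(row)
            # Probe the rows already read from the other input.
            for match in right_table.get(key, ()):
                yield row, match
        row = next(right_iter, _END)
        if row is _END:
            right_done = True
        else:
            key = key_right(row)
            right_table[key].append(row)
            for match in left_table.get(key, ()):
                yield match, row


# Hypothetical one-to-many join: customers (id, name) and orders (id, customer_id).
customers = [(1, "Ann"), (2, "Bob")]
orders = [(101, 1), (102, 2), (103, 1)]
results = list(early_join(customers, orders,
                          key_left=lambda c: c[0],
                          key_right=lambda o: o[1]))
```

Each matching pair is emitted exactly once, at the moment the later of its two rows is read, so the first result appears after reading one row from each input rather than after a full build phase.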
Phenomenon-aware sensor database systems
- In Proc. of the EDBT Ph.D. Workshop
, 2006
Cited by 4 (4 self)
Recent advances in large-scale sensor-network technologies enable the deployment of a huge number of sensors in the surrounding environment. Sensors do not live in isolation. Instead, close-by sensors experience similar environmental conditions. Hence, close-by sensors may indulge in a correlated behavior and generate a “phenomenon”. A phenomenon is characterized by a group of sensors that show “similar” behavior over a period of time. Examples of detectable phenomena include the propagation over time of a pollution cloud or an oil spill region. In this research, we propose a framework to detect and track various forms of phenomena in a sensor field. This framework empowers sensor database systems with phenomenon-awareness capabilities. Phenomenon-aware sensor database systems use high-level knowledge about phenomena in the sensor field to control the acquisition of sensor data and to optimize subsequent user queries. As a vehicle for our research, we build the Nile-PDT system, a framework for Phenomenon Detection and Tracking inside Nile, a prototype data stream management system developed at Purdue University.
Sliding Window Query Processing over Data Streams
, 2006
Cited by 4 (0 self)
Database management systems (DBMSs) have been used successfully in traditional business applications that require persistent data storage and an efficient querying mechanism. Typically, it is assumed that the data are static, unless explicitly modified or deleted by a user or application. Database queries are executed when issued and their answers reflect the current state of the data. However, emerging applications, such as sensor networks, real-time Internet traffic analysis, and on-line financial trading, require support for processing of unbounded data streams. The fundamental assumption of a data stream management system (DSMS) is that new data are generated continually, making it infeasible to store a stream in its entirety. At best, a sliding window of recently arrived data may be maintained, meaning that old data must be removed as time goes on. Furthermore, as the contents of the sliding windows evolve over time, it makes
Scalability management in sensor-network phenomenabases
- In Proc. of SSDBM
, 2006
Cited by 2 (1 self)
A phenomenon appears in a sensor network when a group of sensors persistently generates similar behavior over a period of time. PhenomenaBases (or databases of phenomena) are equipped with Phenomena Detection and Tracking (PDT) techniques that continuously run in the background of a sensor database system to detect new phenomena and to track already existing phenomena. The process of phenomena detection and tracking is initiated by a multi-way join operator that lies at the core of PDT techniques and reports similar sensor readings. As the sensor network grows, the join operator (and, consequently, query processing in the PhenomenaBase) faces several scalability challenges. In this paper, we present a join operator for PhenomenaBases (the SNJoin operator) that is specially designed for dynamically configured large-scale sensor networks with distributed processing capabilities. Experimental studies illustrate the scalability of the proposed join operator in PhenomenaBases with respect to the number of detected phenomena and the output delay.
The sort-merge-shrink join
- ACM Trans. Database Syst
, 2006
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over large, disk-based input tables. The key innovation of the SMS join is that if the input data are clustered in a statistically random fashion on disk, then at all times, the join provides an online, statistical estimator for the eventual answer to the query as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate’s accuracy or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into main memory.
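What an online estimator with probabilistic confidence bounds looks like can be shown on a much simpler problem than the one this abstract addresses. The sketch below estimates the SUM over a single table scanned in random order, using a running mean and variance (Welford's method) and a normal-approximation interval. It is a generic online-aggregation illustration, not the SMS join; the class name `OnlineSumEstimator`, the default `z=1.96` (roughly a 95% interval), and the sample values are assumptions, and the finite-population correction is ignored.

```python
import math


class OnlineSumEstimator:
    """Running estimate of a table-wide SUM from rows seen in random order."""

    def __init__(self, population_size, z=1.96):
        self.population_size = population_size  # total number of rows
        self.z = z                              # normal quantile for the interval
        self.n = 0                              # rows seen so far
        self.mean = 0.0
        self.m2 = 0.0                           # sum of squared deviations

    def add(self, x):
        """Fold one more sampled value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def estimate(self):
        """Return (estimated_sum, half_width) of the confidence interval."""
        total = self.population_size * self.mean
        if self.n < 2:
            return total, math.inf  # no variance estimate yet
        variance = self.m2 / (self.n - 1)
        half_width = self.z * self.population_size * math.sqrt(variance / self.n)
        return total, half_width


# Hypothetical usage: 1000 rows in the table, three sampled so far.
estimator = OnlineSumEstimator(population_size=1000)
for value in (2.0, 4.0, 6.0):
    estimator.add(value)
total, half_width = estimator.estimate()
```

A user can poll `estimate()` during the scan and stop once the half-width is small enough, which mirrors the monitoring workflow the abstract describes.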
Adapting State-Intensive Non-Blocking Queries over Distributed Environments
, 2005
Main memory is a critical resource in push-based non-blocking query processing, especially for queries with stateful operators. Works in the literature apply partitioned parallel processing to alleviate the stringent memory demands. However, main memory of a distributed system remains limited. Thus, there is a demand for efficient main memory usage even for partitioned parallel queries. In this work, we first investigate two adaptations, namely, disk-based adaptation and distributed adaptation, that adapt operator states when memory overflow happens for complex multi-input operators. We analyze the tradeoffs regarding the factors and policies to be used when adapting operator states to overcome memory overflow. Two approaches, namely, lazy-disk and active-disk adaptations, are proposed to integrate the disk-based and distributed adaptations when the aggregated main memory of a distributed system is not sufficient for the query processing. Both approaches aim to maximize the overall throughput. Extensive experiments have been conducted on a working system. These experiments reveal various aspects of partitioned parallel processing and their adaptation strategies.
Phenomenon-aware data stream management systems
, 2007
Recent advances in large scale data streaming technologies enabled the deployment of a huge number of streaming sources in the surrounding environment, e.g., sensor fields. Streaming sources do not live in isolation. Instead, close-by streaming sources experience similar environmental conditions. Hence, close-by streaming sources may indulge in a correlated behavior and generate a “phenomenon”. A phenomenon is characterized by a group of streaming sources that show “similar behavior” over a period of time. Examples of detectable phenomena include pollution clouds in the air, oil spills at the ocean surface, fire zones in a building, water floods of a river, migration of birds, and epidemic spread of diseases. This dissertation proposes a framework to detect, track, and query various forms of phenomena in data streaming environments. This framework empowers data stream management systems (DSMSs) with phenomenon-awareness capabilities. Phenomenon-aware data stream systems use high-level knowledge about phenomena in the data streaming environment to optimize the execution of subsequent user queries.
To approach the above goal, this dissertation proposes the principle that “phenomenon detection guides query processing” and explores this principle’s implications on DSMSs. Hence, user queries have the option to view the streaming environment at a higher level, i.e., the phenomenon level. In such a phenomenon-aware query processing paradigm, streams are prioritized and are processed based on a mechanism that tunes query processing towards data streams that contribute to detected phenomena.
This dissertation provides a formal definition for a phenomenon, models the phenomenon behavior, and proposes an extended syntax that enables the users to register their interesting phenomenon patterns with the system. Also, this dissertation adopts the concept of phenomenon-aware query processing by adding two major components to DSMSs: the Phenomenon Detection and Tracking module (PDT-module) and the phenomenon-aware optimizer. The PDT-module encompasses scalable techniques to detect the appearance of new phenomena and to track the propagation of already-detected phenomena. The phenomenon-aware optimizer is an adaptive optimizer that optimizes user queries continuously based on the feedback it receives from the PDT-module. Finally, this dissertation considers phenomenon awareness at the distributed setup of sensor networks by providing a phenomenon-aware data acquisition protocol and by extending the phenomenon detection process to the sensor-network platform. As a vehicle for this research, the Nile-PhenomenaBase system is prototyped as a framework for phenomenon-aware query processing inside Nile, a data stream management system developed at Purdue University.

