Results 1 - 7 of 7
Sliding Window Query Processing over Data Streams
, 2006
Abstract

Cited by 11 (0 self)
Database management systems (DBMSs) have been used successfully in traditional business applications that require persistent data storage and an efficient querying mechanism. Typically, it is assumed that the data are static unless explicitly modified or deleted by a user or application. Database queries are executed when issued, and their answers reflect the current state of the data. However, emerging applications, such as sensor networks, real-time Internet traffic analysis, and online financial trading, require support for processing unbounded data streams. The fundamental assumption of a data stream management system (DSMS) is that new data are generated continually, making it infeasible to store a stream in its entirety. At best, a sliding window of recently arrived data may be maintained, meaning that old data must be removed as time goes on. Furthermore, as the contents of the sliding windows evolve over time, it makes
Sketching asynchronous streams over a sliding window
 In PODC
, 2006
Abstract

Cited by 10 (2 self)
We study the problem of maintaining sketches of recent elements of a data stream. Motivated by applications involving network data, we consider streams that are asynchronous, in which the observed order of data is not the same as the time order in which the data was generated. The notion of recent elements of a stream is modeled by the sliding timestamp window, which is the set of elements with timestamps that are close to the current time. We design algorithms for maintaining sketches of all elements within the sliding timestamp window that can give provably accurate estimates of two basic aggregates, the sum and the median, of a stream of numbers. The space taken by the sketches, the time needed for querying the sketch, and the time for inserting new elements into the sketch are all polylog with respect to the maximum window size and the values of the data items in the window. Our sketches can be easily combined in a lossless and compact way, making them useful for distributed computations over data streams. Previous works on sketching recent elements of a data stream have all considered the more restrictive scenario of synchronous streams, where the observed order of data is the same as the time order in which the data was generated. Our notion of recency of elements is more general than that studied in previous work, and thus our sketches are more robust to network delays and asynchrony.
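For orientation, the sliding timestamp window described above can be illustrated with an exact baseline. This is only a sketch under stated assumptions, not the paper's algorithm: it stores every in-window element (linear space, whereas the paper's sketches are polylogarithmic), and the class name `TimestampWindowSum`, the window convention `(now - width, now]`, and the assumption that queries arrive with non-decreasing `now` are all hypothetical choices made here.

```python
import heapq


class TimestampWindowSum:
    """Exact sum of values with timestamps in (now - width, now].

    Hypothetical baseline: keeps every live element, so space is linear
    in the window contents. Assumes queries use non-decreasing `now`.
    """

    def __init__(self, width):
        self.width = width
        self._heap = []   # min-heap of (timestamp, value)
        self._total = 0

    def insert(self, timestamp, value):
        # Elements may arrive in any order; only the timestamp matters.
        heapq.heappush(self._heap, (timestamp, value))
        self._total += value

    def query(self, now):
        # Evict every element whose timestamp has aged out of the window.
        cutoff = now - self.width
        while self._heap and self._heap[0][0] <= cutoff:
            _, value = heapq.heappop(self._heap)
            self._total -= value
        return self._total


window = TimestampWindowSum(width=10)
for ts, val in [(5, 1), (2, 4), (9, 2)]:   # observed out of timestamp order
    window.insert(ts, val)
print(window.query(now=10))   # all three elements are still in the window
print(window.query(now=12))   # the element stamped 2 has expired
```

Because eviction is driven by timestamps rather than arrival order, a late-arriving element whose timestamp is already outside the window is discarded at the next query.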
Time-Decaying Sketches for Sensor Data Aggregation
, 2007
Abstract

Cited by 8 (3 self)
We present a new sketch for summarizing network data. The sketch has the following properties, which make it useful for communication-efficient aggregation in distributed streaming scenarios such as sensor networks. The sketch is duplicate-insensitive, i.e., reinsertions of the same data will not affect the sketch, and hence will not affect the estimates of aggregates. Unlike previous duplicate-insensitive sketches for sensor data aggregation [26, 12], it is also time-decaying, so that the weight of a data item in the sketch can decrease with time according to a user-specified decay function. The sketch can give provable approximation guarantees for various aggregates of data, including the sum, median, quantiles, and frequent elements. The size of the sketch and the time taken to update it are both polylogarithmic in the size of the relevant data. Further, multiple sketches computed over distributed data can be combined without losing the accuracy guarantees. To our knowledge, this is the first sketch that combines all the above properties.
Distinct-values estimation over data streams
 In Data Stream Management: Processing High-Speed Data
Abstract

Cited by 4 (1 self)
In this chapter, we consider the problem of estimating the number of distinct values in a data stream with repeated values. Distinct-values estimation was one of the first data stream problems studied: in the mid-1980s, Flajolet and Martin gave an effective algorithm that uses only logarithmic space. Recent work has built upon their technique, improving the accuracy guarantees on the estimation, proving lower bounds, and considering other settings such as sliding windows, distributed streams, and sensor networks.
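To make the distinct-values problem concrete, here is a minimal sketch of one hash-based estimator. It is not the Flajolet-Martin algorithm named in the abstract: it uses the simpler k-minimum-values (KMV) idea, which keeps the k smallest hash values seen and estimates the count from the k-th smallest. The class name `KMVSketch`, the choice of SHA-1 as the hash, and the default `k=256` are assumptions made for illustration only.

```python
import hashlib
import heapq


class KMVSketch:
    """k-minimum-values distinct-count estimator (illustrative only)."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes: a max-heap of the k smallest
        self._members = set()  # hashes currently held, to skip repeats

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash(item)
        if h in self._members:
            return  # repeated value: the sketch is unchanged
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct: exact
        # If n hashes are uniform on [0, 1), the k-th smallest is near k / n.
        return (self.k - 1) / (-self._heap[0])


sketch = KMVSketch(k=256)
for value in range(10_000):
    sketch.add(value)
    sketch.add(value)          # repeats do not change the estimate
print(round(sketch.estimate()))
```

The relative error of this estimator shrinks roughly as 1/sqrt(k), so the space/accuracy tradeoff is controlled by a single parameter.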
Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees
Abstract

Cited by 2 (1 self)
Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations pose great challenges for database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs with time complexity $O(N^{3/2})$ for two-dimensional data and $O(N^{5/3})$ for three-dimensional data, where N is the number of particles in the simulation system. While beating the naive solution, such algorithms are still not practical for processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm; this solution has a running time that is independent of the input size N. While an error bound can be easily identified, experimental results show that the accuracy of such an algorithm is significantly higher than what is given by such a (loose) bound. To study the difference between the experimental results and the theoretical bound, we develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with an improved time/accuracy tradeoff. Experimental results confirm our analysis. Index Terms: molecular simulation, particle simulation, spatial distance histogram, radial distribution functions, quadtree, scientific databases
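The "straightforward algorithm" mentioned above can be sketched as follows: compute every pairwise distance and count it into a fixed-width bucket. This is the quadratic-time baseline only, not the paper's exact or approximate algorithms. The function name `naive_sdh`, its parameters, and the decision to clamp overlong distances into the last bucket are assumptions made for this illustration.

```python
import math
from itertools import combinations


def naive_sdh(points, bucket_width, num_buckets):
    """Quadratic-time spatial distance histogram over all point pairs."""
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):     # N * (N - 1) / 2 pairs
        d = math.dist(p, q)                  # Euclidean distance
        # Assumption: distances beyond the histogram range go in the last bucket.
        idx = min(int(d // bucket_width), num_buckets - 1)
        hist[idx] += 1
    return hist


particles = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
print(naive_sdh(particles, bucket_width=2.0, num_buckets=3))   # [1, 0, 2]
```

The bucket counts always sum to N(N-1)/2, which is a convenient sanity check when comparing an approximate histogram against this baseline on small inputs.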
Norm, Point, and Distance Estimation Over Multiple Signals Using Max–Stable Distributions
Abstract

Cited by 1 (1 self)
Consider a set of signals f_s: {1, ..., N} → [0, ..., M] appearing as a stream of tuples (i, f_s(i)) in arbitrary order of i and s. We would like to devise one-pass approximate algorithms for estimating various functionals on the dominant signal f_max, defined as f_max = {(i, max_s f_s(i)), ∀i}. Examples include the “worst case influence”, which is the F1-norm of the dominant signal [7], general Fp-norms, and special types of distances between dominant signals. The only previous algorithms known in this setting are those of Cormode and Muthukrishnan [7] and Pavan and Tirthapura [18], which can only estimate the F1-norm over f_max. No previous work addressed more general norms or distance estimation. In this work, we use a novel sketch, based on the properties of max-stable distributions, for these more general problems. The max-stable sketch is a significant improvement over previous alternatives in terms of simplicity of implementation, space requirements, and insertion cost, while providing similar approximation guarantees. To support these claims, we also conduct an experimental evaluation using real datasets.
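The definition of the dominant signal can be made concrete with an exact computation. This is only a reference baseline, not the max-stable sketch: it keeps one entry per index i and so needs space linear in N, which is exactly what a sketch is meant to avoid. The function names `dominant_signal` and `f_p`, and the reading of the Fp-norm as the sum of p-th powers, are assumptions made here.

```python
def dominant_signal(stream):
    """Exact f_max: for each index i, the largest value seen across signals."""
    f_max = {}
    for i, value in stream:
        # Values are non-negative, so an unseen index behaves like 0.
        if value > f_max.get(i, 0):
            f_max[i] = value
    return f_max


def f_p(f_max, p=1):
    """Assumed reading of the Fp-norm: the sum of p-th powers of f_max."""
    return sum(v ** p for v in f_max.values())


# Tuples (i, f_s(i)) from several signals, interleaved in arbitrary order.
stream = [(1, 3), (2, 5), (1, 7), (3, 0), (2, 4)]
f_max = dominant_signal(stream)
print(f_max)            # {1: 7, 2: 5}
print(f_p(f_max, p=1))  # 12
```

Note that the signal identifier s never appears in the computation; only the per-index maximum matters, which is why the tuples in the stream can omit it.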
Two Improved Range-Efficient Algorithms for F0 Estimation
Abstract
We present two new algorithms for the range-efficient F0 estimation problem that improve on the previously best known result, due to Pavan and Tirthapura [15]. These algorithms also improve the previously best known result for the Max-Dominance Norm problem.