Models and issues in data stream systems
In PODS, 2002
Cited by 620 (19 self)
Abstract
In this overview paper we motivate the need for, and research issues arising from, a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.
An improved data stream summary: The Count-Min sketch and its applications
J. Algorithms, 2004
Cited by 293 (36 self)
Abstract
We introduce a new sublinear-space data structure, the Count-Min Sketch, for summarizing data streams. Our sketch allows fundamental queries in data stream summarization, such as point, range, and inner product queries, to be answered approximately and very quickly; in addition, it can be applied to solve several important problems in data streams, such as finding quantiles and frequent items. The time and space bounds we show for using the CM sketch to solve these problems significantly improve those previously known, typically from $1/\varepsilon^2$ to $1/\varepsilon$ in factor.
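A minimal Count-Min sketch in Python, assuming non-negative integer keys and non-negative updates. The dimensions follow the usual analysis, width $\lceil e/\varepsilon \rceil$ and depth $\lceil \ln(1/\delta) \rceil$, with one pairwise-independent hash per row; the class and method names are hypothetical. A point query returns the minimum counter over the rows: it never underestimates, and with probability at least $1-\delta$ it overestimates by at most $\varepsilon$ times the total count.

```python
import math
import random


class CountMinSketch:
    """Count-Min sketch over non-negative integer keys (illustrative sketch)."""

    _PRIME = (1 << 61) - 1  # Mersenne prime; keys are assumed smaller than this

    def __init__(self, epsilon: float, delta: float, seed: int = 0) -> None:
        # w = ceil(e / epsilon) counters per row, d = ceil(ln(1 / delta)) rows.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        rng = random.Random(seed)
        # One pairwise-independent hash h(x) = ((a*x + b) mod p) mod w per row.
        self.hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row: int, key: int) -> int:
        a, b = self.hashes[row]
        return ((a * key + b) % self._PRIME) % self.width

    def update(self, key: int, count: int = 1) -> None:
        # Add the count to exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def point_query(self, key: int) -> int:
        # With non-negative updates each row can only overestimate,
        # so the minimum over rows is the tightest estimate.
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))
```

Because each update touches one counter per row, updates and point queries both take $O(\ln(1/\delta))$ time.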
Maintaining Stream Statistics over Sliding Windows (Extended Abstract)
2002
Cited by 228 (8 self)
Abstract
Authors: Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani
We consider the problem of maintaining aggregates and statistics over data streams with respect to the last $N$ data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: given a stream of bits, maintain a count of the number of 1's in the last $N$ elements seen from the stream. We show that using $O(\frac{1}{\epsilon}\log^2 N)$ bits of memory, we can estimate the number of 1's to within a factor of $1 + \epsilon$. We also give a matching lower bound of $\Omega(\frac{1}{\epsilon}\log^2 N)$ memory bits for any deterministic or randomized algorithm. We extend our scheme to maintain the sum of the last $N$ positive integers, and we provide matching upper and lower bounds for this more general problem as well. We apply our techniques to obtain efficient algorithms for the $L_p$ norms (for $p \in [1, 2]$) of vectors under the sliding window model. Using the algorithm for the basic counting problem, one can adapt many other techniques to work in the sliding window model, with a multiplicative overhead of $O(\frac{1}{\epsilon}\log N)$ in memory and a $1 + \epsilon$ factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sums and averages.
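The basic counting problem can be sketched with buckets whose sizes are powers of two (a structure commonly called an exponential histogram). This is a minimal, hedged illustration rather than the paper's exact construction; the class name and the `max_per_size` parameter are hypothetical. Each bucket stores its size and the arrival time of its most recent 1; whenever more than `max_per_size` buckets share a size, the two oldest are merged, so at most O(`max_per_size` · log N) buckets, each holding an O(log N)-bit timestamp, are kept. The estimate counts every bucket in full except the oldest, which may straddle the window edge and is counted as half. With `max_per_size=2` the estimate is within 50% of the true count; reaching a $1+\epsilon$ factor requires keeping on the order of $1/\epsilon$ buckets per size.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits (illustrative sketch).

    Buckets are stored newest-first as (time of the bucket's most recent 1, size).
    Sizes are powers of two and never decrease from newest to oldest.
    """

    def __init__(self, window: int, max_per_size: int = 2) -> None:
        self.window = window
        self.max_per_size = max_per_size  # hypothetical knob: larger means more accurate
        self.time = 0
        self.buckets: list[tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Expire the oldest bucket once its most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Cascade upward: while a size has too many buckets, merge its two oldest.
        size = 1
        while True:
            same = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(same) <= self.max_per_size:
                break
            newer, older = same[-2], same[-1]
            # The merged bucket keeps the newer timestamp and doubles in size.
            self.buckets[newer] = (self.buckets[newer][0], 2 * size)
            del self.buckets[older]
            size *= 2

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        total = sum(s for _, s in self.buckets)
        # Only part of the oldest bucket may still lie inside the window.
        return total - self.buckets[-1][1] // 2
```

Merging always combines two adjacent buckets, so each bucket still covers a contiguous run of the stream, which is what makes the half-of-the-oldest-bucket correction safe.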
SIA: Secure Information Aggregation in Sensor Networks
2003
Cited by 175 (11 self)
Abstract
Sensor networks promise viable solutions to many monitoring problems. However, the practical deployment of sensor networks faces many challenges imposed by real-world demands. Sensor nodes often have limited computation and communication resources and battery power. Moreover, in many applications sensors are deployed in open environments, and hence are vulnerable to physical attacks, potentially compromising the sensors' cryptographic keys. One of the basic and indispensable functionalities of sensor networks is the ability to answer queries over the data acquired by the sensors. The resource constraints and security issues make designing mechanisms for information aggregation in large sensor networks particularly challenging.
Data-Streams and Histograms
2001
Cited by 130 (8 self)
Abstract
Histograms have been widely used to capture data distributions, representing the data by a small number of step functions. Dynamic programming algorithms that construct optimal histograms exist, albeit running in quadratic time and linear space. In this paper we provide a linear-time construction of $(1+\epsilon)$-approximations of optimal histograms, running in polylogarithmic space. Our results extend to the context of data streams, and in fact generalize to give $(1+\epsilon)$-approximations for several data-stream problems that require partitioning the index set into intervals. The only assumptions required are that the cost of an interval is monotonic under inclusion (a larger interval has a larger cost) and that the cost can be computed or approximated in small space. This exhibits a natural class of problems that admit near-optimal data-stream algorithms.
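For context, here is the kind of exact dynamic program referred to as the baseline, written for the V-optimal histogram, where a bucket's cost is the sum of squared deviations from its mean (a cost that is monotonic under inclusion). This is a minimal sketch under that assumed cost, not the paper's $(1+\epsilon)$-approximation; the name `v_optimal_histogram` is hypothetical, and it assumes `1 <= buckets <= len(values)`. Prefix sums give each interval's cost in O(1), and `err[k][j]` holds the best cost of covering the first `j` values with `k` buckets, for O(n²B) time overall.

```python
from itertools import accumulate


def v_optimal_histogram(values: list[float], buckets: int) -> tuple[float, list[int]]:
    """Exact DP for the B-bucket histogram minimizing total squared error.

    Returns (total cost, start index of each bucket).
    """
    n = len(values)
    # Prefix sums of x and x^2 let us compute any interval's SSE in O(1).
    s = [0.0, *accumulate(values)]
    sq = [0.0, *accumulate(v * v for v in values)]

    def sse(i: int, j: int) -> float:
        # Cost of a single bucket over values[i:j].
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    inf = float("inf")
    err = [[inf] * (n + 1) for _ in range(buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(buckets + 1)]
    err[0][0] = 0.0
    for k in range(1, buckets + 1):
        for j in range(1, n + 1):
            # The last bucket covers values[i:j]; the first i values use k - 1 buckets.
            for i in range(k - 1, j):
                c = err[k - 1][i] + sse(i, j)
                if c < err[k][j]:
                    err[k][j], cut[k][j] = c, i
    # Walk the cut table backward to recover where each bucket starts.
    starts, j = [], n
    for k in range(buckets, 0, -1):
        j = cut[k][j]
        starts.append(j)
    return err[buckets][n], starts[::-1]
```

For example, `v_optimal_histogram([1, 1, 1, 5, 5, 5], 2)` splits the data at index 3 with zero error.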
Issues in Data Stream Management
2003
Cited by 125 (6 self)
Abstract
Traditional databases store sets of relatively static records with no predefined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for online analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
Comparing data streams using Hamming norms (how to zero in)
2003
Cited by 71 (7 self)
Abstract
Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large-scale diagnostic data streams. Such streams are rarely stored in traditional databases and instead must be processed “on the fly” as they are produced. Similarly, sensor networks produce multiple data streams of observations from their sensors. There is a growing focus on manipulating data streams, and hence a need to identify basic operations of interest in managing data streams and to support them efficiently. We propose computation of the Hamming norm as a basic operation of interest. The Hamming norm formalizes ideas that are used throughout data processing. When applied to a single stream, the Hamming norm gives the number of distinct items that are present in that data stream, which is a statistic of great interest in databases. When applied to a pair of streams, the Hamming norm gives an important measure of (dis)similarity: the number of unequal item counts in the two streams. Hamming norms have many uses in comparing data streams. We present a novel approximation technique for estimating the Hamming norm for massive data streams; it relies on what we call the “$L_0$ sketch”, and we prove its accuracy. We test our approximation method on a large quantity of synthetic and real stream data, and show that the estimation is accurate to within a few percentage points.
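To make the two uses of the Hamming norm concrete, here is an exact reference computation in Python. It keeps a full count per item, which is precisely the linear-space approach the paper's sketch is meant to avoid, so it illustrates the definitions only and is not the paper's approximation technique. The function names are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def hamming_norm(stream: Iterable[Hashable]) -> int:
    """Number of distinct items, i.e. items whose count is nonzero."""
    return sum(1 for c in Counter(stream).values() if c != 0)


def hamming_distance(a: Iterable[Hashable], b: Iterable[Hashable]) -> int:
    """Number of items whose counts differ between streams a and b."""
    diff = Counter(a)
    diff.subtract(Counter(b))  # subtract() keeps zero and negative counts
    return sum(1 for c in diff.values() if c != 0)
```

For instance, the streams `"aab"` and `"abb"` contain the same two distinct items but disagree on both counts, so their Hamming distance is 2, while `"abc"` and `"cab"` are at distance 0.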
Estimating aggregates on a peer-to-peer network
2003
Cited by 65 (4 self)
Abstract
As Peer-to-Peer (P2P) networks become popular, there is an emerging need to collect a variety of statistical summary information about the participating nodes. The P2P networks of today lack mechanisms to compute even such basic aggregates as MIN, MAX, SUM, COUNT, or AVG. In this paper, we define and study the NODE-AGGREGATION problem, which is concerned with aggregating data stored at nodes in the network. We present generic schemes that can be used to compute any of the basic aggregation functions accurately and robustly. Our schemes can be used as building blocks for tools to collect statistics on network topology, user behavior, and other node characteristics. This is a STUDENT paper intended as a REGULAR presentation.
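The abstract does not describe how its schemes work. As a point of reference only, here is pairwise gossip averaging, a well-known technique for computing AVG over a network; it is a swapped-in illustration, not necessarily one of the paper's schemes. The function `gossip_average` and its adjacency-list input are hypothetical. In each step a node picks a random neighbor and both replace their values with the mean, which leaves the global sum unchanged, so on a connected graph every node's value converges toward the network-wide average.

```python
import random


def gossip_average(
    values: list[float], neighbors: list[list[int]], rounds: int, seed: int = 0
) -> list[float]:
    """Pairwise gossip averaging (illustrative; not the paper's scheme).

    neighbors[i] lists the nodes that node i can contact; the graph is
    assumed connected and no node lists itself.
    """
    rng = random.Random(seed)
    est = [float(v) for v in values]
    for _ in range(rounds):
        for node in range(len(est)):
            peer = rng.choice(neighbors[node])
            # Averaging a pair preserves their sum, hence the global sum.
            est[node] = est[peer] = (est[node] + est[peer]) / 2
    return est
```

The same routine can estimate COUNT: if one node starts at 1 and every other node at 0, all values converge to $1/N$, so each node can take the reciprocal of its value as an estimate of the number of nodes $N$.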