## Maintaining Stream Statistics over Sliding Windows (Extended Abstract) (2002)

Citations: 242 (9 self)

### BibTeX

@MISC{Datar02maintainingstream,
    author = {Mayur Datar and Aristides Gionis and Piotr Indyk and Rajeev Motwani},
    title = {Maintaining Stream Statistics over Sliding Windows (Extended Abstract)},
    year = {2002}
}

### Abstract

We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last $N$ data elements seen so far. We refer to this model as the sliding window model. We consider the following basic problem: given a stream of bits, maintain a count of the number of 1's in the last $N$ elements seen from the stream. We show that using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, we can estimate the number of 1's to within a factor of $1 + \epsilon$. We also give a matching lower bound of $\Omega(\frac{1}{\epsilon} \log^2 N)$ memory bits for any deterministic or randomized algorithm.

We extend our scheme to maintain the sum of the last $N$ positive integers, and we provide matching upper and lower bounds for this more general problem as well. We apply our techniques to obtain efficient algorithms for the $L_p$ norms (for $p \in [1, 2]$) of vectors under the sliding window model. Using the algorithm for the basic counting problem, one can adapt many other techniques to work for the sliding window model, with a multiplicative overhead of $O(\frac{1}{\epsilon} \log N)$ in memory and a $1 + \epsilon$ factor loss in accuracy. These include maintaining approximate histograms, hash tables, and statistics or aggregates such as sums and averages.
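The basic counting problem stated in the abstract can be illustrated with a small sketch. The abstract does not spell out the data structure, so everything below is an assumption made for illustration: the class name `SlidingWindowCounter`, the bucket layout, and the per-size cap `max_per_size` are hypothetical choices in the style of an exponential histogram, and timestamps are stored as plain integers rather than in a space-optimal encoding.

```python
import math


class SlidingWindowCounter:
    """Approximate number of 1s among the last `window` bits of a stream.

    Illustrative exponential-histogram-style sketch; names and constants
    are assumptions, not taken from the abstract.
    """

    def __init__(self, window: int, eps: float):
        if window <= 0 or not 0 < eps <= 1:
            raise ValueError("need window > 0 and 0 < eps <= 1")
        self.window = window
        # Assumed cap on buckets of equal size, chosen conservatively.
        self.max_per_size = math.ceil(1 / eps) + 1
        self.time = 0
        self.total = 0  # sum of all bucket sizes
        # Buckets are (timestamp of most recent 1, size), newest first.
        # Sizes are powers of two and never decrease with age.
        self.buckets = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once its most recent 1 leaves the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.total -= self.buckets.pop()[1]
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        self.total += 1
        self._merge()

    def _merge(self) -> None:
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i][1]
            j = i
            while j < len(self.buckets) and self.buckets[j][1] == size:
                j += 1
            if j - i <= self.max_per_size:
                return  # no overflow at this size, so none at larger sizes
            # Merge the two oldest buckets of this size, keeping the newer timestamp.
            self.buckets[j - 2] = (self.buckets[j - 2][0], 2 * size)
            del self.buckets[j - 1]
            i = j - 2  # the merged bucket may overflow the next size class

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        # Count the oldest bucket at half weight: only its most recent 1
        # is known to lie inside the window.
        return self.total - self.buckets[-1][1] // 2


counter = SlidingWindowCounter(window=8, eps=0.5)
for b in [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1]:
    counter.add(b)
print(counter.estimate())
```

Only the oldest bucket can straddle the window boundary, so it is the only source of error; capping the number of buckets per size keeps that bucket small relative to the total. The number of buckets grows logarithmically in the window length for fixed `eps`, which is the intuition behind the memory bound quoted above.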

### Citations

1967 | Randomized Algorithms
- Motwani, Raghavan
- 1995
Citation Context: ...east $\frac{k}{16} \log^2 \frac{N}{k}$ bits of memory. Proof. Define an algorithm A to be $\epsilon$-correct for an input instance I if the value returned by A on input I has relative error less than $\epsilon$. The Yao minimax principle [15] implies that the expected space complexity of the optimal $\epsilon$-correct deterministic algorithm for an arbitrarily chosen input distribution p is a lower bound on the expected space complexity of the opt...

739 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context: ...r scheme to maintain the sum of the last N positive integers and provide matching upper and lower bounds for this more general problem as well. We also show how to efficiently compute the $L_p$ norms ($p \in [1, 2]$) of vectors in the sliding window model using our techniques. Using our algorithm, one can adapt many other techniques to work for the sliding window model with a multiplicative overhead of O( 1 log ...

277 | Clustering data streams
- Guha, Mishra, et al.
- 2000
Citation Context: ...constraints. In data mining applications, for example, the volume of data stored on disk is so large that it is only possible to make one pass (or perhaps a very small number of passes) over the data [12, 11]. The objective is to perform the required computations using the stream generated by a single scan of the data, using only a bounded amount of memory and without recourse to indexes, hash tables, or ...

270 | Stable distributions, pseudorandom generators, embeddings and data stream computation
- Indyk
Citation Context: ...levant to gathering statistics or answering queries is the set of the last N elements to arrive. The sliding window refers to the window of active data elements at a given time instant. Previous work [1, 5, 13] on stream computations addresses the problems of approximating frequency moments and computing the $L_p$ differences of streams. There has also been work on maintaining histograms [14, 10]. While Jagadi...

206 | Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
- Gilbert, Kotidis, et al.
- 2001
Citation Context: ... the latter work are range or point queries over the time attribute. In the earlier work, the underlying model is that all of the data elements seen so far are relevant. Recent work by Gilbert et al. [8] considers, among other things, the problem of maintaining aged aggregates over data streams. For a data stream ..., a(−2), a(−1), a(0), where a(0) is the most recently seen data element, th...

171 | Computing on data streams
- Henzinger, Raghavan, et al.
- 1999
Citation Context: ...constraints. In data mining applications, for example, the volume of data stored on disk is so large that it is only possible to make one pass (or perhaps a very small number of passes) over the data [12, 11]. The objective is to perform the required computations using the stream generated by a single scan of the data, using only a bounded amount of memory and without recourse to indexes, hash tables, or ...

150 | Optimal histograms with quality guarantees
- Jagadish, Koudas, et al.
- 1998

147 | On random sampling over joins
- Chaudhuri, Motwani, et al.
Citation Context: ... as intermediate results of pipelined operators during evaluation of a query plan in an SQL database; without materializing some or all of the temporary results, only one pass on the data is possible [3]. In most of these applications, the goal is to make decisions based on the statistics or models gathered over the “recently observed” data elements. For example, one...

141 | Computing iceberg queries efficiently
- Fang, Shivakumar, et al.
- 1998

138 | Data-streams and histograms
- Guha, Koudas, et al.
- 2001
Citation Context: ...evious work [1, 5, 13] on stream computations addresses the problems of approximating frequency moments and computing the $L_p$ differences of streams. There has also been work on maintaining histograms [14, 10]. While Jagadish et al. [14] address the off-line version of computing optimal histograms, Guha and Koudas [10] provide a technique for maintaining near optimal time-based histograms in an on-line fas...

95 | An approximate L1-difference algorithm for massive data streams
- Feigenbaum, Kannan, et al.
- 2002
Citation Context: ...levant to gathering statistics or answering queries is the set of the last N elements to arrive. The sliding window refers to the window of active data elements at a given time instant. Previous work [1, 5, 13] on stream computations addresses the problems of approximating frequency moments and computing the $L_p$ differences of streams. There has also been work on maintaining histograms [14, 10]. While Jagadi...

72 | Probabilistic counting
- Flajolet, Martin
- 1983
Citation Context: ...ordered. In that case, the expected length of the list is O(log N), and the space complexity is given by O(log N log R). 7.5. Distinct values. It is easy to adapt the technique of Flajolet and Martin [6] to estimate the number of distinct elements in the last N data elements. Their probabilistic counting technique maintains a bitmap of size O(log R), where R is an upper bound on the number of disti...

60 | Hancock: a language for extracting signatures from data streams
- Cortes, Fisher, et al.
- 2000
Citation Context: ...$O(\frac{1}{\epsilon} (\log N + \log M)(\log N) d)$ bits of memory and give a relative error of $\epsilon$. However, for high dimensional vectors, we propose the use of sketches. We denote $(L_p(B))^p$ by $f_p$ and estimate $f_p$ for $p \in [1, 2]$. The function $f_p$ clearly admits properties P1–P4, assuming $M \le N^{O(1)}$. For P5, $f_p(B)$ admits a sketching technique which requires O(log M log(1/δ)/ε̂ ...

53 | Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation
- Guha, Koudas
- 2002

11 | Architecture of a Passive Monitoring System for Backbone IP Networks
- Fraleigh, Moon, et al.
- 2000
Citation Context: ...n is network traffic engineering, in which information about current network performance (latency, bandwidth, etc.) is generated online and is used to monitor and adjust network performance dynamically [7, 16]. In this application, it is generally both impractical and unnecessary to process anything but the most recent data. There are other traditional and emerging applications in which data streams play a...

2 | Available at http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps wp.htm
- Whitepaper
- 2000

1 | Netflow Services and Applications, White paper
- Cisco Systems
- 2000
Citation Context: ...n is network traffic engineering, in which information about current network performance (latency, bandwidth, etc.) is generated online and is used to monitor and adjust network performance dynamically [7, 16]. In this application, it is generally both impractical and unnecessary to process anything but the most recent data. There are other traditional and emerging applications in which data streams play a...

1 | Architecture of a Passive Monitoring
- Fraleigh, Moon, et al.
- 2000
Citation Context: ...n is network traffic engineering, in which information about current network performance (latency, bandwidth, etc.) is generated online and is used to monitor and adjust network performance dynamically [7, 16]. In this application, it is generally both impractical and unnecessary to process anything but the most recent data. There are other traditional and emerging applications in which data streams play a...