Computing on Data Streams
, 1998
Cited by 180 (3 self)
In this paper we study the space requirement of algorithms that make only one (or a small number of) pass(es) over the input data. We study such algorithms under a model of data streams that we introduce here. We give a number of upper and lower bounds for problems stemming from query processing, invoking in the process tools from the area of communication complexity.
A One-Pass Space-Efficient Algorithm for Finding Quantiles
 IN PROC. 7TH INTL. CONF. MANAGEMENT OF DATA (COMAD-95)
, 1995
Cited by 20 (0 self)
We present an algorithm for finding the quantile values of a large unordered dataset with unknown distribution. The algorithm has the following features: i) it requires only one pass over the data; ii) it is space efficient: it uses a small bounded amount of memory independent of the number of values in the dataset; and iii) the true quantile is guaranteed to lie within the lower and upper bounds produced by the algorithm. Empirical evaluation using synthetic data with various distributions as well as real data shows that the bounds obtained are quite tight. The algorithm has several applications in database systems, for example in database governors, query optimization, load balancing in multiprocessor database systems, and data mining.
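The three properties listed in this abstract (one pass, bounded memory, guaranteed enclosing bounds) can be illustrated with a much simpler technique than the paper's: a fixed-width bucket histogram. This is a minimal sketch, not the authors' algorithm. It assumes the value range `lo`..`hi` is known in advance, which the paper does not require, and the function name `quantile_bounds` and the `buckets` parameter are hypothetical. The enclosing guarantee holds up to floating-point rounding at bucket edges.

```python
import math


def quantile_bounds(stream, q, lo, hi, buckets=100):
    """Return (lower, upper) bounds that enclose the q-quantile of `stream`.

    One pass over the data; memory is `buckets` counters regardless of
    how many values arrive. Assumes every value lies in [lo, hi].
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    width = (hi - lo) / buckets
    counts = [0] * buckets
    n = 0
    for x in stream:
        if not lo <= x <= hi:
            raise ValueError("value outside the assumed range")
        # The top edge x == hi is folded into the last bucket.
        i = min(int((x - lo) / width), buckets - 1)
        counts[i] += 1
        n += 1
    if n == 0:
        raise ValueError("empty stream")
    # 1-based rank of the q-quantile among the sorted values.
    rank = max(1, math.ceil(q * n))
    seen = 0
    for i, c in enumerate(counts):
        seen += c
        if seen >= rank:
            # The value of that rank fell into bucket i.
            return lo + i * width, lo + (i + 1) * width
```

The bounds are only as tight as one bucket width, so accuracy is bought with memory; the abstract's claim of tight bounds for unknown distributions relies on a more adaptive scheme than this sketch.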
Stream Quantiles via Maximal Entropy Histograms
Cited by 1 (1 self)
We address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We (i) highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) describe a novel principle for the utilization of the available storage space, and (iii) introduce two novel algorithms which exploit the proposed principle. Experiments on three large real-world data sets demonstrate that the proposed methods vastly outperform the existing alternatives.
The P² Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations
A heuristic algorithm is proposed for dynamic calculation of the median and other quantiles. The estimates are produced dynamically as the observations are generated. The observations are not stored; therefore, the algorithm has a very small and fixed storage requirement regardless of the number of observations. This makes it ideal for implementing in a quantile chip that can be used in industrial controllers and recorders. The algorithm is further extended to histogram plotting. The accuracy of the algorithm is analyzed. 1. INTRODUCTION In the field of simulation modeling, there is a trend toward reporting medians or 0.5-quantiles rather than mean and standard deviation alone. (The p-quantile of a distribution is defined as the value below which 100 ...
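To make the idea of estimating a quantile without storing observations concrete, here is a minimal sketch of a different, simpler technique: a stochastic-approximation (Robbins-Monro style) estimator. It is not the heuristic proposed in this paper. The class name `StreamingQuantile` and the `scale` parameter are hypothetical, and `scale` would need tuning to the spread of the data. The state is three numbers, whatever the stream length.

```python
import random


class StreamingQuantile:
    """Running p-quantile estimate that keeps no observations."""

    def __init__(self, p, scale=1.0):
        if not 0.0 < p < 1.0:
            raise ValueError("p must be in (0, 1)")
        self.p = p
        self.scale = scale  # assumed step scale, roughly the data spread
        self.count = 0
        self.estimate = None

    def update(self, x):
        """Consume one observation and return the current estimate."""
        self.count += 1
        if self.estimate is None:
            # Start from the first observation.
            self.estimate = float(x)
            return self.estimate
        step = self.scale / self.count  # shrinking step size
        if x > self.estimate:
            self.estimate += step * self.p
        elif x < self.estimate:
            self.estimate -= step * (1.0 - self.p)
        return self.estimate


if __name__ == "__main__":
    rng = random.Random(0)
    median = StreamingQuantile(0.5)
    for _ in range(20000):
        median.update(rng.random())
    print(median.estimate)  # should settle near 0.5 for uniform(0, 1) data
```

The upward and downward steps balance exactly when a fraction p of the observations falls below the estimate, which is why the estimate drifts toward the p-quantile. Because the step size shrinks with the count, this sketch suits stationary data only.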
Two Maximum Entropy Based Algorithms for Running Quantile Estimation in Non-Stationary Data Streams
 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS
The need to estimate a particular quantile of a distribution is an important problem which frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semi-automatic surveillance analytics systems which detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper we specifically address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We make several major contributions: (i) we highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) we describe a novel principle for the utilization of the available storage space, (iii) we introduce two novel algorithms which exploit the proposed principle in different ways, and (iv) we present a comprehensive evaluation and analysis of the proposed algorithms and the existing methods in the literature on both synthetic data sets and three large 'real-world' streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that both of the proposed methods are highly successful and vastly outperform the existing alternatives. We show that the better of the two algorithms ('data-aligned histogram') exhibits far superior performance in comparison with the previously described methods, achieving more than 10 times lower estimate errors on real-world data, even when its available working memory is an order of magnitude smaller. Index Terms: Novelty, histogram, surveillance, video.