Results 1–10 of 19
Optimal tracking of distributed heavy hitters and quantiles
In PODS, 2009
Cited by 24 (9 self)
We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model. Heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements drawn from the universe U = {1, ..., u}. For a given 0 ≤ φ ≤ 1, the φ-heavy hitters are those elements of A whose frequency in A is at least φ|A|; the φ-quantile of A is an element x of U such that at most φ|A| elements of A are smaller than x and at most (1 − φ)|A| elements of A are greater than x. Suppose the elements of A are received at k remote sites over time, and each of the sites has a two-way communication channel to a designated coordinator, whose goal is to track the set of φ-heavy hitters and the φ-quantile of A approximately at all times with minimum communication. We give tracking algorithms with worst-case communication cost O(k/ε · log n) for both problems, where n is the total number of items in A and ε is the approximation error. This substantially improves upon the previously known algorithms. We also give matching lower bounds on the communication costs for both problems, showing that our algorithms are optimal. We also consider a more general version of the problem where we simultaneously track the φ-quantiles for all 0 ≤ φ ≤ 1.
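The definitions above can be checked directly on a small multiset. The Python sketch below computes exact φ-heavy hitters and a φ-quantile; the tracking algorithms in the paper approximate these under communication constraints, which this snippet does not attempt, and the function names are ours.

```python
from collections import Counter

def phi_heavy_hitters(A, phi):
    """Elements of multiset A whose frequency is at least phi * |A|."""
    threshold = phi * len(A)
    return {x for x, c in Counter(A).items() if c >= threshold}

def phi_quantile(A, phi):
    """An x with at most phi*|A| elements of A smaller than x and
    at most (1 - phi)*|A| elements of A greater than x."""
    s = sorted(A)
    # the element at rank floor(phi * |A|) satisfies both conditions
    return s[min(int(phi * len(s)), len(s) - 1)]
```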
Mergeable Summaries
Cited by 22 (7 self)
We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge the two summaries into a single summary on the union of the two data sets, while preserving the error and size guarantees. This property means that the summaries can be merged like other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the data sets. But some other fundamental ones, like those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for ε-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ε); for ε-approximate quantiles, there is a deterministic summary of size O((1/ε) log(εn)) that has a restricted form of mergeability, and a randomized one of size O((1/ε) log^{3/2}(1/ε)) with full mergeability. We also extend our results to geometric summaries such as ε-approximations and ε-kernels. We also achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic.
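The merge rule for the Misra-Gries (MG) heavy-hitter summary is simple enough to sketch: add the counters coordinate-wise, then subtract the (k+1)-th largest count and prune non-positive entries. The sketch below (function names are ours) illustrates that subtract-and-prune step, not the paper's analysis of the resulting error guarantee.

```python
def mg_update(counters, x, k):
    # Misra-Gries update: keep at most k counters
    if x in counters:
        counters[x] += 1
    elif len(counters) < k:
        counters[x] = 1
    else:
        # no room: decrement every counter, dropping those that hit zero
        for key in list(counters):
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]

def mg_merge(c1, c2, k):
    # Merge two MG summaries: add counts coordinate-wise, then
    # subtract the (k+1)-th largest count and prune non-positive entries.
    merged = dict(c1)
    for x, c in c2.items():
        merged[x] = merged.get(x, 0) + c
    if len(merged) > k:
        cut = sorted(merged.values(), reverse=True)[k]  # (k+1)-th largest
        merged = {x: c - cut for x, c in merged.items() if c > cut}
    return merged
```

The merged summary again has at most k counters, which is what makes the O(1/ε) size bound survive arbitrary merge trees.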
Robust shape fitting via peeling and grating coresets
In Proc. 17th ACM-SIAM Sympos. Discrete Algorithms, 2006
Cited by 16 (3 self)
Let P be a set of n points in R^d. A subset S of P is called a (k, ε)-kernel if for every direction, the directional width of S ε-approximates that of P when k "outliers" can be ignored in that direction. We show that a (k, ε)-kernel of P of size O(k/ε^{(d−1)/2}) can be computed in time O(n + k^2/ε^{d−1}). The new algorithm works by repeatedly "peeling" away (0, ε)-kernels from the point set. We also present a simple ε-approximation algorithm for fitting various shapes through a set of points with at most k outliers. The algorithm is incremental and works by repeatedly "grating" critical points into a working set, until the working set provides the required approximation. We prove that the size of the working set is independent of n, which results in a simple and practical, near-linear ε-approximation algorithm for shape fitting with outliers in low dimensions. We demonstrate the practicality of our algorithms by showing their empirical performance on various inputs and problems.
An almost spaceoptimal streaming algorithm for coresets in fixed dimensions
 Algorithmica
Cited by 11 (4 self)
We present a new streaming algorithm for maintaining an ε-kernel of a point set in R^d using O((1/ε^{(d−1)/2}) log(1/ε)) space. The space used by our algorithm is optimal up to a small logarithmic factor. This significantly improves (for any fixed dimension d > 3) the best previous algorithm for this problem, which uses O(1/ε^{d−3/2}) space, presented by Agarwal and Yu. Our algorithm immediately improves the space complexity of the previous streaming algorithms for a number of fundamental geometric optimization problems in fixed dimensions, including width, minimum-volume bounding box, minimum-radius enclosing cylinder, and minimum-width enclosing annulus.
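The guarantee such an algorithm maintains can be illustrated concretely in the plane: S is an ε-kernel of P if the directional width of S is at least (1 − ε) times that of P in every direction. The sketch below checks this over sampled directions in 2D; it illustrates the definition only, not the streaming algorithm, and the function names are ours.

```python
import math

def width(points, theta):
    # directional width: spread of projections onto direction (cos θ, sin θ)
    proj = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]
    return max(proj) - min(proj)

def is_eps_kernel(S, P, eps, n_dirs=360):
    # sampled check (directions in [0, π) suffice by symmetry):
    # S must preserve at least a (1 - eps) fraction of P's width everywhere
    for i in range(n_dirs):
        t = math.pi * i / n_dirs
        if width(S, t) < (1 - eps) * width(P, t):
            return False
    return True
```

Dropping a corner of the unit square, for instance, halves the width along the diagonal, so the reduced set fails the check.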
Streaming algorithms for extent problems in high dimensions
In SODA '10: Proc. Twenty-First ACM-SIAM Symposium on Discrete Algorithms, 2010
Cited by 10 (0 self)
We develop (single-pass) streaming algorithms for maintaining extent measures of a stream S of n points in R^d. We focus on designing streaming algorithms whose working space is polynomial in d (poly(d)) and sublinear in n. For the problems of computing the diameter, width, and minimum enclosing ball of S, we obtain lower bounds on the worst-case approximation ratio of any streaming algorithm that uses poly(d) space. On the positive side, we introduce the notion of a blurred ball cover and use it for answering approximate farthest-point queries and maintaining the approximate minimum enclosing ball and diameter of S. We describe a streaming algorithm for maintaining a blurred ball cover whose working space is linear in d and independent of n.
Streaming Algorithms for Line Simplification
Cited by 9 (1 self)
We study the following variant of the well-known line-simplification problem: we receive a possibly infinite sequence of points p0, p1, p2, ... defining a polygonal path, and as we receive the points we wish to maintain a simplification of the path seen so far. We study this problem in a streaming setting, where we only have a limited amount of storage, so that we cannot store all the points. We analyze the competitive ratio of our algorithms, allowing resource augmentation: we let our algorithm maintain a simplification with 2k (internal) points and compare the error of our simplification to the error of the optimal simplification with k points. We obtain algorithms with O(1) competitive ratio for three cases: convex paths where the error is measured using the Hausdorff distance, xy-monotone paths where the error is measured using the Hausdorff distance, and general paths where the error is measured using the Fréchet distance. In the first case the algorithm needs O(k) additional storage, and in the latter two cases the algorithm needs O(k^2) additional storage.
Streaming and Dynamic Algorithms for Minimum Enclosing Balls in High Dimensions
Cited by 4 (0 self)
At SODA '10, Agarwal and Sharathkumar presented a streaming algorithm for approximating the minimum enclosing ball of a set of points in d-dimensional Euclidean space. Their algorithm requires one pass, uses O(d) space, and was shown to have approximation factor at most (1 + √3)/2 + ε ≈ 1.3661. We prove that the same algorithm has approximation factor less than 1.22, which brings us much closer to the (1 + √2)/2 ≈ 1.207 lower bound given by Agarwal and Sharathkumar. We also apply this technique to the dynamic version of the minimum enclosing ball problem (in the non-streaming setting). We give an O(dn)-space data structure that can maintain a 1.22-approximate minimum enclosing ball in O(d log n) expected amortized time per insertion/deletion.
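For context, the simplest one-pass enclosing-ball update in this line of work is easy to state: keep a ball, and when a point lands outside it, grow the radius halfway toward the point and shift the center accordingly. The sketch below is that simple rule (the 3/2-approximation of Zarrabi-Zadeh and Chan), shown only as background; it is not the Agarwal-Sharathkumar algorithm analyzed in this paper.

```python
import math

def streaming_meb(points):
    """One-pass enclosing-ball heuristic: when a point falls outside
    the current ball, grow the radius by half the overshoot and move
    the center that same distance toward the point. The old ball stays
    inside the new one, so every processed point remains covered."""
    it = iter(points)
    c = list(next(it))  # center starts at the first point
    r = 0.0
    for p in it:
        d = math.dist(p, c)
        if d > r:
            shift = (d - r) / 2
            r += shift
            # move the center a fraction shift/d of the way toward p
            c = [ci + (pi - ci) * shift / d for ci, pi in zip(c, p)]
    return c, r
```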
Small and Stable Descriptors of Distributions for Geometric Statistical Problems
2009
Cited by 2 (2 self)
This thesis explores how to sparsely represent distributions of points for geometric statistical problems. A coreset C is a small summary of a point set P such that if a certain statistic is computed on P and on C, then the difference in the results is guaranteed to be bounded by a parameter ε. Two examples of coresets are ε-samples and ε-kernels. An ε-sample can estimate the density of a point set in any range from a geometric family of ranges (e.g., disks, axis-aligned rectangles). An ε-kernel approximates the width of a point set in all directions. Both coresets have size that depends only on ε, the error parameter, not on the size of the original data set. We demonstrate several improvements to these coresets and show how they are useful for geometric statistical problems. We reduce the size of ε-samples for density queries in axis-aligned rectangles to nearly a square root of the size required when the queries are with respect to more general families of shapes, such as disks. We also show how to construct ε-samples of probability distributions. We show how to maintain "stable" ε-kernels, that is, if the point set P changes by
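The ε-sample guarantee mentioned above can be illustrated with a uniform random sample: for axis-aligned rectangles, a sample of size roughly O((1/ε^2) log(1/δ)) estimates the density of every rectangle to within ε with probability at least 1 − δ. The sizes, seed, and query rectangle below are arbitrary choices for illustration; the function name is ours.

```python
import random

def range_density(points, rect):
    # fraction of points inside the axis-aligned rectangle (x1, y1, x2, y2)
    x1, y1, x2, y2 = rect
    return sum(x1 <= x <= x2 and y1 <= y <= y2 for x, y in points) / len(points)

# A uniform random sample acts as an eps-sample for rectangles w.h.p.
random.seed(0)
P = [(random.random(), random.random()) for _ in range(20000)]
S = random.sample(P, 2000)
rect = (0.2, 0.2, 0.7, 0.7)
err = abs(range_density(P, rect) - range_density(S, rect))
```

The point of the coreset constructions in the thesis is to beat this generic 1/ε^2 sampling bound with structured, deterministic summaries.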
Stability of εKernels
2009
Cited by 2 (2 self)
Given a set P of n points in R^d, an ε-kernel K ⊆ P approximates the directional width of P in every direction within a relative (1 − ε) factor. In this paper we study the stability of ε-kernels under dynamic insertion and deletion of points to P and under changes to the approximation factor ε. In the first case, we say an algorithm for dynamically maintaining an ε-kernel is stable if at most O(1) points change in K as one point is inserted into or deleted from P. We describe an algorithm to maintain an ε-kernel of size O(1/ε^{(d−1)/2}) in O(1/ε^{(d−1)/2} + log n) time per update. Not only does our algorithm maintain a stable ε-kernel, its update time is faster than that of any known algorithm that maintains an ε-kernel of size O(1/ε^{(d−1)/2}). Next, we show that if there is an ε-kernel of P of size κ, which may be dramatically less than O(1/ε^{(d−1)/2}), then there is an (ε/2)-kernel of P of size O(min{1/ε^{(d−1)/2}, κ^{⌊d/2⌋} log^{d−2}(1/ε)}). Moreover, there exists a point set P in R^d and a parameter ε > 0 such that if every ε-kernel of P has size at least κ, then any (ε/2)-kernel of P has size Ω(κ^{⌊d/2⌋}).