Results 1–10 of 29
Geometric approximation via coresets
COMBINATORIAL AND COMPUTATIONAL GEOMETRY, MSRI, 2005
Abstract

Cited by 76 (9 self)
The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P. Using this paradigm, one quickly computes a small subset Q of P, called a coreset, that approximates the original set P, and then solves the problem on Q using a relatively inefficient algorithm. The solution for Q is then translated to an approximate solution for the original point set P. This paper describes the ways in which this paradigm has been successfully applied to various optimization and extent measure problems.
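The paradigm described in this abstract can be sketched in a few lines. The toy construction below (extreme points along uniformly spaced directions, for a 2-D point set) is an illustrative assumption, not one of the optimized coreset constructions surveyed in the paper; the expensive problem (width) is then solved on the small subset Q instead of on all of P.

```python
import math
import random

def directional_kernel(points, k):
    """Toy coreset: keep the extreme point of `points` along k uniformly
    spaced directions.  For well-spread 2-D point sets this small subset
    approximately preserves the directional width of the full set."""
    coreset = set()
    for i in range(k):
        theta = math.pi * i / k
        u = (math.cos(theta), math.sin(theta))
        proj = lambda p: p[0] * u[0] + p[1] * u[1]
        coreset.add(max(points, key=proj))   # extreme point in direction u
        coreset.add(min(points, key=proj))   # extreme point in direction -u
    return list(coreset)

def width(points, u):
    """Extent of the point set along direction u."""
    proj = [p[0] * u[0] + p[1] * u[1] for p in points]
    return max(proj) - min(proj)

random.seed(0)
P = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5000)]
Q = directional_kernel(P, 32)   # at most 64 points, independent of |P|
# Solve the (relatively expensive) width problem on Q instead of P.
theta = random.uniform(0, math.pi)
u = (math.cos(theta), math.sin(theta))
print(len(Q), width(Q, u) / width(P, u))
```

On this input the width of Q along any direction stays within a few percent of the width of P, while Q is two orders of magnitude smaller.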
Mergeable Summaries
Abstract

Cited by 20 (6 self)
We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge the two summaries into a single summary on the union of the two data sets, while preserving the error and size guarantees. This property means that the summaries can be merged in the same way as other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the data sets. But some other fundamental ones, like those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for ε-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ε); for ε-approximate quantiles, there is a deterministic summary of size O((1/ε) log(εn)) that has a restricted form of mergeability, and a randomized one of size O((1/ε) log^{3/2}(1/ε)) with full mergeability. We also extend our results to geometric summaries such as ε-approximations and ε-kernels. We also achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic.
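The heavy-hitter result mentioned in the abstract can be sketched concretely. The code below is a simplified reading of the mergeable Misra-Gries (MG) summary: standard MG updates keep at most k counters, and two summaries are merged by adding their counters and then subtracting the (k+1)-st largest count from every counter (details such as tie handling are my simplification, not taken verbatim from the paper).

```python
from collections import Counter

def mg_update(summary, item, k):
    """One Misra-Gries update, keeping at most k counters.  With
    k = ceil(1/eps) this gives eps-approximate heavy hitters: every
    estimate undercounts the true frequency by at most n/(k+1)."""
    if item in summary or len(summary) < k:
        summary[item] = summary.get(item, 0) + 1
    else:
        # Summary full and item absent: decrement every counter,
        # dropping those that reach zero.
        for key in list(summary):
            summary[key] -= 1
            if summary[key] == 0:
                del summary[key]
    return summary

def mg_merge(s1, s2, k):
    """Merge two MG summaries: add the counters, then subtract the
    (k+1)-st largest count and discard non-positive counters.  The
    result is again a valid MG summary of the combined stream."""
    merged = Counter(s1)
    merged.update(s2)
    if len(merged) > k:
        kth = sorted(merged.values(), reverse=True)[k]
        merged = Counter({x: c - kth for x, c in merged.items() if c > kth})
    return merged

k = 9
s1, s2 = {}, {}
stream1 = ['a'] * 300 + [str(i) for i in range(200)]
stream2 = ['a'] * 250 + ['b'] * 200 + [str(i) for i in range(200, 350)]
for x in stream1:
    mg_update(s1, x, k)
for x in stream2:
    mg_update(s2, x, k)
m = mg_merge(s1, s2, k)
```

After merging, the usual MG guarantee still holds over the union of the two streams: no counter overestimates, and each item's count is underestimated by at most n/(k+1) where n is the total stream length.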
A Space-Optimal Data-Stream Algorithm for Coresets in the Plane
Abstract

Cited by 19 (6 self)
Given a point set P ⊆ R², a subset Q ⊆ P is an ε-kernel of P if for every slab W containing Q, the (1+ε)-expansion of W also contains P. We present a data-stream algorithm for maintaining an ε-kernel of a stream of points in R² that uses O(1/√ε) space and takes O(log(1/ε)) amortized time to process each point. This is the first space-optimal data-stream algorithm for this problem. As a consequence, we obtain improved data-stream approximation algorithms for other extent measures, such as width and robust kernels, as well as ε-kernels in higher dimensions.
Lower Bounds for Number-in-Hand Multiparty Communication Complexity, Made Easy ∗
Abstract

Cited by 17 (5 self)
In this paper we prove lower bounds on randomized multiparty communication complexity, both in the blackboard model (where each message is written on a blackboard for all players to see) and (mainly) in the message-passing model, where messages are sent player-to-player. We introduce a new technique for proving such bounds, called symmetrization, which is natural, intuitive, and often easy to use. For example, for the problem where each of k players gets a bit-vector of length n, and the goal is to compute the coordinate-wise XOR of these vectors, we prove a tight lower bound of Ω(nk) in the blackboard model. For the same problem with AND instead of XOR, we prove a lower bound of roughly Ω(nk) in the message-passing model (assuming k ≤ n/3200) and Ω(n log k) in the blackboard model. We also prove lower bounds for bitwise majority, for a graph-connectivity problem, and for other problems; the technique seems applicable to a wide range of other problems as well. The obtained communication lower bounds imply new lower bounds in the functional monitoring model [11] (also called the distributed streaming model). All of our lower bounds allow randomized communication protocols with two-sided error. We also use the symmetrization technique to prove several direct-sum-like results for multiparty communication.
Robust shape fitting via peeling and grating coresets
In Proc. 17th ACM-SIAM Sympos. Discrete Algorithms, 2006
Abstract

Cited by 16 (3 self)
Let P be a set of n points in R^d. A subset S of P is called a (k, ε)-kernel if for every direction, the directional width of S ε-approximates that of P when k “outliers” can be ignored in that direction. We show that a (k, ε)-kernel of P of size O(k/ε^{(d−1)/2}) can be computed in time O(n + k^2/ε^{d−1}). The new algorithm works by repeatedly “peeling” away (0, ε)-kernels from the point set. We also present a simple ε-approximation algorithm for fitting various shapes through a set of points with at most k outliers. The algorithm is incremental and works by repeatedly “grating” critical points into a working set, until the working set provides the required approximation. We prove that the size of the working set is independent of n, which results in a simple and practical, near-linear ε-approximation algorithm for shape fitting with outliers in low dimensions. We demonstrate the practicality of our algorithms by showing their empirical performance on various inputs and problems.
Metric functional dependencies
In ICDE, 2009
Abstract

Cited by 15 (1 self)
When merging data from various sources, it is often the case that small variations in data format and interpretation cause traditional functional dependencies (FDs) to be violated, without there being an intrinsic violation of semantics. Examples include differing address formats, or different reported latitudes/longitudes for a given address. In this paper, we define metric functional dependencies, which strictly generalize traditional FDs by allowing small differences (controlled by a metric) in values of the consequent attribute of an FD. We present efficient algorithms for the verification problem: determining whether a given metric FD holds for a given relation. We experimentally demonstrate the validity and efficiency of our approach on various data sets that lie in multidimensional spaces.
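A brute-force sketch of the verification problem makes the definition concrete: group tuples by the antecedent X, and within each group require every pair of consequent Y-values to lie within distance δ of each other. (This quadratic check is only an illustration; the paper's contribution is more efficient verification algorithms.)

```python
def verify_metric_fd(rows, x, y, dist, delta):
    """Verify the metric FD  X -> Y  with threshold delta on a relation
    given as a list of dicts: within every group sharing the same
    X-value, all Y-values must lie within distance delta of each other.
    delta = 0 recovers the traditional FD."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], []).append(row[y])
    for ys in groups.values():
        for a in ys:
            for b in ys:
                if dist(a, b) > delta:
                    return False
    return True

# Example: slightly different reported latitudes for the same address
# satisfy the metric FD addr -> lat for a small delta, but not for delta = 0.
rows = [
    {"addr": "12 Main St", "lat": 40.0001},
    {"addr": "12 Main St", "lat": 40.0003},
    {"addr": "99 Oak Ave", "lat": 41.5},
]
print(verify_metric_fd(rows, "addr", "lat", lambda a, b: abs(a - b), 0.001))
```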
Shape Fitting on Point Sets with Probability Distributions
Abstract

Cited by 15 (7 self)
We consider problems on data sets where each data point has uncertainty described by an individual probability distribution. We develop several frameworks and algorithms for calculating statistics on these uncertain data sets. Our examples focus on geometric shape fitting problems. We prove approximation guarantees for the algorithms with respect to the full probability distributions. We then empirically demonstrate that our algorithms are simple and practical, determining the constants hidden by the asymptotic analysis so that a user can reliably trade speed and size for accuracy.
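The basic setting can be illustrated with a Monte-Carlo sketch: model each uncertain point as its own sampler, instantiate every point, compute the shape-fitting statistic, and repeat to approximate the statistic's distribution. This is only the naive framework the paper improves upon with approximation guarantees, not the paper's algorithms themselves.

```python
import random
import statistics

def sample_statistic(point_dists, stat, trials, rng):
    """Monte-Carlo estimate of a shape statistic over uncertain points.
    Each entry of `point_dists` is a sampler mapping an RNG to one
    instantiation of that point; repeated instantiation approximates
    the distribution of the statistic."""
    samples = []
    for _ in range(trials):
        instance = [d(rng) for d in point_dists]
        samples.append(stat(instance))
    return samples

def bbox_width(points):
    """A simple shape statistic: width of the axis-aligned bounding box."""
    xs = [p[0] for p in points]
    return max(xs) - min(xs)

# Three uncertain points: Gaussians around (0,0), (1,0), (2,0).
rng = random.Random(3)
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
dists = [lambda r, c=c: (r.gauss(c[0], 0.1), r.gauss(c[1], 0.1)) for c in centers]
widths = sample_statistic(dists, bbox_width, 200, rng)
print(statistics.mean(widths))
```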
An almost spaceoptimal streaming algorithm for coresets in fixed dimensions
 Algorithmica
Abstract

Cited by 12 (4 self)
We present a new streaming algorithm for maintaining an ε-kernel of a point set in R^d using O((1/ε^{(d−1)/2}) log(1/ε)) space. The space used by our algorithm is optimal up to a small logarithmic factor. This significantly improves (for any fixed dimension d > 3) the best previous algorithm for this problem, which uses O(1/ε^{d−3/2}) space, presented by Agarwal and Yu. Our algorithm immediately improves the space complexity of the previous streaming algorithms for a number of fundamental geometric optimization problems in fixed dimensions, including width, minimum-volume bounding box, minimum-radius enclosing cylinder, and minimum-width enclosing annulus.
Processing a Large Number of Continuous Preference Top-k Queries ∗
Abstract

Cited by 11 (2 self)
Given a set of objects, each with multiple numeric attributes, a (preference) top-k query retrieves the k objects with the highest scores according to a user preference, defined as a linear combination of attribute values. We consider the problem of processing a large number of continuous top-k queries, each with its own preference. When objects or user preferences change, the query results must be updated. We present a dynamic index that supports the reverse top-k query, which is of independent interest. Combining this index with another one for top-k queries, we develop a scalable solution for processing many continuous top-k queries that exploits the clusteredness in user preferences. We also define an approximate version of the problem and present a solution significantly more efficient than the exact one with little loss in accuracy.
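The two query types in this abstract are easy to state in code. The brute-force sketch below defines the linear-preference top-k query and the reverse top-k query (which preferences rank a given object in their top k); the paper's contribution is a dynamic index that answers the latter without scanning every preference, which this naive version does.

```python
def topk(objects, w, k):
    """Top-k under a linear preference: score each object (a tuple of
    numeric attributes) by the weighted sum w . o and return the k
    highest-scoring objects."""
    score = lambda o: sum(wi * ai for wi, ai in zip(w, o))
    return sorted(objects, key=score, reverse=True)[:k]

def reverse_topk(objects, obj, prefs, k):
    """Reverse top-k, brute force: return the preference vectors under
    which `obj` appears in the top-k result."""
    return [w for w in prefs if obj in topk(objects, w, k)]

objects = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.1, 0.1)]
prefs = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
print(topk(objects, (1.0, 0.0), 2))
print(reverse_topk(objects, (1.0, 0.0), prefs, 1))
```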