Results 1–10 of 80
Geometric approximation via coresets
Combinatorial and Computational Geometry, MSRI, 2005
Cited by 60 (7 self)
Abstract. The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P. Using this paradigm, one quickly computes a small subset Q of P, called a coreset, that approximates the original set P, and then solves the problem on Q using a relatively inefficient algorithm. The solution for Q is then translated into an approximate solution for the original point set P. This paper describes the ways in which this paradigm has been successfully applied to various optimization and extent-measure problems.
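As a toy illustration of the coreset paradigm described in this abstract (not a construction from the paper), the axis-aligned bounding box admits a trivial exact coreset: the per-axis extreme points. The function name `bbox_coreset` is ours.

```python
import numpy as np

def bbox_coreset(P):
    """Keep, for each axis, a point attaining the minimum and the
    maximum coordinate: at most 2d points whose bounding box equals
    that of P exactly."""
    idx = set()
    for axis in range(P.shape[1]):
        idx.add(int(P[:, axis].argmin()))
        idx.add(int(P[:, axis].argmax()))
    return P[sorted(idx)]

rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 3))
Q = bbox_coreset(P)   # the "coreset": at most 6 points in 3-D
```

Solving the "expensive" problem on the few points of Q then gives the answer for all of P, which is the shape of the paradigm the survey describes.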
Coresets for k-Means and k-Median Clustering and their Applications
In Proc. 36th Annu. ACM Sympos. Theory Comput., 2003
Cited by 46 (13 self)
Abstract. In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝ^d, one can compute a weighted set S ⊆ P, of size O(… log n), such that one can compute the k-median/means clustering on S instead of on P, and get a (1 + ε)-approximation.
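The "solve on S instead of P" idea from this abstract can be sketched with a much cruder coreset than the paper's: snap points to a grid, weight each occupied cell by its point count, and run weighted Lloyd's algorithm on the representatives. All function names here are ours, and the grid construction is a simplification, not the paper's.

```python
import numpy as np

def grid_coreset(P, cell):
    """Snap points to a grid of side `cell`; keep one representative
    (the centroid of each occupied cell), weighted by its point count.
    A toy stand-in for the paper's coreset construction."""
    sums, counts = {}, {}
    for key, p in zip(map(tuple, np.floor(P / cell).astype(int)), P):
        sums[key] = sums.get(key, 0) + p
        counts[key] = counts.get(key, 0) + 1
    S = np.array([sums[k] / counts[k] for k in sums])
    w = np.array([counts[k] for k in sums], dtype=float)
    return S, w

def farthest_first(S, k):
    """Deterministic seeding: repeatedly take the point farthest
    from the centers chosen so far."""
    C = [S[0]]
    for _ in range(k - 1):
        d = ((S[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(S[int(d.argmax())])
    return np.array(C)

def weighted_kmeans(S, w, C, iters=50):
    """Lloyd's algorithm on the weighted coreset."""
    C = C.copy()
    for _ in range(iters):
        lab = ((S[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(len(C)):
            mask = lab == j
            if mask.any():
                C[j] = np.average(S[mask], axis=0, weights=w[mask])
    return C

rng = np.random.default_rng(1)
true_centers = np.array([(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)])
P = np.concatenate([rng.normal(c, 0.2, size=(500, 2)) for c in true_centers])
S, w = grid_coreset(P, cell=0.5)           # far fewer points than P
C = weighted_kmeans(S, w, farthest_first(S, 3))
```

Clustering the small weighted set S recovers centers close to the true ones, while touching only a fraction of the input.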
Relative-Error CUR Matrix Decompositions
SIAM J. Matrix Anal. Appl., 2008
Cited by 39 (9 self)
Abstract. Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and are thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC⁺A, where C⁺ is the Moore–Penrose generalized inverse of C. In our second algorithm, C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖_F ≤ (1 + ε)‖A − A_k‖_F, where A_k is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖_F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ε, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants …
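The A′ = CUR shape described in this abstract is easy to exercise numerically. The sketch below samples columns and rows by squared norm, which is a weaker stand-in for the paper's subspace sampling probabilities (so it carries no relative-error guarantee); U is a rank-k pseudoinverse of the intersection, as the abstract states.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
# Nearly rank-k matrix, so a good rank-k approximation exists.
A = (rng.normal(size=(60, k)) @ rng.normal(size=(k, 40))
     + 0.01 * rng.normal(size=(60, 40)))

def norm_sample(A, axis, m, rng):
    """Sample row/column indices with probability proportional to
    squared Euclidean norm (a simplification of the paper's
    subspace sampling)."""
    p = (A ** 2).sum(axis=1 - axis)
    return rng.choice(A.shape[axis], size=m, replace=False, p=p / p.sum())

cols = norm_sample(A, 1, 20, rng)
rows = norm_sample(A, 0, 20, rng)
C, R = A[:, cols], A[rows, :]

# U: rank-k pseudoinverse of the intersection of the chosen rows/columns.
u, s, vt = np.linalg.svd(A[np.ix_(rows, cols)])
U = (vt[:k].T / s[:k]) @ u[:, :k].T
A_cur = C @ U @ R

# Best rank-k approximation from the SVD, for comparison.
u2, s2, vt2 = np.linalg.svd(A, full_matrices=False)
A_k = (u2[:, :k] * s2[:k]) @ vt2[:k]
err_cur = np.linalg.norm(A - A_cur)
err_svd = np.linalg.norm(A - A_k)
```

Since A_cur has rank at most k, the SVD error is a lower bound on the CUR error; the point of the paper is that carefully sampled C and R bring the CUR error within a (1 + ε) factor of it.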
Clustering Motion
In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., 2003
Cited by 29 (5 self)
Abstract. Given a set of moving points in ℝ^d, we show how to cluster them in advance, using a small number of clusters, so that at any time this static clustering is competitive with the optimal k-center clustering at that time. The advantage of this approach is that it avoids updating the clustering as time passes. We also show how to maintain this static clustering efficiently under insertions and deletions.
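The benchmark in this abstract is the optimal k-center clustering at each instant. As a reference point (this is a classical baseline, not the paper's algorithm), here is Gonzalez's greedy farthest-point heuristic, a 2-approximation for static k-center, run on a snapshot of point positions.

```python
import numpy as np

def greedy_k_center(P, k):
    """Gonzalez's farthest-point heuristic, a classical 2-approximation
    for static k-center: repeatedly open a center at the point farthest
    from the centers opened so far."""
    centers = [0]
    d = np.linalg.norm(P - P[0], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(P - P[nxt], axis=1))
    return P[centers], float(d.max())   # centers, clustering radius

rng = np.random.default_rng(0)
snapshot = np.concatenate(
    [rng.normal(c, 0.3, size=(200, 2)) for c in [(0, 0), (10, 0), (0, 10)]])
centers, radius = greedy_k_center(snapshot, 3)
```

The paper's contribution is a single clustering that stays within a constant factor of what this snapshot procedure would achieve at every time, without recomputation.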
Shape Fitting with Outliers
SIAM J. Comput., 2003
Cited by 28 (11 self)
Abstract. We present an algorithm that ε-approximates the extent between the top and bottom k levels of the arrangement of H in time O(n + (k/ε)^c), where c is a constant depending on d. The algorithm relies on computing a subset of H, of size polynomial in k and 1/ε, in near-linear time, such that the k-level of the arrangement of the subset approximates that of the original arrangement. Using this algorithm, we propose efficient approximation algorithms for shape fitting with outliers for various shapes. These are the first algorithms to handle outliers efficiently for the shape-fitting problems considered.
Practical Methods for Shape Fitting and Kinetic Data Structures using Core Sets
In Proc. 20th Annu. ACM Sympos. Comput. Geom., 2004
Cited by 27 (8 self)
Abstract. The notion of ε-kernel was introduced by Agarwal et al. [5] to set up a unified framework for computing various extent measures of a point set P approximately. Roughly speaking, a subset Q ⊆ P is an ε-kernel of P if for every slab W containing Q, the expanded slab (1 + ε)W contains P. They illustrated the significance of ε-kernels by showing that they yield approximation algorithms for a wide range of geometric optimization problems. We present a simpler and more practical algorithm for computing the ε-kernel of a set P of points in ℝ^d. We demonstrate the practicality of our algorithm by showing its empirical performance on various inputs. We then describe an incremental algorithm for fitting various shapes, and use the ideas of our algorithm for computing ε-kernels to analyze the performance of this algorithm. We illustrate the versatility and practicality of this technique by implementing approximation algorithms for the minimum enclosing cylinder, minimum-volume bounding box, and minimum-width annulus. Finally, we show that ε-kernels can be effectively used to expedite the algorithms for maintaining extents of moving points.
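A simple way to get a feel for ε-kernels is to keep, in the plane, the extreme points of P along a fixed fan of directions: the retained subset approximately preserves every directional width. This is a simplified stand-in for the construction in the paper, and the function names are ours.

```python
import numpy as np

def direction_kernel(P, m):
    """Keep the extreme points of P along m evenly spaced directions
    in the plane.  The resulting subset approximately preserves the
    directional width of P (a simplified ε-kernel-style construction)."""
    thetas = np.linspace(0, np.pi, m, endpoint=False)
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    proj = P @ dirs.T                       # n x m projections
    idx = set(proj.argmax(0)) | set(proj.argmin(0))
    return P[sorted(idx)]

def width(P, d):
    """Extent of P along unit direction d."""
    proj = P @ d
    return proj.max() - proj.min()

rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(500, 2))
Q = direction_kernel(P, 64)                 # at most 128 points kept
```

The width of Q along any direction is never larger than that of P (it is a subset) and, with enough directions, never much smaller, which is exactly the slab condition in the abstract.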
Adaptive spatial partitioning for multidimensional data streams
In ISAAC, 2004
Cited by 19 (5 self)
Abstract. We propose a space-efficient scheme for summarizing multidimensional data streams. Our sketch can be used to solve spatial versions of several classical data stream queries efficiently. For instance, we can track ε-hotspots, which are congruent boxes containing at least an ε fraction of the stream, and maintain hierarchical heavy hitters in d dimensions. Our sketch can also be viewed as a multidimensional generalization of the ε-approximate quantile summary. The space complexity of our scheme is O((1/ε) log R) if the points lie in the domain [0, R]^d, where d is assumed to be a constant. The scheme extends to the sliding-window model with a log(εn) factor increase in space, where n is the size of the sliding window. Our sketch can also be used to answer ε-approximate rectangular range queries over a stream of d-dimensional points.
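To make the ε-hotspot query concrete, here is a deliberately naive streaming counter: exact counts per fixed grid cell, reporting cells that hold at least an ε fraction of the stream. The paper's sketch partitions space adaptively and uses far less memory; this class (our own naming) only illustrates the query semantics.

```python
from collections import Counter

class GridHotspots:
    """Toy streaming summary: exact per-cell counts on a fixed grid of
    side `cell`.  hotspots(eps) reports cells holding at least an eps
    fraction of the stream seen so far."""
    def __init__(self, cell):
        self.cell = cell
        self.counts = Counter()
        self.n = 0

    def add(self, p):
        self.counts[tuple(int(x // self.cell) for x in p)] += 1
        self.n += 1

    def hotspots(self, eps):
        thresh = eps * self.n
        return [c for c, cnt in self.counts.items() if cnt >= thresh]

gs = GridHotspots(cell=1.0)
for i in range(70):                  # 70 points in 70 distinct cells
    gs.add((i % 10 + 0.5, i // 10 + 0.5))
for _ in range(30):                  # 30 points concentrated in one cell
    gs.add((5.5, 8.5))
```

A query such as `gs.hotspots(0.25)` then isolates the single congruent box containing at least a quarter of the stream.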
Adaptive sampling for geometric problems over data streams
In Proc. 23rd ACM Sympos. Principles Database Syst., 2004
Cited by 19 (4 self)
Abstract. Geometric coordinates are an integral part of many data streams. Examples include sensor locations in environmental monitoring, vehicle locations in traffic monitoring or battlefield simulations, scientific measurements of earth or atmospheric phenomena, etc. How can one summarize such data streams using limited storage so that many natural geometric queries can be answered faithfully? Some examples of such queries are: report the smallest convex region in which a chemical leak has been sensed, or track the diameter of the dataset. One can also pose queries over multiple streams: track the minimum distance between the convex hulls of two data streams, or report when datasets A and B are no longer linearly separable. In this paper, we propose an adaptive sampling scheme that gives provably optimal error bounds for extremal problems of this nature. All our results follow from a single technique for computing the approximate convex hull of a point stream in a single pass. Our main result is this: given a stream of two-dimensional points and an integer r, we can maintain an adaptive sample of at most 2r + 1 points such that the distance between the true convex hull and the convex hull of the sample points is O(D/r²), where D is the diameter of the sample set. With our sample convex hull, all the queries mentioned above can be answered in either O(log r) or O(r) time.
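The single-pass hull approximation in this abstract can be sketched with a fixed (rather than adaptive) fan of directions: remember, per direction, the most extreme point seen so far. The paper chooses the directions adaptively to reach the optimal O(D/r²) bound; the fixed-direction variant below (our own simplification) shows the one-pass, O(r)-memory structure.

```python
import numpy as np

def stream_hull_sample(stream, r):
    """One pass over a stream of planar points: for 2r fixed, evenly
    spaced directions, remember the point with the largest projection
    seen so far.  The <= 2r retained points approximate the convex hull."""
    thetas = np.linspace(0.0, 2.0 * np.pi, 2 * r, endpoint=False)
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    best_val = np.full(2 * r, -np.inf)
    best_pt = np.zeros((2 * r, 2))
    for p in stream:                        # O(r) memory, single pass
        proj = dirs @ p
        upd = proj > best_val
        best_val[upd] = proj[upd]
        best_pt[upd] = p
    return np.unique(best_pt, axis=0)

angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
sample = stream_hull_sample(circle, 8)
diam = max(np.linalg.norm(p - q) for p in sample for q in sample)
```

Extremal queries such as diameter can then be answered from the small sample instead of the full stream.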
Private coresets
2009
Cited by 18 (2 self)
Abstract. A coreset of a point set P is a small weighted set of points that captures some geometric properties of P. Coresets have found use in a vast host of geometric settings. We forge a link between coresets and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed. We first show that the existence of a small coreset with low generalized sensitivity (i.e., replacing a single point in the original point set only slightly affects the quality of the coreset) implies (in an inefficient manner) the existence of a private coreset for the same queries. This greatly extends the works of Blum, Ligett, and Roth [STOC 2008] and McSherry and Talwar [FOCS 2007]. We also give an efficient algorithm to compute private coresets for k-median and k-means queries in ℝ^d, immediately implying efficient differentially private sanitizations for such queries. Following McSherry and Talwar, this construction also gives efficient coalition-proof (approximately dominant-strategy) mechanisms for location problems. Unlike coresets, which have only a multiplicative approximation factor, we prove that private coresets must exhibit additive error. We present a new technique for showing lower bounds on this error.
Sampling Algorithms and Coresets for ℓp Regression
SIAM J. Comput., vol. 38, no. 5, pp. 2060–2078, 2009
Cited by 18 (6 self)
Abstract. The ℓp regression problem takes as input a matrix A ∈ ℝ^{n×d}, a vector b ∈ ℝ^n, and a number p ∈ [1, ∞), and it returns as output a number Z and a vector x_opt ∈ ℝ^d such that Z = min_{x∈ℝ^d} ‖Ax − b‖_p = ‖Ax_opt − b‖_p. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained (n ≫ d) version of this classical problem, for all p ∈ [1, ∞). The first stage of our algorithm nonuniformly samples r̂₁ = O(36^p d^{max{p/2+1, p}+1}) rows of A and the corresponding elements of b, and then it solves the ℓp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample r̂₁/ε² constraints, and then it solves the ℓp regression problem on the new sample; we prove this is a (1 + ε)-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of ℓp regression, namely …
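For the special case p = 2, row-sampling regression can be sketched in a few lines using statistical leverage scores as the sampling probabilities. This single-stage sketch is a simplification for illustration: the paper's algorithm is two-stage and covers all p via well-conditioned bases, not just p = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

# Leverage scores: squared row norms of an orthonormal basis of A.
Q, _ = np.linalg.qr(A)
lev = (Q ** 2).sum(axis=1)            # nonnegative, sums to d
probs = lev / lev.sum()

m = 400                               # sample size (constraints kept)
idx = rng.choice(n, size=m, replace=True, p=probs)
scale = 1.0 / np.sqrt(m * probs[idx]) # rescale rows to keep the sketch unbiased
As, bs = A[idx] * scale[:, None], b[idx] * scale

x_opt = np.linalg.lstsq(A, b, rcond=None)[0]   # full-data solution
x_samp = np.linalg.lstsq(As, bs, rcond=None)[0]  # solution on the sample

cost = lambda x: np.linalg.norm(A @ x - b)
```

Solving least squares on the 400 sampled-and-rescaled rows yields a cost close to the optimum over all 5000 constraints, which is the overconstrained regime (n ≫ d) the abstract targets.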