Results 1–10 of 23
Approximate clustering via coresets
In Proc. 34th Annu. ACM Sympos. Theory Comput., 2002
Cited by 111 (15 self)
Abstract:
In this paper, we show that for several clustering problems one can extract a small set of points, so that using those coresets enables us to perform approximate clustering efficiently. The surprising property of those coresets is that their size is independent of the dimension. Using those, we present (1 + ε)-approximation algorithms for the k-center clustering and k-median clustering problems in Euclidean space. The running time of the new algorithms has linear or near-linear dependency on the number of points and the dimension, and exponential dependency on 1/ε and k. As such, our results are a substantial improvement over what was previously known. We also present some other clustering results, including (1 + ε)-approximate 1-cylinder clustering, and k-center clustering with outliers.
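For context, the classic baseline for k-center is Gonzalez's farthest-first traversal, a 2-approximation; the abstract's contribution is running such clustering on a small coreset instead of the full point set. A minimal Python sketch of the baseline (not the paper's algorithm; the point set and k below are illustrative):

```python
import math

def farthest_first_centers(points, k):
    """Gonzalez's greedy 2-approximation for k-center: repeatedly
    pick the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def covering_radius(points, centers):
    """Largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)

# Two well-separated groups; k = 2 picks one center per group.
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
ctrs = farthest_first_centers(pts, 2)
```

Running this on a small representative subset rather than on all of P is the coreset idea in miniature: the covering radius computed on the subset transfers to the full set up to the coreset's error.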
Fast construction of nets in low-dimensional metrics and their applications
SIAM Journal on Computing, 2006
Cited by 98 (10 self)
Abstract:
We present a near-linear-time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space used is linear.
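The hierarchical nets above build on the basic notion of an r-net: a subset whose points are pairwise more than r apart, yet which covers every input point within distance r. The naive greedy construction is quadratic-time; the paper's contribution is doing this hierarchically in near-linear time. A sketch of the naive version, with an illustrative 1-D metric:

```python
def greedy_r_net(points, r, dist):
    """Greedy r-net: keep a point iff it is more than r away from every
    net point chosen so far. The chosen points end up pairwise > r
    apart, and every input point lies within r of some net point.
    O(n^2) time -- the paper's hierarchical construction avoids this."""
    net = []
    for p in points:
        if all(dist(p, q) > r for q in net):
            net.append(p)
    return net

# 1-D example: the integers 0..9 with r = 2.5.
net = greedy_r_net(list(range(10)), 2.5, lambda a, b: abs(a - b))
```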
Coresets for k-Means and k-Median Clustering and their Applications
In Proc. 36th Annu. ACM Sympos. Theory Comput., 2003
Cited by 46 (13 self)
Abstract:
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in ℝ^d, one can compute a weighted set S ⊆ P, of size … log n, such that one can compute the k-median/means clustering on S instead of on P, and get a (1 + ε)-approximation.
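For intuition on why a small weighted set can stand in for P under the k-means (sum of squared distances) objective: a tight group of points contributes to any distant candidate center almost exactly what a single representative carrying the group's total weight would contribute. A toy sketch of this effect (the grouping and candidate center are illustrative, not the paper's construction):

```python
def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum over points of weight times the
    squared distance to the nearest center."""
    return sum(w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p, w in zip(points, weights))

# Full point set vs. a tiny weighted stand-in: each tight group is
# replaced by its centroid, weighted by the group size.
P = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.2, 0.0)]
S = [(0.1, 0.0), (4.1, 0.0)]          # one representative per group
w = [2, 2]                            # group sizes as weights
centers = [(2.0, 0.0)]                # an arbitrary candidate solution
full = kmeans_cost(P, [1] * len(P), centers)
core = kmeans_cost(S, w, centers)
```

Here `full` and `core` agree to within a fraction of a percent; the residual gap is the within-group variance, which a careful coreset construction controls.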
Projective Clustering in High Dimensions using Core-Sets
2002
Cited by 32 (9 self)
Abstract:
Let P be a set of n points in ℝ^d, and for any integer 0 ≤ k ≤ d − 1, let RD_k(P) denote the minimum over all k-flats F of max_{p ∈ P} dist(p, F). We present an algorithm that computes, for any 0 < ε < 1, a k-flat that is within a distance of (1 + ε)RD_k(P) from each point of P. The running time of the algorithm is dn^O(k/ε^5 · log(1/ε)). The crucial step in obtaining this algorithm is a structural result that says that there is a near-optimal flat that lies in an affine subspace spanned by a small subset of points in P. The size of this "coreset" depends on k and ε but is independent of the dimension.
Self-improving algorithms
In SODA ’06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Cited by 26 (4 self)
Abstract:
We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of this work are: (i) an algorithm to sort a list of numbers with optimal expected limiting complexity; and (ii) an algorithm to compute the Delaunay triangulation of a set of points with optimal expected limiting complexity. In both cases, the algorithm begins with a training phase during which it adjusts itself to the input distribution, followed by a stationary regime in which the algorithm settles to its optimized incarnation.
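The training-then-stationary pattern for sorting can be caricatured in a few lines: learn approximate quantiles of the input distribution during training, then bucket each new input against them so that most work happens inside small, distribution-matched buckets. A toy sketch, not the paper's optimal algorithm (the class name and bucket count are illustrative):

```python
import bisect
import random

class SelfImprovingSorter:
    """Toy self-improving sorter: a training phase learns approximate
    quantiles of the input distribution; in the stationary regime,
    inputs are bucketed against those quantiles and each (small)
    bucket is sorted independently."""

    def __init__(self, n_buckets=8):
        self.n_buckets = n_buckets
        self.bounds = []          # learned bucket boundaries

    def train(self, sample_lists):
        pooled = sorted(x for lst in sample_lists for x in lst)
        step = max(1, len(pooled) // self.n_buckets)
        self.bounds = pooled[step::step][: self.n_buckets - 1]

    def sort(self, xs):
        if not self.bounds:       # untrained: fall back to plain sort
            return sorted(xs)
        buckets = [[] for _ in range(len(self.bounds) + 1)]
        for x in xs:
            buckets[bisect.bisect(self.bounds, x)].append(x)
        return [x for b in buckets for x in sorted(b)]

rng = random.Random(0)
sorter = SelfImprovingSorter()
sorter.train([[rng.gauss(0, 1) for _ in range(100)] for _ in range(20)])
data = [rng.gauss(0, 1) for _ in range(500)]
```

Because the buckets are disjoint ordered ranges, concatenating the sorted buckets always yields the correct order; the learned boundaries only affect how evenly the work is split.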
Algorithmic luckiness
Journal of Machine Learning Research, 2002
Cited by 25 (4 self)
Abstract:
Classical statistical learning theory studies the generalisation performance of machine learning algorithms rather indirectly. One of the main detours is that algorithms are studied in terms of the hypothesis class that they draw their hypotheses from. In this paper, motivated by the luckiness framework of Shawe-Taylor et al. (1998), we study learning algorithms more directly and in a way that allows us to exploit the serendipity of the training sample. The main difference from previous approaches lies in the complexity measure; rather than covering all hypotheses in a given hypothesis space, it is only necessary to cover the functions which could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers, which results in a bound that exploits the margin, the sparsity of the resultant weight vector, and the degree of clustering of the training data in feature space.
Private coresets
2009
Cited by 18 (2 self)
Abstract:
A coreset of a point set P is a small weighted set of points that captures some geometric properties of P. Coresets have found use in a vast host of geometric settings. We forge a link between coresets and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed. We first show that the existence of a small coreset with low generalized sensitivity (i.e., replacing a single point in the original point set slightly affects the quality of the coreset) implies (in an inefficient manner) the existence of a private coreset for the same queries. This greatly extends the works of Blum, Ligett, and Roth [STOC 2008] and McSherry and Talwar [FOCS 2007]. We also give an efficient algorithm to compute private coresets for k-median and k-means queries in ℝ^d, immediately implying efficient differentially private sanitizations for such queries. Following McSherry and Talwar, this construction also gives efficient coalition-proof (approximately dominant strategy) mechanisms for location problems. Unlike coresets, which only have a multiplicative approximation factor, we prove that private coresets must exhibit additive error. We present a new technique for showing lower bounds on this error.
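The additive error mentioned above is the usual price of differential privacy: a private summary must add noise whose scale depends on the privacy parameter and the data's sensitivity, not just on the data. A minimal sketch of the standard Laplace mechanism applied to a weighted summary (this is the generic mechanism, not the paper's private-coreset construction; the function names and sensitivity value are illustrative):

```python
import math
import random

def laplace_sample(rng, scale):
    """Draw from the Laplace(0, scale) distribution by inverse CDF:
    X = -scale * sgn(u) * ln(1 - 2|u|) for u uniform on (-0.5, 0.5)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_weights(weights, eps, sensitivity, rng):
    """Laplace mechanism: adding Laplace(sensitivity / eps) noise to
    each weight gives eps-differential privacy for summaries in which
    one input point can change each weight by at most `sensitivity`."""
    return [w + laplace_sample(rng, sensitivity / eps) for w in weights]

rng = random.Random(0)
noisy = privatize_weights([10.0, 20.0, 30.0], eps=0.5, sensitivity=1.0, rng=rng)
```

The noise is zero-mean, so aggregate answers remain unbiased, but each individual weight carries additive error of order sensitivity/eps.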
Approximation Algorithms for the Mobile Piercing Set Problem with Applications to Clustering in Ad-hoc Networks
In Proc. of the 6th International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications (DIALM), 2001
Cited by 16 (3 self)
Abstract:
The main contributions of this paper are twofold. First, we present a simple, general framework for obtaining efficient constant-factor approximation algorithms for the mobile piercing set (MPS) problem on unit disks for standard metrics in fixed-dimension vector spaces. More specifically, we provide low-constant approximations for the L_1 and L_∞ norms on a d-dimensional space, for any fixed d > 0, and for the L_2 norm on two- and three-dimensional spaces. Our framework provides a family of fully distributed and decentralized algorithms, which adapt (asymptotically) optimally to the mobility of disks, at the expense of a low degradation in the best known approximation factors of the respective centralized algorithms: our algorithms take O(1) time to update the piercing set maintained, per movement of a disk. We also present a family of fully distributed algorithms for the MPS problem which either match or improve the best known approximation bounds of centralized algorithms for the respective norms and space dimensions.
SCUBA: Scalable cluster-based algorithm for evaluating continuous spatiotemporal queries on moving objects
In EDBT, 2006
Cited by 11 (1 self)
Abstract:
In this paper, we propose SCUBA, a Scalable Cluster-Based Algorithm for evaluating a large set of continuous queries over spatiotemporal data streams. The key idea of SCUBA is to group moving objects and queries based on common spatiotemporal properties at runtime into moving clusters, to optimize query execution and thus facilitate scalability. SCUBA exploits shared cluster-based execution by abstracting the evaluation of a set of spatiotemporal queries first as a spatial join between moving clusters. This cluster-based filtering prunes true negatives. The execution then proceeds with a fine-grained within-moving-cluster join process for all pairs of moving clusters identified as potentially joinable by a positive cluster-join match. A moving cluster can serve as an approximation of the location of its members. We show how moving clusters can serve as a means for intelligent load shedding of spatiotemporal data to avoid performance degradation with minimal harm to result quality. Our experiments on real datasets demonstrate that SCUBA can achieve a substantial improvement when executing continuous queries on spatiotemporal data streams.
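The cluster-level filtering step can be illustrated with a simple geometric test: two moving clusters, each summarized by a center and a bounding radius, can contain members within join distance ε of each other only if their bounding circles come within ε. A sketch (the cluster representation and names are illustrative, not SCUBA's actual data structures):

```python
import math

def clusters_may_join(c1, c2, eps):
    """Coarse filter: a cluster is (cx, cy, radius). Members of the
    two clusters can be within eps of each other only if the bounding
    circles come within eps -- otherwise the pair is a true negative
    and the fine-grained within-cluster join is skipped."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    return math.hypot(x1 - x2, y1 - y2) <= r1 + r2 + eps

# Distant cluster pairs are pruned; nearby ones go to the fine join.
a = (0.0, 0.0, 1.0)
b = (10.0, 0.0, 1.0)
c = (2.5, 0.0, 1.0)
```

The filter is conservative: it never discards a joinable pair, so correctness is preserved while most pairwise member comparisons are avoided.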
Smoothed motion complexity
Proc. of the 11th European Symp. on Algorithms, 2003
Cited by 6 (0 self)
Abstract:
We propose a new complexity measure for the movement of objects, the smoothed motion complexity. Many applications are based on algorithms dealing with moving objects, but data on moving objects is usually inherently noisy due to measurement errors. Smoothed motion complexity takes this imprecise information into account and uses smoothed analysis [13] to model noisy data. The input is subject to slight random perturbation, and the smoothed complexity is the worst-case expected complexity over all inputs w.r.t. the random noise. We think that the usually applied worst-case analysis of algorithms dealing with moving objects, e.g., kinetic data structures, often does not reflect real-world behavior, and that smoothed motion complexity is much better suited to estimate dynamics. We illustrate this approach on the problem of maintaining an orthogonal bounding box of a set of n points in ℝ^d under linear motion. We assume speed vectors and initial positions from [−1, 1]^d. The motion complexity is then the number of combinatorial changes to the description of the bounding box. Under perturbation with Gaussian noise of deviation σ, the smoothed motion complexity is only polylogarithmic: O(d · (1 + 1/σ) · log^{3/2} n) and Ω(d · √(log n)). We also consider the case when only very little information about the noise distribution is known. We assume that the density function is monotonically increasing on ℝ_{≤0}, monotonically decreasing on ℝ_{≥0}, and bounded by some value C. Then the motion complexity is O(√n log n · C + log n) and Ω(d · min{ n^{1/5}/σ, n }).