Results 1–7 of 7
Approximate Clustering without the Approximation
Abstract

Cited by 34 (17 self)
Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also give us more accurate clusterings. E.g., for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown “correct” target clustering, and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth. In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target), then we can produce clusterings that are O(ε)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for an approximation factor of, say, c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet in the latter case we have an efficient algorithm.
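The notion of closeness used in this abstract, distance in symmetric difference between a candidate clustering and the target, can be made concrete. The sketch below is our own minimal illustration, not code from the paper; the function name and the exhaustive matching over label permutations (feasible only for small k) are our own choices.

```python
from itertools import permutations

def clustering_distance(labels_a, labels_b, k):
    """Fraction of points whose cluster assignment differs between two
    k-clusterings, minimized over all matchings of cluster labels.
    Exhaustive over the k! permutations, so only suitable for small k."""
    n = len(labels_a)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for a, b in zip(labels_a, labels_b)
                         if perm[a] != b)
        best = min(best, mismatches)
    return best / n

# Two 2-clusterings of six points: after relabeling, they disagree
# on exactly one point.
target    = [0, 0, 0, 1, 1, 1]
candidate = [1, 1, 1, 0, 0, 1]
print(clustering_distance(target, candidate, 2))  # 1/6 ≈ 0.1667
```

Under this measure, the paper's assumption says every c-approximate solution to the objective has distance at most ε from the target.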
A nonlinear approach to dimension reduction
 CoRR
Abstract

Cited by 4 (3 self)
The ℓ2 flattening lemma of Johnson and Lindenstrauss [JL84] is a powerful tool for dimension reduction. It has been conjectured that the target dimension bounds can be refined and bounded in terms of the intrinsic dimensionality of the data set (for example, the doubling dimension). One such problem was proposed by Lang and Plaut [LP01] (see also [GKL03, Mat02, ABN08, CGT10]), and is still open. We prove another result in this line of work: the snowflake metric d^{1/2} of a doubling set S ⊂ ℓ2 can be embedded with arbitrarily low distortion into ℓ2^D, for a dimension D that depends solely on the doubling constant of the metric. In fact, the target dimension is polylogarithmic in the doubling constant. Our techniques are robust and extend to the more difficult spaces ℓ1 and ℓ∞, although the dimension bounds here are quantitatively inferior to those for ℓ2.
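For readers unfamiliar with the flattening lemma this abstract builds on, the classic Johnson–Lindenstrauss construction is a random Gaussian projection. The sketch below is our own toy illustration with arbitrary sizes n, d, D; it checks empirically that pairwise distances are roughly preserved, and does not implement the snowflake embedding of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions, projected down to D dimensions.
n, d, D = 50, 1000, 200
X = rng.normal(size=(n, d))

# Random Gaussian projection scaled by 1/sqrt(D), so squared lengths
# are preserved in expectation (the usual JL construction).
P = rng.normal(size=(d, D)) / np.sqrt(D)
Y = X @ P

def pairwise_dists(Z):
    # All pairwise Euclidean distances via broadcasting.
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

mask = ~np.eye(n, dtype=bool)
ratios = pairwise_dists(Y)[mask] / pairwise_dists(X)[mask]
print(ratios.min(), ratios.max())  # typically close to 1
```

The open question referenced above asks whether D can be driven down to depend on the intrinsic (doubling) dimension rather than on the number of points.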
Thoughts on clustering ∗
Abstract

Cited by 2 (0 self)
Clustering is a somewhat confusing topic theoretically. In large part this is because there are many different kinds of clustering problems. In addition, the true goals in clustering are often difficult to measure, making the task seem not well-defined and underspecified. In this note we discuss one approach from [6] to theoretically formulating a certain broad class of clustering problems, and discuss relations between this and other approaches. We also make a few suggestions and state a few open directions.
Variance-based criteria for clustering and their application to the analysis of management styles of mutual funds based on time series of daily returns
, 2008
Abstract

Cited by 1 (1 self)
The problem of clustering is formulated as the problem of minimizing a certain objective function over the set of all possible clusterings. The objective function measures mathematically the quality of a clustering. According to a previously published theoretical result, if the objective function being minimized is strictly convex, then the corresponding clustering surface is strictly convex. A direct implication of this result was the construction of a basic gradient algorithm that searches for locally optimal solutions (i.e., clusterings). This gradient procedure constitutes the core of the clustering algorithms proposed in this work for minimizing two novel objective functions. An important direction in statistical sampling theory deals with the construction of optimal stratified samples from a population. One of the problems addressed by stratified sampling is the construction of a sample for estimating the mean value of a particular scalar parameter such that the variance of the estimate is minimized. For this purpose, a criterion for optimal partitioning of the population into a certain number of groups (strata) was derived. This criterion is known as Neyman’s criterion for optimal stratified sampling.
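Neyman’s criterion mentioned at the end of this abstract has a simple closed form: sample stratum h in proportion to N_h·σ_h, where N_h is the stratum size and σ_h its standard deviation. The sketch below is our own illustration of that standard formula, not code from the thesis.

```python
def neyman_allocation(strata_sizes, strata_sds, n):
    """Neyman allocation: split a total sample of size n across strata
    proportionally to N_h * sigma_h, which minimizes the variance of
    the stratified estimate of the population mean."""
    weights = [N * s for N, s in zip(strata_sizes, strata_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Three equal-size strata: the most variable stratum gets the
# largest share of the sample.
print(neyman_allocation([100, 100, 100], [1.0, 2.0, 5.0], 80))
# [10.0, 20.0, 50.0]
```

The clustering connection is that choosing the strata themselves is a partitioning problem driven by a variance-based objective, which is the kind of objective this work minimizes.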
Improved spectral-norm bounds for clustering
 In APPROX-RANDOM, 37–49
, 2012
Abstract

Cited by 1 (0 self)
Aiming to unify known results about clustering mixtures of distributions under separation conditions, Kumar and Kannan [KK10] introduced a deterministic condition for clustering datasets. They showed that this single deterministic condition encompasses many previously studied clustering assumptions. More specifically, their proximity condition requires that in the target k-clustering, the projection of a point x onto the line joining its cluster center µ and some other center µ′ is a large additive factor closer to µ than to µ′. This additive factor can be roughly described as k times the spectral norm of the matrix representing the differences between the given (known) dataset and the means of the (unknown) target clustering. Clearly, the proximity condition implies center separation: the distance between any two centers must be as large as the above-mentioned bound. In this paper we improve upon the work of Kumar and Kannan [KK10] along several axes. First, we weaken the center separation bound by a factor of √k, and second, we weaken the proximity condition by a factor of k (in other words, the revised separation condition is independent of k). Using these weaker bounds we still achieve the same guarantees when all ...
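The spectral-norm quantity at the heart of the proximity condition can be computed directly: form the matrix whose rows are each point minus its cluster mean and take its largest singular value. The sketch below is our own toy illustration (synthetic data, with the true means standing in for the unknown target means, which in [KK10] are an analysis device rather than an input), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: two well-separated Gaussian clusters in the plane.
k, per = 2, 50
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
labels = np.repeat(np.arange(k), per)
A = centers[labels] + rng.normal(scale=0.5, size=(k * per, 2))

# C holds each point's cluster mean, row-aligned with A.
means = np.array([A[labels == j].mean(axis=0) for j in range(k)])
C = means[labels]

# Spectral norm of A - C: the largest singular value of the
# data-minus-means matrix, the quantity the proximity condition scales.
spec = np.linalg.norm(A - C, ord=2)
print(spec)
```

For well-clustered data this norm stays small relative to the center separation, which is what makes the proximity condition satisfiable.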
Beyond Worst-Case Analysis in Privacy and Clustering: Exploiting Explicit and Implicit Assumptions
, 2013