Correlation Clustering
Machine Learning, 2002
Cited by 230 (4 self)
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting ...
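To make the objective concrete, here is a minimal Python sketch (not taken from the paper) that counts the disagreements of a given clustering and finds an optimal clustering by brute force over all partitions. The helper names `edge`, `disagreements`, `set_partitions`, and `best_clustering` are hypothetical, and exhaustive search is exponential, so it only suits toy instances; the paper itself is concerned with approximation algorithms.

```python
def edge(u, v):
    """Unordered vertex pair, used as a dictionary key for edge labels."""
    return frozenset((u, v))

def disagreements(labels, cluster_of):
    """Count '-' edges inside clusters plus '+' edges between clusters.

    labels: dict mapping edge(u, v) -> '+' or '-' for every pair of vertices.
    cluster_of: dict mapping each vertex to a cluster id.
    """
    mistakes = 0
    for pair, sign in labels.items():
        u, v = tuple(pair)
        same = cluster_of[u] == cluster_of[v]
        if (sign == '-' and same) or (sign == '+' and not same):
            mistakes += 1
    return mistakes

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (exponential count)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a block of its own.
        yield [[first]] + part

def best_clustering(labels, vertices):
    """Return (cost, cluster_of) for a partition with the fewest disagreements."""
    best = None
    for part in set_partitions(list(vertices)):
        cluster_of = {v: i for i, block in enumerate(part) for v in block}
        cost = disagreements(labels, cluster_of)
        if best is None or cost < best[0]:
            best = (cost, cluster_of)
    return best
```

For a triangle with labels a–b '+', b–c '+', a–c '−', no clustering satisfies every label, so the optimum makes exactly one mistake.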
Database-friendly Random Projections
2001
Cited by 167 (3 self)
A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space, where k is logarithmic in n and independent of d, so that all pairwise distances are maintained within an arbitrarily small factor. All known constructions of such embeddings involve projecting the n points onto a random k-dimensional hyperplane. We give a novel construction of the embedding, suitable for database applications, which amounts to computing a simple aggregate over k random attribute partitions.
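A minimal sketch of the "aggregate over random attribute partitions" idea, assuming a projection matrix with independent ±1 entries (probability 1/2 each) scaled by 1/√k; the function name and seed handling are hypothetical. Each output coordinate is the sum of the attributes in a random "+" group minus the sum over the "−" group, which a database could compute as a plain aggregate.

```python
import math
import random

def database_friendly_projection(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a random +/-1 matrix.

    For each output coordinate, every attribute is assigned at random to a "+"
    group or a "-" group; the coordinate is (sum of "+" attributes minus sum of
    "-" attributes) / sqrt(k), so squared distances are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    # signs[j][i] is +1 or -1 with probability 1/2 each (one random partition per row).
    signs = [[rng.choice((1, -1)) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(s * x for s, x in zip(row, p)) for row in signs]
        for p in points
    ]
```

Because the same seed yields the same matrix, all points must be projected in one call (or with the same seed) for their distances to be comparable.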
The Effectiveness of Lloyd-Type Methods for the k-Means Problem
In FOCS, 2006
Cited by 54 (4 self)
We investigate variants of Lloyd’s heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd’s heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd’s heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd’s method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
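The seeding step can be illustrated with D² sampling (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen), followed by ordinary Lloyd iterations. This is a hedged sketch in the spirit of the probabilistic seeding described above, not the paper's exact procedure; the function names and the iteration cap are assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def d2_seed(points, k, rng):
    """Pick the first center uniformly, then each next one with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def lloyd(points, k, iters=20, seed=0):
    """Lloyd's heuristic started from D^2-sampled centers."""
    rng = random.Random(seed)
    centers = d2_seed(points, k, rng)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to its cluster mean; empty clusters keep theirs.
        new_centers = [
            [sum(coord) / len(cl) for coord in zip(*cl)] if cl else list(centers[j])
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

On well-separated data, D² sampling very likely places one initial center in each group, after which a single Lloyd step already reaches the cluster means.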
Sublinear Time Approximate Clustering
2001
Cited by 45 (2 self)
Clustering is of central importance in a number of disciplines including Machine Learning, Statistics, and Data Mining. This paper has two foci: (1) It describes how existing algorithms for clustering can benefit from simple sampling techniques arising from work in statistics [Pol84]. (2) It motivates and introduces a new model of clustering that is in the spirit of the "PAC (probably approximately correct)" learning model, and gives examples of efficient PAC-clustering algorithms.
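As an illustration of the sampling idea (not the paper's algorithms or its sample-size bounds), the sketch below estimates the average k-median cost of a fixed set of centers from a uniform random sample instead of the full data set. Euclidean distance, the sample size m, and the function names are assumptions.

```python
import math
import random

def kmedian_cost(points, centers):
    """Average distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)

def sampled_kmedian_cost(points, centers, m, seed=0):
    """Estimate kmedian_cost from m points drawn uniformly with replacement."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(m)]
    return kmedian_cost(sample, centers)
```

The running time of the estimate depends on m rather than on the number of points, which is the kind of saving a sublinear-time approach relies on.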
Approximate Clustering without the Approximation
Cited by 37 (18 self)
Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also give us more accurate clusterings. E.g., for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown “correct” target clustering and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth. In this paper, we show that if we make this implicit assumption explicit, that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target, then we can produce clusterings that are O(ε)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for approximation factor, say, c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
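The notion of being ε-close "in symmetric difference" can be made concrete as the fraction of points that would have to change clusters once the predicted clusters are optimally matched to the target clusters. The brute-force sketch below assumes both clusterings use labels 0..k−1 and tries all k! matchings, so it is only practical for small k; the function name is hypothetical.

```python
from itertools import permutations

def clustering_distance(pred, target, k):
    """Fraction of points misclassified by `pred` relative to `target`,
    minimized over all relabelings of pred's clusters.

    pred, target: lists of cluster labels in range(k), one per point.
    """
    n = len(target)
    best = n
    for perm in permutations(range(k)):
        # perm[p] is the target label that predicted cluster p is matched to.
        mistakes = sum(1 for p, t in zip(pred, target) if perm[p] != t)
        best = min(best, mistakes)
    return best / n
```

Under this measure, a clustering is ε-close to the target exactly when `clustering_distance(pred, target, k) <= eps`.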
Polynomial Time Approximation Schemes for Geometric k-Clustering
J. of the ACM, 2001
Cited by 32 (5 self)
The Johnson–Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n)-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given n points in a larger set (for example, R^d) endowed with a distance function (for example, L² distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center", so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial time approximation schemes for this problem in several settings, including the binary cube {0, 1}^d with Hamming distance, and R^d either with L¹ distance, or with L² distance, or with the square of L² distance. In all these settings, the best previous results were constant-factor approximation guarantees. We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed-dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
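For intuition about the objective in the Hamming-cube setting, here is a small sketch (not the paper's approximation scheme): it evaluates the sum-of-distances cost and computes an optimal center for a single fixed cluster, which under Hamming distance is the coordinate-wise majority vote because the total distance splits into independent per-coordinate terms. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of coordinates in which two 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def majority_center(cluster):
    """Coordinate-wise majority: an optimal center for one cluster under
    Hamming distance (ties are broken toward 0, which is equally good)."""
    n = len(cluster)
    return tuple(1 if 2 * sum(col) > n else 0 for col in zip(*cluster))

def clustering_cost(points, centers):
    """Sum over all points of the Hamming distance to the nearest center."""
    return sum(min(hamming(p, c) for c in centers) for p in points)
```

The hard part addressed by the paper is choosing the k-way partition itself; once a cluster is fixed, its best center is easy to compute as above.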
The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors
SIAM J. Comput., 2009
Cited by 25 (0 self)
We introduce a new low-distortion embedding of ℓ_2^d into ℓ_p^{O(log n)} (p = 1, 2) called the fast Johnson–Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in ℓ_1 and ℓ_2. We consider the case of approximate nearest neighbors in ℓ_2^d. We provide a faster algorithm using classical projections, which we then speed up further by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
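A rough pure-Python sketch of the FJLT structure described above: random sign flips D, a normalized Walsh–Hadamard transform H (playing the role of the randomized Fourier preconditioner), then a sparse Gaussian matrix P. The sparsity parameter q, the 1/√k scaling, and all names are assumptions chosen so that the squared norm is preserved in expectation; the paper's exact parameter settings are not reproduced here.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    a = list(vec)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    norm = 1.0 / math.sqrt(len(a))
    return [v * norm for v in a]

def make_fjlt(d, k, q, seed=0):
    """Return x -> P H D x / sqrt(k).

    D: random +/-1 diagonal; H: orthonormal Hadamard; P: k x d sparse matrix whose
    entries are N(0, 1/q) with probability q and 0 otherwise.
    """
    rng = random.Random(seed)
    signs = [rng.choice((1, -1)) for _ in range(d)]
    # Store P sparsely: each row is a list of (column, value) pairs.
    rows = [
        [(j, rng.gauss(0.0, 1.0 / math.sqrt(q))) for j in range(d) if rng.random() < q]
        for _ in range(k)
    ]
    scale = 1.0 / math.sqrt(k)

    def transform(x):
        # Preconditioning spreads a sparse input over all coordinates...
        y = fwht([s * v for s, v in zip(signs, x)])
        # ...so that the sparse projection P can still sample it reliably.
        return [scale * sum(val * y[j] for j, val in row) for row in rows]

    return transform
```

A standard basis vector is the worst case for a sparse projection on its own (most rows would miss its single nonzero entry); after H D, every coordinate has magnitude 1/√d, which is the local-global duality the abstract refers to.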
Correlation Clustering in General Weighted Graphs
Theoretical Computer Science, 2006
Cited by 21 (0 self)
We consider the following general correlation-clustering problem [1]: given a graph with real nonnegative edge weights and a 〈+〉/〈−〉 edge labeling, partition the vertices into clusters to minimize the total weight of cut 〈+〉 edges and uncut 〈−〉 edges. Thus, 〈+〉 edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster while 〈−〉 edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition. Correlation clustering was introduced by Bansal, Blum, and Chawla [1], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an O(log n)-approximation algorithm for the general case based on linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r^3)-approximation algorithm for K_{r,r}-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than Θ(log n).
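The weighted objective can be evaluated directly. The sketch below charges the weight of every cut 〈+〉 edge and every uncut 〈−〉 edge; vertex pairs with no edge contribute nothing, which is what distinguishes this general setting from the complete, unit-weight case. The helper name and the edge-tuple format are assumptions.

```python
def weighted_disagreement(edges, cluster_of):
    """Total weight of cut '+' edges plus uncut '-' edges.

    edges: iterable of (u, v, weight, sign) with weight >= 0 and sign in {'+', '-'}.
    cluster_of: dict mapping each vertex to a cluster id.
    Vertex pairs absent from `edges` carry no information and cost nothing.
    """
    total = 0.0
    for u, v, w, sign in edges:
        same = cluster_of[u] == cluster_of[v]
        if (sign == '+' and not same) or (sign == '-' and same):
            total += w
    return total
```

The approximation algorithm itself works on an LP relaxation of this objective; the function only scores a candidate clustering.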
Matrix Row-Column Sampling for the Many-Light Problem
SIGGRAPH 2007
Cited by 18 (2 self)
Rendering complex scenes with indirect illumination, high dynamic range environment lighting, and many direct light sources remains a challenging problem. Prior work has shown that all these effects can be approximated by many point lights. This paper presents a scalable solution to the many-light problem suitable for a GPU implementation. We view the problem as a large matrix of sample-light interactions; the ideal final image is the sum of the matrix columns. We propose an algorithm for approximating this sum by sampling entire rows and columns of the matrix on the GPU using shadow mapping. The key observation is that the inherent structure of the transfer matrix can be revealed by sampling just a small number of rows and columns. Our prototype implementation can compute the light transfer within a few seconds for scenes with indirect and environment illumination, area lights, complex geometry and arbitrary shaders. We believe this approach can be very useful for rapid previewing in applications like cinematic and architectural lighting design.
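Viewing the image as the sum of matrix columns suggests Monte Carlo estimation. The sketch below is a deliberately simplified stand-in, not the paper's algorithm (which uses sampled rows to decide how columns are chosen and weighted): it samples columns uniformly and rescales by n/m, which gives an unbiased estimate of the column sum. The `column` callback is a hypothetical interface returning one light's contribution to every pixel.

```python
import random

def estimate_column_sum(column, n_cols, m, seed=0):
    """Unbiased Monte Carlo estimate of sum_j column(j) from m uniformly sampled columns.

    column: hypothetical callback j -> list of per-pixel contributions of light j.
    """
    rng = random.Random(seed)
    picks = [rng.randrange(n_cols) for _ in range(m)]
    total = None
    for j in picks:
        col = column(j)
        total = list(col) if total is None else [t + c for t, c in zip(total, col)]
    scale = n_cols / m
    return [scale * t for t in total]
```

Uniform sampling ignores the matrix structure, so its variance can be high when a few lights dominate; exploiting that structure is the point of the row-column approach.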
Random Projection, Margins, Kernels, and Feature-Selection
LNCS, 2005
Cited by 12 (0 self)
Random projection is a simple technique that has had a number of applications in algorithm design. In the context of machine learning, it can provide insight into questions such as "why is a learning problem easier if data is separable by a large margin?" and "in what sense is choosing a kernel much like choosing a set of features?" This talk is intended to provide an introduction to random projection and to survey some simple learning algorithms and other applications to learning based on it. I will also discuss how, given a kernel as a black-box function, we can use various forms of random projection to extract an explicit small feature space that captures much of what the kernel is doing. This talk is based in large part on work in [BB05,BBV04] joint with Nina Balcan and Santosh Vempala.
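One simple way to turn a black-box kernel into an explicit feature space, assumed here to resemble the landmark-style mapping in the cited joint work, is to draw d unlabeled examples z_1, ..., z_d and map x to (K(x, z_1), ..., K(x, z_d)). The Gaussian kernel, the value of gamma, and the function names below are illustrative assumptions.

```python
import math
import random

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; any black-box kernel function could be substituted."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_feature_map(kernel, unlabeled, d, seed=0):
    """Build an explicit d-dimensional feature map x -> (K(x, z_1), ..., K(x, z_d)),
    where z_1..z_d are drawn at random (without replacement) from unlabeled data."""
    rng = random.Random(seed)
    landmarks = rng.sample(unlabeled, d)
    return lambda x: [kernel(x, z) for z in landmarks]
```

Only kernel evaluations are needed, so the map works even when the kernel's implicit feature space is infinite-dimensional; a linear learner can then be trained on the d explicit features.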