On Clusterings: Good, Bad and Spectral
, 2000
Cited by 254 (12 self)
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering which avoids the drawbacks of existing measures. A simple recursive heuristic has polylogarithmic worst-case guarantees under the new measure. The main result of the paper is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees
Survey of clustering data mining techniques
, 2002
Cited by 247 (0 self)
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Expander Flows, Geometric Embeddings and Graph Partitioning
 In 36th Annual Symposium on the Theory of Computing
, 2004
Cited by 238 (18 self)
We give an O(√(log n))-approximation algorithm for sparsest cut, balanced separator, and graph conductance problems. This improves the O(log n)-approximation of Leighton and Rao (1988). We use a well-known semidefinite relaxation with triangle inequality constraints. Central to our analysis is a geometric theorem about projections of point sets in R^d, whose proof makes essential use of a phenomenon called measure concentration.
Expander Graphs and their Applications
, 2003
Cited by 188 (5 self)
Contents
1 The Magical Mystery Tour
1.1 Some Problems
1.1.1 Hardness results for linear transformation
1.1.2 Error Correcting Codes
1.1.3 Derandomizing Algorithms
1.2 Magical Graphs
1.2.1 A Super Concentrator with O(n) edges
1.2.2 Error Correcting Codes
1.2.3 Derandomizing Random Algorithms
1.3 Conclusions
Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems (Extended Abstract)
 STOC'04
, 2004
Cited by 136 (8 self)
We present algorithms for solving symmetric, diagonally dominant linear systems to accuracy ε in time linear in their number of nonzeros and log(κ_f(A)/ε), where κ_f(A) is the condition number of the matrix defining the linear system. Our algorithm applies the preconditioned Chebyshev iteration with preconditioners designed using nearly-linear time algorithms for graph sparsification and graph partitioning.
Statistical properties of community structure in large social and information networks
Cited by 120 (10 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
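The conductance measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition (edges leaving a node set, divided by the smaller of the two sides' total degree) on an undirected, unweighted graph; the paper's exact definition may differ in detail. The function name `conductance`, the adjacency-dict representation, and the toy graph are all hypothetical choices made for illustration.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected, unweighted graph.

    `adj` maps each node to a list of its neighbours (every edge listed
    in both directions). Assumed definition, not taken from the paper:
    cut(S, V \\ S) / min(vol(S), vol(V \\ S)), where vol is the degree sum.
    """
    inside = set(subset)
    cut = 0       # edges with exactly one endpoint in the subset
    vol_in = 0    # sum of degrees of nodes inside the subset
    vol_out = 0   # sum of degrees of nodes outside the subset
    for node, neighbours in adj.items():
        if node in inside:
            vol_in += len(neighbours)
            cut += sum(1 for nb in neighbours if nb not in inside)
        else:
            vol_out += len(neighbours)
    smaller = min(vol_in, vol_out)
    if smaller == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / smaller


# Toy graph (an assumption for illustration): two triangles joined by one edge.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
one_triangle = conductance(toy_graph, {0, 1, 2})  # one cut edge, volume 7
single_node = conductance(toy_graph, {0})         # both edges of node 0 are cut
```

Lower values indicate a more "community-like" set: here one triangle scores 1/7, while a single node scores 1. A community profile plot in the sense described above would record the smallest such value found at each subset size.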
Euclidean distortion and the Sparsest Cut
 In Proceedings of the 37th Annual ACM Symposium on Theory of Computing
, 2005
Cited by 93 (20 self)
Bi-Lipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] which shows that every
Relations Between Average Case Complexity and Approximation Complexity (Extended Abstract)
 In Proceedings of the 34th Annual ACM Symposium on Theory of Computing
, 2002
Cited by 89 (9 self)
We investigate relations between average case complexity and the complexity of approximation. Our preliminary findings indicate that this is a research direction that leads to interesting insights. Under the assumption that refuting 3SAT is hard on average on a natural distribution, we derive hardness of approximation results for min bisection, dense k-subgraph, max bipartite clique and the 2-catalog segmentation problem. No NP-hardness of approximation results are currently known for these problems.
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
, 2008
Cited by 79 (6 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and
A New Rounding Procedure for the Assignment Problem with Applications to Dense Graph Arrangement Problems
, 2001
Cited by 77 (3 self)
We present a randomized procedure for rounding fractional perfect matchings to (integral) matchings. If the original fractional matching satisfies any linear inequality, then with high probability, the new matching satisfies that linear inequality in an approximate sense. This extends the well-known LP rounding procedure of Raghavan and Thompson, which is usually used to round fractional solutions of linear programs.