Results 1  10
of
183
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract

Cited by 289 (15 self)
 Add to MetaCart
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two of major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized), is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacianbased methods in a statistical setting.
Expander Graphs and their Applications
, 2003
"... Contents 1 The Magical Mystery Tour 7 1.1 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.1 Hardness results for linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Error Correcting Codes . . . . . . . ..."
Abstract

Cited by 186 (5 self)
 Add to MetaCart
Contents 1 The Magical Mystery Tour 7 1.1 Some Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.1 Hardness results for linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.2 Error Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.1.3 Derandomizing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.2 Magical Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.2.1 A Super Concentrator with O(n) edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2.2 Error Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2.3 Derandomizing Random Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The unique games conjecture, integrality gap for cut problems and embeddability of negative type metrics into ℓ1
 In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science
, 2005
"... In this paper we disprove the following conjecture due to Goemans [16] and Linial [24] (also see [5, 26]): “Every negative type metric embeds into ℓ1 with constant distortion.” We show that for every δ>0, and for large enough n, there is an npoint negative type metric which requires distortion atl ..."
Abstract

Cited by 127 (11 self)
 Add to MetaCart
In this paper we disprove the following conjecture due to Goemans [16] and Linial [24] (also see [5, 26]): “Every negative type metric embeds into ℓ1 with constant distortion.” We show that for every δ>0, and for large enough n, there is an npoint negative type metric which requires distortion atleast (log log n) 1/6−δ to embed into ℓ1. Surprisingly, our construction is inspired by the Unique Games Conjecture (UGC) of Khot [19], establishing a previously unsuspected connection between PCPs and the theory of metric embeddings. We first prove that the UGC implies superconstant hardness results for (nonuniform) SPARSEST CUT and MINIMUM UNCUT problems. It is already known that the UGC also implies an optimal hardness result for MAXIMUM CUT [20]. Though these hardness results depend on the UGC, the integrality gap instances rely “only ” on the PCP reductions for the respective problems. Towards this, we first construct an integrality gap instance for a natural SDP relaxation of UNIQUE GAMES. Then, we “simulate ” the PCP reduction and “translate ” the integrality gap instance of UNIQUE GAMES to integrality gap instances for the respective cut problems! This enables us to prove a (log log n) 1/6−δ integrality gap for (nonuniform) SPARSEST CUT and MINIMUM UNCUT, and an optimal integrality gap for MAXIMUM CUT. All our SDP solutions satisfy the socalled “triangle inequality ” constraints. This also shows, for the first time, that the triangle inequality constraints do not add any power to the GoemansWilliamson’s SDP relaxation of MAXIMUM CUT. The integrality gap for SPARSEST CUT immediately implies a lower bound for embedding negative type metrics into ℓ1. It also disproves the nonuniform version of Arora, Rao and Vazirani’s Conjecture [5], asserting that the integrality gap of the SPARSEST CUT SDP, with the triangle inequality constraints, is bounded from above by a constant.
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 123 (10 self)
 Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse realworld networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large realworld networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “communitylike.” This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
Euclidean distortion and the Sparsest Cut
 In Proceedings of the 37th Annual ACM Symposium on Theory of Computing
, 2005
"... BiLipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] ..."
Abstract

Cited by 95 (20 self)
 Add to MetaCart
BiLipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] which shows that every
Measured descent: A new embedding method for finite metrics
 In Proc. 45th FOCS
, 2004
"... We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for ..."
Abstract

Cited by 83 (26 self)
 Add to MetaCart
We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. We prove that any npoint metric space (X, d) embeds in Hilbert space with distortion O ( √ αX · log n), where αX is a geometric estimate on the decomposability of X. As an immediate corollary, we obtain an O ( √ (log λX)log n) distortion embedding, where λX is the doubling constant of X. Since λX ≤ n, this result recovers Bourgain’s theorem, but when the metric X is, in a sense, “lowdimensional, ” improved bounds are achieved. Our embeddings are volumerespecting for subsets of arbitrary size. One consequence is the existence of (k, O(log n)) volumerespecting embeddings for all 1 ≤ k ≤ n, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted npoint planar graph O(log n) embeds in ℓ∞ with O(1) distortion. The O(log n) bound on the dimension is optimal, and improves upon the previously known bound of O((log n) 2). 1
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 81 (6 self)
 Add to MetaCart
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large realworld networks, ranging from traditional and online social networks, to technological and information networks and
On the Hardness of Approximating Multicut and SparsestCut
 In Proceedings of the 20th Annual IEEE Conference on Computational Complexity
, 2005
"... We show that the MULTICUT, SPARSESTCUT, and MIN2CNF ≡ DELETION problems are NPhard to approximate within every constant factor, assuming the Unique Games Conjecture of Khot [STOC, 2002]. A quantitatively stronger version of the conjecture implies inapproximability factor of Ω(log log n). 1. ..."
Abstract

Cited by 74 (4 self)
 Add to MetaCart
We show that the MULTICUT, SPARSESTCUT, and MIN2CNF ≡ DELETION problems are NPhard to approximate within every constant factor, assuming the Unique Games Conjecture of Khot [STOC, 2002]. A quantitatively stronger version of the conjecture implies inapproximability factor of Ω(log log n). 1.
Empirical comparison of algorithms for network community detection
 In Proc. WWW’10
, 2010
"... Detecting clusters or communities in large realworld graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as set of nodes with better internal connectivity ..."
Abstract

Cited by 64 (4 self)
 Add to MetaCart
Detecting clusters or communities in large realworld graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that “look like” good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a sizeresolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have nonobvious sizedependent behavior.