Results 1 - 10 of 115
Consistency of spectral clustering
, 2004
Cited by 286 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
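As a point of reference, the two classes are commonly told apart by the matrix whose eigenvectors are used. The notation below (similarity matrix $W$, diagonal degree matrix $D$) is standard textbook notation assumed here for illustration; it is not taken from the abstract itself.

```latex
L = D - W \quad \text{(unnormalized)}, \qquad
L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} \quad \text{(normalized)},
\qquad D_{ii} = \sum_j W_{ij}.
```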
On Clusterings: Good, Bad and Spectral
, 2000
Cited by 254 (12 self)
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering which avoids the drawbacks of existing measures. A simple recursive heuristic has polylogarithmic worst-case guarantees under the new measure. The main result of the paper is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees
Statistical properties of community structure in large social and information networks
Cited by 120 (10 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
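The conductance measure mentioned in this abstract can be illustrated with a small sketch. It assumes the usual definition (edges leaving the set, divided by the smaller of the two side volumes) on an undirected, unweighted graph without self-loops; the function name `conductance` and the toy graph `adj` are hypothetical and chosen only for illustration.

```python
def conductance(adj, nodes):
    """Conductance of `nodes` in an undirected graph given as {node: [neighbors]}.

    Assumed definition: cut(S) / min(vol(S), vol(V \\ S)), where vol is the
    sum of degrees. Lower values mean a more "community-like" set.
    """
    inside = set(nodes)
    # Count edges with exactly one endpoint inside the set.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_inside = sum(len(adj[u]) for u in inside)
    vol_outside = sum(len(adj[u]) for u in adj) - vol_inside
    smaller_side = min(vol_inside, vol_outside)
    if smaller_side == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / smaller_side


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
phi = conductance(adj, {0, 1, 2})  # one cut edge, volume 7 on each side
```

A network community profile plot would record, for each size k, the lowest conductance found among sets of k nodes; finding that minimum exactly is hard in general, which is why the abstract speaks of approximation.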
Towards a theoretical foundation for Laplacian-based manifold methods
, 2005
Cited by 103 (10 self)
In recent years manifold methods have attracted a considerable amount of attention in machine learning. However, most algorithms in that class may be termed “manifold-motivated” as they lack any explicit theoretical guarantees. In this paper we take a step towards closing the gap between theory and practice for a class of Laplacian-based manifold methods. We show that under certain conditions the graph Laplacian of a point cloud converges to the Laplace-Beltrami operator on the underlying manifold. Theorem 1 contains the first result showing convergence of a random graph Laplacian to the manifold Laplacian in the machine learning context.
Local graph partitioning using PageRank vectors
 In FOCS ’06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
, 2006
Cited by 100 (22 self)
A local graph partitioning algorithm finds a cut near a specified starting vertex, with a running time that depends largely on the size of the small side of the cut, rather than the size of the input graph. In this paper, we present an algorithm for local graph partitioning using personalized PageRank vectors. We develop an improved algorithm for computing approximate PageRank vectors, and derive a mixing result for PageRank vectors similar to that for random walks. Using this mixing result, we derive an analogue of the Cheeger inequality for PageRank, which shows that a sweep over a single PageRank vector can find a cut with conductance $\phi$, provided there exists a cut with conductance at most $f(\phi)$, where $f(\phi)$ is $\Omega(\phi^2 / \log m)$, and where $m$ is the number of edges in the graph. By extending this result to approximate PageRank vectors, we develop an algorithm for local graph partitioning that can be used to find a cut with conductance at most $\phi$, whose small side has volume at least $2^b$, in time $O(2^b \log^3 m / \phi^2)$. Using this local graph partitioning algorithm as a subroutine, we obtain an algorithm that finds a cut with conductance $\phi$ and approximately optimal balance in time $O(m \log^4 m / \phi^3)$.
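The idea of a sweep over a personalized PageRank vector can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it computes the PageRank vector by plain power iteration on the whole graph rather than by a local approximation, so it has none of the running-time guarantees stated above. The function names, the teleport value `alpha=0.15`, the lazy-walk choice, and the toy graph are all assumptions.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=200):
    """Power iteration for p = alpha * e_seed + (1 - alpha) * p * W,
    where W is the lazy random walk (stay put with probability 1/2)."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # teleport mass returns to the seed
        for u, mass in p.items():
            nxt[u] += (1 - alpha) * mass / 2  # lazy step: half the mass stays
            share = (1 - alpha) * mass / (2 * len(adj[u]))
            for v in adj[u]:
                nxt[v] += share  # other half spreads evenly over neighbors
        p = nxt
    return p


def sweep_cut(adj, p):
    """Order nodes by p[v] / degree(v) and return the prefix set with the
    lowest conductance, together with that conductance."""
    order = sorted(adj, key=lambda v: p[v] / len(adj[v]), reverse=True)
    total_vol = sum(len(adj[v]) for v in adj)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, v in enumerate(order[:-1], start=1):  # skip the full vertex set
        in_set.add(v)
        vol += len(adj[v])
        # Edges from v to earlier nodes stop being cut; the rest become cut.
        already_inside = sum(1 for w in adj[v] if w in in_set)
        cut += len(adj[v]) - 2 * already_inside
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi


# Toy graph (an assumption for illustration): two triangles joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ppr = personalized_pagerank(adj, seed=0)
cluster, phi = sweep_cut(adj, ppr)
```

On this toy graph the sweep started from vertex 0 recovers the triangle containing the seed, since the single bridging edge is the lowest-conductance prefix cut.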
Some Applications of Laplace Eigenvalues of Graphs
 GRAPH SYMMETRY: ALGEBRAIC METHODS AND APPLICATIONS, VOLUME 497 OF NATO ASI SERIES C
, 1997
Cited by 93 (0 self)
In the last decade important relations between Laplace eigenvalues and eigenvectors of graphs and several other graph parameters were discovered. In these notes we present some of these results and discuss their consequences. Attention is given to the partition and the isoperimetric properties of graphs, the max-cut problem and its relation to semidefinite programming, rapid mixing of Markov chains, and to extensions of the results to infinite graphs.
Spectral Partitioning of Random Graphs
, 2001
Cited by 87 (3 self)
Problems such as bisection, graph coloring, and clique are generally believed to be hard in the worst case. However, they can be solved if the input data is drawn randomly from a distribution over graphs containing acceptable solutions. In this paper we show that a simple spectral algorithm can solve all three problems above in the average case, as well as a more general problem of partitioning graphs based on edge density. In nearly all cases our approach meets or exceeds previous parameters, while introducing substantial generality. We apply spectral techniques, using foremost the observation that in all of these problems, the expected adjacency matrix is a low-rank matrix wherein the structure of the solution is evident.
Finding a large hidden clique in a random graph
, 1998
Cited by 83 (5 self)
We consider the following probabilistic model of a graph on $n$ labeled vertices. First choose a random graph $G(n, 1/2)$, and then choose randomly a subset $Q$ of vertices of size $k$ and force it to be a clique by joining every pair of vertices of $Q$ by an edge. The problem is to give a polynomial time algorithm for finding this hidden clique almost surely for various values of $k$. This question was posed independently, in various variants, by Jerrum and by Kučera. In this paper we present an efficient algorithm for all $k > c n^{0.5}$, for
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
, 2008
Cited by 79 (6 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and