## Computing communities in large networks using random walks (2004)

### Other Repositories/Bibliography

Venue: | J. of Graph Alg. and App. |

Citations: | 108 - 2 self |

### BibTeX

@ARTICLE{Pons04computingcommunities,
    author = {Pascal Pons and Matthieu Latapy},
    title = {Computing communities in large networks using random walks},
    journal = {J. of Graph Alg. and App.},
    year = {2004},
    volume = {10},
    pages = {284--293}
}

### Abstract

Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them, however, is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn²) and space O(n²) in the worst case, and in time O(n² log n) and space O(n²) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.
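The abstract does not spell out the similarity measure, so the following is only a minimal sketch of one plausible reading: two vertices are close when short random walks started from them reach the rest of the graph with similar probabilities. The degree-normalised Euclidean form of the distance, the walk length `t=3`, the function names, and the example graph `TWO_TRIANGLES` are all assumptions made for illustration; the self-loop on every vertex follows the convention quoted in the citation contexts.

```python
from math import sqrt


def transition_matrix(adj):
    """Return (P, degree) for an adjacency matrix given as a list of lists.

    Assumption: every vertex gets a self-loop of weight 1, and P[i][j] is the
    probability of stepping from i to j (row-normalised adjacency).
    """
    n = len(adj)
    a = [[1.0 if i == j else float(adj[i][j]) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a]
    p = [[a[i][j] / degree[i] for j in range(n)] for i in range(n)]
    return p, degree


def walk_distribution(p, start, t):
    """Probability of being at each vertex after t steps starting from `start`."""
    n = len(p)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(t):
        dist = [sum(dist[i] * p[i][k] for i in range(n)) for k in range(n)]
    return dist


def walk_distance(adj, i, j, t=3):
    """Assumed distance: degree-weighted Euclidean gap between t-step distributions."""
    p, degree = transition_matrix(adj)
    pi = walk_distribution(p, i, t)
    pj = walk_distribution(p, j, t)
    return sqrt(sum((pi[k] - pj[k]) ** 2 / degree[k] for k in range(len(adj))))


# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
```

On this toy graph, `walk_distance(TWO_TRIANGLES, 0, 1)` is smaller than `walk_distance(TWO_TRIANGLES, 0, 4)`, which is the behaviour one would want from a community-aware distance. The sketch recomputes the walk from scratch on every call, so it says nothing about the running times claimed in the abstract.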

### Citations

1550 | The structure and function of complex networks
- Newman
(Show Context)
Citation Context ...nt domains such as sociology (acquaintance networks, collaboration networks), biology (metabolic networks, gene networks) or computer science (internet topology, web graph, p2p networks). We refer to =-=[45, 42, 1, 31, 12]-=- for reviews from different perspectives and for an extensive bibliography. The associated graphs are in general globally sparse but locally dense: there exist groups of vertices, called communities, ... |

1370 | Data Clustering: a Review - Jain, Murty, et al. - 1999 |

1318 | Statistical mechanics of complex networks
- Albert, Barabási
(Show Context)
Citation Context ...nt domains such as sociology (acquaintance networks, collaboration networks), biology (metabolic networks, gene networks) or computer science (internet topology, web graph, p2p networks). We refer to =-=[45, 42, 1, 31, 12]-=- for reviews from different perspectives and for an extensive bibliography. The associated graphs are in general globally sparse but locally dense: there exist groups of vertices, called communities, ... |

1085 |
An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
(Show Context)
Citation Context ... very recent works, but this topic is related to the classical problem of graph partitioning that consists in splitting a graph into a given number of groups while minimizing the cost of the edge cut =-=[17, 35, 28]-=-. However, these algorithms are not well suited to our case because they need the number of communities and their size as parameters. The recent interest in the domain has started with a new divisive ... |

814 |
Community structure in social and biological networks
- Girvan, Newman
- 2002
(Show Context)
Citation Context ...l suited to our case because they need the number of communities and their size as parameters. The recent interest in the domain has started with a new divisive approach proposed by Girvan and Newman =-=[23, 33]-=-: the edges with the largest betweenness (number of shortest paths passing through an edge) are removed one by one in order to split hierarchically the graph into communities. This algorithm runs in t... |

799 |
Finding and evaluating community structure in networks, Phys
- Newman, Girvan
- 2004
(Show Context)
Citation Context ...fore, we will design an algorithm which finds communities satisfying this criterion. More precisely, we will evaluate the quality of a partition into communities using a quantity (known as modularity =-=[32, 33]-=-) which captures this. We will consider throughout this paper an undirected graph G = (V, E) with n = |V| vertices and m = |E| edges. We impose that each vertex is linked to itself by a loop (we add ... |

617 |
Social Network Analysis
- Wasserman, Faust
- 1994
(Show Context)
Citation Context ...nt domains such as sociology (acquaintance networks, collaboration networks), biology (metabolic networks, gene networks) or computer science (internet topology, web graph, p2p networks). We refer to =-=[45, 42, 1, 31, 12]-=- for reviews from different perspectives and for an extensive bibliography. The associated graphs are in general globally sparse but locally dense: there exist groups of vertices, called communities, ... |

565 |
Hierarchical grouping to optimize an objective function
- Ward
- 1963
(Show Context)
Citation Context ...g problem. We will use here an efficient hierarchical clustering algorithm that allows us to find community structures at different scales. We present an agglomerative approach based on Ward’s method =-=[44]-=- that is well suited to our distance and gives very good results while reducing the number of distance computations. We start from a partition P1 = {{v}, v ∈ V } of the graph into n communities reduce... |

553 |
Comparing partitions
- Hubert, Arabie
- 1985
(Show Context)
Citation Context ...To evaluate the quality of the partition found by the algorithms, we compare them to the original generated partition. To achieve this, we use the Rand index corrected by Hubert and Arabie =-=[37, 25]-=- which evaluates the similarities between two partitions. The Rand index R(P1, P2) is the ratio of pairs of vertices correlated by the partitions P1 and P2 (two vertices are correlated by the partitio... |

514 |
Partitioning sparse matrices with eigenvectors of graphs
- Pothen, Simon, et al.
- 1990
(Show Context)
Citation Context ... very recent works, but this topic is related to the classical problem of graph partitioning that consists in splitting a graph into a given number of groups while minimizing the cost of the edge cut =-=[17, 35, 28]-=-. However, these algorithms are not well suited to our case because they need the number of communities and their size as parameters. The recent interest in the domain has started with a new divisive ... |

484 |
Objective Criteria for Evaluation of Clustering Methods
- Rand
(Show Context)
Citation Context ...To evaluate the quality of the partition found by the algorithms, we compare them to the original generated partition. To achieve this, we use the Rand index corrected by Hubert and Arabie =-=[37, 25]-=- which evaluates the similarities between two partitions. The Rand index R(P1, P2) is the ratio of pairs of vertices correlated by the partitions P1 and P2 (two vertices are correlated by the partitio... |

432 |
Algebraic connectivity of graphs
- Fiedler
- 1973
(Show Context)
Citation Context ... very recent works, but this topic is related to the classical problem of graph partitioning that consists in splitting a graph into a given number of groups while minimizing the cost of the edge cut =-=[17, 35, 28]-=-. However, these algorithms are not well suited to our case because they need the number of communities and their size as parameters. The recent interest in the domain has started with a new divisive ... |

407 |
Finding community structure in very large networks, Phys
- Clauset, Newman, et al.
- 2004
(Show Context)
Citation Context ...unity structure in time O(mnH) where H is the height of the corresponding dendrogram. The worst case is O(mn²). But most real-world complex networks are sparse (m = O(n)) and, as already noticed in =-=[8]-=-, H is generally small and tends to the most favourable case in which the dendrogram is balanced (H = O(log n)). In this case, the complexity is therefore O(n² log n). We finally evaluate the perform... |

387 | Reversible Markov chains and random walks on graphs, in progress. Manuscript available at www.stat.berkeley.edu/∼aldous/RWG/book.html
- Aldous, Fill
- 1999
(Show Context)
Citation Context ...d our results to weighted graphs (Aij ∈ R⁺ instead of Aij ∈ {0, 1}), which is an advantage of this approach. Let us consider a discrete random walk process (or diffusion process) on the graph G (see =-=[30, 4]-=- for a complete presentation of the topic). At each time step a walker is on a vertex and moves to a vertex chosen randomly and uniformly among its neighbors. The sequence of visited vertices is a Mar... |

384 |
Evolution of Networks: From Biological Nets to the Internet and WWW
- Dorogovtsev, Mendes
- 2000
(Show Context)
Citation Context |

382 |
Exploring complex networks
- Strogatz
- 2001
(Show Context)
Citation Context |

350 |
Fast algorithm for detecting community structure in networks
- Newman
- 2004
(Show Context)
Citation Context ...fore, we will design an algorithm which finds communities satisfying this criterion. More precisely, we will evaluate the quality of a partition into communities using a quantity (known as modularity =-=[32, 33]-=-) which captures this. We will consider throughout this paper an undirected graph G = (V, E) with n = |V| vertices and m = |E| edges. We impose that each vertex is linked to itself by a loop (we add ... |

291 |
Uncovering the overlapping community structure of complex networks in nature and society
- Palla, Derényi, et al.
- 2005
(Show Context)
Citation Context ...cale visualization tool for large networks, and it may be relevant for the computation of overlapping communities (which often occurs in real-world cases and on which very few has been done until now =-=[34]-=-). We consider these two points as promising directions for further work. Finally, we pointed out that the method is directly usable for weighted networks. For directed ones (like the important case o... |

270 |
Hierarchical organization of modularity in metabolic networks
- Ravasz, Somera, et al.
- 2002
(Show Context)
Citation Context ...h few links to other vertices. This kind of structure brings out much information about the network. For example, in a metabolic network the communities correspond to biological functions of the cell =-=[38]-=-. In the web graph the communities correspond to topics of interest [29, 18]. This notion of community is however difficult to define formally. Many definitions have been proposed in social networks s... |

264 | On clustering: Good, bad and spectral - Kannan, Vempala, et al. - 2000 |

261 | Random walks on graphs: a survey
- Lovász
- 1993
(Show Context)
Citation Context ...d our results to weighted graphs (Aij ∈ R⁺ instead of Aij ∈ {0, 1}), which is an advantage of this approach. Let us consider a discrete random walk process (or diffusion process) on the graph G (see =-=[30, 4]-=- for a complete presentation of the topic). At each time step a walker is on a vertex and moves to a vertex chosen randomly and uniformly among its neighbors. The sequence of visited vertices is a Mar... |

226 |
Graph Clustering by Flow Simulation
- Dongen
- 2000
(Show Context)
Citation Context ...sage time of walkers. Zhou and Lipowsky [48] introduced another dissimilarity index based on the same quantity; it has been used in a hierarchical algorithm (called Netwalk). Markov Cluster Algorithm =-=[43]-=- iterates two matrix operations (one corresponding to random walks) bringing out clusters in the limit state. Unfortunately the three last approaches run in O(n³) and cannot manage networks with mor... |

182 | An information flow model for conflict and fission in small groups
- Zachary
- 1977
(Show Context)
Citation Context ...k. We only compared the value of the modularity found by the different algorithms. The results are reported in Table 2. We used the following real world networks : • The Zachary’s karate club network =-=[47]-=-, a small social network that has been widely used to test most of the community detection algorithms. • The college football network from [23]. • The protein interaction network studied in [27]. • A ... |

175 | The diameter of the World Wide Web - Albert, Jeong, et al. - 1999 |

160 | Self-Organization and Identification of Web Communities
- Flake, Lawrence, et al.
(Show Context)
Citation Context ...formation about the network. For example, in a metabolic network the communities correspond to biological functions of the cell [38]. In the web graph the communities correspond to topics of interest =-=[29, 18]-=-. This notion of community is however difficult to define formally. Many definitions have been proposed in social networks studies [45], but they are too restrictive or cannot be computed efficiently.... |

157 | Comparing community structure identification - Danon, Diaz-Guilera, et al. |

145 |
Defining and identifying communities in networks
- Radicchi, Castellano, et al.
(Show Context)
Citation Context ...s passing through an edge) are removed one by one in order to split hierarchically the graph into communities. This algorithm runs in time O(m²n). Similar algorithms were proposed by Radicchi et al =-=[36]-=- and by Fortunato et al [19]. The first one uses a local quantity (the number of loops of a given length containing an edge) to choose the edges to remove and runs in time O(m²). The second one uses... |

144 | Functional cartography of complex metabolic networks, Nature 433 - Guimera, A - 2005 |

95 |
Exploring complex networks, Nature 410
- Strogatz
- 2001
(Show Context)
Citation Context ...different domains such as sociology (acquaintance or collaboration networks), biology (metabolic networks, gene networks) or computer science (Internet topology, Web graph, P2P networks). We refer to =-=[1,2,3,4,5]-=- for reviews from different perspectives and for an extensive bibliography. The associated graphs are in general globally sparse but locally dense: there exist groups of vertices, called communities, ... |

94 |
Community detection in complex networks using extremal optimization, phys
- Duch, Arenas
- 2005
(Show Context)
Citation Context ...larities between vertices. The complexity is determined by the computation of all the eigenvectors, in O(n³) time for sparse matrices. Other interesting methods have been proposed, see for instance =-=[46, 9, 39, 5, 7, 14]-=-. Random walks themselves have already been used to infer structural properties of networks in some previous works. Gaume [21] used this notion in linguistic context. Fouss et al [20] used the Euclide... |

91 |
Barabási AL, Statistical Mechanics of Complex Networks
- Albert
- 2002
(Show Context)
Citation Context ...different domains such as sociology (acquaintance or collaboration networks), biology (metabolic networks, gene networks) or computer science (Internet topology, Web graph, P2P networks). We refer to =-=[1,2,3,4,5]-=- for reviews from different perspectives and for an extensive bibliography. The associated graphs are in general globally sparse but locally dense: there exist groups of vertices, called communities, ... |

87 | Random graphs - Gilbert - 1959 |

82 | editors. Network Analysis: Methodological Foundations - Brandes, Erlebach - 2005 |

81 |
The structure of the web
- Kleinberg, Lawrence
- 2001
(Show Context)
Citation Context ...formation about the network. For example, in a metabolic network the communities correspond to biological functions of the cell [38]. In the web graph the communities correspond to topics of interest =-=[29, 18]-=-. This notion of community is however difficult to define formally. Many definitions have been proposed in social networks studies [45], but they are too restrictive or cannot be computed efficiently.... |

76 | Clustering in Large graphs via Singular Value Decomposition
- Drineas, Frieze, et al.
- 2004
(Show Context)
Citation Context ... is a greedy algorithm that tries to solve the problem of maximizing σk for each k. This problem is known to be NP-hard: even for a given k, maximizing σk is the NP-hard “K-Median clustering problem” =-=[16, 13]-=- for K = (n − k) clusters. The existing approximation algorithms [16, 13] are exponential with the number of clusters to find and unsuitable for our purpose. So for each pair of adjacent communities {... |

73 |
Finding local community structure in networks
- Clauset
- 2005
(Show Context)
Citation Context ...larities between vertices. The complexity is determined by the computation of all the eigenvectors, in O(n³) time for sparse matrices. Other interesting methods have been proposed, see for instance =-=[46, 9, 39, 5, 7, 14]-=-. Random walks themselves have already been used to infer structural properties of networks in some previous works. Gaume [21] used this notion in linguistic context. Fouss et al [20] used the Euclide... |

71 | Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators - Nadler, Lafon, et al. |

70 |
Random Walks on Graphs: a Survey, Combinatorics, Paul Erdős is Eighty
- Lovász
- 1996
(Show Context)
Citation Context ...his paper. It is however trivial to extend our results to weighted graphs (Aij ∈ R⁺ instead of Aij ∈ {0, 1}). Let us consider a discrete random walk process (or diffusion process) on the graph G (see =-=[23]-=- for a complete presentation of the topic). At each time step a walker is on a vertex and moves to a vertex chosen randomly and uniformly among its neighbors. The sequence of visited vertices is a Mar... |

60 |
Finding communities in linear time: A physics approach
- Wu, Huberman
- 2004
(Show Context)
Citation Context ...larities between vertices. The complexity is determined by the computation of all the eigenvectors, in O(n³) time for sparse matrices. Other interesting methods have been proposed, see for instance =-=[46, 9, 39, 5, 7, 14]-=-. Random walks themselves have already been used to infer structural properties of networks in some previous works. Gaume [21] used this notion in linguistic context. Fouss et al [20] used the Euclide... |

57 |
Approximation schemes for clustering problems
- Vega, Karpinski, et al.
- 2003
(Show Context)
Citation Context ... is a greedy algorithm that tries to solve the problem of maximizing σk for each k. This problem is known to be NP-hard: even for a given k, maximizing σk is the NP-hard “K-Median clustering problem” =-=[16, 13]-=- for K = (n − k) clusters. The existing approximation algorithms [16, 13] are exponential with the number of clusters to find and unsuitable for our purpose. So for each pair of adjacent communities {... |

53 | Experiments on Graph Clustering Algorithms - Brandes, Gaertler, et al. - 2003 |

51 |
Detecting Fuzzy Community Structures in Complex Networks with a Potts
- Reichardt, Bornholdt
- 2004
(Show Context)
Citation Context |

48 |
Detecting network communities: a new systematic and efficient algorithm
- Donetti, Muñoz
(Show Context)
Citation Context ...ularity which measures the quality of a partition. This algorithm runs in O(mn) and has recently been improved to a complexity O(mH log n) (with our notations) [8]. The algorithm of Donetti and Muñoz =-=[10]-=- also uses a hierarchical clustering method: they use the eigenvectors of the Laplacian matrix of the graph to measure the similarities between vertices. The complexity is determined by the computatio... |

35 | Local method for detecting communities
- Bagrow, Bollt
- 2005
(Show Context)
Citation Context |

30 |
A method to find community structure based on information centrality
- Fortunato, Latora, et al.
- 2004
(Show Context)
Citation Context ... V ) represents a good community structure if the proportion of edges inside the Ci (internal edges) is high compared to the proportion of edges between them (see for example the definitions given in =-=[19]-=-). Therefore, we will design an algorithm which finds communities satisfying this criterion. More precisely, we will evaluate the quality of a partition into communities using a quantity (known as mod... |

29 | On clustering using random walks - Harel, Koren - 2001 |

25 | Clustering using a random-walk based distance measure - Yen, Vanvyve, et al. - 2005 |

18 | A novel way of computing dissimilarities between nodes of a graph, with application to collaborative filtering
- Fouss, Pirotte, et al.
- 2004
(Show Context)
Citation Context ...[46, 9, 39, 5, 7, 14]. Random walks themselves have already been used to infer structural properties of networks in some previous works. Gaume [21] used this notion in linguistic context. Fouss et al =-=[20]-=- used the Euclidean commute time distance based on the average first-passage time of walkers. Zhou and Lipowsky [48] introduced another dissimilarity index based on the same quantity; it has been used... |

15 |
Cluster Analysis and Data Analysis
- Jambu, Lebeaux
- 1983
(Show Context)
Citation Context ...here is to notice that these quantities can be efficiently computed thanks to the fact that our distance is a Euclidean distance, which makes it possible to obtain the two following classical results =-=[26]-=-: Theorem 5 The increase of σ after the merging of two communities C1 and C2 is directly related to the distance r_{C1C2} by: $\Delta\sigma(C_1, C_2) = \frac{1}{n} \, \frac{|C_1||C_2|}{|C_1| + |C_2|} \, r^2_{C_1 C_2}$ Proof: First notice that ∑ i∈C... |

13 |
Network Brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities
- Zhou, Lipowsky
- 2004
(Show Context)
Citation Context ...some previous works. Gaume [21] used this notion in linguistic context. Fouss et al [20] used the Euclidean commute time distance based on the average first-passage time of walkers. Zhou and Lipowsky =-=[48]-=- introduced another dissimilarity index based on the same quantity; it has been used in a hierarchical algorithm (called Netwalk). Markov Cluster Algorithm [43] iterates two matrix operations (one cor... |
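The citation contexts for Hubert & Arabie and for Rand describe the Rand index as the ratio of vertex pairs on which two partitions agree. A minimal sketch of the plain (uncorrected) index follows, assuming the standard definition in which a pair counts as an agreement when both partitions put the two vertices together or both keep them apart. The function name and the label-list representation are illustrative choices, and the Hubert-Arabie chance correction mentioned in the snippet is not implemented here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of vertex pairs on which two partitions agree.

    Each partition is a list giving the community label of every vertex.
    A pair agrees when it is joined in both partitions or split in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same vertices")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one vertex: nothing to disagree on
    agreements = sum(
        1
        for u, v in pairs
        if (labels_a[u] == labels_a[v]) == (labels_b[u] == labels_b[v])
    )
    return agreements / len(pairs)
```

Relabelling communities does not change the score, so `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while `rand_index([0, 0, 1, 1], [0, 1, 0, 1])` agrees on only 2 of the 6 pairs.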