
## Mixing local and global information for community detection in large networks

### Citations

4579 |
The Strength of Weak Ties
- Granovetter
- 1973
Citation Context ...ial networks such as Facebook [15, 16]; (ii) the assessment of sociological conjectures that involve finding clusters according to the importance of edges, for example the strength of the weak ties theory =-=[24]-=-, and (iii) enhancing the performance of different state-of-the-art clustering algorithms (such as COPRA [25] or OSLOM [30]) by pre-processing networks by means of the random walk based measure of cen... |

3775 | Normalized cuts and image segmentation
- Shi, Malik
- 2000
Citation Context ...omputational cost, which restricts their applicability to toy models or small instances (e.g., samples) of real-world networks. Recently, attention is increasingly paid to spectral clustering methods =-=[28, 40, 44, 48]-=-, which strive to optimize suitable cost functions. Empirical analysis provides evidence that spectral clustering methods are able to achieve excellent performance in some domains, e.g., image segment... |

3606 |
Social Network Analysis: Methods and Applications
- Wasserman, Faust
- 1994
Citation Context ...e κ-path centrality values have been computed, CONCLUDE proceeds to compute the distance between each pair of connected vertices. Such a definition is based on the principle of structural equivalence =-=[47]-=-: two vertices i and j are considered close if their neighbors are close too. In particular, a vertex k which is a neighbor of both i and j is assumed to be close to both i and j if the probability th... |

2206 | Probability inequalities for sums of bounded random variables.
- Hoeffding
- 1963
Citation Context ...hat $1 \le i \le n$, the random variable $X_i$ ranges in the real interval $[a_i, b_i]$. Let $X = (X_1 + \cdots + X_n)/n$. For any $t \ge 0$ we have $\Pr(|X - E[X]| \ge t) \le 2\exp\left(-\frac{2t^2 n^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)$ (3). Proof. See =-=[27]-=-. As a special case, if all random variables $X_i$ can only take the value 0 or 1, then Equation (3) simplifies to $\Pr(|X - E[X]| \ge t) \le 2\exp(-2t^2 n)$ (4). We are now able to prove our claims. Theorem 3... |
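
The step from Equation (3) to Equation (4) in the context above can be written out explicitly. The following is only a sketch of that substitution, under the assumption that each variable takes values in {0, 1}, so that its interval endpoints are 0 and 1:

```latex
% Assumption: each X_i takes values in {0, 1}, so a_i = 0 and b_i = 1.
\sum_{i=1}^{n} (b_i - a_i)^2 = n
\quad\Longrightarrow\quad
2\exp\!\left(-\frac{2t^2 n^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)
= 2\exp\!\left(-\frac{2t^2 n^2}{n}\right)
= 2\exp\!\left(-2t^2 n\right)
```

The bound therefore decays exponentially in the number n of variables for any fixed deviation t.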

1702 | On spectral clustering: Analysis and an algorithm
- Ng, Jordan, et al.
- 2001
Citation Context ...omputational cost, which restricts their applicability to toy models or small instances (e.g., samples) of real-world networks. Recently, attention is increasingly paid to spectral clustering methods =-=[28, 40, 44, 48]-=-, which strive to optimize suitable cost functions. Empirical analysis provides evidence that spectral clustering methods are able to achieve excellent performance in some domains, e.g., image segment... |

1508 |
Community structure in social and biological networks
- Girvan, Newman
Citation Context ...should take into account the whole network topology. Approaches based on the knowledge of the whole network structure are defined as global approaches. Among them, we cite the Girvan-Newman algorithm =-=[21]-=- (which is to the best of our knowledge the first algorithm that attempts to maximize Q), the method based on information centrality [19] and several others (please see [17] for a survey). The worst-ca... |

1474 |
Finding and evaluating community structure in networks
- Newman, Girvan
Citation Context ...n a graph for any candidate partition. In recent years, several algorithms that try to maximize Q have been designed, see, e.g., [4, 8, 14, 37]. Empirical studies carried out by Newman and Girvan =-=[37, 39]-=- on a wide variety of artificial and real networks highlighted a correlation between high values of modularity and the actual community structure of a network. As a further empirical result, Newman an... |

813 | Community detection in graphs
- Fortunato
- 2010
Citation Context ...groups of vertices called communities or clusters such that there is a large number of edges connecting vertices inside the same community and few edges linking vertices located in different clusters =-=[17]-=-. For a given network, represented by a graph G = 〈V, E〉 where V is the set ∗Corresponding author Email addresses: pdemeo@unime.it (Pasquale De Meo ), ferrarae@indiana.edu (Emilio Ferrara ), gfiumara@... |

755 |
Matrix Analysis and Applied Linear Algebra
- Meyer
- 2000
Citation Context ...rtex i to the vertex j and it is equal to 1 otherwise. By construction, the matrix $E_G$ is symmetric and, therefore, there is an orthogonal family of eigenvectors $\tilde{e}_1, \ldots, \tilde{e}_n$ associated with $E_G$ =-=[34]-=-. This means that $\tilde{e}_i^t \cdot \tilde{e}_j = \delta_{ij}$, where $\delta_{ij}$ is the above-mentioned Kronecker function. We can write $E_G$ as $E_G = \sum_{i=1}^{n} \lambda_i \tilde{e}_i \tilde{e}_i^t$, where $\lambda_i$ is the i-th eigenvalue. The probability of going from the ve... |

687 |
Finding community structure in very large networks
- Clauset, Newman, et al.
Citation Context ...wn facts about real social networks: the size of communities greatly varies, ranging from few communities gathering a large number of individuals, to many communities containing only few participants =-=[8, 9, 41]-=-. A breakthrough in community detection has been the introduction of a cost function called network modularity (or, in short, modularity, usually denoted as Q); it is based on the edge density in a gr... |

601 |
Uncovering the overlapping community structure of complex networks in nature and society
- Palla, Derenyi, et al.
- 2005
Citation Context ...wn facts about real social networks: the size of communities greatly varies, ranging from few communities gathering a large number of individuals, to many communities containing only few participants =-=[8, 9, 41]-=-. A breakthrough in community detection has been the introduction of a cost function called network modularity (or, in short, modularity, usually denoted as Q); it is based on the edge density in a gr... |

586 | Fast unfolding of communities in large networks
- Blondel, Guillaume, et al.
Citation Context ...odularity, usually denoted as Q); it is based on the edge density in a graph for any candidate partition. In recent years, several algorithms that try to maximize Q have been designed, see, e.g., =-=[4, 8, 14, 37]-=-. Empirical studies carried out by Newman and Girvan [37, 39] on a wide variety of artificial and real networks highlighted a correlation between high values of modularity and the actual community str... |

564 |
Fast algorithm for detecting community structure in networks
- Newman
Citation Context ...odularity, usually denoted as Q); it is based on the edge density in a graph for any candidate partition. In recent years, several algorithms that try to maximize Q have been designed, see, e.g., =-=[4, 8, 14, 37]-=-. Empirical studies carried out by Newman and Girvan [37, 39] on a wide variety of artificial and real networks highlighted a correlation between high values of modularity and the actual community str... |

543 | A faster algorithm for betweenness centrality
- Brandes
- 2002
Citation Context ... large graphs. The most time-expensive part of the Girvan-Newman strategy is the calculation of the betweenness centrality. Efficient algorithms have been designed to approximate the edge betweenness =-=[5]-=-, or to efficiently compute shortest paths, for example in the context of weighted graphs [43]. For real-world graphs, however, the computational cost still remains prohibitive. Several variants of t... |

436 | Searching in metric spaces
- Chavez, Navarro, et al.
Citation Context ...r of items composing the database, then it is possible to design efficient distance-based data structures which make the retrieval or the indexing of those items easy and fast even on huge collections =-=[7]-=-. Our approach to compute distances relies on the computation of κ-path edge centralities and this provides two main advantages. First, the computation of distances is fully automatic and does not requir... |

380 | Segmentation using eigenvectors: A unifying view
- Weiss
- 1999
Citation Context ...omputational cost, which restricts their applicability to toy models or small instances (e.g., samples) of real-world networks. Recently, attention is increasingly paid to spectral clustering methods =-=[28, 40, 44, 48]-=-, which strive to optimize suitable cost functions. Empirical analysis provides evidence that spectral clustering methods are able to achieve excellent performance in some domains, e.g., image segment... |

309 | Resolution limit in community detection - Fortunato, Barthélemy - 2007 |

285 | Comparing community structure identification
- Danon, Díaz-Guilera, et al.
- 2005
Citation Context ...wn facts about real social networks: the size of communities greatly varies, ranging from few communities gathering a large number of individuals, to many communities containing only few participants =-=[8, 9, 41]-=-. A breakthrough in community detection has been the introduction of a cost function called network modularity (or, in short, modularity, usually denoted as Q); it is based on the edge density in a gr... |

276 | A measure of betweenness centrality based on random walks
- Newman
Citation Context ...this limitation, we decided to use multiple random walks to simulate the propagation of a message. Such an idea has been already successfully exploited to compute the centrality of vertices in graphs =-=[1, 38]-=-. The usage of random walks to simulate simple κ-paths allowed us to design a heuristic algorithm to efficiently approximate edge centrality. Our algorithm is called ERW-Kpath - Edge Random Walk κ-pat... |

265 | Functional cartography of complex metabolic networks. Nature
- Guimera, Amaral
- 2005
Citation Context ...e and functions of a module would impact the overall system. As an example, in the biological domain community detection algorithms have been deployed to clarify the functioning of metabolic networks =-=[26]-=- or to understand how some proteins interact in small groups (or subsystems) called ’modules,’ or forming so-called complexes [46]. In Computer Science and Sociology, community detection algorithms ha... |

211 | Sampling from large graphs
- Leskovec, Faloutsos
- 2006
Citation Context ...munity structure artificially-generated according to the LFR benchmark [29]. In particular, Datasets 1–5 represent the undirected networks of coauthors of articles that appeared in arXiv, as of April 2003 =-=[31]-=-, in the fields of, respectively, General Relativity and Quantum Cosmology – CA-GrQc, High Energy Physics (Theory) – CA-HepTh, High Energy Physics (Phenomenology) – CA-HepPh, Astro Physics – CA-AstroPh... |

201 | On the evolution of user interaction in facebook
- Viswanath, Mislove, et al.
- 2009
Citation Context ...henomenology) – CA-HepPh, Astro Physics – CA-AstroPh, and Condensed Matter Physics – CA-CondMat. Dataset 6 describes a small sample of the Facebook network, representing its directed friendship graph =-=[45]-=-. Finally, dataset 7 represents a large sample of the Facebook network collected by Gjoka et al. [22]. This experiment has been designed to quantitatively evaluate the performance of our strategy in re... |

183 |
Near linear time algorithm to detect community structures in large-scale networks
- Raghavan, Albert, et al.
- 2007
Citation Context ...ttained by using three different techniques: (i) the already presented Louvain method (LM), (ii) COPRA [25], which is a fast clustering detection algorithm based on the principle of label propagation =-=[42]-=-, and, finally (iii) OSLOM [30], a local optimization algorithm able to find statistically significant clusters. Prior to presenting the results of our tests, we briefly describe the main features ... |

152 |
Community detection in complex networks using extremal optimization
- Duch, Arenas
- 2005
Citation Context ...odularity, usually denoted as Q); it is based on the edge density in a graph for any candidate partition. In recent years, several algorithms that try to maximize Q have been designed, see, e.g., =-=[4, 8, 14, 37]-=-. Empirical studies carried out by Newman and Girvan [37, 39] on a wide variety of artificial and real networks highlighted a correlation between high values of modularity and the actual community str... |

148 |
Benchmark graphs for testing community detection algorithms
- Lancichinetti, Fortunato, et al.
- 2008
Citation Context ...LUDE clustering against the results of well-known algorithms such as the Louvain Method alone, COPRA [25] and OSLOM [30]. As for synthetic (artificially-generated) networks, we used the LFR benchmark =-=[29]-=- to generate 72 networks whose community structure was known in advance. We compared communities found by CONCLUDE with the actual ones by using the so-called Normalized Mutual Information measure fro... |

113 | Walking in Facebook: A Case Study of Unbiased Sampling of OSNs
- Gjoka, Kurant, et al.
- 2010
Citation Context ...aset 6 describes a small sample of the Facebook network, representing its directed friendship graph [45]. Finally, dataset 7 represents a large sample of the Facebook network collected by Gjoka et al. =-=[22]-=-. This experiment has been designed to quantitatively evaluate the performance of our strategy in real-world applications. To configure the ERW-Kpath, the values of ρ and β have been tuned as previous... |

94 | A decentralized algorithm for spectral analysis
- Kempe, McSherry
- 2004

77 |
Detecting network communities: a new systematic and efficient algorithm
- Donetti, Munoz
- 2004
Citation Context ...used on the problem of computing distances between vertices of a graph and, subsequently, to use such a distance with the purpose of clustering the graph. A nice approach is due to Donetti and Muñoz =-=[13]-=-. In that paper, the authors suggested to consider the Laplacian $L_G$ associated with a graph G, which is defined as $L_G = D_G - A_G$. Here $D_G$ is a diagonal matrix such that $D_G[i, i]$ is equal to the degre... |
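
The Laplacian named in the context above can be illustrated with a small sketch. It assumes that the second matrix in the definition is the 0/1 adjacency matrix of an undirected graph and that the diagonal matrix holds vertex degrees; the function name `laplacian` and the toy path graph are hypothetical and chosen only for illustration.

```python
def laplacian(adjacency):
    """Return L = D - A for an undirected graph.

    Assumption: `adjacency` is a symmetric 0/1 matrix (list of lists),
    and D is the diagonal matrix of vertex degrees.
    """
    n = len(adjacency)
    # Degree of vertex i is the sum of row i of the adjacency matrix.
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy example: the path graph 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
L = laplacian(A)
```

Each row of the resulting matrix sums to zero, since the degree on the diagonal cancels the entries contributed by the neighbors.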

70 |
Performance of modularity maximization in practical contexts.
- Good, Montjoye, et al.
- 2010
Citation Context ...larity depends on the fact that there exists an exponential number of partitions of a graph whose modularity values are close to each other and these values are also close to the global maximum of Q =-=[23]-=-. On the one hand, such a result explains why methods which are in principle very different from each other generate graph clusterings whose modularity values are quite close. On the other hand, different ... |

69 | An algorithm to find overlapping community structure
- Gregory
Citation Context ...ebook, consists of 63,731 vertices and 1,545,684 edges. We compared the modularity achieved by CONCLUDE clustering against the results of well-known algorithms such as the Louvain Method alone, COPRA =-=[25]-=- and OSLOM [30]. As for synthetic (artificially-generated) networks, we used the LFR benchmark [29] to generate 72 networks whose community structure was known in advance. We compared communities foun... |

58 | Finding statistically significant communities in networks
- Lancichinetti, Radicchi, et al.
- 2011
Citation Context ... of 63,731 vertices and 1,545,684 edges. We compared the modularity achieved by CONCLUDE clustering against the results of well-known algorithms such as the Louvain Method alone, COPRA [25] and OSLOM =-=[30]-=-. As for synthetic (artificially-generated) networks, we used the LFR benchmark [29] to generate 72 networks whose community structure was known in advance. We compared communities found by CONCLUDE w... |

53 | Analysis of the structure of complex networks at different resolution levels
- Arenas, Fernandez, et al.
- 2008
Citation Context ...ploring the graph at various levels and, depending on the specific value of λ, it can favor the discovery of small communities or of large communities respectively. An analogous study is presented in =-=[2]-=-. Berry et al. [3] studied how to alleviate the resolution limit problem in the context of weighted graphs. In that paper, the authors suggested to assign a weight equal to 1 to each inter-cluster edg... |

48 |
Method to find community structures based on information centrality
- Fortunato, Latora, et al.
- 2004
Citation Context ...bal approaches. Among them, we cite the Girvan-Newman algorithm [21] (which is to the best of our knowledge the first algorithm that attempts to maximize Q), the method based on information centrality =-=[19]-=- and several others (please see [17] for a survey). The worst-case time complexity of global approaches is, unfortunately, very high. Thus, these strategies cannot be successfully applied on very larg... |

37 | On finding graph clusterings with maximum modularity
- Brandes, Delling, et al.
Citation Context ...n: two vertices i and j provide a non zero contribution to the value of Q if and only if they belong to the same community. The problem of maximizing Q has been proved to be NP-hard by Brandes et al. =-=[6]-=-. The first, non-trivial, approximability results beyond the NP-Hardness were proposed by DasGupta and Desai [10]. They studied dense graphs separately from sparse ones and their main result proves ... |

37 | Horizons of observability and limits of informal control
- Friedkin
- 1983
Citation Context ... coverage (i.e., she wants that the message is delivered to as many users as possible.) Therefore, she must avoid that the message is sent twice to the same user. 2. Bounded Length Paths. As shown in =-=[20]-=-, distant vertices in social networks (i.e., those vertices that are connected by long paths only) are unlikely to influence each other. We agree with this observation and figure that two vertices are... |

36 |
Genome evolution reveals biochemical networks and functional modules
- Mering, Zdobnov, et al.
- 2003
Citation Context ...s have been deployed to clarify the functioning of metabolic networks [26] or to understand how some proteins interact in small groups (or subsystems) called ’modules,’ or forming so-called complexes =-=[46]-=-. In Computer Science and Sociology, community detection algorithms have been exploited to understand the social structures arising from the interactions of single individuals; this has relevant pract... |

35 |
Local Modularity Measure for Network Clusterizations.
- Muff, Rao, et al.
- 2005
Citation Context ...efore require to assume that each vertex has a partial horizon, and, consequently, it is allowed to interact with just a portion of the graph. This would lead to consider local measures of modularity =-=[35]-=-. Li et al. [32] introduced a function, called modularity density which is not based on a null model. The usage of modularity density is proven to yield better results than those achieved by the modul... |

30 | Hierarchical modularity in human brain functional networks
- Meunier, Lambiotte, et al.
- 2009
Citation Context ... different domains of application: for example, the application of CONCLUDE could be promising in the context of Neuroinformatics, applied to the connectome (i.e., the human brain functional network) =-=[33]-=- or Bioinformatics, to detect protein complexes in protein-interaction networks [36]. Further extensions of CONCLUDE will be designed to face additional scientific challenges, such as the possibility o... |

25 |
Tolerating the community detection resolution limit with edge weighting
- Berry, Hendrickson, et al.
- 2011
Citation Context ...topology of the network. Several authors have proposed ad-hoc solutions to alleviate the resolution limit problem such as providing novel definitions of modularity [32] or adding weights to the edges =-=[3]-=-. In this article we both define a methodology for clustering graphs which couples the accuracy of global approaches with the computational performances guaranteed by local methods and, at the same ... |

23 |
Quantitative function for community detection
- Li, Zhang, et al.
- 2008
Citation Context ...ances, that typically depend on the topology of the network. Several authors have proposed ad-hoc solutions to alleviate the resolution limit problem such as providing novel definitions of modularity =-=[32]-=- or adding weights to the edges [3]. In this article we both define a methodology for clustering graphs which couples the accuracy of global approaches with the computational performances guaranteed... |

22 | Detecting overlapping protein complexes in protein-protein interaction networks
- Nepusz, Yu, et al.
- 2012
Citation Context ...e promising in the context of Neuroinformatics, applied to the connectome (i.e., the human brain functional network) [33] or Bioinformatics, to detect protein complexes in protein-interaction networks =-=[36]-=-. Further extensions of CONCLUDE will be designed to face additional scientific challenges, such as the possibility of discovering overlapping clusters. So far, our algorithm is able to produce a stro... |

19 |
A Large-Scale Community Structure Analysis in Facebook
- Ferrara
- 2012
Citation Context ...authors. Our ongoing research efforts focus on adopting CONCLUDE in several contexts. We are investigating: (i) the emergence of a community structure in large online social networks such as Facebook =-=[15, 16]-=-; (ii) the assessment of sociological conjectures that involve finding clusters according to the importance of edges, for example the strength of the weak ties theory [24], and (iii) enhancing the performa... |

12 |
K-path centrality: a new centrality measure in social networks
- Alahakoon, Tripathi, et al.
Citation Context ...this limitation, we decided to use multiple random walks to simulate the propagation of a message. Such an idea has been already successfully exploited to compute the centrality of vertices in graphs =-=[1, 38]-=-. The usage of random walks to simulate simple κ-paths allowed us to design a heuristic algorithm to efficiently approximate edge centrality. Our algorithm is called ERW-Kpath - Edge Random Walk κ-pat... |

10 |
On the complexity of Newman’s community finding approach for biological and social networks
- DasGupta, Desai
- 2012
Citation Context ...ommunity. The problem of maximizing Q has been proved to be NP-hard by Brandes et al. [6]. The first, non-trivial, approximability results beyond the NP-Hardness were proposed by DasGupta and Desai =-=[10]-=-. They studied dense graphs separately from sparse ones and their main result proves the (1 + ε)-inapproximability of Q in the case of dense graphs and a logarithmic approximation in the case of spar... |

10 | Community structure discovery in Facebook
- Ferrara
- 2012
Citation Context ...authors. Our ongoing research efforts focus on adopting CONCLUDE in several contexts. We are investigating: (i) the emergence of a community structure in large online social networks such as Facebook =-=[15, 16]-=-; (ii) the assessment of sociological conjectures that involve finding clusters according to the importance of edges, for example the strength of the weak ties theory [24], and (iii) enhancing the performa... |

9 | A novel measure of edge centrality in social networks, Knowledge-Based Systems 30
- Meo, Ferrara, et al.
- 2012
Citation Context ... second stage, it adopts distances between points as a guide to perform clustering. In the first phase, in order to map graph vertices onto points, CONCLUDE relies on the concept of κ-path centrality =-=[11, 12]-=-. The κ-path edge centrality of an edge is defined as the probability that the edge is selected to spread information in the network; such probability is computed by a suitable process of information ... |

5 | Enhancing community detection using a network weighting strategy
- Meo, Ferrara, et al.
Citation Context ... second stage, it adopts distances between points as a guide to perform clustering. In the first phase, in order to map graph vertices onto points, CONCLUDE relies on the concept of κ-path centrality =-=[11, 12]-=-. The κ-path edge centrality of an edge is defined as the probability that the edge is selected to spread information in the network; such probability is computed by a suitable process of information ... |

1 |
Approximate shortest paths in weighted graphs
- Raphael
Citation Context ...of the betweenness centrality. Efficient algorithms have been designed to approximate the edge betweenness [5], or to efficiently compute shortest paths, for example in the context of weighted graphs =-=[43]-=-. For real-world graphs, however, the computational cost still remains prohibitive. Several variants of this strategy have been proposed during the years, such as the Fast Clustering Algorithm provid... |