Results 1 - 10
of
65
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract
-
Cited by 65 (6 self)
- Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
Characterization of complex networks: A survey of measurements
- Advances in Physics
"... Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics and function of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of mea ..."
Abstract
-
Cited by 50 (4 self)
- Add to MetaCart
Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics and function of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements organized into classes. Special attention is given to relating complex network analysis with the areas of pattern recognition and feature selection, as well as on surveying some concepts and measurements from traditional graph theory which are potentially useful for complex network research. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the
Computing communities in large networks using random walks
- J. of Graph Alg. and App. bf
, 2004
"... Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advan ..."
Abstract
-
Cited by 43 (1 self)
- Add to MetaCart
Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn 2) and space O(n 2) in the worst case, and in time O(n 2 log n) and space O(n 2) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
- CoRR
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract
-
Cited by 34 (3 self)
- Add to MetaCart
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and on-line social networks, to technological and information networks and
Distributed community detection in delay tolerant networks
- In: Proc. of Int. Wrkshp. on Mobility in the Evolving Internet Architecture, MobiArch
, 2007
"... Community is an important attribute of Pocket Switched Networks (PSN), because mobile devices are carried by people who tend to belong to communities. We analysed community structure from mobility traces and used for forwarding algorithms [12], which shows significant impact of community. Here, we p ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
Community is an important attribute of Pocket Switched Networks (PSN), because mobile devices are carried by people who tend to belong to communities. We analysed community structure from mobility traces and used for forwarding algorithms [12], which shows significant impact of community. Here, we propose and evaluate three novel distributed community detection approaches with great potential to detect both static and temporal communities. We find that with suitable configuration of the threshold values, the distributed community detection can approximate their corresponding centralised methods up to 90 % accuracy.
A socio-aware overlay for publish/subscribe communication in delay tolerant networks
- In Proc. MSWiM
, 2007
"... The emergence of Delay Tolerant Networks (DTNs) has culminated in a new generation of wireless networking. We focus on a type of human-to-human communication in DTNs, where human behaviour exhibits the characteristics of networks by forming a community. We show the characteristics of such networks f ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
The emergence of Delay Tolerant Networks (DTNs) has culminated in a new generation of wireless networking. We focus on a type of human-to-human communication in DTNs, where human behaviour exhibits the characteristics of networks by forming a community. We show the characteristics of such networks from extensive study of realworld human connectivity traces. We exploit distributed community detection from the trace and propose a Socio-Aware Overlay over detected communities for publish/subscribe communication. Centrality nodes have the best visibility to the other nodes in the network. We create an overlay with such centrality nodes from communities. Distributed community detection operates when nodes (i.e. devices) are in contact by gossipping, and subscription propagation is performed along with this operation. We validate our message dissemination algorithms for publish/subscribe with connectivity traces.
Relational learning via latent social dimensions, in 'KDD '09
- Proceedings di of the 15th ACM SIGKDD international ti conference on Knowledge
, 2009
"... Social media such as blogs, Facebook, Flickr, etc., presents data in a network format rather than classical IID distribution. To address the interdependency among data instances, relational learning has been proposed, and collective inference based on network connectivity is adopted for prediction. ..."
Abstract
-
Cited by 15 (9 self)
- Add to MetaCart
Social media such as blogs, Facebook, Flickr, etc., presents data in a network format rather than classical IID distribution. To address the interdependency among data instances, relational learning has been proposed, and collective inference based on network connectivity is adopted for prediction. However, the connections in social media are often multi-dimensional. An actor can connect to another actor due to different factors, e.g., alumni, colleagues, living in the same city or sharing similar interest, etc. Collective inference normally does not differentiate these connections. In this work, we propose to extract latent social dimensions based on network information first, and then utilize them as features for discriminative learning. These social dimensions describe different affiliations of social actors hidden in the network, and the subsequent discriminative learning can automatically determine which affiliations are better aligned with the class labels. Such a scheme is preferred when multiple diverse relations are associated with the same network. We conduct extensive experiments on social media data (one from a real-world blog site and the other from a popular content sharing site). Our model outperforms representative relational learning methods based on collective inference, especially when few labeled data are available. The sensitivity of this model and its connection to existing methods are also carefully examined.
Latent social structure in open source projects
- PROCEEDINGS OF THE 16TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING
, 2008
"... Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational “cathedrals ” are to be contrasted with the “bazaarlike” nature of Open Source Software (OSS) Projects, which ha ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational “cathedrals ” are to be contrasted with the “bazaarlike” nature of Open Source Software (OSS) Projects, which have no pre-designed organizational structure. Any structure that exists is dynamic, self-organizing, latent, and usually not explicitly stated. However, in large, complex, successful, OSS projects, we expect that sub-communities will form organically within the “bazaar ” of developer teams. Studying these sub-communities, and their behavior can shed light on how successful OSS projects self-organize. This phenomenon could even hold important lessons for how commercial software teams might be organized. Building on wellestablished techniques for detecting community structure in complex networks, we extract and evaluate latent subcommunities from the email social network of several projects: Apache HTTPD, Python, PostgresSQL, Perl, and Apache ANT. We then validate them with software development activity history. Our results show that subcommunities do indeed form within these projects. We find, in other words, that “chapels ” (if not cathedrals) spontaneously arise within the bazaar as OSS systems and the teams evolve. We also find that these subgroups manifest most strongly in technical discussions, and are significantly connected with collaboration behaviour. 1.
Communities in networks
- Notices of the American Mathematical Society
, 2009
"... Economic Forum within the framework of the ..."
Modularity-Maximizing Graph Communities via Mathematical Programming
"... In many networks, it is of great interest to identify communities, unusually densely knit groups of individuals. Such communities often shed light on the function of the networks or underlying properties of the individuals. Recently, Newman suggested modularity as a natural measure of the quality ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
In many networks, it is of great interest to identify communities, unusually densely knit groups of individuals. Such communities often shed light on the function of the networks or underlying properties of the individuals. Recently, Newman suggested modularity as a natural measure of the quality of a network partitioning into communities. Since then, various algorithms have been proposed for (approximately) maximizing the modularity of the partitioning determined. In this paper, we introduce the technique of rounding mathematical programs to the problem of modularity maximization, presenting two novel algorithms. More specifically, the algorithms round solutions to linear and vector programs. Importantly, the linear programing algorithm comes with an a posteriori approximation guarantee: by comparing the solution quality to the fractional solution of the linear program, a bound on the available “room for improvement ” can be obtained. The vector programming algorithm provides a similar bound for the best partition into two communities. We evaluate both algorithms using experiments on several standard test cases for network partitioning algorithms, and find that they perform comparably or better than past algorithms, while being more efficient than exhaustive techniques.

