The Peer Sampling Service: Experimental Evaluation of Unstructured Gossip-Based Implementations
In Middleware ’04: Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware, 2004
Cited by 143 (29 self)
In recent years, the gossip-based communication model in large-scale distributed systems has become a general paradigm with important applications which include information dissemination, aggregation, overlay topology management and synchronization. At the heart of all of these protocols lies a fundamental distributed abstraction: the peer sampling service. In short, the aim of this service is to provide every node with peers to exchange information with. Analytical studies reveal a high reliability and efficiency of gossip-based protocols, under the (often implicit) assumption that the peers to send gossip messages to are selected uniformly at random from the set of all nodes. In practice, instead of requiring all nodes to know all the peer nodes so that a random sample could be drawn, a scalable and efficient way to implement the peer sampling service is by constructing and maintaining dynamic unstructured overlays through gossiping membership information itself. This paper presents a generic framework to implement reliable and efficient peer sampling services. The framework generalizes existing approaches and makes it easy to introduce new ones. We use this framework to explore and compare several implementations of our abstract scheme. Through extensive experimental analysis, we show that all of them lead to different peer sampling services, none of which is uniformly random. This clearly renders traditional theoretical approaches invalid when the underlying peer sampling service is based on a gossip-based scheme. Our observations also help explain important differences between design choices of peer sampling algorithms, and how these impact the reliability of the corresponding service.
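To make "gossiping membership information itself" concrete, here is a minimal sketch of one possible gossip-based peer sampling scheme. It is an illustration under stated assumptions, not the framework from the paper: each node keeps a partial view of `view_size` descriptors (address plus age), picks a random view entry as its gossip partner, both sides exchange their views (push-pull), and each keeps the youngest `view_size` distinct entries. The names `Descriptor`, `Node`, `gossip_round` and `bootstrap_ring`, and the specific peer and view selection policies, are hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    address: int  # hypothetical node identifier
    age: int      # rounds since this descriptor was created by its owner


class Node:
    """A peer holding a partial view of at most `view_size` descriptors."""

    def __init__(self, address, view_size, rng):
        self.address = address
        self.view_size = view_size
        self.rng = rng
        self.view = []  # list[Descriptor], never contains our own address

    def select_peer(self):
        # Peer selection (assumed policy): a uniformly random entry of the view.
        return self.rng.choice(self.view).address

    def outgoing(self):
        # What we send: our current view plus a fresh descriptor of ourselves.
        return self.view + [Descriptor(self.address, 0)]

    def merge(self, received):
        # Keep the youngest copy of each address, drop ourselves, then
        # view selection (assumed policy): retain the youngest descriptors.
        best = {}
        for d in self.view + received:
            if d.address == self.address:
                continue
            if d.address not in best or d.age < best[d.address].age:
                best[d.address] = d
        ranked = sorted(best.values(), key=lambda d: (d.age, d.address))
        self.view = ranked[: self.view_size]

    def age_view(self):
        self.view = [Descriptor(d.address, d.age + 1) for d in self.view]


def gossip_round(nodes, rng):
    """Every node, in random order, performs one push-pull view exchange."""
    order = list(nodes)
    rng.shuffle(order)
    for address in order:
        node = nodes[address]
        node.age_view()
        peer = nodes[node.select_peer()]
        to_peer, to_node = node.outgoing(), peer.outgoing()
        peer.merge(to_peer)
        node.merge(to_node)


def bootstrap_ring(n, view_size, seed=0):
    """Start from a ring: node a initially knows a+1, ..., a+view_size (mod n)."""
    rng = random.Random(seed)
    nodes = {a: Node(a, view_size, rng) for a in range(n)}
    for a in range(n):
        nodes[a].view = [Descriptor((a + k) % n, 0) for k in range(1, view_size + 1)]
    return nodes, rng


nodes, rng = bootstrap_ring(n=100, view_size=8)
for _ in range(20):
    gossip_round(nodes, rng)
```

Swapping the peer selection, view selection, or exchange direction (push only versus push-pull) in this sketch yields different overlays. The abstract's point is that such choices give different, non-uniform samples, so a sketch like this one should not be assumed to produce uniformly random peers.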
Statistical properties of community structure in large social and information networks
Cited by 120 (10 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
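The conductance of a node set S is the number of edges leaving S divided by the smaller of vol(S) and vol(V - S), where vol sums node degrees; lower values mean a more community-like set. The network community profile plots, for each size k, the best (lowest) conductance over sets of that size. The sketch below computes it by exhaustive search on a toy graph. This brute-force search is an assumption made for clarity: it is exponential and is not how such profiles are obtained for large networks. The function names are hypothetical.

```python
from itertools import combinations


def conductance(adj, s):
    """phi(S) = cut(S, V - S) / min(vol(S), vol(V - S)) for an undirected graph.

    adj maps each node to the set of its neighbours; s is a set of nodes and
    vol(X) is the sum of the degrees of the nodes in X.
    """
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def community_profile(adj, max_size):
    """Exhaustive NCP: lowest conductance among all node sets of each size k.

    Exponential in the number of nodes, so only usable on toy graphs.
    """
    nodes = sorted(adj)
    return {
        k: min(conductance(adj, set(combo)) for combo in combinations(nodes, k))
        for k in range(1, max_size + 1)
    }


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {u: set() for u in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
profile = community_profile(adj, max_size=3)
print(profile)
```

On this toy graph the profile is lowest at size 3, where one triangle is cut off by a single edge (conductance 1/7).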
The phase transition in inhomogeneous random graphs, preprint available from http://www.arxiv.org/abs/math.PR/0504589
Cited by 101 (30 self)
The ‘classical’ random graph models, in particular G(n, p), are ‘homogeneous’, in the sense that the degrees (for example) tend to be concentrated around a typical value. Many graphs arising in the real world do not have this property, having, for example, power-law degree distributions. Thus there has been a lot of recent interest in defining and studying ‘inhomogeneous’ random graph models. One of the most studied properties of these new models is their ‘robustness’, or, equivalently, the ‘phase transition’ as an edge density parameter is varied. For G(n, p), p = c/n, the phase transition at c = 1 has been a central topic in the study of random graphs for well over 40 years. Many of the new inhomogeneous models are rather complicated; although there are exceptions, in most cases precise questions such as determining exactly the critical point of the phase transition are approachable only when there is independence between the edges. Fortunately, some models studied have this already, and others can be approximated by models with ...
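The phase transition in G(n, p) with p = c/n can be observed directly by sampling: below c = 1 the largest component holds a vanishing fraction of the nodes, above it a giant component appears. The following sketch (a plain simulation, not a method from the paper; the function name is hypothetical) samples every edge independently and tracks components with union-find.

```python
import random


def largest_component_fraction(n, c, seed=0):
    """Sample G(n, p) with p = c/n; return |largest connected component| / n."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each of the n(n-1)/2 edges appears independently
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


for c in (0.5, 1.0, 2.0):
    print(c, largest_component_fraction(1000, c))
```

For c > 1 the classical theory says the giant component fraction tends to the positive solution of β = 1 - e^{-cβ} (about 0.80 for c = 2), while for c < 1 the largest component has only O(log n) nodes, so the sampled values should move sharply across c = 1.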
Characterization of complex networks: A survey of measurements
Advances in Physics
Cited by 89 (7 self)
Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics and function of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements organized into classes. Special attention is given to relating complex network analysis with the areas of pattern recognition and feature selection, as well as on surveying some concepts and measurements from traditional graph theory which are potentially useful for complex network research. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the ...
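As an illustration of what such measurements look like in practice, the sketch below computes two widely used ones: the degree distribution and the average local clustering coefficient. Which measurements the survey groups into which classes is not reproduced here; the helper names are hypothetical, and reporting clustering as 0 for nodes of degree below 2 is one common convention among several.

```python
from collections import Counter


def from_edges(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map degree k -> number of nodes with degree k."""
    return dict(Counter(len(nbrs) for nbrs in adj.values()))


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbours that are themselves linked."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 by convention
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, u) for u in adj) / len(adj)


adj = from_edges([(0, 1), (1, 2), (0, 2), (2, 3)])  # a triangle with a pendant node
print(degree_distribution(adj))
print(average_clustering(adj))
```

Here nodes 0 and 1 have clustering 1, node 2 has 1/3 (one linked pair among its three neighbours), and the pendant node 3 contributes 0.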
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
2008
Cited by 79 (6 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and ...
Small-World File-Sharing Communities
2003
Cited by 68 (9 self)
Web caches, content distribution networks, peer-to-peer file sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who generate requests for shared data. In each case, overall system performance can be improved significantly if we can first identify and then exploit interesting structure within a community's access patterns. To this end, we propose a novel perspective on file sharing based on the study of the relationships that form among users based on the files in which they are interested. We propose a new structure that captures common user interests in data, the data-sharing graph, and justify its utility with studies on three data-distribution systems: a high-energy physics collaboration, the Web, and the Kazaa peer-to-peer network. We find small-world patterns in the data-sharing graphs of all three communities. We analyze these graphs and propose some probable causes for these emergent small-world patterns. The significance of small-world patterns is twofold: it provides rigorous support for intuition and, perhaps most importantly, it suggests ways to design mechanisms that exploit these naturally emerging patterns.
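A data-sharing graph can be sketched as follows: users are nodes, and two users are linked when their requests overlap. The exact linking rule used in the paper (for example any time window or threshold) is not reproduced here; `min_shared` is a hypothetical threshold assumed for illustration. A small-world pattern is usually diagnosed by comparing clustering and average path length against a random graph of the same size, so the sketch also computes the average hop distance.

```python
from collections import deque
from itertools import combinations


def data_sharing_graph(requests, min_shared=1):
    """Link two users when they requested at least `min_shared` files in common.

    requests maps a user id to the set of files that user asked for.
    """
    adj = {user: set() for user in requests}
    for a, b in combinations(sorted(requests), 2):
        if len(requests[a] & requests[b]) >= min_shared:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_path_length(adj):
    """Mean BFS hop distance over all ordered pairs of mutually reachable users."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


requests = {
    "alice": {"f1", "f2"},
    "bob": {"f2", "f3"},
    "carol": {"f3"},
    "dave": {"f9"},
}
graph = data_sharing_graph(requests)
print(graph)
print(average_path_length(graph))
```

Raising `min_shared` keeps only stronger shared interests, which thins the graph; the choice of threshold is a design parameter, not something fixed by the text above.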
Center-piece subgraphs: Problem definition and fast solutions
In KDD, 2006
Cited by 52 (17 self)
Given Q nodes in a social network (say, an authorship network), how can we find the node/author that is the center-piece, and has direct or indirect connections to all, or most of them? For example, this node could be the common advisor, or someone who started the research area that the Q nodes belong to. Isomorphic scenarios appear in law enforcement (find the mastermind criminal, connected to all current suspects), gene regulatory networks (find the protein that participates in pathways with all or most of the given Q proteins), viral marketing and many more. Connection subgraphs is an important first step, handling the case of Q=2 query nodes. Then, the connection subgraph algorithm finds the b intermediate nodes that provide a good connection between the two original query nodes. Here we generalize the challenge in multiple dimensions: First, we allow more than two query nodes. Second, we allow a whole family of queries, ranging from ‘OR’ to ‘AND’, with ‘soft-AND’ in between. Finally, we design and compare a fast approximation, and study the quality/speed tradeoff. We also present experiments on the DBLP dataset. The experiments confirm that our proposed method naturally deals with multi-source queries and that the resulting subgraphs agree with our intuition. Wall-clock timing results on the DBLP dataset show that our proposed approximation achieves good accuracy for about 6:1 speedup. This material is based upon work supported by the ...
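One way to score "connected to all, or most" of the query nodes is sketched below, under stated assumptions: each query node gets closeness scores from a random walk with restart (a common choice for such proximity scores), and the per-query scores are combined by a product for ‘AND’ and by 1 - ∏(1 - p) for ‘OR’. The ‘soft-AND’ family, subgraph extraction and the fast approximation are not attempted. The function names and the restart probability of 0.15 are hypothetical, and the graph is assumed to have no isolated nodes.

```python
import math


def rwr_scores(adj, source, restart=0.15, iters=200):
    """Random walk with restart from `source` (power iteration).

    Returns the steady-state probability of finding the walker at each node.
    Assumes an undirected graph without isolated nodes.
    """
    r = {u: 0.0 for u in adj}
    r[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        for u, nbrs in adj.items():
            share = (1 - restart) * r[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[source] += restart  # jump back to the query node
        r = nxt
    return r


def center_piece(adj, queries, mode="AND", restart=0.15):
    """Return the non-query node with the highest combined closeness score."""
    per_query = [rwr_scores(adj, q, restart) for q in queries]
    combined = {}
    for u in adj:
        probs = [scores[u] for scores in per_query]
        if mode == "AND":  # close to every query node
            combined[u] = math.prod(probs)
        elif mode == "OR":  # close to at least one query node
            combined[u] = 1 - math.prod(1 - p for p in probs)
        else:
            raise ValueError(f"unsupported mode: {mode}")
    candidates = [u for u in adj if u not in queries]
    return max(candidates, key=combined.get)


# A star: hub 0 linked to leaves 1..4; query two of the leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(center_piece(star, [1, 2], mode="AND"))
print(center_piece(star, [1, 2], mode="OR"))
```

The product rewards nodes that are reasonably close to every query, while the noisy-OR rewards nodes very close to any single query; on the star both pick the hub.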
On the Eigenvalue Power Law
2002
Cited by 50 (0 self)
We show that the largest eigenvalues of graphs whose highest degrees are Zipf-like distributed with slope $\alpha$ are distributed according to a power law with slope $\alpha/2$. This follows as a direct and almost certain corollary of the degree power law. Our result has implications for the singular value decomposition method in information retrieval.
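A heuristic sketch (not the paper's proof) of why the exponent halves: the adjacency matrix of a star $K_{1,d}$ has eigenvalues $\pm\sqrt{d}$ and $0$, so if the largest eigenvalues are assumed to be governed by the stars around the highest-degree nodes, each such eigenvalue is roughly the square root of the corresponding degree.

```latex
\begin{aligned}
\text{star } K_{1,d}:&\quad \lambda_{\max} = \sqrt{d}, \\
\text{Zipf-like top degrees}:&\quad d_i \propto i^{-\alpha}, \quad i = 1, 2, \dots \\
\Rightarrow&\quad \lambda_i \approx \sqrt{d_i} \propto \left(i^{-\alpha}\right)^{1/2} = i^{-\alpha/2}.
\end{aligned}
```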
Machine Perception and Learning of Complex Social Systems
Ph.D. thesis, Program in Media Arts and Sciences, Massachusetts Institute of Technology, 2005
Cited by 36 (1 self)
The study of complex social systems has traditionally been an arduous process, involving extensive surveys, interviews, ethnographic studies, or analysis of online behavior. Today, however, it is possible to use the unprecedented amount of information generated by pervasive mobile phones to provide insights into the dynamics of both individual and group behavior. Information such as continuous proximity, location, communication and activity data has been gathered from the phones of 100 human subjects at MIT. Systematic measurements from these 100 people over the course of eight months have generated one of the largest datasets of continuous human behavior ever collected, representing over 300,000 hours of daily activity. In this thesis we describe how this data can be used to uncover regular rules and structure in behavior of both individuals and organizations, infer relationships between subjects, verify self-report ...
Relevance of Massively Distributed Explorations of the Internet Topology: Simulation Results
2005
Cited by 33 (10 self)
Internet maps are generally constructed using the traceroute tool from a few sources to many destinations. It has recently become apparent that this exploration process gives a partial and biased view of the real topology, which leads to the idea of increasing the number of sources to improve the quality of the maps. In this paper, we present a set of experiments we have conducted to evaluate the relevance of this approach. It appears that the statistical properties of the underlying network have a strong influence on the quality of the obtained maps, which can be improved using massively distributed explorations. Conversely, we show that the exploration process induces some properties on the maps. We validate our analysis using real-world data and experiments and we discuss its implications.
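A minimal simulation of this kind of exploration is sketched below. It assumes, as a simplification, that each traceroute reveals one shortest path from source to destination (ties broken deterministically); real traceroute follows routing policy rather than BFS. The map is the union of all traced edges, so adding sources can only reveal more of the topology. Function names are hypothetical.

```python
from collections import deque


def bfs_parents(adj, source):
    """Shortest-path tree from `source`; ties broken by visiting neighbours in sorted order."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def explore(adj, sources, destinations):
    """Edges seen by tracing one shortest path from every source to every destination."""
    seen = set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for d in destinations:
            node = d
            # Walk back up the tree; unreachable destinations are skipped.
            while node in parent and parent[node] is not None:
                seen.add(frozenset((node, parent[node])))
                node = parent[node]
    return seen


# A 4-cycle 0-1-2-3-0: one source misses an edge, two sources see all four.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(len(explore(cycle, [0], list(cycle))))
print(len(explore(cycle, [0, 2], list(cycle))))
```

Even on this tiny graph a single source yields a tree-like map that misses the edge 2-3, which is the kind of bias that motivates adding sources.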