Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
- CoRR
Cited by 34 (3 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and on-line social networks, to technological and information networks and
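The conductance measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition (edges leaving a node set divided by the smaller of the two edge volumes) on an undirected, unweighted graph; the function name `conductance` and the toy graph `two_triangles` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def conductance(adj: Graph, nodes: Iterable[Hashable]) -> float:
    """Conductance of a node set in an undirected, unweighted graph.

    Assumed definition: cut(S) / min(vol(S), vol(V \\ S)), where vol is
    the sum of node degrees. Lower values mean a better-separated set.
    """
    s = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / denom


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
two_triangles: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

print(conductance(two_triangles, {0, 1, 2}))  # one cut edge over volume 7
```

A network community profile plot, as described in the abstract, would record the lowest conductance found among sets of each size; finding those sets is the hard part and is not attempted here.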
An event-based framework for characterizing the evolution of interaction graphs
, 2007
Cited by 28 (1 self)
Interaction graphs are ubiquitous in many fields such as bioinformatics, sociology and physical sciences. There have been many studies in the literature targeted at studying and mining these graphs. However, almost all of them have studied these graphs from a static point of view. The study of the evolution of these graphs over time can provide tremendous insight on the behavior of entities, communities and the flow of information among them. In this work, we present an event-based characterization of critical behavioral patterns for temporally varying interaction graphs. We use non-overlapping snapshots of interaction graphs and develop a framework for capturing and identifying interesting events from them. We use these events to characterize complex behavioral patterns of individuals and communities over time. We show how semantic information can be incorporated to reason about community-behavior events. We also demonstrate the application of behavioral patterns for the purposes of modeling evolution, link prediction and influence maximization. Finally, we present a diffusion model for evolving networks, based on our framework.
Inferring Networks of Diffusion and Influence
Cited by 28 (4 self)
Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably near-optimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a set of 170 million blogs and news articles over a one-year period to infer how information flows through the online media space. We find that the diffusion network of news tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
Weighted Graphs and Disconnected Components Patterns and a Generator
Cited by 22 (12 self)
The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a social network. The motivating questions were the following: How do connected components in a graph form and change over time? What happens after new nodes join a network – how common are repeated edges? We study numerous diverse, real graphs (citation networks, networks in social media, internet traffic, and others); and make the following contributions: (a) we observe that the non-giant connected components seem to stabilize in size, (b) we observe the weights on the edges follow several power laws with surprising exponents, and (c) we propose an intuitive, generative model for graph growth that obeys observed patterns.
Sybil-resilient online content voting
- In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI)
, 2009
Cited by 20 (3 self)
Obtaining user opinion (using votes) is essential to ranking user-generated online content. However, any content voting system is susceptible to the Sybil attack where adversaries can out-vote real users by creating many Sybil identities. In this paper, we present SumUp, a Sybil-resilient vote aggregation system that leverages the trust network among users to defend against Sybil attacks. SumUp uses the technique of adaptive vote flow aggregation to limit the number of bogus votes cast by adversaries to no more than the number of attack edges in the trust network (with high probability). Using user feedback on votes, SumUp further restricts the voting power of adversaries who continuously misbehave to below the number of their attack edges. Using detailed evaluation of several existing social networks (YouTube, Flickr), we show SumUp’s ability to handle Sybil attacks. By applying SumUp on the voting trace of Digg, a popular news voting site, we have found strong evidence of attack on many articles marked “popular” by Digg.
Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations
Cited by 13 (10 self)
Given large, multi-million node graphs (e.g., FaceBook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers of the graphs? We show that the Radius Plot (pdf of node radii) can answer these questions. However, computing the Radius Plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the diameter of massive graphs, that runs on top of the HADOOP/MAPREDUCE system, with excellent scale-up on the number of available machines; (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multi-modal/bi-modal shape of the Radius Plot, and its palindrome motion over time.
Kronecker Graphs: An Approach to Modeling Networks
- JOURNAL OF MACHINE LEARNING RESEARCH 11 (2010) 985-1042
, 2010
Cited by 13 (1 self)
How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KRONFIT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential
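The core operation named in this abstract, repeated Kronecker products of a small initiator matrix, can be sketched as follows. This is only the deterministic construction under simple assumptions (a 0/1 adjacency matrix stored as nested lists); the stochastic variant and the KRONFIT fitting algorithm are not shown, and the names `kron` and `kronecker_power` are illustrative.

```python
from typing import List

Matrix = List[List[int]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two matrices given as nested lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def kronecker_power(initiator: Matrix, k: int) -> Matrix:
    """Adjacency matrix obtained by taking the k-th Kronecker power."""
    if k < 1:
        raise ValueError("k must be at least 1")
    result = initiator
    for _ in range(k - 1):
        result = kron(result, initiator)
    return result


# Hypothetical 2x2 initiator; each power squares nothing but multiplies
# the node count by 2 and the number of nonzero entries by 3.
initiator = [[1, 1],
             [1, 0]]
for row in kronecker_power(initiator, 2):
    print(row)
```

With an initiator of n nodes and e nonzero entries, the k-th power has n^k nodes and e^k nonzero entries, which is why the construction grows graphs quickly from a handful of parameters.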
Rumour spreading and graph conductance
- IN PROCEEDINGS OF THE 21ST ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA)
, 2010
Cited by 11 (1 self)
We show that if a connected graph with $n$ nodes has conductance $\phi$ then rumour spreading, also known as randomized broadcast, successfully broadcasts a message within $O(\log^4 n / \phi^6)$ many steps, with high probability, using the PUSH-PULL strategy. An interesting feature of our approach is that it draws a connection between rumour spreading and the spectral sparsification procedure of Spielman and Teng [23].
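The PUSH-PULL strategy itself is easy to simulate. The sketch below assumes the usual synchronous model: in every round each node calls one uniformly random neighbour, and the rumour crosses the call in whichever direction is possible. The function name, the `max_rounds` guard, and the test graph are illustrative assumptions; the simulation says nothing about the bound proved in the paper.

```python
import random
from typing import Dict, Hashable, List

AdjLists = Dict[Hashable, List[Hashable]]


def push_pull_rounds(adj: AdjLists, source: Hashable,
                     rng: random.Random, max_rounds: int = 10_000) -> int:
    """Rounds until every node of a connected graph knows the rumour."""
    informed = {source}
    rounds = 0
    while len(informed) < len(adj):
        if rounds >= max_rounds:
            raise RuntimeError("rumour did not reach every node; is the graph connected?")
        newly = set()
        for u in adj:
            v = rng.choice(adj[u])  # u calls a uniformly random neighbour
            if u in informed and v not in informed:
                newly.add(v)        # PUSH: caller tells callee
            elif v in informed and u not in informed:
                newly.add(u)        # PULL: caller learns from callee
        informed |= newly           # updates apply at the end of the round
        rounds += 1
    return rounds


# Hypothetical example: a cycle on 16 nodes, rumour starting at node 0.
cycle = {i: [(i - 1) % 16, (i + 1) % 16] for i in range(16)}
print(push_pull_rounds(cycle, 0, random.Random(0)))
```

Because updates are applied only at the end of each round, the rumour advances at most one hop per round, so on the 16-node cycle at least 8 rounds are always needed.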
Analyzing Patterns of User Content Generation in Online Social Networks
Cited by 10 (1 self)
Various online social networks (OSNs) have been developed rapidly on the Internet. Researchers have analyzed different properties of such OSNs, mainly focusing on the formation and evolution of the networks as well as the information propagation over the networks. In knowledge-sharing OSNs, such as blogs and question answering systems, issues on how users participate in the network and how users “generate/contribute” knowledge are vital to the sustained and healthy growth of the networks. However, related discussions have not been reported in the research literature. In this work, we empirically study workloads from three popular knowledge-sharing OSNs, including a blog system, a social bookmark sharing network, and a question answering social network to examine these properties. Our analysis consistently shows that (1) users’ posting behavior in these networks exhibits strong daily and weekly patterns, but the user active time in these OSNs does not follow exponential distributions; (2) the user posting behavior in these OSNs follows stretched exponential distributions instead of power-law distributions, indicating the influence of a small number of core users cannot dominate the network; (3) the distributions of user contributions on high-quality and effort-consuming contents in these OSNs have smaller stretch factors for the stretched exponential distribution. Our study provides insights into user activity patterns and lays out an analytical foundation for further understanding various properties of these OSNs.
Unveiling Core Network-Wide Communication Patterns through Application Traffic Activity Graph Decomposition
Cited by 9 (5 self)
As Internet communications and applications become more complex, operating, managing and securing networks have become increasingly challenging tasks. There are urgent demands for more sophisticated techniques for understanding and analyzing the behavioral characteristics of network traffic. In this paper, we study the network traffic behaviors using traffic activity graphs (TAGs), which capture the interactions among hosts engaging in certain types of communications and their collective behavior. TAGs derived from real network traffic are large, sparse, yet seemingly complex and richly connected, therefore difficult to visualize and comprehend. In order to analyze and characterize these TAGs, we propose a novel statistical traffic graph decomposition technique based on orthogonal nonnegative matrix tri-factorization (tNMF) to decompose and extract the core host interaction patterns and other structural properties. Using the real network traffic traces, we demonstrate that our tNMF-based graph decomposition technique produces meaningful and interpretable results. It enables us to characterize and quantify the key structural properties of large and sparse TAGs associated with various applications, and study their formation and evolution.

