Results 1  10
of
75
Graph evolution: Densification and shrinking diameters
 ACM TKDD
, 2007
"... How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include hea ..."
Abstract

Cited by 127 (13 self)
 Add to MetaCart
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in and outdegree distributions, communities, smallworld phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n)). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability ” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. We also notice that the “forest fire” model exhibits a sharp transition between sparse graphs and graphs that are densifying. Graphs with decreasing distance between the nodes are generated around this transition point. Last, we analyze the connection between the temporal evolution of the degree distribution and densification of a graph. We find that the two are fundamentally related. We also observe that real networks exhibit this type of r
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 125 (10 self)
 Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse realworld networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large realworld networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “communitylike.” This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 82 (6 self)
 Add to MetaCart
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large realworld networks, ranging from traditional and online social networks, to technological and information networks and
Scalable modeling of real graphs using Kronecker multiplication
 IN 24TH ICML
, 2007
"... Given a large, real graph, how can we generate a synthetic graph that matches its properties, i.e., it has similar degree distribution, similar (small) diameter, similar spectrum, etc? We propose to use “Kronecker graphs”, which naturally obey all of the above properties, and we present KronFit, a f ..."
Abstract

Cited by 53 (9 self)
 Add to MetaCart
Given a large, real graph, how can we generate a synthetic graph that matches its properties, i.e., it has similar degree distribution, similar (small) diameter, similar spectrum, etc? We propose to use “Kronecker graphs”, which naturally obey all of the above properties, and we present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to real networks. A naive approach to fitting would take superexponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker product and by using sampling. Experiments on large real and synthetic graphs show that KronFit indeed mimics very well the patterns found in the target graphs. Once fitted, the model parameters and the resulting synthetic graphs can be used for anonymization, extrapolations, and graph summarization.
Efficient Aggregation for Graph Summarization
"... Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mo ..."
Abstract

Cited by 37 (3 self)
 Add to MetaCart
Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical (studying statistics such as degree distributions, hopplots and clustering coefficients). These statistical methods are very useful, but the resolutions of the summaries are hard to control. In this paper, we introduce two databasestyle operations to summarize graphs. Like the OLAPstyle aggregation methods that allow users to drilldown or rollup to control the resolution of summarization, our methods provide an analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on userselected node attributes and relationships. The second operation, called kSNAP, further allows users to control the resolutions of summaries and provides the “drilldown ” and “rollup ” abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the kSNAP computation is NPcomplete. We propose two heuristic methods to approximate the kSNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
Weighted Graphs and Disconnected Components Patterns and a Generator
"... The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a soci ..."
Abstract

Cited by 35 (18 self)
 Add to MetaCart
The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a social network. The motivating questions were the following: How do connected components in a graph form and change over time? What happens after new nodes join a network – how common are repeated edges? We study numerous diverse, real graphs (citation networks, networks in social media, internet traffic, and others); and make the following contributions: (a) we observe that the nongiant connected components seem to stabilize in size, (b) we observe the weights on the edges follow several power laws with surprising exponents, and (c) we propose an intuitive, generative model for graph growth that obeys observed patterns.
TopicLink LDA: Joint Models of Topic and Author Community
"... Given a largescale linked document collection, such as a collection of blog posts or a research literature archive, there are two fundamental problems that have generated a lot of interest in the research community. One is to identify a set of highlevel topics covered by the documents in the colle ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
Given a largescale linked document collection, such as a collection of blog posts or a research literature archive, there are two fundamental problems that have generated a lot of interest in the research community. One is to identify a set of highlevel topics covered by the documents in the collection; the other is to uncover and analyze the social network of the authors of the documents. So far these problems have been viewed as separate problems and considered independently from each other. In this paper we argue that these two problems are in fact interdependent and should be addressed together. We develop a Bayesian hierarchical approach that performs topic modeling and author community discovery in one unified framework. The effectiveness of our model is demonstrated on two blog data sets in different domains and one research paper citation data from CiteSeer. 1.
Dynamics of Large Networks
, 2008
"... A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models that explain processes which govern the network evolution, fit such models to real networks, and use them to generate realistic graphs or give formal explanations about their properties. In addition, our work has a wide range of applications: it can help us spot anomalous graphs and outliers, forecast future graph structure and run simulations of network evolution. Another important aspect of our research is the study of “local ” patterns and structures of propagation in networks. We aim to identify building blocks of the networks and find the patterns of influence that these blocks have on information or virus propagation over the network. Our recent work included the study of the spread of influence in a large persontoperson
Uncovering Groups via Heterogeneous Interaction Analysis
"... Abstract—With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment shared content (bookmark, photos, videos), and users can ..."
Abstract

Cited by 19 (5 self)
 Add to MetaCart
Abstract—With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment shared content (bookmark, photos, videos), and users can tag their own favorite content. Users can also connect to each other, and subscribe to or become a fan or a follower of others. These diverse individual activities result in a multidimensional network among actors, forming crossdimension group structures with group members sharing certain similarities. It is challenging to effectively integrate the network information of multiple dimensions in order to discover crossdimension group structures. In this work, we propose a twophase strategy to identify the hidden structures shared across dimensions in multidimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them all to find out a robust community structure among actors. Experiments on synthetic and realworld data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities in a large scale.
Coevolution of social and affiliation networks
 In 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD
, 2009
"... In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of realworld networks related only to nodetonode link formation. We ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of realworld networks related only to nodetonode link formation. We propose a novel model which captures the coevolution of social and affiliation networks. We provide surprising insights into group formation based on observations in several realworld networks, showing that users often join groups for reasons other than their friends. Our experiments show that the model is able to capture both the newly observed and previously studied network properties. This work is the first to propose a generative model which captures the statistical properties of these complex networks. The proposed model facilitates controlled experiments which study the effect of actors ’ behavior on the network evolution, and it allows the generation of realistic synthetic datasets.