Results 1  10
of
125
Graph evolution: Densification and shrinking diameters
 ACM TKDD
, 2007
"... How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include hea ..."
Abstract

Cited by 263 (16 self)
 Add to MetaCart
(Show Context)
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in and outdegree distributions, communities, smallworld phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n)). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability ” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. We also notice that the “forest fire” model exhibits a sharp transition between sparse graphs and graphs that are densifying. Graphs with decreasing distance between the nodes are generated around this transition point. Last, we analyze the connection between the temporal evolution of the degree distribution and densification of a graph. We find that the two are fundamentally related. We also observe that real networks exhibit this type of r
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 242 (14 self)
 Add to MetaCart
(Show Context)
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse realworld networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large realworld networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “communitylike.” This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 198 (17 self)
 Add to MetaCart
(Show Context)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large realworld networks, ranging from traditional and online social networks, to technological and information networks and
Efficient Aggregation for Graph Summarization
"... Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mo ..."
Abstract

Cited by 81 (5 self)
 Add to MetaCart
(Show Context)
Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical (studying statistics such as degree distributions, hopplots and clustering coefficients). These statistical methods are very useful, but the resolutions of the summaries are hard to control. In this paper, we introduce two databasestyle operations to summarize graphs. Like the OLAPstyle aggregation methods that allow users to drilldown or rollup to control the resolution of summarization, our methods provide an analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on userselected node attributes and relationships. The second operation, called kSNAP, further allows users to control the resolutions of summaries and provides the “drilldown ” and “rollup ” abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the kSNAP computation is NPcomplete. We propose two heuristic methods to approximate the kSNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
Scalable modeling of real graphs using Kronecker multiplication
 IN 24TH ICML
, 2007
"... Given a large, real graph, how can we generate a synthetic graph that matches its properties, i.e., it has similar degree distribution, similar (small) diameter, similar spectrum, etc? We propose to use “Kronecker graphs”, which naturally obey all of the above properties, and we present KronFit, a f ..."
Abstract

Cited by 73 (9 self)
 Add to MetaCart
Given a large, real graph, how can we generate a synthetic graph that matches its properties, i.e., it has similar degree distribution, similar (small) diameter, similar spectrum, etc? We propose to use “Kronecker graphs”, which naturally obey all of the above properties, and we present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to real networks. A naive approach to fitting would take superexponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker product and by using sampling. Experiments on large real and synthetic graphs show that KronFit indeed mimics very well the patterns found in the target graphs. Once fitted, the model parameters and the resulting synthetic graphs can be used for anonymization, extrapolations, and graph summarization.
TopicLink LDA: Joint Models of Topic and Author Community
, 2009
"... Given a largescale linked document collection, such as a collection of blog posts or a research literature archive, there are two fundamental problems that have generated a lot of interest in the research community. One is to identify a set of highlevel topics covered by the documents in the colle ..."
Abstract

Cited by 53 (1 self)
 Add to MetaCart
(Show Context)
Given a largescale linked document collection, such as a collection of blog posts or a research literature archive, there are two fundamental problems that have generated a lot of interest in the research community. One is to identify a set of highlevel topics covered by the documents in the collection; the other is to uncover and analyze the social network of the authors of the documents. So far these problems have been viewed as separate problems and considered independently from each other. In this paper we argue that these two problems are in fact interdependent and should be addressed together. We develop a Bayesian hierarchical approach that performs topic modeling and author community discovery in one unified framework. The effectiveness of our model is demonstrated on two blog data sets in different domains and one research paper citation data from CiteSeer.
Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery
"... Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of scalability and fragmentation of output. In this article we present a multilevel algorithm for graph clustering using ..."
Abstract

Cited by 51 (2 self)
 Add to MetaCart
Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of scalability and fragmentation of output. In this article we present a multilevel algorithm for graph clustering using flows that delivers significant improvements in both quality and speed. The graph is first successively coarsened to a manageable size, and a small number of iterations of flow simulation is performed on the coarse graph. The graph is then successively refined, with flows from the previous graph used as initializations for brief flow simulations on each of the intermediate graphs. When we reach the final refined graph, the algorithm is run to convergence and the highflow regions are clustered together, with regions without any flow forming the natural boundaries of the clusters. Extensive experimental results on several real and synthetic datasets demonstrate the effectiveness of our approach when compared to stateoftheart algorithms.
Weighted Graphs and Disconnected Components Patterns and a Generator
"... The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a soci ..."
Abstract

Cited by 46 (20 self)
 Add to MetaCart
(Show Context)
The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a social network. The motivating questions were the following: How do connected components in a graph form and change over time? What happens after new nodes join a network – how common are repeated edges? We study numerous diverse, real graphs (citation networks, networks in social media, internet traffic, and others); and make the following contributions: (a) we observe that the nongiant connected components seem to stabilize in size, (b) we observe the weights on the edges follow several power laws with surprising exponents, and (c) we propose an intuitive, generative model for graph growth that obeys observed patterns.
Coevolution of social and affiliation networks
 In 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD
, 2009
"... In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of realworld networks related only to nodetonode link formation. We ..."
Abstract

Cited by 38 (2 self)
 Add to MetaCart
(Show Context)
In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of realworld networks related only to nodetonode link formation. We propose a novel model which captures the coevolution of social and affiliation networks. We provide surprising insights into group formation based on observations in several realworld networks, showing that users often join groups for reasons other than their friends. Our experiments show that the model is able to capture both the newly observed and previously studied network properties. This work is the first to propose a generative model which captures the statistical properties of these complex networks. The proposed model facilitates controlled experiments which study the effect of actors ’ behavior on the network evolution, and it allows the generation of realistic synthetic datasets.