Statistical properties of community structure in large social and information networks
Cited by 226 (14 self)
Abstract
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
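The conductance measure and the network community profile can be made concrete with a small sketch. Assumptions not stated in the abstract: the graph is an undirected, unweighted dict of adjacency lists, conductance is taken as phi(S) = cut(S, V \ S) / min(vol(S), vol(V \ S)), and the profile is found by exhaustive search over all sets of each size, which is only feasible on toy graphs (large networks require approximation algorithms instead). The function names are hypothetical.

```python
from itertools import combinations


def conductance(adj, S):
    """phi(S) = cut(S, V \\ S) / min(vol(S), vol(V \\ S)) for an undirected graph."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # edges leaving S
    vol_s = sum(len(adj[u]) for u in S)                     # sum of degrees in S
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def ncp_brute_force(adj):
    """Return {k: (lowest conductance, a set achieving it)} for k = 1 .. n // 2.

    Exhaustive search, so tiny graphs only. Sizes above n // 2 are skipped because
    a set and its complement have the same conductance.
    """
    nodes = sorted(adj)
    profile = {}
    for k in range(1, len(nodes) // 2 + 1):
        best = min(combinations(nodes, k), key=lambda s: conductance(adj, s))
        profile[k] = (conductance(adj, best), set(best))
    return profile
```

On two triangles joined by a single edge, the profile dips at k = 3, where either triangle has conductance 1/7 (one cut edge over a volume of 7).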
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
2008
Cited by 182 (17 self)
Abstract
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and ...
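One classical approximation algorithm for low-conductance cuts is the spectral sweep: order nodes by the second eigenvector of the normalized Laplacian and keep the best prefix. The sketch below is a swapped-in illustration, not necessarily one of the algorithms the authors used. It assumes a connected, undirected, unweighted graph with no self-loops or isolated nodes, and uses plain power iteration (the `iters` and `seed` parameters are hypothetical choices) so that it needs no external libraries.

```python
import math
import random


def fiedler_sweep(adj, iters=1000, seed=0):
    """Approximate a minimum-conductance cut with a spectral sweep.

    Power iteration on M = (I + D^{-1/2} A D^{-1/2}) / 2, with the trivial top
    eigenvector (proportional to sqrt(degree)) projected out every step,
    converges to the eigenvector for the second-smallest normalized-Laplacian
    eigenvalue. Nodes are then ordered by D^{-1/2} x and each prefix is scored.
    """
    nodes = sorted(adj)
    idx = {u: i for i, u in enumerate(nodes)}
    deg = [len(adj[u]) for u in nodes]
    n = len(nodes)
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        y = [0.5 * xi for xi in x]  # the (I / 2) part of M
        for u in nodes:
            i = idx[u]
            for v in adj[u]:
                j = idx[v]
                y[i] += 0.5 * x[j] / math.sqrt(deg[i] * deg[j])
        dot = sum(a * b for a, b in zip(y, top))   # remove the trivial component
        y = [a - dot * b for a, b in zip(y, top)]
        norm = math.sqrt(sum(a * a for a in y))
        x = [a / norm for a in y]

    order = sorted(range(n), key=lambda i: x[i] / math.sqrt(deg[i]))
    total_vol = sum(deg)
    in_set = [False] * n
    vol = cut = 0
    best_phi, best_k = float("inf"), 0
    for k, i in enumerate(order[:-1], start=1):  # proper, nonempty prefixes
        in_set[i] = True
        vol += deg[i]
        for v in adj[nodes[i]]:
            # an edge to a node already inside stops being cut; otherwise it becomes cut
            cut += -1 if in_set[idx[v]] else 1
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return best_phi, {nodes[i] for i in order[:best_k]}
```

For the exact eigenvector, Cheeger's inequality guarantees that the sweep cut has conductance at most sqrt(2 * lambda_2), where lambda_2 is the second-smallest eigenvalue of the normalized Laplacian.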
Defining and Evaluating Network Communities based on Ground-truth. Extended version
2012
Cited by 89 (4 self)
Abstract
Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth. In this paper we study a set of 230 large real-world social, collaboration and information networks where nodes explicitly state their group memberships. For example, in social networks nodes explicitly join various interest-based social groups. We use such groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate how different structural definitions of network communities correspond to ground-truth communities. We choose 13 commonly used structural definitions of network communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities. We also investigate the task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than a hundred million nodes. The proposed method achieves a 30% relative improvement over current local clustering methods.
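The triad-participation ratio and a ground-truth comparison can be sketched as follows. The definition used here (the fraction of nodes in S that belong to at least one triangle lying entirely inside S) and the choice of the F1 score for matching a detected set against a ground-truth group are assumptions for illustration; the abstract does not spell out either formula. The graph is assumed to be an undirected dict of adjacency lists, and the function names are hypothetical.

```python
def triad_participation_ratio(adj, S):
    """Fraction of nodes in S that lie in a triangle whose other two nodes are also in S."""
    S = set(S)

    def in_triangle(u):
        nbrs = [v for v in adj[u] if v in S and v != u]
        nbr_set = set(nbrs)
        # u, v, w form a triangle inside S if some neighbour v of u is adjacent
        # to another neighbour w of u (both restricted to S)
        return any(w in nbr_set for v in nbrs for w in adj[v] if w != v)

    return sum(1 for u in S if in_triangle(u)) / len(S)


def f1_score(detected, truth):
    """Harmonic mean of precision and recall of a detected set against a ground-truth set."""
    detected, truth = set(detected), set(truth)
    overlap = len(detected & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(detected)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)
```

On two triangles joined by one edge, a whole triangle has a ratio of 1.0, while adding the far endpoint of the bridge edge lowers it to 0.75, since that node closes no triangle inside the set.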
Twice-Ramanujan sparsifiers
In Proc. 41st STOC, 2009
Cited by 87 (12 self)
Abstract
We prove that for every $d > 1$ and every undirected, weighted graph $G = (V, E)$, there exists a weighted graph $H$ with at most $\lceil d|V| \rceil$ edges such that for every $x \in \mathbb{R}^V$,
$$1 \le \frac{x^T L_H x}{x^T L_G x} \le \frac{d + 1 + 2\sqrt{d}}{d + 1 - 2\sqrt{d}},$$
where $L_G$ and $L_H$ are the Laplacian matrices of $G$ and $H$, respectively.
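The guarantee compares Laplacian quadratic forms, where x^T L x is the sum over edges (u, v) of w_uv * (x_u - x_v)^2. The sketch below (hypothetical helper names, edges given as (u, v, weight) triples) evaluates the ratio for a single test vector and computes the upper bound, which equals ((sqrt(d) + 1) / (sqrt(d) - 1))^2. It only spot-checks the inequality for one x; certifying it for every x would require a generalized eigenvalue computation, and the sparsifier construction itself is not shown.

```python
import math


def laplacian_quadratic_form(edges, x):
    """x^T L x = sum over weighted edges (u, v, w) of w * (x[u] - x[v])**2."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)


def bss_bound(d):
    """Upper bound (d + 1 + 2*sqrt(d)) / (d + 1 - 2*sqrt(d)); the theorem needs d > 1."""
    if d <= 1:
        raise ValueError("the theorem requires d > 1")
    s = 2 * math.sqrt(d)
    return (d + 1 + s) / (d + 1 - s)


def approximation_ratio(edges_g, edges_h, x):
    """x^T L_H x / x^T L_G x for one test vector x (x must not make x^T L_G x zero)."""
    return laplacian_quadratic_form(edges_h, x) / laplacian_quadratic_form(edges_g, x)
```

For example, d = 4 gives a bound of 9 and d = 9 gives 4; the bound tends to 1 as d grows, so denser sparsifiers approximate G more closely.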
Lower-Stretch Spanning Trees
2005
Cited by 81 (11 self)
Abstract
... as a subgraph a spanning tree into which the edges of $G$ can be embedded with average stretch $\exp(O(\sqrt{\log n \log\log n}))$, and that there exists an $n$-vertex graph $G$ such that all its spanning trees have average stretch $\Omega(\log n)$. Closing the exponential gap between these upper and lower bounds is listed as one of the long-standing open questions in the area of low-distortion embeddings of metrics (Matousek 2002). We significantly reduce this gap by constructing a spanning tree in $G$ of average stretch $O((\log n \log\log n)^2)$. Moreover, we show that this tree can be constructed in time $O(m \log^2 n)$ in general, and in time $O(m \log n)$ if the input graph is unweighted. The main ingredient in our construction is a novel graph decomposition technique. Our new algorithm can be immediately used to improve the running time of the recent solver for diagonally dominant linear systems of Spielman and Teng from $m 2^{O(\sqrt{\log n \log\log n})} \log(1/\epsilon)$ to $m \log^{O(1)} n \log(1/\epsilon)$, and to $O(n(\log n \log\log n)^2 \log(1/\epsilon))$ when the system is planar. Applying a recent reduction of Boman, Hendrickson and Vavasis, this provides an $O(n(\log n \log\log n)^2 \log(1/\epsilon))$ time algorithm for solving the linear systems that arise when applying the finite element method to solve two-dimensional elliptic partial differential equations. Our result can also be used to improve several earlier approximation algorithms that use low-stretch spanning trees.
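Average stretch can be computed directly from its definition: for each edge (u, v) of G with length w, the stretch is the distance between u and v in the tree divided by w, and the average is taken over all edges of G. The sketch below assumes edge weights are lengths and that edges are (u, v, w) triples; it uses a naive tree search per edge (O(n) each), so it only measures a given tree and does not construct the paper's low-stretch tree. The function name is hypothetical.

```python
from collections import defaultdict


def average_stretch(graph_edges, tree_edges):
    """Mean over graph edges (u, v, w) of dist_T(u, v) / w, with dist_T the weighted tree distance."""
    tree = defaultdict(list)
    for u, v, w in tree_edges:
        tree[u].append((v, w))
        tree[v].append((u, w))

    def tree_dist(src, dst):
        # iterative DFS; the path between two nodes of a tree is unique
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, dist = stack.pop()
            if node == dst:
                return dist
            for nxt, w in tree[node]:
                if nxt != parent:
                    stack.append((nxt, node, dist + w))
        raise ValueError("tree does not connect both endpoints")

    stretches = [tree_dist(u, v) / w for u, v, w in graph_edges]
    return sum(stretches) / len(stretches)
```

For a 4-cycle with unit lengths and the path 0-1-2-3 as its spanning tree, three edges have stretch 1 and the closing edge has stretch 3, so the average is 1.5.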
A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning
Preprint (2008). Available at http://arxiv.org/abs/0809.3232
Cited by 55 (8 self)
Abstract
We study the design of local algorithms for massive graphs. A local graph algorithm is one that finds a solution containing or near a given vertex without looking at the whole graph. We present a local clustering algorithm. Our algorithm finds a good cluster (a subset of vertices whose internal connections are significantly richer than its external connections) near a given vertex. The running time of our algorithm, when it finds a nonempty local cluster, is nearly linear in the size of the cluster it outputs. The running time of our algorithm also depends polylogarithmically on the size of the graph and polynomially on the conductance of the cluster it produces. Our clustering algorithm could be a useful primitive for handling massive graphs, such as social networks and web graphs. As an application of this clustering algorithm, we present a partitioning algorithm that finds an approximate sparsest cut with nearly optimal balance. Our algorithm takes time nearly linear in the number of edges of the graph. Using the partitioning algorithm of this paper, we have designed a nearly linear time algorithm for constructing spectral sparsifiers of graphs, which we in turn use in a nearly linear time algorithm for solving linear systems in symmetric, diagonally dominant matrices. The linear system solver also leads to a nearly linear time algorithm for approximating the second-smallest eigenvalue and corresponding eigenvector of the Laplacian matrix of a graph. These other results are presented in two companion papers.
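Local clustering of this kind can be sketched with the push procedure of Andersen, Chung and Lang: approximate a personalized PageRank vector around the seed, then sweep over nodes by PageRank mass per unit degree. This is a related, swapped-in technique, not the truncated-random-walk algorithm of this paper. The teleport probability `alpha` and tolerance `eps` are illustrative parameter choices, the function names are hypothetical, and the graph is assumed undirected, unweighted and free of self-loops. Only nodes that receive residual mass are ever touched, which is what keeps the work local.

```python
from collections import deque


def acl_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` (lazy walk, teleport probability alpha)."""
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * du:
            continue  # stale entry: residual already below the threshold
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2          # lazy half stays at u
        share = (1 - alpha) * ru / (2 * du)  # other half spread over neighbours
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)              # v just crossed the push threshold
        if r[u] >= eps * du:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Order nodes by p[u] / deg(u) (descending) and return the lowest-conductance prefix."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order, start=1):
        in_set.add(u)
        vol += len(adj[u])
        cut += sum(-1 if v in in_set else 1 for v in adj[u])
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return best_phi, set(order[:best_k])
```

Seeded at node 0 of two triangles joined by one edge, the sweep returns the seed's triangle, whose conductance is 1/7.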
Locally adapted hierarchical basis preconditioning
ACM Transactions on Graphics, 2006
Cited by 45 (6 self)
Abstract
This paper develops locally adapted hierarchical basis functions for effectively preconditioning large optimization problems that arise in computer vision, computer graphics, and computational photography applications such as surface interpolation, optic flow, tone mapping, gradient-domain blending, and colorization. By looking at the local structure of the coefficient matrix and performing a recursive set of variable eliminations, combined with a simplification of the resulting coarse-level problems, we obtain bases better suited for problems with inhomogeneous (spatially varying) data, smoothness, and boundary constraints. Our approach removes the need to heuristically adjust the optimal number of preconditioning levels, significantly outperforms previous approaches, and also maps cleanly onto data-parallel architectures such as modern GPUs. [Errata in (8) and (9) fixed October 2007.]
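The idea of a recursive set of variable eliminations can be illustrated in one dimension. This is a minimal sketch of that idea only, not the paper's locally adapted hierarchical basis: in 1D, eliminating every other unknown of a tridiagonal system leaves another tridiagonal system exactly (classical cyclic reduction), so no simplification step is needed and the recursion becomes a direct solver. In 2D, eliminating variables introduces new couplings, which the abstract's "simplification of the resulting coarse-level problems" suggests is handled approximately there. The function name and the three-diagonal input layout are assumptions.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursively eliminating every other unknown.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = c[-1] = 0.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    evens = list(range(0, n, 2))
    na, nb, nc, nd = [], [], [], []
    for j in evens:
        aj, bj, cj, dj = 0.0, b[j], 0.0, d[j]
        if j >= 1:  # eliminate the odd neighbour j-1 using its own row
            f = a[j] / b[j - 1]
            aj = -f * a[j - 1]          # new coupling to x[j-2]
            bj -= f * c[j - 1]
            dj -= f * d[j - 1]
        if j + 1 < n:  # eliminate the odd neighbour j+1
            g = c[j] / b[j + 1]
            cj = -g * c[j + 1]          # new coupling to x[j+2] (0 if j+1 is the last row)
            bj -= g * a[j + 1]
            dj -= g * d[j + 1]
        na.append(aj)
        nb.append(bj)
        nc.append(cj)
        nd.append(dj)
    # the even unknowns form a coarser tridiagonal system of the same shape
    x_even = cyclic_reduction(na, nb, nc, nd)
    x = [0.0] * n
    for k, j in enumerate(evens):
        x[j] = x_even[k]
    # back-substitute the eliminated odd unknowns
    for i in range(1, n, 2):
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - right) / b[i]
    return x
```

Each level halves the number of unknowns, so the recursion depth is about log2(n).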
Approaching optimality for solving SDD linear systems
2010
Cited by 43 (7 self)
Abstract
We present an algorithm that, on input a graph $G$ with $n$ vertices and $m + n - 1$ edges and a value $k$, produces an incremental sparsifier $\hat{G}$ with $n - 1 + m/k$ edges, such that the condition number of $G$ with $\hat{G}$ is bounded above by $\tilde{O}(k \log^2 n)$, with probability $1 - p$. The algorithm runs in time $\tilde{O}((m \log n + n \log^2 n) \log(1/p))$. As a result, we obtain an algorithm that, on input an $n \times n$ symmetric diagonally dominant matrix $A$ with $m + n - 1$ nonzero entries and a vector $b$, computes a vector $\bar{x}$ satisfying $\|\bar{x} - A^{+}b\|_A < \epsilon \|A^{+}b\|_A$, in time $\tilde{O}(m \log^2 n \log(1/\epsilon))$. The solver is based on a recursive application of the incremental sparsifier that produces a hierarchy of graphs which is then used to construct a recursive preconditioned Chebyshev iteration.
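A preconditioned Chebyshev iteration needs only a routine that applies the preconditioner and an interval [lmin, lmax] containing the eigenvalues of the preconditioned matrix. In the paper the preconditioner comes from the recursive hierarchy of sparsifiers; the sketch below swaps in a simple diagonal (Jacobi) preconditioner with Gershgorin eigenvalue bounds, which is an assumption for illustration only, and uses dense lists for brevity. The update follows the standard three-term Chebyshev recurrence; the function names are hypothetical.

```python
def chebyshev_solve(A, b, lmin, lmax, precond, iters=100):
    """Preconditioned Chebyshev iteration for A x = b, starting from x = 0.

    A: dense symmetric positive definite matrix as a list of rows.
    precond(r): returns M^{-1} r for the preconditioner M.
    [lmin, lmax]: interval assumed to contain every eigenvalue of M^{-1} A.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    center = (lmax + lmin) / 2
    half_width = (lmax - lmin) / 2
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = [0.0] * n
    alpha = 1 / center
    for k in range(iters):
        z = precond(r)
        if k == 0:
            p = list(z)
        else:
            if k == 1:
                beta = 0.5 * (half_width * alpha) ** 2
            else:
                beta = (half_width * alpha / 2) ** 2
            alpha = 1 / (center - beta / alpha)
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [bi - axi for bi, axi in zip(b, matvec(x))]
    return x


def jacobi_bounds(A):
    """Gershgorin interval for the eigenvalues of D^{-1} A, with D = diag(A).

    The lower end is positive only when A is strictly diagonally dominant.
    """
    rho = max(
        sum(abs(v) for j, v in enumerate(row) if j != i) / row[i]
        for i, row in enumerate(A)
    )
    return 1 - rho, 1 + rho
```

A Jacobi preconditioner is then `lambda r: [ri / A[i][i] for i, ri in enumerate(r)]`, paired with the interval from `jacobi_bounds(A)`. Unlike conjugate gradient, the iteration computes no inner products, but its convergence guarantee relies on [lmin, lmax] enclosing the spectrum of M^{-1} A.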