Results 1–10 of 242
Statistical properties of community structure in large social and information networks
Cited by 134 (10 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
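To make the network community profile concrete, here is a toy sketch. It assumes the usual definition of conductance, φ(S) = cut(S) / min(vol(S), vol(V∖S)), where vol sums node degrees, and it takes the profile value at size k to be the smallest conductance over all k-node sets. It finds that minimum by exhaustive enumeration, which only works for a handful of nodes; large networks need approximation algorithms instead. The graph and the helper names `conductance` and `ncp` are hypothetical illustrations, not code from the paper.

```python
from itertools import combinations

def conductance(adj, S):
    """phi(S) = cut(S) / min(vol(S), vol(V \\ S)) for an undirected graph.

    adj maps each node to the set of its neighbours; vol() sums node degrees.
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # edges leaving S
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")

def ncp(adj, max_size=None):
    """Brute-force community profile: best (lowest) conductance for each set size k.

    Exponential in the number of nodes, so only usable on toy graphs.
    """
    nodes = sorted(adj)
    max_size = max_size or len(nodes) // 2
    return {k: min(conductance(adj, S) for S in combinations(nodes, k))
            for k in range(1, max_size + 1)}

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

profile = ncp(adj)
print(profile)
```

On this graph every single node scores 1.0, the best 2-node sets score 0.5, and either triangle scores 1/7. Even on this tiny example, the best conductance varies strongly with set size.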
The Unique Games Conjecture, integrality gap for cut problems and embeddability of negative type metrics into ℓ1
In Proc. 46th IEEE Symp. on Foundations of Comp. Sci., 2005
Cited by 127 (11 self)
In this paper we disprove the following conjecture due to Goemans [17] and Linial [25] (also see [5, 27]): “Every negative type metric embeds into $\ell_1$ with constant distortion.” We show that for every δ > 0, and for large enough n, there is an n-point negative type metric which requires distortion at least $(\log\log n)^{1/6-\delta}$ to embed into $\ell_1$. Surprisingly, our construction is inspired by the Unique Games Conjecture (UGC) of Khot [20], establishing a previously unsuspected connection between PCPs and the theory of metric embeddings. We first prove that the UGC implies super-constant hardness results for (non-uniform) Sparsest Cut and Minimum Uncut problems. It is already known that the UGC also implies an optimal hardness result for Maximum Cut [21]. Though these hardness results rely on the UGC, we demonstrate, nevertheless, that the corresponding PCP reductions can be used to construct “integrality gap instances” for the respective problems. Towards this, we first construct an integrality gap instance for a natural SDP relaxation of Unique Games. Then, we “simulate” the PCP reduction, and “translate” the integrality gap instance of Unique Games to integrality gap instances for the respective cut problems! This enables us to prove
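A metric d is of negative type when √d embeds isometrically into Euclidean space. For a small explicit distance matrix this can be tested with Schoenberg's criterion: the double-centred matrix −½·J d J (where J is the centring matrix) must be positive semidefinite. The pure-Python sketch below is our own illustration of that definition. It does not attempt the much harder question the paper studies, namely the ℓ1 distortion such metrics require. The function names and the two example matrices are hypothetical.

```python
def is_psd(a, tol=1e-9):
    """Test whether the symmetric matrix a is positive semidefinite.

    Symmetric Gaussian elimination (LDL^T without pivoting): every pivot must
    be >= 0, and a zero pivot must have an all-zero row to its right.
    """
    a = [list(row) for row in a]  # work on a copy
    n = len(a)
    for p in range(n):
        pivot = a[p][p]
        if pivot < -tol:
            return False
        if pivot <= tol:
            if any(abs(a[p][j]) > tol for j in range(p + 1, n)):
                return False
            continue
        # Replace the trailing block by its Schur complement.
        for i in range(p + 1, n):
            factor = a[i][p] / pivot
            for j in range(p + 1, n):
                a[i][j] -= factor * a[p][j]
    return True

def is_negative_type(d, tol=1e-9):
    """True if the symmetric distance matrix d is a metric of negative type."""
    n = len(d)
    # A negative type metric must first be a metric: check the triangle inequality.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    return False
    mean = [sum(row) / n for row in d]
    grand = sum(mean) / n
    # Schoenberg: G = -1/2 * J d J must be PSD.
    g = [[-0.5 * (d[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    return is_psd(g, tol)

# Hamming distance on {0,1}^2 (a 4-cycle): squared Euclidean, hence negative type.
square = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]
# Shortest-path metric of the complete bipartite graph K_{2,3}.
k23 = [[0, 2, 1, 1, 1], [2, 0, 1, 1, 1], [1, 1, 0, 2, 2],
       [1, 1, 2, 0, 2], [1, 1, 2, 2, 0]]
print(is_negative_type(square), is_negative_type(k23))
```

The K_{2,3} metric fails the test. With weights b = (3, 3, −2, −2, −2), which sum to zero, the sum Σ_{i<j} b_i b_j d(i, j) equals 6 > 0, so the negative type inequality is violated.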
Euclidean distortion and the Sparsest Cut
In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, 2005
Cited by 92 (21 self)
Bi-Lipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] which shows that every
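The abstract is cut off before its result, but the notion it builds on, the distortion of a bi-Lipschitz embedding, is easy to make concrete. The sketch below computes the distortion of a given map f from a finite metric into Euclidean space as (maximum expansion) × (maximum contraction) over all pairs of points. The helper name `distortion` and the 4-cycle example are hypothetical illustrations.

```python
import math
from itertools import combinations

def distortion(points, d, f):
    """Distortion of the map f from the finite metric (points, d) into Euclidean space.

    Returns the least D such that, for some scaling s > 0,
    d(x, y) <= s * ||f(x) - f(y)|| <= D * d(x, y) for all pairs x != y.
    """
    expansion = contraction = 0.0
    for x, y in combinations(points, 2):
        image_dist = math.dist(f(x), f(y))
        if image_dist == 0:
            return math.inf  # f is not injective, so it is not bi-Lipschitz
        ratio = image_dist / d(x, y)
        expansion = max(expansion, ratio)
        contraction = max(contraction, 1 / ratio)
    return expansion * contraction

# Hypothetical example: the 4-cycle's shortest-path metric mapped to a unit square.
def cycle_metric(i, j):
    return min(abs(i - j), 4 - abs(i - j))

corners = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
print(distortion(range(4), cycle_metric, corners.get))
```

Adjacent vertices keep their distance 1, while opposite vertices at distance 2 land only √2 apart, so this particular embedding has distortion √2.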
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
2008
Cited by 85 (7 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and
Measured descent: A new embedding method for finite metrics
In Proc. 45th FOCS, 2004
Cited by 84 (28 self)
We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. We prove that any n-point metric space $(X, d)$ embeds in Hilbert space with distortion $O(\sqrt{\alpha_X \cdot \log n})$, where $\alpha_X$ is a geometric estimate on the decomposability of $X$. As an immediate corollary, we obtain an $O(\sqrt{(\log \lambda_X)\log n})$ distortion embedding, where $\lambda_X$ is the doubling constant of $X$. Since $\lambda_X \le n$, this result recovers Bourgain's theorem, but when the metric $X$ is, in a sense, “low-dimensional,” improved bounds are achieved. Our embeddings are volume-respecting for subsets of arbitrary size. One consequence is the existence of $(k, O(\log n))$ volume-respecting embeddings for all $1 \le k \le n$, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted n-point planar graph embeds in $\ell_\infty^{O(\log n)}$ with $O(1)$ distortion. The $O(\log n)$ bound on the dimension is optimal, and improves upon the previously known bound of $O((\log n)^2)$.
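The claim that the corollary recovers Bourgain's theorem is a one-line calculation. The second implication below, for metrics whose doubling constant is bounded, is our own illustration of what “low-dimensional” buys.

```latex
\lambda_X \le n
\;\Longrightarrow\;
\sqrt{(\log \lambda_X)\,\log n} \;\le\; \sqrt{\log n \cdot \log n} \;=\; \log n,
\qquad
\lambda_X = O(1)
\;\Longrightarrow\;
\sqrt{(\log \lambda_X)\,\log n} \;=\; O\!\left(\sqrt{\log n}\right).
```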
On the Hardness of Approximating Multicut and Sparsest-Cut
In Proceedings of the 20th Annual IEEE Conference on Computational Complexity, 2005
Cited by 75 (4 self)
We show that the MULTICUT, SPARSEST-CUT, and MIN-2CNF≡ DELETION problems are NP-hard to approximate within every constant factor, assuming the Unique Games Conjecture of Khot [STOC, 2002]. A quantitatively stronger version of the conjecture implies an inapproximability factor of Ω(log log n).
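For readers unfamiliar with the problem, the sketch below states non-uniform Sparsest Cut in its usual form. Edges carry capacities and some node pairs carry demands, and the goal is to find the set S that minimizes (capacity of edges crossing S) / (demand of pairs separated by S). It solves the problem by exhaustive search, which is exponential in the number of nodes. The abstract shows this cannot be replaced by a constant-factor approximation unless the UGC fails or P = NP. The function name and the toy instance are hypothetical.

```python
from itertools import combinations

def sparsest_cut(nodes, capacity, demand):
    """Exhaustive non-uniform Sparsest Cut: minimise cut capacity / separated demand."""
    nodes = list(nodes)
    best_ratio, best_set = float("inf"), None
    for k in range(1, len(nodes)):
        for subset in combinations(nodes, k):
            S = set(subset)
            # Total capacity of edges with exactly one endpoint in S.
            cut = sum(c for (u, v), c in capacity.items() if (u in S) != (v in S))
            # Total demand of the pairs that S separates.
            separated = sum(w for (s, t), w in demand.items() if (s in S) != (t in S))
            if separated > 0 and cut / separated < best_ratio:
                best_ratio, best_set = cut / separated, S
    return best_ratio, best_set

capacity = {(0, 1): 2, (1, 2): 1, (2, 3): 2}  # hypothetical path 0-1-2-3
demand = {(0, 3): 1, (1, 2): 1}               # only two demand pairs
print(sparsest_cut(range(4), capacity, demand))
```

Here cutting the cheap middle edge separates both demand pairs, giving ratio 1/2 with S = {0, 1}. Sets that separate neither pair are skipped because their ratio is undefined.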
Empirical comparison of algorithms for network community detection
In Proc. WWW’10, 2010
Cited by 71 (4 self)
Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that “look like” good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
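The abstract does not list its objective functions. The sketch below scores a candidate community with four commonly used ones, written in standard textbook forms that we assume here; the paper's exact normalizations may differ. Notation: n and m are the node and edge counts of the graph, n_s and m_s count nodes and edges inside S, and c_s counts edges leaving S. The function name and the example graph are hypothetical.

```python
def community_scores(adj, S):
    """Score a candidate community S of an undirected graph {node: set(neighbours)}.

    Assumes 1 < |S| < n and that every node has at least one neighbour.
    """
    S = set(S)
    n = len(adj)
    if not 1 < len(S) < n:
        raise ValueError("S must contain at least 2 nodes and not the whole graph")
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    n_s = len(S)
    m_s = sum(1 for u in S for v in adj[u] if v in S) // 2   # internal edges
    c_s = sum(1 for u in S for v in adj[u] if v not in S)    # boundary edges
    vol_s = 2 * m_s + c_s        # sum of degrees inside S
    vol_rest = 2 * m - vol_s     # sum of degrees outside S
    return {
        "expansion": c_s / n_s,                               # boundary edges per node
        "internal_density": 1 - m_s / (n_s * (n_s - 1) / 2),  # share of missing internal edges
        "cut_ratio": c_s / (n_s * (n - n_s)),                 # share of possible boundary edges used
        "normalized_cut": c_s / vol_s + c_s / vol_rest,       # Shi-Malik normalized cut
    }

# Hypothetical graph: two triangles joined by the edge (2, 3); score one triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(community_scores(adj, {0, 1, 2}))
```

For all four scores lower is better. The triangle is internally complete (internal density 0) and loosely attached (one boundary edge), which is the kind of set these objectives are designed to favor.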
Can ISPs and P2P users cooperate for improved performance?
ACM SIGCOMM Computer Communication Review, 2007
Cited by 71 (3 self)
This paper addresses the antagonistic relationship between overlay/p2p networks and ISPs: they both try to manage and control traffic at different levels and with different goals, but in a way that inevitably leads to overlapping, duplicated, and conflicting behavior. The creation of a p2p network and the routing at the p2p layer are ultimately treading on the routing functions of ISPs. The paper proposes a solution to develop a synergistic relationship between p2p networks and ISPs: ISPs maintain an “oracle” to help p2p networks make better choices when picking neighboring nodes. The solution provides benefits to both parties. ISPs become able to influence the p2p decisions, and ultimately the amount of traffic that flows in and out of their network, while p2p networks get performance information for “free.” The reviewers find that the problem is important and the solution is interesting and shows promise. An advantage of the method is that ISPs do not run into legal issues, since they do not engage in caching of potentially illegal content; they just provide performance information. ACM SIGCOMM Public review written by
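The review describes the ISP “oracle” only at a high level. The sketch below assumes, purely for illustration, that the oracle receives a p2p client's candidate neighbor list and returns it reordered. Peers in the client's own autonomous system (AS) come first, then peers ordered by AS-hop distance, with unknown paths last. The `Peer` type, the `oracle_rank` function, the AS numbers and the hop table are all hypothetical, not the paper's interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peer:
    address: str
    asn: int  # autonomous system the peer sits in (hypothetical field)

def oracle_rank(client_asn, candidates, as_hops):
    """Hypothetical ISP oracle: return candidates from most to least preferred.

    Same-AS peers cost 0; others cost their AS-hop distance from as_hops,
    and unknown paths cost infinity. sorted() is stable, so ties keep the
    order in which the p2p client listed the candidates.
    """
    def cost(peer):
        if peer.asn == client_asn:
            return 0
        return as_hops.get((client_asn, peer.asn), float("inf"))
    return sorted(candidates, key=cost)

candidates = [Peer("192.0.2.1", 300), Peer("192.0.2.2", 100),
              Peer("192.0.2.3", 200), Peer("192.0.2.4", 999)]
as_hops = {(100, 200): 1, (100, 300): 3}  # made-up AS-hop distances
ranked = oracle_rank(100, candidates, as_hops)
print([p.address for p in ranked])
```

In this arrangement the ISP reveals only a preference order, not its topology. The client stays free to ignore the ranking, which fits the review's point that the ISP supplies performance information rather than handling content.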