Results 1-5 of 5
A scalable generative graph model with community structure, 2014

Cited by 8 (0 self)
Abstract. Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed Block Two-Level Erdős–Rényi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as social networks. In this paper, we compare BTER to other scalable models and show that it gives a better fit to real data. We provide a scalable implementation that requires only O(d_max) storage, where d_max is the maximum number of neighbors for a single node. The generator is trivially parallelizable, and we show results for a Hadoop MapReduce implementation for modeling a real-world web graph with over 4.6 billion edges. We propose that the BTER model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can be tuned to user specifications. Key words: graph generator, network data, block two-level Erdős–Rényi (BTER) model, large-scale graph benchmarks
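The degree-wise clustering coefficient profile that BTER is calibrated against can be computed directly from an edge list. The sketch below is illustrative only (it is not the BTER generator itself, and the function name and edge-list representation are my own); it reports, for each degree d, the average local clustering coefficient over all degree-d vertices:

```python
from collections import defaultdict

def clustering_profile(edges):
    """Average local clustering coefficient per degree: the quantity
    BTER is tuned to match, alongside the degree distribution."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            by_degree[d].append(0.0)
            continue
        # Count edges among the neighbors of v (closed wedges centered at v).
        links = sum(1 for a in nbrs for b in adj[a] if b in nbrs) // 2
        by_degree[d].append(2.0 * links / (d * (d - 1)))
    return {d: sum(cs) / len(cs) for d, cs in by_degree.items()}

# A triangle (1,2,3) with a pendant vertex 4 attached to 3.
profile = clustering_profile([(1, 2), (2, 3), (1, 3), (3, 4)])
```

On the toy graph, degree-2 vertices (the two pure triangle corners) have coefficient 1.0, while the degree-3 vertex sees only one of its three possible neighbor pairs connected.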
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions

Cited by 1 (0 self)
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum”, but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour. (* Work done while the author was interning at Sandia National Laboratories, Livermore, CA.)
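The abstract says nucleus decomposition generalizes the classic k-core decomposition. For context, here is a minimal sketch of that classic building block, not the paper's nucleus algorithm: peel off a minimum-degree vertex repeatedly, recording the largest minimum degree seen when each vertex is removed (function name and graph representation are my own):

```python
from collections import defaultdict

def core_numbers(edges):
    """Classic k-core peeling: a vertex's core number is the largest k
    such that it belongs to a subgraph where every vertex has degree >= k."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    deg = {v: len(n) for v, n in adj.items()}
    core = {}
    k = 0
    remaining = set(adj)
    while remaining:
        v = min(remaining, key=deg.__getitem__)  # peel a minimum-degree vertex
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return core

# Two triangles sharing vertex 3, plus a pendant vertex 6 hanging off 5.
core = core_numbers([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)])
```

Both triangles form 2-cores, while the pendant vertex only ever survives at minimum degree 1.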
CS167: Reading in Algorithms, Triangle-Dense Graphs*, 2014
“Triangle-Dense Graphs” [1]. The motivation of the paper is to develop a theory of algorithms for social networks, like the graphs derived from Facebook or Twitter data mentioned in the first lecture. That is, the goal is to develop graph algorithms that work well on social networks, if not on worst-case graphs. What’s special about social networks? In the first lecture we mentioned three common properties: they are generally big, sparse, and have a skewed degree distribution. This paper discusses a different property: large triangle density. Intuitively, this is similar to having a large average clustering coefficient. Formally, the triangle density of a graph is the fraction of “filled in” 2-hop paths (aka “wedges”), which can also be written as 3 · (number of triangles) / (number of wedges). Note the factor of 3 comes in because each triangle of a graph spawns three distinct wedges. Every graph has a triangle density between 0 and 1. An acyclic graph has triangle density 0; so does a cycle of length at least 4. A triangle has a triangle density of 1, as does a clique, as does a disjoint union of cliques. A little thought shows the converse is also true: a graph has triangle density 1 only if it is the disjoint union of cliques. The paper studies graphs with “large” triangle density, with the motivation that large social networks tend to have this property (among others). For example, let’s compare the Facebook graph with a random graph. By a “random graph,” we mean the following (called an “Erdős–Rényi graph”): for parameters n and p ∈ [0, 1], form a graph with n vertices by including each of the possible edges independently with probability p.
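The triangle-density formula above is easy to check in code. This is an illustrative Python sketch (not from the course notes; the function name and edge-list representation are my own), counting wedges per vertex and closed wedges via common neighbors:

```python
from collections import defaultdict
from itertools import combinations

def triangle_density(edges):
    """Triangle density = 3 * (#triangles) / (#wedges): the fraction
    of 2-hop paths (wedges) that are closed into triangles."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # A degree-d vertex is the center of d*(d-1)/2 wedges.
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    triangles = sum(
        1 for v, n in adj.items()
        for a, b in combinations(sorted(n), 2)
        if b in adj[a]
    ) // 3  # each triangle is counted once per corner
    return 3 * triangles / wedges if wedges else 0.0

density = triangle_density([(1, 2), (2, 3), (1, 3)])  # a single triangle
```

Consistent with the text, a single triangle (or any clique) scores 1.0, while a path or a 4-cycle scores 0.0.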
gSparsify: Graph Motif Based Sparsification for Graph Clustering
Graph clustering is a fundamental problem that partitions the vertices of a graph into clusters with the objective of optimizing the intuitive notions of intra-cluster density and inter-cluster sparsity. In many real-world applications, however, the sheer sizes and inherent complexity of graphs may render existing graph clustering methods inefficient or incapable of yielding quality graph clusters. In this paper, we propose gSparsify, a graph sparsification method that preferentially retains a small subset of edges from a graph which are more likely to be within clusters, while eliminating others with less or no structural correlation to clusters. The resultant simplified graph is succinct in size with core cluster structures well preserved, thus enabling faster graph clustering without compromising clustering quality. We take a quantitative approach to modeling the evidence that edges within densely knit clusters are frequently involved in small graph motifs, which are adopted as prime features to differentiate edges with varied cluster significance. Path-based indexes and path-join algorithms are further designed to compute the graph-motif-based cluster significance of edges for graph sparsification. We perform experimental studies on real-world graphs, and the results demonstrate that gSparsify brings significant speedups to existing graph clustering methods while also improving graph clustering quality.
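The core idea, scoring edges by motif participation and keeping only the high-scoring ones, can be sketched with the simplest motif, the triangle. This is a simplified illustration under my own naming and representation, not the actual gSparsify algorithm (which builds path-based indexes over richer motifs):

```python
from collections import defaultdict

def sparsify_by_triangles(edges, keep_fraction=0.5):
    """Toy motif-based sparsification: score each edge by the number of
    triangles it closes (common neighbors of its endpoints), then keep
    the top-scoring fraction of edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    scored = sorted(
        edges,
        key=lambda e: len(adj[e[0]] & adj[e[1]]),  # common neighbors = triangles
        reverse=True,
    )
    keep = max(1, int(len(scored) * keep_fraction))
    return scored[:keep]

# A triangle (1,2,3) with a bridge edge (3,4): the bridge closes no triangles.
kept = sparsify_by_triangles([(1, 2), (2, 3), (1, 3), (3, 4)], keep_fraction=0.75)
```

The intra-cluster (triangle) edges survive while the structureless bridge edge is dropped first, matching the intuition in the abstract.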
A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem, 2014
Many graph mining applications rely on detecting subgraphs which are large near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. On the other hand, the densest subgraph problem (DS-Problem), which maximizes the average degree over all subgraphs, and other indirect approaches which optimize tractable objectives fail to detect large near-cliques in many networks. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and succeeds consistently in finding large near-cliques. Surprisingly, our formulation is a simple variation of the DS-Problem. Specifically, we define the triangle-densest subgraph problem (TDS-Problem): given a graph G(V, E), find a subset of vertices S* such that τ(S*) = max_{S ⊆ V} t(S)/|S|, where t(S) is the number of triangles induced by the set S. We provide various exact and approximation algorithms which solve the TDS-Problem efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the k-clique average density, k ≥ 2. We illustrate the success of the proposed formulation in extracting large near-cliques from graphs by performing numerous experiments on real-world networks.
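To make the objective τ(S) = t(S)/|S| concrete, here is a brute-force check over all vertex subsets, feasible only on tiny graphs. This is purely illustrative (names and representation are my own); the paper's contribution is exact polynomial-time algorithms, not this enumeration:

```python
from itertools import combinations

def triangle_densest_bruteforce(vertices, edges):
    """Evaluate tau(S) = t(S)/|S| for every nonempty vertex subset and
    return the maximizer. Exponential time: toy graphs only."""
    edge_set = {frozenset(e) for e in edges}
    def t(S):
        # Number of triangles fully induced by S.
        return sum(
            1 for a, b, c in combinations(sorted(S), 3)
            if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
        )
    best, best_tau = None, -1.0
    for r in range(1, len(vertices) + 1):
        for S in combinations(sorted(vertices), r):
            tau = t(S) / len(S)
            if tau > best_tau:
                best, best_tau = set(S), tau
    return best, best_tau

# A 4-clique {1,2,3,4} with a pendant vertex 5: the clique alone wins.
best, tau = triangle_densest_bruteforce(
    [1, 2, 3, 4, 5],
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)],
)
```

The 4-clique has four induced triangles over four vertices (τ = 1.0); adding the pendant vertex dilutes the ratio to 0.8, so the objective correctly isolates the near-clique.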