Results 1–10 of 15
PATRIC: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks
, 2012
"... We present MPIbased parallel algorithms for counting triangles and computing clustering coefficients in massive networks. � A triangle in a graph G(V, E) is a set of three nodes u, v, w ∊V such that there is an edge between each pair of nodes. The number of triangles incident on node v, with adjace ..."
Abstract

Cited by 16 (5 self)
 Add to MetaCart
(Show Context)
We present MPI-based parallel algorithms for counting triangles and computing clustering coefficients in massive networks. A triangle in a graph G(V, E) is a set of three nodes u, v, w ∈ V such that there is an edge between each pair of nodes. The number of triangles incident on node v, with adjacency list N(v), is defined as T(v) = |{(u, w) ∈ E : u, w ∈ N(v)}|. Counting triangles is important in the analysis of various networks, e.g., social, biological, and web networks. Emerging massive networks do not fit in the main memory of a single machine and are very challenging to work with. Our distributed-memory parallel algorithm allows us to deal with such massive networks in a time- and space-efficient manner. We were able to count triangles in a graph with 2 billion nodes and 50 billion edges in 10 minutes. The clustering coefficient (CC) of a node v ∈ V with degree d_v is defined as the fraction of pairs of neighbors of v that are themselves connected, i.e., CC(v) = T(v) / (d_v(d_v − 1)/2).
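To make the definitions in this abstract concrete, here is a minimal serial sketch in Python. It is an illustration of T(v) and CC(v) only, not the paper's MPI algorithm; the adjacency-dict representation and function names are assumptions.

```python
from itertools import combinations

def triangle_counts(adj):
    """Per-node triangle counts T(v) = |{(u, w) in E : u, w in N(v)}|.

    adj: dict mapping each node to the set of its neighbors
    (assumed symmetric, i.e. an undirected simple graph).
    """
    return {
        v: sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        for v, nbrs in adj.items()
    }

def clustering_coefficient(adj, v):
    """CC(v) = T(v) / C(d_v, 2): fraction of neighbor pairs that are connected."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    t = sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
    return t / (d * (d - 1) / 2)

# A 4-node example: triangle {a, b, c} plus a pendant edge c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

Node c sits in one triangle but has three neighbor pairs, so its clustering coefficient is 1/3.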
Counting Triangles in Massive Graphs with MapReduce
, 2013
"... Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are trianglebased and give a measure of the connectedness of mutual ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
(Show Context)
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for large-scale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with extremely massive graphs. We show results on publicly available networks, the largest of which has 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 sec. per million edges plus overhead (approximately 225 sec. total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and non-social networks. To the best of our knowledge, these are the largest triangle-based graph computations published to date.
Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory
"... Abstract—We present techniques to process large scalefree graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local nonvolatile memory, e.g., NAND Flash. We apply an edge list partitioning techni ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
(Show Context)
Abstract—We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership-class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge-list partitioning technique designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), k-core decomposition, and triangle counting. We also demonstrate scalability on BG/P Intrepid by comparing to the best known Graph500 results [1]. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process thirty-two times larger datasets with only a 39% performance degradation in Traversed Edges Per Second (TEPS). Keywords: parallel algorithms; graph algorithms; big data; distributed computing.
Graph Sample and Hold: A Framework for Big-Graph Analytics
"... Sampling is a standard approach in biggraph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in c ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
(Show Context)
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g., triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, …
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
"... Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique, kdensest subgraph) are NPhard. Furthermore, the goal is ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, or k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum”, but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense-subgraph-finding algorithms usually optimize some objective and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour. ∗Work done while the author was interning at Sandia National Laboratories, Livermore, CA.
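The abstract notes that nucleus decomposition generalizes the classic k-core decomposition. As background for that special case only (not the paper's nucleus algorithm), a minimal peeling sketch, with the adjacency-dict representation and function name assumed:

```python
def core_numbers(adj):
    """k-core decomposition by peeling: repeatedly remove a minimum-degree
    vertex; the largest degree seen at removal time so far is the removed
    vertex's core number.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    live = {v: set(nbrs) for v, nbrs in adj.items()}  # mutable working copy
    core = {}
    k = 0
    while deg:
        v = min(deg, key=deg.get)   # next vertex to peel
        k = max(k, deg[v])          # core number never decreases along the peel
        core[v] = k
        for u in live[v]:           # remove v from its surviving neighbors
            live[u].discard(v)
            deg[u] -= 1
        del deg[v], live[v]
    return core
```

On a triangle with a pendant edge attached, the triangle's vertices get core number 2 and the pendant vertex gets 1, matching the intuition that the triangle is the densest region.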
Wedge sampling for computing clustering coefficients and triangle counts on large graphs
 Statistical Analysis and Data Mining
, 2014
"... Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, ev ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimation of local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.
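A minimal sketch of the wedge-sampling idea behind this line of work, assuming the standard definitions (a wedge u-v-w is closed iff (u, w) is an edge; the global clustering coefficient is the fraction of closed wedges). The adjacency-dict representation and function name are assumptions, not the paper's implementation:

```python
import random
from math import comb

def sample_wedge_cc(adj, k, seed=0):
    """Estimate the global clustering coefficient by uniform wedge sampling.

    The global clustering coefficient equals the fraction of all wedges that
    are closed, so the mean of k closed/open indicators over uniformly
    sampled wedges is an unbiased estimate of it.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    # Node v is the center of C(d_v, 2) wedges; sampling centers with these
    # weights, then two distinct neighbors uniformly, yields a uniform wedge.
    weights = [comb(len(adj[v]), 2) for v in nodes]
    closed = 0
    for _ in range(k):
        v = rng.choices(nodes, weights=weights)[0]
        u, w = rng.sample(sorted(adj[v]), 2)
        if w in adj[u]:
            closed += 1
    return closed / k
```

The paper's contribution includes the probabilistic error bounds for estimators of this shape; by a Hoeffding-style argument, the sample size k needed for a given accuracy is independent of the graph size.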
Multicore Triangle Computations Without Tuning
"... Abstract—Triangle counting and enumeration has emerged as a basic tool in largescale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributedmemory setting or the externalmemory setting, and ca ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Triangle counting and enumeration has emerged as a basic tool in large-scale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributed-memory setting or the external-memory setting, and cannot take full advantage of a multicore machine, whose capacity has grown to accommodate even the largest of real-world graphs. This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges. Our algorithms are provably cache-friendly, easy to implement in a language that supports dynamic parallelism, such as Cilk Plus or OpenMP, and do not require parameter tuning. On a 40-core machine with two-way hyper-threading, our parallel exact global and local triangle counting algorithms obtain speedups of 17–50x on a set of real-world and synthetic graphs, and are faster than previous parallel exact triangle counting algorithms. We can compute the exact triangle count of the Yahoo Web graph (over 6 billion edges) in under 1.5 minutes. In addition, for approximate triangle counting, we are able to approximate the count for the Yahoo graph to within 99.6% accuracy in under 10 seconds, and for a given accuracy we are much faster than existing parallel approximate triangle counting implementations.
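The kernel common to most exact triangle counters, including shared-memory ones like this, is neighbor-set intersection over a degree-ordered graph. A serial Python sketch of that standard trick (an illustration only, not the paper's Cilk Plus implementation; function name and edge-list input are assumptions):

```python
from collections import defaultdict

def count_triangles(edges):
    """Exact global triangle count via the standard degree-ordering trick.

    Each undirected edge is directed from its lower-ranked to its
    higher-ranked endpoint (rank = degree, ties broken by node id), so every
    triangle is counted exactly once, at its lowest-ranked vertex.
    """
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    rank = {v: r for r, v in enumerate(sorted(deg, key=lambda x: (deg[x], x)))}
    out = defaultdict(set)
    for u, v in edges:
        a, b = (u, v) if rank[u] < rank[v] else (v, u)
        out[a].add(b)
    # A triangle {u, v, w} with rank(u) < rank(v) < rank(w) appears exactly
    # once: as w in out[u] & out[v], with v an out-neighbor of u.
    return sum(
        len(nbrs & out.get(v, set()))
        for u, nbrs in out.items()
        for v in nbrs
    )
```

Directing edges by degree rank bounds every out-degree and is what makes this approach cheap on skewed real-world degree distributions; the paper's contribution is parallelizing computations of this kind cache-efficiently without tuning.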
A Space-Efficient Parallel Algorithm for Counting Exact Triangles in Massive Networks
"... Abstract—Finding the number of triangles in a network (graph) is an important problem in mining and analysis of complex networks. Massive networks emerging from numerous application areas pose a significant challenge in network analytics since these networks consist of millions, or even billions, of ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract—Finding the number of triangles in a network (graph) is an important problem in the mining and analysis of complex networks. Massive networks emerging from numerous application areas pose a significant challenge in network analytics since these networks consist of millions, or even billions, of nodes and edges. Such massive networks necessitate the development of efficient parallel algorithms. There exist several MapReduce-based and only one MPI (Message Passing Interface) based distributed-memory parallel algorithm for counting triangles. MapReduce-based algorithms generate prohibitively large intermediate data. The MPI-based algorithm can work on quite large networks; however, the overlapping partitions employed by the algorithm limit its capability to deal with very massive networks. In this paper, we present a space-efficient MPI-based parallel algorithm for counting the exact number of triangles in massive networks. The algorithm divides the network into non-overlapping partitions. Our results demonstrate up to 25-fold space saving over the algorithm with overlapping partitions. This space efficiency allows the algorithm to deal with networks that are 25 times larger. We present a novel approach that reduces communication cost drastically (up to 90%), leading to both a space- and runtime-efficient algorithm. Our adaptation of a parallel partitioning scheme by computing a novel weight function adds further to the efficiency of the algorithm. Denoting the average degree of nodes and the number of partitions by d̄ and P, respectively, our algorithm achieves up to an O(P²)-factor space efficiency over existing MapReduce-based algorithms and up to an O(d̄)-factor over the algorithm with overlapping partitioning. Keywords: counting triangles, parallel algorithms, massive networks, social networks, graph mining, space efficiency.
Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams
"... Abstract—In online social networks (OSNs), we often observe abnormally frequent interactions among people before or during some important day, e.g., we receive/send more greetings from/to friends on Christmas Day than usual. We also often observe some viral videos suddenly become worldwide popular t ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract—In online social networks (OSNs), we often observe abnormally frequent interactions among people before or during some important day, e.g., we receive/send more greetings from/to friends on Christmas Day than usual. We also often observe that some viral videos suddenly become popular worldwide after a single night of diffusion in OSNs. Do these seemingly different phenomena share a common structure? All these phenomena are related to sudden surges of user activities in OSNs, and are referred to as bursts in this work. We find that the emergence of a burst is accompanied by the formation of new triangles in networks. This finding provokes a new method for detecting bursts in OSNs. We first introduce a new measure, named the triadic cardinality distribution, which measures the fraction of nodes with a certain number of triangles, i.e., triadic cardinality, in a network. The distribution will change when a burst occurs, and is congenitally immunized against spamming social bots. Hence, by tracking triadic cardinality distributions, we are able to detect bursts in OSNs. To relieve the burden of handling the huge activity data generated by OSN users, we then develop a delicately designed sample-estimate solution to estimate the triadic cardinality distribution efficiently from sampled data. Extensive experiments conducted on real data demonstrate the usefulness of this triadic cardinality distribution and the effectiveness of our sample-estimate solution.
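The triadic cardinality distribution defined in this abstract can be computed exactly on a small graph as follows. This is a minimal sketch of the definition itself (the fraction of nodes with each triangle count), not the paper's stream-sampling estimator; the adjacency-dict input and function name are assumptions:

```python
from collections import Counter
from itertools import combinations

def triadic_cardinality_distribution(adj):
    """Fraction of nodes whose triadic cardinality (number of triangles the
    node participates in) equals k, for each k observed in the graph.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    n = len(adj)
    counts = Counter(
        # triadic cardinality of one node: connected pairs among its neighbors
        sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        for nbrs in adj.values()
    )
    return {k: c / n for k, c in counts.items()}
```

On a triangle with one pendant edge, three quarters of the nodes have triadic cardinality 1 and one quarter has 0; a burst that closes new triangles shifts mass of this distribution toward larger k.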