Results 1 - 8 of 8
Doulion: Counting Triangles in Massive Graphs with a Coin
Proceedings of ACM KDD, 2009
Abstract

Cited by 53 (16 self)
Counting the number of triangles in a graph is a beautiful algorithmic problem which has gained importance over the last years due to its significant role in complex network analysis. Frequently computed metrics such as the clustering coefficient and the transitivity ratio involve the execution of a triangle counting algorithm. Furthermore, several interesting graph mining applications rely on computing the number of triangles in the graph of interest. In this paper, we focus on the problem of counting triangles in a graph. We propose a practical method from which all triangle counting algorithms can potentially benefit. Using a straightforward triangle counting algorithm as a black box, we performed 166 experiments on real-world networks and on synthetic datasets, where we show that our method works with high accuracy, typically more than 99%, and gives significant speedups, running up to ≈130 times faster.
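The abstract above does not spell out the method. As a rough illustration only, the sketch below assumes the coin-flip idea suggested by the title: keep each edge independently with probability p, run an exact counter on the sparsified graph as the black box, and scale the result by 1/p^3, since a triangle survives only if all three of its edges do. The function names and the parameter `p` are hypothetical and not taken from the paper.

```python
import random


def count_triangles(edges):
    """Exact triangle count on an undirected edge list (the 'black box')."""
    adj = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        unique_edges.add((min(u, v), max(u, v)))
    # Each triangle is found once per edge, i.e. three times in total.
    total = sum(len(adj[u] & adj[v]) for u, v in unique_edges)
    return total // 3


def sparsified_triangle_estimate(edges, p, seed=None):
    """Assumed coin-flip scheme: keep each edge with probability p,
    count triangles exactly on what remains, then rescale by 1 / p**3."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3
```

With p = 1 every edge is kept and the estimate equals the exact count; smaller values of p trade accuracy for a smaller graph to process.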
Triadic Measures on Graphs: The Power of Wedge Sampling
2012
Abstract

Cited by 15 (3 self)
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We propose a new method based on wedge sampling. This versatile technique allows for the fast and accurate approximation of all current variants of clustering coefficients and enables rapid uniform sampling of the triangles of a graph. Our methods come with provable and practical time-approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state of the art, while providing nearly the accuracy of full enumeration. Our results will enable more wide-scale adoption of triadic measures for analysis of extremely large graphs, as demonstrated on several real-world examples.
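To make the wedge-sampling idea concrete, here is a minimal sketch under stated assumptions: a wedge is a path of length two, a wedge is closed when its two endpoints are adjacent, and the global clustering coefficient is the fraction of closed wedges. The sketch picks a center vertex with probability proportional to its number of wedges, then two distinct neighbors uniformly at random. The function names are hypothetical, and the paper's own procedure may differ in detail.

```python
import random


def build_adjacency(edges):
    """Undirected adjacency sets, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def wedge_sample_estimates(edges, k, seed=None):
    """Estimate (global clustering coefficient, triangle count) from k
    uniformly sampled wedges."""
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    centers = [v for v in adj if len(adj[v]) >= 2]
    # A vertex of degree d is the center of d * (d - 1) / 2 wedges.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    total_wedges = sum(weights)
    if total_wedges == 0:
        return 0.0, 0.0
    neighbor_lists = {v: list(adj[v]) for v in centers}
    closed = 0
    for c in rng.choices(centers, weights=weights, k=k):
        a, b = rng.sample(neighbor_lists[c], 2)
        if b in adj[a]:
            closed += 1
    cc = closed / k
    # Every triangle contains exactly three closed wedges.
    return cc, cc * total_wedges / 3
```

The cost depends on the number of samples k rather than on the number of triangles, which is where the speedup over full enumeration would come from.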
COUNTING TRIANGLES IN MASSIVE GRAPHS WITH MAPREDUCE
2013
Abstract

Cited by 12 (4 self)
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for large-scale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with extremely massive graphs. We show results on publicly available networks, the largest of which has 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 sec. per million edges plus overhead (approximately 225 sec. total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and nonsocial networks. To the best of our knowledge, these are the largest triangle-based graph computations published to date.
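For reference, the quantity being estimated can be written down exactly on a small graph. The sketch below computes the local clustering coefficient of a node as the fraction of its neighbor pairs that are themselves connected, following the definition in the abstract above. It is the brute-force baseline, not the MapReduce implementation the paper describes, and the function name and the convention of returning 0 for nodes of degree below two are assumptions.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are themselves adjacent.

    adj maps each node to the set of its neighbors (undirected graph).
    Nodes with fewer than two neighbors have no pairs; return 0.0 for them.
    """
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (d * (d - 1) / 2)
```

The work per node grows quadratically with its degree, which is one reason exact computation becomes expensive on graphs with high-degree nodes.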
Wedge sampling for computing clustering coefficients and triangle counts on large graphs
Statistical Analysis and Data Mining, 2014
Abstract

Cited by 3 (0 self)
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimation of local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.
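The abstract does not state the form of its probabilistic error estimates. As one plausible reading, the fraction of closed wedges among k independent samples is a mean of 0/1 variables, so Hoeffding's inequality gives P(|estimate - truth| >= eps) <= 2 exp(-2 k eps^2). The sketch below solves that bound for k; treating this as the bound used in the paper is an assumption, and the function name is hypothetical.

```python
import math


def samples_needed(eps, delta):
    """Smallest k with 2 * exp(-2 * k * eps**2) <= delta (Hoeffding bound).

    eps is the additive error tolerated, delta the failure probability.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))
```

Under this bound the sample count depends only on eps and delta, not on the size of the graph, and halving eps roughly quadruples the number of samples.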
Clustering Coefficient Queries on Massive Dynamic Social Networks
Abstract

Cited by 1 (0 self)
The Clustering Coefficient (CC) is a fundamental measure in social network analysis assessing the degree to which nodes tend to cluster together. While CC computation on static graphs is well studied, emerging applications have new requirements for online queries of the “global” CC of a given subset of a graph. As social networks are widely stored in databases for easy updating and accessing, computing the CC of a subset becomes a time-consuming task, especially when the network grows large and cannot fit in memory. This paper presents a novel method called “Approximate Neighborhood Index (ANI)” to significantly reduce the query latency for CC computation compared to traditional SQL-based database queries. A Bloom-filter-like data structure is leveraged to construct ANI in front of a relational database. Experimental results show that the proposed approach can guarantee the correctness of a CC query while significantly reducing the query latency at a reasonable memory cost.
Data Mining with MAPREDUCE: Graph and Tensor Algorithms with Applications
2010
Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts
Abstract
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and there are few practical results known that can scale to massive sizes. Indeed, even a highly tuned enumeration code takes more than a day on a graph with millions of edges. Most previous work that runs on truly massive graphs employs clusters and massive parallelization. We provide a sampling algorithm that provably and accurately approximates the frequencies of all 4-vertex pattern subgraphs. Our algorithm is based on a novel technique of 3-path sampling and a special pruning scheme to decrease