Results 1-10 of 44
Doulion: Counting Triangles in Massive Graphs with a Coin
Proceedings of ACM KDD, 2009
Cited by 53 (16 self)
Counting the number of triangles in a graph is a beautiful algorithmic problem that has gained importance in recent years due to its significant role in complex network analysis. Frequently computed metrics such as the clustering coefficient and the transitivity ratio involve the execution of a triangle counting algorithm. Furthermore, several interesting graph mining applications rely on computing the number of triangles in the graph of interest. In this paper, we focus on the problem of counting triangles in a graph. We propose a practical method from which all triangle counting algorithms can potentially benefit. Using a straightforward triangle counting algorithm as a black box, we performed 166 experiments on real-world networks and on synthetic datasets, where we show that our method works with high accuracy, typically more than 99%, and gives significant speedups, up to ≈130 times faster performance.
Truss Decomposition in Massive Networks
Cited by 25 (5 self)
The k-truss is a type of cohesive subgraph proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a polynomial-time algorithm for computing the k-truss. Compared with the k-core, which is also efficient to compute, the k-truss represents the “core” of a k-core: it keeps the key information of the k-core while filtering out less important information. However, existing algorithms for computing the k-truss are inefficient for handling today’s massive networks. We first improve the existing in-memory algorithm for computing the k-truss in networks of moderate size. Then, we propose two I/O-efficient algorithms to handle massive networks that cannot fit in main memory. Our experiments on real datasets verify the efficiency of our algorithms and the value of the k-truss.
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
Cited by 21 (4 self)
The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real-world applications such as spam detection, uncovering the hidden thematic structures in the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and are preferably parallelizable. In this paper we present an efficient triangle counting approximation algorithm which can be adapted to the semi-streaming model [23]. The key idea of our algorithm is to combine the sampling algorithm of [51,52] and the partitioning of the set of vertices into a high-degree and a low-degree subset as in [5], treating each set appropriately. From a mathematical perspective, we show a simplified proof of [52] which uses the powerful Kim-Vu concentration inequality [31] based on the Hajnal-Szemerédi theorem [25]. Furthermore, we improve bounds of existing triple sampling techniques based on a theorem of Ahlswede and Katona [3]. We obtain a running time $O\left(m + \frac{m^{3/2} \log n}{t \epsilon^2}\right)$ and a $(1 \pm \epsilon)$
Graph Mining Applications to Social Network Analysis.
Managing and Mining Graph Data, 2010
Triangle sparsifiers
Journal of Graph Algorithms and Applications
Cited by 17 (5 self)
In this work, we introduce the notion of triangle sparsifiers, i.e., sparse graphs which are approximately the same as the original graph with respect to the triangle count. This results in a practical triangle counting method with strong theoretical guarantees. For instance, for unweighted graphs we show a randomized algorithm for approximately counting the number of triangles in a graph $G$, which proceeds as follows: keep each edge independently with probability $p$, enumerate the triangles in the sparsified graph $G'$ and return the number of triangles found in $G'$ multiplied by $p^{-3}$. We prove that under mild assumptions on $G$ and $p$ our algorithm returns a good approximation for the number of triangles with high probability. Specifically, we show that if $p \ge \max\left(\frac{\mathrm{polylog}(n)\,\Delta}{t}, \frac{\mathrm{polylog}(n)}{t^{1/3}}\right)$, where $n$, $t$, $\Delta$, and $T$ denote the number of vertices in $G$, the number of triangles in $G$, the maximum number of triangles an edge of $G$ is contained in, and our triangle count estimate respectively, then $T$ is strongly concentrated around $t$: $\Pr[|T - t| \ge \epsilon t] \le n^{-K}$. We illustrate the efficiency of our algorithm on various large real-world datasets where we obtain significant speedups. Finally, we investigate cut and spectral sparsifiers with respect to triangle counting and show that they are not optimal.
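The sampling scheme described in this abstract (keep each edge with probability p, count triangles in the sparsified graph, rescale by p^-3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `count_triangles` and `estimate_triangles` are hypothetical, the exact counter is a simple stand-in for whatever black-box counter is used, and the input is assumed to list each undirected edge once.

```python
import random


def count_triangles(edges):
    """Exactly count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    seen = set()
    total = 0
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:  # skip self-loops and repeated edges
            continue
        seen.add(key)
        # Common neighbours of u and v close a triangle over the edge (u, v).
        total += len(adj[u] & adj[v])
    return total // 3  # each triangle was found once from each of its 3 edges


def estimate_triangles(edges, p, seed=None):
    """Estimate the triangle count by edge sampling (assumes each edge is listed once)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    # Keep each edge independently with probability p.
    kept = [e for e in edges if rng.random() < p]
    # A triangle survives only if all three of its edges do (probability p**3),
    # so dividing by p**3 makes the estimate unbiased.
    return count_triangles(kept) / p ** 3
```

With p = 1 every edge is kept and the estimate equals the exact count; smaller p trades accuracy for a smaller graph to enumerate.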
Patric: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks
2012
Cited by 16 (5 self)
We present MPI-based parallel algorithms for counting triangles and computing clustering coefficients in massive networks. A triangle in a graph G(V, E) is a set of three nodes u, v, w ∈ V such that there is an edge between each pair of nodes. The number of triangles incident on node v, with adjacency list N(v), is defined as $|\{(u, w) \in E \mid u, w \in N(v)\}|$. Counting triangles is important in the analysis of various networks, e.g., social, biological, and web networks. Emerging massive networks do not fit in the main memory of a single machine and are very challenging to work with. Our distributed-memory parallel algorithm allows us to deal with such massive networks in a time- and space-efficient manner. We were able to count triangles in a graph with 2 billion nodes and 50 billion edges in 10 minutes. The clustering coefficient (CC) of a node v ∈ V with degree $d_v$ is defined as,
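The per-node triangle count defined in this abstract, the number of edges (u, w) with both endpoints in N(v), can be computed directly on a small graph. The sketch below is sequential and single-machine, so it illustrates only the definition, not the MPI-based algorithm; the function name `triangles_per_node` is hypothetical.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Return {v: number of triangles incident on v} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    counts = {}
    for v, nbrs in adj.items():
        # Count unordered neighbour pairs {u, w} of v that are themselves adjacent.
        counts[v] = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return counts
```

Because every triangle is incident on exactly three nodes, summing the per-node counts and dividing by 3 gives the total number of triangles in the graph.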
Triadic Measures on Graphs: The Power of Wedge Sampling
2012
Cited by 15 (3 self)
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We propose a new method based on wedge sampling. This versatile technique allows for the fast and accurate approximation of all current variants of clustering coefficients and enables rapid uniform sampling of the triangles of a graph. Our methods come with provable and practical time-approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state-of-the-art, while providing nearly the accuracy of full enumeration. Our results will enable more wide-scale adoption of triadic measures for analysis of extremely large graphs, as demonstrated on several real-world examples.
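The abstract does not spell out the sampling procedure, so the following is only a sketch of the general idea, under stated assumptions: a wedge is a path u - v - w centered at v, a center is drawn with probability proportional to its number of wedges, and the fraction of sampled wedges whose endpoints are adjacent estimates the global clustering coefficient (transitivity). The function name `estimate_transitivity` and the dict-of-sets input format are hypothetical choices for this illustration.

```python
import random


def estimate_transitivity(adj, num_samples, seed=None):
    """Estimate the fraction of closed wedges in an undirected graph.

    adj maps each node to the set of its neighbours (assumed symmetric).
    """
    rng = random.Random(seed)
    # Only nodes with at least two neighbours are the center of any wedge.
    centers = [v for v in adj if len(adj[v]) >= 2]
    if not centers or num_samples <= 0:
        return 0.0
    nbrs = {v: list(adj[v]) for v in centers}
    # A node of degree d is the center of d * (d - 1) / 2 wedges.
    weights = [len(nbrs[v]) * (len(nbrs[v]) - 1) // 2 for v in centers]
    closed = 0
    for v in rng.choices(centers, weights=weights, k=num_samples):
        u, w = rng.sample(nbrs[v], 2)  # two distinct neighbours of v
        if w in adj[u]:  # the wedge u - v - w is closed by the edge (u, w)
            closed += 1
    return closed / num_samples
```

The cost depends on the number of samples rather than on the number of triangles, which is what makes this style of estimator attractive for very large graphs.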
COUNTING TRIANGLES IN MASSIVE GRAPHS WITH MAPREDUCE
2013
Cited by 12 (4 self)
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for large-scale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with extremely massive graphs. We show results on publicly available networks, the largest of which has 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 sec. per million edges plus overhead (approximately 225 sec. total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and non-social networks. To the best of our knowledge, these are the largest triangle-based graph computations published to date.
Massive graph triangulation
In ACM SIGMOD Conference on Management of Data, 2013
Cited by 12 (0 self)
This paper studies I/O-efficient algorithms for settling the classic triangle listing problem, whose solution is a basic operator in dealing with many other graph problems. Specifically, given an undirected graph G, the objective of triangle listing is to find all the cliques involving 3 vertices in G. The problem has been well studied in internal memory, but remains an urgent and difficult challenge when G does not fit in memory, forcing any algorithm to incur frequent I/O accesses. Although previous research has attempted to tackle the challenge, the state-of-the-art solutions rely on a set of crippling assumptions to guarantee good performance. Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. The algorithm uses ideas drastically different from all the previous approaches, and outperformed the existing competitors by over an order of magnitude in our extensive experimentation.
Spectral counting of triangles in power-law networks via element-wise sparsification
In SODA ’02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, 2009
Cited by 8 (2 self)
Triangle counting is an important problem in graph mining. The clustering coefficient and the transitivity ratio, two commonly used measures, effectively quantify the triangle density to capture the fact that friends of friends tend to be friends themselves. Furthermore, several successful graph mining applications rely on the number of triangles. In this paper, we study the problem of counting triangles in large, power-law networks. Our algorithm, SPARSIFYINGEIGENTRIANGLE, relies on the spectral properties of power-law networks and the Achlioptas-McSherry sparsification process. SPARSIFYINGEIGENTRIANGLE is easy to parallelize, fast and accurate. We verify the validity of our approach with several experiments on real-world graphs, where we achieve both high accuracy and a substantial speedup over a straightforward exact counting competitor.
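The abstract does not state which spectral property is used. One standard identity that spectral triangle counting can build on is given below; the symbols are assumptions introduced here, with $A$ the adjacency matrix of a simple undirected graph on $n$ vertices, $\lambda_1, \dots, \lambda_n$ its eigenvalues, and $t$ the number of triangles.

```latex
t \;=\; \frac{1}{6}\operatorname{tr}\!\left(A^{3}\right) \;=\; \frac{1}{6}\sum_{i=1}^{n} \lambda_i^{3}
```

The identity holds because $\operatorname{tr}(A^3)$ counts closed walks of length 3, and each triangle contributes six of them (three starting vertices, two directions). Truncating the sum to the few eigenvalues of largest magnitude gives a natural approximation; whether this matches the paper's exact procedure is an assumption.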