Results 1–10 of 21
COLORFUL TRIANGLE COUNTING AND A MAPREDUCE IMPLEMENTATION
Abstract

Cited by 28 (5 self)
In this note we introduce a new randomized algorithm for counting triangles in graphs. We show that under mild conditions, the estimate of our algorithm is strongly concentrated around the true number of triangles. Specifically, let G be a graph with n vertices, t triangles, and let Δ be the maximum number of triangles an edge of G is contained in. Also, let N = 1/p be the number of colors we use in our randomized algorithm. We show that if p ≥ max(…
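The coloring idea sketched in the abstract can be illustrated as follows, under the standard formulation of colorful sampling: assign each vertex one of N = 1/p colors uniformly at random, keep only monochromatic edges, count triangles in the resulting subgraph, and rescale by p⁻² = N². This is a minimal sketch, not the paper's implementation; all names are illustrative.

```python
import random

def colorful_triangle_estimate(vertices, edges, num_colors, seed=0):
    """Estimate the triangle count of (vertices, edges): color vertices
    uniformly with num_colors colors, keep monochromatic edges, count
    triangles among them, and rescale by p**-2 where p = 1/num_colors."""
    rng = random.Random(seed)
    color = {v: rng.randrange(num_colors) for v in vertices}
    kept = [(u, v) for (u, v) in edges if color[u] == color[v]]
    adj = {v: set() for v in vertices}
    for u, v in kept:
        adj[u].add(v)
        adj[v].add(u)
    count = sum(len(adj[u] & adj[v]) for u, v in kept)
    mono = count // 3  # each triangle is counted once per edge, i.e. 3 times
    return mono * num_colors ** 2
```

With a single color the sparsification is trivial and the estimate is exact; with more colors it trades accuracy for speed, which is where the concentration bound in the abstract applies.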
Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods
Abstract

Cited by 22 (1 self)
The communities of a social network are sets of vertices with more connections inside the set than outside. We theoretically demonstrate that two commonly observed properties of social networks, heavy-tailed degree distributions and large clustering coefficients, imply the existence of vertex neighborhoods (also known as egonets) that are themselves good communities. We evaluate these neighborhood communities on a range of graphs. We find that the neighborhood communities can exhibit conductance scores that are as good as the Fiedler cut. Also, the conductance of neighborhood communities behaves similarly to the network community profile computed with a personalized PageRank community detection method. Neighborhood communities give us a simple and powerful heuristic for speeding up local partitioning methods. Since finding good seeds for the PageRank clustering method is difficult, most approaches involve an expensive sweep over a great many starting vertices. We show how to use neighborhood communities to quickly generate a small set of seeds.
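The egonet heuristic above can be sketched directly from the definitions: score each vertex neighborhood {v} ∪ N(v) by its conductance and return the best one as a seed. This is an illustrative sketch (quadratic in the worst case, with made-up function names), not the authors' evaluation code.

```python
def conductance(adj, S):
    """phi(S) = cut(S, complement) / min(vol(S), vol(complement)),
    where vol sums degrees and cut counts boundary edges."""
    S = set(S)
    vol_S = sum(len(adj[v]) for v in S)
    vol_total = sum(len(adj[v]) for v in adj)
    cut = sum(1 for v in S for u in adj[v] if u not in S)
    denom = min(vol_S, vol_total - vol_S)
    return cut / denom if denom else 1.0

def best_neighborhood_seed(adj):
    """Score every egonet {v} | N(v); return (phi, v, egonet) minimizing phi."""
    best = None
    for v in adj:
        ego = {v} | adj[v]
        phi = conductance(adj, ego)
        if best is None or phi < best[0]:
            best = (phi, v, ego)
    return best
```

On two triangles joined by a single bridge edge, the winning egonet is one of the triangles, matching the intuition that egonets of clustered vertices are good communities.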
Triangle sparsifiers
 Journal of Graph Algorithms and Applications
Abstract

Cited by 17 (5 self)
In this work, we introduce the notion of triangle sparsifiers, i.e., sparse graphs that are approximately the same as the original graph with respect to the triangle count. This results in a practical triangle counting method with strong theoretical guarantees. For instance, for unweighted graphs we show a randomized algorithm for approximately counting the number of triangles in a graph G, which proceeds as follows: keep each edge independently with probability p, enumerate the triangles in the sparsified graph G′, and return the number of triangles found in G′ multiplied by p⁻³. We prove that under mild assumptions on G and p our algorithm returns a good approximation for the number of triangles with high probability. Specifically, we show that if p ≥ max(polylog(n)·Δ/t, polylog(n)/t^{1/3}), where n, t, Δ, and T denote the number of vertices in G, the number of triangles in G, the maximum number of triangles an edge of G is contained in, and our triangle count estimate, respectively, then T is strongly concentrated around t: Pr[|T − t| ≥ εt] ≤ n^{−K}. We illustrate the efficiency of our algorithm on various large real-world datasets, where we obtain significant speedups. Finally, we investigate cut and spectral sparsifiers with respect to triangle counting and show that they are not optimal.
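The sparsify-then-count procedure is described explicitly in the abstract, so it can be sketched in a few lines; this is a minimal illustration with illustrative names, not the authors' code.

```python
import random

def sparsified_triangle_estimate(edges, p, seed=0):
    """Keep each edge independently with probability p, count triangles in
    the sparsified graph G', and return that count times p**-3."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    adj = {}
    for u, v in kept:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # each triangle is found once per edge, so divide by 3
    t = sum(len(adj[u] & adj[v]) for u, v in kept) // 3
    return t / p ** 3
```

Each triangle survives sparsification with probability p³, which is exactly what the p⁻³ rescaling corrects for in expectation; the abstract's condition on p is what makes the estimate concentrated rather than merely unbiased.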
Patric: A parallel algorithm for counting triangles and computing clustering coefficients in massive networks
, 2012
Abstract

Cited by 16 (5 self)
We present MPI-based parallel algorithms for counting triangles and computing clustering coefficients in massive networks. A triangle in a graph G(V, E) is a set of three nodes u, v, w ∈ V such that there is an edge between each pair of nodes. The number of triangles incident on node v, with adjacency list N(v), is defined as T(v) = |{(u, w) ∈ E : u, w ∈ N(v)}|. Counting triangles is important in the analysis of various networks, e.g., social, biological, and web networks. Emerging massive networks do not fit in the main memory of a single machine and are very challenging to work with. Our distributed-memory parallel algorithm allows us to deal with such massive networks in a time- and space-efficient manner. We were able to count triangles in a graph with 2 billion nodes and 50 billion edges in 10 minutes. The clustering coefficient (CC) of a node v ∈ V with degree d_v is defined as C(v) = T(v) / (d_v(d_v − 1)/2).
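The two quantities defined above, T(v) and C(v), can be computed sequentially from an adjacency-set representation; this is a single-machine sketch of the definitions only (the paper's contribution is the MPI parallelization, which is not reproduced here).

```python
def triangles_per_node(adj):
    """Return T(v) = |{(u, w) in E : u, w in N(v)}| and the clustering
    coefficient C(v) = T(v) / C(d_v, 2) for every node v."""
    T, cc = {}, {}
    for v, nbrs in adj.items():
        # edges (u, w) inside N(v) are seen from both endpoints, so halve
        t = sum(len(nbrs & adj[u]) for u in nbrs) // 2
        T[v] = t
        d = len(nbrs)
        cc[v] = t / (d * (d - 1) / 2) if d > 1 else 0.0
    return T, cc
```

On a single triangle every node has T(v) = 1 and C(v) = 1; the distributed algorithm partitions this per-node work across MPI ranks.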
Triadic Measures on Graphs: The Power of Wedge Sampling
, 2012
Abstract

Cited by 15 (3 self)
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We propose a new method based on wedge sampling. This versatile technique allows for the fast and accurate approximation of all current variants of clustering coefficients and enables rapid uniform sampling of the triangles of a graph. Our methods come with provable and practical time–approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state-of-the-art, while providing nearly the accuracy of full enumeration. Our results will enable more wide-scale adoption of triadic measures for analysis of extremely large graphs, as demonstrated on several real-world examples.
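Wedge sampling estimates the global clustering coefficient as the fraction of sampled wedges (two-edge paths u–v–w) that are closed by an edge (u, w). The sketch below draws wedges uniformly by picking centers with probability proportional to their wedge count C(d_v, 2); it is an illustration of the technique, not the authors' implementation.

```python
import random

def wedge_sampling_cc(adj, num_samples, seed=0):
    """Estimate the global clustering coefficient by sampling uniform
    random wedges and reporting the fraction that are closed."""
    rng = random.Random(seed)
    nodes = [v for v in adj if len(adj[v]) >= 2]
    # a node of degree d is the center of C(d, 2) wedges
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    closed = 0
    for _ in range(num_samples):
        v = rng.choices(nodes, weights=weights)[0]
        u, w = rng.sample(sorted(adj[v]), 2)
        if w in adj[u]:
            closed += 1
    return closed / num_samples
```

The sample size needed for a given accuracy depends only on the target error and confidence, not on the graph size, which is the source of the speedups reported in the abstract.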
COUNTING TRIANGLES IN MASSIVE GRAPHS WITH MAPREDUCE
, 2013
Abstract

Cited by 12 (4 self)
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for large-scale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with extremely massive graphs. We show results on publicly available networks, the largest of which is 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 sec. per million edges plus overhead (approximately 225 sec. total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and non-social networks. To the best of our knowledge, these are the largest triangle-based graph computations published to date.
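The per-bin reporting mentioned above relies on exponential degree binning, i.e., grouping nodes whose degrees fall in [b^k, b^{k+1}). The paper does not spell out its binning code, so this is a hypothetical illustration of the bookkeeping only.

```python
import math
from collections import defaultdict

def cc_by_degree_bin(degrees, local_cc, base=2):
    """Average local clustering coefficient per exponential degree bin:
    bin k covers degrees in [base**k, base**(k + 1))."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, d in degrees.items():
        if d < 1:
            continue
        k = int(math.log(d, base))
        sums[k] += local_cc[v]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

In a MapReduce setting the map phase would emit (bin, cc) pairs and the reduce phase would average per key; the function above collapses both phases into one loop.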
The input/output complexity of triangle enumeration
 In PODS'14
, 2014
Listing triangles
 In Automata, Languages, and Programming  41st International Colloquium, ICALP 2014
Abstract

Cited by 9 (0 self)
We present new algorithms for listing triangles in dense and sparse graphs. The running time of our algorithm for dense graphs is Õ(n^ω + n^{3(ω−1)/(5−ω)} t^{2(3−ω)/(5−ω)}), and the running time of the algorithm for sparse graphs is Õ(m^{2ω/(ω+1)} + m^{3(ω−1)/(ω+1)} t^{(3−ω)/(ω+1)}), where n is the number of vertices, m is the number of edges, t is the number of triangles to be listed, and ω < 2.373 is the exponent of fast matrix multiplication. With the current bound on ω, the running times of our algorithms are Õ(n^{2.373} + n^{1.568} t^{0.478}) and Õ(m^{1.408} + m^{1.222} t^{0.186}), respectively. We first obtain randomized algorithms with the desired running times and then derandomize them using sparse recovery techniques. If ω = 2, the running times of the algorithms become Õ(n^2 + n t^{2/3}) and Õ(m^{4/3} + m t^{1/3}), respectively. In particular, if ω = 2, our algorithm lists m triangles in Õ(m^{4/3}) time. Pǎtraşcu (STOC 2010) showed that Ω(m^{4/3−o(1)}) time is required for listing m triangles, unless there exist subquadratic algorithms for 3SUM. We show that unless one can solve quadratic equation systems over a finite field significantly faster than the brute-force algorithm, our triangle listing runtime bounds are tight assuming ω = 2, also for graphs with more triangles.
Parallel triangle counting in massive streaming graphs
 in Proc. of CIKM
, 2013
Abstract

Cited by 7 (2 self)
The number of triangles in a graph is a fundamental metric, used in social network analysis, link classification and recommendation, and more. Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. It brings together the benefits of streaming algorithms and parallel algorithms. By building on the streaming algorithms framework, the algorithm has a small memory footprint. By leveraging the parallel cache-oblivious framework, it makes efficient use of the memory hierarchy of modern multicore machines without needing to know its specific parameters. We prove theoretical bounds on accuracy, memory access cost, and parallel runtime complexity, as well as showing empirically that the algorithm yields accurate results and substantial speedups compared to an optimized sequential implementation. (This is an expanded version of a CIKM’13 paper of the same title.)
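To make the streaming setting concrete, here is a simple single-pass estimator (not the paper's parallel cache-oblivious algorithm): each arriving edge is checked against the edges sampled so far for triangles it closes, then itself kept with probability p. A triangle is seen exactly once, when its last edge arrives, and only if its two earlier edges were both sampled (probability p²), so rescaling by p⁻² gives an unbiased estimate.

```python
import random

def streaming_triangle_estimate(edge_stream, p, seed=0):
    """Single pass over an edge stream: count triangles closed against the
    sampled adjacency structure, then sample the new edge with prob. p.
    Rescale by p**-2 since each triangle's two earlier edges must both
    have been sampled when its last edge arrives."""
    rng = random.Random(seed)
    adj = {}  # adjacency over sampled edges only
    hits = 0
    for u, v in edge_stream:
        hits += len(adj.get(u, set()) & adj.get(v, set()))
        if rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return hits / p ** 2
```

Memory is proportional to the number of sampled edges (about p·m), which is the small-footprint property the abstract refers to; the paper's contribution is doing this in parallel with provable cache efficiency.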
Wedge sampling for computing clustering coefficients and triangle counts on large graphs
 Statistical Analysis and Data Mining
, 2014
Abstract

Cited by 3 (0 self)
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimation of local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.