Results

**11 - 17** of **17**

### Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts

Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and few known practical results scale to massive sizes. Indeed, even a highly tuned enumeration code takes more than a day on a graph with millions of edges. Most previous work that runs on truly massive graphs employs clusters and massive parallelization. We provide a sampling algorithm that provably and accurately approximates the frequencies of all 4-vertex pattern subgraphs. Our algorithm is based on a novel technique of 3-path sampling and a special pruning scheme to decrease

### MADHAV JHA, Sandia National Laboratories; C. SESHADHRI, Sandia National Laboratories

"... space efficient streaming algorithm for estimating transitivity and ..."

### A Novel Approach to Finding Near-Cliques: The Triangle-Densest Subgraph Problem

2014

Many graph mining applications rely on detecting subgraphs which are large near-cliques. There exists a dichotomy between the results in the existing work related to this problem: on the one hand, formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem. On the other hand, the densest subgraph problem (DS-Problem), which maximizes the average degree over all subgraphs, and other indirect approaches which optimize tractable objectives fail to detect large near-cliques in many networks. In this work, we propose a formulation which combines the best of both worlds: it is solvable in polynomial time and succeeds consistently in finding large near-cliques. Surprisingly, our formulation is a simple variation of the DS-Problem. Specifically, we define the triangle densest subgraph problem (TDS-Problem): given a graph $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*) = \max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$. We provide various exact and approximation algorithms which solve the TDS-Problem efficiently. Furthermore, we show how our algorithms adapt to the more general problem of maximizing the k-clique average density, k ≥ 2. We illustrate the success of the proposed formulation in extracting large near-cliques from graphs by performing numerous experiments on real-world networks.
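
To make the triangle-density objective concrete, the following is a minimal sketch that evaluates t(S)/|S| and finds a maximizer by exhaustive search on a toy graph. It is not one of the exact or approximation algorithms the abstract refers to: the brute-force search is exponential and only illustrates the definition. The function names (`triangle_count`, `triangle_density`, `brute_force_tds`, `build_adjacency`) and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_count(adj, nodes):
    """t(S): number of triangles induced by the vertex set `nodes`."""
    return sum(
        1
        for u, v, w in combinations(sorted(set(nodes)), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def triangle_density(adj, nodes):
    """t(S) / |S| for a non-empty vertex set S."""
    return triangle_count(adj, nodes) / len(nodes)


def brute_force_tds(adj):
    """Try every non-empty subset (illustration only; exponential time)."""
    vertices = sorted(adj)
    best_set, best_density = None, -1.0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            density = triangle_density(adj, subset)
            if density > best_density:
                best_set, best_density = set(subset), density
    return best_set, best_density


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus a path 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = build_adjacency(edges)
best_set, best_density = brute_force_tds(adj)
print(best_set, best_density)
```

On this toy graph the 4-clique contains four triangles on four vertices, so its triangle density is 1, while adding either path vertex only enlarges the denominator. The search therefore selects the clique, which matches the near-clique behaviour the abstract describes.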

### FENNEL: Streaming Graph Partitioning . . .

2012

... efficient solving of a wide range of computational tasks and querying over large-scale graph data, such as computing node centralities using iterative computations, and personalized recommendations. In this work, we introduce a unifying framework for graph partitioning which enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementation. We show that many previously proposed methods are special instances of this framework. We also derive a novel one-pass streaming graph partitioning algorithm and show that it yields significant benefits over previous approaches on a large set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be overall comparable to the de facto standard offline software