## Efficient Aggregation for Graph Summarization

Citations: 39 (3 self)

### BibTeX

@MISC{Tian_efficientaggregation,

author = {Yuanyuan Tian and Richard A. Hankins and Jignesh M. Patel},

title = {Efficient Aggregation for Graph Summarization},

year = {}

}

### Abstract

Graphs are widely used to model real-world objects and their relationships, and large graph datasets are common in many application domains. Graph summarization techniques are critical for understanding the underlying characteristics of large graphs. However, most existing graph summarization methods are statistical, reporting measures such as degree distributions, hop-plots, and clustering coefficients. These statistical methods are useful, but the resolution of the resulting summaries is hard to control. In this paper, we introduce two database-style operations to summarize graphs. Like OLAP-style aggregation methods, which allow users to drill down or roll up to control the resolution of summarization, our methods provide analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on user-selected node attributes and relationships. The second operation, called k-SNAP, further allows users to control the resolution of summaries and provides “drill-down” and “roll-up” abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the k-SNAP computation is NP-complete, and we propose two heuristic methods to approximate the k-SNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
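To make the grouping idea concrete, here is a minimal sketch of a SNAP-like summary. It is not the paper's Algorithm 1. It assumes a single undirected relationship type and one attribute per node, and the names `snap_like_summary`, `_compress`, `attrs`, and `edges` are hypothetical. Nodes start out grouped by attribute value, and groups are split repeatedly until all nodes in a group are connected to the same set of groups.

```python
from collections import defaultdict


def _compress(signature_by_node):
    """Map arbitrary hashable signatures to small integer group ids."""
    ids = {}
    return {n: ids.setdefault(sig, len(ids)) for n, sig in signature_by_node.items()}


def snap_like_summary(attrs, edges):
    """Group nodes by attribute, then refine by neighbor-group participation.

    attrs: dict mapping node -> hashable attribute value (assumed input shape)
    edges: iterable of (u, v) pairs for one undirected relationship type
    Returns (group, group_edges): node -> group id, and the set of
    group-level edges of the summary graph.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Initial grouping: nodes with equal attribute values share a group.
    group = _compress(attrs)
    while True:
        # A node's signature is its current group plus the set of groups
        # its neighbors belong to; differing signatures split a group.
        refined = _compress({
            n: (group[n], frozenset(group[m] for m in neighbors[n]))
            for n in attrs
        })
        # Refinement only ever splits groups, so an unchanged group count
        # means the partition is stable.
        if len(set(refined.values())) == len(set(group.values())):
            break
        group = refined

    group_edges = {tuple(sorted((group[u], group[v]))) for u, v in edges}
    return group, group_edges
```

For example, with four blogs where both liberal blogs link to a conservative blog, `snap_like_summary` keeps two groups joined by one summary edge. If only one liberal blog has such a link, the liberal group is split, because its members no longer relate to the same set of groups.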

### Citations

1632 | The structure and function of complex networks
- Newman
- 2003
Citation Context ...h summarization methods are required to help users extract and understand the underlying information. Most existing graph summarization methods use simple statistics to describe graph characteristics [6, 7, 13]; for example, researchers plot degree distributions to investigate the scale-free property of graphs, employ hop-plots to study the small world effect, and utilize clustering coefficients to measure ...

828 | Finding and evaluating community structure in networks
- Newman, Girvan
Citation Context ...re also employed to understand the characteristics of large graphs. However, these algorithms often produce a large number of results that can easily overwhelm the user. Graph partitioning algorithms [14, 18, 22] have been used to detect community structures (dense subgraphs) in large networks. However, the community detection is based purely on nodes connectivities, and the attributes of nodes are largely ig...

478 | gSpan: Graph-based substructure pattern mining
- Yan, Han
- 2002
Citation Context ...“clumpiness” of large graphs. While these methods are useful, the summaries contain limited information and can be difficult to interpret and manipulate. Methods that mine graphs for frequent patterns [11, 19, 20, 23] are also employed to understand the characteristics of large graphs. However, these algorithms often produce a large number of results that can easily overwhelm the user. Graph partitioning algorithm...

369 | Graph visualization and navigation in information visualization: a survey
- Herman, Melancon, et al.
- 2000
Citation Context ...ity structures (dense subgraphs) in large networks. However, the community detection is based purely on nodes connectivities, and the attributes of nodes are largely ignored. Graph drawing techniques [3, 10] can help one better visualize graphs, but visualizing large graphs quickly becomes overwhelming. What users need is a more controlled and intuitive method for summarizing graphs. The summarization me...

211 | The political blogosphere and the 2004 US election: divided they blog
- Adamic, Glance
- 2005
Citation Context ...een authors. The statistics for these four datasets are shown in Table 1. Political Blogs Dataset This dataset is a network of 1490 webblogs on US politics and 19090 hyperlinks between these webblogs [1] (downloaded from http://www-personal.umich.edu/~mejn/netdata/). Each blog in this dataset has an attribute describing its political leaning as either liberal or conservative. 1 DB: VLDB J., TODS, KD...

190 | The Web as a Graph
- Kumar, Raghavan, et al.
- 2000
Citation Context ...nough. Unlike these existing methods, we introduce two database-style operations to summarize large graphs. Our method allows users to easily control and navigate through summaries. Previous research [4, 5, 15] have also studied the problem of compressing large graphs, especially Web graphs. However, these graph compression methods mainly focus on compact graph representation for easy storage and manipulati...

172 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context ...nough. Unlike these existing methods, we introduce two database-style operations to summarize large graphs. Our method allows users to easily control and navigate through summaries. Previous research [4, 5, 15] have also studied the problem of compressing large graphs, especially Web graphs. However, these graph compression methods mainly focus on compact graph representation for easy storage and manipulati...

151 | R-MAT: A recursive model for graph mining
- Chakrabarti, Zhan, et al.
- 2004
Citation Context ..., ECOOP; AI: IJCAI, AAAI, AAAI/IAAI, Artif. Intell. Synthetic Dataset Most real world graphs show power-law degree distributions and small-world characteristics [13]. Therefore, we use the R-MAT model [8] in the GTgraph suites [2] to generate graphs with power-law degree distributions and small-world characteristics. Based on the statistics in Table 1, we set the average node degree in each synthetic ...

104 | Graph Drawing: Algorithms for the Visualization of Graphs
- Battista, Eades, et al.
- 1999
Citation Context ...ity structures (dense subgraphs) in large networks. However, the community detection is based purely on nodes connectivities, and the attributes of nodes are largely ignored. Graph drawing techniques [3, 10] can help one better visualize graphs, but visualizing large graphs quickly becomes overwhelming. What users need is a more controlled and intuitive method for summarizing graphs. The summarization me...

77 | Graph mining: Laws, generators, and algorithms
- Chakrabarti, Faloutsos
- 2006
Citation Context ...h summarization methods are required to help users extract and understand the underlying information. Most existing graph summarization methods use simple statistics to describe graph characteristics [6, 7, 13]; for example, researchers plot degree distributions to investigate the scale-free property of graphs, employ hop-plots to study the small world effect, and utilize clustering coefficients to measure ...

77 | State of the Art of Graph-based Data Mining
- Washio, Motoda
- 2003
Citation Context ...“clumpiness” of large graphs. While these methods are useful, the summaries contain limited information and can be difficult to interpret and manipulate. Methods that mine graphs for frequent patterns [11, 19, 20, 23] are also employed to understand the characteristics of large graphs. However, these algorithms often produce a large number of results that can easily overwhelm the user. Graph partitioning algorithm...

69 | Spin: mining maximal frequent subgraphs from graph databases
- Huan, Wang, et al.
- 2004
Citation Context ...“clumpiness” of large graphs. While these methods are useful, the summaries contain limited information and can be difficult to interpret and manipulate. Methods that mine graphs for frequent patterns [11, 19, 20, 23] are also employed to understand the characteristics of large graphs. However, these algorithms often produce a large number of results that can easily overwhelm the user. Graph partitioning algorithm...

59 | An efficient algorithm for graph isomorphism
- Corneil, Gottlieb
- 1970
Citation Context ...ent of relationships between node groups, and produces user controllable multi-resolution summaries. The SNAP algorithm (Algorithm 1) shares similarity with the automorphism partitioning algorithm in [9]. However, the automorphism partitioning algorithm only partitions nodes based on node degrees and relationships, whereas SNAP can be evaluated based on arbitrary node attributes and relationships tha...

59 | Scan: A structural clustering algorithm for networks
- Xu, Yuruk, et al.
- 2007
Citation Context ...re also employed to understand the characteristics of large graphs. However, these algorithms often produce a large number of results that can easily overwhelm the user. Graph partitioning algorithms [14, 18, 22] have been used to detect community structures (dense subgraphs) in large networks. However, the community detection is based purely on nodes connectivities, and the attributes of nodes are largely ig...

36 | Compact Representations of Separable Graphs
- Blandford, Blelloch, et al.
- 2003
Citation Context ...nough. Unlike these existing methods, we introduce two database-style operations to summarize large graphs. Our method allows users to easily control and navigate through summaries. Previous research [4, 5, 15] have also studied the problem of compressing large graphs, especially Web graphs. However, these graph compression methods mainly focus on compact graph representation for easy storage and manipulati...

10 | How hard is it to determine if a graph has a 2-role assignment? Networks 37
- Roberts, Sheng
- 2001
Citation Context ...nomial time that ∆(ΦA) ≤ D. And an A-compatible grouping ΦA of size k can be generated by a polynomial time algorithm. (2) This problem contains a known NP-complete problem 2-Role Assignability (2RA) [16] as a special case. By restricting A = ∅, |R| = 1, k = 2 and D = 0, this problem becomes 2RA (which decides whether the nodes in a graph can be assigned with 2 roles, each node with one of the roles, ...

9 | Less is more: Sparse graph mining with compact matrix decomposition
- Sun, Xie, et al.
- 2008
Citation Context ...re also employed to understand the characteristics of large graphs. However, these algorithms often produce a large number of results that can easily overwhelm the user. Graph partitioning algorithms [14, 18, 22] have been used to detect community structures (dense subgraphs) in large networks. However, the community detection is based purely on nodes connectivities, and the attributes of nodes are largely ig...

9 | Graphminer: a structural pattern-mining system for large disk-based graph databases and its applications
- Wang, Wang, et al.
- 2005

7 | Visualization of large networks with min-cut plots, A-plots and R-MAT
- Chakrabarti, Faloutsos, et al.
- 2007
Citation Context ...h summarization methods are required to help users extract and understand the underlying information. Most existing graph summarization methods use simple statistics to describe graph characteristics [6, 7, 13]; for example, researchers plot degree distributions to investigate the scale-free property of graphs, employ hop-plots to study the small world effect, and utilize clustering coefficients to measure ...

6 | GTgraph: A suite of synthetic graph generators. http://hpcrd.lbl.gov/~kamesh/GTgraph
- Bader, Madduri
Citation Context ...AAAI/IAAI, Artif. Intell. Synthetic Dataset Most real world graphs show power-law degree distributions and small-world characteristics [13]. Therefore, we use the R-MAT model [8] in the GTgraph suites [2] to generate graphs with power-law degree distributions and small-world characteristics. Based on the statistics in Table 1, we set the average node degree in each synthetic graph to 5. We used the de...

4 | Supergraph visualization
- Jr, Traina, et al.
- 2006
Citation Context ... produces an overwhelmingly large number of frequent patterns. Various graph partitioning algorithms [14, 18, 22] are used to detect community structures (dense subgraphs) in large graphs. SuperGraph [17] employs hierarchical graph partitioning to visualize large graphs. However, graph partitioning techniques largely ignore the node attributes in the summarization. Studies on graph visualization are s...

2 | Graph and semigroup homomorphisms on semigroups of relations. Social Networks 5
- White, Reitz
- 1983
Citation Context ... focus on compact graph representation for easy storage and manipulation, whereas graph summarization methods aim at producing small and understandable summaries. Regular equivalence is introduced in [21] to study social roles of nodes based on graphs structures in social networks. It shares resemblance with the SNAP operation. However, regular equivalence is defined only based on the relationshipsbe...