## Comparing Stars: On Approximating Graph Edit Distance (2009)

Citations: 11 (0 self)

### BibTeX

@MISC{Zeng09comparingstars:,

author = {Zhiping Zeng and Anthony K. H. Tung and Jianyong Wang and Jianhua Feng and Lizhu Zhou},

title = {Comparing Stars: On Approximating Graph Edit Distance},

year = {2009}

}

### Abstract

Graph data have become ubiquitous, and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures for determining similarity between graphs and has extensive applications in fields such as pattern recognition and computer vision. Unfortunately, computing graph edit distance is NP-hard in general. Accordingly, in this paper we introduce three novel methods to compute upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms, AppFull and AppSub, are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. The results show that these methods scale well in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of our bounds in filtering and searching graphs.

### Citations

10926 | Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman
- Garey, Johnson
- 1979
Citation Context ... we assume they are also deleted, this brings about imbalanced treatment when deleting nodes of different degrees and also between deleting a node and an edge (which does not affect any nodes). NP-Hard [11] in general, it is likely that GED problem is NP-Hard as well. We confirm this possibility here. Lemma 2.1. Given two graphs g1 and g2, λ(g1, g2) ≥ ||E2| − |E1|| + ||V2| − |V1||. Proof. Given that the ... |
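
As a minimal sketch of how the size-based bound quoted in Lemma 2.1 above might serve as a cheap filter, the Python snippet below computes ||E2| − |E1|| + ||V2| − |V1|| from vertex and edge counts alone. The function names and the threshold-based pruning helper are assumptions made for illustration; they are not taken from the paper.

```python
def size_lower_bound(num_v1, num_e1, num_v2, num_e2):
    """Lower bound from Lemma 2.1: ||E2| - |E1|| + ||V2| - |V1||."""
    return abs(num_e2 - num_e1) + abs(num_v2 - num_v1)


def can_prune(num_v1, num_e1, num_v2, num_e2, threshold):
    """Hypothetical filter: skip a pair whose lower bound already exceeds the threshold."""
    return size_lower_bound(num_v1, num_e1, num_v2, num_e2) > threshold


if __name__ == "__main__":
    # g1 has 4 vertices and 3 edges; g2 has 6 vertices and 7 edges.
    print(size_lower_bound(4, 3, 6, 7))
    print(can_prune(4, 3, 6, 7, threshold=5))
```

Because the bound needs only four integers per pair, a filter of this kind could run before any more expensive bound is computed.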

975 | A Formal Basis for the Heuristic Determination of Minimum Cost Paths
- Hart, Nilsson, et al.
- 1968
Citation Context ...m [22, 25]. All of them fall into two categories: exact algorithms and heuristic algorithms. The most widely used method for computing exact graph edit distance is based on the well-known A* algorithm [12], and Kaspar Riesen et al. used a bipartite heuristic to speed up the computation procedure [25]. However, as stated in [22], in practice this kind of algorithm is practical for computing the edit dist... |

758 | The Hungarian method for the assignment problem
- Kuhn
- 1955
Citation Context ... the bipartite graph matching problem, we first create an n×n matrix in which each element represents the edit distance between the ith star in S(g1) and the jth star in S(g2). The Hungarian algorithm [18] is then applied to this square matrix to obtain the minimum cost in O(n^3) time. We now formally define the mapping distance between two graphs. Definition 4.3. (Mapping Distance) The mapping distan... |
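
The assignment step described in the context above can be illustrated with a small sketch. It assumes the n×n matrix of star edit distances has already been computed (the values used in the test are made up), and it finds the minimum-cost one-to-one matching by brute force over all permutations. This is a stand-in for the Hungarian algorithm, not an implementation of it: it returns the same minimum but takes O(n!) time instead of O(n^3), so it is only suitable for very small n. The function name is hypothetical.

```python
from itertools import permutations


def min_assignment_cost(cost):
    """Return (minimum total cost, assignment) for a square cost matrix.

    assignment[i] is the column matched to row i. Brute force over all
    permutations, so the running time is O(n!); the Hungarian algorithm
    reaches the same minimum in O(n^3).
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm
```

For realistic graph sizes, a library implementation of the assignment problem would replace the permutation loop; the input and output shapes stay the same.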

443 | gSpan: Graph-Based Substructure Pattern Mining
- Yan, Han
- 2002
Citation Context ...roposed to deal with a variety of graph-related research problems. For example, a variety of effective algorithms have been devised to mine graph patterns (e.g., frequent patterns) from graph databases [33], to index graph databases for efficiently processing graph search [13, 37, 9] and to perform keyword search over graph databases [14, 17, 30]. With the rapidly increasing amounts of graph data (e.g., ch... |

306 | Frequent Subgraph Discovery
- Kuramochi, Karypis
- 2001
Citation Context ... only three of them related to our experiments are shown in Table 2. For other parameters, we used the default values provided by this generator. For more details about this generator please refer to [19]. 6.1 Comparison with the Exact Algorithm and BLP We first conduct experiments to compare the runtime of four algorithms for computing Lm, τ, ρ and λ. Here, λ is provided by an exact graph edit distance ... |

126 | Graph indexing: A frequent structure-based approach
- Yan, Yu, et al.
- 2004
Citation Context ...ms, almost all of them exploit a path-based or graph-based indexing approach. GraphGrep [26] is a famous representative of the path-based approach. Srinivasa et al. build multiple abstract graphs [27], gIndex [34] uses the discriminative frequent structures, C-Tree [13] adopts graph closure, SAGA [28] employs pathway fragments, cIndex [9] adopts contrast subgraphs, while Tree+∆ [37] exploits frequent tree-feature... |

122 | A graph distance metric based on the maximal common subgraph
- Bunke, Shearer
Citation Context ...ures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 24]. Among these, graph edit distance has been widely accepted as a similarity measure for representing the distances between attributed graphs. Informally speaking, graph edit distance defines the simil... |

119 | Bidirectional expansion for keyword search on graph databases - Kacholia, Pandit, et al. - 2005 |

118 | Mining molecular fragments: Finding relevant substructures of molecules
- Borgelt, Berthold
- 2002
Citation Context ...ty of chemical compounds [29]; in bioinformatics, collections of DNA segments in a cell which interact with each other and with other substances in the cell can be formatted as gene regulatory networks [5, 6, 15]. ∗ This work was supported in part by National Natural Science Foundation of China under grant No. 60833003, 973 Program under Grant No. 2006CB303103, an HP Labs Innovation Research Program award, an... |

109 | Algorithmics and applications of tree and graph searching
- Shasha, Wang, et al.
- 2002
Citation Context ...ch over large graph databases becomes an important database research problem. Because of the technical limitation of processing graph search using conventional database technologies, enormous efforts [26, 13, 35, 9, 37] have been put into constructing practical graph searching methods. Given a graph database consisting of n graphs, D = {g1, g2, ⋯, gn}, and a query graph q, almost all existing algorithms of proc... |

105 | Structural matching by discrete relaxation
- Wilson, Hancock
- 1997
Citation Context ...for obtaining the upper bound will consider only the vertex edit term, i.e., without considering structural information in graphs. This solution is therefore not applicable in practice. The other methods [22, 32] cannot provide lower bounds, and use heuristic algorithms to find unbounded suboptimal values. Their computational complexities are hard to analyze and are not presented in the related papers. Accordingly,... |

102 | Graph structure in the web: experiments and models
- Broder, Kumar, et al.
Citation Context ..., multimedia, chemical and biological information systems. For example, the World Wide Web can be considered as a graph whose vertices correspond to static pages and whose edges correspond to links between pages [7]; in chem-informatics, labelled graphs are suited to express the connectivity of chemical compounds [29]; in bioinformatics, collections of DNA segments in a cell which interact with each other and wit... |

98 | Blinks: ranked keyword searches on graphs
- He, Wang, et al.
- 2007
Citation Context ...to mine graph patterns (e.g., frequent patterns) from graph databases [33], to index graph databases for efficiently processing graph search [13, 37, 9] and to perform keyword search over graph databases [14, 17, 30]. With the rapidly increasing amounts of graph data (e.g., chemical compounds and social network data), supporting scalable graph search over large graph databases becomes an important database researc... |

95 | A new algorithm for error-tolerant subgraph isomorphism detection
- Messmer, Bunke
- 1998
Citation Context ...ures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 24]. Among these, graph edit distance has been widely accepted as a similarity measure for representing the distances between attributed graphs. Informally speaking, graph edit distance defines the simil... |

88 | Chemical Similarity Searching
- Willett, Barnard, et al.
Citation Context ...hs gi in D containing q [5, 26, 27, 37] or contained by q [9]. 3. similarity search: find all graphs gi in D s.t. gi is similar to q within a user-specified threshold based on some similarity measures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 2... |

87 | Similarity Searching in Medical Image Databases
- Petrakis, Faloutsos
- 1997
Citation Context ...hs gi in D containing q [5, 26, 27, 37] or contained by q [9]. 3. similarity search: find all graphs gi in D s.t. gi is similar to q within a user-specified threshold based on some similarity measures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 2... |

70 | Mining coherent dense subgraphs across massive biological networks for functional discovery
- Hu, Yan, et al.
Citation Context ...ty of chemical compounds [29]; in bioinformatics, collections of DNA segments in a cell which interact with each other and with other substances in the cell can be formatted as gene regulatory networks [5, 6, 15]. ∗ This work was supported in part by National Natural Science Foundation of China under grant No. 60833003, 973 Program under Grant No. 2006CB303103, an HP Labs Innovation Research Program award, an... |

57 | A linear programming approach for the weighted graph matching problem
- Almohamad, Duffuaa
- 1993
Citation Context ...ork [21, 32]. However, it is hard to analyze the computation complexities of the above heuristic algorithms, and the suboptimal solutions provided by them are also unbounded. Meanwhile, the authors in [1] and [16] formulated the GED problem as a BLP problem. The adjacency matrix A^g for g is given by A^g = {a_{i,j}}, where a_{i,j} = 1 if there is an edge connecting vertices i and j, otherwise a_{i,j} = 0. For t... |
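
The adjacency-matrix definition quoted above can be made concrete with a short sketch. It assumes an undirected graph whose vertices are numbered from 0 and whose edges are given as pairs of vertex indices; the function name and the input format are illustrative choices, not part of the cited formulation.

```python
def adjacency_matrix(num_vertices, edges):
    """Build the 0/1 adjacency matrix of an undirected graph.

    Entry [i][j] is 1 if an edge connects vertices i and j, otherwise 0.
    Vertices are assumed to be numbered 0 .. num_vertices - 1.
    """
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: keep the matrix symmetric
    return matrix


if __name__ == "__main__":
    # A path on three vertices: 0 - 1 - 2.
    for row in adjacency_matrix(3, [(0, 1), (1, 2)]):
        print(row)
```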

57 | Computational Modeling of Genetic and Biochemical Networks
- Bower, Bolouri
- 2001
Citation Context ...ty of chemical compounds [29]; in bioinformatics, collections of DNA segments in a cell which interact with each other and with other substances in the cell can be formatted as gene regulatory networks [5, 6, 15]. ∗ This work was supported in part by National Natural Science Foundation of China under grant No. 60833003, 973 Program under Grant No. 2006CB303103, an HP Labs Innovation Research Program award, an... |

51 | RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs
- Raymond, Gardiner, et al.
- 2002
Citation Context ...ures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 24]. Among these, graph edit distance has been widely accepted as a similarity measure for representing the distances between attributed graphs. Informally speaking, graph edit distance defines the simil... |

50 | Closure-Tree: An Index Structure for Graph Queries
- He, Singh
- 2006
Citation Context ... example, a variety of effective algorithms have been devised to mine graph patterns (e.g., frequent patterns) from graph databases [33], to index graph databases for efficiently processing graph search [13, 37, 9] and to perform keyword search over graph databases [14, 17, 30]. With the rapidly increasing amounts of graph data (e.g., chemical compounds and social network data), supporting scalable graph search o... |

48 | Bayesian graph edit distance
- Myers, Wilson, et al.
- 1999
Citation Context ...g the matched configuration of vertices that has maximum a posteriori probability w.r.t. the available vertex attribute information. As graph matching algorithms aim to optimize a global MAP criterion [32, 21], some heuristic algorithms are devised based on this framework [21, 32]. However, it is hard to analyze the computation complexities of the above heuristic algorithms, and the suboptimal solutions pro... |

46 | Fast and practical indexing and querying of very large graphs
- Trißl, Leser
Citation Context ...to mine graph patterns (e.g., frequent patterns) from graph databases [33], to index graph databases for efficiently processing graph search [13, 37, 9] and to perform keyword search over graph databases [14, 17, 30]. With the rapidly increasing amounts of graph data (e.g., chemical compounds and social network data), supporting scalable graph search over large graph databases becomes an important database researc... |

32 | A graph distance metric combining maximum common subgraph and minimum common supergraph - Fernandez, Valiente |

30 | Graph indexing: Tree + delta >= graph
- Zhao, Yu, et al.
- 2007
Citation Context ... example, a variety of effective algorithms have been devised to mine graph patterns (e.g., frequent patterns) from graph databases [33], to index graph databases for efficiently processing graph search [13, 37, 9] and to perform keyword search over graph databases [14, 17, 30]. With the rapidly increasing amounts of graph data (e.g., chemical compounds and social network data), supporting scalable graph search o... |

28 | Similarity evaluation on tree-structured data - Yang, Kalnis, et al. - 2005 |

20 | Saga: a subgraph matching tool for biological graphs
- Tian, McEachin, et al.
- 2007
Citation Context ...6] is a famous representative of the path-based approach. Srinivasa et al. build multiple abstract graphs [27], gIndex [34] uses the discriminative frequent structures, C-Tree [13] adopts graph closure, SAGA [28] employs pathway fragments, cIndex [9] adopts contrast subgraphs, while Tree+∆ [37] exploits frequent tree-features (Tree) and a small number of discriminative graphs (∆). Because of noises that are usua... |

16 | A binary linear programming formulation of the graph edit distance
- Justice, Hero
Citation Context ... of other applications such as graph classification, computer vision, pattern recognition, etc. For example, a familiar problem in computer vision is recognizing specific objects within an image [16] (e.g., face identification and symbol recognition). In this case, a representative graph is generated from the image according to structural characteristics, and vertex labels may be assigned based on... |

15 | Approximate matching of hierarchical data using pq-grams
- Augsten, Böhlen, et al.
- 2005
Citation Context ...is similar to q within a user-specified threshold based on some similarity measures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 24]. Among these, graph edit distance has been widely accepted as a similarity measure for representing the distances b... |

12 | Towards graph containment search and indexing
- Chen, Yu, et al.
Citation Context ... example, a variety of effective algorithms have been devised to mine graph patterns (e.g., frequent patterns) from graph databases [33], to index graph databases for efficiently processing graph search [13, 37, 9] and to perform keyword search over graph databases [14, 17, 30]. With the rapidly increasing amounts of graph data (e.g., chemical compounds and social network data), supporting scalable graph search o... |

11 | A platform based on the multi-dimensional data model for analysis of bio-molecular structure
- Srinivasa, Kumar
- 2003
Citation Context ...raph search can be classified into the following three categories: 1. full graph search: find all graphs gi in D s.t. gi is the same as q [4]; 2. subgraph search: find all graphs gi in D containing q [5, 26, 27, 37] or contained by q [9]. 3. similarity search: find all graphs gi in D s.t. gi is similar to q within a user-specified threshold based on some similarity measures [23, 31]. As can be seen, manipulating ... |

9 | Feature-based similarity search in graph structures
- Yan, Zhu, et al.
- 2006
Citation Context ...ch over large graph databases becomes an important database research problem. Because of the technical limitation of processing graph search using conventional database technologies, enormous efforts [26, 13, 35, 9, 37] have been put into constructing practical graph searching methods. Given a graph database consisting of n graphs, D = {g1, g2, ⋯, gn}, and a query graph q, almost all existing algorithms of proc... |

7 | Fast suboptimal algorithms for the computation of graph edit distance
- Neuhaus, Riesen, et al.
- 2006
Citation Context ...for obtaining the upper bound will consider only the vertex edit term, i.e., without considering structural information in graphs. This solution is therefore not applicable in practice. The other methods [22, 32] cannot provide lower bounds, and use heuristic algorithms to find unbounded suboptimal values. Their computational complexities are hard to analyze and are not presented in the related papers. Accordingly,... |

5 | Computational Chemical Graph Theory: Characterization, Enumeration and Generation of Chemical Structures by Computer Methods
- Trinajstic, Knop, et al.
- 1991
Citation Context ...d as a graph whose vertices correspond to static pages and edges correspond to links between pages [7]; in chem-informatics, labelled graphs are suited to express the connectivity of chemical compounds [29]; in bioinformatics, collections of DNA segments in a cell which interact with each other and with other substances in the cell can be formatted as gene regulatory networks [5, 6, 15]. ∗ This work was ... |

4 | Speeding up graph edit distance computation with a bipartite heuristic
- Riesen, Fankhauser, et al.
- 2007
Citation Context ...nd lower bounds of graph edit distance in the rest of this paper. 3. RELATED WORK 3.1 Graph Edit Distance There are a number of existing studies addressing the graph edit distance computation problem [22, 25]. All of them fall into two categories: exact algorithms and heuristic algorithms. The most widely used method for computing exact graph edit distance is based on the well-known A* algorithm [12], and ... |

1 | An incrementally maintainable index for approximate lookups in hierarchical data
- Augsten, Böhlen, et al.
- 2006
Citation Context ...is similar to q within a user-specified threshold based on some similarity measures [23, 31]. As can be seen, manipulating graph data based on structural similarity is essential for many applications [36, 2, 3]. A number of graph similarity measures therefore have been proposed [20, 8, 10, 24]. Among these, graph edit distance has been widely accepted as a similarity measure for representing the distances b... |

1 | Efficient matching and indexing of graph models in retrieval
- Berretti, Bimbo, et al.
Citation Context ... query graph q, almost all existing algorithms of processing graph search can be classified into the following three categories: 1. full graph search: find all graphs gi in D s.t. gi is the same as q [4]; 2. subgraph search: find all graphs gi in D containing q [5, 26, 27, 37] or contained by q [9]. 3. similarity search: find all graphs gi in D s.t. gi is similar to q within a user-specified threshold... |