## Frequent Subgraph Discovery (2001)

Citations: 335 (12 self)

### BibTeX

@MISC{Kuramochi01frequentsubgraph,

author = {Michihiro Kuramochi and George Karypis},

title = {Frequent Subgraph Discovery},

year = {2001}

}

### Abstract

Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used because they cannot model the requirements of these domains. An alternative is to model the objects in these data sets as graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though it has to deal with computationally hard problems, such as canonical labeling of graphs and subgraph isomorphism, that do not arise in traditional frequent itemset discovery.
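To make the problem statement concrete, the following minimal Python sketch counts support for the simplest possible patterns, single labeled edges, across a small set of graph transactions. It is not the paper's algorithm: the function name `frequent_edges`, the transaction encoding, and the example labels are all assumptions made for illustration. Larger patterns would additionally require the canonical labeling and subgraph isomorphism steps mentioned in the abstract.

```python
from collections import defaultdict


def frequent_edges(transactions, min_support):
    """Return {edge_pattern: support_count} for frequent single-edge patterns.

    transactions: list of (vertex_labels, edges) pairs (hypothetical encoding),
        where vertex_labels maps vertex id -> label and edges is a list of
        (u, v, edge_label) tuples describing undirected edges.
    min_support: minimum fraction of transactions that must contain a pattern.
    """
    supporting = defaultdict(set)  # pattern -> ids of transactions containing it
    for tid, (vertex_labels, edges) in enumerate(transactions):
        for u, v, edge_label in edges:
            # Sort the endpoint labels so an undirected edge has a single key.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            supporting[(a, edge_label, b)].add(tid)
    threshold = min_support * len(transactions)
    return {p: len(t) for p, t in supporting.items() if len(t) >= threshold}


# Toy data loosely resembling chemical compounds (labels are made up).
transactions = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (0, 2, "double")]),
    ({0: "C", 1: "O"}, [(1, 0, "single")]),
    ({0: "C", 1: "N"}, [(0, 1, "single")]),
]
result = frequent_edges(transactions, 0.5)
```

Each pattern is counted once per transaction, however many times it occurs inside that transaction, which matches the notion of a subgraph occurring "frequently over the entire set of graphs".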

### Citations

11365 | Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman and Co - Garey, Johnson - 1979 |

2848 | Fast Algorithms for Mining Association Rules
- Agrawal, Srikant
- 1994
Citation Context ...Introduction. Efficient algorithms for finding frequent itemsets, both sequential and non-sequential, in very large transaction databases have been one of the key success stories of data mining research [2, 1, 26, 12, 3, 24]. We can use these itemsets for discovering association rules, for extracting prevalent patterns that exist in the datasets, or for classification. Nevertheless, as data mining techniques have been in... |

1237 | Mining sequential patterns - Agrawal, Srikant - 1995 |

1234 | Mining frequent patterns without candidate generation: A frequent-pattern tree approach
- Han, Pei, et al.
- 2004
Citation Context ...Introduction. Efficient algorithms for finding frequent itemsets, both sequential and non-sequential, in very large transaction databases have been one of the key success stories of data mining research [2, 1, 26, 12, 3, 24]. We can use these itemsets for discovering association rules, for extracting prevalent patterns that exist in the datasets, or for classification. Nevertheless, as data mining techniques have been in... |

254 | An apriori-based algorithm for mining frequent substructures from graph data
- Inokuchi, Washio, et al.
- 2000
Citation Context ...ethods. Furthermore, these methods can also perform approximate matching when discovering frequent patterns, allowing them to recognize patterns that have slight variations. Recently, Inokuchi et al. [14] presented a computationally efficient algorithm called AGM, that can be used to find all frequent induced subgraphs in a graph database that satisfy a certain minimum support constraint. A subgraph G... |

192 | Scalable Algorithms for Association Mining
- Zaki
Citation Context ...ch an algorithm for frequent subgraphs, however, is challenging as there is no natural way to build the hash-tree for graphs. For this reason, FSG instead uses Transaction ID (TID) lists, proposed by [8, 19, 27, 25, 26]. In this approach for each frequent subgraph we keep a list of transaction identifiers that support it. Now when we need to compute the frequency of $g^{k+1}$, we first compute the intersection of the T... |

128 | Fast vertical mining using diffsets
- Zaki, Gouda
Citation Context ...Introduction. Efficient algorithms for finding frequent itemsets, both sequential and non-sequential, in very large transaction databases have been one of the key success stories of data mining research [2, 1, 26, 12, 3, 24]. We can use these itemsets for discovering association rules, for extracting prevalent patterns that exist in the datasets, or for classification. Nevertheless, as data mining techniques have been in... |

125 | An algorithm for subgraph isomorphism
- Ullmann
- 1976
Citation Context ...ind an isomorphism between g1 and a subgraph of g2. In other words, it is to determine if a graph is included in the other larger graph. A well-known algorithm for subgraph isomorphism is proposed in [22]. As suggested in [10], graph isomorphism can be directly solved in practice, although it is not known to be either in P or in NP-complete. On the other hand, subgraph isomorphism has been proved to b... |

121 | Finding frequent substructures in chemical compounds
- Dehaspe, Toivonen, et al.
- 1998
Citation Context ...sive, as graph and subgraph isomorphisms play a key role throughout the computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns... |

97 | Turbo-charging Vertical Mining of Large Databases
- Shenoy
- 2000
Citation Context ...ch an algorithm for frequent subgraphs, however, is challenging as there is no natural way to build the hash-tree for graphs. For this reason, FSG instead uses Transaction ID (TID) lists, proposed by [8, 19, 27, 25, 26]. In this approach for each frequent subgraph we keep a list of transaction identifiers that support it. Now when we need to compute the frequency of $g^{k+1}$, we first compute the intersection of the T... |

89 | The graph isomorphism disease
- Read, Corneil
- 1977
Citation Context ..., we can sort itemsets by lexicographic ordering. Clearly this is not applicable to graphs. To get total order of graphs we use canonical labeling. A canonical label is a unique code of a given graph [18, 10]. A graph can be represented in many different ways, depending on the order of its edges or vertices. Nevertheless, canonical labels should be always the same no matter how graphs are represented, as ... |

75 | CHARM: An efficient algorithm for closed association rule mining
- Zaki, Hsiao
- 1999
Citation Context ...ch an algorithm for frequent subgraphs, however, is challenging as there is no natural way to build the hash-tree for graphs. For this reason, FSG instead uses Transaction ID (TID) lists, proposed by [8, 19, 27, 25, 26]. In this approach for each frequent subgraph we keep a list of transaction identifiers that support it. Now when we need to compute the frequency of $g^{k+1}$, we first compute the intersection of the T... |

69 | The graph isomorphism problem
- Fortin
- 1996
Citation Context ..., we can sort itemsets by lexicographic ordering. Clearly this is not applicable to graphs. To get total order of graphs we use canonical labeling. A canonical label is a unique code of a given graph [18, 10]. A graph can be represented in many different ways, depending on the order of its edges or vertices. Nevertheless, canonical labels should be always the same no matter how graphs are represented, as ... |

65 | Substructure discovery in the subdue system
- Holder, Cook, et al.
- 1994
Citation Context ... using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluation problem [21]. ILP has been actively used for predicting carcinoge... |

50 | The predictive toxicology evaluation challenge
- Srinivasan, King, et al.
- 1997
Citation Context ...sive, as graph and subgraph isomorphisms play a key role throughout the computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns... |

41 | Clip: concept learning from inference patterns
- Yoshida, Motoda
- 1995
Citation Context ... using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluation problem [21]. ILP has been actively used for predicting carcinoge... |

40 | Carcinogenesis predictions using ILP
- Srinivasan, King, et al.
- 1997
Citation Context ...sive, as graph and subgraph isomorphisms play a key role throughout the computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns... |

21 | Fast mining of sequential patterns in very large databases
- Zaki
- 1997

17 | A tree projection algorithm for generation of large itemsets for association rules
- Agarwal, Aggarwal, et al.
- 1998

17 | Applying the Subdue Substructure Discovery system to the Chemical Toxicity Domain, University of Texas at
- Chittimoori, Holder, et al.
- 1999

13 | Intelligent retrieval of solid models
- Cicirello
- 1999
Citation Context ...he computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluation problem [21]. ILP has been acti... |

10 | Comparisons of attributed graph matching algorithms for computer vision - Kälviäinen, Oja - 1990 |

10 | ChARM: An Efficient Algorithm for Closed Association Rule Mining - Zaki, Hsiao - 1999 |

7 | Unifying graph-matching problem with a practical solution
- Chen, Yun
- 1998
Citation Context ... using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluation problem [21]. ILP has been actively used for predicting carcinoge... |

7 | Fast vertical mining using diffsets - Zaki, Gouda - 2003 |

6 | Data Organization and Access for Efficient Data Mining
- Dunkel, Soparkar
- 1999

6 | Content-based image retrieval with scale-spaced object trees
- Dupplaw, Lewis
Citation Context ...he computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluation problem [21]. ILP has been acti... |

4 | Comparisons of attributed graph matching algorithms for computer vision
- Kälviäinen, Oja
- 1990
Citation Context ...orphisms play a key role throughout the computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluat... |

2 | An efficient A* based algorithm for optimal graph matching applied to computer vision
- Piriyakumar, Levi
- 1998
Citation Context ...orphisms play a key role throughout the computations. The power of using graphs to model complex datasets has been recognized by various researchers in chemical domain [21, 20, 7, 5], computer vision [15, 16], image and object retrieval [6, 9], and machine learning [13, 4, 23]. In particular, Dehaspe et al. [7] applied Inductive Logic Programming (ILP) to obtain frequent patterns in the toxicology evaluat... |

1 | Data organization and access for efficient data mining - Dunkel, Soparkar - 1999 |
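
Several citation contexts in this list quote the TID-list idea: for each frequent subgraph, keep the list of transaction identifiers that support it, then intersect those lists when computing the frequency of a larger candidate. The quoted passage is truncated, so the following Python sketch is an illustration under assumptions rather than the paper's procedure. The names `intersect_tid_lists`, `count_support`, and `contains` are hypothetical, and the early-exit pruning and the restriction of containment tests to the intersection reflect the usual use of TID lists, not text from this page. The `contains` callback stands in for a real subgraph isomorphism test; the toy version below only checks edge-set inclusion.

```python
def intersect_tid_lists(tid_lists):
    """Intersect the TID lists of a candidate's frequent subgraphs."""
    sets = [set(t) for t in tid_lists]
    if not sets:
        return set()
    return set.intersection(*sets)


def count_support(candidate, tid_lists, transactions, min_count, contains):
    """Return the set of supporting transaction ids, or None if infrequent.

    contains(graph, candidate) is a caller-supplied containment test
    (hypothetical; in practice a subgraph isomorphism check).
    """
    tids = intersect_tid_lists(tid_lists)
    if len(tids) < min_count:
        # Assumed pruning step: too few transactions can possibly contain it.
        return None
    # Only transactions in the intersection need the expensive test.
    supporting = {tid for tid in tids if contains(transactions[tid], candidate)}
    return supporting if len(supporting) >= min_count else None


def edge_subset(graph, pattern):
    """Toy stand-in for subgraph isomorphism: edge-set inclusion."""
    return pattern <= graph


# Toy transactions: each graph is a set of opaque edge names.
transactions = {
    0: frozenset({"ab", "bc"}),
    1: frozenset({"ab"}),
    2: frozenset({"ab", "bc"}),
    3: frozenset({"bc"}),
}
candidate = frozenset({"ab", "bc"})
tid_lists = [{0, 1, 2}, {0, 2, 3}]  # TID lists of the sub-patterns "ab" and "bc"
supporting = count_support(candidate, tid_lists, transactions, 2, edge_subset)
```

In this toy run only transactions 0 and 2 are ever passed to the containment test, which is the point of keeping TID lists when no hash-tree is available for graphs.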