## Substructure Discovery Using Minimum Description Length and Background Knowledge (1994)

### Other Repositories/Bibliography

Venue: Journal of Artificial Intelligence Research

Citations: 164 (39 self)

### BibTeX

@ARTICLE{Cook94substructurediscovery,
  author = {Diane J. Cook and Lawrence B. Holder},
  title = {Substructure Discovery Using Minimum Description Length and Background Knowledge},
  journal = {Journal of Artificial Intelligence Research},
  year = {1994},
  volume = {1},
  pages = {231--255}
}

### Abstract

The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdu...
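The compression idea in the abstract can be illustrated with a minimal sketch. It assumes a deliberately crude bit-count for labeled graphs (a label per vertex, plus two endpoints and a label per edge) and assumes that each non-overlapping instance of a substructure collapses to a single vertex with one new label. The function names and the encoding are hypothetical and are not Subdue's actual encoding; the sketch only shows the comparison between the description length of the original graph and that of the substructure plus the compressed graph.

```python
import math


def _bits(n):
    """Bits needed to pick one of n alternatives (0 when there is no choice)."""
    return math.log2(n) if n > 1 else 0.0


def description_length(vertices, edges, vertex_labels, edge_labels):
    """Crude bit cost of a labeled graph: a label per vertex, plus two
    endpoints and a label per edge. Illustrative only, not Subdue's encoding."""
    vertex_cost = vertices * _bits(vertex_labels)
    edge_cost = edges * (2 * _bits(vertices) + _bits(edge_labels))
    return vertex_cost + edge_cost


def compressed_length(graph, sub, instances):
    """DL(S) + DL(G|S), assuming each non-overlapping instance of S
    collapses to a single vertex carrying one new label."""
    g_v, g_e, v_labels, e_labels = graph
    s_v, s_e = sub
    dl_sub = description_length(s_v, s_e, v_labels, e_labels)
    new_v = g_v - instances * (s_v - 1)   # each instance keeps one vertex
    new_e = g_e - instances * s_e         # internal edges disappear
    dl_rest = description_length(new_v, new_e, v_labels + 1, e_labels)
    return dl_sub + dl_rest


graph = (40, 60, 4, 2)   # vertices, edges, vertex labels, edge labels
square = (4, 4)          # substructure: vertices, edges
before = description_length(*graph)
after = compressed_length(graph, square, instances=5)
print(f"before={before:.1f} bits, after={after:.1f} bits")
```

Under these assumptions, a substructure is worth keeping when `after` is smaller than `before`; a substructure with no instances only adds the cost of describing it.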

### Citations

675 | Knowledge acquisition via incremental conceptual clustering - Fisher - 1987
Citation Context ..., and substructure is found across several graphs, not within a single graph as in Subdue. The Labyrinth system (Thompson & Langley, 1991) extends the Cobweb incremental conceptual clustering system (Fisher, 1987) to handle structured objects. Labyrinth uses Cobweb to form hierarchical concepts of the individual objects in the domain based on their primitive attributes. Concepts of structured objects are form... |

378 | Understanding line drawings of scenes with shadows - Waltz - 1975
Citation Context ...epresenting the scene. The graph representation consists of eight types of vertices and two types of arcs (edge and space). The vertex labels (f, a, l, t, k, x, p, and m) follow the Waltz labelings (Waltz, 1975) of junctions of edges in the image and represent the types of vertices shown in Figure 10. An edge arc represents the edge of an object in the image, and a space arc links non-connecting objects tog... |

325 | Stochastic Complexity in Statistical Inquiry - Rissanen
Citation Context ...esentation for background knowledge during the substructure discovery process. 4. Minimum Description Length Encoding of Graphs The minimum description length principle (MDLP) introduced by Rissanen (Rissanen, 1989) states that the best theory to describe a set of data is that theory which minimizes the description length of the entire data set. The MDL principle has been used for decision tree induction (Quinl... |

325 | Learning structural descriptions from examples - Winston - 1975
Citation Context ...re expansions from consideration when the description lengths for these expansions increases. 3. Related Work Several approaches to substructure discovery have been developed. Winston's Arch program (Winston, 1975) discovers substructures in order to deepen the hierarchical description of a scene and to group objects into more general concepts. The Arch program searches for two types of substructure in the blo... |

305 | Inferring Decision Trees Using the Minimum Description Length Principle - Quinlan, Rivest - 1989 |

259 | Learning from observation: Conceptual clustering - Michalski, Stepp - 1983 |

252 | AutoClass: A Bayesian Classification System - Cheeseman, Kelly, et al. - 1988 |

233 | Constructing simple stable descriptions for image partitioning - Leclerc - 1989
Citation Context ... which minimizes the description length of the entire data set. The MDL principle has been used for decision tree induction (Quinlan & Rivest, 1989), image processing (Pednault, 1989; Pentland, 1989; Leclerc, 1989), concept learning from relational data (Derthick, 1991), and learning models of non-homogeneous engineering domains (Rao & Lu, 1992). We demonstrate how the minimum description length principle can ... |

219 | Syntactic Pattern Recognition and Applications - Fu - 1982
Citation Context ...aphs and graph grammars as an underlying representation for structural problems (Schalkoff, 1992). Many results in grammatical inference are applicable to constrained classes of graphs (e.g., trees) (Fu, 1982; Miclet, 1986). The approach begins with a set of sample graphs and produces a generalized graph grammar capable of deriving the original sample graphs and many others. The production rules of this g... |

193 | Pattern recognition: statistical, structural, and neural approaches - Schalkoff - 1992
Citation Context ...in CLiP suggest possible enhancements to Subdue. Research in pattern recognition has begun to investigate the use of graphs and graph grammars as an underlying representation for structural problems (Schalkoff, 1992). Many results in grammatical inference are applicable to constrained classes of graphs (e.g., trees) (Fu, 1982; Miclet, 1986). The approach begins with a set of sample graphs and produces a generali... |

114 | Laws of organization in perceptual forms - Wertheimer - 1937 |

94 | Stochastic Complexity in Statistical Inquiry. World Scientific - Rissanen - 1989
Citation Context ...esentation for background knowledge during the substructure discovery process. 4. Minimum Description Length Encoding of Graphs The minimum description length principle (MDLP) introduced by Rissanen (Rissanen, 1989) states that the best theory to describe a set of data is that theory which minimizes the description length of the entire data set. The MDL principle has been used for decision tree induction (Quinl... |

83 | Inexact graph match for structural pattern recognition. Pattern Recognition Letters - Bunke, Allermann - 1983 |

53 | Concept formation in structured domains - Thompson, Langley - 1991 |

32 | Some experiments in applying inductive inference principles to surface reconstruction - Pednault - 1989
Citation Context ...ibe a set of data is that theory which minimizes the description length of the entire data set. The MDL principle has been used for decision tree induction (Quinlan & Rivest, 1989), image processing (Pednault, 1989; Pentland, 1989; Leclerc, 1989), concept learning from relational data (Derthick, 1991), and learning models of non-homogeneous engineering domains (Rao & Lu, 1992). We demonstrate how the minimum de... |

31 | Part segmentation for object recognition - Pentland - 1989
Citation Context ...a is that theory which minimizes the description length of the entire data set. The MDL principle has been used for decision tree induction (Quinlan & Rivest, 1989), image processing (Pednault, 1989; Pentland, 1989; Leclerc, 1989), concept learning from relational data (Derthick, 1991), and learning models of non-homogeneous engineering domains (Rao & Lu, 1992). We demonstrate how the minimum description length... |

29 | A self-organizing retrieval system for graphs - Levinson - 1984
Citation Context ...od is domain independent, although the inclusion of domain-specific knowledge would improve Subdue's performance. Motivated by the need to construct a knowledge base of chemical structures, Levinson (Levinson, 1984) developed a system for storing labeled graphs in which individual graphs are represented by the set of vertices in a universal graph. In addition, the individual graphs are maintained ... |

15 | Spatial analogy and subsumption - Conklin, Glasgow - 1992 |

12 | Grammatical Inference Based on Hyperedge Replacement. Graph-Grammars - Jeltsch, Kreowski - 1990
Citation Context ...capable of deriving the original sample graphs and many others. The production rules of this general grammar capture regularities (substructures) in the sample graphs. Jeltsch and Kreowski (Jeltsch & Kreowski, 1991) describe an approach that begins with a maximally-specific grammar and iteratively identifies common subgraphs in the right-hand sides of the production rules. These common subgraphs are used to for... |

11 | A minimal encoding approach to feature discovery - Derthick - 1991
Citation Context ...ata set. The MDL principle has been used for decision tree induction (Quinlan & Rivest, 1989), image processing (Pednault, 1989; Pentland, 1989; Leclerc, 1989), concept learning from relational data (Derthick, 1991), and learning models of non-homogeneous engineering domains (Rao & Lu, 1992). We demonstrate how the minimum description length principle can be used to discover substructures in complex data. In pa... |

11 | Graph clustering and model learning by data compression - Segen - 1990
Citation Context ...inson's system is not included in Subdue, but maintaining this partial ordering would improve the performance of the graph matching procedure by pruning the number of possible matching graphs. Segen (Segen, 1990) describes a system for storing graphs using a probabilistic graph model to represent subsets of the graph. Alternative models are evaluated based on a minimum description length measure of the infor... |

10 | Learning engineering models with the minimum description length principle - Rao, Lu |

10 | Unifying learning methods by colored digraphs - Yoshida, Motoda, et al. - 1993 |

6 | Fuzzy substructure discovery - Holder, Cook, et al. - 1992 |

6 | Discovery of inexact concepts from structural data - Holder, Cook - 1993 |

6 | AutoClass: A Bayesian Classification System - Cheeseman, Kelly, et al. - 1988
Citation Context ... and providing a basis for discovering hierarchically-defined structures. Future work will combine structural discovery with discovery of concepts using a linear-based representation such as AutoClass (Cheeseman, Kelly, Self, Stutz, Taylor, & Freeman, 1988). In particular, we will use Subdue to compress the data fed to AutoClass, and let Subdue evaluate the interesting structures in the classes generated by AutoClass. In addition, we will be developing... |

3 | Discrete Mathematical Structures for - Prather - 1976
Citation Context ...s that human attention is drawn to closed structures (Wertheimer, 1939). A closed substructure has at least as many edges as vertices, whereas a non-closed substructure has fewer edges than vertices (Prather, 1976). Thus, closed substructures have a higher compactness value. Compactness is defined as the weighted average of the ratio of the number of edges in the substructure to the number of vertices in the s... |

2 | Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, Chap - J - 1992 |
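The citation context for Prather (1976) above describes closed substructures as those with at least as many edges as vertices, and ties compactness to the edge-to-vertex ratio. The following is a minimal sketch of that test. The function names are hypothetical, and because the source excerpt is truncated before it explains the weighting, the sketch computes only the plain ratio and is not the paper's full compactness measure.

```python
def edge_vertex_ratio(num_edges, num_vertices):
    """Plain edges-per-vertex ratio; the weighting mentioned in the
    excerpt is omitted because the source text is truncated."""
    if num_vertices <= 0:
        raise ValueError("a substructure needs at least one vertex")
    return num_edges / num_vertices


def is_closed(num_edges, num_vertices):
    """Closed, per the excerpt: at least as many edges as vertices."""
    return num_edges >= num_vertices


triangle = (3, 3)   # edges, vertices: a cycle, so closed
path = (2, 3)       # edges, vertices: an open chain, so not closed

for name, (edges, vertices) in {"triangle": triangle, "path": path}.items():
    print(name, is_closed(edges, vertices), edge_vertex_ratio(edges, vertices))
```

Under this reading, the closed triangle scores a higher ratio than the open path, which matches the excerpt's statement that closed substructures have a higher compactness value.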