## Structure induction by lossless graph compression

### BibTeX

@MISC{Peshkin_structureinduction,
    author = {Leonid Peshkin},
    title = {Structure induction by lossless graph compression},
    year = {}
}

### Abstract

This work is motivated by the need to automate the discovery of structure in vast and ever-growing collections of relational data commonly represented as graphs, for example genomic networks. A novel algorithm, dubbed Graphitour, for structure induction by lossless graph compression is presented and illustrated by a clear and broadly known case of nested structure in a DNA molecule. This work extends to graphs some well-established approaches to grammatical inference previously applied only to strings. The bottom-up graph compression problem is related to the maximum cardinality (non-bipartite) matching problem. The algorithm accepts a variety of graph types, including directed graphs and graphs with labeled nodes and arcs. The resulting structure could be used for representation and classification of graphs.
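
The matching problem named in the abstract can be stated as: given a graph, find the largest set of edges such that no two edges share a node. The sketch below illustrates that definition by exhaustive search on a tiny non-bipartite graph. It is not the polynomial-time algorithm the paper relies on and it is not Graphitour itself; the function name `max_cardinality_matching` and the example graph are assumptions made for illustration only.

```python
from itertools import combinations


def max_cardinality_matching(edges):
    """Return a largest set of edges in which no two edges share a node.

    Brute force: tries edge subsets from largest to smallest, so the
    running time is exponential in the number of edges. Suitable only
    for tiny illustrative graphs.
    """
    edges = list(edges)
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            nodes = [node for edge in subset for node in edge]
            # A valid matching touches every node at most once.
            if len(nodes) == len(set(nodes)):
                return list(subset)
    return []


# Hypothetical example: a 5-cycle (an odd cycle, hence non-bipartite)
# with one extra node attached to node 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (4, 5)]
matching = max_cardinality_matching(edges)
print(matching)
```

In a bottom-up compression scheme, each matched edge is a candidate pair of nodes to merge into one compound node, which is why a large matching is useful; how Graphitour selects and scores those pairs is not specified in this record.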

### Citations

409 | Network motifs in the transcriptional regulation network of Escherichia coli
- Shen-Orr, Milo, et al.
- 2002
Citation Context ...scherichia coli transcriptional regulation network, since it is one of the well known attempts to discover "network motifs" or building blocks in a graph as over-represented sub-graphs by Alon et al. [9]. In this network there are 423 nodes corresponding to the genes or groups of genes jointly transcribed ...

376 | Handbook of Graph Grammars and Computing by Graph Transformation. Vol 1
- Rozenberg
- 1999
Citation Context ...ested structures of such kind are described by graph grammars. It is outside the scope of this paper to survey a vast literature in the field of graph grammars; please refer to a book by G. Rozenberg [3]. Table 1. Generalized string compression algorithm. Input: initial string; Initialize: empty grammar; Loop: Make a single left-to-right tour of the string, collecting sub-string statistics; Introduce ...

166 | Identifying Hierarchical Structure in Sequences: A linear-time algorithm
- Nevill-Manning, Witten
- 1997
Citation Context ...ome well established approaches to grammatical inference previously applied only to strings. Two methods particularly worth mentioning in this context for grammar induction on sequences are Sequitur [6] and ADIOS [8]. We also take inspiration from a wealth of sequence compression algorithms, often unknowingly run daily by all computer users in the form of archival software like pkzip in Unix or expand for Mac OS X...

164 | Substructure Discovery Using Minimum Description Length and Background Knowledge
- Cook, Holder
- 1994
Citation Context ...ing to some known graph grammar, rather than with inducing such grammar from raw data. The closest work related to the ideas presented here is due to D. Cook, L. Holder and their colleagues (e.g. see [2] and several follow-up papers). Their work however is not concerned with inducing a structure from given graph data. Rather, they induce a flat, context-free grammar, possibly with recursion, which is...

140 | Hidden Markov model induction by Bayesian model merging
- Stolcke, Omohundro
- 1993
Citation Context .... Moreover, the authors present the negative result of running their SUBDUE algorithm on just the kind of biological data we successfully use in this paper. Another remotely similar work is by Stolcke [10] in application to inducing hidden Markov models. There are many other works attempting to induce structure from relational data or compress graphs, but none seem to relate closely to the method consi...

79 | Implementation of Algorithms for Maximum Matching on Nonbipartite Graphs
- Gabow
- 1973
Citation Context ... of the algorithm described here is in relating the graph grammar induction to the maximum cardinality non-bipartite matching problem for which there is a polynomial complexity algorithm due to Gabow [4]. The problem is formulated as follows: given a graph, find the largest set of edges, such that no two edges share a node. Figure 2 (right) illustrates this on an arbitrary non-bipartite graph. Dashed e...

68 | Unsupervised learning of natural languages
- Solan, Horn, et al.
- 2005
Citation Context ...lished approaches to grammatical inference previously applied only to strings. Two methods particularly worth mentioning in this context for grammar induction on sequences are Sequitur [6] and ADIOS [8]. We also take inspiration from a wealth of sequence compression algorithms, often unknowingly run daily by all computer users in the form of archival software like pkzip in Unix or expand for Mac OS X...

28 | Approximation algorithms for grammar-based compression
- Lehman, Shelat
- 2002
Citation Context ...ll computer users in the form of archival software like pkzip in Unix or expand for Mac OS X. Let us briefly convey the intuition behind such algorithms, many of which are surveyed by Lehman and Shelat [1]. Although quite different in detail, all algorithms share common principles and have very similar compression ratio and computational complexity bounds. First, one has to remember that all such compr...

25 | Off-line compression by greedy textual substitution
- Apostolico, Lonardi
- 2000
Citation Context ...lly, the difference is in how exactly statistics are used to pick which substring will be substituted by a new compound symbol. In some cases, a greedy strategy is used (see e.g. Apostolico & Lonardi [7]), i.e. the substitution which will maximally reduce the size of the encoding at the current step is picked; in other cases, a simple first-come-first-served principle is used and any repetition is im...

1 | Handbook of graph grammars and computing by graph transformation: Foundations
- Rozenberg (Ed.), World Scientific
- 1997
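
The citation contexts above describe a generalized string compression loop: tour the string, collect sub-string statistics, and substitute a repeated substring with a new compound symbol, in some cases greedily. A minimal sketch of one such greedy variant follows, assuming the substituted substring is always an adjacent pair of symbols and that "most frequent pair" stands in for "largest reduction in encoding size". This is an illustrative simplification, not the algorithm from any of the cited papers; the names `greedy_pair_compress` and `expand` are hypothetical.

```python
from collections import Counter


def greedy_pair_compress(text):
    """Repeatedly replace the most frequent adjacent pair with a new symbol.

    Returns the compressed sequence and a grammar mapping each new
    symbol to the pair it stands for. Overlapping occurrences (as in
    "aaa") are counted naively, so the count is only a rough statistic.
    """
    seq = list(text)
    grammar = {}
    next_id = 0
    while True:
        # One left-to-right tour collecting adjacent-pair statistics.
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no repetition left to exploit
        # Multi-character names cannot collide with single input characters.
        symbol = f"<{next_id}>"
        next_id += 1
        grammar[symbol] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, grammar


def expand(seq, grammar):
    """Invert the compression by recursively expanding compound symbols."""
    out = []
    for s in seq:
        if s in grammar:
            out.extend(expand(grammar[s], grammar))
        else:
            out.append(s)
    return out


seq, grammar = greedy_pair_compress("abcabcabc")
restored = "".join(expand(seq, grammar))
print(seq, grammar, restored)
```

Because every rule is recorded in the grammar, expanding the compressed sequence recovers the input exactly, which is the lossless property the title refers to. The nested rules that emerge are the induced structure.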