## Evolving Stochastic Context-Free Grammars from Examples Using a Minimum Description Length Principle

Venue: Paper presented at the Workshop on Automata Induction, Grammatical Inference and Language Acquisition, ICML-97

Citations: 17 (0 self)

### BibTeX

@INPROCEEDINGS{Keller_evolvingstochastic,
author = {Bill Keller and Rudi Lutz},
title = {Evolving Stochastic Context-Free Grammars from Examples Using a Minimum Description Length Principle},
booktitle = {Paper presented at the Workshop on Automata Induction Grammatical Inference and Language Acquisition, ICML-97},
year = {1997},
pages = {09--7}
}

### Abstract

This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a genetic algorithm, with a fitness function derived from a minimum description length principle. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. We provide details of our fitness function for grammars and present the results of a number of experiments in learning grammars for a range of formal languages.

Keywords: grammatical inference, genetic algorithms, language modelling, formal languages, induction, minimum description length.

Introduction

Grammatical inference (Gold 1978) is a fundamental problem in many areas of artificial intelligence and cognitive science, including speech and language processing, syntactic pattern recognition and automated programming. Although a wide variety of techniques for automated grammatical inference have been devi...

### Citations

6054 |
The Mathematical Theory of Communication
- Shannon, Weaver
- 1949
Citation Context ...e $L(T)$ is the number of bits needed to minimally encode the theory $T$, and $L(D \mid T)$ is the number of bits needed to minimally encode the data $D$ given the theory $T$. From Shannon's information theory (Shannon 1948), we know that if we have a discrete set $X$ of items with a probability distribution $P(x)$ defined over it, then in order to send a message identifying $x \in X$ we need approximately $L(x) = -\log_2$ ... |
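The code-length relation quoted in this context can be illustrated with a minimal sketch. It assumes the standard Shannon code length of $-\log_2 P(x)$ bits for an item of probability $P(x)$; the toy distribution and the function names `code_length_bits` and `data_length_bits` are hypothetical and do not come from the paper.

```python
import math


def code_length_bits(p: float) -> float:
    """Shannon code length, in bits, for an item of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)


def data_length_bits(probabilities) -> float:
    """Total bits needed to encode a sequence of independently coded items."""
    return sum(code_length_bits(p) for p in probabilities)


# Hypothetical toy distribution over three items.
dist = {"a": 0.5, "b": 0.25, "c": 0.25}
lengths = {item: code_length_bits(p) for item, p in dist.items()}
```

Under this sketch, more probable items receive shorter codes, which is why a theory that assigns high probability to the observed data yields a short encoding of the data given that theory.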

404 |
A formal theory of inductive inference
- Solomonoff
- 1964
Citation Context ..., more complex ones. Our choice of prior is therefore related to the minimum description length principle of Rissanen (Rissanen 1978) as well as earlier work on inductive inference due to Solomonoff (Solomonoff 1964). It will be explained in more detail later. A similar approach has been adopted by Chen (Chen 1995), but using greedy heuristic search rather than a genetic algorithm. The MDL Principle The problem... |

373 |
The estimation of stochastic context-free grammars using the Inside–Outside algorithm. Computer Speech and Language
- Lari, Young
- 1990
Citation Context ...(Figure 2: Results on a number of language learning tasks) ... well with other (non-genetic) techniques for stochastic grammatical inference, for example the work reported by Lari and Young (Lari & Young 1990) using the Inside-Outside algorithm (Baker 1979). However, further investigation is needed and we hope to report on this in a future paper. The main limitation of our approach is the cost involved in... |

306 |
Inductive inference: theory and methods
- Angluin, Smith
- 1983
Citation Context ...age processing, syntactic pattern recognition and automated programming. Although a wide variety of techniques for automated grammatical inference have been devised (for surveys see (Fu & Booth 1986; Angluin & Smith 1983)) most are subject to limitations which severely restrict their range of application. For example, inference may be limited to grammars for the regular languages, or require access to both positive (... |

268 |
Trainable grammars for speech recognition
- Baker
- 1979
Citation Context ...e learning tasks well with other (non-genetic) techniques for stochastic grammatical inference, for example the work reported by Lari and Young (Lari & Young 1990) using the Inside-Outside algorithm (Baker 1979). However, further investigation is needed and we hope to report on this in a future paper. The main limitation of our approach is the cost involved in evaluating the fitness of each candidate soluti... |

52 | Bayesian grammar induction for language modelling
- Chen
- 1995
Citation Context ...of Rissanen (Rissanen 1978) as well as earlier work on inductive inference due to Solomonoff (Solomonoff 1964). It will be explained in more detail later. A similar approach has been adopted by Chen (Chen 1995), but using greedy heuristic search rather than a genetic algorithm. The MDL Principle The problem of grammar induction from a corpus can be viewed as an instance of a much more general problem --... |

52 |
Language identification
- Gold
- 1967
Citation Context ... for a range of formal languages. Keywords: grammatical inference, genetic algorithms, language modelling, formal languages, induction, minimum description length. Introduction Grammatical inference (Gold 1978) is a fundamental problem in many areas of artificial intelligence and cognitive science, including speech and language processing, syntactic pattern recognition and automated programming. Although a... |

40 |
Grammatical inference: introduction and survey
- Fu, Booth
- 1975
Citation Context ... speech and language processing, syntactic pattern recognition and automated programming. Although a wide variety of techniques for automated grammatical inference have been devised (for surveys see (Fu & Booth 1986; Angluin & Smith 1983)) most are subject to limitations which severely restrict their range of application. For example, inference may be limited to grammars for the regular languages, or require acc... |

38 | Regular Grammatical Inference from Positive and Negative Samples by Genetic Search: the GIG Method
- Dupont
- 1994
Citation Context ...of genetic algorithms to language identification problems with some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of work reported by Schwehm and Ost (Schwehm & Ost 1995), the problem of inferring stochastic language mod... |

11 |
Regular Language Induction with Genetic Programming
- Dunay, Petry, et al.
- 1994 |

11 |
Context-free grammar induction using genetic algorithms
- Wyard
- 1991
Citation Context ...for automated grammatical inference. A number of researchers have already described applications of genetic algorithms to language identification problems with some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of wor... |

10 | Inference of stochastic regular grammars by massively parallel genetic algorithms
- Schwehm, Ost
- 1995
Citation Context ...ith some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of work reported by Schwehm and Ost (Schwehm & Ost 1995), the problem of inferring stochastic language models has not been addressed. This is surprising in view of the man... |

9 |
Probabilistic and Weighted Grammars
- Salomaa
- 1969
Citation Context ... be different. Now, whenever we wish to rewrite a nonterminal symbol X during production of a sentence we can throw the X dice, and use whichever rule comes up to do the rewriting. Weighted grammars (Salomaa 1969) can be considered as a special case of BWGs, where all the biases are set to the same value. It should be noted that any BWG is equivalent to some unbiased grammar. However, this does not mean that s... |
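The "dice throw" described in this context can be sketched for the plain weighted case, where each rule for a nonterminal is chosen with probability proportional to its weight. This is only an illustration: the bias parameters of the paper's BWGs are not described in the excerpt and are not modelled here, and the grammar `GRAMMAR` and the function `sample_sentence` are hypothetical.

```python
import random

# Hypothetical weighted grammar for the language a^n b^n (n >= 1).
# Each nonterminal maps to a list of (right-hand side, weight) pairs.
GRAMMAR = {
    "S": [(("a", "S", "b"), 1), (("a", "b"), 2)],
}


def sample_sentence(grammar, start="S", rng=None):
    """Generate one sentence by a leftmost derivation with weighted rule choice."""
    rng = rng or random.Random()
    stack = [start]
    sentence = []
    while stack:
        symbol = stack.pop()
        if symbol not in grammar:
            sentence.append(symbol)  # terminal symbol: emit it
            continue
        rules = [rhs for rhs, _ in grammar[symbol]]
        weights = [weight for _, weight in grammar[symbol]]
        # Throw the "dice" for this nonterminal.
        (rhs,) = rng.choices(rules, weights=weights, k=1)
        stack.extend(reversed(rhs))  # leftmost symbol ends up on top
    return sentence


sentence = sample_sentence(GRAMMAR, rng=random.Random(0))
```

With the weights shown, the recursive rule is chosen one time in three on average, so derivations terminate with probability 1.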

7 |
Grammatical inference with a genetic algorithm
- Lankhorst
- 1994
Citation Context ...gorithms to language identification problems with some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of work reported by Schwehm and Ost (Schwehm & Ost 1995), the problem of inferring stochastic language models has not been... |

7 |
Modeling by shortest data description
- Rissanen
- 1978
Citation Context ... reasonable to assume that we should prefer smaller or simpler grammars to larger, more complex ones. Our choice of prior is therefore related to the minimum description length principle of Rissanen (Rissanen 1978) as well as earlier work on inductive inference due to Solomonoff (Solomonoff 1964). It will be explained in more detail later. A similar approach has been adopted by Chen (Chen 1995), but using gree... |

6 | Learning stochastic context-free grammars from corpora using a genetic algorithm
- Keller, Lutz
- 1997
Citation Context ...astic regular grammars. The present work tackles the more general problem of inferring stochastic grammars for the class of context-free languages. A preliminary account of our approach was given in (Keller & Lutz 1997). This paper describes a new fitness function for grammars based on a minimum description length principle and presents the results of a number of language learning experiments. Stochastic Context-Fr... |

5 |
An Evolutionary Algorithm that Evolves Recurrent Neural Networks
- Angeline, Saunders, et al.
- 1994 |

5 |
Genetic grammatical inference: Induction of pushdown automata and context-free grammars from examples using genetic algorithms
- Huijsen
- 1993
Citation Context ...dy described applications of genetic algorithms to language identification problems with some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of work reported by Schwehm and Ost (Schwehm & Ost 1995), the problem of inferrin... |

4 |
Learning to construct pushdown automata for accepting deterministic context-free languages
- Sen, Janakiraman
- 1992
Citation Context ...d grammatical inference. A number of researchers have already described applications of genetic algorithms to language identification problems with some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of work reported by Schwehm and... |

3 |
Biased chromosomes for grammatical inference
- Lucas
- 1993
Citation Context ...pplications of genetic algorithms to language identification problems with some success (Zhou & Grefenstette 1986; Wyard 1991; Sen & Janakiraman 1992; Angeline, Saunders & Pollack 1993; Huijsen 1993; Lucas 1993; Dupont 1994; Lankhorst 1994; Dunay, Petry, & Buckles 1994; Schwehm & Ost 1995). However, with the exception of work reported by Schwehm and Ost (Schwehm & Ost 1995), the problem of inferring stochastic... |

2 |
A universal prior for integers and estimation by minimum description length
- Rissanen
- 1983
Citation Context ...les, where the length of the code for a rule is given by the length of a code for r plus the length of a code for w. To encode a weight w we use one of a family of prefix codes for the integers (see (Rissanen 1983) for more details) which all form good approximations to the minimal encoding. These codes represent an integer by a code for the integer itself, preceded by a code for its length. The code for the i... |
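The idea of representing an integer by its own code preceded by a code for its length can be illustrated with the Elias gamma code, a well-known prefix code for the positive integers. This is a stand-in chosen for illustration; the excerpt does not say which member of the family the paper uses, and the function names below are hypothetical.

```python
def elias_gamma_encode(n: int) -> str:
    """Encode a positive integer: unary length prefix, then the binary digits."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]                      # binary digits, leading bit is 1
    return "0" * (len(binary) - 1) + binary  # len-1 zeros announce the length


def elias_gamma_decode(bits: str):
    """Decode one integer from the front of bits; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2), bits[end:]


# Because the code is prefix-free, codewords can be concatenated safely.
stream = elias_gamma_encode(9) + elias_gamma_encode(2)
first, rest = elias_gamma_decode(stream)
second, rest = elias_gamma_decode(rest)
```

A codeword for n uses 2⌊log₂ n⌋ + 1 bits, so the cost of encoding a weight grows only logarithmically with its size.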

1 |
A new crossover operator for rapid function optimisation using a genetic algorithm. CSRP-446, School of Cognitive and Computing Sciences
- Keller, Lutz
- 1996
Citation Context ... still get a chance at breeding while useful genetic material from the weakest parent may survive through the fittest child. Further details of the genetic algorithm are given in (Keller & Lutz 1997; Keller & Lutz 1996). The Fitness Function In practice, it is not convenient to compute the conditional probability $P(G \mid C)$ directly as a means of evaluating the fitness of grammars. Instead, the genetic algorithm uses ... |