## Grammar Model-based Program Evolution (2004)

Venue: Proceedings of the 2004 IEEE Congress on Evolutionary Computation

Citations: 22 (1 self)

### BibTeX

@INPROCEEDINGS{Shan04grammarmodel-based,
  author    = {Y. Shan and R. I. McKay and R. Baxter},
  title     = {Grammar Model-based Program Evolution},
  booktitle = {Proceedings of the 2004 IEEE Congress on Evolutionary Computation},
  year      = {2004},
  pages     = {478--485},
  publisher = {IEEE Press}
}


### Abstract

In Evolutionary Computation, genetic operators such as mutation and crossover are employed to perturb individuals to generate the next population. However, these fixed, problem-independent genetic operators may destroy subsolutions, usually called building blocks, instead of discovering and preserving them. One way to overcome this problem is to build a model based on the good individuals, and sample this model to obtain the next population. There is a wide range of such work in Genetic Algorithms...
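The abstract's build-a-model-and-sample loop can be illustrated with a minimal PBIL-style sketch over bit strings, in the spirit of the independent-gene EDAs cited later; all names and parameters here are illustrative, not from the paper:

```python
import random

def model_based_ec(fitness, length=20, pop_size=50, generations=100, lr=0.1):
    """Minimal model-build-and-sample loop (PBIL-style, hypothetical):
    learn a per-bit probability model from the best individual and sample
    the next population from the model instead of using crossover/mutation."""
    probs = [0.5] * length  # independent-gene model
    pop = []
    for _ in range(generations):
        pop = [[int(random.random() < p) for p in probs] for _ in range(pop_size)]
        best = max(pop, key=fitness)
        # shift the model toward the best individual
        probs = [(1 - lr) * p + lr * b for p, b in zip(probs, best)]
    return max(pop, key=fitness)

# OneMax: fitness is simply the number of ones
best = model_based_ec(sum)
```

On OneMax the per-bit probabilities drift toward 1, so the sampled populations concentrate on the all-ones string; a grammar model plays the same role for tree-shaped programs in the paper.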

### Citations

2897 | Genetic Programming: On the Programming of Computers by Means of Natural Selection
- Koza
- 1992

Citation Context: ...ocus. However, in tree representation, the meaning of the node has to be interpreted in its surrounding context. For example, in one of the standard GP benchmark problems, the Artificial Ant Problem [35], the node move will have an entirely different effect depending on the current position of the ant. The model chosen for evolving tree structure has to be able to represent this strong local depende...

313 | An information measure for classification
- Wallace, Boulton
- 1968

Citation Context: ...d to compare grammars. Therefore, we need to measure whether a specific merge improves the grammar model. We use minimum length encoding inference, usually referred to as Minimum Message Length (MML) [28], [29] or Minimum Description Length (MDL) [30], to compare grammars. In the remainder of this paper, MML is generally employed to refer to minimum encoding inference. We want to find a grammar which h...

304 | Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning
- Baluja
- 1994

Citation Context: ...probabilistic model of promising solutions to guide further exploration of the search space. EDA is a cluster of methods, ranging from methods that assume the genes in the chromosome are independent [8], [3], [4], through others that take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate...

298 | Stochastic Complexity in Statistical Inquiry
- Rissanen
- 1989

Citation Context: ...asure whether a specific merge improves the grammar model. We use minimum length encoding inference, usually referred to as Minimum Message Length (MML) [28], [29] or Minimum Description Length (MDL) [30], to compare grammars. In the remainder of this paper, MML is generally employed to refer to minimum encoding inference. We want to find a grammar which has low complexity but can cover training sample...
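The two-part minimum encoding idea the excerpt refers to can be sketched in a few lines: the total message length is the cost of stating the model plus the cost of the data encoded with the model, and the candidate with the smaller total is preferred. The per-parameter model cost below is a crude stand-in for the grammar cost L(G), purely for illustration:

```python
import math

def message_length(counts, model_probs, bits_per_param=8):
    """Two-part message length: a fixed cost per stated free parameter
    (an illustrative stand-in for model complexity) plus the data cost
    -sum n_i * log2(p_i)."""
    model_cost = bits_per_param * (len(model_probs) - 1)  # free parameters
    data_cost = -sum(n * math.log2(p) for n, p in zip(counts, model_probs) if n)
    return model_cost + data_cost

counts = [90, 5, 5]
uniform = [1 / 3, 1 / 3, 1 / 3]
fitted = [0.9, 0.05, 0.05]
# the fitted model pays the same model cost but a much smaller data cost
assert message_length(counts, fitted) < message_length(counts, uniform)
```

A merge that simplifies the grammar is accepted when the saving in model cost outweighs any increase in data cost.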

279 | A Survey of Optimization by Building and Using Probabilistic Models
- Pelikan, Goldberg, et al.
- 2002

Citation Context: ...n individuals in a population are superior to others, so as to form inductive hypotheses which explicitly characterize good individuals. The second is Estimation of Distribution Algorithms (EDA) [6], [7]. EDA uses a probabilistic model of promising solutions to guide further exploration of the search space. EDA is a cluster of methods, ranging from methods that assume the genes in the chromosome are...

263 | Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation
- Larrañaga, Lozano
- 2002

Citation Context: ...ood individuals and sample this model to obtain the next population. Recently, this kind of research, on EC guided by inductively learnt models, such as the Estimation of Distribution Algorithm (EDA) [1] and the Probabilistic Model-building Genetic Algorithm (PMBGA) [2], has drawn increasing interest. There are several reasons leading to this increasing interest. Firstly there is the theoretical attr...

256 | BOA: The Bayesian optimization algorithm
- Pelikan, Goldberg, et al.
- 1999

Citation Context: ...at take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tree Representation. Because of the complexity of GP style tree representation, the amount of work in evolving programs with tree representation is not comparable...

231 | The compact genetic algorithm
- Harik, Lobo, et al.
- 1999

Citation Context: ...erators and population with a well-formed model makes it possible to understand EC. In some simple cases, EC guided by an inductively learnt model is a quite accurate approximation of conventional EC [3], [4]. Secondly, in terms of practical usefulness, empirical studies have demonstrated a superior performance by this kind of method. Because of the abandonment of genetic operators (either partially ...

194 | Linkage learning via probabilistic modeling in the ECGA
- Harik, Lobo, et al.
- 2006

Citation Context: ...ers that take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tree Representation. Because of the complexity of GP style tree representation, the amount of work in evolving programs with tree representation is not comp...

188 | Estimation and inference by compact coding
- Wallace, Freeman
- 1987

Citation Context: ...$\Gamma(n+1) = n!$ (8), and the Dirichlet prior is $\frac{1}{B_C(\alpha_1,\ldots,\alpha_C)}\prod_{i=1}^{C}\hat\theta_i^{\alpha_i-1}$. B. Costing of coding probability distribution with Dirichlet prior. For number of classes $C$ and number of data $n$, the cost of coding this data is [36]: $\mathrm{MML}(\hat\theta, D_n, \alpha_i) = -\log\frac{P(\hat\theta)\,P(D_n\mid\hat\theta)}{\sqrt{F(\hat\theta)}} + \frac{C-1}{2}\left(1+\log\frac{1}{12}\right)$ (9), where $F(\hat\theta)$ is the Fisher Information term. For Dirichlet distributions, the Fisher Information [37] is $F(\theta) = n^{C-1}\big/\prod_{i=1}^{C}\theta_i$...

131 | MIMIC: Finding Optima by Estimating Probability Densities
- Bonet, Isbell, et al.
- 1997

Citation Context: ...the search space. EDA is a cluster of methods, ranging from methods that assume the genes in the chromosome are independent [8], [3], [4], through others that take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tre...

126 | Bayesian Learning of Probabilistic Language Models
- Stolcke
- 1994

Citation Context: ...ent attached to each rule (more accurately, to each RHS). We choose the RHS based on its probability. 2) Learning method: We use a specific-to-general method to search for a good grammar, motivated by [27]. We start from a very specialized grammar which covers only the training examples (selected superior individuals). The merge operator is employed among the rules to generalize the initial grammar. Th...
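The specific-to-general search the excerpt describes can be sketched with a toy merge operator on a rule table; the grammar encoding and the nonterminal names below are made up for illustration:

```python
def merge(grammar, a, b):
    """Merge nonterminal b into a: pool their RHS lists and rewrite every
    occurrence of b on the remaining right-hand sides. This is a
    specific-to-general step: the merged grammar covers at least
    everything the old one did."""
    merged = {}
    for lhs, rhss in grammar.items():
        if lhs == b:
            continue
        new_rhss = [tuple(a if s == b else s for s in rhs) for rhs in rhss]
        if lhs == a:
            new_rhss += [tuple(a if s == b else s for s in rhs) for rhs in grammar[b]]
        merged[lhs] = sorted(set(new_rhss))
    return merged

# toy initial grammar covering two training programs (illustrative symbols)
g = {"S": [("X1",), ("X2",)],
     "X1": [("move",)],
     "X2": [("left", "X1")]}
g2 = merge(g, "X1", "X2")
# X1 now derives both "move" and "left X1", so the grammar generalizes
```

Each merge only pools derivations, so the merged grammar still covers the training examples the initial grammar covered; the MML score then decides whether the generalization pays for itself.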

121 | Grammatical evolution: Evolving programs for an arbitrary language
- Ryan, Collins, et al.
- 1998

Citation Context: ...In this research, Stochastic Context-free Grammars (SCFG) are chosen as the model. Previously in GP, grammars, in particular Context-free Grammars (CFG), have been used to constrain the search space [25], [26]. However, since a grammar is a formal model for language, both natural and formal, it can function as more than just a constraint on the search space. If we take the GP individual as a...
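Since an SCFG is a generative model, sampling programs from it is straightforward; this sketch assumes a simple {nonterminal: [(probability, RHS), ...]} encoding, which is illustrative rather than the paper's representation:

```python
import random

def sample(scfg, symbol="S"):
    """Sample a derivation from a stochastic context-free grammar given as
    {nonterminal: [(probability, rhs_tuple), ...]}; any symbol that is not
    a key is treated as a terminal."""
    if symbol not in scfg:
        return [symbol]
    r, acc = random.random(), 0.0
    for p, rhs in scfg[symbol]:
        acc += p
        if r <= acc:
            return [tok for s in rhs for tok in sample(scfg, s)]
    # numerical fallback: use the last RHS
    return [tok for s in scfg[symbol][-1][1] for tok in sample(scfg, s)]

# toy grammar for arithmetic expressions over x (illustrative)
g = {"S": [(0.6, ("x",)), (0.4, ("(", "S", "+", "S", ")"))]}
expr = "".join(sample(g))
```

Because the recursive rule has probability 0.4, each expansion spawns on average 0.8 new S symbols, so sampling terminates with probability one.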

114 | Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space
- Baluja, Davies
- 1997

Citation Context: ...search space. EDA is a cluster of methods, ranging from methods that assume the genes in the chromosome are independent [8], [3], [4], through others that take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tree Repr...

105 | Minimum message length and Kolmogorov complexity
- Wallace, Dowe
- 1999

Citation Context: ...ompare grammars. Therefore, we need to measure whether a specific merge improves the grammar model. We use minimum length encoding inference, usually referred to as Minimum Message Length (MML) [28], [29] or Minimum Description Length (MDL) [30], to compare grammars. In the remainder of this paper, MML is generally employed to refer to minimum encoding inference. We want to find a grammar which has low...

90 | The bivariate marginal distribution algorithm
- Pelikan, Mühlenbein
- 1999

Citation Context: ...space. EDA is a cluster of methods, ranging from methods that assume the genes in the chromosome are independent [8], [3], [4], through others that take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tree Representa...

84 | Bayesian Optimization Algorithm: From Single Level to Hierarchy (Ph.D. thesis)
- Pelikan
- 2002

Citation Context: ...Recently, this kind of research, on EC guided by inductively learnt models, such as the Estimation of Distribution Algorithm (EDA) [1] and the Probabilistic Model-building Genetic Algorithm (PMBGA) [2], has drawn increasing interest. There are several reasons leading to this increasing interest. Firstly there is the theoretical attraction. The highly complex and dynamic effects of genetic operator ...
Citation Context .... Recently, this kind of research, on EC guided by inductively learnt models, such as the Estimation of Distribution Algorithm (EDA) [1] and the Probabilistic Model-building Genetic Algorithm (PMBGA) =-=[2]-=-, has drawn increasing interest. There are several reasons leading to this increasing interest. Firstly there is the theoretical attraction. The highly complex and dynamic effects of genetic operator ... |

79 | A Theory of Learning Classification Rules
- Buntine
- 1990

Citation Context: ...probabilities. We prefer very skewed probability distributions to a uniform distribution because this means we would have less uncertainty. This preference can be modeled by a symmetric Dirichlet prior [32]. The last term can be calculated using the following equation: $\mathrm{MML}(\hat\theta, D_n, \alpha_i) = -\sum_{i=1}^{C}\left(\alpha_i+n_i-\tfrac{1}{2}\right)\log\hat\theta_i + \log B_C(\alpha_1,\ldots,\alpha_C) + \tfrac{1}{2}(C-1)\log n + \frac{C-1}{2}\left(1+\log\frac{1}{12}\right)$, where $\alpha_i$ ($i \neq 0$) ar...
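Taking the reconstructed equation at face value, it can be evaluated directly with log-gamma functions, using log B_C(α_1, ..., α_C) = Σ lgamma(α_i) − lgamma(Σ α_i) and θ̂_i = (n_i + α_i)/(n + α_0). This is a sketch of the scoring step only, under that reading of the formula:

```python
import math

def mml_multinomial(counts, alpha=1.0):
    """MML cost (in nits) of a multinomial with a symmetric
    Dirichlet(alpha) prior, following the equation in the excerpt."""
    C = len(counts)
    n = sum(counts)
    a0 = alpha * C
    theta = [(ni + alpha) / (n + a0) for ni in counts]
    log_beta = C * math.lgamma(alpha) - math.lgamma(a0)
    cost = -sum((alpha + ni - 0.5) * math.log(t) for ni, t in zip(counts, theta))
    cost += log_beta + 0.5 * (C - 1) * math.log(n)
    cost += 0.5 * (C - 1) * (1 + math.log(1 / 12))
    return cost

# a skewed RHS-frequency vector codes more cheaply than a uniform one,
# matching the stated preference for skewed distributions
assert mml_multinomial([28, 1, 1]) < mml_multinomial([10, 10, 10])
```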

61 | Probabilistic incremental program evolution
- Salustowicz, Schmidhuber
- 1997

Citation Context: ...s a strong connection with EDA, i.e. evolving linear solutions with the guidance of a probabilistic model. It includes the following three projects. Probabilistic Incremental Program Evolution (PIPE) [16] combines probability vector coding of program instructions, Population-Based Incremental Learning [8], and tree-coded programs. PIPE iteratively generates successive populations of functional program...
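PIPE's probabilistic prototype tree stores an instruction-probability table at each node position; the sketch below collapses that to one table per depth, a simplification for illustration, with a made-up instruction set:

```python
import random

FUNCTIONS = {"+": 2, "*": 2}   # arity table (illustrative)
TERMINALS = ["x", "0.5"]

def sample_program(probs, depth=0, max_depth=3):
    """Sample a program string from a per-depth instruction-probability
    table, a simplified stand-in for PIPE's probabilistic prototype tree."""
    table = probs[min(depth, len(probs) - 1)]
    symbols, weights = zip(*table.items())
    sym = random.choices(symbols, weights)[0]
    if sym in TERMINALS:
        return sym
    if depth >= max_depth:
        return random.choice(TERMINALS)  # force a terminal at the depth limit
    kids = [sample_program(probs, depth + 1, max_depth) for _ in range(FUNCTIONS[sym])]
    return "(" + sym.join(kids) + ")"

# instructions shift toward terminals as depth grows (illustrative numbers)
probs = [{"+": 0.5, "*": 0.3, "x": 0.2}, {"+": 0.2, "x": 0.8}]
prog = sample_program(probs)
```

In PIPE proper, the table at each position is then shifted toward the instructions used by the best program, analogous to the PBIL update.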

60 | Grammatically-based genetic programming
- Whigham
- 1995

Citation Context: ...higham’s very early work [19], ant-TAG [20] and Program Evolution with Explicit Learning (PEEL) [21]. This second kind of work has some connection with Grammar Guided Genetic Programming (GGGP) [22], [23], [24], i.e. using a grammar to constrain the search space. The individual GP tree in GGGP must respect the grammar. This overcomes the closure problem in GP and provides a more formalized mechanism for t...

53 | Bayesian grammar induction for language modeling
- Chen
- 1995

Citation Context: ...equires the statement of the inferred SCFG. Although there are some rough estimates of L(G) in the literature, they are not adequate for our purpose, as they over-estimate the grammar complexity [27], [31]. In the literature, this overestimation is compensated by a fudge factor α, which is tuned to fit the data. However, direct application of this method in GMPE led to instability in the algorithm, with...

40 | Probabilistic model building and competent genetic programming
- Sastry, Goldberg

Citation Context: ...rograms. Each iteration uses the best program to refine the distribution. Thus, the structures of promising individuals are learnt and encoded in the PPT. Extended Compact Genetic Programming (ECGP) [17] is a direct application of ECGA [13] in tree representation. Marginal product models (MPMs) are used to model the population of genetic programming. MPMs are formed as a product of marginal distribut...

40 | On using syntactic constraints with genetic programming
- Gruau
- 1996

Citation Context: ...’s very early work [19], ant-TAG [20] and Program Evolution with Explicit Learning (PEEL) [21]. This second kind of work has some connection with Grammar Guided Genetic Programming (GGGP) [22], [23], [24], i.e. using a grammar to constrain the search space. The individual GP tree in GGGP must respect the grammar. This overcomes the closure problem in GP and provides a more formalized mechanism for typing ...

38 | The Royal Tree Problem, A Benchmark for Single and Multiple Population GP
- Punch
- 1996

Citation Context: ...LHS, $n_i$ is the frequency of the i-th RHS, $n = \sum_{i=1}^{C} n_i$, and $B_C()$ is the Beta function. Please refer to the Appendix for details. III. EXPERIMENTAL STUDY. A. Royal Tree Problem. The Royal Tree Problem [33] was designed to be a difficult problem for GP. In this experiment, we use the level-e Royal Tree Problem. This problem has five nonterminals a, b, c, d and e with arity 1, 2, 3, 4, 5 respectively and one...

36 | Learnable Evolution Model: Evolutionary Processes Guided by
- Michalski

Citation Context: ...ng linear solutions guided by an inductively learnt model; they differ mainly in the models chosen. Among them, we perceive the following two streams. The first is the Learnable Evolution Model (LEM) [5]. The central engine of evolution in LEM is a machine learning mode, which creates new populations by employing hypotheses about high fitness individuals found in past populations. The machine learnin...

35 | An analysis of the MAX problem in genetic programming
- Langdon, Poli
- 1997

Citation Context: ...0

TABLE I. COMPARISON OF GP AND GMPE ON ROYAL TREE PROBLEM.

| Method | No. Eval/Gen | Gen  | No. Eval./Run | Succeed | Speedup |
|--------|--------------|------|---------------|---------|---------|
| GMPE   | 30           | 2574 | 77,220        | 50.9%   | 22.7    |
| GP     | 3500         | 500  | 1,750,000     | 50%     |         |

B. Max Problem. The Max problem [34] has only one terminal x with value 0.5 and two nonterminals, + and ×. The purpose is to find a tree with maximum fitness under some tree size constraint. In this experiment, we use the maximum depth ...
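The Max problem's optimum under a depth limit can be computed by a one-line recurrence: with the single terminal x = 0.5, the best value at depth d is f(1) = 0.5 and f(d) = max(f(d−1) + f(d−1), f(d−1) × f(d−1)), since both the best sum and the best product use the optimal subtree on both sides:

```python
def max_problem_optimum(depth):
    """Best fitness attainable in the Max problem with terminal x = 0.5 and
    operators + and * under a maximum-depth constraint: addition wins while
    the running value is below 2, multiplication afterwards."""
    v = 0.5  # a single terminal at depth 1
    for _ in range(depth - 1):
        v = max(v + v, v * v)
    return v

assert max_problem_optimum(3) == 2.0
assert max_problem_optimum(5) == 16.0
```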

26 | Global optimization with Bayesian networks
- Etxeberria, Larrañaga
- 1999

Citation Context: ...e into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tree Representation. Because of the complexity of GP style tree representation, the amount of work in evolving programs with tree representation is not comparable with ...

22 | The Factorized Distribution Algorithm for additively decomposed functions
- Mühlenbein, Mahnig
- 1999

Citation Context: ...gh others that take into account pairwise interactions [9], [10], [11], to methods that can accurately model even a very complex problem structure with highly overlapping multivariate building blocks [12], [13], [14], [15]. B. Related Works with GP Style Tree Representation. Because of the complexity of GP style tree representation, the amount of work in evolving programs with tree representation is no...

18 | Program evolution with explicit learning: a new framework for program automatic synthesis
- Shan, McKay, et al.

Citation Context: ...s, is an ideal formalism for modeling GP-style tree structure. The grammar-model-based methods include Whigham’s very early work [19], ant-TAG [20] and Program Evolution with Explicit Learning (PEEL) [21]. This second kind of work has some connection with Grammar Guided Genetic Programming (GGGP) [22], [23], [24], i.e. using a grammar to constrain the search space. The individual GP tree in GGGP must resp...

16 | From recombination of the genes to the estimation of distribution
- Muehlenbein
- 1996

Citation Context: ...rs and population with a well-formed model makes it possible to understand EC. In some simple cases, EC guided by an inductively learnt model is a quite accurate approximation of conventional EC [3], [4]. Secondly, in terms of practical usefulness, empirical studies have demonstrated a superior performance by this kind of method. Because of the abandonment of genetic operators (either partially or co...

16 | Inductive bias and genetic programming
- Whigham
- 1995

Citation Context: ...is widely used to model the internal hierarchical structure of sentences, is an ideal formalism for modeling GP-style tree structure. The grammar-model-based methods include Whigham’s very early work [19], ant-TAG [20] and Program Evolution with Explicit Learning (PEEL) [21]. This second kind of work has some connection with Grammar Guided Genetic Programming (GGGP) [22], [23], [24], i.e. using a gram...

15 | Estimation of distribution programming based on Bayesian network
- Yanai, Iba
- 2003

Citation Context: ...totype tree into subtrees and builds the probabilistic models for each subtree. Apparently, the subtrees are taken as independent probabilistic variables. Estimation of Distribution Programming (EDP) [18] tries to model the dependency of adjacent nodes in the GP tree. Although there are a few possible dependencies among the adjacent nodes, only the conditional probability of a child node given a parent no...
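EDP's parent-child dependency can be sketched as a conditional probability table estimated from observed adjacent node pairs; the edge encoding below is illustrative, not EDP's actual data structure:

```python
from collections import Counter, defaultdict

def fit_child_given_parent(edges):
    """Estimate P(child symbol | parent symbol) from observed
    (parent, child) node pairs, the adjacent-node dependency that EDP
    models in GP trees."""
    counts = defaultdict(Counter)
    for parent, child in edges:
        counts[parent][child] += 1
    return {p: {c: n / sum(cs.values()) for c, n in cs.items()}
            for p, cs in counts.items()}

# parent-child pairs collected from selected trees (made-up data)
edges = [("+", "x"), ("+", "x"), ("+", "*"), ("*", "x")]
model = fit_child_given_parent(edges)
assert model["+"]["x"] == 2 / 3
```

Sampling a new tree then draws each child's symbol from the row of its parent, instead of independently as in a prototype-tree model.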

13 | AntTAG: a new method to compose computer programs using colonies of ants
- Abbass, Hoai, et al.
- 2002

Citation Context: ...to model the internal hierarchical structure of sentences, is an ideal formalism for modeling GP-style tree structure. The grammar-model-based methods include Whigham’s very early work [19], ant-TAG [20] and Program Evolution with Explicit Learning (PEEL) [21]. This second kind of work has some connection with Grammar Guided Genetic Programming (GGGP) [22], [23], [24], i.e. using a grammar to constra...

1 | Evolutionary program induction directed by logic grammars
- Wong, Leung
- 1997

Citation Context: ...lude Whigham’s very early work [19], ant-TAG [20] and Program Evolution with Explicit Learning (PEEL) [21]. This second kind of work has some connection with Grammar Guided Genetic Programming (GGGP) [22], [23], [24], i.e. using a grammar to constrain the search space. The individual GP tree in GGGP must respect the grammar. This overcomes the closure problem in GP and provides a more formalized mechanism...

1 | Multistate and Multinomial Distributions
- Allison
- 2001

Citation Context: ...oding this data is [36]: $\mathrm{MML}(\hat\theta, D_n, \alpha_i) = -\log\frac{P(\hat\theta)\,P(D_n\mid\hat\theta)}{\sqrt{F(\hat\theta)}} + \frac{C-1}{2}\left(1+\log\frac{1}{12}\right)$ (9), where $F(\hat\theta)$ is the Fisher Information term. For Dirichlet distributions, the Fisher Information [37] is $F(\theta) = n^{C-1}\big/\prod_{i=1}^{C}\theta_i$, where n is the number of data items. Putting it all together, with $\hat\theta_i = \frac{n_i+\alpha_i}{n+\alpha_0}$, gives $\mathrm{MML}(\hat\theta, D_n, \alpha_i) = -\sum_{i=1}^{C}(\alpha_i+n_i-\tfrac{1}{2})\log\hat\theta_i + \ldots$ C. Example for Grammar Fragment. Given gra...