## Darwinian Evolution in Parallel Universes: A Parallel Genetic Algorithm for Variable Selection

Citations: 5 (1 self)

### BibTeX

```bibtex
@MISC{Zhu_darwinianevolution,
  author = {Mu Zhu and Hugh A. Chipman},
  title  = {Darwinian Evolution in Parallel Universes: A Parallel Genetic Algorithm for Variable Selection},
  year   = {}
}
```

### Abstract

The need to identify a few important variables that affect a certain outcome of interest commonly arises in various industrial engineering applications. The genetic algorithm (GA) appears to be a natural tool for solving such a problem. In this article we first demonstrate that the GA is actually not a particularly effective variable selection tool, and then propose a very simple modification. Our idea is to run a number of GAs in parallel without allowing each GA to fully converge, and to consolidate the information from all the individual GAs in the end. We call the resulting algorithm the parallel genetic algorithm (PGA). Using a number of both simulated and real examples, we show that the PGA is an interesting as well as highly competitive and easy-to-use variable selection tool.
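The abstract's recipe — run many short GA searches in parallel without letting any of them fully converge, then consolidate the results — can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: all function names, parameter values, and the toy fitness function are my own, and the paper scores models with criteria such as AIC/BIC rather than an arbitrary fitness.

```python
import random

def short_ga(fitness, p, pop_size=20, generations=5, mut_rate=0.02, rng=None):
    """One deliberately under-converged GA over p-bit inclusion vectors.
    Returns the best chromosome (list of 0/1 flags) in the final population."""
    rng = rng or random.Random()
    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, p)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (rng.random() < mut_rate) for g in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

def parallel_ga(fitness, p, n_universes=50, seed=0):
    """PGA idea: many short runs ("parallel universes"); consolidate by
    recording how often each variable appears in a universe's best model."""
    rng = random.Random(seed)
    best = [short_ga(fitness, p, rng=random.Random(rng.random()))
            for _ in range(n_universes)]
    return [sum(ch[j] for ch in best) / n_universes for j in range(p)]

# Toy check: fitness rewards including variables {0, 1, 2}, penalizes extras.
truth = {0, 1, 2}
fit = lambda ch: sum(ch[j] for j in truth) - 0.3 * sum(ch)
freq = parallel_ga(fit, p=12)
```

Variables with consolidated frequency near 1 are flagged as important; noise variables should appear only sporadically across universes.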

### Citations

8422 |
Genetic Algorithms
- Goldberg
- 1989
Citation Context ...e correct variables have been selected. 1.1 The Genetic Algorithm An interesting heuristic search algorithm well suited for the combinatorial optimization problem is the genetic algorithm (GA) (e.g., Goldberg 1989). Although GA is not widely known among statisticians, it has garnered some interest in this community. Chatterjee, Laudato, and Lynch (1996) gave an introductory review and showed how the GA can be ...

3972 |
Optimization by Simulated Annealing
- Kirkpatrick, Gelatt, et al.
- 1983
Citation Context ...., m} 0 otherwise. or, more elaborately, as the Boltzmann probability (see, e.g., Liang and Wong, 2000): p_i = e^{−s_i/t} / ∑_j e^{−s_j/t}, where t is a “temperature” parameter much like in simulated annealing (Kirkpatrick et al., 1983). However, it is well-known in the simulated annealing literature that setting the optimal temperature ladder is a crucial step, but remains a delicate and difficult task; some comments to this effec...
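The Boltzmann selection probability quoted in this snippet is simple to compute directly. A minimal sketch, with function and variable names of my own choosing (here lower scores s_i receive higher probability, and the temperature t controls how greedy the selection is):

```python
import math

def boltzmann_probs(scores, t):
    """Selection probabilities p_i = exp(-s_i/t) / sum_j exp(-s_j/t).
    Small t concentrates mass on the lowest score; large t flattens it."""
    weights = [math.exp(-s / t) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

As t shrinks, selection approaches a greedy pick of the best candidate, which is exactly why choosing a temperature ladder matters, as the context notes.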

2831 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ...f what we mean by the “best.” This definition has remained highly controversial; common choices include the Akaike information criterion (AIC) (Akaike 1973), the Bayesian information criterion (BIC) (Schwarz 1978), and the Cp (Mallows 1973), among other alternatives. It is well known (e.g., Kou and Efron 2002) that a solution that is “best” for one criterion (e.g., the AIC) is generally not “best” for a diffe...

2804 | Bagging predictors
- Breiman
- 1996
Citation Context ... various control parameters in the GA and the best fitness function to use. We are not thrilled by either option. Instead, we are inspired by the recent success stories of algorithms such as bagging (Breiman 1996), where, by consolidating a number of relatively simpleminded models, such as a large tree grown to a maximal size without any pruning, one can often do better than a carefully constructed model such...

1797 |
Experiments with a New Boosting Algorithm
- Freund, Schapire
- 1996
Citation Context ... is better than a greedy search for a single best solution, an idea that has recently become popular due to the success of algorithms in machine learning such as Bagging (Breiman, 1996) and Boosting (Freund and Schapire, 1996); it also shows that there is more to be gained from parallel computation than simply reducing the amount of computational time. Both of these lessons, we believe, will have far-reaching consequences...

1748 |
Information theory and an extension of the maximum likelihood principle
- Akaike
- 1973
Citation Context ...o find the “best” subset, we need a working definition of what we mean by the “best.” This definition has remained highly controversial; common choices include the Akaike information criterion (AIC) (Akaike 1973), the Bayesian information criterion (BIC) (Schwarz 1978), and the Cp (Mallows 1973), among other alternatives. It is well known (e.g., Kou and Efron 2002) that a solution that is “best” for one crit...

1737 | Generalized Additive Models
- Hastie, Tibshirani
- 1990
Citation Context ...ional cost will of course be more substantial, because a total of m×N×B models must be fitted, and this can become quite significant if, for example, each model is a generalized additive model (GAM) (Hastie and Tibshirani 1990) that is typically fitted with a rather expensive back-fitting algorithm, but it will still be much easier to implement than, say, SSVS, because we can simply embed a functional call to existing GAM ...

1336 | Additive logistic regression: A statistical view of boosting - Friedman, Hastie, et al. - 2000 |

1055 | Multivariate Analysis - Mardia, Kent, et al. - 1979 |

513 |
Introduction to probability models
- Ross
- 1980
Citation Context ... one another, one can think of r(j, 1), r(j, 2), ..., r(j, B) as an independent sample from some underlying distribution, say Fj. The fact that r(j, b) is bounded between 0 and 1 means (see, e.g., Ross 1997, problem 11.33) that v(j, b) ≡ var(r(j, b)) ≤ 1/4, and hence var(r̄j) ≤ 1/(4B). The central limit theorem then implies that for large B, r̄j is approximately normally distributed with a variance no...
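The bound quoted in this snippet rests on the fact that a [0,1]-valued random variable has variance at most 1/4, with the worst case attained by a Bernoulli(1/2) variable; averaging B independent copies then divides the bound by B. A quick numeric sanity check of that worst case (names mine):

```python
# Variance of a {0,1}-valued variable with P(1) = q is q(1 - q);
# scanning q over [0, 1] shows the maximum is 1/4, at q = 1/2.
def bernoulli_var(q):
    return q * (1 - q)

worst = max(bernoulli_var(q / 100) for q in range(101))  # = 0.25

# Hence for a mean of B independent [0,1]-bounded terms:
B = 30
bound_on_mean = worst / B   # var(mean) <= 1/(4B)
```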

434 |
Optimization of control parameters for genetic algorithms
- Grefenstette
- 1986
Citation Context ...ntained and automatic as possible, we prefer to use a fixed set of choices. A number of theoretical and empirical studies on good choices of these tuning parameters are available (e.g., De Jong 1975; Grefenstette 1985; Goldberg 1989, p. 71). The website http://www.mathworks.com/access/helpdesk/help/toolbox/gads/, maintained by the producer of Matlab, also provides some general guidelines. These sources suggest th...

415 | R: A language and environment for statistical computing - R Development Core Team - 2006 |

373 |
Variable selection via Gibbs sampling
- George, McCulloch
- 1993
Citation Context ...using both the AIC and the BIC as the selection criteria. We also consider a (completely different) Bayesian stochastic variable selection method known as stochastic search variable selection (SSVS) (George and McCulloch 1993). 4.3.1 Simulation Settings. Our simulation is based on the illustrative example in Section 1.2. In Sections 1.2 and 2.1 we have already presented some evidence that parallel evolution is useful for ...

133 | Regression by leaps and bounds - Furnival, Wilson - 1974 |

30 |
Bagging Predictors
- Breiman
- 1996
Citation Context ... number of mediocre solutions is better than a greedy search for a single best solution, an idea that has recently become popular due to the success of algorithms in machine learning such as Bagging (Breiman, 1996) and Boosting (Freund and Schapire, 1996); it also shows that there is more to be gained from parallel computation than simply reducing the amount of computational time. Both of these lessons, we bel...

28 | Evolutionary Monte Carlo: applications to cp model sampling and change point problem
- LIANG, WONG
- 2000
Citation Context ...ll of the variables is .5. The resulting problem is generally considered a hard variable selection problem both because of its size (p = 60) and because of the correlation among the variables (e.g., Liang and Wong 2000). The response variable y is generated as y = β1x1 + ··· + β60x60 + ε, ε ∼ N120(0, σ²I), σ = 2, (10) where β1 = ··· = β15 = 0, β16 = ··· = β30 = 1, β31 = ··· = β45 = 2, and β46 = ··· = β60 = 3. In other words, the...
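The simulation setting described in this snippet (n = 120 observations, p = 60 predictors with pairwise correlation .5, coefficients in four blocks of 15, noise σ = 2) can be reproduced with a standard equicorrelation construction, x_j = √ρ·z + √(1−ρ)·e_j, which gives corr(x_j, x_k) = ρ. This is a sketch under that assumption; the function name and seed are mine, and the paper may generate the design differently.

```python
import random

def simulate(n=120, p=60, rho=0.5, sigma=2.0, seed=1):
    """Generate the equicorrelated design and response of display (10):
    y = beta_1 x_1 + ... + beta_60 x_60 + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Coefficient blocks from (10): 15 zeros, 15 ones, 15 twos, 15 threes.
    beta = [0.0] * 15 + [1.0] * 15 + [2.0] * 15 + [3.0] * 15
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                       # shared factor => corr rho
        row = [rho ** 0.5 * z + (1 - rho) ** 0.5 * rng.gauss(0, 1)
               for _ in range(p)]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0, sigma))
    return X, y
```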

21 | Instabilities of regression estimates relating air pollution to mortality - McDonald, Schwing - 1973 |

18 | A Bayesian variable selection approach for analyzing designed experiments with complex aliasing - Chipman, Hamada, et al. - 1997 |

12 | Genetic algorithms and their statistical applications: an introduction - CHATTERJEE, LAUDATO, et al. - 1996 |

9 | A Case Study of Stochastic Optimization in Health Policy: Problem Formulation and Preliminary Results
- Draper, Fouskakis
- 2000
Citation Context ...he correct model; it finds a model that includes a number of extra spurious variables. Other researchers have also had similar unsuccessful experiences with the GA as a variable selection tool (e.g., Draper and Fouskakis 2000). Table 4 (Results for the Illustrative Example), selected variables by method: True model: 5, 10, 15; Stepwise AIC: 1, 4, 5, 6, 7, 9, 10, 12, 14, 15, 16, 17; Stepwise BIC: 5, 6, 10, 15; Exhaustive AIC: 2, 5,...

7 |
Adaptive free-knot splines
- Miyata, Shen
- 2003
Citation Context ...ld also use a more sophisticated strategy by using a large mutation rate in the beginning to move around the search space more aggressively and gradually decreasing the mutation rate over time (e.g., Miyata and Shen 2003), much like how one would set the critical temperature parameter in simulated annealing (Kirkpatrick, Gelatt, and Vecchi 1983). We experimented with various different ways for setting the mutation ra...

3 | Subset Selection in Regression (2nd ed.) - Miller - 2002 |

2 | An Analysis of the Behavior of a Class of Genetic Adaptive Systems (unpublished doctoral dissertation) - De Jong - 1975 |

2 |
Majority-Vote Classifiers: Theory and Applications (unpublished doctoral dissertation)
- James
- 1998
Citation Context ...formation from a number of mediocre solutions may actually be better than conducting a greedy search for a single best solution. This important idea is sometimes referred to as “majority vote” (e.g., James 1998). The case of PGA further testifies to the importance of such an idea and demonstrates that there is more to be gained from parallel computation than simply reducing the amount of computational time...

2 |
Smoothers and the Cp, generalized maximum likelihood and extended exponential criteria: A geometric approach
- Kou, Efron
- 2002
Citation Context ...ces include the Akaike information criterion (AIC) (Akaike 1973), the Bayesian information criterion (BIC) (Schwarz 1978), and the Cp (Mallows 1973), among other alternatives. It is well known (e.g., Kou and Efron 2002) that a solution that is “best” for one criterion (e.g., the AIC) is generally not “best” for a different criterion (e.g., the BIC). In particular, the AIC tends to select more variables than is nece...

1 |
A Systematic Approach to the Analysis
- Filliben, Li
- 1997
Citation Context ...tcome of interest, y. In Section 4.4 we describe an example in which engineers are interested in determining how the different input bits affect the conversion error in a digital-to-analog converter (Filliben and Li 1997). Let C = {x1, x2, ..., xp} be the set containing a total of p possible factors, which are composed of functions of variables, such as main effects and interaction terms. To understand how the outcome y...