## Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space (1997)

Venue: Proceedings of the 14th International Conference on Machine Learning

Citations: 119 (2 self)

### BibTeX

@INPROCEEDINGS{Baluja97usingoptimal,
  author    = {Shumeet Baluja and Scott Davies},
  title     = {Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space},
  booktitle = {Proceedings of the 14th International Conference on Machine Learning},
  year      = {1997},
  pages     = {30--38},
  publisher = {Morgan Kaufmann}
}


### Abstract

Many combinatorial optimization algorithms have no mechanism for capturing inter-parameter dependencies. However, modeling such dependencies may allow an algorithm to concentrate its sampling more effectively on regions of the search space that have appeared promising in the past. We present an algorithm that incrementally learns second-order probability distributions from the good solutions seen so far, uses these statistics to generate optimal (in the maximum-likelihood sense) dependency trees to model these distributions, and then stochastically generates new candidate solutions from these trees. We test this algorithm on a variety of optimization problems. Our results indicate superior performance over other tested algorithms that either (1) do not explicitly use these dependencies, or (2) use these dependencies to generate a more restricted class of dependency graphs. Scott Davies was supported by a Graduate Student Research Fellowship from the National Science Foundation. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation.
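As a concrete illustration of the approach the abstract describes, here is a minimal Python sketch for bit-string problems: collect pairwise statistics from good solutions, grow a maximum-mutual-information dependency tree with a Prim-style pass, and sample new candidates from that tree. All function and parameter names (and the smoothing constant) are my own, not the paper's, and the paper's incremental-update details are omitted.

```python
import math
import random

def pairwise_counts(samples, n):
    # A[i][j][a][b]: (smoothed) count of samples with X_i = a and X_j = b.
    A = [[[[1e-3] * 2 for _ in range(2)] for _ in range(n)] for _ in range(n)]
    for s in samples:
        for i in range(n):
            for j in range(n):
                A[i][j][s[i]][s[j]] += 1.0
    return A

def mutual_info(A, i, j):
    # I(X_i; X_j) estimated from the count table.
    total = sum(A[i][j][a][b] for a in (0, 1) for b in (0, 1))
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = A[i][j][a][b] / total
            p_a = (A[i][j][a][0] + A[i][j][a][1]) / total
            p_b = (A[i][j][0][b] + A[i][j][1][b]) / total
            mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def grow_tree(A, n):
    # Prim-style: repeatedly attach the out-of-tree bit with the highest
    # mutual information to any in-tree bit.
    parent, order = [None] * n, [0]
    best = {j: (mutual_info(A, j, 0), 0) for j in range(1, n)}
    while best:
        j = max(best, key=lambda k: best[k][0])
        parent[j] = best[j][1]
        order.append(j)
        del best[j]
        for k in best:
            m = mutual_info(A, k, j)
            if m > best[k][0]:
                best[k] = (m, j)
    return parent, order

def sample_from_tree(A, parent, order, rng):
    # Root from its marginal; every other bit from P(X_i | X_parent(i)).
    x = [0] * len(order)
    for i in order:
        p = parent[i]
        if p is None:
            c = A[i][i]
            p1 = (c[1][0] + c[1][1]) / (c[0][0] + c[0][1] + c[1][0] + c[1][1])
        else:
            c = A[i][p]
            p1 = c[1][x[p]] / (c[0][x[p]] + c[1][x[p]])
        x[i] = 1 if rng.random() < p1 else 0
    return x

def tree_optimize(fitness, n, pop=100, elite=30, iters=40, seed=0):
    rng = random.Random(seed)
    samples = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    best = max(samples, key=fitness)
    for _ in range(iters):
        good = sorted(samples, key=fitness, reverse=True)[:elite]
        A = pairwise_counts(good, n)
        parent, order = grow_tree(A, n)
        samples = [sample_from_tree(A, parent, order, rng) for _ in range(pop)]
        best = max(samples + [best], key=fitness)
    return best
```

Because each out-of-tree bit only needs its best link updated against the newest in-tree bit, the tree-growing pass is O(n²), matching the complexity the paper cites.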

### Citations

7782 | Genetic Algorithms - Goldberg - 1989

Citation context: …A, 50% from parent B [Syswerda, 1989]), mutation rate 0.1% (probability of mutating each bit), elitist selection (the best solution from generation g replaces the worst solution in generation g+1) [Goldberg, 1989], and population size 200. The GAs used fitness-proportional selection (the chance of selecting a member of the population for recombination is proportional to its evaluation)…
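The fitness-proportional selection mentioned in this context can be sketched as a roulette-wheel draw (a generic illustration, not the paper's code; names are assumed, and it presumes non-negative fitnesses):

```python
import random

def roulette_select(population, fitnesses, rng=random):
    # Probability of picking member k is fitnesses[k] / sum(fitnesses).
    total = sum(fitnesses)
    r = rng.uniform(0, total)
    acc = 0.0
    for member, f in zip(population, fitnesses):
        acc += f
        if r < acc:
            return member
    return population[-1]  # guard against floating-point edge cases
```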

1119 | A Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992

Citation context: …similar to the coefficients of Dirichlet distributions – a correspondence which suggests the use of Bayesian scoring metrics in place of information-theoretic ones in the future, such as those used in [Cooper and Herskovits, 1992] and [Heckerman, et al., 1995]. INITIALIZATION: For all bits i and j and all binary assignments to a and b, initialize A[X_i = a, X_j = b] to C_init. MAIN LOOP: Repeat until termination condi…

942 | Learning Bayesian networks: the combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995

Citation context: …Dirichlet distributions – a correspondence which suggests the use of Bayesian scoring metrics in place of information-theoretic ones in the future, such as those used in [Cooper and Herskovits, 1992] and [Heckerman, et al., 1995]. INITIALIZATION: For all bits i and j and all binary assignments to a and b, initialize A[X_i = a, X_j = b] to C_init. MAIN LOOP: Repeat until termination condition is met: 1. Generate a depe…
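The pseudocode fragment quoted in the two contexts above initializes a pairwise table A[X_i = a, X_j = b] to C_init before the main loop. Below is a sketch of how such a table could be maintained as a running, recency-weighted statistic; the decay scheme and all names besides A and C_init are my assumptions about how "incrementally learns" might be realized, not the paper's exact rule:

```python
def init_counts(n_bits, c_init=1000.0):
    # A[i][j][a][b]: weighted count of good solutions with X_i = a, X_j = b,
    # started at the prior C_init for every cell.
    return [[[[c_init] * 2 for _ in range(2)] for _ in range(n_bits)]
            for _ in range(n_bits)]

def update_counts(A, good_solutions, decay=0.99):
    # Decay the old statistics, then add counts from the newest good
    # solutions, so the table tracks a recency-weighted distribution.
    n = len(A)
    for i in range(n):
        for j in range(n):
            for a in (0, 1):
                for b in (0, 1):
                    A[i][j][a][b] *= decay
    for s in good_solutions:
        for i in range(n):
            for j in range(n):
                A[i][j][s[i]][s[j]] += 1.0
    return A
```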

562 | Shortest connection networks and some generalizations - Prim - 1957

Citation context: …tion with each bit still outside the tree, we can perform the entire tree-growing operation in O(n²) time. Because our algorithm is a variant of Prim's algorithm for finding minimum spanning trees [Prim, 1957], it can easily be shown that it constructs a tree that maximizes the sum Σ_{i=1..n} I(X_m(i), X_m(p(i))), which in turn minimizes the Kullback-Leibler divergence D(P||P′), as shown in…
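The sum in this context is the standard Chow–Liu decomposition. Writing m(i) for the i-th variable added to the tree and m(p(i)) for its parent, the KL divergence between the true distribution P and the tree distribution P′ splits as:

```latex
D(P \,\|\, P')
  = -\sum_{i=1}^{n} I\!\left(X_{m(i)};\, X_{m(p(i))}\right)
    + \sum_{i=1}^{n} H\!\left(X_{m(i)}\right) - H(X)
```

Only the first term depends on which edges the tree contains, so maximizing the sum of pairwise mutual informations minimizes D(P||P′).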

180 | Removing the genetics from the standard genetic algorithms - Baluja, Caruana - 1995

Citation context: …e many of the dependencies which Trees capture, but PBIL and GA cannot. 3.3 The Peaks Set of Problems: This set of three problems is based on the four-peaks problem, which was originally presented in [Baluja and Caruana, 1995]. The original four-peaks problem is defined as follows. Given an input vector X, which is composed of N binary elements, maximize the following: Fitness is maximized if a string is able to get both…

134 | MIMIC: Finding optima by estimating probability densities - De Bonet, Isbell, et al. - 1997

117 | Learning Gaussian networks - Geiger, Heckerman - 1994

Citation context: …extended to handle variables with more than two values. We are also currently extending it to handle real-valued variables. There are many opportunities here for exploiting recent research, such as [Geiger and Heckerman, 1994], on learning Bayesian networks for real-valued functions. 5. ACKNOWLEDGEMENTS: The authors would like to gratefully acknowledge the help of Doug Baker, Justin Boyan, Lonnie Chrisman, Greg Coope…

76 | Learning Bayesian networks: Search methods and experimental results - Chickering, Geiger, et al. - 1995

Citation context: …B) is the likelihood of the data D given the network B. Unfortunately, it has been shown that k-LEARN is NP-complete for k > 1 for the types of scoring metrics we would probably wish to employ [Chickering, et al., 1995]. However, search heuristics for finding approximate solutions have been developed for automatically learning Bayesian networks from data [Heckerman, et al., 1995]. A common approach is to perform hi…

34 | Genetic Algorithms and Explicit Search Statistics - Baluja - 1997

Citation context: …odel the inter-parameter dependencies. However, for many optimization problems drawn from the GA literature, the use of these statistics alone allows PBIL to outperform standard GAs and hill-climbers [Baluja, 1997]. Mutual Information Maximization for Input Clustering (MIMIC): This work extends PBIL by (1) capturing some of the pair-wise dependencies between the solution parameters, and (2) providing a statist…
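For contrast with the tree-based method, the first-order statistics PBIL keeps amount to a single probability vector, one P(X_i = 1) per bit, with no pairwise model. A minimal sketch (the learning-rate name and value are illustrative, not the paper's):

```python
import random

def pbil_update(prob_vector, best_solution, lr=0.1):
    # Shift each P(X_i = 1) toward the corresponding bit of the best
    # solution found this generation.
    return [p + lr * (bit - p) for p, bit in zip(prob_vector, best_solution)]

def pbil_sample(prob_vector, rng=random):
    # Each bit is drawn independently from its marginal probability.
    return [1 if rng.random() < p else 0 for p in prob_vector]
```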

20 | Using Prediction to Improve Combinatorial Optimization Search (to appear, AI/Stats) - Boyan, Moore - 1997

Citation context: …e statistics with the strings generated from the previous model, the model could be updated with the strings returned by the separate search procedure. Somewhat similar methods have been explored by [Boyan & Moore, 1997]. Many ideas used in other combinatorial algorithms can easily be incorporated, such as mutation, weighting the contribution of candidate solutions according to their evaluations, and explicitly reco…

17 | Convergence Controlled Variation - Eshelman, Mathias, et al. - 1996

2 | Removing the genetics from the standard genetic algorithm - Baluja, Caruana - 1995

Citation context: …etween 0.0–0.2 and 0.8–1.0. (C) Probabilities between 0.4–0.6. 3.3 The Peaks Set of Problems: This set of three problems is based on the four-peaks problem, which was originally presented in [Baluja and Caruana, 1995]. The original four-peaks problem is defined as follows. Given an input vector X, which is composed of N binary elements, maximize the following: FourPeaks(T, X) = MAX(head(1, X), tail(0, X))…
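The FourPeaks definition quoted above is truncated; the earlier context notes that "fitness is maximized if a string is able to get both" features. Completing it from the original four-peaks formulation (the reward clause is reconstructed from Baluja and Caruana, 1995, not from the snippet): head(1, X) counts leading 1s, tail(0, X) counts trailing 0s, and a bonus of N is added when both exceed the threshold T. A sketch in Python:

```python
def head(b, x):
    # Number of leading bits of x equal to b.
    n = 0
    for v in x:
        if v != b:
            break
        n += 1
    return n

def tail(b, x):
    # Number of trailing bits of x equal to b.
    return head(b, list(reversed(x)))

def four_peaks(t, x):
    # Bonus of N when the string has both more than t leading 1s and
    # more than t trailing 0s; otherwise just the better of the two runs.
    reward = len(x) if head(1, x) > t and tail(0, x) > t else 0
    return max(head(1, x), tail(0, x)) + reward
```

This reward structure is what makes the problem deceptive for algorithms that model bits independently: earning the bonus requires coordinating two mutually opposed runs of bits.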