Results 1-10 of 18
Convergence rates of efficient global optimization algorithms
 Journal of Machine Learning Research
, 2011
Abstract

Cited by 15 (0 self)
In the efficient global optimization problem, we minimize an unknown function f, using as few observations f(x) as possible. It can be considered a continuum-armed bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
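As a concrete illustration of the expected-improvement criterion described above, here is a minimal sketch in Python for a minimization problem. It assumes the Gaussian-process posterior at each candidate point has already been reduced to a mean and standard deviation; the function name and candidate values are illustrative, not from the paper.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement of a candidate point for minimization,
    given posterior mean `mu`, posterior std `sigma`, and the best
    observed value `f_best`."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (f_best - mu) * Phi + sigma * phi

# pick the candidate with the largest expected improvement
candidates = [(0.2, 0.5), (-0.1, 0.05), (0.0, 1.0)]  # illustrative (mu, sigma) pairs
best = max(range(len(candidates)),
           key=lambda i: expected_improvement(*candidates[i], f_best=0.1))
```

Note how the criterion trades off a low posterior mean against high posterior uncertainty: the third candidate wins here despite a worse mean than the second, because its large variance gives more room for improvement.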
The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery
 INFORMS J. on Computing
, 2010
Abstract

Cited by 7 (3 self)
We present a new technique for adaptively choosing the sequence of molecular compounds to test in drug discovery. Beginning with a base compound, we consider the problem of searching for a chemical derivative of the molecule that best treats a given disease. The problem of choosing molecules to test to maximize the expected quality of the best compound discovered may be formulated mathematically as a ranking-and-selection problem in which each molecule is an alternative. We apply a recently developed algorithm, known as the knowledge-gradient algorithm, that uses correlations in our Bayesian prior distribution between the performance of different alternatives (molecules) to dramatically reduce the number of molecular tests required, but it has heavy computational requirements that limit the number of possible alternatives to a few thousand. We develop computational improvements that allow the knowledge-gradient method to consider much larger sets of alternatives, and we demonstrate the method on a problem with 87,120 alternatives.
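The knowledge-gradient computation underlying this and the following abstracts can be sketched in its simplest form: independent normal beliefs with known measurement-noise variance (the correlated-prior version used for drug discovery generalizes this with a covariance-matrix update). All names and numbers below are illustrative assumptions, not the paper's implementation.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kg_factors(mu, var, noise_var):
    """Knowledge-gradient value of measuring each alternative once,
    for independent normal beliefs (means `mu`, variances `var`) and
    known measurement-noise variance `noise_var`."""
    n = len(mu)
    factors = []
    for x in range(n):
        # std. dev. of the change in the posterior mean after one sample
        sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
        best_other = max(mu[i] for i in range(n) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        factors.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return factors

mu = [1.0, 0.5, 0.9]      # current belief means (illustrative)
var = [0.2, 2.0, 0.1]     # current belief variances (illustrative)
kg = kg_factors(mu, var, noise_var=1.0)
measure_next = max(range(3), key=lambda x: kg[x])
```

The policy measures the alternative whose single sample is expected to improve the decision-maker's final choice the most; here that is the high-uncertainty second alternative rather than the current leader.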
Information collection on a graph
, 2010
Abstract

Cited by 5 (4 self)
We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute, and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.
Hierarchical Knowledge Gradient for Sequential Sampling
Abstract

Cited by 3 (3 self)
We propose a sequential sampling policy for noisy discrete global optimization and ranking and selection, in which we aim to efficiently explore a finite set of alternatives before selecting an alternative as best when exploration stops. Each alternative may be characterized by a multidimensional vector of categorical and numerical attributes and has independent normal rewards. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. We propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement. This approach greatly reduces the measurement effort required, but it needs some prior knowledge of the smoothness of the function, in the form of an aggregation function, and computational issues limit the number of alternatives that can easily be considered to the thousands. We prove that our policy is consistent, finding a globally optimal alternative when given enough measurements, and show through simulations that it performs competitively with or significantly better than other policies.
Knowledge-gradient methods for statistical learning
, 2009
Abstract

Cited by 3 (2 self)
We consider the class of fully sequential Bayesian information collection problems, a class that includes ranking and selection problems, multi-armed bandit problems, and many others. Although optimal policies for such problems are generally known to exist and to satisfy Bellman’s recursion, the curses of dimensionality prevent us from actually computing them except in a few very special cases. Motivated by this difficulty, we develop a general class of practical and theoretically well-founded information collection policies known as knowledge-gradient (KG) policies. KG policies have several attractive qualities: they are myopically optimal in general; they are asymptotically optimal in a broad class of problems; they are flexible and may be computed easily in a broad class of problems; and they perform well numerically in several well-studied ranking and selection problems compared with other state-of-the-art policies designed specifically for these problems.
Bayesian Active Learning With Basis Functions
 in Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
, 2011
Abstract

Cited by 2 (1 self)
A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem.
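The parametric value-function model described above, V(s) ≈ θᵀφ(s), admits a closed-form Bayesian update when the prior on the weights θ is normal and observations carry normal noise. The sketch below shows that conditioning step; the function name and the demo numbers are illustrative assumptions.

```python
import numpy as np

def bayes_linear_update(mean, cov, phi, y, noise_var):
    """One Bayesian update of a linear value-function model
    V(s) = theta^T phi(s): observe y = theta^T phi + noise and
    condition the normal prior (mean, cov) on that observation."""
    phi = np.asarray(phi, dtype=float)
    s = float(phi @ cov @ phi) + noise_var   # predictive variance of y
    gain = (cov @ phi) / s                   # Kalman-style gain vector
    new_mean = mean + gain * (y - float(phi @ mean))
    new_cov = cov - np.outer(gain, phi @ cov)
    return new_mean, new_cov

# one observation of a two-feature model (illustrative numbers)
mean, cov = np.zeros(2), np.eye(2) * 10.0
mean, cov = bayes_linear_update(mean, cov, [1.0, 0.0], y=3.0, noise_var=1.0)
```

Because the belief stays normal after every update, a knowledge-gradient criterion can score candidate actions by how much this conditioning step is expected to sharpen the estimated value function.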
Hierarchical Knowledge Gradient for Sequential Sampling
 Journal of Machine Learning Research
Abstract
We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multidimensional vector and has independent normal rewards. This problem arises in settings such as (i) ranking and selection, (ii) simulation optimization where the unknown mean of each alternative is estimated with stochastic simulation output, and (iii) approximate dynamic programming where we need to estimate values based on Monte Carlo simulation. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. Because the number of alternatives is large, we propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement, thus greatly reducing the measurement effort required. We demonstrate how this hierarchical knowledge-gradient policy can be applied to efficiently maximize a continuous function and prove that this policy finds a globally optimal alternative in the limit.
Hierarchical Knowledge-Gradient for Sequential Sampling
, 2009
Abstract
We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multidimensional vector and has independent normal rewards. This problem arises in various settings such as (i) ranking and selection, (ii) simulation optimization where the unknown mean of each alternative is estimated with stochastic simulation output, and (iii) approximate dynamic programming where we need to estimate values based on Monte Carlo simulation. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. Because the number of alternatives is large, we propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement, thus greatly reducing the measurement effort required. We demonstrate how this hierarchical knowledge-gradient policy can be applied to efficiently maximize a continuous function and prove that this policy finds a globally optimal alternative in the limit.
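The hierarchical aggregation idea in the two abstracts above, estimating one alternative's value from statistics pooled at several levels of aggregation, can be sketched as a variance-weighted combination of per-level estimates. The inverse total-squared-error weights (variance plus squared aggregation bias) are a standard choice used here for illustration, not necessarily the papers' exact estimator.

```python
def aggregate_estimate(level_estimates, level_variances, level_biases):
    """Combine estimates of one alternative's value taken at several
    aggregation levels (coarser levels pool many alternatives, so they
    have lower variance but higher bias).  Each level is weighted by
    the inverse of its total squared error: variance + bias^2."""
    weights = [1.0 / (v + b * b)
               for v, b in zip(level_variances, level_biases)]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    estimate = sum(w * g for w, g in zip(weights, level_estimates))
    return estimate, weights
```

This is how a single measurement informs many alternatives: the coarse levels it updates are shared across every alternative that falls in the same aggregate cell.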
A Monte Carlo Knowledge Gradient Method For Learning Abatement Potential Of Emissions Reduction Technologies
Abstract
Suppose that we have a set of emissions reduction technologies whose greenhouse gas abatement potential is unknown, and we wish to find an optimal portfolio (subset) of these technologies. Due to the interaction between technologies, the effectiveness of a portfolio can only be observed through expensive field implementations. We view this problem as an online optimal learning problem with correlated prior beliefs, where the performance of a portfolio of technologies in one project is used to guide choices for future projects. Given the large number of potential portfolios, we propose a learning policy which uses Monte Carlo sampling to narrow down the choice set to a relatively small number of promising portfolios, and then applies a one-period look-ahead approach using knowledge gradients to choose among this reduced set. We present experimental evidence that this policy is competitive against other online learning policies that consider the entire choice set.
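The two-stage structure of that policy can be sketched generically: sample a subset of the huge choice set, screen it down to a shortlist, then apply a knowledge-gradient score only to the shortlist. The function names, the mean-based screening rule, and the scoring callback below are illustrative assumptions, not the paper's exact procedure.

```python
import random

def narrow_then_choose(portfolios, prior_mean, kg_value, n_sample=1000, k=20):
    """Two-stage choice over a very large portfolio set:
    1) Monte Carlo sample `n_sample` portfolios without replacement,
    2) keep the k with the highest prior mean (a simple screening rule),
    3) return the shortlisted portfolio with the largest
       knowledge-gradient value, computed by the caller's `kg_value`."""
    sampled = random.sample(portfolios, min(n_sample, len(portfolios)))
    shortlist = sorted(sampled, key=prior_mean, reverse=True)[:k]
    return max(shortlist, key=kg_value)
```

The expensive knowledge-gradient computation is applied to only k candidates instead of the full combinatorial set, which is what makes the one-period look-ahead affordable here.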