Results 1 - 10 of 25
Parallelizing exploration-exploitation tradeoffs with Gaussian process bandit optimization
- In Proc. International Conference on Machine Learning, 2012
"... How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to execute experiments simultaneously or in batches, rather than only performing one at a time. Ad-ditionally, observations may be ..."
Abstract
-
Cited by 20 (4 self)
- Add to MetaCart
Abstract: How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to execute experiments simultaneously or in batches, rather than only performing one at a time. Additionally, observations may be both noisy and expensive. We introduce Gaussian Process Batch Upper Confidence Bound (GP-BUCB), an upper confidence bound-based algorithm, which models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel. We prove a general regret bound for GP-BUCB, as well as the surprising result that for some common kernels, the asymptotic average regret can be made independent of the batch size. The GP-BUCB algorithm is also applicable in the related case of a delay between initiation of an experiment and observation of its results, for which the same regret bounds hold. We also introduce Gaussian Process Adaptive Upper Confidence Bound (GP-AUCB), a variant of GP-BUCB which can exploit parallelism in an adaptive manner. We evaluate GP-BUCB and GP-AUCB on several simulated and real data sets. These experiments show that GP-BUCB and GP-AUCB are competitive with state-of-the-art heuristics.
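To make the batch rule concrete, here is a minimal sketch of the hallucination mechanism the abstract describes, on a discretized 1-D domain. The scikit-learn GP, the RBF kernel and its length-scale, the noise level `alpha`, and the confidence multiplier `beta` are all illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_bucb_batch(x_obs, y_obs, candidates, batch_size, beta=2.0):
    """Greedily build a batch: each pick maximizes the UCB, then a
    hallucinated observation (the current posterior mean at the pick)
    is appended, which shrinks the variance without moving the mean."""
    xs, ys = list(x_obs), list(y_obs)
    batch = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
        gp.fit(np.asarray(xs).reshape(-1, 1), np.asarray(ys))
        mu, sd = gp.predict(candidates.reshape(-1, 1), return_std=True)
        x_next = float(candidates[np.argmax(mu + np.sqrt(beta) * sd)])
        batch.append(x_next)
        xs.append(x_next)                             # hallucinated input
        ys.append(float(gp.predict([[x_next]])[0]))   # hallucinated mean
    return batch
```

Feeding the posterior its own mean leaves the mean surface unchanged in an exact GP, so successive picks differ only through the shrinking confidence widths; that is the mechanism behind the batch-size-independent average regret claimed for some kernels.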
Learning to Optimize Via Posterior Sampling
2013
"... This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling, offers significant advantages over the popular upper confide ..."
Abstract
-
Cited by 18 (8 self)
- Add to MetaCart
(Show Context)
Abstract: This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling, offers significant advantages over the popular upper confidence bound (UCB) approach, and can be applied to problems with finite or infinite action spaces and complicated relationships among action rewards. We make two theoretical contributions. The first establishes a connection between posterior sampling and UCB algorithms. This result lets us convert regret bounds developed for UCB algorithms into Bayes risk bounds for posterior sampling. Our second theoretical contribution is a Bayes risk bound for posterior sampling that applies broadly and can be specialized to many model classes. This bound depends on a new notion we refer to as the margin dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm Bayes risk bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models. Further, our analysis provides insight into performance advantages of posterior sampling, which are highlighted through simulation results that demonstrate performance surpassing recently proposed UCB algorithms.
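As a concrete companion to the abstract, here is a minimal sketch of Thompson sampling for a K-armed Gaussian bandit; the independent N(0, 1) priors and unit observation noise are illustrative assumptions.

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_means)
    post_mean, post_prec = np.zeros(k), np.ones(k)   # N(0, 1) priors
    for _ in range(horizon):
        # One posterior draw per arm supplies the exploration; act
        # greedily on the draws rather than on explicit confidence bonuses.
        sampled = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        arm = int(np.argmax(sampled))
        reward = rng.normal(true_means[arm], 1.0)
        # Conjugate Gaussian update with unit noise precision.
        post_mean[arm] = (post_prec[arm] * post_mean[arm] + reward) \
                         / (post_prec[arm] + 1.0)
        post_prec[arm] += 1.0
    return post_mean
```

The only structural difference from a UCB scheme is the action rule, which is what makes the paper's conversion of UCB regret bounds into Bayes risk bounds for posterior sampling plausible at a glance.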
High-dimensional Gaussian process bandits
- In Advances in Neural Information Processing Systems (NIPS), 2013
"... Abstract Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge, under the assumptions that the function varies only along some low-dimensional subsp ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
(Show Context)
Abstract: Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge, under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e., it has a low norm in a Reproducing Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function. We carefully calibrate the exploration-exploitation tradeoff by allocating the sampling budget to subspace estimation and function optimization, and obtain the first subexponential cumulative regret bounds and convergence rates for Bayesian optimization in high dimensions under noisy observations. Numerical results demonstrate the effectiveness of our approach in difficult scenarios.
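The subspace-recovery stage of SI-BO (low-rank matrix recovery) is too involved for a short sketch, but once a subspace estimate is in hand the second stage reduces to ordinary GP-UCB in the learned coordinates. A hypothetical illustration, assuming an estimated d-by-k orthonormal basis `A` is already available; the kernel and `beta` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def ucb_step_on_subspace(A, Z_obs, y_obs, Z_cand, beta=2.0):
    """One GP-UCB step in the learned low-dimensional coordinates
    z = A^T x; returns the next query lifted back to the full domain."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3)
    gp.fit(Z_obs, y_obs)
    mu, sd = gp.predict(Z_cand, return_std=True)
    z_next = Z_cand[np.argmax(mu + np.sqrt(beta) * sd)]
    return A @ z_next
```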
Online Learning for Linearly Parametrized Control Problems
2012
"... Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
(Show Context)
Abstract: Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as hereinbefore provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.
Parallel Gaussian process optimization with upper confidence bound and pure exploration
- In Machine Learning and Knowledge Discovery in Databases, 2013
"... Abstract. In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to ac-tively select the next estimation of f which is predicted to be the most useful. We focus on ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
Abstract: In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to actively select the next estimation of f which is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size K for this procedure which show the improvement of the order of K for fixed iteration cost over purely sequential versions. Moreover, the multiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.
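A minimal sketch of the batch rule the abstract describes: the first point of each batch maximizes the UCB, and the remaining K-1 points greedily maximize posterior variance inside the region that may still contain the optimum, with hallucinated updates in between. The 1-D discretized domain, the kernel, the noise level, and `beta` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_pe_batch(x_obs, y_obs, cand, k, beta=2.0):
    xs, ys = list(x_obs), list(y_obs)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
    gp.fit(np.asarray(xs).reshape(-1, 1), np.asarray(ys))
    mu, sd = gp.predict(cand.reshape(-1, 1), return_std=True)
    ucb, lcb = mu + np.sqrt(beta) * sd, mu - np.sqrt(beta) * sd
    # Region that may still contain the optimum, fixed for this batch.
    relevant = ucb >= lcb.max()
    batch = [float(cand[np.argmax(ucb)])]            # first point: UCB
    for _ in range(k - 1):
        # Hallucinate the posterior mean at the last pick so only the
        # variance contracts, then pick the highest remaining variance
        # inside the relevant region (pure exploration).
        xs.append(batch[-1])
        ys.append(float(gp.predict([[batch[-1]]])[0]))
        gp.fit(np.asarray(xs).reshape(-1, 1), np.asarray(ys))
        _, sd = gp.predict(cand.reshape(-1, 1), return_std=True)
        batch.append(float(cand[np.argmax(np.where(relevant, sd, -np.inf))]))
    return batch
```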
Eluder dimension and the sample complexity of optimistic exploration
- In Advances in Neural Information Processing Systems, 2013
"... Abstract This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper c ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Abstract: This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, can be analyzed in the same manner as optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models.
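For reference, here is a paraphrase of the two definitions behind the eluder dimension, reconstructed from our reading of the paper; consult the original for the precise statement.

```latex
% epsilon-dependence: agreeing (in aggregate) on past actions forces
% agreement at the new action a.
a \text{ is } \epsilon\text{-dependent on } \{a_1,\dots,a_n\}
\text{ w.r.t. } \mathcal{F}
\iff
\forall f, \tilde f \in \mathcal{F}:\;
\sqrt{\textstyle\sum_{i=1}^{n} \big(f(a_i)-\tilde f(a_i)\big)^2} \le \epsilon
\;\Longrightarrow\;
\big|f(a)-\tilde f(a)\big| \le \epsilon.

% The eluder dimension is the length of the longest "surprise" sequence:
\dim_E(\mathcal{F},\epsilon) = \max\Big\{ d : \exists\, a_1,\dots,a_d
\text{ with each } a_j\ \epsilon'\text{-independent of }
\{a_1,\dots,a_{j-1}\} \text{ for some } \epsilon' \ge \epsilon \Big\}.
```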
Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits
2014
"... We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with tr ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
Abstract: We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sub-logarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples.
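A minimal sketch of the deterministic UCL decision rule under a conjugate Gaussian model with known variances. The tail-probability schedule 1/(K t) and the constant K = sqrt(2*pi*e) follow our reading of the paper; treat both as assumptions.

```python
import numpy as np
from scipy.stats import norm

def ucl_choose(post_mean, post_var, t, K=np.sqrt(2 * np.pi * np.e)):
    """Return the arm with the largest (1 - 1/(K t)) upper credible limit."""
    alpha = 1.0 / (K * t)                  # tail probability shrinks with t
    q = post_mean + np.sqrt(post_var) * norm.ppf(1.0 - alpha)
    return int(np.argmax(q))
```

The stochastic UCL variant mentioned in the abstract replaces, roughly, the argmax with a softmax over the same indices under a cooling schedule; per the abstract, that is the form matched to the human data.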
An information-theoretic analysis of Thompson sampling
2014
"... Abstract We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bound ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Abstract: We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
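The analysis centers on a single quantity, the information ratio, which trades expected one-step regret against information gained about the optimal action. A sketch in our notation; treat it as a paraphrase, not the paper's exact statement.

```latex
% Information ratio and the resulting bound: A^* is the optimal action,
% H(.) is entropy, I_t is mutual information under the posterior at time t.
\Gamma_t \;=\; \frac{\big(\mathbb{E}_t\!\left[f(A^\ast) - f(A_t)\right]\big)^2}
                    {I_t\!\big(A^\ast;\,(A_t, Y_t)\big)},
\qquad
\mathbb{E}\!\left[\operatorname{Regret}(T)\right]
\;\le\; \sqrt{\bar{\Gamma}\, H(\alpha^\ast)\, T},
```

where alpha^* is the distribution of the optimal action and Gamma-bar bounds the information ratio; the entropy factor is what makes the bounds scale as the abstract claims.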
On optimal foraging and multi-armed bandits
- In Proc. 51st Annu. Allerton Conf. Commun. Control Comput., 2013
"... Abstract—We consider two variants of the standard multi-armed bandit problem, namely, the multi-armed bandit prob-lem with transition costs and the multi-armed bandit problem on graphs. We develop block allocation algorithms for these problems that achieve an expected cumulative regret that is unifo ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
(Show Context)
Abstract: We consider two variants of the standard multi-armed bandit problem, namely, the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs. We develop block allocation algorithms for these problems that achieve an expected cumulative regret that is uniformly dominated by a logarithmic function of time, and an expected cumulative number of transitions from one arm to another uniformly dominated by a double-logarithmic function of time. We observe that the multi-armed bandit problem with transition costs and the associated block allocation algorithm capture the key features of popular animal foraging models in the literature.
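A minimal sketch of the block-allocation idea: choose an arm by a UCB index, then hold it for a block whose length grows over time, so transitions accumulate much more slowly than plays. The UCB1-style index and the doubling schedule are illustrative simplifications, not the paper's exact construction.

```python
import numpy as np

def block_ucb(true_means, horizon, seed=0):
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.ones(k)
    sums = np.array([rng.normal(m, 1.0) for m in true_means])  # one pull each
    t, block_len, transitions, prev = k, 1, 0, -1
    while t < horizon:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
        transitions += int(arm != prev)
        for _ in range(min(block_len, horizon - t)):  # hold the chosen arm
            sums[arm] += rng.normal(true_means[arm], 1.0)
            counts[arm] += 1
            t += 1
        prev, block_len = arm, block_len * 2          # grow the block
    return sums / counts, transitions
```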
A stepwise uncertainty reduction approach to constrained global optimization
"... Using statistical emulators to guide sequen-tial evaluations of complex computer exper-iments is now a well-established practice. When a model provides multiple outputs, a typical objective is to optimize one of the out-puts with constraints (for instance, a thresh-old not to exceed) on the values o ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract: Using statistical emulators to guide sequential evaluations of complex computer experiments is now a well-established practice. When a model provides multiple outputs, a typical objective is to optimize one of the outputs with constraints (for instance, a threshold not to exceed) on the values of the other outputs. We propose here a new optimization strategy based on the stepwise uncertainty reduction paradigm, which offers an efficient trade-off between exploration and local search near the boundaries. The strategy is illustrated on numerical examples.
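A toy Monte Carlo sketch of a stepwise-uncertainty-reduction acquisition in the spirit of the abstract, with one objective GP (f) and one constraint GP (g, feasible when g <= 0) on a small 1-D grid. The uncertainty measure (variance of the feasible maximum under joint posterior draws) and every setting below are our illustrative choices, not the paper's criterion; the brute-force fantasizing is only practical on tiny grids.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def feasible_max_var(gp_f, gp_g, grid, n_draws=64, seed=0):
    """Variance of max{f(x) : g(x) <= 0} under joint posterior draws."""
    fs = gp_f.sample_y(grid, n_draws, random_state=seed)
    gs = gp_g.sample_y(grid, n_draws, random_state=seed + 1)
    vals = [fs[gs[:, j] <= 0, j].max() if (gs[:, j] <= 0).any() else np.nan
            for j in range(n_draws)]
    return np.nanvar(np.asarray(vals))

def sur_acquisition(x_obs, yf, yg, grid, n_fantasies=8):
    """Pick the candidate whose fantasized evaluation most reduces the
    residual uncertainty about the constrained optimum."""
    def fit(xa, ya):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
        return gp.fit(np.asarray(xa).reshape(-1, 1), np.asarray(ya))
    gp_f, gp_g = fit(x_obs, yf), fit(x_obs, yg)
    scores = []
    for xc in grid[:, 0]:
        # Fantasize a few joint outcomes at xc, refit, and average the
        # post-evaluation uncertainty over the fantasies.
        f_draws = gp_f.sample_y([[xc]], n_fantasies, random_state=1).ravel()
        g_draws = gp_g.sample_y([[xc]], n_fantasies, random_state=2).ravel()
        post = [feasible_max_var(fit(list(x_obs) + [xc], np.append(yf, fd)),
                                 fit(list(x_obs) + [xc], np.append(yg, gd)),
                                 grid)
                for fd, gd in zip(f_draws, g_draws)]
        scores.append(float(np.mean(post)))
    return float(grid[int(np.argmin(scores)), 0])
```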