Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Cited by 45 (9 self)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
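As a rough illustration of the upper-confidence idea in this abstract, the sketch below picks, in each round, the grid point that maximizes posterior mean plus a multiple of posterior standard deviation. It is not the paper's implementation: the squared-exponential kernel, the lengthscale, the finite 1-D grid, and the constant `beta` (a time-varying schedule is what a regret analysis would normally use) are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_go = rbf_kernel(x_grid, x_obs)
    mean = k_go @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is 1 for this kernel; subtract the part explained by the data.
    var = 1.0 - np.sum(k_go * np.linalg.solve(k_oo, k_go.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def gp_ucb(f, x_grid, rounds=30, beta=4.0, noise_std=0.1, seed=0):
    """Query f for `rounds` steps, each time maximizing mean + sqrt(beta) * std."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(x_grid)))]          # first query is random
    ys = [f(x_grid[idx[0]]) + noise_std * rng.normal()]
    for _ in range(rounds - 1):
        mean, std = gp_posterior(x_grid[idx], np.array(ys), x_grid, noise_std ** 2)
        nxt = int(np.argmax(mean + np.sqrt(beta) * std))
        idx.append(nxt)
        ys.append(f(x_grid[nxt]) + noise_std * rng.normal())
    return x_grid[idx], np.array(ys)
```

Calling `gp_ucb(lambda x: -(x - 0.3) ** 2, np.linspace(0.0, 1.0, 101))` returns the queried points and their noisy payoffs; larger `beta` shifts the balance toward exploration.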
Convergence rates of efficient global optimization algorithms
Journal of Machine Learning Research, 2011
Cited by 14 (0 self)
In the efficient global optimization problem, we minimize an unknown function f, using as few observations f(x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
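For readers unfamiliar with expected improvement, the standard closed form for minimization under a Gaussian posterior is EI = (best - mu) * Phi(z) + sigma * phi(z) with z = (best - mu) / sigma, where Phi and phi are the standard normal CDF and density. The sketch below evaluates that formula at a single point; the function and argument names are illustrative and not taken from the paper.

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement over the incumbent `best` when minimizing.

    mu, sigma: posterior mean and standard deviation at the candidate point.
    best:      smallest function value observed so far.
    """
    if sigma <= 0.0:
        # No uncertainty left: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf
```

An expected-improvement algorithm would evaluate this quantity over candidate points and observe f at the maximizer.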
Adaptive submodularity: Theory and applications in active learning and stochastic optimization
J. Artificial Intelligence Research, 2011
Cited by 12 (4 self)
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
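The lazy-evaluation speed-up mentioned above can be illustrated in the simpler non-adaptive setting. The sketch below is an assumption-laden example, not the paper's adaptive algorithm: it runs lazy greedy on a maximum-coverage objective, relying on the fact that for a submodular objective a previously computed marginal gain is an upper bound on the current one, so most candidates never need re-evaluation.

```python
import heapq

def lazy_greedy_cover(sets, k):
    """Pick up to k named sets to maximize coverage, re-evaluating gains lazily.

    sets: dict mapping a name to a set of covered elements (hypothetical input format).
    """
    covered, chosen = set(), []
    # Max-heap via negated gains; each entry may hold a stale upper bound.
    heap = [(-len(s), name) for name, s in sets.items()]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        _, name = heapq.heappop(heap)
        gain = len(sets[name] - covered)        # fresh marginal gain
        if not heap or gain >= -heap[0][0]:
            # Fresh gain beats every other (upper-bounded) gain: safe to select.
            if gain == 0:
                break
            chosen.append(name)
            covered |= sets[name]
        else:
            # Bound was stale; reinsert with the updated gain.
            heapq.heappush(heap, (-gain, name))
    return chosen, covered
```

For example, with `{"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1}}` and `k=2`, set `b` is re-evaluated once after `a` is chosen, while `d` is never re-evaluated.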
Contextual Gaussian Process Bandit Optimization
Cited by 8 (1 self)
How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at each round, we receive context (about the experimental conditions, the query), and have to choose an action (parameters, documents). The key challenge is to trade off exploration by gathering data for estimating the mean payoff function over the context-action space, and to exploit by choosing an action deemed optimal based on the gathered data. We model the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develop CGP-UCB, an intuitive upper-confidence style algorithm. We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications. We further provide generic tools for deriving regret bounds when using such composite kernel functions. Lastly, we evaluate our algorithm on two case studies, in the context of automated vaccine design and sensor management. We show that context-sensitive optimization outperforms no or naive use of context.
Portfolio Allocation for Bayesian Optimization
Cited by 5 (3 self)
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the posterior estimate of the objective. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm’s performance.
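The portfolio idea can be sketched with the classical Hedge rule: each acquisition function keeps a cumulative gain, and one of them is sampled with probability proportional to the exponential of its gain. The code below shows only that selection step. How gains are updated after each evaluation, and the learning rate `eta`, are assumptions of this example rather than details given in the abstract.

```python
import math
import random

def hedge_probabilities(gains, eta=1.0):
    """Softmax of cumulative gains; the max is subtracted for numerical stability."""
    m = max(gains)
    weights = [math.exp(eta * (g - m)) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]

def hedge_select(gains, eta=1.0, rng=random):
    """Sample the index of one acquisition function from the Hedge distribution."""
    probs = hedge_probabilities(gains, eta)
    return rng.choices(range(len(gains)), weights=probs, k=1)[0]
```

In a portfolio loop, each acquisition function would nominate a candidate point, `hedge_select` would decide whose nominee to evaluate, and every entry of `gains` would then be increased by some reward signal for its own nominee.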
Hierarchical Knowledge Gradient for Sequential Sampling
Cited by 3 (3 self)
We propose a sequential sampling policy for noisy discrete global optimization and ranking and selection, in which we aim to efficiently explore a finite set of alternatives before selecting an alternative as best when exploration stops. Each alternative may be characterized by a multidimensional vector of categorical and numerical attributes and has independent normal rewards. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. We propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement. This approach greatly reduces the measurement effort required, but it requires some prior knowledge on the smoothness of the function in the form of an aggregation function, and computational issues limit the number of alternatives that can be easily considered to the thousands. We prove that our policy is consistent, finding a globally optimal alternative when given enough measurements, and show through simulations that it performs competitively with or significantly better than other policies.
An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models
Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature. A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. The purpose of this paper is twofold. First, we review a recently proposed offline evaluation technique. Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on generalized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.
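The abstract does not spell out the evaluation technique. One well-known data-driven estimator of this kind is the replay method, and the sketch below assumes that is roughly what is meant: logged events whose action matches the policy's choice are kept, the rest are discarded, and the kept rewards are averaged. The estimate is unbiased only under the further assumption that logged actions were chosen uniformly at random; the event format used here is hypothetical.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a policy's average reward from logged bandit data.

    policy:        function mapping a context to an action.
    logged_events: iterable of (context, logged_action, reward) tuples,
                   assumed to be collected with uniformly random actions.
    Returns (average reward over matched events, number of matched events).
    """
    total, matched = 0.0, 0
    for context, logged_action, reward in logged_events:
        # Only events where the policy agrees with the log reveal its reward.
        if policy(context) == logged_action:
            total += reward
            matched += 1
    average = total / matched if matched else float("nan")
    return average, matched
```

A stateful learning algorithm would additionally be updated on each matched event; the stateless version above is enough to compare fixed policies.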
Noisy Optimization Complexity Under Locality Assumption
Author manuscript, published in FOGA Foundations of Genetic Algorithms XII, 2013
In spite of various recent publications on the subject, there are still gaps between upper and lower bounds in evolutionary optimization for noisy objective functions. In this paper we reduce the gap, and get tight bounds within logarithmic factors in the case of small noise and no long-distance influence on the objective function.
Finite-Time Analysis of Kernelised Contextual Bandits
Author manuscript, published in The 29th Conference on Uncertainty in Artificial Intelligence, 2013
We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However, we assume that we have access to the similarities between actions’ contexts and that the expected reward is an arbitrary linear function of the contexts’ images in the related reproducing kernel Hilbert space (RKHS). We propose KernelUCB, a kernelised UCB algorithm, and give a cumulative regret bound through a frequentist analysis. For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function. Moreover, for the linear kernel, our regret bound matches the lower bound for contextual linear bandits.