Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 2012.
"... Abstract—Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve th ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
(Show Context)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low norm in a reproducing kernel Hilbert space. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze an intuitive Gaussian process upper confidence bound (GP-UCB) algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches. Index Terms: bandit problems, Bayesian prediction, experimental design, Gaussian process (GP), information gain.
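The GP-UCB rule the abstract refers to is compact: at each round, query the point maximizing the posterior mean plus a scaled posterior standard deviation. The minimal numpy sketch below runs it on a synthetic 1-D objective over a finite grid of arms; the kernel lengthscale, noise level, and the beta_t schedule are illustrative choices, not the constants from the paper.

```python
import numpy as np

def rbf(A, B, ls=0.2):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-2):
    """Zero-mean GP posterior mean and variance at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.clip(1.0 - np.einsum("ij,ij->j", Ks, sol), 1e-12, None)  # k(x,x)=1 for this kernel
    return mu, var

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)[:, None]           # finite set of arms
f = lambda x: np.sin(6 * x).ravel()                  # synthetic unknown payoff
noise = 0.01

X = grid[[rng.integers(len(grid))]]                  # one random initial query
y = f(X) + noise * rng.standard_normal(len(X))
for t in range(1, 31):
    mu, var = gp_posterior(X, y, grid, noise ** 2)
    beta = 2.0 * np.log(len(grid) * t ** 2)          # illustrative beta_t schedule
    x_next = grid[[np.argmax(mu + np.sqrt(beta * var))]]   # GP-UCB rule
    y_next = f(x_next) + noise * rng.standard_normal(1)
    X, y = np.vstack([X, x_next]), np.concatenate([y, y_next])

print("best observed payoff:", y.max().round(3))
```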
Adaptive Collective Routing Using Gaussian Process Dynamic Congestion Models
"... We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. To tackle this problem, we first propose a Gaussian Process Dynamic Congestion Model that can effectively characterize both the dynamics ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. To tackle this problem, we first propose a Gaussian Process Dynamic Congestion Model that can effectively characterize both the dynamics and the uncertainty of congestion conditions. Our model is efficient and thus facilitates real-time adaptive routing in the face of uncertainty. Using this congestion model, we develop an efficient algorithm for non-myopic adaptive routing to minimize the collective travel time of all vehicles in the system. A key property of our approach is the ability to efficiently reason about the long-term value of exploration, which enables collectively balancing the exploration/exploitation trade-off for entire fleets of vehicles. We validate our approach based on traffic data from two large Asian cities. We show that our congestion model is effective in modeling dynamic congestion conditions. We also show that our routing algorithm generates significantly faster routes compared to standard baselines, and achieves near-optimal performance compared to an omniscient routing algorithm. We also present the results from a preliminary field study, which showcases the efficacy of our approach.
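As a rough illustration of the modeling idea only (not the paper's actual congestion model, which captures correlations across road segments and plans non-myopically for a whole fleet), the sketch below fits an independent GP per route to historical travel times over the hour of day and scores routes optimistically. The route names, travel-time curves, and the bonus weight kappa are invented for the example.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ls ** 2))

def gp_predict(t_obs, y_obs, t_new, noise=1.0):
    """GP regression of (mean-centred) travel time against hour of day."""
    K = rbf(t_obs, t_obs) + noise * np.eye(len(t_obs))
    Ks = rbf(t_obs, t_new)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ (y_obs - y_obs.mean()) + y_obs.mean()
    var = np.clip(1.0 - np.einsum("ij,ij->j", Ks, sol), 1e-9, None)
    return mu, var

rng = np.random.default_rng(1)
true_time = {"route_A": lambda h: 20 + 8 * np.exp(-(h - 8) ** 2),   # rush-hour peak
             "route_B": lambda h: 24 + 2 * np.sin(h)}               # flatter profile

# Historical probe-vehicle observations: dense for route_A, sparse for route_B.
obs = {"route_A": rng.uniform(6, 11, size=30), "route_B": rng.uniform(6, 11, size=6)}
now, kappa = np.array([8.5]), 1.0                    # departure hour, exploration weight
for route, hours in obs.items():
    y = true_time[route](hours) + rng.standard_normal(len(hours))
    mu, var = gp_predict(hours, y, now)
    optimistic = mu[0] - kappa * np.sqrt(var[0])     # shorter-is-better optimism
    print(route, "predicted", round(float(mu[0]), 1),
          "min, optimistic", round(float(optimistic), 1))
```

The uncertainty term is what lets a router deliberately send a few vehicles down poorly observed roads, which is the exploration/exploitation trade-off the abstract describes at fleet scale.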
Hierarchical exploration for accelerating contextual bandits. In ICML, 2012.
"... Contextual bandit learning is an increasingly popular approach to optimizing recommender systems via user feedback, but can be slow to converge in practice due to the need for exploring a large feature space. In this paper, we propose a coarse-to-fine hierarchical approach for encoding prior knowled ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Contextual bandit learning is an increasingly popular approach to optimizing recommender systems via user feedback, but can be slow to converge in practice due to the need for exploring a large feature space. In this paper, we propose a coarse-to-fine hierarchical approach for encoding prior knowledge that drastically reduces the amount of exploration required. Intuitively, user preferences can be reasonably embedded in a coarse low-dimensional feature space that can be explored efficiently, requiring exploration in the high-dimensional space only as necessary. We introduce a bandit algorithm that explores within this coarse-to-fine spectrum, and prove performance guarantees that depend on how well the coarse space captures the user's preferences. We demonstrate substantial improvement over conventional bandit algorithms through extensive simulation as well as a live user study in the setting of personalized news recommendation.
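The core intuition, that exploration in a good coarse subspace is much cheaper than exploration in the full feature space, can be illustrated with a much-simplified sketch: a standard linear UCB bandit run only on projected coarse features. The subspace basis U, the dimensions, and the noise level are assumptions for the illustration; the paper's algorithm additionally handles the case where the coarse space is imperfect by falling back to the full space.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d, n_arms, T = 50, 5, 20, 2000
U = np.linalg.qr(rng.standard_normal((D, d)))[0]       # assumed coarse subspace basis (D x d)
theta_true = U @ rng.standard_normal(d)                # user preferences lie near that subspace
arms = rng.standard_normal((n_arms, D)) / np.sqrt(D)   # high-dimensional article features

# LinUCB run only in the coarse d-dimensional space (simplified).
coarse = arms @ U                                      # project each arm to coarse features
A, b, alpha = np.eye(d), np.zeros(d), 1.0
picks = []
for t in range(T):
    Ainv = np.linalg.inv(A)
    theta_hat = Ainv @ b
    ucb = coarse @ theta_hat + alpha * np.sqrt(np.einsum("id,de,ie->i", coarse, Ainv, coarse))
    i = int(np.argmax(ucb))
    reward = arms[i] @ theta_true + 0.1 * rng.standard_normal()
    A += np.outer(coarse[i], coarse[i])
    b += reward * coarse[i]
    picks.append(arms[i] @ theta_true)

print("mean reward over the run:", round(float(np.mean(picks)), 3),
      "best possible:", round(float(np.max(arms @ theta_true)), 3))
```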
On multilabel classification and ranking with partial feedback. In NIPS, 2012.
"... We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T^{1/2} log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting it against full-information baselines on real-world multilabel datasets, often obtaining comparable performance.
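The combination of a second-order (ridge-style) estimator per label with an upper-confidence exploration bonus can be sketched in a few lines: one linear UCB model per label, a short list of labels committed each round, and relevance feedback observed only for the committed labels. Everything below (dimensions, the logistic ground truth, the bonus weight alpha, the fixed list length) is invented for the illustration and is not the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, T, k_out = 10, 5, 3000, 2                      # features, labels, rounds, list length
W_true = rng.standard_normal((K, d))                 # per-label (generalised) linear model

A = np.stack([np.eye(d)] * K)                        # one ridge ("2nd-order") estimator per label
b = np.zeros((K, d))
alpha, hits = 0.5, 0
for t in range(T):
    x = rng.standard_normal(d) / np.sqrt(d)
    scores = np.empty(K)
    for j in range(K):
        Ainv = np.linalg.inv(A[j])
        scores[j] = (Ainv @ b[j]) @ x + alpha * np.sqrt(x @ Ainv @ x)   # per-label UCB
    shown = np.argsort(scores)[-k_out:]              # commit to a short ranked list
    for j in shown:                                  # partial feedback: only shown labels
        y = float(rng.random() < 1.0 / (1.0 + np.exp(-W_true[j] @ x)))  # relevance bit
        A[j] += np.outer(x, x)
        b[j] += y * x
    hits += int(int(np.argmax(W_true @ x)) in shown)

print("fraction of rounds containing the truly best label:", round(hits / T, 3))
```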
Gaussian process optimization with mutual information. arXiv preprint arXiv:1311.4825, 2013.
"... In this paper, we analyze a generic algorithm scheme for sequential global opti-mization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve the previously known bounds for algorithms like GP-UCB by an exponential factor. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which further improves these upper bounds on the cumulative regret. We confirm the efficiency of this algorithm on synthetic and real tasks against the natural competitor, GP-UCB, and also against the Expected Improvement heuristic.
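The GP-MI selection rule, as we understand it from the paper, adds to the posterior mean a term driven by the variance accumulated by past queries, roughly mu(x) + sqrt(alpha) * (sqrt(sigma^2(x) + gamma) - sqrt(gamma)), with gamma incremented by the posterior variance of each chosen point. The numpy sketch below implements that rule on a synthetic objective; the value of alpha, the kernel, and the objective are illustrative, and the precise constants and guarantees are those of the paper, not this sketch.

```python
import numpy as np

def rbf(A, B, ls=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    return sol.T @ y, np.clip(1.0 - np.einsum("ij,ij->j", Ks, sol), 1e-12, None)

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 300)[:, None]
f = lambda x: -(x.ravel() - 0.37) ** 2                 # synthetic objective
alpha, gamma = np.log(2.0 / 1e-2), 0.0                 # confidence parameter, accumulated variance

X = grid[[0]]
y = f(X) + 0.01 * rng.standard_normal(1)
for t in range(40):
    mu, var = gp_posterior(X, y, grid)
    acq = mu + np.sqrt(alpha) * (np.sqrt(var + gamma) - np.sqrt(gamma))   # GP-MI style rule
    i = int(np.argmax(acq))
    gamma += var[i]                                    # variance of the chosen query accumulates
    x_next = grid[[i]]
    y_next = f(x_next) + 0.01 * rng.standard_normal(1)
    X, y = np.vstack([X, x_next]), np.concatenate([y, y_next])

print("best query:", round(float(X[np.argmax(y), 0]), 3), "value:", round(float(y.max()), 4))
```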
Finite-Time Analysis of Kernelised Contextual Bandits. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, 2013.
"... We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However we assume that we have access to the similarities between actions’ contexts and that ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However, we assume that we have access to the similarities between actions' contexts and that the expected reward is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS). We propose KernelUCB, a kernelised UCB algorithm, and give a cumulative regret bound through a frequentist analysis. For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function. Moreover, for the linear kernel, our regret bound matches the lower bound for contextual linear bandits.
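A minimal sketch of a KernelUCB-style choice is below: kernel ridge regression supplies the reward estimate for every candidate action, and an RKHS width term supplies the exploration bonus. The kernel, the regulariser lambda, the exploration weight eta, and the synthetic reward function are all assumptions made for the illustration, not the constants from the paper's analysis.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

rng = np.random.default_rng(5)
n_actions, d, T, lam, eta = 500, 3, 60, 1.0, 0.6
contexts = rng.uniform(-1, 1, size=(n_actions, d))           # far too many to try them all
f = lambda Z: np.cos(2 * Z).prod(axis=1)                     # reward, linear in the RKHS features

X, y = np.empty((0, d)), np.empty(0)
for t in range(T):
    if len(X) == 0:
        i = int(rng.integers(n_actions))                     # nothing observed yet
    else:
        K = rbf(X, X) + lam * np.eye(len(X))
        Ks = rbf(X, contexts)                                # kernels to every candidate action
        sol = np.linalg.solve(K, Ks)
        mu = sol.T @ y                                       # kernel ridge regression estimate
        width = np.sqrt(np.clip(1.0 - np.einsum("ij,ij->j", Ks, sol), 0, None) / lam)
        i = int(np.argmax(mu + eta * width))                 # KernelUCB-style optimistic choice
    X = np.vstack([X, contexts[[i]]])
    y = np.append(y, f(contexts[[i]]) + 0.05 * rng.standard_normal(1))

print("best possible reward:", round(float(f(contexts).max()), 3),
      "best obtained:", round(float(y.max()), 3))
```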
Taking the Human Out of the Loop: A Review of Bayesian Optimization
"... Big data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., rec-o ..."
Abstract
- Add to MetaCart
Big data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters. These parameters are often specified and hard-coded into the software by various developers or teams. If optimized jointly, these parameters can result in significant improvements. Bayesian optimization is a powerful tool for the joint optimization of design choices that has been gaining great popularity in recent years. It promises greater automation so as to increase both product quality and human productivity. This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
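The basic loop such a review describes can be written compactly: fit a GP surrogate to the configurations evaluated so far, maximize an acquisition function (expected improvement here) over candidate configurations, evaluate the chosen one, and repeat. The sketch below uses a synthetic stand-in for the expensive system evaluation; the kernel, noise level, budget, and the evaluate_config function are placeholders for the example, not any particular system.

```python
import numpy as np
from scipy.stats import norm

def rbf(A, B, ls=0.15):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    return sol.T @ y, np.clip(1.0 - np.einsum("ij,ij->j", Ks, sol), 1e-12, None)

# Synthetic stand-in for an expensive system evaluation (e.g., end-to-end quality
# of a recommender as a function of one normalised configuration parameter).
def evaluate_config(x):
    return float(np.exp(-8 * (x - 0.3) ** 2) + 0.3 * np.exp(-40 * (x - 0.75) ** 2))

grid = np.linspace(0, 1, 400)[:, None]
X = np.array([[0.1], [0.9]])                      # two cheap initial evaluations
y = np.array([evaluate_config(x[0]) for x in X])

for t in range(15):                               # small evaluation budget
    mu, var = gp_posterior(X, y, grid)
    sigma = np.sqrt(var)
    z = (mu - y.max()) / sigma
    ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = grid[[int(np.argmax(ei))]]
    X = np.vstack([X, x_next])
    y = np.append(y, evaluate_config(x_next[0, 0]))

print("best configuration found:", round(float(X[np.argmax(y), 0]), 3),
      "score:", round(float(y.max()), 3))
```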