Results 1–9 of 9
Combinatorial Bandits
"... We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1} d and suffers a loss that is the sum of the losses of those vector components that equal to one. The goal of the forecaster is to achieve that, in the l ..."
Abstract

Cited by 46 (6 self)
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1}^d and suffers a loss that is the sum of the losses of those vector components that equal one. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit” setting, in which the forecaster has access only to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order √(nd ln |S|), where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations exist for various interesting choices of S.
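The bandit forecaster described above can be illustrated with a simplified exponential-weights sketch: uniform exploration plus importance-weighted loss estimates over the action set S. This is a generic illustration, not the paper's exact construction, which uses a more carefully chosen exploration distribution to obtain the √(nd ln |S|) bound.

```python
import math
import random

def comb_bandit(S, losses, n, eta=0.1, gamma=0.1):
    """Simplified exponential-weights bandit forecaster over a list S of
    binary vectors (tuples of 0/1). `losses(t, v)` returns the loss of
    playing vector v at round t; only the chosen vector's loss is observed.
    Illustrative only -- not the paper's exact forecaster."""
    w = {v: 1.0 for v in S}
    total = 0.0
    for t in range(n):
        Z = sum(w.values())
        # mix exponential weights with uniform exploration (probability gamma)
        p = {v: (1 - gamma) * w[v] / Z + gamma / len(S) for v in S}
        v = random.choices(S, weights=[p[u] for u in S])[0]
        loss = losses(t, v)
        total += loss
        # importance-weighted (unbiased) loss estimate for the chosen vector
        w[v] *= math.exp(-eta * loss / p[v])
    return total
```

With two actions where one always incurs loss 1 and the other loss 0, the weight of the losing action collapses quickly and the accumulated loss stays close to the forced-exploration floor.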
Bandits, Query Learning, and the Haystack Dimension
"... Motivated by multiarmed bandits (MAB) problems with a very large or even infinite number of arms, we consider the problem of finding a maximum of an unknown target function by querying the function at chosen inputs (or arms). We give an analysis of the query complexity of this problem, under the as ..."
Abstract

Cited by 4 (1 self)
Motivated by multi-armed bandit (MAB) problems with a very large or even infinite number of arms, we consider the problem of finding a maximum of an unknown target function by querying the function at chosen inputs (or arms). We give an analysis of the query complexity of this problem, under the assumption that the payoff of each arm is given by a function belonging to a known, finite, but otherwise arbitrary function class. Our analysis centers on a new notion of function class complexity that we call the haystack dimension, which is used to prove the approximate optimality of a simple greedy algorithm. This algorithm is then used as a subroutine in a functional MAB algorithm, yielding provably near-optimal regret. We provide a generalization to the infinite cardinality setting, and comment on how our analysis is connected to, and improves upon, existing results for query learning and generalized binary search.
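The greedy subroutine can be sketched as a version-space elimination loop: query the arm on which the surviving candidate functions disagree most, and discard candidates inconsistent with the observed payoff. This is an illustrative rendering; the paper's greedy rule and its haystack-dimension analysis are more refined.

```python
def greedy_max_search(arms, funcs, query):
    """Schematic greedy search for the maximizer of an unknown target
    drawn from a finite class `funcs` (each f maps arm -> payoff).
    `query(arm)` returns the true payoff at that arm."""
    candidates = list(funcs)
    while len(candidates) > 1:
        # query the arm with the most disagreement among survivors
        arm = max(arms, key=lambda a: len({f(a) for f in candidates}))
        if len({f(arm) for f in candidates}) == 1:
            break  # survivors are indistinguishable by any further query
        y = query(arm)
        candidates = [f for f in candidates if f(arm) == y]
    best = candidates[0]
    return max(arms, key=best)  # arg-max of the identified function
```

For instance, with three "indicator" payoff functions over three arms, two queries suffice to isolate the true function and return its maximizing arm.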
Large-Scale Bandit Problems and KWIK Learning
"... We show that parametric multiarmed bandit (MAB) problems with large state and action spaces can be algorithmically reduced to the supervised learning model known as “Knows What It Knows ” or KWIK learning. We give matching impossibility results showing that the KWIKlearnability requirement cannot b ..."
Abstract

Cited by 2 (0 self)
We show that parametric multi-armed bandit (MAB) problems with large state and action spaces can be algorithmically reduced to the supervised learning model known as “Knows What It Knows” or KWIK learning. We give matching impossibility results showing that the KWIK-learnability requirement cannot be replaced by weaker supervised learning assumptions. We provide such results in both the standard parametric MAB setting, as well as for a new model in which the action space is finite but growing with time.
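A reduction of this flavor can be sketched as follows, assuming a hypothetical KWIK interface in which `predict` returns either an estimate or None for "I don't know": the bandit explores exactly when the learner abstains, and exploits the learner's predictions otherwise. Illustrative only, not the paper's exact construction.

```python
class TabularKwik:
    """Toy KWIK learner for finite actions with deterministic payoffs:
    abstains (returns None) on any action it has not yet observed."""
    def __init__(self):
        self.table = {}

    def predict(self, x):
        return self.table.get(x)  # None means "I don't know"

    def update(self, x, y):
        self.table[x] = y


class KwikBandit:
    """Schematic bandit-to-KWIK reduction: explore while the learner
    abstains, then act greedily on its predictions."""
    def __init__(self, learner, actions):
        self.learner, self.actions = learner, actions

    def act(self, pull):
        preds = {a: self.learner.predict(a) for a in self.actions}
        unknown = [a for a, p in preds.items() if p is None]
        if unknown:                        # explore: resolve an unknown action
            a = unknown[0]
            self.learner.update(a, pull(a))
            return a
        return max(preds, key=preds.get)   # exploit: best predicted payoff
```

After one exploratory pull per action, the reduction settles on the action with the highest observed payoff.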
On multilabel classification and ranking with partial feedback
In NIPS, 2012
"... We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2ndorder descent methods, and relies on upperconfidence bounds to tradeoff exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates ..."
Abstract

Cited by 2 (1 self)
We present a novel multi-label/ranking algorithm working in partial information settings. The algorithm is based on second-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multi-label probabilities are ruled by (generalized) linear models. We show O(√T log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multi-label datasets, often obtaining comparable performance.
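The upper-confidence idea can be sketched with a per-label ridge-regression state and an optimistic threshold rule: emit every label whose optimistic score clears the threshold. This is a minimal illustration; the paper's actual algorithm uses second-order updates with a calibrated confidence width.

```python
import numpy as np

def ucb_label_select(x, models, alpha=1.0, thresh=0.5):
    """Illustrative upper-confidence label selection for bandit multi-label
    prediction. Each label j keeps a ridge state (A_j, b_j); emit labels
    whose optimistic estimate w·x + width exceeds `thresh`."""
    chosen = []
    for j, (A, b) in enumerate(models):
        w = np.linalg.solve(A, b)                           # ridge estimate
        width = alpha * np.sqrt(x @ np.linalg.solve(A, x))  # confidence width
        if w @ x + width >= thresh:
            chosen.append(j)
    return chosen

def update(models, j, x, y):
    """Rank-one update for label j after observing feedback y in {0, 1}."""
    A, b = models[j]
    models[j] = (A + np.outer(x, x), b + y * x)
```

Initially every label is optimistically selected; after repeated negative feedback on a label, its confidence width shrinks and it drops out of the prediction.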
On Multilabel Classification and Ranking with Partial Feedback
Claudio Gentile
"... We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2ndorder descent methods, and relies on upperconfidence bounds to tradeoff exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates ..."
Abstract
 Add to MetaCart
We present a novel multi-label/ranking algorithm working in partial information settings. The algorithm is based on second-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multi-label probabilities are ruled by (generalized) linear models. We show O(√T log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multi-label datasets, often obtaining comparable performance.
On Multilabel Classification and Ranking with Partial Feedback
In Journal of Machine Learning Research
"... We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2ndorder descent methods, and relies on upperconfidence bounds to tradeoff exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates ..."
Abstract
 Add to MetaCart
We present a novel multi-label/ranking algorithm working in partial information settings. The algorithm is based on second-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multi-label probabilities are ruled by (generalized) linear models. We show O(√T log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on diverse real-world multi-label datasets, often obtaining comparable performance.
Scalable Planning and Learning for Multiagent POMDPs: Extended Version
"... Online, samplebased planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows expone ..."
Abstract
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multi-agent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multi-agent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multi-agent planning and learning problems.
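One way a factored value function tames the exponential joint action space can be sketched as follows: if Q(s, a) decomposes as a sum of components Q_e(s, a_e), each touching only a few agents, then each small component can be maximized separately. The sketch below assumes components over disjoint agent subsets; a full solver would use max-plus or variable elimination to handle overlapping components.

```python
import itertools

def factored_greedy_action(state, components, agent_actions):
    """Illustrative greedy joint-action selection with a factored value
    function Q(s, a) = sum_e Q_e(s, a_e). `components` is a list of
    (agents, q) pairs, where q(state, sub_action) scores the sub-action
    of that component's agents. Assumes components over DISJOINT agent
    subsets, so each can be brute-forced independently."""
    action = {}
    for agents, q in components:
        sub_space = itertools.product(*(agent_actions[i] for i in agents))
        best = max(sub_space, key=lambda sub: q(state, sub))
        for i, a in zip(agents, best):
            action[i] = a
    return action
```

The cost is the sum of the (small) component action spaces rather than the product over all agents, which is the source of the scalability claimed above.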
Scalable Planning and Learning for Multiagent
"... Online, samplebased planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows expon ..."
Abstract
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multi-agent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multi-agent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high-quality solutions to large multi-agent planning and learning problems.
On Multilabel Classification and Ranking with Bandit Feedback
"... We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2ndorder descent methods, and relies on upperconfidence bounds to tradeoff exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates ..."
Abstract
We present a novel multi-label/ranking algorithm working in partial information settings. The algorithm is based on second-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multi-label probabilities are ruled by (generalized) linear models. We show O(√T log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on diverse real-world multi-label data sets, often obtaining comparable performance.