Results 1–10 of 10
The Ratio Index for Budgeted Learning, with Applications
, 2008
Abstract

Cited by 18 (2 self)
In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Constant factor approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. In this paper we present an index for this problem, which we call the ratio index, which also guarantees a constant factor approximation. Index-based policies have the advantage that a single number (i.e. the index) can be computed for each alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem. The ratio index has several interesting structural properties. First, we show that it can be computed in strongly polynomial time. Second, we show that with the appropriate discount factor, the Gittins index and our ratio index are constant factor approximations of each other, and hence the Gittins index also gives a constant factor approximation to the budgeted learning problem. Finally, we show that the ratio index can be used to create an index-based policy that achieves an O(1)-approximation for the finite horizon version of the multi-armed bandit problem. Moreover, the policy does not require any knowledge of the horizon (whereas we compare its performance against an optimal strategy that is aware of the horizon). This yields the following …
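The index-based policy described above can be pictured with a toy simulator: compute one number per arm, probe the arm with the highest number, repeat until the budget is spent, then commit to the best-looking arm. The sketch below uses Bernoulli arms with Beta priors and a simple posterior-variance index as a stand-in; `variance_index` and all parameters are illustrative assumptions, not the paper's ratio index.

```python
import random

def posterior_mean(alpha, beta):
    """Expected payoff of a Bernoulli arm under a Beta(alpha, beta) prior."""
    return alpha / (alpha + beta)

def variance_index(alpha, beta):
    """Toy index: posterior variance, i.e. how much is left to learn.
    A stand-in for illustration, NOT the ratio index of the paper."""
    n = alpha + beta
    return alpha * beta / (n * n * (n + 1))

def budgeted_learning(true_probs, budget, index, seed=0):
    """Spend `budget` probes according to an index policy, then pick
    the arm with the highest posterior mean payoff."""
    rng = random.Random(seed)
    # Start every arm from a uniform Beta(1, 1) prior.
    priors = [[1, 1] for _ in true_probs]
    for _ in range(budget):
        # Probe the arm whose index is currently largest.
        i = max(range(len(priors)), key=lambda j: index(*priors[j]))
        if rng.random() < true_probs[i]:
            priors[i][0] += 1   # observed a success
        else:
            priors[i][1] += 1   # observed a failure
    # Exploitation phase: commit to the arm that now looks best.
    return max(range(len(priors)), key=lambda j: posterior_mean(*priors[j]))
```

The point of the index structure is visible in the loop: each arm's score depends only on that arm's own posterior, never on the other arms.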
Sequential design of experiments via linear programming
 Preliminary version appeared in the ACM Symposium on Theory of Computing
, 2007
Abstract

Cited by 4 (4 self)
The celebrated multi-armed bandit problem in decision theory models the central tradeoff between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In this paper we study the variant of the multi-armed bandit problem where the exploration phase involves costly experiments and occurs before the exploitation phase; and where each play of an arm during the exploration phase updates a prior belief about the arm. The problem of finding an inexpensive exploration strategy to optimize a certain exploitation objective is NP-Hard even when a single play reveals all information about an arm, and all exploration steps cost the same. We provide the first polynomial time constant-factor approximation algorithm for this class of problems. We show that this framework also generalizes several problems of interest studied in the context of data acquisition in sensor networks. Our analysis also extends to switching and setup costs, and to concave utility objectives. Our solution approach is via a novel linear program rounding technique based on stochastic packing. In addition to yielding exploration policies whose performance is within a small constant factor of the adaptive optimal policy, a nice feature of this approach is that the resulting policies explore the arms sequentially without revisiting any arm. Sequentiality is a well-studied paradigm in decision theory, and is very desirable in domains where multiple explorations can be conducted in parallel, for instance, in the sensor network context.
Multi-armed bandits with limited exploration
 In Proceedings of the Annual Symposium on Theory of Computing (STOC)
, 2007
Abstract

Cited by 2 (0 self)
A central problem in decision making under uncertainty is the tradeoff between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current best knowledge about the system. A fundamental decision-theoretic model that captures this tradeoff is the celebrated stochastic Multi-armed Bandit Problem. In this paper, we consider scenarios where the exploration phase corresponds to designing experiments, and the exploration phase has the following restrictions: (1) it must necessarily precede the exploitation phase; (2) it is expensive in terms of some resource consumed, so that only a limited amount of exploration can be performed; and (3) switching from one experiment to another incurs a setup cost. Such a model, which is termed budgeted learning, is relevant in scenarios such as clinical trials and sensor network data acquisition. Though the classic multi-armed bandit problem admits a polynomial-time greedy optimal solution termed the Gittins index policy, the budgeted learning problem does not admit such a greedy optimal solution. In fact, the problem is NP-Hard even in simple settings. Our main contribution is in presenting constant factor approximation algorithms for this problem via a novel linear program rounding technique based on stochastic packing.
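The constraints above (exploration before exploitation, a hard budget, per-arm setup costs) can be illustrated with a minimal no-revisit baseline: visit the arms in one pass, pay the setup cost once per arm started, split the remaining probe budget naively, and finally commit to the best empirical arm. The even split and the cost parameters are placeholder assumptions, not the LP-based allocation analyzed in the paper.

```python
import random

def sequential_exploration(true_probs, budget, probe_cost=1, setup_cost=5, seed=0):
    """One left-to-right pass over Bernoulli arms (no revisits): pay a
    one-time setup cost per arm started, spend probes on it, move on.
    Exploitation then commits to the best observed success rate."""
    rng = random.Random(seed)
    stats = []                       # (successes, probes) per arm explored
    remaining = budget
    arms_left = len(true_probs)
    for p in true_probs:
        if remaining < setup_cost + probe_cost:
            break                    # cannot afford to start another arm
        remaining -= setup_cost
        # Naively split what is left evenly over the arms not yet visited.
        probes = max(1, remaining // (probe_cost * arms_left))
        probes = min(probes, remaining // probe_cost)
        succ = sum(rng.random() < p for _ in range(probes))
        stats.append((succ, probes))
        remaining -= probes * probe_cost
        arms_left -= 1
    # Exploitation phase: commit to the best empirical arm.
    return max(range(len(stats)), key=lambda i: stats[i][0] / stats[i][1])
```

Because each arm is visited at most once, the setup cost is paid at most once per arm, which is exactly the structural property the sequential policies in these papers exploit.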
How to allocate a restricted budget of leave-one-out assessments for effective model selection in machine learning: a comparison of state-of-the-art techniques
 Proceedings of the 17th Belgian-Dutch Conference on Artificial Intelligence (BNAIC'05)
, 2005
Abstract

Cited by 1 (0 self)
The problem of selecting the best among several alternatives in a stochastic context has been the object of research in several domains: stochastic optimization, discrete-event stochastic simulation, experimental design. One instance of this problem is of particular relevance in machine learning, where the search for the model that best represents a finite set of data requires comparing several alternatives on the basis of a finite set of noisy data. This paper aims to bridge a gap between these different communities by comparing experimentally the effectiveness of techniques proposed in the simulation and in the stochastic dynamic programming communities in performing a model selection task. In particular, we will consider here a model selection task in regression where the alternatives are represented by a finite set of K-nearest neighbors models with different values of the structural parameter K. The techniques we compare are i) a two-stage selection technique proposed in the stochastic simulation community, ii) a stochastic dynamic programming approach conceived to address the multi-armed bandit problem, iii) a racing method, iv) a greedy approach, v) a round-search technique.
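The budgeted leave-one-out setup above can be sketched for 1-D K-nearest-neighbour regression: each assessment evaluates one candidate K on one held-out point, and a fixed budget of assessments is allocated across the candidates. The round-robin allocation shown is only the simplest possible baseline; the data, candidate set, and budget are illustrative assumptions.

```python
import random

def knn_predict(train, x, k):
    """1-D K-nearest-neighbour regression: average the targets of the
    k training points closest to x."""
    neigh = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in neigh) / k

def budgeted_loo_selection(data, ks, budget, seed=0):
    """Spend `budget` leave-one-out assessments, dealt round-robin over
    the candidate values of K, and return the K with the smallest
    estimated mean squared error."""
    rng = random.Random(seed)
    errors = {k: [] for k in ks}
    for t in range(budget):
        k = ks[t % len(ks)]            # round-robin over candidates
        i = rng.randrange(len(data))   # random leave-one-out point
        x, y = data[i]
        held_out = data[:i] + data[i + 1:]
        pred = knn_predict(held_out, x, k)
        errors[k].append((pred - y) ** 2)
    return min(ks, key=lambda k: sum(errors[k]) / len(errors[k]))
```

The techniques compared in the paper differ precisely in how they replace the `ks[t % len(ks)]` line, i.e. in which alternative receives the next noisy assessment.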
Budgeted Parameter Learning of Generative Bayesian Networks
Abstract
“The Bayesian revolution in the sciences is fueled, not only by more and more cognitive scientists suddenly noticing that mental phenomena have Bayesian structure in them; not only by scientists in every field learning to judge their statistical methods by comparison with the Bayesian method; but also by the idea that science itself is a special case of Bayes' Theorem; experimental evidence is …
Yahoo! Research Labs
Abstract
Classical learning assumes the learner is given a labeled data sample, from which it learns a model. The field of Active Learning deals with the situation where the learner begins not with a training sample, but instead with resources that it can use to obtain information to help identify the optimal model. To better understand this task, this paper presents and analyzes the simplified “(budgeted) active model selection” version, which captures the pure exploration aspect of many active learning problems in a clean and simple problem formulation. Here the learner can use a fixed budget of “model probes” (where each probe evaluates the specified model on a random indistinguishable instance) to identify which of a given set of possible models has the highest expected accuracy. Our goal is a policy that sequentially determines which model to probe next, based on the information observed so far. We present a formal description of this task, and show that it is NP-hard in general. We then investigate a number of algorithms for this task, including several existing ones (e.g., “Round-Robin”, “Interval Estimation”, “Gittins”) as well as some novel ones (e.g., “Biased-Robin”), describing first their approximation properties and then their empirical performance on various problem instances. We observe empirically that the simple Biased-Robin algorithm significantly outperforms the other algorithms in the case of identical costs and priors.
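A sketch of a Biased-Robin-style probing rule as described above: keep probing the current model while probes succeed, and advance to the next model on a failed probe. The simulation below draws probe outcomes as Bernoulli trials; the true accuracies are used only to generate outcomes, never by the policy, and the parameter values are illustrative assumptions.

```python
import random

def biased_robin(accuracies, budget, seed=0):
    """Biased-Robin model probing: stay on the current model while its
    probes succeed; on a failure, move cyclically to the next model.
    Returns the index of the model with the best observed accuracy."""
    rng = random.Random(seed)
    wins = [0] * len(accuracies)
    probes = [0] * len(accuracies)
    i = 0
    for _ in range(budget):
        probes[i] += 1
        if rng.random() < accuracies[i]:
            wins[i] += 1                     # success: stay on this model
        else:
            i = (i + 1) % len(accuracies)    # failure: advance
    scored = [j for j in range(len(accuracies)) if probes[j] > 0]
    return max(scored, key=lambda j: wins[j] / probes[j])
```

The bias is visible in the probe counts: accurate models fail rarely, so they retain the probe budget for long stretches, while poor models are abandoned after a few probes.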
Approximation Algorithms for Bayesian Multi-Armed Bandit Problems
, 2014
Abstract
In this paper, we consider several finite-horizon Bayesian multi-armed bandit problems with side constraints. These constraints include metric switching costs between arms, delayed feedback about observations, concave reward functions over plays, and explore-then-exploit models. These problems do not have any known optimal (or near optimal) algorithms in subexponential running time; several of the variants are in fact computationally intractable (NP-Hard). All of these problems violate the exchange property that the reward from the play of an arm is not contingent upon when the arm is played. This separation of scheduling and accounting of the reward is critical to almost all known analysis techniques, and yet it does not hold even in fairly basic and natural setups which we consider here. Standard index policies are suboptimal in these contexts, and there has been little analysis of such policies in these settings. We present a general solution framework that yields constant factor approximation algorithms for all the above variants. Our framework proceeds by formulating a weakly coupled linear programming relaxation, whose solution yields a collection of compact policies whose execution is restricted to a single arm. These single-arm policies are made more structured to ensure …
Myopic Policies for Budgeted Optimization with Constrained Experiments (Project Report)
, 2008
Abstract
Motivated by a real-world problem, we study a novel setting for budgeted optimization where the goal is to optimize an unknown function f(x) given a budget. In our setting, it is not practical to request samples of f(x) at precise input values due to the formidable cost of experimental setup at precise values. Rather, we may request constrained experiments, which give the experimenter constraints on x for which they must return f(x). Importantly, as the constraints become looser, the experimental cost decreases, but the uncertainty about the location of the next observation increases. Our problem is to manage this tradeoff by selecting a sequence of constrained experiments to best optimize f within the budget. We propose a number of myopic policies for selecting constrained experiments using both model-free and model-based approaches, inspired by policies for unconstrained settings. Experiments on synthetic and real-world functions indicate that our policies outperform random selection, that the model-based policies are superior to …
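One way to picture the cost/uncertainty trade-off described above is a coarse-to-fine schedule over constraint widths on [0, 1]: a loose (cheap) constraint lets the experimenter return f(x) anywhere in a wide interval, while a tight (costly) one pins x near the incumbent best. The even budget split and the width/cost schedule below are illustrative assumptions, not the policies evaluated in the report.

```python
import random

def coarse_to_fine(f, budget, schedule, seed=0):
    """Myopic coarse-to-fine policy for constrained experiments on [0, 1]:
    split the budget evenly over constraint widths, loosest first. Each
    request centres the constraint interval on the best input seen so far;
    the experimenter returns f(x) at a uniform random x inside it."""
    rng = random.Random(seed)
    best_x, best_y = 0.5, f(0.5)            # start from the middle
    share = budget // len(schedule)         # even budget split per width
    for width, cost in schedule:            # loosest (cheapest) first
        spent = 0
        while spent + cost <= share:
            lo = min(max(best_x - width / 2, 0.0), 1.0 - width)
            x = lo + rng.random() * width   # experimenter's choice of x
            y = f(x)
            if y > best_y:
                best_x, best_y = x, y
            spent += cost
    return best_x, best_y
```

The loose early rounds buy many imprecise observations to locate a promising region; the tight late rounds buy a few precise ones to refine it, which is the trade-off the myopic policies in the report are designed to manage.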