Results 1 - 10 of 17
Approximation algorithms for restless bandit problems
- CoRR
"... In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any non-trivi ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
(Show Context)
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor 2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel “balance” constraint to the dual of a well-known LP relaxation to the restless bandit problem. This is followed by a structural characterization of the optimal solution by using both the exact primal as well as dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions from which we derive the index policy and the associated analysis.
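As an illustration of the index-policy structure this abstract refers to (each arm carries its own index, computed from its state alone, and the policy greedily plays the arm with the largest index), here is a minimal Python sketch. The concrete index used below, the posterior mean of a Beta-Bernoulli arm, is a toy stand-in chosen for illustration, not the LP-dual-based index developed in the paper.

import random

# Toy illustration of an index policy: each arm computes an index from its own
# state only, and the policy plays the arm with the largest index.  The index
# here (posterior mean of a Beta-Bernoulli arm) is a stand-in, not the paper's
# LP-dual-based index.
class BernoulliArm:
    def __init__(self, true_p):
        self.true_p = true_p                    # hidden success probability
        self.successes, self.failures = 1, 1    # Beta(1, 1) prior

    def index(self):
        return self.successes / (self.successes + self.failures)

    def play(self):
        reward = 1 if random.random() < self.true_p else 0
        self.successes += reward
        self.failures += 1 - reward
        return reward

def index_policy(arms, horizon):
    total = 0
    for _ in range(horizon):
        total += max(arms, key=lambda a: a.index()).play()
    return total

random.seed(0)
print(index_policy([BernoulliArm(p) for p in (0.2, 0.5, 0.8)], horizon=1000))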
Approximation algorithms for budgeted learning problems
- In Proc. ACM Symp. on Theory of Computing, 2007
"... We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multi-armed bandit problem. In this problem each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge ..."
Abstract
-
Cited by 31 (8 self)
- Add to MetaCart
(Show Context)
We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multi-armed bandit problem. In this problem each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge about the underlying distribution can be refined in the exploration phase by playing the arm and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After this exploration phase, the arm with the highest (posterior) expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase subject to a budget constraint on the number of plays, in order to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite-horizon setting or when performance is measured by regret bounds, the budgeted version of the problem is NP-Hard. For this problem, and several generalizations, we provide approximate policies that achieve a reward within a constant factor of that of the optimal policy. Our algorithms use a novel linear program rounding technique based on stochastic packing.
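The following sketch illustrates the budgeted-learning setup described above: a fixed budget of exploratory plays, Beta-Bernoulli priors updated by observed rewards, and exploitation of the arm with the highest posterior mean. The round-robin allocation of the budget is a placeholder assumption; the paper's policies come from LP rounding and stochastic packing, not from round-robin play.

import random

# Sketch of the budgeted-learning setup: explore under a budget of plays, update
# Beta-Bernoulli posteriors, then exploit the arm with the best posterior mean.
# The round-robin allocation is a placeholder, not the paper's LP-based policy.
def budgeted_bandit(true_ps, budget, rng):
    post = [[1, 1] for _ in true_ps]            # [successes+1, failures+1] per arm
    for t in range(budget):                     # exploration phase
        i = t % len(true_ps)                    # placeholder budget allocation
        r = 1 if rng.random() < true_ps[i] else 0
        post[i][0] += r
        post[i][1] += 1 - r
    means = [a / (a + b) for a, b in post]
    best = max(range(len(true_ps)), key=lambda i: means[i])
    return best, true_ps[best]                  # arm chosen for exploitation

print(budgeted_bandit([0.3, 0.55, 0.6], budget=30, rng=random.Random(0)))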
Approximation Algorithms for Partial-information based Stochastic Control with Markovian Rewards
"... We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying on/off Markov process with known parameters. The evolution of the Markov chain happens irrespective of ..."
Abstract
-
Cited by 25 (2 self)
- Add to MetaCart
(Show Context)
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying on/off Markov process with known parameters. The evolution of the Markov chain happens irrespective of whether the arm is played, and furthermore, the exact state of the Markov chain is only revealed to the player when the arm is played and the reward observed. At most one arm (or in general, M arms) can be played in any time step. The goal is to design a policy for playing the arms in order to maximize the infinite-horizon time-average expected reward. This problem is an instance of a Partially Observable Markov Decision Process (POMDP), and a special case of the notoriously intractable “restless bandit” problem. Unlike the stochastic MAB problem, the FEEDBACK MAB problem does not admit greedy index-based optimal policies. The state of the system at any time step encodes the beliefs about the states of different arms, and the policy decisions change these beliefs; this aspect complicates the design and analysis of simple algorithms. We design a constant-factor approximation to the FEEDBACK MAB problem by solving and rounding a natural LP relaxation of this problem. As far as we are aware, this is the first approximation algorithm for a POMDP problem.
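A minimal sketch of the FEEDBACK MAB dynamics described above: each arm's on/off chain evolves every step whether or not it is played, only the played arm's state is observed, and the player tracks a belief for each arm between observations. The myopic rule used here (play the arm with the highest belief) is only a baseline for illustration; as the abstract notes, greedy policies are not optimal for this problem, and the paper's constant-factor policy comes from an LP relaxation.

import random

# Sketch of FEEDBACK MAB dynamics with a myopic baseline policy (not the paper's
# LP-based policy): hidden on/off chains evolve every step, playing an arm
# reveals its state, and beliefs are propagated in between observations.
class FeedbackArm:
    def __init__(self, p_on, p_off, rng):
        self.p_on, self.p_off, self.rng = p_on, p_off, rng  # P(off->on), P(on->off)
        self.state = 1
        self.belief = 1.0            # P(state == on) given past observations

    def step(self):
        # The hidden chain evolves every time step, played or not.
        if self.state == 1:
            self.state = 0 if self.rng.random() < self.p_off else 1
        else:
            self.state = 1 if self.rng.random() < self.p_on else 0
        # The belief evolves by the same transition kernel.
        self.belief = self.belief * (1 - self.p_off) + (1 - self.belief) * self.p_on

    def play(self):
        # Playing reveals the exact state; reward is 1 iff the arm is on.
        self.belief = float(self.state)
        return self.state

def myopic_policy(arms, horizon):
    total = 0
    for _ in range(horizon):
        for a in arms:
            a.step()
        total += max(arms, key=lambda a: a.belief).play()
    return total / horizon           # time-average reward

rng = random.Random(1)
print(myopic_policy([FeedbackArm(0.1, 0.2, rng), FeedbackArm(0.3, 0.3, rng)], 10000))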
Fast greedy algorithms in MapReduce and streaming
- In SPAA, 2013
"... Greedy algorithms are practitioners ’ best friends—they are intu-itive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advant ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
(Show Context)
Greedy algorithms are practitioners’ best friends: they are intuitive, simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advantage of the extra processing power. Our main result is a powerful sampling technique that aids in parallelization of sequential algorithms. We then show how to use this primitive to adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraints. Our method yields efficient algorithms that run in a logarithmic number of rounds, while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint, and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints. Finally, we empirically validate our algorithms, and show that they achieve the same quality of solution as standard greedy algorithms but run in a substantially smaller number of rounds.
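For reference, here is the standard sequential greedy for maximum coverage that the abstract uses as its quality benchmark. This is only the sequential baseline; the paper's sampling technique for emulating this inherently sequential loop in a logarithmic number of MapReduce rounds is not reproduced here.

# Standard sequential greedy for maximum coverage (the benchmark the distributed
# algorithms in the paper are compared against; not their MapReduce variant).
def greedy_max_cover(sets, k):
    """Pick at most k sets maximizing the number of covered elements."""
    covered, chosen = set(), []
    remaining = dict(enumerate(sets))
    for _ in range(k):
        # Greedy choice: the set with the largest marginal coverage.  This choice
        # depends on everything picked so far, which is what makes a direct
        # distributed implementation hard.
        best, gain = max(((i, len(s - covered)) for i, s in remaining.items()),
                         key=lambda x: x[1], default=(None, 0))
        if best is None or gain == 0:
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, len(covered)

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}, {7, 8}]
print(greedy_max_cover(sets, k=3))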
Model-driven optimization using adaptive probes
- In SODA ’07, 2007
"... Abstract In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of such a system (as captured by some objective function over the parameters) is significantly i ..."
Abstract
-
Cited by 14 (9 self)
- Add to MetaCart
(Show Context)
In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of such a system (as captured by some objective function over the parameters) is significantly improved if some of these parameters can be probed or observed. In a resource-constrained situation, deciding which parameters to observe in order to optimize system performance itself becomes an interesting and important optimization problem. This problem is the focus of this paper. Unfortunately, designing optimal observation schemes is NP-Hard even for the simplest objective functions, leading to the study of approximation algorithms. In this paper we present general techniques for designing non-adaptive probing algorithms which are at most a constant factor worse than optimal adaptive probing schemes. Interestingly, this shows that for several problems of interest, while probing yields significant improvement in the objective function, being adaptive about the probing is not beneficial beyond constant factors.
How to Probe for an Extreme Value
- In Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), 2006
"... In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system optimization and monitoring schemes can be improved by spending resources such as time or bandwidth i ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of system optimization and monitoring schemes can be improved by spending resources such as time or bandwidth in observing or resolving the values of these parameters. In a resource-constrained situation, deciding which parameters to observe in order to best optimize the expected system performance (or in general, optimize the expected value of a certain objective function) itself becomes an interesting optimization problem. In this paper, we initiate the study of such problems, which we term “model-driven optimization”. In particular, we study the problem of optimizing the minimum value in the presence of observable distributions. We show that this problem is NP-Hard, and present greedy algorithms with good performance bounds. The proofs of the performance bounds are via novel submodularity arguments and connections to covering integer programs.
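The sketch below illustrates greedy probe selection in the spirit of this abstract: items have known independent value distributions, probing reveals an item's value, and probes are added one at a time to reduce the expected value ultimately paid. The payoff rule assumed here (take the cheapest of the observed values and the prior means of unprobed items) and the Monte Carlo estimation of marginal gains are assumptions made for illustration, not the paper's exact model or analysis.

import random

# Greedy probe selection for minimizing an expected minimum, with Monte Carlo
# estimates of each probe's benefit.  The payoff rule (cheapest of observed
# values and prior means of unprobed items) is an illustrative assumption.
def expected_min(dists, probe_set, samples, rng):
    means = [sum(d) / len(d) for d in dists]
    total = 0.0
    for _ in range(samples):
        draw = [rng.choice(d) for d in dists]
        total += min(draw[i] if i in probe_set else means[i]
                     for i in range(len(dists)))
    return total / samples

def greedy_probes(dists, budget, samples=20000, rng=None):
    rng = rng or random.Random(0)
    chosen = set()
    for _ in range(budget):
        # Add the probe with the largest estimated drop in the expected minimum.
        best = min((i for i in range(len(dists)) if i not in chosen),
                   key=lambda i: expected_min(dists, chosen | {i}, samples, rng))
        chosen.add(best)
    return chosen

dists = [[2, 2, 2], [0, 5], [1, 4], [3]]
S = greedy_probes(dists, budget=2)
print(S, expected_min(dists, S, 20000, random.Random(1)))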
Model-driven dynamic control of embedded wireless sensor networks
- Workshop on Dynamic Data Driven Application Systems, International Conference on Computational Science (ICCS 2006), 2006
"... Next-generation wireless sensor networks may revolutionize understanding of environmental change by assimilating heterogeneous data, assessing the relative value and costs of data collection, and scheduling activities accordingly. Thus, they are dynamic, data-driven distributed systems that integra ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Next-generation wireless sensor networks may revolutionize understanding of environmental change by assimilating heterogeneous data, assessing the relative value and costs of data collection, and scheduling activities accordingly. Thus, they are dynamic, data-driven distributed systems that integrate sensing with modeling and prediction in an adaptive framework. Integration of a range of technologies will allow estimation of the value of future data in terms of its contribution to understanding and cost. This balance is especially important for environmental data, where sampling intervals will range from meters and seconds to landscapes and years. In this paper, we first describe a general framework for dynamic data-driven wireless network control that combines modeling of the sensor network and its embedding environment, both in and out of the network. We then describe a range of challenges that must be addressed, and an integrated suite of solutions for the design of dynamic sensor networks.
Sequential design of experiments via linear programming
- Preliminary version appeared in the ACM Symposium on Theory of Computing, 2007
"... The celebrated multi-armed bandit problem in decision theory models the central trade-off between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In this paper we study the variant of the multi-armed bandit problem where the exploration phase involves ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
(Show Context)
The celebrated multi-armed bandit problem in decision theory models the central trade-off between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In this paper we study the variant of the multi-armed bandit problem where the exploration phase involves costly experiments and occurs before the exploitation phase, and where each play of an arm during the exploration phase updates a prior belief about the arm. The problem of finding an inexpensive exploration strategy to optimize a certain exploitation objective is NP-Hard even when a single play reveals all information about an arm, and all exploration steps cost the same. We provide the first polynomial-time constant-factor approximation algorithm for this class of problems. We show that this framework also generalizes several problems of interest studied in the context of data acquisition in sensor networks. Our analysis also extends to switching and setup costs, and to concave utility objectives. Our solution approach is via a novel linear program rounding technique based on stochastic packing. In addition to yielding exploration policies whose performance is within a small constant factor of the adaptive optimal policy, a nice feature of this approach is that the resulting policies explore the arms sequentially without revisiting any arm. Sequentiality is a well-studied paradigm in decision theory, and is very desirable in domains where multiple explorations can be conducted in parallel, for instance, in the sensor network context.
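The following sketch illustrates the sequential exploration structure highlighted at the end of this abstract: arms are examined one after another, each receives some costly exploratory plays, and the policy never revisits an arm before exploiting the one with the best posterior mean. The fixed per-arm allocation is a placeholder assumption; the paper derives its allocation from LP rounding based on stochastic packing.

import random

# Sequential, no-revisit exploration with Beta-Bernoulli posteriors, followed by
# exploitation of the best posterior mean.  The fixed per-arm allocation is a
# placeholder, not the paper's LP-derived policy.
def sequential_explore(true_ps, budget, plays_per_arm, rng):
    posteriors = []
    remaining = budget
    for p in true_ps:                       # visit arms in order, never revisit
        a, b = 1, 1                         # Beta(1, 1) prior
        for _ in range(min(plays_per_arm, remaining)):
            r = 1 if rng.random() < p else 0
            a, b = a + r, b + (1 - r)
            remaining -= 1
        posteriors.append(a / (a + b))
        if remaining == 0:
            break
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return best, true_ps[best]

rng = random.Random(2)
print(sequential_explore([0.4, 0.7, 0.5, 0.6], budget=24, plays_per_arm=8, rng=rng))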