Results 1 - 9 of 9
Approximation Algorithms for Partial-information based Stochastic Control with Markovian Rewards
Cited by 16 (1 self)
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying on/off Markov process with known parameters. The evolution of the Markov chain happens irrespective of whether the arm is played, and furthermore, the exact state of the Markov chain is only revealed to the player when the arm is played and the reward observed. At most one arm (or in general, M arms) can be played at any time step. The goal is to design a policy for playing the arms in order to maximize the infinite-horizon time-average expected reward. This problem is an instance of a Partially Observable Markov Decision Process (POMDP), and a special case of the notoriously intractable “restless bandit” problem. Unlike the stochastic MAB problem, the FEEDBACK MAB problem does not admit greedy index-based optimal policies. The state of the system at any time step encodes the beliefs about the states of different arms, and the policy decisions change these beliefs – this aspect complicates the design and analysis of simple algorithms. We design a constant factor approximation to the FEEDBACK MAB problem by solving and rounding a natural LP relaxation to this problem. As far as we are aware, this is the first approximation algorithm for a POMDP problem.
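To illustrate the belief state this abstract refers to, here is a minimal Python sketch of how the probability that a single on/off arm is "on" evolves while the arm goes unplayed. The parameter names alpha (off-to-on transition probability) and beta (on-to-off transition probability) and all function names are assumptions introduced for this example, not notation from the paper. The sketch covers only the belief dynamics; it does not implement the LP relaxation or its rounding.

```python
def step_belief(p_on, alpha, beta):
    """Advance the belief by one unobserved step.

    alpha: assumed P(off -> on), beta: assumed P(on -> off).
    """
    return p_on * (1.0 - beta) + (1.0 - p_on) * alpha


def belief_after(last_state_on, steps, alpha, beta):
    """Belief that the arm is on, `steps` steps after its state was last observed."""
    p = 1.0 if last_state_on else 0.0  # playing the arm reveals the exact state
    for _ in range(steps):
        p = step_belief(p, alpha, beta)
    return p


def stationary_on(alpha, beta):
    """Long-run probability of the on state for the two-state chain."""
    return alpha / (alpha + beta)
```

The longer an arm goes unplayed, the closer its belief drifts toward the stationary probability. Playing the arm resets the belief to 0 or 1, which is how policy decisions change the belief state.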
Approximation algorithms for budgeted learning problems
- In Proc. ACM Symp. on Theory of Computing, 2007
Cited by 12 (4 self)
We present the first approximation algorithms for a large class of budgeted learning problems. One classic example is the budgeted multi-armed bandit problem. In this problem each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge about the underlying distribution can be refined in the exploration phase by playing the arm and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After this exploration phase, the arm with the highest (posterior) expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase subject to a budget constraint on the number of plays, in order to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite-horizon setting or in terms of regret bounds, the budgeted version of the problem is NP-Hard. For this problem, and several generalizations, we provide approximate policies that achieve a reward within a constant factor of the reward of the optimal policy. Our algorithms use a novel linear program rounding technique based on stochastic packing.
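The explore-then-exploit setup described in this abstract can be sketched as follows. This is a naive round-robin baseline, assuming Bernoulli rewards with Beta priors; it is not the paper's LP-rounding policy and it is not adaptive. The class BetaArm, the function explore_then_choose, and the pull callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaArm:
    a: float  # prior pseudo-count of successes (assumed Beta prior)
    b: float  # prior pseudo-count of failures

    def mean(self):
        """Posterior expected reward of the arm."""
        return self.a / (self.a + self.b)

    def update(self, reward):
        """Refine the posterior after observing one Bernoulli reward."""
        if reward:
            self.a += 1
        else:
            self.b += 1


def explore_then_choose(arms, budget, pull):
    """Spend `budget` plays round-robin, then return the index of the
    arm with the highest posterior mean. `pull(i)` returns the observed
    reward (truthy or falsy) of arm i."""
    for t in range(budget):
        i = t % len(arms)
        arms[i].update(pull(i))
    return max(range(len(arms)), key=lambda i: arms[i].mean())
```

An adaptive policy would instead choose the next arm based on the posteriors observed so far. Designing that choice under the budget is the problem the abstract describes as NP-Hard.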
Approximation algorithms for restless bandit problems
- CoRR
Cited by 12 (0 self)
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor 2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel “balance” constraint to the dual of a well-known LP relaxation to the restless bandit problem. This is followed by a structural characterization of the optimal solution using both the exact primal and the dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions from which we derive the index policy and the associated analysis.
Model-driven dynamic control of embedded wireless sensor networks
- Workshop on Dynamic Data Driven Application Systems, International Conference on Computational Science (ICCS 2006), 2006
Cited by 2 (1 self)
Next-generation wireless sensor networks may revolutionize understanding of environmental change by assimilating heterogeneous data, assessing the relative value and costs of data collection, and scheduling activities accordingly. Thus, they are dynamic, data-driven distributed systems that integrate sensing with modeling and prediction in an adaptive framework. Integration of a range of technologies will allow estimation of the value of future data in terms of its contribution to understanding and cost. This balance is especially important for environmental data, where sampling intervals will range from meters and seconds to landscapes and years. In this paper, we first describe a general framework for dynamic data-driven wireless network control that combines modeling of the sensor network and its embedding environment, both in and out of the network. We then describe a range of challenges that must be addressed, and an integrated suite of solutions for the design of dynamic sensor networks.
How to Probe for an Extreme Value
- Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART, 2006
Cited by 2 (2 self)
In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system optimization and monitoring schemes can be improved by spending resources such as time or bandwidth in observing or resolving the values of these parameters. In a resource-constrained situation, deciding which parameters to observe in order to best optimize the expected system performance (or in general, optimize the expected value of a certain objective function) itself becomes an interesting optimization problem. In this paper, we initiate the study of such problems that we term “model-driven optimization”. In particular, we study the problem of optimizing the minimum value in the presence of observable distributions. We show that this problem is NP-Hard, and present greedy algorithms with good performance bounds. The proofs of the performance bounds are via novel sub-modularity arguments and connections to covering integer programs.
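One way to read the objective in this abstract is sketched below, under assumptions that the abstract does not confirm: each parameter is an independent discrete distribution, probing reveals its value, and the quality of a probe set is the expected minimum of the probed values. The functions expected_min and best_probe_set are hypothetical, and the exhaustive search is only for tiny inputs; it is not the paper's greedy algorithm.

```python
from itertools import combinations, product


def expected_min(dists):
    """Expected minimum of independent discrete random variables.

    Each distribution is a list of (value, probability) pairs.
    Enumerates all joint outcomes, so it is exponential in len(dists).
    """
    total = 0.0
    for combo in product(*dists):
        prob = 1.0
        for _, p in combo:
            prob *= p
        total += prob * min(v for v, _ in combo)
    return total


def best_probe_set(dists, k):
    """Brute-force choice of the k indices whose probed values have the
    smallest expected minimum (assumed objective; ties go to the first set)."""
    return min(
        combinations(range(len(dists)), k),
        key=lambda idx: expected_min([dists[i] for i in idx]),
    )
```

With distributions [(1, 0.5), (3, 0.5)] and [(2, 1.0)], probing both gives an expected minimum of 1.5, while either alone gives 2.0. A second probe can help even when the means are equal.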
Multi-armed bandits with limited exploration
- In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2007
Cited by 1 (0 self)
A central problem in decision making under uncertainty is the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current best knowledge about the system. A fundamental decision-theoretic model that captures this trade-off is the celebrated stochastic multi-armed bandit problem. In this paper, we consider scenarios where the exploration phase corresponds to designing experiments, and the exploration phase has the following restrictions: (1) it must necessarily precede the exploitation phase; (2) it is expensive in terms of some resource consumed, so that only a limited amount of exploration can be performed; and (3) switching from one experiment to another incurs a setup cost. Such a model, which is termed budgeted learning, is relevant in scenarios such as clinical trials and sensor network data acquisition. Though the classic multi-armed bandit problem admits a polynomial-time greedy optimal solution termed the Gittins index policy, the budgeted learning problem does not admit such a greedy optimal solution. In fact, the problem is NP-Hard even in simple settings. Our main contribution is in presenting constant factor approximation algorithms for this problem via a novel linear program rounding technique based on stochastic packing.
Approximation Algorithms for Correlated Knapsacks and Non-Martingale Bandits
In the stochastic knapsack problem, we are given a knapsack of size B, and a set of jobs whose sizes and rewards are drawn from a known probability distribution. However, the only way to know the actual size and reward is to schedule the job; when it completes, we get to know these values. How should we schedule jobs to maximize the expected total reward? We know constant-factor approximations for this problem when we assume that rewards and sizes are independent random variables, and that we cannot prematurely cancel jobs after we schedule them. What can we say when either or both of these assumptions are changed? The stochastic knapsack problem is of interest in its own right, but techniques developed for it are applicable to other stochastic packing problems. Indeed, ideas for this problem have been useful for budgeted learning problems, where one is given several arms which evolve in a specified stochastic fashion with each pull, and the goal is to pull the arms a total of B times to maximize the reward obtained. Much recent work on this problem focuses on the case when the evolution of the arms follows a martingale, i.e., when the expected reward from the future is the same as the reward at the current state. What can we say when the rewards do not form a martingale? In this paper, we give constant-factor approximation algorithms for the stochastic knapsack problem with
Model-based Querying . . .
The data generated by sensor networks or other distributed measurement infrastructures is typically incomplete, imprecise, and often erroneous, such that it is not an accurate representation of physical reality. To map raw sensor readings onto physical reality, a mathematical description, a model, of the underlying system or process is required to complement the sensor data. Models can help provide more robust interpretations of sensor readings: by accounting for spatial or temporal biases in the observed data, by identifying sensors that are providing faulty data, by extrapolating the values of missing sensor data, or by inferring hidden variables that may not be directly observable. Models also offer a principled approach to predict future states of a system. Finally, since models incorporate spatio-temporal correlations in the environment (which tend to be very strong in many monitoring applications), they lead to significantly more energy-efficient query execution – by exploiting such attribute correlations, it is often possible to use a small set of observations to provide approximations of the values of a large number of attributes. Model-based querying over a sensor network consists of two components: (1) identifying and/or building a model for a given sensor network, and (2) executing declarative queries against a sensor network that has been augmented with such a model (these steps may happen serially or concurrently). The queries may be on future or hidden states of the system, and are posed in a declarative SQL-like language. Since the cost of acquiring sensor readings

