Results 1-10 of 20
The linear programming approach to approximate dynamic programming
Operations Research, 2001
"... The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of largescale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach “fits ” a linear ..."
Abstract

Cited by 140 (16 self)
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach “fits” a linear combination of preselected basis functions to the dynamic programming cost-to-go function. We develop error bounds that offer performance guarantees and also guide the selection of both basis functions and “state-relevance weights” that influence quality of the approximation. Experimental results in the domain of queueing network control provide empirical support for the methodology. (Dynamic programming/optimal control: approximations/large-scale problems. Queues, algorithms: control of queueing networks.)
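The approximate LP described above can be sketched on a toy problem. A minimal sketch in Python, assuming a 2-state, 2-action discounted MDP with made-up costs, transitions, basis functions, and state-relevance weights (all illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-state, 2-action discounted MDP (all numbers illustrative).
alpha = 0.9
g = np.array([[1.0, 3.0],    # cost of action 0 in states 0, 1
              [2.0, 1.0]])   # cost of action 1 in states 0, 1
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # transitions under action 0
              [[0.5, 0.5], [0.9, 0.1]]])  # transitions under action 1

Phi = np.array([[1.0, 0.0],   # preselected basis functions, one row per state
                [1.0, 1.0]])
w = np.array([0.5, 0.5])      # state-relevance weights

# ALP: maximize w' Phi r  subject to  (Phi r)(x) <= g_a(x) + alpha * P_a(x,:) Phi r
# for every state x and action a (the Bellman inequality, linear in r).
A_ub, b_ub = [], []
for a in range(2):
    for x in range(2):
        A_ub.append(Phi[x] - alpha * P[a][x] @ Phi)
        b_ub.append(g[a][x])
res = linprog(c=-(w @ Phi), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2)
J_approx = Phi @ res.x   # fitted approximation to the cost-to-go function
```

Any feasible Φr is a pointwise lower bound on the true cost-to-go, so maximizing the weighted sum pushes the fit up toward it; the weights w decide where accuracy is bought.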
Validity of heavy traffic steady-state approximations in open queueing networks
, 2006
"... We consider a single class open queueing network, also known as a generalized Jackson network (GJN). A classical result in heavytraffic theory asserts that the sequence of normalized queue length processes of the GJN converge weakly to a reflected Brownian motion (RBM) in the orthant, as the traffic ..."
Abstract

Cited by 28 (4 self)
We consider a single-class open queueing network, also known as a generalized Jackson network (GJN). A classical result in heavy-traffic theory asserts that the sequence of normalized queue length processes of the GJN converges weakly to a reflected Brownian motion (RBM) in the orthant as the traffic intensity approaches unity. However, barring simple instances, it is still not known whether the stationary distribution of the RBM provides a valid approximation for the steady-state of the original network. In this paper we resolve this open problem by proving that the rescaled stationary distribution of the GJN converges to the stationary distribution of the RBM, thus validating a so-called “interchange of limits” for this class of networks. Our method of proof involves a combination of Lyapunov function techniques, strong approximations, and tail probability bounds that yield tightness of the sequence of stationary distributions of the GJN.
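For the simplest instance, an M/M/1 queue, the interchange of limits can be checked by hand: the scaled stationary queue length (1 − ρ)Q has a geometric tail that approaches the exponential tail of the limiting one-dimensional RBM as ρ → 1. A small numeric check (illustrative only, not the paper's argument):

```python
import math

def scaled_tail(rho, t):
    """P((1 - rho) * Q > t) for the stationary queue length of an M/M/1
    queue with utilization rho, where P(Q = k) = (1 - rho) * rho**k."""
    k = math.floor(t / (1 - rho))   # P(Q > k) = rho**(k + 1)
    return rho ** (k + 1)

# As rho -> 1 the scaled tail approaches exp(-t), the tail of the
# stationary distribution of the limiting one-dimensional RBM.
for rho in (0.9, 0.99, 0.999):
    print(rho, scaled_tail(rho, 1.0), math.exp(-1.0))
```

The hard content of the paper is that this convergence of stationary laws persists for general multi-station networks, where no closed form is available.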
The natural work-stealing algorithm is stable
In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS), 2001
"... In this paper we analyse a very simple dynamic workstealing algorithm. In the workgeneration model, there are n (work) generators. A generatorallocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generatoralloca ..."
Abstract

Cited by 24 (1 self)
In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
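The work-generation model above is easy to simulate. In the sketch below, D is the worst-case point distribution from the text (all generators allocated to processor 0); since the description of the stealing rule is truncated here, the rule used (an empty processor grabs half the queue of a uniformly random processor) is an assumed simple variant for illustration, not necessarily the one analysed in the paper:

```python
import random

def simulate(n, lam, steps, steal=True, seed=0):
    """Work-generation model with all n generators on processor 0 (the
    worst-case point distribution D).  The steal rule -- each empty
    processor takes half the queue of a uniformly random processor --
    is an assumed illustrative variant."""
    rng = random.Random(seed)
    queues = [0] * n
    for _ in range(steps):
        # each generator independently produces a unit-time task
        queues[0] += sum(rng.random() < lam for _ in range(n))
        if steal:
            for p in range(n):
                if queues[p] == 0:
                    victim = rng.randrange(n)
                    take = queues[victim] // 2
                    queues[victim] -= take
                    queues[p] += take
        # each processor then services one task from its queue
        for p in range(n):
            queues[p] = max(0, queues[p] - 1)
    return queues

no_steal = simulate(8, 0.5, 200, steal=False)   # load piles up on processor 0
with_steal = simulate(8, 0.5, 200, steal=True)  # load stays balanced
```

Without stealing, processor 0 receives about λn = 4 tasks per step and services one, so its queue grows linearly; with stealing the total load stays bounded, which is the stability phenomenon the paper proves.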
Approximation algorithms for budgeted learning problems
In Proc. ACM Symp. on Theory of Computing, 2007
"... We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multiarmed bandit problem. In this problem each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge ..."
Abstract

Cited by 19 (6 self)
We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multi-armed bandit problem. In this problem each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge about the underlying distribution can be refined in the exploration phase by playing the arm and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After this exploration phase, the arm with the highest (posterior) expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase subject to a budget constraint on the number of plays, in order to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite-horizon setting and in terms of regret bounds, the budgeted version of the problem is NP-Hard. For this problem, and several generalizations, we provide approximate policies that achieve a reward within a constant factor of the reward of the optimal policy. Our algorithms use a novel linear program rounding technique based on stochastic packing.
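A minimal sketch of the budgeted setting, assuming Bernoulli arms with Beta priors; the exploration rule below (spend the budget on the arm with the largest posterior variance) is a simple illustrative stand-in, not the paper's LP-rounding policy:

```python
import random

def beta_var(a, b):
    """Variance of a Beta(a, b) posterior."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def budgeted_bandit(priors, true_p, budget, seed=0):
    """Budgeted Bernoulli bandit sketch.  `priors` are (a, b) pairs of
    Beta priors.  The exploration rule (play the arm with the largest
    posterior variance) is an assumed stand-in for the paper's
    LP-based policy, used here only to show the problem structure."""
    rng = random.Random(seed)
    post = [list(p) for p in priors]
    for _ in range(budget):                     # exploration phase
        i = max(range(len(post)), key=lambda k: beta_var(*post[k]))
        reward = rng.random() < true_p[i]       # play arm i, observe a reward
        post[i][0 if reward else 1] += 1        # conjugate Bayesian update
    means = [a / (a + b) for a, b in post]      # exploit best posterior mean
    return max(range(len(means)), key=lambda k: means[k]), means

arm, means = budgeted_bandit([(1, 1), (1, 1), (1, 1)],
                             true_p=[0.2, 0.8, 0.5], budget=30)
```

The objective the paper optimizes is the expected posterior mean of the arm chosen at the end, which is what makes the design of the adaptive exploration phase nontrivial.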
Approximation algorithms for restless bandit problems
 CoRR
"... In this paper, we consider the restless bandit problem, which is one of the most wellstudied generalizations of the celebrated stochastic multiarmed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACEHard to approximate to any nontrivi ..."
Abstract

Cited by 17 (1 self)
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any nontrivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor-2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel “balance” constraint to the dual of a well-known LP relaxation to the restless bandit problem. This is followed by a structural characterization of the optimal solution by using both the exact primal as well as dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions from which we derive the index policy and the associated analysis.
Extension of the PAC Framework to Finite and Countable Markov Chains
In Proceedings of the 12th Annual Conference on Computational Learning Theory, 2000
"... We consider a model of learning in which the successive observations follow a certain Markov chain. The observations are labeled according to a membership to some unknown target set. For a Markov chain with finitely many states we show that, if the target set belongs to a family of sets with a finit ..."
Abstract

Cited by 16 (0 self)
We consider a model of learning in which the successive observations follow a certain Markov chain. The observations are labeled according to membership in some unknown target set. For a Markov chain with finitely many states we show that, if the target set belongs to a family of sets with a finite VC dimension, then probably approximately correct learning of this set is possible with polynomially large samples. Specifically, for observations following a random walk with a state space X and uniform stationary distribution, the sample size required is no more than O((t_0 / (1 − λ_2)) log(t_0 |X| / δ)), where δ is the confidence level, λ_2 is the second largest eigenvalue of the transition matrix, and t_0 is the sample size sufficient for learning from i.i.d. observations. We extend these results to Markov chains with countably many states using Lyapunov function techniques and recent results on mixing properties of infinite-state Markov chains.
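The sample-size expression above is reconstructed from a garbled PDF extraction, so its constants should not be trusted; as a sketch, it can be evaluated for a concrete chain (a lazy random walk on a 4-cycle, chosen here purely for illustration):

```python
import numpy as np

def pac_sample_bound(P, t0, delta):
    """Evaluate t0 / (1 - lambda_2) * log(t0 * |X| / delta), the rough
    sample-size expression from the abstract, where lambda_2 is the
    second largest eigenvalue of the transition matrix P.  This is a
    reconstruction of a garbled formula, not the paper's exact bound."""
    eig = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
    lam2 = eig[1]
    return t0 / (1.0 - lam2) * np.log(t0 * P.shape[0] / delta)

# Lazy random walk on a 4-cycle: doubly stochastic, uniform stationary
# distribution, second-largest eigenvalue 1/2.
shift = np.roll(np.eye(4), 1, axis=0)
P = 0.5 * np.eye(4) + 0.25 * (shift + shift.T)
bound = pac_sample_bound(P, t0=100, delta=0.05)
```

The spectral gap 1 − λ_2 plays the role of an effective mixing rate: the slower the chain mixes, the more correlated the observations and the larger the sample required relative to the i.i.d. case.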
A Distributed Scheme for Achieving Energy-Delay Tradeoffs With Multiple Service Classes over a Dynamically Varying Network
"... We consider a dynamical probabilistic traffic model for the number of users transmitting at any time. This model captures both user mobility and traffic burstiness. Moreover, we assume no centralized controller, such as a scheduler, is available. When multiple users transmit simultaneously, multiple ..."
Abstract

Cited by 6 (0 self)
We consider a dynamical probabilistic traffic model for the number of users transmitting at any time. This model captures both user mobility and traffic burstiness. Moreover, we assume no centralized controller, such as a scheduler, is available. When multiple users transmit simultaneously, multiple-access interference affects throughput considerably. Most queue control schemes assume individual users know the states of their own queues (local queue information) along with the states of other users' queues (shared queue information) and address issues of scheduling; but this sharing of information may be onerous in a practical system. While shared queue information has recently been shown [1] not to affect the capacity of such systems, it has a considerable impact on delay. We introduce a scheme where, for each user, a bit of shared queue information specifies whether its queue length is above or below a threshold. Our scheme relies on two different service classes implemented through a superposition coding scheme (first proposed in [2], further studied and expanded in [1]). The first class experiences no delay due to multiple-access interference, while the second class requires retransmissions when such an event occurs. We show how our scheme affords an energy-delay tradeoff. Moreover, when configured properly, our scheme can attain boundary points of the region corresponding to minimum energy with no shared queue information for zero delay, along with minimum energy subject to system stability. We derive bounds on the performance of the multiple-access system using our proposed scheme by introducing Lyapunov function bounds in a manner similar to [3].
Approximate Dynamic Programming via Linear Programming
In NIPS 14, 2001
"... The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of largescale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. ..."
Abstract

Cited by 5 (2 self)
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems.
Approximate dynamic programming via iterated Bellman inequalities, http://www.stanford.edu/~boyd/papers/adp_iter_bellman.html, 2010
"... In this paper we introduce new methods for finding functions that lower bound the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and produces both a bound on the optimal objective, as w ..."
Abstract

Cited by 4 (4 self)
In this paper we introduce new methods for finding functions that lower bound the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and produces both a bound on the optimal objective and a suboptimal policy that appears to work very well. These results extend and improve bounds obtained by the authors in a previous paper using a single Bellman inequality condition. We describe the methods in a general setting, and show how they can be applied in specific cases including the finite-state case, constrained linear quadratic control, switched affine control, and multi-period portfolio investment.
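One way to see the iterated construction in the finite-state case: the condition J ≤ T^M J is implied by the existence of intermediate value vectors J_2, ..., J_M with J ≤ T J_2, J_2 ≤ T J_3, ..., J_M ≤ T J, and each of these inequalities is linear when T is the min-cost Bellman operator. A sketch on a toy 2-state, 2-action MDP with a restricted basis for J (all numbers and the basis are illustrative assumptions, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-state, 2-action discounted MDP (illustrative numbers).
alpha = 0.9
g = np.array([[1.0, 3.0], [2.0, 1.0]])           # g[a][x]: cost of a in x
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])         # P[a]: transition matrix
Phi = np.array([[1.0, 0.0], [1.0, 1.0]])         # restricted basis: J = Phi r
w = np.array([0.5, 0.5])

def lower_bound(M):
    """max w' Phi r  s.t.  J_1 <= T J_2, ..., J_M <= T J_1, with
    J_1 = Phi r and free intermediate vectors J_2, ..., J_M.
    M = 1 recovers the single Bellman inequality condition."""
    n_var = 2 + 2 * (M - 1)                      # r, then J_2 ... J_M
    def iterate(i):                               # J_i as a matrix acting on vars
        E = np.zeros((2, n_var))
        if i == 0:
            E[:, :2] = Phi                        # J_1 = Phi r
        else:
            E[:, 2 * i: 2 * i + 2] = np.eye(2)    # J_{i+1} is free
        return E
    A, b = [], []
    for i in range(M):                            # J_i <= g_a + alpha P_a J_next
        Ji, Jn = iterate(i), iterate((i + 1) % M)
        for a in range(2):
            for x in range(2):
                A.append(Ji[x] - alpha * P[a][x] @ Jn)
                b.append(g[a][x])
    c = np.zeros(n_var)
    c[:2] = -(w @ Phi)                            # maximize w' Phi r
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(None, None)] * n_var)
    return -res.fun

b1, b3 = lower_bound(1), lower_bound(3)           # iterating can only help
```

Any feasible solution of the M = 1 program extends to the M = 3 program (set the intermediates equal to Φr), so the iterated bound is never worse; the paper's semidefinite-programming versions play the same game with quadratic value functions.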