The linear programming approach to approximate dynamic programming
- Operations Research
, 2001
Cited by 105 (15 self)
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach “fits” a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function. We develop error bounds that offer performance guarantees and also guide the selection of both basis functions and “state-relevance weights” that influence the quality of the approximation. Experimental results in the domain of queueing network control provide empirical support for the methodology. (Dynamic programming/optimal control: approximations/large-scale problems. Queues, algorithms: control of queueing networks.)
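For orientation, the approximate linear program behind this abstract is commonly written as follows. The notation is an assumption, since the abstract defines none of it: $\phi_1,\dots,\phi_K$ are the basis functions, $r$ the weight vector being fitted, $c$ the state-relevance weights, $g_a(x)$ the one-stage cost of action $a$ in state $x$, $P_a$ the transition probabilities, and $\alpha \in (0,1)$ the discount factor.

```latex
\begin{aligned}
\max_{r \in \mathbb{R}^K} \quad & \sum_{x} c(x)\,(\Phi r)(x) \\
\text{s.t.} \quad & g_a(x) + \alpha \sum_{y} P_a(x,y)\,(\Phi r)(y) \;\ge\; (\Phi r)(x)
\quad \text{for all states } x \text{ and actions } a,
\end{aligned}
\qquad \text{where } (\Phi r)(x) = \sum_{k=1}^{K} r_k\,\phi_k(x).
```

The program has only $K$ variables but one constraint per state-action pair, which is why the choice of basis functions and of the weights $c$ matters for the quality of the fit.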
Validity of heavy traffic steady-state approximations in open queueing networks
, 2006
Cited by 16 (1 self)
We consider a single class open queueing network, also known as a generalized Jackson network (GJN). A classical result in heavy-traffic theory asserts that the sequence of normalized queue length processes of the GJN converges weakly to a reflected Brownian motion (RBM) in the orthant, as the traffic intensity approaches unity. However, barring simple instances, it is still not known whether the stationary distribution of RBM provides a valid approximation for the steady-state of the original network. In this paper we resolve this open problem by proving that the re-scaled stationary distribution of the GJN converges to the stationary distribution of the RBM, thus validating a so-called “interchange-of-limits” for this class of networks. Our method of proof involves a combination of Lyapunov function techniques, strong approximations and tail probability bounds that yield tightness of the sequence of stationary distributions of the GJN.
The natural work-stealing algorithm is stable
- In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS)
, 2001
Cited by 15 (1 self)
In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time-step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
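The imbalance described for the point distribution can be seen in a small simulation. This is a minimal sketch of the work-generation model only, with no stealing step, since the stealing rule is cut off in the text above. The function name, the parameter values and the choice of processor 0 as the host are illustrative assumptions.

```python
import random


def simulate_point_distribution(n, lam, steps, seed=0):
    """Work-generation model without stealing.

    Assumption: D is the point distribution whose single function h
    hosts all n generators on processor 0.
    """
    rng = random.Random(seed)
    queues = [0] * n
    for _ in range(steps):
        # Each of the n generators creates a unit task with probability lam
        # and inserts it into the queue of its host (always processor 0 here).
        for _ in range(n):
            if rng.random() < lam:
                queues[0] += 1
        # Every processor with a nonempty queue services one task.
        for p in range(n):
            if queues[p] > 0:
                queues[p] -= 1
    return queues


queues = simulate_point_distribution(n=8, lam=0.5, steps=1000)
print("host queue:", queues[0], "largest other queue:", max(queues[1:]))
```

With λn = 4 arrivals per step on average against one service per step, the host queue grows roughly linearly in the number of steps while every other processor stays idle.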
Extension of the PAC Framework to Finite and Countable Markov Chains
- In Proceedings of the 12th Annual Conference on Computational Learning Theory
, 2000
Cited by 14 (0 self)
We consider a model of learning in which the successive observations follow a certain Markov chain. The observations are labeled according to their membership in some unknown target set. For a Markov chain with finitely many states we show that, if the target set belongs to a family of sets with a finite VC dimension, then probably approximately correct learning of this set is possible with polynomially large samples. Specifically, for observations following a random walk with a state space $X$ and uniform stationary distribution, the sample size required is no more than $\Omega\left(\frac{t_0}{1-\lambda_2}\log\left(t_0 |X| \frac{1}{\delta}\right)\right)$, where $\delta$ is the confidence level, $\lambda_2$ is the second largest eigenvalue of the transition matrix, and $t_0$ is the sample size sufficient for learning from i.i.d. observations. We extend these results to Markov chains with countably many states using the Lyapunov function technique and recent results on mixing properties of infinite state Markov chains.
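To get a feel for how this sample size scales, the sketch below evaluates the expression (t_0 / (1 - λ_2)) · log(t_0 |X| / δ). The constant hidden by the Ω notation is not given in the abstract, so setting it to 1 is an assumption, and the function name and inputs are illustrative.

```python
import math


def markov_pac_sample_size(t0, lambda2, num_states, delta):
    """Evaluate (t0 / (1 - lambda2)) * log(t0 * num_states / delta).

    Assumption: the constant factor hidden by the asymptotic notation is 1.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if not lambda2 < 1.0:
        raise ValueError("lambda2 must be below 1 (positive spectral gap)")
    spectral_gap = 1.0 - lambda2
    return (t0 / spectral_gap) * math.log(t0 * num_states / delta)


# The requirement grows as the chain mixes more slowly (lambda2 closer to 1).
for lam2 in (0.5, 0.9, 0.99):
    print(lam2, round(markov_pac_sample_size(100, lam2, 50, 0.05)))
```

The 1 / (1 - λ_2) factor is the price of dependence between observations: a smaller spectral gap means more samples are needed to match what t_0 i.i.d. samples would provide.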
Approximation algorithms for budgeted learning problems
- In Proc. ACM Symp. on Theory of Computing
, 2007
Cited by 12 (4 self)
We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multi-armed bandit problem. In this problem each arm of the bandit has an unknown reward distribution on which a prior is specified as input. The knowledge about the underlying distribution can be refined in the exploration phase by playing the arm and observing the rewards. However, there is a budget on the total number of plays allowed during exploration. After this exploration phase, the arm with the highest (posterior) expected reward is chosen for exploitation. The goal is to design the adaptive exploration phase subject to a budget constraint on the number of plays, in order to maximize the expected reward of the arm chosen for exploitation. While this problem is reasonably well understood in the infinite horizon setting or in terms of regret bounds, the budgeted version of the problem is NP-Hard. For this problem, and several generalizations, we provide approximate policies that achieve a reward within a constant factor of the reward of the optimal policy. Our algorithms use a novel linear program rounding technique based on stochastic packing.
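The problem setup (priors on arms, a budget of exploration plays, then one arm picked by posterior mean) can be sketched in a few lines. The example below assumes Bernoulli rewards with Beta priors and uses a naive round-robin exploration rule. It is a baseline for illustrating the setting only, not the LP-rounding policy of the paper, and all names are hypothetical.

```python
import random


def explore_then_pick(priors, true_means, budget, seed=0):
    """Budgeted exploration followed by a single exploitation choice.

    priors:     list of (alpha, beta) Beta parameters, one pair per arm
    true_means: Bernoulli success probabilities, unknown to the policy
    budget:     total number of exploration plays allowed
    """
    rng = random.Random(seed)
    posteriors = [list(p) for p in priors]
    for t in range(budget):
        arm = t % len(posteriors)  # naive round-robin (assumed baseline)
        reward = 1 if rng.random() < true_means[arm] else 0
        posteriors[arm][0] += reward       # successes update alpha
        posteriors[arm][1] += 1 - reward   # failures update beta
    # Exploit: choose the arm with the highest posterior expected reward.
    means = [a / (a + b) for a, b in posteriors]
    best = max(range(len(means)), key=lambda i: means[i])
    return best, posteriors


best_arm, posteriors = explore_then_pick(
    priors=[(1, 1), (1, 1), (1, 1)],
    true_means=[0.2, 0.5, 0.8],
    budget=30,
)
print("chosen arm:", best_arm, "posteriors:", posteriors)
```

An adaptive policy would instead decide each play from the current posteriors, which is where the budget constraint makes the design problem hard.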
A Distributed Scheme for Achieving Energy-Delay Tradeoffs With Multiple Service Classes over a Dynamically Varying Network
Cited by 5 (0 self)
We consider a dynamical probabilistic traffic model for the number of users transmitting at any time. This model captures both user mobility and traffic burstiness. Moreover, we assume no centralized controller, such as a scheduler, is available. When multiple users transmit simultaneously, multiple access interference affects throughput considerably. Most queue control schemes assume individual users know the states of their own queues (local queue information) along with the states of other users' queues (shared queue information) and address issues of scheduling; but this sharing of information may be onerous in a practical system. While shared queue information has recently been shown [1] not to affect the capacity of such systems, it has a considerable impact on delay. We introduce a scheme in which, for each user, a bit of shared queue information specifies whether its queue length is above or below a threshold. Our scheme relies on two different service classes implemented through a superposition coding scheme (first proposed in [2], further studied and expanded in [1]). The first class experiences no delay due to multiple access interference, while the second class requires retransmissions when such an event occurs. We show how our scheme affords an energy-delay tradeoff. Moreover, when configured properly, our scheme can attain boundary points of the region corresponding to minimum energy with no shared queue information for 0 delay along with minimum energy subject to system stability. We derive bounds on the performance of the multiple access system using our proposed scheme by introducing Lyapunov function bounds in a manner similar to [3].
Approximate Dynamic Programming via Linear Programming
- In NIPS-14
, 2001
Cited by 5 (2 self)
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems.
Approximate Linear Programming for Average-Cost Dynamic Programming
, 2003
Cited by 2 (2 self)
This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a two-phase variant of approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible with the objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.
Multi-armed bandits with limited exploration
- In Proceedings of the Annual Symposium on Theory of Computing (STOC)
, 2007
Cited by 1 (0 self)
A central problem in decision making under uncertainty is the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current best knowledge about the system. A fundamental decision-theoretic model that captures this trade-off is the celebrated stochastic Multi-arm Bandit Problem. In this paper, we consider scenarios where the exploration phase corresponds to designing experiments, and the exploration phase has the following restrictions: (1) it must necessarily precede the exploitation phase; (2) it is expensive in terms of some resource consumed, so that only a limited amount of exploration can be performed; and (3) switching from one experiment to another incurs a setup cost. Such a model, which is termed budgeted learning, is relevant in scenarios such as clinical trials and sensor network data acquisition. Though the classic multi-armed bandit problem admits a polynomial-time greedy optimal solution termed the Gittins index policy, the budgeted learning problem does not admit such a greedy optimal solution. In fact, the problem is NP-Hard even in simple settings. Our main contribution is in presenting constant factor approximation algorithms for this problem via a novel linear program rounding technique based on stochastic packing.
Stability of Fluid Networks with Proportional Routing
- in Russian); Ann. Phys. (N.Y
, 2001
Cited by 1 (1 self)
In this paper we investigate the stability of a class of two-station multiclass fluid networks with proportional routing. We obtain explicit necessary and sufficient conditions for the global stability of such networks. By virtue of a stability theorem of Dai [14], these results also give sufficient conditions for the stability of a class of related multiclass queueing networks. Our study extends the results of Dai and VandeVate [19], who provided a similar analysis for fluid models without proportional routing, which arise from queueing networks with deterministic routing. The models we investigate include fluid models which arise from a large class of two-station queueing networks with probabilistic routing. The stability conditions derived turn out to have an appealing intuitive interpretation in terms of virtual stations and push-starts which were introduced in earlier work on multiclass networks. Keywords: Multiclass queueing network, fluid mod

