Results 1–10 of 13
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
 ACM Transactions on Algorithms
"... We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of an algorithm of Young, Tarjan and Orlin for finding minimum mean weight cycles. It runs in O(mn + n^2 log n) time, where n ..."
Cited by 6 (1 self)
in many situations. Both algorithms improve on a recent O(mn^2)-time algorithm of Andersson and Vorobyov. We also present a randomized Õ(m^(1/2) n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms.
Policy gradient methods for reinforcement learning with function approximation.
 In NIPS
, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
"actor-critic" or policy-iteration architectures (e.g., ...). Policy Gradient Theorem. We consider the standard reinforcement learning framework (see, e.g., Sutton and Barto, 1998), in which a learning agent interacts with a Markov decision process (MDP). The state, action, and reward at each time t ∈ {0, 1, 2, ...} ...
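The snippet above describes the policy-gradient idea in words: represent the policy explicitly and adjust its parameters along the gradient of expected return. As illustrative background only (not code from the paper), here is a minimal REINFORCE-style sketch on a made-up two-state MDP; the MDP, the softmax parameterization, and all constants are assumptions for the example.

```python
import math
import random

# Toy 2-state, 2-action MDP (illustrative assumption, not from the paper).
# TRANS[(s, a)] = (next_state, reward); action 1 yields reward 1.
N_STATES, N_ACTIONS = 2, 2
TRANS = {(0, 0): (0, 0.0), (0, 1): (1, 1.0),
         (1, 0): (0, 0.0), (1, 1): (1, 1.0)}

theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # policy logits

def policy(s):
    """Softmax action probabilities for state s."""
    m = max(theta[s])
    exps = [math.exp(t - m) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]

def episode(length=20):
    """Roll out one episode; return a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(length):
        a = random.choices(range(N_ACTIONS), policy(s))[0]
        s2, r = TRANS[(s, a)]
        traj.append((s, a, r))
        s = s2
    return traj

def reinforce(episodes=2000, alpha=0.05, gamma=0.9):
    """Monte Carlo policy gradient: theta += alpha * G_t * grad log pi."""
    for _ in range(episodes):
        G = 0.0
        for s, a, r in reversed(episode()):
            G = r + gamma * G  # discounted return from this step
            probs = policy(s)
            for b in range(N_ACTIONS):
                # gradient of log softmax w.r.t. logit b
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha * G * grad

random.seed(0)
reinforce()
# The rewarding action (action 1) should come to dominate in both states.
print([policy(s)[1] for s in range(N_STATES)])
```

The baseline-free update shown here is the simplest member of the policy-gradient family the paper analyzes; actor-critic methods replace the Monte Carlo return G with a learned value estimate to reduce variance.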
Learning Algorithms for Markov Decision Processes with Average Cost
 SIAM Journal on Control and Optimization
, 2001
"... Abstract. This paper gives the first rigorous convergence analysis of analogues of Watkins’s Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recu ..."
Cited by 48 (9 self)
approximation, dynamic programming. AMS subject classification: 93E20. PII: S0363012999361974. 1. Introduction. Q-learning algorithms are simulation-based reinforcement learning algorithms for learning the value function arising in the dynamic programming approach to Markov decision processes. They were first
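The paper above analyzes average-cost variants of Q-learning; as background only, the classical discounted Q-learning update they parallel can be sketched as follows. The toy MDP and all constants are made-up assumptions, not taken from the paper.

```python
import random

# Toy 2-state, 2-action MDP (illustrative assumption): reward 1 only for
# taking action 1 in state 1; TRANS[(s, a)] = (next_state, reward).
N_STATES, N_ACTIONS = 2, 2
TRANS = {(0, 0): (0, 0.0), (0, 1): (1, 0.0),
         (1, 0): (0, 0.0), (1, 1): (1, 1.0)}
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s):
    """One epsilon-greedy interaction plus the standard Q-learning update."""
    if random.random() < EPS:
        a = random.randrange(N_ACTIONS)          # explore
    else:
        a = max(range(N_ACTIONS), key=lambda b: Q[s][b])  # exploit
    s2, r = TRANS[(s, a)]
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    return s2

random.seed(0)
s = 0
for _ in range(20000):
    s = step(s)

# With gamma = 0.9 the estimates approach the optimal values
# Q*(1,1) = 1/(1-gamma) = 10 and Q*(0,1) = gamma * 10 = 9.
print(Q)
```

The average-cost algorithms studied in the paper replace the discount factor with a relative-value (span) normalization; the bootstrapped temporal-difference structure of the update is the same.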
Information Relaxation Bounds for Infinite Horizon Markov Decision Processes
"... We consider infinite horizon stochastic dynamic programs with discounted costs and study how to use information relaxations to calculate lower bounds on the performance of an optimal policy. We develop a general framework that allows for reformulations of the underlying state transition function. Th ..."
Cited by 1 (1 self)
obtained from simple heuristics. Finally, we discuss extensions of the approach to stochastic shortest path and average cost problems.
Robust Combination of Local Controllers
 Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01)
, 2001
"... Finding solutions to high-dimensional Markov Decision Processes (MDPs) is a difficult problem, especially in the presence of uncertainty or if the actions and time measurements are continuous. Frequently this difficulty can be alleviated by the availability of problem-specific knowledge. For example ..."
Cited by 5 (0 self)
A Strongly Polynomial Algorithm for Controlled Queues
, 2008
"... We consider the problem of computing optimal policies of finite-state, finite-action Markov Decision Processes (MDPs). A reduction to a continuum of constrained MDPs (CMDPs) is presented such that the optimal policies for these CMDPs constitute a path in a graph defined over the deterministic polici ..."
Cited by 3 (0 self)
A Strongly Polynomial Algorithm for Controlled Queues
, 2009
"... We consider the problem of computing optimal policies of finite-state, finite-action Markov Decision Processes (MDPs). A reduction to a continuum of constrained MDPs (CMDPs) is presented such that the optimal policies for these CMDPs constitute a path in a graph defined over the deterministic polici ..."
A Mean Field Approach for Optimization in Particle Systems and Applications
, 2009
"... This paper investigates the limit behavior of Markov decision processes (MDPs) made of independent particles evolving in a common environment, when the number of particles goes to infinity. In the finite horizon case or with a discounted cost and an infinite horizon, we show that when the number o ..."
Cited by 2 (1 self)
Jean-Charles Delvenne
, 2011
"... The question of knowing whether the Policy Iteration algorithm (PI) for solving Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity has attracted much attention in the last 50 years. Recently, Fearnley proposed an example on which PI needs an exponential number of it ..."