Results 1–10 of 78
Near-optimal Regret Bounds for Reinforcement Learning
Cited by 98 (11 self)
Abstract: "For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: an MDP has diameter D if for any pair of states s, s ..."
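The diameter parameter can be made concrete with a small sketch. For a deterministic MDP the minimal expected time to move from one state to another under the best policy reduces to a shortest action-path length, so the diameter is the largest such pairwise distance. A minimal Python illustration under that simplifying assumption (the transition table below is hypothetical, not from the cited paper):

```python
from collections import deque

# Hypothetical deterministic MDP: transitions[state][action] -> next state.
transitions = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 0, "b": 1},
}

def shortest_hitting_time(start, goal):
    """BFS over states: in a deterministic MDP, the minimal expected
    time to reach `goal` from `start` is the shortest action path."""
    if start == goal:
        return 0
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        s, d = frontier.popleft()
        for s2 in transitions[s].values():
            if s2 == goal:
                return d + 1
            if s2 not in seen:
                seen.add(s2)
                frontier.append((s2, d + 1))
    return float("inf")  # goal unreachable from start

def diameter(states):
    """Diameter D: the max over ordered pairs (s, s') of the minimal
    expected time to move from s to s'."""
    return max(
        shortest_hitting_time(s, t)
        for s in states for t in states if s != t
    )

print(diameter(list(transitions)))  # → 2 (the longest trip is 0 -> 2)
```

In the general stochastic case the hitting times are expectations under an optimal policy rather than path lengths, so this BFS is only a special-case illustration of the definition.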
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning
Cited by 9 (3 self)
Abstract: "We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rather, the agent has several representations (mapping histories of past interactions to a discrete state s ..."
Structured Prediction with Reinforcement Learning
MACHINE LEARNING JOURNAL, 2008
Cited by 10 (3 self)
Abstract: "We formalize the problem of Structured Prediction as a Reinforcement Learning task. We first define a Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov Decision Processes for Structured Prediction, and show that learning an optimal policy for this SP-MDP is equivalent ..."
Final Report: Structure in Reinforcement Learning, 2013
Abstract: "As planned, in the beginning of the project I've been concentrating on the topic of online aggregation for undiscounted reinforcement learning in Markov decision processes (MDPs). I've started research on online aggregation already back in Austria, so that I could quickly conclude work by proving re ..."
MIMO Transmission Control in Fading Channels: A Constrained Markov Decision Process Formulation with Monotone Randomized Policies
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007
Cited by 15 (1 self)
Abstract: "This paper addresses the optimal power and rate allocation control in multiple-input multiple-output (MIMO) wireless systems over Markovian fading channels. The problem is posed as an infinite-horizon average-cost constrained Markov decision process (CMDP) with the goal of minimizing the average tr ..."
Inference-based Decision Making in Games, 2011
Abstract: "Background: Reinforcement learning in complex games has traditionally been the domain of value- or policy-iteration algorithms, resulting from their effectiveness in planning in Markov decision processes, before algorithms based on regret minimization guarantees such as upper confidence bounds applied ..."
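The snippet above ends at "upper confidence bounds applied" (presumably to trees, i.e. UCT-style methods). The UCB1 rule those methods build on admits a one-function sketch: pick the arm maximizing its empirical mean plus an exploration bonus. The arm statistics below are hypothetical, chosen only to show the bonus at work, and each arm is assumed to have been pulled at least once:

```python
import math

def ucb1_pick(means, counts):
    """UCB1: choose the arm maximizing empirical mean plus the
    exploration bonus sqrt(2 ln N / n_i), where N = total pulls.
    Assumes every count is >= 1 (each arm tried at least once)."""
    total = sum(counts)
    scores = [
        m + math.sqrt(2.0 * math.log(total) / n)
        for m, n in zip(means, counts)
    ]
    return scores.index(max(scores))

# A well-sampled strong arm vs. a barely-tried weaker arm: the
# bonus steers the pick toward the under-explored arm.
print(ucb1_pick([0.9, 0.5], [100, 1]))  # → 1
```

Once both arms are equally sampled the bonuses cancel and the empirical means decide, which is the regret-minimization behavior the abstract alludes to.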
Expediting RL by Using Graphical Structures (Short Paper)
Abstract: "The goal of reinforcement learning (RL) is to maximize reward (minimize cost) in a Markov decision process (MDP) without knowing the underlying model a priori. RL algorithms tend to be much slower than planning algorithms, which require the model as input. Recent results demonstrate that MDP planning ..."
Sample Complexity Bounds of Exploration
Cited by 1 (0 self)
Abstract: "Efficient exploration is widely recognized as a fundamental challenge inherent in reinforcement learning. Algorithms that explore efficiently converge faster to near-optimal policies. While heuristic techniques are popular in practice, they lack formal guarantees and may not work well in g ... is used to unify most existing model-based PAC-MDP algorithms for various subclasses of Markov decision processes. We also compare the sample-complexity framework to alternatives for formalizing exploration efficiency such as regret minimization and Bayes-optimal solutions."