Results 1–10 of 88
Near-optimal reinforcement learning in polynomial time
Machine Learning, 1998
"... We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the m ..."
Cited by 304 (5 self)
R-max – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
2001
"... R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The mod ..."
Cited by 297 (10 self)
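The mechanism this abstract describes (the agent keeps a complete but possibly inaccurate model, and unknown state-action pairs are treated optimistically so that planning itself drives exploration) can be sketched as follows. This is a minimal illustrative sketch, not the paper's construction: the discounted formulation (the paper uses average reward), the constants `R_MAX`, `GAMMA`, `KNOWN_THRESHOLD`, and all function and variable names are assumptions made here.

```python
# Sketch of the R-max idea: unknown state-action pairs are modeled as
# jumping to a fictitious absorbing state that always pays R_MAX, so the
# planner is drawn toward under-explored parts of the MDP.
import numpy as np

R_MAX = 1.0          # assumed maximum one-step reward
GAMMA = 0.95         # discount factor (assumption; the paper uses average reward)
KNOWN_THRESHOLD = 5  # visits before a state-action pair counts as "known"

def rmax_values(n_states, counts, emp_T, emp_R, iters=200):
    """Value iteration on the optimistic R-max model.

    counts[s, a] -- visit counts per state-action pair
    emp_T[s, a]  -- empirical next-state distribution (length n_states + 1)
    emp_R[s, a]  -- empirical mean reward
    Index n_states is the fictitious absorbing 'R_MAX state'.
    """
    V = np.zeros(n_states + 1)
    for _ in range(iters):
        V_new = np.empty_like(V)
        # The fictitious state is absorbing and always yields R_MAX.
        V_new[n_states] = R_MAX + GAMMA * V[n_states]
        for s in range(n_states):
            q = []
            for a in range(counts.shape[1]):
                if counts[s, a] >= KNOWN_THRESHOLD:
                    # Known pair: back up through the empirical model.
                    q.append(emp_R[s, a] + GAMMA * emp_T[s, a] @ V)
                else:
                    # Unknown pair: optimistically jump to the R_MAX state.
                    q.append(R_MAX + GAMMA * V[n_states])
            V_new[s] = max(q)
        V = V_new
    return V
```

With no experience at all, every state's value approaches the optimistic ceiling R_MAX / (1 - GAMMA), which is exactly what makes an R-max agent try untested actions.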
PAC Associative Reinforcement Learning
1995
"... General algorithms for the reinforcement learning problem typically learn policies in the form of a table that directly maps the states of the environment into actions. When the state space is large these methods become impractical. One approach to increase efficiency is to restrict the class of policies by considering only policies that can be described using some fixed representation. This paper pursues this approach and analyzes the associative reinforcement learning problem in the PAC learning framework. As a representation, we use a general form of decision lists that can describe a wide ..."
Cited by 1 (0 self)
PAC Reinforcement Learning Bounds for RTDP and Rand-RTDP
"... Real-time Dynamic Programming (RTDP) is a popular algorithm for planning in a Markov Decision Process (MDP). It can also be viewed as a learning algorithm, where the agent improves the value function and policy while acting in an MDP. It has been empirically observed that an RTDP agent generally per ..."
Cited by 5 (3 self)
An analytic solution to discrete Bayesian reinforcement learning
In ICML, 2006
"... Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time ..."
Cited by 139 (8 self)
On PAC learning algorithms for rich Boolean function classes
2007
"... We give an overview of the fastest known algorithms for learning various expressive classes of Boolean functions in the Probably Approximately Correct (PAC) learning model. In addition to surveying previously known results, we use existing techniques to give the first known subexponential-time algo ..."
Cited by 4 (1 self)
Efficient Reinforcement Learning
In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, 1994
"... In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework. In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state and the learner is pr ..."
Cited by 35 (3 self)
Near-Bayesian exploration in polynomial time (full version). Available at http://ai.stanford.edu/~kolter
2009
"... We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intr ..."
Cited by 71 (0 self)
PAC Adaptive Control of Linear Systems
In Proceedings of the 10th Annual Conference on Computational Learning Theory, ACM, 1997
"... We consider a special case of reinforcement learning where the environment can be described by a linear system. The states of the environment and the actions the agent can perform are represented by real vectors and the system dynamics are given by a linear equation with a stochastic component. The pr ..."
Cited by 11 (0 self)
Learning partially observable deterministic action models
In Proc. Nineteenth International Joint Conference on Artificial Intelligence (IJCAI '05), 2005
"... We present exact algorithms for identifying deterministic actions' effects and preconditions in dynamic partially observable domains. They apply when one does not know the action model (the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenari ..."
Cited by 55 (2 self)