Results 1-10 of 41
Point-Based Value Iteration for Continuous POMDPs
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 65 (4 self)
We propose a novel approach to optimizing Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear and convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the PERSEUS algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend PERSEUS to deal with continuous action and observation sets by designing effective sampling approaches.
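To illustrate why Gaussian-mixture representations make such integrals tractable, here is a minimal sketch of one integral of this kind: the expected value of a mixture-valued function under a mixture belief. It is not the paper's implementation. It assumes one-dimensional states, components stored as (weight, mean, variance) tuples, and equally weighted particles; all function names are hypothetical. It relies on the standard identity that the integral over s of N(s; m1, v1) N(s; m2, v2) equals N(m1; m2, v1 + v2).

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_inner_product(belief, alpha):
    """Closed-form integral of belief(s) * alpha(s) over s.

    Both arguments are lists of (weight, mean, variance) components.
    Each pairwise term uses the Gaussian product identity, so no
    numerical integration is needed.
    """
    total = 0.0
    for w_b, m_b, v_b in belief:
        for w_a, m_a, v_a in alpha:
            total += w_b * w_a * gaussian_pdf(m_b, m_a, v_b + v_a)
    return float(total)

def mixture_inner_product_numeric(belief, alpha, lo=-20.0, hi=20.0, n=200001):
    """Brute-force Riemann-sum check of the same integral (for comparison only)."""
    s = np.linspace(lo, hi, n)
    b = sum(w * gaussian_pdf(s, m, v) for w, m, v in belief)
    a = sum(w * gaussian_pdf(s, m, v) for w, m, v in alpha)
    return float(np.sum(b * a) * (s[1] - s[0]))

def particle_inner_product(particles, alpha):
    """Expected alpha value under a belief given as equally weighted particles."""
    particles = np.asarray(particles, dtype=float)
    a = sum(w * gaussian_pdf(particles, m, v) for w, m, v in alpha)
    return float(np.mean(a))

# Illustrative (made-up) belief and value-function components.
belief = [(0.6, -1.0, 0.5), (0.4, 2.0, 1.0)]
alpha = [(1.5, 0.0, 2.0), (-0.5, 3.0, 0.3)]  # weights need not be positive

print(mixture_inner_product(belief, alpha))
print(mixture_inner_product_numeric(belief, alpha))
```

The closed-form version costs one density evaluation per pair of components, which is the sense in which such integrals stay cheap. The number of components can still grow across repeated backups, so a practical implementation would likely need some form of mixture reduction.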
Co-Evolving Recurrent Neurons Learn Deep Memory POMDPs
 PROCEEDINGS OF THE GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO)
, 2005
Cited by 22 (7 self)
Recurrent neural networks are theoretically capable of learning complex temporal sequences, but training them through gradient descent is too slow and unstable for practical use in reinforcement learning environments. Neuroevolution, the evolution of artificial neural networks using genetic algorithms, can potentially solve real-world reinforcement learning tasks that require deep use of memory, i.e., memory spanning hundreds or thousands of inputs, by searching the space of recurrent neural networks directly. In this paper, we introduce a new neuroevolution algorithm called Hierarchical Enforced SubPopulations that simultaneously evolves networks at two levels of granularity: full networks and network components or neurons. We demonstrate the method in two POMDP tasks that involve temporal dependencies of up to thousands of timesteps, and show that it is faster and simpler than the current best conventional reinforcement learning system on these tasks.
A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations
 IN PROCEEDINGS OF THE 2003 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS
, 2003
Cited by 19 (0 self)
It is difficult to apply traditional reinforcement learning algorithms to robots, due to problems with large and continuous domains, partial observability, and limited numbers of learning experiences. This paper deals with these problems by combining (1) reinforcement learning with memory, implemented using an LSTM recurrent neural network whose inputs are discrete events extracted from raw inputs, and (2) online exploration and offline policy learning. An experiment with a real robot demonstrates the methodology's feasibility.
Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets
, 2002
Cited by 16 (8 self)
The Long Short-Term Memory (LSTM) network trained by gradient descent solves difficult problems which traditional recurrent neural networks in general cannot. We have recently observed that the decoupled extended Kalman filter training algorithm allows for even better performance, significantly reducing the number of training steps compared to the original gradient descent training algorithm. In this paper we present a set of experiments which are unsolvable by classical recurrent networks but which are solved elegantly, robustly, and quickly by LSTM combined with Kalman filters.
Learning to Ground Fact Symbols in Behavior-Based Robots
, 2002
Cited by 14 (1 self)
A robot running a hybrid control system (its architecture comprising a deliberative and a reactive part) must permanently update its symbolic situation model to allow its ongoing deliberation to operate. Previous work has shown that this update can be improved by using, possibly among other sources, the robot's sensor information as filtered through recent activation value histories of robot behaviors. In that work, characteristic patterns in groups of behavior activation values are used to define chronicles, which allow true facts about the current situation to be hypothesized. Chronicle definitions are handcrafted as part of the domain modeling.
Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning
 AUTON AGENT MULTI-AGENT SYST
, 2009
Hierarchical Reinforcement Learning with Subpolicies Specializing for Learned Subgoals
Cited by 9 (0 self)
This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. An experiment shows that this method outperforms several flat reinforcement learning methods. A second experiment shows how problems of partial observability due to observation abstraction can be overcome using high-level policies with memory.
Key words: reinforcement learning, hierarchical reinforcement learning, feedforward neural networks, recurrent neural networks, MDPs, POMDPs, short-term memory
Sequential Constant Size Compressors and Reinforcement Learning
 In Proceedings of the Fourth Conference on Artificial General Intelligence
, 2011
Cited by 9 (6 self)
Traditional Reinforcement Learning methods are insufficient for AGIs, which must be able to learn to deal with Partially Observable Markov Decision Processes. We investigate a novel method for dealing with this problem: standard RL techniques using as input the hidden layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Memory, trained through standard backpropagation. Results illustrate the feasibility of this approach: the system learns to deal with high-dimensional visual observations (up to 640 pixels) in partially observable environments where there are long time lags (up to 12 steps) between relevant sensory information and necessary action.
Solving Deep Memory POMDPs with Recurrent Policy Gradients
 In International Conference on Artificial Neural Networks
, 2007
Cited by 9 (3 self)
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
Recurrent Policy Gradients
, 2009
Cited by 8 (1 self)
Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state estimators and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for dealing with policy learning using hidden state and require only a few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient, which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reduce estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environments.
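For orientation, the general family this abstract describes, a likelihood-ratio policy gradient with a history-dependent baseline, is commonly written as below. The notation is assumed here for illustration and is not taken from the paper: N sampled episodes of length T, h_t the observation history up to step t, R_t the return from step t onward, and b a baseline.

```latex
\nabla_\theta J(\theta) \;\approx\; \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T-1}
  \nabla_\theta \log \pi_\theta\!\left(a_t^{n} \mid h_t^{n}\right)
  \left( R_t^{n} - b\!\left(h_t^{n}\right) \right)
```

Because b depends only on the history and not on the sampled action, subtracting it leaves the estimate unbiased while potentially lowering its variance. With a recurrent policy, the log-probability gradients would be obtained by backpropagation through time.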