Results 1–10 of 58
Recent advances in hierarchical reinforcement learning
2003
Abstract

Cited by 225 (25 self)
A preliminary unedited version of this paper was incorrectly published as part of Volume
Point-Based Value Iteration for Continuous POMDPs
Journal of Machine Learning Research
2006
Abstract

Cited by 65 (4 self)
We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the PERSEUS algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend PERSEUS to deal with continuous action and observation sets by designing effective sampling approaches.
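The point-based backup that PERSEUS repeats over a fixed set of sampled beliefs can be sketched for the discrete case the paper extends. The 2-state, 2-action, 2-observation model below is a hypothetical toy, not the paper's; a minimal sketch:

```python
import numpy as np

# Hypothetical 2-state / 2-action / 2-observation POMDP (illustrative only).
n_s, n_a, n_o = 2, 2, 2
T = np.array([[[0.9, 0.1], [0.1, 0.9]],    # T[a, s, s']: transition model
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.2, 0.8]],    # O[a, s', o]: observation model
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])     # R[a, s]: reward model
gamma = 0.95

def backup(belief, alphas):
    """Bellman backup at one belief point: return the alpha vector
    (linear value-function piece) that is maximal at this belief."""
    best_val, best_vec = -np.inf, None
    for a in range(n_a):
        vec = R[a].astype(float).copy()
        for o in range(n_o):
            # Project each existing alpha back through the model for (a, o)
            # and keep the one that scores highest at this belief.
            cands = [(T[a] * O[a][:, o]) @ g for g in alphas]
            vec = vec + gamma * max(cands, key=lambda c: belief @ c)
        if belief @ vec > best_val:
            best_val, best_vec = belief @ vec, vec
    return best_vec

# Repeated backups over a small set of sampled belief points.
beliefs = [np.array([0.5, 0.5]), np.array([0.9, 0.1]), np.array([0.1, 0.9])]
alphas = [np.zeros(n_s)]
for _ in range(30):
    alphas = [backup(b, alphas) for b in beliefs]

value = max(np.array([0.9, 0.1]) @ a for a in alphas)
```

Because each alpha vector is linear in the belief, the maximum over the set is piecewise-linear convex, matching the structure the abstract establishes; the continuous-state extension replaces these finite sums with closed-form integrals over Gaussian mixtures.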
Learning Hierarchical Partially Observable Markov Decision Process Models for Robot Navigation
2001
Abstract

Cited by 48 (9 self)
We propose and investigate a general framework for hierarchical modeling of partially observable environments, such as office buildings, using Hierarchical Hidden Markov Models (HHMMs). Our main goal is to explore hierarchical modeling as a basis for designing more efficient methods for model construction and usage. As a case study we focus on indoor robot navigation and show how this framework can be used to learn a hierarchy of models of the environment at different levels of spatial abstraction. We introduce the idea of model reuse, which can be used to combine already learned models into a larger model. We describe an extension of the HHMM model to include actions, which we call hierarchical POMDPs, and describe a modified hierarchical Baum-Welch algorithm to learn these models. We train different families of hierarchical models for a simulated and a real-world corridor environment and compare them with the standard "flat" representation of the same environment. We show that the hierarchical POMDP approach, combined with model reuse, allows learning hierarchical models that fit the data better and train faster than flat models.
Representing hierarchical POMDPs as DBNs for multiscale robot localization
2004
Abstract

Cited by 34 (2 self)
We explore the advantages of representing hierarchical partially observable Markov decision processes (HPOMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using HPOMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show that a DBN representation of HPOMDPs can train significantly faster than the original learning algorithm for HPOMDPs or the equivalent flat POMDP, and requires much less data. In addition, the DBN formulation can easily be extended to parameter tying and factoring of variables, which further reduces the time and sample complexity. This enables us to apply HPOMDP methods to much larger problems than previously possible.
Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
2003
Abstract

Cited by 36 (2 self)
Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms are the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs).
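The gradient step described above can be sketched with a REINFORCE-style Monte Carlo estimator for a memoryless softmax policy. The environment, reward, and all constants below are illustrative assumptions (and the sketch optimizes episodic total reward rather than the long-run average the abstract mentions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy memoryless-policy setup: 2 observations, 2 actions, softmax policy.
# The observation process and reward are illustrative assumptions only.
N_OBS, N_ACT = 2, 2

def policy(theta, obs):
    """Softmax action probabilities for one observation."""
    logits = theta[obs]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def rollout(theta, horizon=50):
    """Sample one episode; return per-step score vectors and rewards."""
    grads, rewards = [], []
    for _ in range(horizon):
        obs = int(rng.integers(N_OBS))
        p = policy(theta, obs)
        a = int(rng.choice(N_ACT, p=p))
        g = np.zeros_like(theta)       # d/dtheta of log pi(a | obs)
        g[obs] = -p
        g[obs, a] += 1.0
        grads.append(g)
        rewards.append(1.0 if a == obs else 0.0)  # toy reward signal
    return grads, rewards

def reinforce_gradient(theta, episodes=100):
    """Monte Carlo policy-gradient estimate with a mean-return baseline."""
    data = [rollout(theta) for _ in range(episodes)]
    returns = [sum(r) for _, r in data]
    baseline = np.mean(returns)        # reduces estimator variance
    grad = np.zeros_like(theta)
    for (grads, _), G in zip(data, returns):
        grad += (G - baseline) * sum(grads)
    return grad / episodes

theta = np.zeros((N_OBS, N_ACT))
for _ in range(20):                    # gradient ascent on the reward signal
    theta += 0.01 * reinforce_gradient(theta)
```

The mean-return baseline is one simple variance-reduction choice; the key property is that the update direction is an unbiased estimate of the gradient of expected return, requiring no model of the hidden state.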
Dissertation: Latent Variable and Predictive Models of Dynamical Systems.
1981
Abstract
To seek a postdoctoral or permanent research position with a focus on challenging projects in statistical machine learning and data mining. Research Summary: I am interested in modeling the behavior of time-evolving systems using latent-variable representations. My research focuses on circumventing drawbacks of traditional latent variable models: local minima and instability during optimization, difficulties in scaling to large state spaces or high-dimensional data, etc. I create novel learning and inference algorithms for existing models and devise new models with theoretically sound learning methods. The resulting models are useful for tasks such as prediction, recognition, classification, anomaly detection, inferring missing values, and decision-making under uncertainty. Examples of systems I have worked on include mobile robot sensors, laptop user modeling, author-keyword evolution in document corpora, biosurveillance for disease tracking, dynamic video textures, and human activity recognition over time using wearable sensors.
Rapid Concept Learning for Mobile Robots
Machine Learning 31(1–3):7–27. Also published in Autonomous Robots
1998
Abstract

Cited by 16 (2 self)
Concept learning in robotics is an extremely challenging problem. Sensory data is often high-dimensional, and noisy due to specularities and other irregularities. In this paper, we investigate two general strategies to speed up learning, based on spatial decomposition of the sensory representation and simultaneous learning of multiple classes using a shared structure. We study two concept learning scenarios. The first is a hallway navigation problem, where the robot has to induce feature detectors such as "opening" or "wall". The second task is recycling, where the robot has to learn to recognize objects such as a "trash can". We use a common underlying function approximator in both studies in the form of a feed-forward neural network with several hundred input units and multiple output units. Despite the many degrees of freedom afforded by such an approximator, we show that the two strategies provide sufficient bias to achieve rapid learning. We provide detailed experimental studies on an actual mobile robot called PAVLOV to illustrate the effectiveness of this approach.
Approximate Planning with Hierarchical Partially Observable Markov Decision Process Models for Robot Navigation
2002
Abstract

Cited by 19 (4 self)
We propose and investigate a planning framework based on the Hierarchical Partially Observable Markov Decision Process model (HPOMDP), and apply it to robot navigation. We show how this framework can be used to produce more robust plans than flat models such as Partially Observable Markov Decision Processes (POMDPs). In our approach the environment is modeled at different levels of resolution, where abstract states represent both spatial and temporal abstraction. We test our hierarchical POMDP approach using a large simulated and real navigation environment. The results show that the robot is more successful in navigating to goals starting with no positional knowledge (a uniform initial belief-state distribution) using the hierarchical POMDP framework than with the flat POMDP approach. In addition, the HPOMDP model allows the robot to efficiently model and navigate large-scale environments.
Approximate planning in POMDPs with macro-actions
In Advances in Neural Information Processing Systems 16 (NIPS)
2004
Abstract

Cited by 13 (0 self)
Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large-scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, learn POMDP policies faster, and gather information more efficiently.
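The combination described above can be sketched as Monte Carlo Q-value updates over belief grid points with macro-actions. The grid, the two named macro-actions, and the simulated dynamics below are hypothetical stand-ins for the paper's navigation domain:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative belief grid points and macro-actions (assumptions, not
# the paper's actual domain).
grid = [0, 1, 2]                      # indices of sampled belief points
macros = ["go-to-junction", "follow-corridor"]
Q = {(b, m): 0.0 for b in grid for m in macros}
alpha, gamma = 0.1, 0.95

def execute_macro(b, m):
    """Simulate a macro-action: several primitive steps, returning the
    next grid point and the discounted reward accumulated on the way."""
    steps = int(rng.integers(1, 4))
    per_step = 1.0 if m == "follow-corridor" else 0.5   # toy rewards
    reward = sum(gamma ** t * per_step for t in range(steps))
    return int(rng.integers(len(grid))), reward

def episode(eps=0.2, horizon=10):
    """One rollout with epsilon-greedy macro selection, followed by a
    Monte Carlo update of Q toward the observed returns."""
    b = int(rng.integers(len(grid)))
    history = []
    for _ in range(horizon):
        if rng.random() < eps:
            m = str(rng.choice(macros))
        else:
            m = max(macros, key=lambda mm: Q[(b, mm)])
        b2, r = execute_macro(b, m)
        history.append((b, m, r))
        b = b2
    G = 0.0                            # back returns up from episode end
    for b_t, m_t, r_t in reversed(history):
        G = r_t + gamma * G
        Q[(b_t, m_t)] += alpha * (G - Q[(b_t, m_t)])

for _ in range(500):
    episode()
```

Because each macro-action spans several primitive steps, the agent only ever needs Q-values at the sampled grid points it lands on, which is the sense in which temporal abstraction shrinks the relevant part of the belief space.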
Optimizing Production Manufacturing using Reinforcement Learning
 In Eleventh International FLAIRS Conference
1998
Abstract

Cited by 12 (1 self)
Many industrial processes involve making parts with ...