Results 1–10 of 161
Reinforcement learning: a survey
Journal of Artificial Intelligence Research, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1309 (22 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
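One of the delayed-reinforcement algorithms this survey covers is Q-learning; the sketch below pairs it with an epsilon-greedy rule to illustrate the exploration/exploitation trade-off the abstract mentions. The env interface and all constants are illustrative assumptions, not from the paper:

import random
from collections import defaultdict

# Minimal tabular Q-learning with an epsilon-greedy exploration rule.
# The `env` interface (reset() -> state, step(a) -> (state, reward, done))
# is an assumed convention for this sketch.
def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)]: estimated long-run return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore with probability epsilon, otherwise exploit.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the one-step lookahead target;
            # this is how delayed reward propagates back through the table.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q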
Planning and acting in partially observable stochastic domains
Artificial Intelligence, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract

Cited by 832 (30 self)
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to POMDPs, and some possibilities for finding approximate solutions.
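For readers new to the formalism, the sketch below shows the standard exact belief update that POMDP planning builds on (textbook machinery, not this paper's novel algorithm); the nested-dictionary layout of the model is an assumption for illustration:

# Exact POMDP belief update: b'(s') ∝ O(s', a, o) * sum_s T(s, a, s') * b(s).
# T[a][s][s2] and O[a][s2][o] are assumed nested dicts of the model.
def belief_update(b, a, o, T, O, states):
    b_new = {}
    for s2 in states:
        predicted = sum(T[a][s][s2] * b[s] for s in states)  # prediction step
        b_new[s2] = O[a][s2][o] * predicted                  # correction step
    norm = sum(b_new.values())
    if norm == 0.0:
        raise ValueError("observation impossible under the model")
    return {s2: p / norm for s2, p in b_new.items()}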
Robotic mapping: A survey
Exploring Artificial Intelligence in the New Millennium
"... This article provides a comprehensive introduction into the field of robotic mapping, with a focus on indoor mapping. It describes and compares various probabilistic techniques, as they are presently being applied to a vast array of mobile robot mapping problems. The history of robotic mapping is al ..."
Abstract

Cited by 289 (9 self)
This article provides a comprehensive introduction into the field of robotic mapping, with a focus on indoor mapping. It describes and compares various probabilistic techniques, as they are presently being applied to a vast array of mobile robot mapping problems. The history of robotic mapping is also described, along with an extensive list of open research problems.
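One representative probabilistic technique covered by surveys of this area is the occupancy-grid map; a minimal sketch of its per-cell log-odds Bayes update follows (the inverse sensor model supplying p_occ_given_z is assumed, and all names are illustrative):

import math

# Per-cell log-odds occupancy update: Bayes' rule becomes addition in
# log-odds space, which is numerically stable and cheap per measurement.
def update_cell(log_odds, p_occ_given_z):
    return log_odds + math.log(p_occ_given_z / (1.0 - p_occ_given_z))

def occupancy(log_odds):
    # Convert log-odds back to an occupancy probability.
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))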
Acting Optimally in Partially Observable Stochastic Domains
1994
"... In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or nearoptimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the oper ..."
Abstract

Cited by 275 (16 self)
In this paper, we describe the partially observable Markov decision process (POMDP) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The POMDP approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the POMDP approach.
Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
In Proceedings of the Tenth National Conference on Artificial Intelligence, 1992
"... It is known that Perceptual Aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [ Whitehead and Ballard, 1991 ] . Perceptual aliasing occurs when multiple situations that are indistinguishable from immediate perceptual input require different responses from the ..."
Abstract

Cited by 194 (0 self)
It is known that perceptual aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [Whitehead and Ballard, 1991]. Perceptual aliasing occurs when multiple situations that are indistinguishable from immediate perceptual input require different responses from the system. For example, if a robot can only see forward, yet the presence of a battery charger behind it determines whether or not it should back up, immediate perception alone is insufficient for determining the most appropriate action. This is problematic because reinforcement learning algorithms typically learn a control policy mapping immediate perceptual input to the optimal choice of action. This paper introduces the predictive distinctions approach to compensate for perceptual aliasing caused by incomplete perception of the world. An additional component, a predictive model, is utilized to track aspects of the world that may not be visible at all times. In addition to the control policy, the model mus...
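A toy sketch of the aliasing problem, using the battery-charger example above (all names hypothetical): a memoryless policy must commit to one action for the aliased observation, whereas a policy consulting a latent bit tracked by a predictive model can respond correctly in both situations:

# Two aliased situations: the robot sees "wall-ahead" in both, but the right
# action depends on whether a charger is behind it.
def memoryless_policy(obs):
    # Forced to commit to a single action for the aliased percept.
    return "advance"

def policy_with_memory(obs, memory):
    # `memory["charger_behind"]` is maintained from the action/observation
    # history by a predictive model, not read off the current percept.
    if obs == "wall-ahead" and memory["charger_behind"]:
        return "back-up"
    return "advance"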
Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1996
"... Discrete Bayesian models have been used to model uncertainty for mobilerobot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving ..."
Abstract

Cited by 183 (12 self)
Discrete Bayesian models have been used to model uncertainty for mobile-robot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process. Since solving for the optimal control policy is, in general, intractable, the paper goes on to explore a variety of heuristic control strategies. The control strategies are compared experimentally, both in simulation and in runs on a robot.

A robot that delivers items and performs errands in an office environment needs to be able to navigate robustly. It should be able to overcome errors in perception and action, at worst getting lost for some period of time, but then being able to recover by relocalizing itself and continuing with its task. The Bayesian framework is particularly appropriate for modeling the robot's belief about its location (or, more generally, the state of the world). It suppl...
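One simple heuristic strategy of the kind such comparisons consider (a sketch, not necessarily one of the paper's exact strategies) is to act as if the most likely state under the current belief were the true state, using a policy precomputed for the underlying MDP:

# `belief` maps states to probabilities; `mdp_policy` maps states to actions.
# Both names and the data layout are illustrative assumptions.
def most_likely_state_action(belief, mdp_policy):
    s_star = max(belief, key=belief.get)  # mode of the belief distribution
    return mdp_policy[s_star]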
Algorithms for Sequential Decision Making
1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract

Cited by 177 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
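For the finite-state, finite-action, long-run-reward setting described above, the canonical dynamic-programming solution is value iteration on the Bellman optimality equation; a minimal sketch follows (the data layout of T and R is an illustrative assumption):

# Value iteration on the Bellman optimality equation:
#   V(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s') ].
# T[s][a] is a list of (probability, next_state) pairs, R[s][a] an
# immediate reward.
def value_iteration(states, actions, T, R, gamma=0.95, tol=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once no state's value moved by more than tol
            return V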
Hidden Markov processes
IEEE Trans. Inform. Theory, 2002
"... Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finite ..."
Abstract

Cited by 173 (3 self)
An overview of statistical and information-theoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discrete-time finite-state homogeneous Markov chain observed through a discrete-time memoryless invariant channel. In recent years, the work of Baum and Petrie on finite-state, finite-alphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximum-likelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes, which generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms: Baum–Petrie algorithm, entropy ergodic theorems, finite-state channels, hidden Markov models, identifiability, Kalman filter, maximum-likelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality.
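A minimal sketch of the forward recursion for evaluating an HMP's likelihood, standard machinery in the Baum–Petrie setting the abstract refers to (variable names and the model layout are assumptions):

# Forward recursion: alpha_t(s) = P(o_1..o_t, state_t = s | model).
# `pi` is the initial distribution, `A[s][s2]` the transition matrix, and
# `B[s][o]` the emission/channel probabilities.
# (A practical version would rescale alpha each step to avoid underflow.)
def forward_likelihood(obs, states, pi, A, B):
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s2: B[s2][o] * sum(alpha[s] * A[s][s2] for s in states)
                 for s2 in states}
    return sum(alpha.values())  # P(entire observation sequence | model)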
Probabilistic Algorithms in Robotics
AI Magazine
"... This article describes a methodology for programming robots known as probabilistic robotics. The probabilistic paradigm pays tribute to the inherent uncertainty in robot perception, relying on explicit representations of uncertainty when determining what to do. This article surveys some of the progr ..."
Abstract

Cited by 165 (9 self)
This article describes a methodology for programming robots known as probabilistic robotics. The probabilistic paradigm pays tribute to the inherent uncertainty in robot perception, relying on explicit representations of uncertainty when determining what to do. This article surveys some of the progress in the field, using in-depth examples to illustrate some of the nuts and bolts of the basic approach. Our central conjecture is that the probabilistic approach to robotics scales better to complex real-world applications than approaches that ignore a robot’s uncertainty.
Planning Under Time Constraints in Stochastic Domains
Artificial Intelligence, 1993
"... We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future reward ..."
Abstract

Cited by 163 (19 self)
We provide a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains. Goals are encoded as reward functions, expressing the desirability of each world state; the planner must find a policy (mapping from states to actions) that maximizes future rewards. Standard goals of achievement, as well as goals of maintenance and prioritized combinations of goals, can be specified in this way. An optimal policy can be found using existing methods, but these methods require time at best polynomial in the number of states in the domain, where the number of states is exponential in the number of propositions (or state variables). By using information about the starting state, the reward function, and the transition probabilities of the domain, we restrict the planner's attention to a set of world states that are likely to be encountered in satisfying the goal. Using this restricted set of states, the planner can generate more or less complete ...
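A loose sketch of the restricted-attention idea described in the abstract: run value-iteration backups only over an envelope of states judged likely to be encountered, routing transitions that leave the envelope to an absorbing OUT state with a pessimistic fixed value. The OUT convention and all names are assumptions, not the paper's exact construction:

OUT = "__out__"  # absorbing stand-in for every state outside the envelope

# Value iteration restricted to the envelope; T[s][a] is a list of
# (probability, next_state) pairs and R[s][a] an immediate reward.
def envelope_value_iteration(envelope, actions, T, R, gamma=0.95,
                             out_value=0.0, sweeps=100):
    V = {s: 0.0 for s in envelope}
    V[OUT] = out_value  # pessimistic value for leaving the envelope
    for _ in range(sweeps):
        for s in envelope:
            V[s] = max(R[s][a]
                       + gamma * sum(p * V.get(s2, V[OUT])
                                     for p, s2 in T[s][a])
                       for a in actions)
    return V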