Probabilistic Robot Navigation in Partially Observable Environments
In Proceedings of IJCAI-95, 1995
Cited by 231 (9 self)
Autonomous mobile robots need very reliable navigation capabilities in order to operate unattended for long periods of time. This paper reports on first results of a research program that uses partially observable Markov models to robustly track a robot's location in office environments and to direct its goal-oriented actions. The approach explicitly maintains a probability distribution over the possible locations of the robot, taking into account various sources of uncertainty, including approximate knowledge of the environment, and actuator and sensor uncertainty. A novel feature of our approach is its integration of topological map information with approximate metric information. We demonstrate the robustness of this approach in controlling an actual indoor mobile robot navigating corridors.
1 Introduction
We are interested in the task of long-term autonomous navigation in an office environment (with corridors, foyers, and rooms). While the state of the art in autonomous office nav...
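The core bookkeeping described in this abstract, keeping a probability distribution over locations and revising it after each move and each sensor reading, is a discrete Bayes filter. A minimal Python sketch follows. It is an illustration under assumptions, not the paper's model: the four-cell ring corridor, the hypothetical `ring_forward_model` (advance one cell with probability `p_move`, otherwise stay) and the door-sensor likelihoods are invented for the example, and the paper's combination of topological and metric information is not modeled.

```python
from typing import List

Belief = List[float]
Matrix = List[List[float]]


def predict(belief: Belief, transition: Matrix) -> Belief:
    """Prediction step: push the belief through the action model.

    transition[i][j] is P(next location = j | current location = i, action).
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def correct(belief: Belief, likelihood: List[float]) -> Belief:
    """Correction step: weight by P(observation | location) and renormalize."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


def ring_forward_model(n: int, p_move: float) -> Matrix:
    """Hypothetical actuator model for a ring of n cells: advance one cell
    with probability p_move, otherwise stay put."""
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        t[i][(i + 1) % n] += p_move
        t[i][i] += 1.0 - p_move
    return t


# Start with no idea where the robot is, move forward once, then see a door
# that (by assumption) is only likely to be observed at cell 0.
belief = [0.25] * 4
belief = predict(belief, ring_forward_model(4, p_move=0.8))
belief = correct(belief, likelihood=[0.9, 0.1, 0.1, 0.1])
print([round(b, 3) for b in belief])  # [0.75, 0.083, 0.083, 0.083]
```

Because the filter never collapses the belief to a single cell, a later observation that contradicts the current best guess shifts probability mass rather than leaving the robot lost.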
Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models
Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems, 1998
Cited by 88 (7 self)
Autonomous mobile robots need very reliable navigation capabilities in order to operate unattended for long periods of time. We present a technique for achieving this goal that uses partially observable Markov decision process models (POMDPs) to explicitly model navigation uncertainty, including actuator and sensor uncertainty and approximate knowledge of the environment. This allows the robot to maintain a probability distribution over its current pose. Thus, while the robot rarely knows exactly where it is, it always has some belief as to what its true pose is, and is never completely lost. We present a navigation architecture based on POMDPs that provides a uniform framework with an established theoretical foundation for pose estimation, path planning, robot control during navigation, and learning. Our experiments show that this architecture indeed leads to robust corridor navigation for an actual indoor mobile robot.
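The abstract says planning and control run on top of the pose distribution but does not give the action-selection rule. One simple rule from the POMDP literature, usually called Q_MDP, weights each action's fully observable value by the belief: a = argmax_a Σ_s b(s) Q(s, a). The sketch below uses it purely as an illustration; it is not claimed to be Xavier's method, and the two-state Q-table is made up.

```python
from typing import List, Tuple


def qmdp_action(belief: List[float], q_values: List[List[float]]) -> Tuple[int, List[float]]:
    """Pick the action maximizing sum_s b(s) * Q(s, a).

    q_values[s][a] is an (assumed, precomputed) action value for the fully
    observable problem at state s.
    """
    n_actions = len(q_values[0])
    scores = [
        sum(b * q_values[s][a] for s, b in enumerate(belief))
        for a in range(n_actions)
    ]
    best = max(range(n_actions), key=lambda a: scores[a])
    return best, scores


def most_likely_state_action(belief: List[float], q_values: List[List[float]]) -> int:
    """Contrast: commit to the single most probable state and act greedily there."""
    s = max(range(len(belief)), key=lambda i: belief[i])
    return max(range(len(q_values[s])), key=lambda a: q_values[s][a])


belief = [0.6, 0.4]
q_values = [[1.0, 0.0],   # in state 0, action 0 is good
            [0.0, 2.0]]   # in state 1, action 1 is much better
action, scores = qmdp_action(belief, q_values)
print(action, scores)                              # 1 [0.6, 0.8]
print(most_likely_state_action(belief, q_values))  # 0
```

Here the belief-weighted rule picks action 1 even though state 0 is more likely, because action 1 pays off much more if the robot is actually in state 1. A known limitation of Q_MDP is that it assumes the state becomes fully known after one step, so it never chooses an action just to gather information.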
TD Models: Modeling the World at a Mixture of Time Scales
Proceedings of the Twelfth International Conference on Machine Learning, 1995
Cited by 55 (14 self)
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based reinforcement-learning architectures and dynamic programming methods in place of conventional Markov models. This enables planning at higher and varied levels of abstraction, and, as such, may prove useful in formulating methods for hierarchical or multi-level planning and reinforcement learning. In this paper we treat only the prediction problem: learning a model and value function for the case of fixed agent behavior. Within this context, we establish the theoretical foundations of multi-scale models and derive TD algorithms for learning them. Two small computational experim...
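The idea of using TD to predict states rather than rewards can be sketched in tabular form: for a fixed behavior, learn for each state s a vector M[s] whose j-th entry estimates the discounted number of future visits to state j, using the update M[s] += α(e_{s'} + γ M[s'] - M[s]). The discount γ sets the time scale, so running the same update with two values of γ yields two models of the same world at different temporal resolutions. This is a simplified, successor-representation-style illustration under assumptions (a deterministic three-state cycle, one-hot state indicators, step size 0.1), not the exact multi-scale formulation derived in the paper.

```python
from typing import Dict, List


def td_state_model_step(model: List[List[float]], s: int, s_next: int,
                        gamma: float, alpha: float) -> None:
    """One TD(0) update of row M[s] toward the target e_{s_next} + gamma * M[s_next].

    At the fixed point, M[s][j] = E[sum_k gamma^k * 1(s_{t+1+k} = j) | s_t = s].
    """
    for j in range(len(model)):
        target = (1.0 if j == s_next else 0.0) + gamma * model[s_next][j]
        model[s][j] += alpha * (target - model[s][j])


def learn_models(gammas: List[float], n_states: int = 3, steps: int = 30000,
                 alpha: float = 0.1) -> Dict[float, List[List[float]]]:
    """Learn one state-prediction model per time scale from the same experience."""
    models = {g: [[0.0] * n_states for _ in range(n_states)] for g in gammas}
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n_states  # fixed behavior: deterministic cycle 0 -> 1 -> 2 -> 0
        for g in gammas:
            td_state_model_step(models[g], s, s_next, g, alpha)
        s = s_next
    return models


models = learn_models([0.5, 0.9])
print([round(x, 3) for x in models[0.5][0]])  # [0.286, 1.143, 0.571]
print(round(sum(models[0.9][0]), 2))          # 10.0
```

Each row of M sums to 1/(1 - γ), here 2 for γ = 0.5 and 10 for γ = 0.9, so a larger γ spreads the prediction over a longer horizon.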
Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales
Journal of Artificial Intelligence Research, 1998
Cited by 51 (7 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options—whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions
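Treating an option as a single temporally extended action leads to the SMDP form of Q-learning, in the line of the Bradtke and Duff work the abstract builds on: after option o runs for k steps from state s, collecting rewards r_1, ..., r_k and ending in s', Q(s, o) is moved toward r_1 + γ r_2 + ... + γ^(k-1) r_k + γ^k max_o' Q(s', o'). The sketch below shows only this update; the state names, option names, and reward trace are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def smdp_q_update(q: QTable, s: str, option: str, rewards: Sequence[float],
                  s_next: str, options_at_next: List[str],
                  alpha: float, gamma: float) -> float:
    """Apply one SMDP Q-learning update after `option` ran len(rewards) steps.

    Returns the TD error. An empty `options_at_next` marks s_next as terminal.
    """
    k = len(rewards)
    discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))
    bootstrap = max((q[(s_next, o)] for o in options_at_next), default=0.0)
    target = discounted_return + gamma ** k * bootstrap  # discount by the option's duration
    td_error = target - q[(s, option)]
    q[(s, option)] += alpha * td_error
    return td_error


q: QTable = defaultdict(float)
# Hypothetical: the option "go-to-doorway" runs 3 primitive steps from "hall"
# and only the last step is rewarded.
smdp_q_update(q, "hall", "go-to-doorway", rewards=[0.0, 0.0, 1.0],
              s_next="doorway", options_at_next=["go-to-hall", "enter-room"],
              alpha=0.5, gamma=0.9)
print(round(q[("hall", "go-to-doorway")], 4))  # 0.405
```

The only difference from one-step Q-learning is that both the reward sum and the bootstrap term are discounted according to how long the option actually ran.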
From SAB90 to SAB94 : Four Years of Animat Research
1994
Cited by 33 (8 self)
This paper builds on a previous review of significant research on adaptive behavior in animats. It summarizes the current state of the art and suggests some directions likely to provide interesting results in the near future.
1 Introduction
An animat is a simulated animal or a real robot whose rules of behavior are inspired by those of animals. It is usually equipped with sensors, with actuators, and with a behavioral control architecture that allow it to react or to respond to variations in its environment (internal or external), notably to those that might impair its chances of survival. The behavior of an animat is what the animat does. This is characterized by a sequence of actions which reflects the dynamic interplay between the animat and its environment, mediated through the animat's sensors and actuators. The behavior of an animat is adaptive so long as it allows the animat to survive or to fulfill its mission. This requires that the animat's essential variables be monitored a...
Learning to Solve Multiple Goals
1997
Cited by 21 (0 self)
In many domains, the task can be decomposed into a set of independent subgoals. Often, such tasks are too complex to be learned using standard techniques such as Reinforcement Learning. The complexity is caused by the learning system having to keep track of the status of all sub-goals concurrently. Thus, if the solution to one sub-goal is known when another sub-goal is in some given state, the known solution must be relearned when the status of the other sub-goal changes. This dissertation presents a modular approach to reinforcement learning that takes advantage of task decomposition to avoid unnecessary relearning. In the modular approach, modules are created to learn each sub-goal. Each module receives only those inputs relevant to its associated sub-goal, and can therefore learn without being affected by the state of other sub-goals. Furthermore, each module searches a much smaller space than that defined by all inputs considered together, thereby greatly reducing learning time. Si...
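A minimal sketch of the modular idea, assuming tabular Q-learning modules: each module projects the full state onto only the variables relevant to its sub-goal, so what it learns does not depend on the status of the other sub-goals. The abstract does not say how the modules' preferences are combined into one action; summing their Q-values and acting greedily on the total is used here as an assumed arbitration rule, and the `food`/`avoid` modules and state variables are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class Module:
    """Tabular Q-learner that sees only the state variables named in input_keys."""

    def __init__(self, name: str, input_keys: Sequence[str], actions: Sequence[str],
                 alpha: float = 0.1, gamma: float = 0.9):
        self.name = name
        self.input_keys = tuple(input_keys)
        self.actions = tuple(actions)
        self.alpha = alpha
        self.gamma = gamma
        self.q: Dict[Tuple[Tuple[Hashable, ...], str], float] = defaultdict(float)

    def local_state(self, full_state: Dict[str, Hashable]) -> Tuple[Hashable, ...]:
        # Project the full state onto this module's inputs; other sub-goals are invisible.
        return tuple(full_state[k] for k in self.input_keys)

    def q_values(self, full_state: Dict[str, Hashable]) -> Dict[str, float]:
        s = self.local_state(full_state)
        return {a: self.q[(s, a)] for a in self.actions}

    def update(self, full_state, action: str, reward: float, next_full_state) -> None:
        # Standard one-step Q-learning on this module's own sub-goal reward.
        s, s2 = self.local_state(full_state), self.local_state(next_full_state)
        best_next = max(self.q[(s2, a)] for a in self.actions)
        self.q[(s, action)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, action)])


def select_action(modules: List[Module], full_state, actions: Sequence[str]) -> str:
    # Assumed arbitration: sum the modules' Q-values and act greedily on the total.
    totals = {a: sum(m.q_values(full_state)[a] for m in modules) for a in actions}
    return max(actions, key=lambda a: totals[a])


actions = ["left", "right"]
food = Module("food", ["food_dir"], actions, alpha=0.5)
avoid = Module("avoid", ["obstacle_dir"], actions)
food.update({"food_dir": "left", "obstacle_dir": "none"}, "left", 1.0,
            {"food_dir": "here", "obstacle_dir": "none"})
avoid.q[(("left",), "left")] = -2.0  # hand-set for illustration
state = {"food_dir": "left", "obstacle_dir": "left"}
print(food.q_values(state))                          # {'left': 0.5, 'right': 0.0}
print(select_action([food, avoid], state, actions))  # right
```

The `food` module's entry was learned while `obstacle_dir` was `"none"` and is reused unchanged when `obstacle_dir` becomes `"left"`; that reuse is the relearning the modular decomposition avoids.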
Natural Intelligence For Autonomous Agents
Halmstad University, 1994
Cited by 9 (5 self)
The paper presents a general architecture for behaviour-based control systems for autonomous agents. A number of architectural principles are proposed which make it possible to combine reactive control with learning and problem solving in a coherent way. In particular, I investigate the interaction between reinforcement learning, internal world models and dynamic action selection, as well as a number of connections to psychological models and biological systems.
Learning with Incomplete Selective Perception
1993
Cited by 7 (4 self)
An agent with selective perception focuses its sensors on those parts of the environment that are relevant to the task at hand. Selective perception is an efficient method of gathering information from the world, but it presents problems for a learning agent when different actions are required in situations for which the selective perception system cannot produce distinguishing outputs. If this happens the agent is said to have incomplete perception, and the agent may be able to use internal state determined by past perceptions and actions in order to choose the correct action. I propose research on learning algorithms that use short-term memory to disambiguate the incomplete perception that arises with selective perception. I present the Utile Distinction Memory algorithm (UDM) that solves the incomplete perception problem using a partially observable Markov decision process to represent the agent's internal state space. A significant feature of the algorithm is that it will build an ...
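The incomplete-perception problem can be made concrete with a toy aliased corridor (entirely hypothetical): two junctions produce the same percept ("grey"), the correct turn differs, and the percept seen just before each junction differs. A purely reactive table keyed on the current percept receives conflicting training examples, while a table keyed on the last two percepts does not. This only illustrates why short-term memory helps; it is not the UDM algorithm, which uses reward statistics to decide which memory distinctions are worth making.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

History = Tuple[str, ...]


def build_policy(experiences: List[Tuple[History, str]],
                 key_fn: Callable[[History], Hashable]) -> Tuple[Dict[Hashable, str], Set[Hashable]]:
    """Fill a lookup-table policy; record keys that received conflicting actions."""
    table: Dict[Hashable, str] = {}
    conflicts: Set[Hashable] = set()
    for history, action in experiences:
        key = key_fn(history)
        if key in table and table[key] != action:
            conflicts.add(key)
        table.setdefault(key, action)
    return table, conflicts


def reactive(h: History) -> Hashable:
    # Only the current percept: aliased junctions collapse to one key.
    return h[-1]


def one_step_memory(h: History) -> Hashable:
    # Current percept plus the previous one.
    return h[-2:]


experiences = [(("red", "grey"), "turn-left"), (("blue", "grey"), "turn-right")]

_, reactive_conflicts = build_policy(experiences, reactive)
memory_policy, memory_conflicts = build_policy(experiences, one_step_memory)
print(reactive_conflicts)               # {'grey'}
print(memory_conflicts)                 # set()
print(memory_policy[("blue", "grey")])  # turn-right
```

A fixed one-step window is the crudest form of memory; the appeal of learning the memory, as the abstract proposes, is that the agent only pays for history where it actually disambiguates the required action.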
Motivation and Attention in an Autonomous Agent
University of Birmingham, 1993
Cited by 1 (0 self)
Introduction
In an attempt to construct a neural-network-based architecture for autonomous agents, we have been investigating motivation and attention as natural parts of a cognitive system (Balkenius 1993a, b). The role of motivation was originally suggested to us by problems encountered in behaviour selection. For agents with a large set of interacting behaviours, interference between behaviours becomes a substantial problem. The only tractable solution seems to be to include a functionally central system for behaviour selection. This system is responsible for the activation and inhibition of the behaviours of the agent. We identify this system with the motivational system of an animal, where a central decision determines the motivational state, which in turn determines behaviour. The design of our motivational system suggests answers to the following questions: (1) What are the determinants of motivation? (2) In what way do motivational states interact with each other? (3) How are b
Q-error as a Selection Mechanism in Modular Reinforcement-Learning Systems
Cited by 1 (1 self)
This paper introduces a novel multi-modular method for reinforcement learning. A multi-modular system is one that partitions the learning task among a set of experts (modules), where each expert is incapable of solving the entire task by itself. There are many advantages to splitting up large tasks in this way, but existing methods face difficulties when choosing which module(s) should contribute to the agent’s actions at any particular moment. We introduce a novel selection mechanism where every module, besides calculating a set of action values, also estimates its own error for the current input. The selection mechanism combines each module’s estimate of long-term reward and self-error to produce a score by which the next module is chosen. As a result, the modules can use their resources effectively and efficiently divide up the task. The system is shown to learn complex tasks even when the individual modules use only linear function approximators.
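A sketch of the selection mechanism, assuming linear modules (as the abstract mentions) that each learn action values by semi-gradient Q-learning and, alongside them, a linear prediction of the magnitude of their own TD error. The abstract says the reward estimate and the self-error estimate are combined into a score but not how; the score max_a Q(x, a) - λ·ê(x) used here, with a hypothetical weight λ (`lam`), is an assumption, as are all class and function names.

```python
from typing import List


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class LinearModule:
    """Linear action values plus a linear estimate of the module's own |TD error|."""

    def __init__(self, n_features: int, n_actions: int, alpha: float = 0.1, beta: float = 0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.err_w = [0.0] * n_features
        self.alpha = alpha  # step size for action values
        self.beta = beta    # step size for the self-error estimate

    def q(self, x: List[float]) -> List[float]:
        return [dot(wa, x) for wa in self.w]

    def predicted_error(self, x: List[float]) -> float:
        return dot(self.err_w, x)

    def score(self, x: List[float], lam: float) -> float:
        # Assumed combination: best action value minus lam times predicted self-error.
        return max(self.q(x)) - lam * self.predicted_error(x)

    def update(self, x: List[float], a: int, r: float, x_next: List[float], gamma: float) -> float:
        td = r + gamma * max(self.q(x_next)) - self.q(x)[a]
        err_residual = abs(td) - self.predicted_error(x)
        for i, xi in enumerate(x):
            self.w[a][i] += self.alpha * td * xi            # semi-gradient Q-learning
            self.err_w[i] += self.beta * err_residual * xi  # regress toward |td|
        return td


def select_module(modules: List[LinearModule], x: List[float], lam: float = 1.0) -> int:
    return max(range(len(modules)), key=lambda i: modules[i].score(x, lam))


x = [1.0]
optimistic = LinearModule(1, 2)
optimistic.w, optimistic.err_w = [[1.0], [0.5]], [0.8]  # high value, but unreliable here
reliable = LinearModule(1, 2)
reliable.w, reliable.err_w = [[0.6], [0.2]], [0.1]      # lower value, small predicted error
print(select_module([optimistic, reliable], x))  # 1
```

With `lam=0` the first module would win on its higher raw value; penalizing predicted error hands control to the module more likely to be right for this input. Terminal transitions are not handled in this sketch.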

