Acting Optimally in Partially Observable Stochastic Domains
1994
Cited by 243 (16 self)
In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the pomdp approach.
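As a rough illustration of what "given a complete model of the environment" makes possible, the sketch below shows the standard Bayesian belief-state update used in the POMDP framework. It is not the new algorithm the abstract refers to. The array layout (`T[a][s][s_next]`, `O[a][s_next][o]`) and the two-state example numbers are assumptions made for this example only.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update (a sketch, not the paper's algorithm).

    belief[s]          : current probability of being in state s
    T[a][s][s_next]    : assumed transition model P(s_next | s, a)
    O[a][s_next][o]    : assumed observation model P(o | s_next, a)
    Returns b'(s_next) proportional to O * sum_s T * b(s).
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]


# Hypothetical two-state problem with one action that leaves the state
# unchanged and a sensor that reports the true state 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
b2 = belief_update(b1, action=0, observation=0, T=T, O=O)
```

Each repeated observation of the same kind concentrates the belief further on the matching state, which is why a policy defined over beliefs can act sensibly even though the true state is never observed directly.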
Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
In Proceedings of the Tenth International Conference on Machine Learning, 1993
Cited by 220 (0 self)
Intelligent human agents exist in a cooperative social environment that facilitates learning. They learn not only by trial-and-error, but also through cooperation by sharing instantaneous information, episodic experience, and learned knowledge. The key investigations of this paper are, "Given the same number of reinforcement learning agents, will cooperative agents outperform independent agents who do not communicate during learning?" and "What is the price for such cooperation?" Using independent agents as a benchmark, cooperative agents are studied in the following ways: (1) sharing sensation, (2) sharing episodes, and (3) sharing learned policies. This paper shows that (a) additional sensation from another agent is beneficial if it can be used efficiently, (b) sharing learned policies or episodes among agents speeds up learning at the cost of communication, and (c) for joint tasks, agents engaging in partnership can significantly outperform independent agents although they may learn slo...
Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
In Proceedings of the Tenth National Conference on Artificial Intelligence, 1992
Cited by 173 (0 self)
It is known that perceptual aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [Whitehead and Ballard, 1991]. Perceptual aliasing occurs when multiple situations that are indistinguishable from immediate perceptual input require different responses from the system. For example, if a robot can only see forward, yet the presence of a battery charger behind it determines whether or not it should back up, immediate perception alone is insufficient for determining the most appropriate action. This is problematic since reinforcement algorithms typically learn a control policy from immediate perceptual input to the optimal choice of action. This paper introduces the predictive distinctions approach to compensate for perceptual aliasing caused by incomplete perception of the world. An additional component, a predictive model, is utilized to track aspects of the world that may not be visible at all times. In addition to the control policy, the model mus...
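The robot-and-charger example can be made concrete with a toy enumeration. The sketch below is only an illustration of perceptual aliasing, not the predictive distinctions approach itself. The state names, the observation label, and the single remembered bit (standing in for a tracked aspect of the world) are all hypothetical.

```python
from itertools import product

# Two situations that look identical to a forward-facing sensor
# but require different actions (hypothetical labels).
STATES = ["charger_behind", "nothing_behind"]
ACTIONS = ["back_up", "go_forward"]
OBSERVE = {"charger_behind": "clear_ahead", "nothing_behind": "clear_ahead"}
CORRECT = {"charger_behind": "back_up", "nothing_behind": "go_forward"}

# Assumed extra information: whether the robot remembers passing a charger.
MEMORY = {"charger_behind": True, "nothing_behind": False}


def score(policy, key):
    """Count the states in which the policy picks the correct action.

    `key` maps a state to whatever input the policy is allowed to see.
    """
    return sum(policy[key(s)] == CORRECT[s] for s in STATES)


def best_score(inputs, key):
    """Try every deterministic policy over `inputs`; return the best score."""
    best = 0
    for choice in product(ACTIONS, repeat=len(inputs)):
        policy = dict(zip(inputs, choice))
        best = max(best, score(policy, key))
    return best


# A purely reactive policy sees only the immediate observation.
reactive_best = best_score(["clear_ahead"], lambda s: OBSERVE[s])

# A policy that also sees the remembered bit can separate the two states.
memory_inputs = [("clear_ahead", True), ("clear_ahead", False)]
memory_best = best_score(memory_inputs, lambda s: (OBSERVE[s], MEMORY[s]))
```

Under these assumptions no reactive policy is correct in both situations, while a policy with access to the remembered bit is.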
Overcoming Incomplete Perception with Utile Distinction Memory
In Proceedings of the Tenth International Conference on Machine Learning, 1993
Cited by 96 (6 self)
This paper presents a method by which a reinforcement learning agent can solve the incomplete perception problem using memory. The agent uses a hidden Markov model (HMM) to represent its internal state space and creates memory capacity by splitting states of the HMM. The key idea is a test to determine when and how a state should be split: the agent only splits a state when doing so will help the agent predict utility. Thus the agent can create only as much memory as needed to perform the task at hand, not as much as would be required to model all the perceivable world. I call the technique UDM, for Utile Distinction Memory.
Types of cost in inductive concept learning
In Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, 2000
Cited by 77 (0 self)
Inductive concept learning is the task of learning to assign cases to a discrete set of classes. In real-world applications of concept learning, there are many different types of cost involved. The majority of the machine learning literature ignores all types of cost (unless accuracy is interpreted as a type of cost measure). A few papers have investigated the cost of misclassification errors. Very few papers have examined the many other types of cost. In this paper, we attempt to create a taxonomy of the different types of cost that are involved in inductive concept learning. This taxonomy may help to organize the literature on cost-sensitive learning. We hope that it will inspire researchers to investigate all types of cost in inductive concept learning in more depth.
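Misclassification cost, the one type the abstract says has received some attention, can be illustrated with a minimum-expected-cost decision rule. The sketch below is a generic textbook rule, not something taken from the paper's taxonomy. The cost matrix and class probabilities are hypothetical.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : assumed cost of predicting class i when the truth is j
    Returns (best_class_index, list_of_expected_costs).
    """
    expected = [
        sum(cost[i][j] * probs[j] for j in range(len(probs)))
        for i in range(len(cost))
    ]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected


# Hypothetical two-class case: class 0 = "healthy", class 1 = "sick".
# Missing a sick case (predict 0, truth 1) is assumed to cost ten times
# as much as a false alarm (predict 1, truth 0).
cost = [[0.0, 10.0],
        [1.0, 0.0]]
probs = [0.8, 0.2]

best, expected = min_expected_cost_class(probs, cost)
```

With these numbers the rule predicts class 1 even though class 0 is more probable, which is the kind of behaviour an accuracy-only learner cannot produce.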
Learning with Incomplete Selective Perception
1993
Cited by 7 (4 self)
An agent with selective perception focuses its sensors on those parts of the environment that are relevant to the task at hand. Selective perception is an efficient method of gathering information from the world, but it presents problems for a learning agent when different actions are required in situations for which the selective perception system cannot produce distinguishing outputs. If this happens the agent is said to have incomplete perception, and the agent may be able to use internal state determined by past perceptions and actions in order to choose the correct action. I propose research on learning algorithms that use short-term memory to disambiguate the incomplete perception that arises with selective perception. I present the Utile Distinction Memory algorithm (UDM) that solves the incomplete perception problem using a partially observable Markov decision process to represent the agent's internal state space. A significant feature of the algorithm is that it will build an ...
Reinforcement Learning in Non-Markov Environments
Artificial Intelligence. Submitted, 1992
Cited by 4 (0 self)
Recently, techniques based on reinforcement learning (RL) have been used to build systems that learn to perform non-trivial sequential decision tasks. To date, most of this work has focussed on learning tasks that can be described as Markov decision processes (MDPs). While MDPs are useful for modeling a wide range of control problems, there are important problems that are inherently non-Markov. We refer to these as hidden state tasks since they arise when information relevant to identifying the state of the environment is hidden (or missing) from the agent's internal representation. Two important types of control problems that resist Markov modeling are those in which 1) the system has a high degree of control over the information collected by its sensors (e.g., as in active-vision), or 2) the system has a limited set of sensors that do not always provide adequate information about the current state of the environment. Not surprisingly, traditional RL algorithms, which are based primar...
Learning in a State of Confusion: Employing Active Perception and Reinforcement Learning in Partially Observable Worlds
2006
Cited by 1 (0 self)
- Add to MetaCart
In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately, these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead they aim to learn successful deterministic reactive policies such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents
Adaptive Behaviour Based Robotics using On-Board Genetic Programming
This thesis investigates the use of Genetic Programming (GP) to evolve controllers for an autonomous robot. GP is a type of Genetic Algorithm (GA) using the Darwinian idea of natural selection and genetic recombination, where the individuals are most often represented as a tree structure. The GP is used to evolve a population of possible solutions over many generations to solve problems. The most common approach used today to develop controllers for autonomous robots is to employ a GA to evolve an Artificial Neural Network (ANN). This approach is most often used in simulation only or in conjunction with online evolution, where simulation still covers the largest part of the process. The GP has been largely neglected in Behaviour Based Robotics (BBR). This is primarily due to the problem of speed, which is the biggest curse of any standard GP. The main contribution of this thesis is the approach of using a linear representation of the GP in online evolution, and to establish whether or not the GP is feasible in this situation. Since this is not a comparison with other methods, only a demonstration of the possibilities with GP, there is no need for testing the particular test cases with other methods. The work in this thesis builds upon the work by Wolfgang Banzhaf and Peter Nordin, and therefore a comparison with their work will be made.
Reinforcement Learning for Mixed Open-loop and Closed-loop Control
In Proceedings of the Ninth Neural Information Processing Systems Conference, 1996
Closed-loop control relies on sensory feedback that is usually assumed to be free. But if sensing incurs a cost, it may be cost-effective to take sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although we assume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory. The main result of the paper is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy.

