Decomposition Techniques for Planning in Stochastic Domains
 In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95)
, 1995
Cited by 110 (7 self)
This paper is concerned with modeling planning problems involving uncertainty as discrete-time, finite-state stochastic automata. Solving planning problems is reduced to computing policies for Markov decision processes. Classical methods for solving Markov decision processes cannot cope with the size of the state spaces for typical problems encountered in practice. As an alternative, we investigate methods that decompose global planning problems into a number of local problems, solve the local problems separately, and then combine the local solutions to generate a global solution. We present algorithms that decompose planning problems into smaller problems given an arbitrary partition of the state space. The local problems are interpreted as Markov decision processes and solutions to the local problems are interpreted as policies restricted to the subsets of the state space defined by the partition. One algorithm relies on constructing and solving an abstract version of the original de...
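As a point of reference for the "classical methods" the abstract mentions, the sketch below shows plain value iteration on a tiny explicit MDP. It is only an illustration under stated assumptions: the abstract does not name value iteration specifically, and the dictionary layout (`P[s][a]` mapping next states to probabilities, `R[s][a]` giving rewards), the discount `gamma`, and the two-state example are all hypothetical. Every sweep visits every state, which is why this style of solver becomes impractical as the state space grows.

```python
def q_value(P, R, V, s, a, gamma):
    """Expected discounted return of taking action a in state s, then following V."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Sweep over all states until the largest value change falls below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, R, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, R, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical two-state example: "B" pays reward 1 per step, "A" pays nothing.
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}},
}
R = {
    "A": {"stay": 0.0, "go": 0.0},
    "B": {"stay": 1.0},
}
V, policy = value_iteration(P, R, gamma=0.5)
```

With `gamma=0.5` the fixed point is `V["B"] = 1 / (1 - 0.5) = 2` and `V["A"] = 0.5 * 2 = 1`, so the greedy action in `"A"` is `"go"`.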
Model Minimization in Markov Decision Processes
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence
, 1997
Cited by 105 (7 self)
We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of being carried to each other block. We provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The resulting partition can be used to construct a reduced MDP which is minimal in a well-defined sense and can be used to solve the original MDP. Our algorithm is an adaptation of known automata minimization algorithms, and is designed to operate naturally on factored or implicit representations in which the full state space is never explicitly enumerated. We show that simple variations on this algorithm are equivalent or closely similar to several different recently published algorithms for finding optimal solutions to (partially ...
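The homogeneity condition described above can be illustrated with a small signature-based partition refinement over an explicitly enumerated MDP. This is a minimal sketch, not the paper's algorithm (which is designed for factored representations): the data layout (`P[s][a]` as a dictionary of next-state probabilities, `R[s]` as a per-state reward), the function names, and the choice to also split on reward are assumptions made for illustration.

```python
from collections import defaultdict


def block_prob(P, s, a, block):
    """Probability of moving from state s under action a into any state of block."""
    return sum(P[s][a].get(t, 0.0) for t in block)


def refine(P, R, actions, partition, ndigits=9):
    """Split blocks until all states in a block agree on reward (an assumption
    of this sketch) and on the probability of reaching every block, per action."""
    partition = [frozenset(b) for b in partition]
    while True:
        new_partition = []
        for block in partition:
            groups = defaultdict(set)
            for s in block:
                # States with identical signatures are indistinguishable
                # with respect to the current partition.
                signature = (
                    round(R[s], ndigits),
                    tuple(
                        round(block_prob(P, s, a, b), ndigits)
                        for a in actions
                        for b in partition
                    ),
                )
                groups[signature].add(s)
            new_partition.extend(frozenset(g) for g in groups.values())
        # Refinement only ever splits blocks, so an unchanged count means stable.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition


# Hypothetical four-state MDP with one action: states 0 and 1 behave identically.
P = {
    0: {"a": {2: 0.5, 3: 0.5}},
    1: {"a": {2: 0.5, 3: 0.5}},
    2: {"a": {2: 1.0}},
    3: {"a": {3: 1.0}},
}
R = {0: 0.0, 1: 0.0, 2: 1.0, 3: 0.0}
blocks = refine(P, R, ["a"], [{0, 1, 2, 3}])
```

Starting from the trivial one-block partition, the first pass separates state 2 by reward, the second separates state 3 by its transition behavior, and states 0 and 1 remain together in one block of the reduced model.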
Planning, learning and coordination in multiagent decision processes
 In Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK-96)
, 1996
Cited by 96 (1 self)
There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
, 1996
Cited by 92 (6 self)
A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state, and it is important because an agent can use it to decide what to do next. A common problem in reinforcement learning when applied to systems having continuous state and action spaces is that the value function must operate with a domain consisting of real-valued variables, which means that it should be able to represent the value of infinitely many state and action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this paper, we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded funct...
Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State
 In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
Cited by 91 (1 self)
We present Utile Suffix Memory, a reinforcement learning algorithm that uses short-term memory to overcome the state aliasing that results from hidden state. By combining the advantages of previous work in instance-based (or "memory-based") learning and previous work with statistical tests for separating noise from task structure, the method learns quickly, creates only as much memory as needed for the task at hand, and handles noise well. Utile Suffix Memory uses a tree-structured representation, and is related to work on Prediction Suffix Trees [Ron et al., 1994], Parti-game [Moore, 1993], G-algorithm [Chapman and Kaelbling, 1991], and Variable Resolution Dynamic Programming [Moore, 1991]. 1 INTRODUCTION The sensory systems of embedded agents are inherently limited. When a reinforcement learning agent's sensory limitations hide features of the environment from the agent, we say that the agent suffers from hidden state. There are many reasons why important features can be hidden...
Learning Maps for Indoor Mobile Robot Navigation
 Artificial Intelligence (accepted for publication)
, 1997
Cited by 83 (12 self)
Autonomous robots must be able to learn and maintain models of their environments. Research on mobile robot navigation has produced two major paradigms for mapping indoor environments: grid-based and topological. While grid-based methods produce accurate metric maps, their complexity often prohibits efficient planning and problem solving in large-scale indoor environments. Topological maps, on the other hand, can be used much more efficiently, yet accurate and consistent topological maps are often difficult to learn and maintain in large-scale environments, particularly if momentary sensor data is highly ambiguous. This paper describes an approach that integrates both paradigms: grid-based and topological. Grid-based maps are learned using artificial neural networks and naive Bayesian integration. Topological maps are generated on top of the grid-based maps, by partitioning the latter into coherent regions. By combining both paradigms, the approach presented here gains advantages from both worlds: accuracy/consistency and efficiency. The paper gives results for autonomous exploration, mapping and operation of a mobile robot in populated multi-room environments.
Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks
 From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior
, 1996
Cited by 71 (1 self)
This paper presents U-Tree, a reinforcement learning algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces and hidden state. By combining the advantages of work in instance-based (or "memory-based") learning and work with robust statistical tests for separating noise from task structure, the method learns quickly, creates only task-relevant state distinctions, and handles noise well. U-Tree uses a tree-structured representation, and is related to work on Prediction Suffix Trees [Ron et al., 1994], Parti-game [Moore, 1993], G-algorithm [Chapman and Kaelbling, 1991], and Variable Resolution Dynamic Programming [Moore, 1991]. It builds on Utile Suffix Memory [McCallum, 1995c], which only used short-term memory, not selective perception. The algorithm is demonstrated solving a highway driving task in which the agent weaves around slower and faster traffic. The agent uses active perception with ...
Abstraction and Approximate Decision Theoretic Planning
, 1997
Cited by 67 (15 self)
Richard Dearden and Craig Boutilier, Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada, V6T 1Z4. Email: dearden,cebly@cs.ubc.ca. Markov decision processes (MDPs) have recently been proposed as useful conceptual models for understanding decision-theoretic planning. However, the utility of the associated computational methods remains open to question: most algorithms for computing optimal policies require explicit enumeration of the state space of the planning problem. We propose an abstraction technique for MDPs that allows approximately optimal solutions to be computed quickly. Abstractions are generated automatically, using an intensional representation of the planning problem (probabilistic STRIPS rules) to determine the most relevant problem features and optimally solving a reduced problem based on these relevant features. The key features of our method are: abstractions can ...
Approximate Solutions to Markov Decision Processes
, 1999
Cited by 66 (9 self)
One of the basic problems of machine learning is deciding how to act in an uncertain world. For example, if I want my robot to bring me a cup of coffee, it must be able to compute the correct sequence of electrical impulses to send to its motors to navigate from the coffee pot to my office. In fact, since the results of its actions are not completely predictable, it is not enough just to compute the correct sequence; instead the robot must sense and correct for deviations from its intended path. In order for any machine learner to act reasonably in an uncertain environment, it must solve problems like the above one quickly and reliably. Unfortunately, the world is often so complicated that it is difficult or impossible to find the optimal sequence of actions to achieve a given goal. So, in order to scale our learners up to real-world problems, we usually must settle for approximate solutions. One representation for a learner's environment and goals is a Markov decision process or MDP. ...
Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems
 In IJCAI
, 1999
Cited by 62 (6 self)
State abstraction is of central importance in reinforcement learning and Markov Decision Processes. This paper studies the case of variable resolution state abstraction for continuous-state, deterministic dynamic control problems in which near-optimal policies are required. We describe variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-tree. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. We begin with local approaches based on value function properties and policy properties that use only features of individual cells in making splitting choices. Later, by introducing two new non-local measures, influence and variance, we derive a splitting criterion that allows one cell to efficiently take into account its impact on other cells when deciding whether to split. We evaluate the performance of a variety of splitting criteria on many benchmark problems (published on the web)...