The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces (1995)

by Andrew W Moore, Christopher G Atkeson
Venue: Machine Learning

Results 11 - 20 of 155

Tree-based batch mode reinforcement learning

by Damien Ernst, Pierre Geurts, Louis Wehenkel - Journal of Machine Learning Research, 2005
Cited by 93 (22 self)
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x_t, u_t, r_t, x_{t+1}), where x_t denotes the system state at time t, u_t the control action taken, r_t the instantaneous reward obtained and x_{t+1} the successor state of the system, and by determining the control policy from this Q-function. The Q-function approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree, tree bagging) and two newly proposed ensemble algorithms, namely extremely and totally randomized trees. We study their performance on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of four-tuples. In particular, the totally randomized trees give good results while ensuring the convergence of the sequence, whereas by relaxing the convergence constraint even better accuracy results are provided by the extremely randomized trees.
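The batch-mode scheme described in this abstract (a sequence of supervised regression problems built from four-tuples) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the piecewise-constant regressor stands in for the tree-based methods, and the toy one-dimensional system, the names `fit_regressor`, `fitted_q_iteration`, `N_BINS`, `ACTIONS`, and all parameter values are assumptions made for the example.

```python
from collections import defaultdict

N_BINS = 10          # hypothetical resolution of the stand-in regressor
ACTIONS = (-1, +1)   # hypothetical discrete action set


def bin_index(x):
    """Map a state in [0, 1] to one of N_BINS equal-width cells."""
    return max(0, min(int(x * N_BINS), N_BINS - 1))


def fit_regressor(inputs, targets):
    """Piecewise-constant regressor: mean target per (cell, action) pair.

    Assumption: this replaces the tree-based learners named in the abstract.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for (x, u), y in zip(inputs, targets):
        key = (bin_index(x), u)
        sums[key] += y
        counts[key] += 1

    def predict(x, u):
        key = (bin_index(x), u)
        return sums[key] / counts[key] if key in counts else 0.0

    return predict


def fitted_q_iteration(four_tuples, gamma=0.9, n_iterations=50):
    """Approximate Q by repeatedly regressing on bootstrapped targets."""
    inputs = [(x, u) for x, u, _, _ in four_tuples]
    q = lambda x, u: 0.0  # Q_0 is identically zero
    for _ in range(n_iterations):
        # Targets use the previous Q estimate: r + gamma * max_a Q(x', a).
        targets = [r + gamma * max(q(x_next, a) for a in ACTIONS)
                   for _, _, r, x_next in four_tuples]
        q = fit_regressor(inputs, targets)
    return q


# Toy system (assumption): move left/right on [0, 1]; reward near the right end.
four_tuples = []
for i in range(N_BINS):
    x = (i + 0.5) / N_BINS
    for u in ACTIONS:
        x_next = max(0.0, min(1.0, x + 0.1 * u))
        r = 1.0 if x_next > 0.9 else 0.0
        four_tuples.append((x, u, r, x_next))

q = fitted_q_iteration(four_tuples)


def greedy(x):
    """Control policy derived from the approximated Q-function."""
    return max(ACTIONS, key=lambda a: q(x, a))
```

On this toy system the derived policy should move toward the rewarded end of the interval from every sampled state; swapping `fit_regressor` for an actual tree ensemble would bring the sketch closer to the setting the abstract describes.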

Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State

by R. Andrew McCallum - In Proceedings of the Twelfth International Conference on Machine Learning, 1995
Cited by 84 (1 self)
We present Utile Suffix Memory, a reinforcement learning algorithm that uses short-term memory to overcome the state aliasing that results from hidden state. By combining the advantages of previous work in instance-based (or "memory-based") learning and previous work with statistical tests for separating noise from task structure, the method learns quickly, creates only as much memory as needed for the task at hand, and handles noise well. Utile Suffix Memory uses a tree-structured representation, and is related to work on Prediction Suffix Trees [Ron et al., 1994], Parti-game [Moore, 1993], G-algorithm [Chapman and Kaelbling, 1991], and Variable Resolution Dynamic Programming [Moore, 1991]. 1 INTRODUCTION The sensory systems of embedded agents are inherently limited. When a reinforcement learning agent's sensory limitations hide features of the environment from the agent, we say that the agent suffers from hidden state. There are many reasons why important features can be hidden...

Reinforcement Learning In Continuous Time and Space

by Kenji Doya - Neural Computation, 2000
Cited by 83 (4 self)
This paper presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and for improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods, namely, a continuous actor-critic method and a value-gradient based greedy policy, are formulated. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived....
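For orientation, the continuous-time TD error that this abstract refers to is commonly written as below. This is a sketch of the usual form for an infinite-horizon, exponentially discounted value function; the symbols (value estimate V, reward r, discount time constant τ, TD error δ) are not defined in the abstract itself and are introduced here as assumptions.

```latex
V(x(t)) = \int_{t}^{\infty} e^{-\frac{s-t}{\tau}} \, r\bigl(x(s), u(s)\bigr) \, ds,
\qquad
\delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)
```

Differentiating the integral with respect to t gives the consistency condition \dot{V}(t) = V(t)/τ - r(t), so δ(t) vanishes when the value estimate is exact; value estimation then amounts to driving δ(t) toward zero.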

Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces

by Juan Carlos Santamaría, Richard S. Sutton, Ashwin Ram, 1996
Cited by 77 (6 self)
A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state and it is important because an agent can use it to decide what to do next. A common problem in reinforcement learning when applied to systems having continuous states and action spaces is that the value function must operate with a domain consisting of real-valued variables, which means that it should be able to represent the value of infinitely many state and action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this paper, we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded funct...

Learning Maps for Indoor Mobile Robot Navigation

by Sebastian Thrun - Artificial Intelligence (accepted for publication), 1997
Cited by 75 (11 self)
Autonomous robots must be able to learn and maintain models of their environments. Research on mobile robot navigation has produced two major paradigms for mapping indoor environments: grid-based and topological. While grid-based methods produce accurate metric maps, their complexity often prohibits efficient planning and problem solving in large-scale indoor environments. Topological maps, on the other hand, can be used much more efficiently, yet accurate and consistent topological maps are often difficult to learn and maintain in large-scale environments, particularly if momentary sensor data is highly ambiguous. This paper describes an approach that integrates both paradigms: grid-based and topological. Grid-based maps are learned using artificial neural networks and naive Bayesian integration. Topological maps are generated on top of the grid-based maps, by partitioning the latter into coherent regions. By combining both paradigms, the approach presented here gains advantages from both worlds: accuracy/consistency and efficiency. The paper gives results for autonomous exploration, mapping and operation of a mobile robot in populated multi-room environments.

Planning, learning and coordination in multiagent decision processes

by Craig Boutilier - In Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK96), 1996
Cited by 72 (1 self)
There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.

Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks

by Andrew Kachites McCallum - From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, 1996
Cited by 70 (1 self)
This paper presents U-Tree, a reinforcement learning algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces and hidden state. By combining the advantages of work in instance-based (or "memory-based") learning and work with robust statistical tests for separating noise from task structure, the method learns quickly, creates only task-relevant state distinctions, and handles noise well. U-Tree uses a tree-structured representation, and is related to work on Prediction Suffix Trees [Ron et al., 1994], Parti-game [Moore, 1993], G-algorithm [Chapman and Kaelbling, 1991], and Variable Resolution Dynamic Programming [Moore, 1991]. It builds on Utile Suffix Memory [McCallum, 1995c], which only used short-term memory, not selective perception. The algorithm is demonstrated solving a highway driving task in which the agent weaves around slower and faster traffic. The agent uses active perception with ...

Approximate Solutions to Markov Decision Processes

by Geoffrey J. Gordon, 1999
Cited by 62 (9 self)
One of the basic problems of machine learning is deciding how to act in an uncertain world. For example, if I want my robot to bring me a cup of coffee, it must be able to compute the correct sequence of electrical impulses to send to its motors to navigate from the coffee pot to my office. In fact, since the results of its actions are not completely predictable, it is not enough just to compute the correct sequence; instead the robot must sense and correct for deviations from its intended path. In order for any machine learner to act reasonably in an uncertain environment, it must solve problems like the above one quickly and reliably. Unfortunately, the world is often so complicated that it is difficult or impossible to find the optimal sequence of actions to achieve a given goal. So, in order to scale our learners up to real-world problems, we usually must settle for approximate solutions. One representation for a learner's environment and goals is a Markov decision process or MDP. ...

Abstraction and Approximate Decision Theoretic Planning

by Richard Dearden, Craig Boutilier, 1997
Cited by 60 (14 self)
Markov decision processes (MDPs) have recently been proposed as useful conceptual models for understanding decision-theoretic planning. However, the utility of the associated computational methods remains open to question: most algorithms for computing optimal policies require explicit enumeration of the state space of the planning problem. We propose an abstraction technique for MDPs that allows approximately optimal solutions to be computed quickly. Abstractions are generated automatically, using an intensional representation of the planning problem (probabilistic STRIPS rules) to determine the most relevant problem features and optimally solving a reduced problem based on these relevant features. The key features of our method are: abstractions can ...

Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems

by Rémi Munos, Andrew Moore - In IJCAI, 1999
Cited by 55 (6 self)
State abstraction is of central importance in reinforcement learning and Markov Decision Processes. This paper studies the case of variable resolution state abstraction for continuous-state, deterministic dynamic control problems in which near-optimal policies are required. We describe variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-tree. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. We begin with local approaches based on value function properties and policy properties that use only features of individual cells in making splitting choices. Later, by introducing two new non-local measures, influence and variance, we derive a splitting criterion that allows one cell to efficiently take into account its impact on other cells when deciding whether to split. We evaluate the performance of a variety of splitting criteria on many benchmark problems (published on the web)...