The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
- Machine Learning
, 1995
Cited by 203 (8 self)
Abstract. Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game-theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.
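A minimal sketch of the variable-resolution idea in the abstract above, assuming axis-aligned cells that are halved along their longest side. It illustrates only the tree partitioning and point lookup; it is not the parti-game algorithm itself, whose game-theoretic planning and splitting rule are not given in this listing. The names Cell, split, locate, and leaves are hypothetical.

```python
class Cell:
    """Axis-aligned box in state space; a leaf until it is split (hypothetical helper)."""

    def __init__(self, lows, highs):
        self.lows, self.highs = list(lows), list(highs)
        self.children = None  # (axis, midpoint, lower_child, upper_child) once split

    def split(self):
        """Halve this leaf along its longest axis (an assumed splitting rule)."""
        if self.children is not None:
            raise ValueError("cell is already split")
        widths = [h - l for l, h in zip(self.lows, self.highs)]
        axis = widths.index(max(widths))
        mid = (self.lows[axis] + self.highs[axis]) / 2.0
        lower_highs = list(self.highs)
        lower_highs[axis] = mid
        upper_lows = list(self.lows)
        upper_lows[axis] = mid
        self.children = (axis, mid,
                         Cell(self.lows, lower_highs),
                         Cell(upper_lows, self.highs))

    def locate(self, point):
        """Return the leaf cell that contains `point`."""
        node = self
        while node.children is not None:
            axis, mid, lower, upper = node.children
            node = lower if point[axis] < mid else upper
        return node

    def leaves(self):
        """List all leaf cells under this node."""
        if self.children is None:
            return [self]
        return self.children[2].leaves() + self.children[3].leaves()


# Refine only the region around one point of interest.
root = Cell([0, 0], [1, 1])
root.split()
root.locate([0.9, 0.1]).split()
```

After these two splits the partition has three leaves: one coarse cell covering the left half and two finer cells on the right, which is the sense in which resolution is concentrated where it is needed.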
Efficient Memory-based Learning for Robot Control
, 1990
Cited by 94 (1 self)
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors, an approach which is formalized here as the SAB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional non-linear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well-founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
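The recall step described in this abstract can be sketched as a nearest-neighbour lookup over stored (state, action, behaviour) triples. This is an illustrative sketch only: the class and method names are hypothetical, Euclidean distance is an assumption, and the linear scan stands in for whatever fast index the dissertation actually uses.

```python
import math


class ExperienceMemory:
    """Remembers every (state, action, behaviour) triple (hypothetical helper)."""

    def __init__(self):
        self.experiences = []  # list of (state, action, behaviour) tuples

    def remember(self, state, action, behaviour):
        self.experiences.append((tuple(state), tuple(action), tuple(behaviour)))

    def predict(self, state, action):
        """Predict the behaviour of a proposed action from the closest stored experience."""
        query = tuple(state) + tuple(action)
        nearest = min(self.experiences,
                      key=lambda e: math.dist(query, e[0] + e[1]))
        return nearest[2]

    def candidate_action(self, state, goal_behaviour):
        """Suggest the action of the experience closest in (state, behaviour)."""
        query = tuple(state) + tuple(goal_behaviour)
        nearest = min(self.experiences,
                      key=lambda e: math.dist(query, e[0] + e[2]))
        return nearest[1]


# Three made-up one-dimensional experiences.
mem = ExperienceMemory()
mem.remember([0.0], [1.0], [1.0])
mem.remember([0.0], [2.0], [2.0])
mem.remember([1.0], [1.0], [2.0])

predicted = mem.predict([0.1], [1.9])           # closest to the second experience
suggested = mem.candidate_action([0.0], [1.1])  # closest to the first experience
```

The same store answers both questions the abstract mentions: what a proposed action is likely to do, and which action to try for a given goal behaviour. Handling of noise and of the experiment/perform trade-off is outside this sketch.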
Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems
- In IJCAI
, 1999
Cited by 55 (6 self)
State abstraction is of central importance in reinforcement learning and Markov Decision Processes. This paper studies the case of variable resolution state abstraction for continuous-state, deterministic dynamic control problems in which near-optimal policies are required. We describe variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-tree. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. We begin with local approaches based on value function properties and policy properties that use only features of individual cells in making splitting choices. Later, by introducing two new non-local measures, influence and variance, we derive a splitting criterion that allows one cell to efficiently take into account its impact on other cells when deciding whether to split. We evaluate the performance of a variety of splitting criteria on many benchmark problems (published on the web)...
Reinforcement Learning And Its Application To Control
, 1992
Cited by 49 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Kinematic Models for Robot Compliant Motion Identification of Uncertainties
Cited by 14 (9 self)
This thesis is about force controlled compliant robot motion, with the emphasis on: 1) modelling of arbitrary and time-varying contact situations between a rigid manipulated object and rigid objects in its environment, 2) motion specification in terms of allowed velocities and accelerations for the manipulated object, maintaining the contact with the physical constraints but without generating too large contact forces, and 3) on-line identification of uncertainties in the instantaneous geometric parameters of the motion constraint model, i.e., the position of the contact points, the direction of the contact normal, and the local curvature parameters. Requirements for generality, simplicity and robustness have guided the research work.
Dynamically stable legged locomotion
- MIT Artificial Intelligence Laboratory
, 1989
Cited by 7 (0 self)
Abstract. This report documents our recent progress in exploring active balance for dynamic legged systems. The purpose of this research is to establish a foundation of knowledge that can lead both to the construction of useful legged vehicles and to a better understanding of legged locomotion as it exists in nature. We have made progress in five areas: • Balance in 3D can be achieved with a very simple control system. The control system has three separate parts, one that controls forward running velocity, one that controls body attitude, and one that controls hopping height. Experiments with a physical 3D machine that hops on just one leg show that it can hop in place, travel at a specified rate, follow simple paths, and maintain balance when disturbed. Top recorded running speed was 2.2 m/sec (4.8 mph). The 3D control algorithms are direct generalizations of those used earlier in 2D, with surprisingly little additional complication. • Computer simulations of a simple multi-legged system suggest that many of the concepts that are useful in understanding locomotion with one leg can be used to understand locomotion with several legs. A planar model with two legs trots and bounds with the same three part control
Toward Gesture-Based Programming: Agent-Based Haptic Skill Acquisition and Interpretation
, 1997
Cited by 3 (2 self)
Programming by human demonstration is a new paradigm for the development of robotic applications that focuses on the needs of task experts rather than programming experts. The traditional text-based programming paradigm demands the user be an expert in a particular programming language and further demands that the user can translate the task into this foreign language. This level of programming expertise generally precludes the user from having detailed task expertise because his/her time is devoted to the practice of programming, not the practice of the task. The goal of programming by demonstration is to eliminate both the programming language expertise and, more importantly, the expertise required to translate the task into the language. Gesture-Based Programming is a new form of programming by human demonstration that views the demonstration as a series of inexact "gestures" that convey the "intention" of the task strategy, not the details of the strategy itself. This is analogous...
Gesture-Based Programming, Part 2: Primordial Learning
- in Intelligent Engineering Systems through Artificial Neural Networks, Volume 6; Smart Engineering Systems: Neural Networks, Fuzzy Logic and Evolutionary Programming, ASME Press
, 1996
Cited by 2 (1 self)
In part one of this two-part series, we described our Gesture-Based Programming paradigm for programming by human demonstration. This paradigm depends on a pre-existing knowledge base of capabilities, collectively called "encapsulated expertise," that comprise the real-time sensorimotor primitives from which the run-time executable is constructed as well as providing the basis for interpreting the teacher's actions during programming. In this paper we present a technique based on principal components analysis, augmentable with model-based information, for learning and recognizing sensorimotor primitives. We describe simple applications of the technique to a mobile robot and a PUMA manipulator. The mobile robot learned to escape from jams while the manipulator learned guarded moves and accommodations that are composable to allow flat plate mating operations. While these initial applications are simple, they demonstrate the ability to extract primitives from demonstration, recognize ...
Learning the Peg-into-Hole Assembly Operation with a Connectionist Reinforcement Technique
- In Second International CIRP Workshop on Learning in Intelligent Manufacturing Systems
, 1995
Cited by 2 (1 self)
The paper presents a learning controller that is capable of increasing insertion speed during consecutive peg-into-hole operations, without increasing the contact force level. Our aim is to find a better relationship between measured forces and the controlled velocity, without using a complicated (human generated) model. We followed a connectionist approach. Two learning phases are distinguished. First the learning controller is trained (or initialised) in a supervised way by a suboptimal task frame controller. Then a reinforcement learning phase follows. The controller consists of two networks: (1) the policy network and (2) the exploration network. On-line robotic exploration plays a crucial role in obtaining a better policy. Optionally, this architecture can be extended with a third network: the reinforcement network. The learning controller is implemented on a CAD-based contact force simulator. In contrast with most other related work, the experiments are simulated in 3D with 6 degr...
Co-Learning in Differential Games
Cited by 2 (0 self)
Game playing has been a popular problem area for research in artificial intelligence and machine learning for many years. In almost every study of game playing and machine learning, the focus has been on games with a finite set of states and a finite set of actions. Further, most of this research has focused on a single player or team learning how to play against another player or team that is applying a fixed strategy for playing the game. In this paper, we explore multi-agent learning in the context of game playing and develop algorithms for "co-learning" in which all players attempt to learn their optimal strategies simultaneously. Specifically, we address two approaches to co-learning, demonstrating strong performance by a memory-based reinforcement learner and comparable but faster performance with a tree-based reinforcement learner.
Keywords: Markov games, differential games, pursuit games, multi-agent learning, reinforcement learning, Q-learning

