Results 1 - 10 of 24
Prioritized sweeping: Reinforcement learning with less data and less time
Machine Learning, 1993
Cited by 275 (5 self)
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real time problems with which other methods have difficulty.
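The idea of ordering dynamic programming backups by priority can be illustrated with a minimal sketch. It is a simplification and not the authors' exact procedure: it assumes a known Markov chain (prediction only, no actions), uses the Bellman error as the priority, and the names `prioritized_sweeping`, `theta`, and `max_backups` are hypothetical.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(transitions, rewards, gamma=0.9, theta=1e-6, max_backups=10_000):
    """Estimate state values of a Markov chain, backing up high-error states first.

    transitions[s] is a list of (probability, next_state) pairs.
    rewards[s] is the expected reward received in state s.
    States must be orderable (e.g. ints) so heap ties can be broken.
    """
    values = defaultdict(float)

    # predecessors[s2] holds every state that can move into s2.
    predecessors = defaultdict(set)
    for s, outcomes in transitions.items():
        for _, s2 in outcomes:
            predecessors[s2].add(s)

    def backup(s):
        # One-step Bellman backup for state s under the given model.
        future = sum(p * values[s2] for p, s2 in transitions.get(s, []))
        return rewards.get(s, 0.0) + gamma * future

    # Seed the queue with every state whose Bellman error exceeds theta.
    queue = []
    for s in transitions:
        error = abs(backup(s) - values[s])
        if error > theta:
            heapq.heappush(queue, (-error, s))  # negate: heapq is a min-heap

    backups = 0
    while queue and backups < max_backups:
        _, s = heapq.heappop(queue)
        values[s] = backup(s)
        backups += 1
        # A change in s may make its predecessors inaccurate: requeue them.
        for pred in predecessors[s]:
            error = abs(backup(pred) - values[pred])
            if error > theta:
                heapq.heappush(queue, (-error, pred))
    return dict(values)


# Toy chain 0 -> 1 -> 2, with a reward of 1 received in state 1.
chain = {0: [(1.0, 1)], 1: [(1.0, 2)], 2: []}
chain_values = prioritized_sweeping(chain, {1: 1.0})
```

In this toy chain only two backups are needed, because the queue visits state 1 first and then its predecessor, state 0, rather than sweeping all states repeatedly.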
Efficient Memory-based Learning for Robot Control
1990
Cited by 94 (1 self)
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors--an approach which is formalized here as the SAB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional non-linear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well-founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
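The recall step described above, predicting an action's effect from the closest remembered experience, might look roughly like the following sketch. The class `ExperienceMemory` and its methods are hypothetical names, and the linear scan is an assumption made for brevity; it does not provide the fast recall the abstract refers to, which would need a spatial index.

```python
import math


class ExperienceMemory:
    """Stores every (state, action, behaviour) triple and recalls the nearest one."""

    def __init__(self):
        self.experiences = []  # list of (state, action, behaviour) tuples

    def remember(self, state, action, behaviour):
        # Every experience is kept explicitly; nothing is summarised or discarded.
        self.experiences.append((tuple(state), tuple(action), tuple(behaviour)))

    def predict(self, state, action):
        """Return the behaviour stored with the closest (state, action) pair."""
        if not self.experiences:
            raise ValueError("no experiences stored yet")
        query = tuple(state) + tuple(action)
        # Euclidean distance over the concatenated state-action vector.
        nearest = min(self.experiences, key=lambda e: math.dist(e[0] + e[1], query))
        return nearest[2]


memory = ExperienceMemory()
memory.remember(state=(0.0, 0.0), action=(1.0,), behaviour=(1.0, 0.0))
memory.remember(state=(5.0, 5.0), action=(1.0,), behaviour=(6.0, 5.0))
predicted = memory.predict(state=(0.2, 0.1), action=(1.0,))
```

The query lies closest to the first stored experience, so the prediction is that experience's behaviour.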
Landmark-Based Robot Navigation
Algorithmica, 1992
Cited by 56 (3 self)
Achieving goals despite uncertainty in control and sensing may require robots to perform complicated motion planning and execution monitoring. This paper describes a reduced version of the general planning problem in the presence of uncertainty and a complete polynomial algorithm solving it. The planner computes a guaranteed plan (for given uncertainty bounds) by backchaining omnidirectional backprojections of the goal until one fully contains the set of possible initial positions of the robot. The algorithm assumes that landmarks are scattered across the workspace, that robot control and position sensing are perfect within the fields of influence of these landmarks (the regions in which the landmarks can be sensed by the robot), and that control is imperfect and sensing null outside these fields. The polynomiality and completeness of the algorithm derive from these simplifying assumptions, whose satisfaction may require the robot and/or its workspace to be specifically engineered. Thi...
Stochastic Plans for Robotic Manipulation
1990
Cited by 33 (7 self)
Geometric uncertainty is unavoidable when programming robots for physical applications. We propose a stochastic framework for manipulation planning where plans are ranked on the basis of expected cost. That is, we express the desirability of states and actions with a cost function and describe uncertainty with probability distributions. We illustrate the approach with a new design for a programmable parts feeder, a mechanism that orients two-dimensional parts using a sequence of open-loop mechanical motions. We present a planning algorithm that accepts an n-sided polygonal part as input and, in time O(n²), generates a stochastically optimal plan for orienting the part.
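Ranking plans by expected cost, as the abstract describes, reduces to a probability-weighted sum per plan. The sketch below is illustrative only: the plan names, probabilities, and costs are made up, and `expected_cost` and `rank_plans` are hypothetical helpers, not the paper's planning algorithm.

```python
def expected_cost(outcomes):
    """Expected cost of one plan, given (probability, cost) pairs for its outcomes."""
    return sum(p * c for p, c in outcomes)


def rank_plans(plans):
    """Return plan names ordered from lowest to highest expected cost."""
    return sorted(plans, key=lambda name: expected_cost(plans[name]))


# Hypothetical plans: each outcome is (probability, cost).
plans = {
    "push_then_squeeze": [(0.5, 10.0), (0.5, 0.0)],  # expected cost 5.0
    "squeeze_twice": [(0.9, 2.0), (0.1, 20.0)],      # expected cost 3.8
}
ranking = rank_plans(plans)
```

Here `squeeze_twice` ranks first even though its worst case is more expensive, because the ranking depends only on the expectation.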
Exploration bonuses and dual control
Machine Learning, 1996
Cited by 31 (1 self)
Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton’s DYNA system.
Beating a Defender in Robotic Soccer: Memory-Based Learning of a Continuous Function
1995
Cited by 21 (8 self)
Learning how to adjust to an opponent's position is critical to the success of having intelligent agents collaborating towards the achievement of specific tasks in unfriendly environments. This paper describes our work on developing methods to learn to choose an action based on a continuous-valued state attribute indicating the position of an opponent. We use a framework in which teams of agents compete in a simulator of a game of robotic soccer. We introduce a memory-based supervised learning strategy which enables an agent to choose to pass or shoot in the presence of a defender. In our memory model, training examples affect neighboring generalized learned instances with different weights. We conduct experiments in which the agent incrementally learns to approximate a function with a continuous domain. Then we investigate the question of how the agent performs in nondeterministic variations of the training situations. Our experiments indicate that when the random variations fall within some bound of the initial training, the agent performs better with some initial training than when starting from a tabula rasa.
Parallel Methods for Synthesizing Whole-Hand Grasps from Generalized Prototypes
1994
Cited by 15 (4 self)
Robotic hands are very flexible mechanisms. Because of this flexibility, it has been difficult to automate the process of acquiring objects using these robotic hands. Each new grasping problem is too complex to analyze without the use of good heuristics. Constraints due to target object geometry, environment geometry, hand kinematics and geometry, and a task description must all be considered when forming a solution. One tempting
Representing and Learning Routine Activities
1995
Cited by 11 (1 self)
A routine is a habitually repeated performance of some actions. Agents use routines to guide their everyday activities and to enrich their abstract concepts about acts. This dissertation addresses the question of how an agent who is engaged in ordinary, routine activities changes its behavior over time, how the agent's internal representations about the world are affected by its interactions, and what is a good agent architecture for learning routine interactions with the world. In it, I develop a theory that proposes several key processes: (1) automaticity, (2) habituation and skill refinement, (3) abstraction-by-chunking, and (4) discovery of new knowledge chunks. The process of automaticity caches the agent's knowledge about actions into a flat stimulus-response data structure that eliminates knowledge of action consequences. The stimulus-response data structure produces a response to environmental stimuli in constant time. The process of habituation and skill refinement uses environm...
Methods for deciding what to do next and learning
1992
Cited by 10 (6 self)
Recent years have seen intense analysis and questioning of the importance of the role of classical AI-planning in deciding moment-to-moment actions. This has led to development of several new AI-planning paradigms such as reactive planning [Fir87, GL87]. Informally, a planning paradigm
Learning to Manipulate Articulated Objects in Unstructured Environments Using a Grounded Relational Representation
Cited by 9 (1 self)
We introduce a learning-based approach to manipulation in unstructured environments. This approach permits autonomous acquisition of manipulation expertise from interactions with the environment. The resulting expertise enables a robot to perform effective manipulation based on partial state information. The manipulation expertise is represented in a relational state representation and learned using relational reinforcement learning. The relational representation renders learning tractable by collapsing a large number of states onto a single, relational state. The relational state representation is carefully grounded in the perceptual and interaction skills of the robot. This ensures that symbolically learned knowledge remains meaningful in the physical world. We experimentally validate the proposed learning approach on the task of manipulating an articulated object to obtain a model of its kinematic structure. Our experiments demonstrate that the manipulation expertise acquired by the robot leads to substantial performance improvements. These improvements are maintained when experience is applied to previously unseen objects.

