Learning to Search: Functional Gradient Techniques for Imitation Learning
- Autonomous Robots
, 2009
Cited by 26 (11 self)
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in lieu of a supervised learning approach often leads to myopic and poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explore learning these functions from expert human demonstration.
Learning to Search: Structured Prediction Techniques for Imitation Learning
, 2009
Cited by 4 (1 self)
Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as neural network controlled driving platforms (Pomerleau, 1989), learning-based “programming by demonstration” has gained currency as a method to achieve intelligent robot behavior. Unfortunately, with highly structured algorithms at their core, modern robotic systems are hard to train using classical learning techniques. Rather than redefining robot architectures to accommodate existing learning algorithms, this thesis develops learning techniques that leverage the performance of modern robotic components. We begin with a discussion of a novel imitation learning framework we call Maximum Margin Planning which automates finding a cost function for optimal planning and control algorithms such as A*. In the linear setting, this framework has firm theoretical backing in the form of strong generalization and regret bounds. Further, we have developed practical nonlinear generalizations that are effective and efficient for real-world problems. This framework reduces imitation learning
Efficient Optimization of Control Libraries
, 2011
Cited by 3 (2 self)
A popular approach to high-dimensional control problems in robotics uses a library of candidate “maneuvers” or “trajectories” [13, 28]. The library is either evaluated on a fixed number of candidate choices at runtime (e.g. path set selection for planning) or by iterating through a sequence of feasible choices until success is achieved (e.g. grasp selection). The performance of the library relies heavily on the content and order of the sequence of candidates. We propose a provably efficient method to optimize such libraries leveraging recent advances in optimizing sub-modular functions of sequences [29]. This approach is demonstrated on two important problems: mobile robot navigation and manipulator grasp set selection. In the first case, performance can be improved by choosing a subset of candidates which optimizes the metric under consideration (cost of traversal). In the second case, performance can be optimized by minimizing the depth the list is searched before a successful candidate is found. Our method can be used in both online and batch settings with provable performance guarantees, and can be run in an anytime manner to handle real-time constraints.
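The idea of ordering a library so that a successful candidate is found early can be illustrated with a simple batch greedy rule. The sketch below is not the method from the paper above; it is a generic greedy coverage heuristic, and the names `greedy_order`, `mean_search_depth`, and the toy `success` table are hypothetical. It assumes we already know, for a set of training environments, which candidates succeed in which environment.

```python
def greedy_order(success, budget):
    """Order candidates greedily by marginal coverage.

    success: dict mapping candidate name -> set of environment ids it solves
             (hypothetical training data, assumed to be known in advance).
    budget:  maximum length of the returned list.
    """
    remaining = dict(success)
    covered = set()
    order = []
    while remaining and len(order) < budget:
        # Pick the candidate that solves the most still-unsolved environments.
        best = max(remaining, key=lambda c: len(remaining[c] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds anything new
        order.append(best)
        covered |= remaining.pop(best)
    return order


def mean_search_depth(order, success, environments):
    """Average 1-based position of the first successful candidate.

    Environments that no listed candidate solves count as len(order) + 1.
    """
    total = 0
    for env in environments:
        depth = len(order) + 1
        for position, candidate in enumerate(order, start=1):
            if env in success[candidate]:
                depth = position
                break
        total += depth
    return total / len(environments)


# Toy data: four candidates, seven training environments.
success = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
environments = range(1, 8)
order = greedy_order(success, budget=3)
```

On this toy table the greedy rule places the candidate with the widest coverage first, which gives a lower mean search depth than the reverse order of the same two candidates.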
Constrained Manipulation Planning
, 2011
Cited by 1 (1 self)
Every planning problem in robotics involves constraints. Whether the robot must avoid collision or joint limits, there are always states that are not permissible. Some constraints are straightforward to satisfy while others can be so stringent that feasible states are very difficult to find. What makes planning with constraints challenging is that, for many constraints, it is impossible or impractical to provide the planning algorithm with the allowed states explicitly; it must discover these states as it plans. The goal of this thesis is to develop a framework for representing and exploring feasible states in the context of manipulation planning. Planning for manipulation gives rise to a rich variety of tasks that include constraints on collision avoidance, torque, balance, closed-chain kinematics, and end-effector pose. While many researchers have developed representations and strategies to plan with a specific constraint, the goal of this thesis is to develop a broad representation of constraints on a robot’s configuration and identify general strategies to manage these constraints during the planning process. Some of the most important constraints in manipulation planning are functions of the pose of the manipulator’s end-effector, so we devote a large part of this thesis to end-effector placement for grasping and transport tasks. We present an efficient approach to generating paths that uses Task Space Regions (TSRs) to specify manipulation
Contextual Sequence Prediction with Application to Control Library Optimization
Cited by 1 (0 self)
Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each “slot” in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: manipulator trajectory prediction and mobile robot path planning.
Online Learning of Uneven Terrain for Humanoid Bipedal Walking
- Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10)
We present a novel method to control a biped humanoid robot to walk on unknown inclined terrains, using an online learning algorithm to estimate in real-time the local terrain from proprioceptive and inertial sensors. Compliant controllers for the ankle joints are used to actively probe the surrounding surface, and the measured sensor data are combined to explicitly learn the global inclination and local disturbances of the terrain. These estimates are then used to adaptively modify the robot locomotion and control parameters. Results from both a physically-realistic computer simulation and experiments on a commercially available small humanoid robot show that our method can rapidly adapt to changing surface conditions to ensure stable walking on uneven surfaces.
Batch, Off-policy and Model-free Apprenticeship Learning
This paper addresses the problem of apprenticeship learning, that is, learning control policies from demonstration by an expert. An efficient framework for it is inverse reinforcement learning (IRL). Based on the assumption that the expert maximizes a utility function, IRL aims at learning the underlying reward from example trajectories. Many IRL algorithms assume that the reward function is linearly parameterized and rely on the computation of some associated feature expectations, which is done through Monte Carlo simulation. However, this requires full trajectories for the expert policy as well as at least a generative model for intermediate policies. In this paper, we introduce a temporal difference method, namely LSTD-µ, to compute these feature expectations. This allows extending apprenticeship learning to a batch and off-policy setting.
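For readers unfamiliar with the term, the Monte Carlo baseline mentioned in the abstract above can be sketched as follows. This is not LSTD-µ; it is the plain empirical estimate of the feature expectation, the expected discounted sum of feature vectors along a trajectory, averaged over complete sampled trajectories. The function names and the toy trajectories are hypothetical, and each trajectory is assumed to be given as a list of per-step feature vectors.

```python
def discounted_feature_sum(trajectory, gamma):
    """Sum of gamma**t * features_t over one complete trajectory."""
    total = [0.0] * len(trajectory[0])
    discount = 1.0
    for features in trajectory:
        for i, value in enumerate(features):
            total[i] += discount * value
        discount *= gamma
    return total


def feature_expectations(trajectories, gamma):
    """Monte Carlo estimate: average the discounted sums over trajectories."""
    sums = [discounted_feature_sum(t, gamma) for t in trajectories]
    return [sum(column) / len(sums) for column in zip(*sums)]


# Two toy trajectories with two-dimensional features (illustrative values).
trajectories = [
    [[1, 0], [0, 1], [1, 1]],
    [[0, 1], [0, 1]],
]
mu = feature_expectations(trajectories, gamma=0.5)
```

The estimate needs whole trajectories from the policy being evaluated, which is the requirement the abstract identifies as the limitation of this approach.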
Practical Bipedal Walking Control on Uneven Terrain Using Surface Learning and Push Recovery
Bipedal walking in human environments is made difficult by the unevenness of the terrain and by external disturbances. Most approaches to bipedal walking in such environments either rely upon a precise model of the surface or special hardware designed for uneven terrain. In this paper, we present an alternative approach to stabilize the walking of an inexpensive, commercially-available, position-controlled humanoid robot in difficult environments. We use electrically compliant swing foot dynamics and onboard sensors to estimate the inclination of the local surface, and use an online learning algorithm to learn an adaptive surface model. Perturbations due to external disturbances or model errors are rejected by a hierarchical push recovery controller, which modulates three biomechanically motivated push recovery controllers according to the current estimated state. We use a physically realistic simulation with an articulated robot model and a reinforcement learning algorithm to train the push recovery controller, and implement the learned controller on a commercial DARwIn-OP small humanoid robot. Experimental results show that this combined approach enables the robot to walk over unknown, uneven surfaces without falling down.

