Results 1-10 of 58
Stacked Hierarchical Labeling
Cited by 62 (16 self)
In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.
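The stacking idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `fit` callable, the interleaved fold assignment, and the toy `fit_mean` learner are all assumptions introduced here. The point it shows is that each example's first-stage prediction comes from a model that never saw that example, so a later stage trains on inputs that carry realistic errors.

```python
from typing import Callable, List, Sequence


def held_out_predictions(
    xs: Sequence,
    ys: Sequence,
    fit: Callable[[list, list], Callable],
    n_folds: int,
) -> List:
    """Predict every example with a model trained on the other folds.

    `fit(train_xs, train_ys)` is a hypothetical learner that returns a
    callable model; any first-stage predictor could stand in for it.
    """
    preds = [None] * len(xs)
    for fold in range(n_folds):
        # Interleaved fold assignment (an assumption made for brevity).
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = model(xs[i])
    return preds


def fit_mean(train_xs: list, train_ys: list) -> Callable:
    """Toy learner: always predicts the mean training label."""
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean


# Held-out predictions differ from the labels; a second stage would be
# trained on these values rather than on overly optimistic ones.
stage_one = held_out_predictions([0, 1, 2, 3], [1, 0, 1, 0], fit_mean, n_folds=2)
```

A later stage in the hierarchy would then take `stage_one` (together with its own features) as input, which is what makes it robust to the kinds of mistakes earlier stages make at test time.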
Modeling Interaction via the Principle of Maximum Causal Entropy
Cited by 27 (9 self)
The principle of maximum entropy provides a powerful framework for statistical models of joint, conditional, and marginal distributions. However, there are many important distributions with elements of interaction and feedback where its applicability has not been established. This work presents the principle of maximum causal entropy, an approach based on causally conditioned probabilities that can appropriately model the availability and influence of sequentially revealed side information. Using this principle, we derive models for sequential data with revealed information, interaction, and feedback, and demonstrate their applicability for statistically framing inverse optimal control and decision prediction tasks.
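For readers unfamiliar with the terminology, the standard definitions behind "causally conditioned probabilities" can be written as follows. The notation is an assumption made here for illustration: A_t denotes the variable predicted at step t and S_t the side information revealed at step t. The abstract itself does not fix any symbols.

```latex
% Causally conditioned probability: each A_t depends only on side
% information revealed up to time t, never on future S.
P(A^T \,\|\, S^T) = \prod_{t=1}^{T} P(A_t \mid S_{1:t}, A_{1:t-1})

% Causal entropy: the expected negative log of that quantity, which
% decomposes into a sum of per-step conditional entropies.
H(A^T \,\|\, S^T)
  = \mathbb{E}_{A,S}\!\left[ -\log P(A^T \,\|\, S^T) \right]
  = \sum_{t=1}^{T} H(A_t \mid S_{1:t}, A_{1:t-1})
```

The contrast with ordinary conditional entropy H(A^T | S^T) is that the latter conditions every A_t on the entire sequence S_{1:T}, including side information that has not yet been revealed at time t.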
Relative Entropy Inverse Reinforcement Learning
Cited by 27 (3 self)
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near-)optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in systems with a large, or continuous, state space. In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a baseline policy and their distribution under the learned policy is minimized by stochastic gradient descent. We compare this new approach to well-known IRL algorithms using learned MDP models. Empirical results on simulated car racing, gridworld and ball-in-a-cup problems show that our approach is able to learn good policies from a small number of demonstrations.
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Cited by 23 (7 self)
We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a coactive online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this coactive preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, while, nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalization ability of our algorithm on a variety of tasks, for which the preferences were influenced not only by the object being manipulated but also by the surrounding environment.
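The feedback loop described in this abstract can be sketched with a perceptron-style preference update, a common form of coactive learning. This is an illustration under stated assumptions, not the paper's algorithm: the linear scoring model, the `features` callable, the finite candidate set, and the unit step size are all hypothetical choices made here.

```python
from typing import Callable, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def propose(weights: List[float], candidates: Sequence, features: Callable):
    """Return the candidate trajectory with the highest linear score."""
    return max(candidates, key=lambda traj: dot(weights, features(traj)))


def coactive_update(
    weights: List[float],
    proposed,
    improved,
    features: Callable,
    step: float = 1.0,
) -> List[float]:
    """Shift weights toward the user's slightly improved trajectory.

    The user never supplies an optimal trajectory, only one that is
    somewhat better than `proposed`.
    """
    f_improved = features(improved)
    f_proposed = features(proposed)
    return [w + step * (fi - fp) for w, fi, fp in zip(weights, f_improved, f_proposed)]


# Toy usage: trajectories are already feature vectors (an assumption).
identity = lambda traj: list(traj)
candidates = [(1.0, 0.0), (0.0, 1.0)]
weights = [0.0, 0.0]

first = propose(weights, candidates, identity)      # system's proposal
weights = coactive_update(weights, first, (0.0, 1.0), identity)
second = propose(weights, candidates, identity)     # proposal after feedback
```

After one round of feedback the weights favor the second feature, so the next proposal switches to the trajectory the user preferred.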
Learning from Demonstration for Autonomous Navigation in Complex Unstructured Terrain
2010
Cited by 17 (7 self)
Rough terrain autonomous navigation continues to pose a challenge to the robotics community. Robust navigation by a mobile robot depends not only on the individual performance of perception and planning systems, but on how well these systems are coupled. When traversing complex unstructured terrain, this coupling (in the form of a cost function) has a large impact on robot behavior and performance, necessitating a robust design. This paper explores the application of Learning from Demonstration to this task for the Crusher autonomous navigation platform. Using expert examples of desired navigation behavior, mappings from both online and offline perceptual data to planning costs are learned. Challenges in adapting existing techniques to complex online planning systems and imperfect demonstration are addressed, along with additional practical considerations. The benefits to autonomous performance of this approach are examined, as well as the decrease in necessary designer effort. Experimental results are presented from autonomous traverses through complex natural environments.
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Cited by 16 (11 self)
Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional gradient methods have proven beneficial. They are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with relational representation allows for learning more expressive policies than what has been done before. We demonstrate the usefulness of our approach in several different domains.
Nonlinear Inverse Reinforcement Learning with Gaussian Processes
Cited by 16 (2 self)
We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonlinear function, while also determining the relevance of each feature to the expert’s policy. Our probabilistic algorithm allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
A novel method for learning policies from variable constraint data
2009
Cited by 15 (6 self)
Many everyday human skills can be framed in terms of performing some task subject to constraints imposed by the environment. Constraints are usually unobservable and frequently change between contexts. In this paper, we present a novel approach for learning (unconstrained) control policies from movement data, where observations come from movements under different constraints. As a key ingredient, we introduce a small but highly effective modification to the standard risk functional, allowing us to make a meaningful comparison between the estimated policy and constrained observations. We demonstrate our approach on systems of varying complexity, including kinematic data from the ASIMO humanoid robot with 27 degrees of freedom, and present results for learning from human demonstration.
Computational Rationalization: The Inverse Equilibrium Problem
Cited by 12 (2 self)
Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multiagent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward; it must speculate on how the other agents may act to influence the game’s outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior, as well as recovering a reward function in these domains.
Optimization and learning for rough terrain legged locomotion
I. J. Robotic Res
Cited by 11 (3 self)
We present a novel approach to legged locomotion over rough terrain that is thoroughly rooted in optimization. This approach relies on a hierarchy of fast, anytime algorithms to plan a set of footholds, along with the dynamic body motions required to execute them. Components within the planning framework coordinate to exchange plans, cost-to-go estimates, and “certificates” that ensure the output of an abstract high-level planner can be realized by lower layers of the hierarchy. The burden of careful engineering of cost functions to achieve desired performance is substantially mitigated by a simple inverse optimal control technique. Robustness is achieved by real-time replanning of the full trajectory, augmented by reflexes and feedback control. We demonstrate the successful application of our approach in guiding the LittleDog quadruped robot over a variety of rough terrains. Other novel aspects of our past research efforts include a variety of pioneering inverse optimal control techniques as well as a system for planning using arbitrary pre-recorded robot behaviors.