Results 11  20
of
216
Pairwise Preference Learning and Ranking
 Proceedings of the 14th European Conference on Machine Learning
, 2003
"... We consider supervised learning of a ranking function, which is a mapping from instances to total orders over a set of labels (options). The training information consists of examples with partial (and possibly inconsistent) information about their associated rankings. From these, we induce a rank ..."
Abstract

Cited by 40 (11 self)
 Add to MetaCart
We consider supervised learning of a ranking function, which is a mapping from instances to total orders over a set of labels (options). The training information consists of examples with partial (and possibly inconsistent) information about their associated rankings. From these, we induce a ranking function by reducing the original problem to a number of binary classification problems, one for each pair of labels. The main objective of this work is to investigate the tradeoff between the quality of the induced ranking function and the computational complexity of the algorithm, both depending on the amount of preference information given for each example. To this end, we present theoretical results on the complexity of pairwise preference learning.
Dynamic imitation in a humanoid robot through nonparametric probabilistic inference
 In Proceedings of Robotics: Science and Systems (RSS’06
, 2006
"... Abstract — We tackle the problem of learning imitative wholebody motions in a humanoid robot using probabilistic inference in Bayesian networks. Our inferencebased approach affords a straightforward method to exploit rich yet uncertain prior information obtained from human motion capture data. Dyna ..."
Abstract

Cited by 34 (5 self)
 Add to MetaCart
Abstract — We tackle the problem of learning imitative wholebody motions in a humanoid robot using probabilistic inference in Bayesian networks. Our inferencebased approach affords a straightforward method to exploit rich yet uncertain prior information obtained from human motion capture data. Dynamic imitation implies that the robot must interact with its environment and account for forces such as gravity and inertia during imitation. Rather than explicitly modeling these forces and the body of the humanoid as in traditional approaches, we show that stable imitative motion can be achieved by learning a sensorbased representation of dynamic balance. Bayesian networks provide a sound theoretical framework for combining prior kinematic information (from observing a human demonstrator) with prior dynamic information (based on previous experience) to model and subsequently infer motions which, with high probability, will be dynamically stable. By posing the problem as one of inference in a Bayesian network, we show that methods developed for approximate inference can be leveraged to efficiently perform inference of actions. Additionally, by using nonparametric inference and a nonparametric (Gaussian process) forward model, our approach does not make any strong assumptions about the physical environment or the mass and inertial properties of the humanoid robot. We propose an iterative, probabilistically constrained algorithm for exploring the space of motor commands and show that the algorithm can quickly discover dynamically stable actions for wholebody imitation of human motion. Experimental results based on simulation and subsequent execution by a HOAP2 humanoid robot demonstrate that our algorithm is able to imitate a human performing actions such as squatting and a onelegged balance. I.
Bayesian models of human action understanding
 Advances in Neural Information Processing Systems 18
, 2006
"... We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing its behavior. Actionunderstanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order ..."
Abstract

Cited by 33 (4 self)
 Add to MetaCart
We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing its behavior. Actionunderstanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order to achieve their goals given the constraints of their environment. Working in a simple spriteworld domain, we show how this model can be used to infer the goal of an agent and predict how the agent will act in novel situations or when environmental constraints change. The model provides a qualitative account of several kinds of inferences that preverbal infants have been shown to perform, and also fits quantitative predictions that adult observers make in a new experiment. 1
Navigate Like a Cabbie: Probabilistic Reasoning from Observed ContextAware Behavior
"... We present PROCAB, an efficient method for Probabilistically Reasoning from Observed ContextAware Behavior. It models the contextdependent utilities and underlying reasons that people take different actions. The model generalizes to unseen situations and scales to incorporate rich contextual infor ..."
Abstract

Cited by 32 (5 self)
 Add to MetaCart
We present PROCAB, an efficient method for Probabilistically Reasoning from Observed ContextAware Behavior. It models the contextdependent utilities and underlying reasons that people take different actions. The model generalizes to unseen situations and scales to incorporate rich contextual information. We train our model using the route preferences of 25 taxi drivers demonstrated in over 100,000 miles of collected data, and demonstrate the performance of our model by inferring: (1) decision at next intersection, (2) route to known destination, and (3) destination given partially traveled route.
Reinforcement learning with limited reinforcement: Using bayes risk for active learning in pomdps. ISAIM (online proceedings
, 2008
"... Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to sp ..."
Abstract

Cited by 22 (7 self)
 Add to MetaCart
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to specify only from domain knowledge. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a “modeluncertainty ” POMDP. Coupled with modeldirected queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems. 1.
Active Learning for Reward Estimation in Inverse Reinforcement Learning
, 2009
"... Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonst ..."
Abstract

Cited by 21 (6 self)
 Add to MetaCart
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” states. The purpose of our algorithm is to estimate the reward function with similar accuracy as other methods from the literature while reducing the amount of policy samples required from the expert. We also discuss the use of our algorithm in higher dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexities.
Apprenticeship learning for helicopter control
 Communications of the ACM
"... doi:10.1145/1538788.1538812 Autonomous helicopter flight is widely regarded to be a highly challenging control problem. As helicopters are highly unstable and exhibit complicated dynamical behavior, it is particularly difficult to design controllers that achieve high performance over a broad flight ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
doi:10.1145/1538788.1538812 Autonomous helicopter flight is widely regarded to be a highly challenging control problem. As helicopters are highly unstable and exhibit complicated dynamical behavior, it is particularly difficult to design controllers that achieve high performance over a broad flight regime. While these aircraft are notoriously difficult to control, there are expert human pilots who are nonetheless capable of demonstrating a wide variety of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s performance envelope. In this paper, we present algorithms for modeling and control that leverage these demonstrations to build highperformance control systems for autonomous helicopters. More specifically, we detail our experiences with the Stanford Autonomous Helicopter, which is now capable of extreme aerobatic flight meeting or exceeding the performance of our own expert pilot. 1.
Fitting and compilation of multiagent models through piecewise linear functions
 In AAMAS
, 2004
"... Decisiontheoretic models have become increasingly popular as a basis for solving agent and multiagent problems, due to their ability to quantify the complex uncertainty and preferences that pervade most nontrivial domains. However, this quantitative nature also complicates the problem of constructi ..."
Abstract

Cited by 17 (13 self)
 Add to MetaCart
Decisiontheoretic models have become increasingly popular as a basis for solving agent and multiagent problems, due to their ability to quantify the complex uncertainty and preferences that pervade most nontrivial domains. However, this quantitative nature also complicates the problem of constructing models that accurately represent an existing agent or multiagent system, leading to the common question, “Where do the numbers come from? ” In this work, we present a method for exploiting knowledge about the qualitative structure of a problem domain to automatically derive the correct quantitative values that would generate an observed pattern of agent behavior. In particular, we propose the use of piecewise linear functions to represent probability distributions and utility functions with a structure that we can then exploit to more efficiently compute value functions. More importantly, we have designed algorithms that can (for example) take a sequence of actions and automatically generate a reward function that would generate that behavior within our agent model. This algorithm allows us to efficiently fit an agent or multiagent model to observed behavior. We illustrate the application of this framework with examples in multiagent modeling and social simulation, using decisiontheoretic models drawn from the alphabet soup of existing research (e.g., MDPs, POMDPs, DecPOMDPs, ComMTDPs). 1.
Targeting specific distributions of trajectories
 in MDPs,” Twenty First National Conference on Artificial Intelligence
, 2006
"... We define TTDMDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert ..."
Abstract

Cited by 17 (6 self)
 Add to MetaCart
We define TTDMDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTDMDP. We derive an algorithm for finding nondeterministic policies by constructing a trajectory tree that allows us to compute locallyconsistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
Learning nonparametric models for probabilistic imitation
 in Advances in Neural Information Processing Systems 19 (NIPS’06
, 2007
"... Learning by imitation represents an important mechanism for rapid acquisition of new behaviors in humans and robots. A critical requirement for learning by imitation is the ability to handle uncertainty arising from the observation process as well as the imitator’s own dynamics and interactions with ..."
Abstract

Cited by 16 (2 self)
 Add to MetaCart
Learning by imitation represents an important mechanism for rapid acquisition of new behaviors in humans and robots. A critical requirement for learning by imitation is the ability to handle uncertainty arising from the observation process as well as the imitator’s own dynamics and interactions with the environment. In this paper, we present a new probabilistic method for inferring imitative actions that takes into account both the observations of the teacher as well as the imitator’s dynamics. Our key contribution is a nonparametric learning method which generalizes to systems with very different dynamics. Rather than relying on a known forward model of the dynamics, our approach learns a nonparametric forward model via exploration. Leveraging advances in approximate inference in graphical models, we show how the learned forward model can be directly used to plan an imitating sequence. We provide experimental results for two systems: a biomechanical model of the human arm and a 25degreesoffreedom humanoid robot. We demonstrate that the proposed method can be used to learn appropriate motor inputs to the model arm which imitates the desired movements. A second set of results demonstrates dynamically stable fullbody imitation of a human teacher by the humanoid robot. 1