## Learning to Search: Functional Gradient Techniques for Imitation Learning (2009)

Venue: Autonomous Robots

Citations: 46 (18 self)

### BibTeX

```bibtex
@article{Ratliff09learningto,
  author  = {Nathan D. Ratliff and David Silver and J. Andrew Bagnell},
  title   = {Learning to Search: Functional Gradient Techniques for Imitation Learning},
  journal = {Autonomous Robots},
  year    = {2009},
  pages   = {25--53}
}
```

### Abstract

Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic, poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration.

### Citations

1213 | Markov decision processes: Discrete stochastic dynamic programming
- Puterman
- 2009
Citation Context: ...his notation is simply a matter of convenience for describing the algorithm; there is a one-to-one correspondence between the set of stationary Markovian policies and the set of feasible flow vectors (Puterman, 1994). The constraints can, therefore, be satisfied simply by invoking a generic MDP solver (i.e. a planning algorithm). We denote the set of all feasible flow vectors for a given MDP as G. At a high leve...
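The snippet above observes that the flow-vector constraints can be satisfied by calling any "generic MDP solver." A minimal sketch of such a solver, using value iteration over state-action costs; the function name, `gamma`, and `iters` are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def solve_mdp(P, c, gamma=0.95, iters=500):
    """Value iteration as a stand-in 'generic MDP solver'.

    P[a] is an |S| x |S| transition matrix for action a, and c[s, a] a
    state-action cost. The greedy deterministic stationary policy returned
    corresponds to one feasible flow vector in the sense quoted above.
    """
    n_states, n_actions = c.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = c[s, a] + gamma * E_{s'}[ V(s') ]
        Q = c + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V = Q.min(axis=1)          # costs are minimized, not rewards
    return Q.argmin(axis=1)        # one action per state
```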

569 | Greedy function approximation: A gradient boosting machine
- Friedman
Citation Context: ...xponentiated functional gradient method that generalizes the exponentiated gradient algorithm (Kivinen & Warmuth, 1997) to function spaces in a way analogous to gradient boosting (Mason et al., 1999; Friedman, 1999a). LEARCH has performed successfully on a wide range of real-world structured prediction and imitation learning problems including programming by demonstration (Ratliff et al., 2006a; Silver et al., ...

528 | Greed is good: algorithmic results for sparse approximation
- Tropp
Citation Context: ...& Lugosi, 2006) demonstrate that the exponentiated gradient algorithm is closely related to the growing body of work in the signal processing community on sparsity and ‖·‖1-regularized regression (Tropp, 2004; Donoho & Elad, 2003). Exponentiated gradient achieves this by rapidly increasing the weight on a few important predictors while quickly decreasing the weights on a bulk of irrelevant features. The un...
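The snippet describes exponentiated gradient's sparsity-inducing behavior: weight grows multiplicatively on a few useful predictors and decays on the rest. A toy sketch of the multiplicative update (the step size `eta` and the simplex normalization are illustrative assumptions):

```python
import numpy as np

def eg_step(w, grad, eta=0.5):
    """One exponentiated-gradient step: a multiplicative update
    followed by projection back onto the probability simplex."""
    w = w * np.exp(-eta * grad)
    return w / w.sum()

# Toy run: only feature 0 has a consistently useful (negative) gradient,
# so mass concentrates on it while the other weights decay.
w = np.full(5, 0.2)
for _ in range(10):
    w = eg_step(w, np.array([-1.0, 0.0, 0.0, 0.0, 0.0]))
```

After ten steps nearly all of the weight sits on feature 0, illustrating the rapid concentration the snippet refers to.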

455 | Locally weighted learning
- Atkeson, Moore, et al.
- 1997
Citation Context: ...paper; we point the interested reader to (Argall et al., to appear) for a detailed overview. In particular, we do not discuss the well-known techniques of Schaal and Atkeson (Schaal & Atkeson, 1994; Atkeson et al., 1995), whose algorithms are not designed to generalize across differing problems. The first notable examples of imitation learning came in the form of action prediction: predict the best action to take no...

439 | Max-margin Markov networks
- Taskar, Guestrin, et al.
- 2003
Citation Context: ...Maximum margin planning (MMP) is a framework for imitation learning based on maximum margin structured prediction techniques (Taskar et al., 2003; Taskar et al., 2006). We initially introduced the framework in (Ratliff et al., 2006a) for linear hypothesis spaces in conjunction with an efficient optimization procedure based on the subgradient m...

274 | Minimization Methods for Nondifferentiable Functions
- Shor
- 1985
Citation Context: ...ially introduced the framework in (Ratliff et al., 2006a) for linear hypothesis spaces in conjunction with an efficient optimization procedure based on the subgradient method for convex optimization (Shor, 1985). This linear setting is well understood theoretically. In particular, contrary to many competing imitation learning frameworks, this linear margin-based formulation allows the straightforward deriva...

247 | Exponentiated gradient versus gradient descent for linear predictors
- Kivinen, Warmuth
- 1997
Citation Context: ...ed algorithms, which we term LEArning to seaRCH (LEARCH), for solving MMP. This family includes a novel exponentiated functional gradient method that generalizes the exponentiated gradient algorithm (Kivinen & Warmuth, 1997) to function spaces in a way analogous to gradient boosting (Mason et al., 1999; Friedman, 1999a). LEARCH has performed successfully on a wide range of real-world structured prediction and imitation ...
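The snippet names LEARCH's core idea: an exponentiated *functional* gradient step, where a learned update is applied to the log of a cost map so that costs change multiplicatively and stay positive. A minimal pointwise sketch (the cell-indexed `target`, `eta`, and the pointwise update in place of a fitted regressor are all illustrative assumptions):

```python
import numpy as np

def learch_step(log_cost, planned, demonstrated, eta=0.1):
    """One exponentiated functional-gradient step on a 2-D log-cost map.

    Raise log-cost where the current planner goes but the expert did not,
    and lower it along the demonstrated path. Exponentiating the log-map
    keeps costs positive. This pointwise update stands in for the regressor
    that full LEARCH would fit to these targets over features.
    """
    target = np.zeros_like(log_cost)
    for cell in planned:        # planner prefers these cells: raise cost
        target[cell] += 1.0
    for cell in demonstrated:   # expert used these cells: lower cost
        target[cell] -= 1.0
    return log_cost + eta * target

log_cost = learch_step(np.zeros((2, 2)), planned=[(0, 0)], demonstrated=[(1, 1)])
cost_map = np.exp(log_cost)   # multiplicative effect on the cost map
```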

243 | Apprenticeship learning via inverse reinforcement learning
- Abbeel, Ng
- 2004
Citation Context: ...takes a particular action from a given state. The first approaches to recovering a cost function for general, discrete MDPs by observing a demonstrated behavior were suggested in (Ng & Russell, 2000; Abbeel & Ng, 2004). In this formulation, it is assumed that the agent observed is truly acting in the MDP and the goal is to find a policy that accrues the same long-term cumulative reward as the demonstrated behavior...

218 | Optimal Control: Linear Quadratic Methods
- Anderson, Moore
- 1989
Citation Context: ...trol and planning algorithms has been developed for addressing the imitation learning problem. The concept is to leverage the older notion of inverse optimal control (Kalman, 1964; Boyd et al., 1994; Anderson & Moore, 1990): imitation learning is achieved by first recovering a cost function by observing an expert perform a task, and then executing new tasks using an optimal controller/planner to minimize that learned c...

198 | Algorithms for inverse reinforcement learning
- Ng, Russell
- 2000
Citation Context: ...eived when an agent takes a particular action from a given state. The first approaches to recovering a cost function for general, discrete MDPs by observing a demonstrated behavior were suggested in (Ng & Russell, 2000; Abbeel & Ng, 2004). In this formulation, it is assumed that the agent observed is truly acting in the MDP and the goal is to find a policy that accrues the same long-term cumulative reward as the de...

172 | Linear matrix inequalities in system and control theory
- Boyd, Ghaoui, et al.
- 1994
Citation Context: ...xisting optimal control and planning algorithms has been developed for addressing the imitation learning problem. The concept is to leverage the older notion of inverse optimal control (Kalman, 1964; Boyd et al., 1994; Anderson & Moore, 1990): imitation learning is achieved by first recovering a cost function by observing an expert perform a task, and then executing new tasks using an optimal controller/planner to...

165 | Learning structured prediction models: A large margin approach
- Taskar, Chatalbashev, et al.
Citation Context: ...ponential number of constraints since the number of policies |G| is exponential in the number of state-action pairs |S||A|. However, following the logic originally introduced in (Taskar et al., 2003; Taskar et al., 2005) we note that, for a given example i, the left-hand side of Equation 2 is constant across all policies µ ∈ Gi. Therefore, if the constraint holds for the single policy that minimizes the right-hand-s...

143 | On Learning, Representing, and Generalizing a Task in a Humanoid Robot
- Calinon, Guenter, et al.
- 2007
Citation Context: ...is the work of Schaal and Atkeson (Schaal & Atkeson, 1994; Atkeson et al., 1995) whose algorithms develop control strategies within a given environment using trajectory demonstrations. Additionally, (Calinon et al., 2007) and (Hersch et al., 2008) have developed techniques that allow generalization to alternative goals within a given domain. Our work focuses on the broader problem of generalizing control between dist...

142 | Functional gradient techniques for combining hypotheses
- Mason, Baxter, et al.
- 1999
Citation Context: ...y includes a novel exponentiated functional gradient method that generalizes the exponentiated gradient algorithm (Kivinen & Warmuth, 1997) to function spaces in a way analogous to gradient boosting (Mason et al., 1999; Friedman, 1999a). LEARCH has performed successfully on a wide range of real-world structured prediction and imitation learning problems including programming by demonstration (Ratliff et al., 2006a;...

126 | ALVINN: An Autonomous Land Vehicle in a Neural Network
- Pomerleau
- 1989
Citation Context: ...machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. ...

106 | Maximum margin planning
- Ratliff, Bagnell, et al.
- 2006
Citation Context: ...lgorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration. The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al., 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, m...

94 | Automatic grasp planning using shape primitives
- Miller, Knoop, et al.
- 2003
Citation Context: ...on features that locally describe the relationship between the hand and the object, and choosing the optimal scoring grasp. This grasp planning paradigm parallels that of the popular GraspIt! system (Miller et al., 2003). In total, our predictor chose among 2,496 grasp candidates. We applied the exponentiated LEARCH algorithm to generalize the grasping behavior exemplified by a set of training examples demonstrated...

92 | Robot Juggling: Implementation of Memory-Based Learning
- Schaal, Atkeson
- 1994
Citation Context: ...beyond the scope of this paper; we point the interested reader to (Argall et al., to appear) for a detailed overview. In particular, we do not discuss the well-known techniques of Schaal and Atkeson (Schaal & Atkeson, 1994; Atkeson et al., 1995), whose algorithms are not designed to generalize across differing problems. The first notable examples of imitation learning came in the form of action prediction: predict the ...

70 | Boosting as a regularized path to a maximum margin classifier
- Rosset, Zhu, et al.
- 2004
Citation Context: ...sion to simplify the requirements on the corresponding optimization algorithm. Boosting-type functional gradient descent procedures often admit regularization path arguments of the type discussed in (Rosset et al., 2004). These arguments state that the number of boosting steps executed determines the effective size or complexity of the model class under consideration. Early stopping, therefore, plays a similar role ...

68 | Maximum entropy inverse reinforcement learning
- Ziebart, Maas, et al.
Citation Context: ...s where we choose randomly which controller to execute on the initial step) that achieve the desired feature counts. Such policies can differ wildly in the behavior they actually manifest. Recently, (Ziebart et al., 2008) resolved these difficulties in the feature count matching approach by connecting the inverse optimal control problem to probabilistic modeling and the Maximum Entropy Method (Jaynes, 2003) and provi...

66 | Approximate Solutions to Markov Decision Processes
- Gordon
- 1999
Citation Context: ...., 2006a), we additionally derive a compact quadratic program (QP) for optimizing the dual of the convex program given in Equation 5 using a linear programming (LP) formulation of the MDP inference (Gordon, 1999). For many problems, this LP, in addition to the resulting QP, is very large and impractical to solve in practice. We defer the details of this formulation to the reference. ...

65 | Prediction, learning, and games
- Cesa-Bianchi, Lugosi
- 2006
Citation Context: ...lutions, the key benefit enjoyed by the exponentiated gradient algorithm is a robustness to large numbers of potentially irrelevant features. In particular, powerful results (Kivinen & Warmuth, 1997; Cesa-Bianchi & Lugosi, 2006) demonstrate that the exponentiated gradient algorithm is closely related to the growing body of work in the signal processing community on sparsity and ‖·‖1-regularized regression (Tropp, 2004; D...

62 | Online subgradient methods for structured prediction
- Ratliff, Bagnell, et al.
- 2007
Citation Context: ...ange of real-world structured prediction and imitation learning problems including programming by demonstration (Ratliff et al., 2006a; Silver et al., 2008), LADAR classification (Munoz et al., 2008; Ratliff et al., 2007a), sequence prediction (Ratliff et al., 2007a), and heuristic learning (Ratliff et al., 2006b). Figure 1 depicts three robotic platforms on which these algorithms have been applied. LEARCH addresses ...

61 | Regularized least-squares classification
- Rifkin, Yeo, et al.
- 2003
Citation Context: ...e Maximum Margin Planning (MMP) objective:

$$ R(w) \;=\; \frac{1}{N}\sum_{i=1}^{N}\Big( w^\top F_i \mu_i \;-\; \min_{\mu \in \mathcal{G}_i}\big\{\, w^\top F_i \mu - l_i^\top \mu \,\big\} \Big) \;+\; \frac{\lambda}{2}\,\|w\|^2. \qquad (6) $$

This convex objective function takes the form of a regularized risk function (Rifkin & Poggio, 2003); its two terms trade off data fit with hypothesis complexity. We emphasize that this objective function forms an upper bound on our structured loss function $L(\mu_i, \mu) = l_i^\top \mu$ in a way that generaliz...
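The MMP objective in the snippet is straightforward to evaluate numerically. A small sketch for a single example (N = 1), where the candidate policy set, feature matrix, and loss vector are tiny illustrative placeholders rather than anything from the paper:

```python
import numpy as np

def mmp_risk(w, F, mu_demo, G, l, lam=1e-2):
    """Regularized MMP risk of Equation 6 for a single example (N = 1).

    F: d x m feature matrix over state-action pairs; mu_demo: the
    demonstrated flow vector; G: candidate flow vectors; l: loss vector
    used for the loss-augmentation term l^T mu.
    """
    # loss-augmented cost of each candidate policy
    augmented = [w @ F @ mu - l @ mu for mu in G]
    # hinge term: demonstrated cost minus best loss-augmented cost
    hinge = w @ F @ mu_demo - min(augmented)
    return hinge + 0.5 * lam * (w @ w)

# Tiny example: two state-action pairs, two candidate flow vectors.
w = np.array([1.0])
F = np.array([[1.0, 2.0]])
mu_demo = np.array([1.0, 0.0])
G = [mu_demo, np.array([0.0, 1.0])]
l = np.array([0.0, 1.0])   # zero loss on the demonstrated policy
risk = mmp_risk(w, F, mu_demo, G, l)
```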

55 | A Survey of Robot Learning from Demonstration
- Argall, Chernova, et al.
- 2009
Citation Context: ...t form the foundation for this work. A full survey of the numerous approaches to imitation learning studied in the literature is beyond the scope of this paper, but we point the interested reader to (Argall et al., 2009) for a detailed overview. Of note is the work of Schaal and Atkeson (Schaal & Atkeson, 1994; Atkeson et al., 1995) whose algorithms develop control strategies within a given environment using traject...

54 | Boosting structured prediction for imitation learning
- Ratliff, Bradley, et al.
- 2007
Citation Context: ...lgorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration. The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al., 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, m...

48 | Structured Learning with Approximate Inference. NIPS
- Kulesza, Pereira
- 2007
Citation Context: ...2009), demonstrate that the subgradient method and its functional gradient generalizations perform well on large-scale LADAR classification problems that require approximate inference. Additionally, (Kulesza & Pereira, 2008) demonstrates that training with approximate inference implemented using an LP relaxation both works well in practice and facilitates a characterization of approximate convergence. ...

46 | Apprenticeship learning using inverse reinforcement learning and gradient methods
- Neu, Szepesvári
Citation Context: ...near MMP algorithms suffered from a sensitivity to the relative scaling of their features, a problem common to margin-based learning formulations including alternative IRL formulations as discussed in (Neu & Szepesvari, 2007). Section 4.3 demonstrates a solution to this problem that arises naturally from this regression-based correlation measure when used with a hypothesis space of linear functions. The functional grad...

44 | Footstep planning for the Honda ASIMO humanoid
- Chestnutt, Lau, et al.
- 2005
Citation Context: ...Recent work has demonstrated that decomposing legged locomotion into separate footstep planners and execution controllers is an effective strategy for many problems (Chestnutt et al., 2003; Chestnutt et al., 2005). The footstep planner finds a sequence of feasible footsteps across the terrain, and the execution controller finds a trajectory through the full body configuration space of the robot that successful...

42 | Planning biped navigation strategies in complex environments
- Chestnutt, Kuffner, et al.
- 2003
Citation Context: ...f a supervised learning approach often leads to myopic and poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. S...

38 | Using interpolation to improve path planning: the Field D* algorithm
- Ferguson, Stentz
- 2006
Citation Context: ...in order to assign features to each state. For planners that operate in a continuous state space, attention must be paid during discretization of the plan. For example, the interpolated A* algorithm (Ferguson & Stentz, 2006) used in Section 6 generates paths that are interpolated across 2-D cell boundaries. Therefore, the distance traveled through a cell may be anywhere in the interval (0, √2] (in cells). As the cost of...
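The snippet notes that an interpolated path crosses a cell for a distance in (0, √2] cell widths, so a path's cost must weight each cell's cost rate by that distance. A small illustrative sketch (the segment representation is an assumption for the example):

```python
import math

def path_cost(segments, cost_map):
    """Cost of an interpolated 2-D path.

    Each segment is ((row, col), dist), where dist is the distance the
    path travels through that cell, constrained to (0, sqrt(2)] cells.
    """
    total = 0.0
    for (r, c), dist in segments:
        assert 0.0 < dist <= math.sqrt(2)   # per-cell distance bound
        total += cost_map[r][c] * dist      # cost rate * distance in cell
    return total

# One straight-through cell, then one full diagonal crossing.
cost = path_cost([((0, 0), 1.0), ((0, 1), math.sqrt(2))], [[2.0, 1.0]])
```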

37 | Dynamical system modulation for robot learning via kinesthetic demonstrations
- Hersch, Guenter, et al.
- 2008
Citation Context: ...tkeson (Schaal & Atkeson, 1994; Atkeson et al., 1995) whose algorithms develop control strategies within a given environment using trajectory demonstrations. Additionally, (Calinon et al., 2007) and (Hersch et al., 2008) have developed techniques that allow generalization to alternative goals within a given domain. Our work focuses on the broader problem of generalizing control between distinct domains; we therefore...

34 | Hierarchical apprenticeship learning with application to quadruped locomotion (NIPS)
- Kolter, Abbeel, et al.
- 2007
Citation Context: ...of inverse optimal control algorithms to robotic systems. Sections 5 and 6 present some of these case studies; more recently, extensions to hierarchical learning problems for quadrupedal locomotion (Kolter et al., 2008) have additionally supported inverse optimal control’s status as a preferred approach to imitation learning. ...

34 | Off-Road Obstacle Avoidance through End-to-End Learning
- Lecun, Muller, et al.
- 2006
Citation Context: ...paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are of...

29 | Maximal sparsity representation via l1 minimization
- Donoho, Elad
- 2003
Citation Context: ...6) demonstrate that the exponentiated gradient algorithm is closely related to the growing body of work in the signal processing community on sparsity and ‖·‖1-regularized regression (Tropp, 2004; Donoho & Elad, 2003). Exponentiated gradient achieves this by rapidly increasing the weight on a few important predictors while quickly decreasing the weights on a bulk of irrelevant features. The unnormalized KL prior f...

28 | Structured Prediction via the Extragradient Method
- Taskar, Lacoste-julien, et al.
- 2005
Citation Context: ...Maximum margin planning (MMP) is a framework for imitation learning based on maximum margin structured prediction techniques (Taskar et al., 2003; Taskar et al., 2006). We initially introduced the framework in (Ratliff et al., 2006a) for linear hypothesis spaces in conjunction with an efficient optimization procedure based on the subgradient method for convex opti...

23 | Experimental analysis of overhead data processing to support long range navigation
- Silver, Sofman, et al.
Citation Context: ...tal elevation maps, and even 3-D LADAR scans. Much of this data is freely available at lower resolutions, and is commercially available at increased resolution. Previous work (Vandapel et al., 2003; Silver et al., 2006) has demonstrated the effectiveness of overhead data for use in route planning. Overhead imagery, elevation maps, and point clouds are processed into features stored in 2-D grids. These feature maps ...

20 | When is a linear control system optimal?
- Kalman
- 1964
Citation Context: ...n leveraging existing optimal control and planning algorithms has been developed for addressing the imitation learning problem. The concept is to leverage the older notion of inverse optimal control (Kalman, 1964; Boyd et al., 1994; Anderson & Moore, 1990): imitation learning is achieved by first recovering a cost function by observing an expert perform a task, and then executing new tasks using an optimal co...

19 | Directional associative markov network for 3-d point cloud classification
- Munoz, Vandapel, et al.
- 2008
Citation Context: ...essfully on a wide range of real-world structured prediction and imitation learning problems including programming by demonstration (Ratliff et al., 2006a; Silver et al., 2008), LADAR classification (Munoz et al., 2008; Ratliff et al., 2007a), sequence prediction (Ratliff et al., 2007a), and heuristic learning (Ratliff et al., 2006b). Figure 1 depicts three robotic platforms on which these algorithms have been appl...

19 | High performance outdoor navigation from overhead data using imitation learning
- Silver, Bagnell, et al.
- 2008
Citation Context: ...Friedman, 1999a). LEARCH has performed successfully on a wide range of real-world structured prediction and imitation learning problems including programming by demonstration (Ratliff et al., 2006a; Silver et al., 2008), LADAR classification (Munoz et al., 2008; Ratliff et al., 2007a), sequence prediction (Ratliff et al., 2007a), and heuristic learning (Ratliff et al., 2006b). Figure 1 depicts three robotic platfor...

15 | Imitation learning for locomotion and manipulation
- Ratliff, Bagnell, et al.
- 2007
Citation Context: ...ange of real-world structured prediction and imitation learning problems including programming by demonstration (Ratliff et al., 2006a; Silver et al., 2008), LADAR classification (Munoz et al., 2008; Ratliff et al., 2007a), sequence prediction (Ratliff et al., 2007a), and heuristic learning (Ratliff et al., 2006b). Figure 1 depicts three robotic platforms on which these algorithms have been applied. LEARCH addresses ...

8 | The crusher system for autonomous navigation (AUVSI's Unmanned Systems North America)
- Stentz, Bares, et al.
- 2007
Citation Context: ...rformance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz et al., 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recen...

6 | Probability Theory: The Logic of Science
- Jaynes
- 2003
Citation Context: ..., (Ziebart et al., 2008) resolved these difficulties in the feature count matching approach by connecting the inverse optimal control problem to probabilistic modeling and the Maximum Entropy Method (Jaynes, 2003) and provided a unique, optimal, stochastic policy. Further, while defining inverse reinforcement learning as the problem of matching cumulative feature counts is sensible when the reward function is...

4 | Autonomous navigation for complex terrain (Carnegie Mellon Robotics Institute Technical Report, manuscript in preparation)
- Stentz
Citation Context: ...rformance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recen...

4 | Quality assessment of traversability maps from aerial lidar data for an unmanned ground vehicle
- Vandapel, Donamukkala, et al.
- 2003
Citation Context: ...satellite imagery, digital elevation maps, and even 3-D LiDAR scans. Much of this data is freely available at lower resolutions, and is commercially available at increased resolution. Previous work (Vandapel et al., 2003; Silver et al., 2006) has demonstrated the effectiveness of overhead data for use in route planning. Overhead imagery, elevation maps, and point clouds are processed into features stored in 2-D grid...

2 | Toward reliable autonomous vehicles operating in challenging environments
- Kelly, Amidi, et al.
- 2004
Citation Context: ...oor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and pro...

1 | A framework for behavioral cloning (Machine Intelligence Agents)
- Bain, Sammut
- 1995
Citation Context: ...ately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern ...

1 | Contextual classification with functional max-margin Markov networks
- Bagnell, D, et al.
- 2009

1 | Functional bundle methods (The Learning Workshop)
- Ratliff, Bagnell
- 2009
Citation Context: ...e exponentiated functional gradient algorithms have proven successful (Munoz et al., 2009); we are currently developing a functional generalization of bundle methods called Functional Bundle Methods (Ratliff & Bagnell, 2009) that are highly applicable to the structured prediction setting due to their rapid convergence and efficient use of trained classifiers. Finally, we are eager to expand the theoretical understanding...