Learning to Search: Functional Gradient Techniques for Imitation Learning
- Autonomous Robots, 2009
Cited by 26 (11 self)
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain & Sammut, 1995; Pomerleau, 1989; LeCun et al., 2006) approaches that utilize classical tools of supervised learning (e.g., decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic and poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al., 2003) to outdoor unstructured navigation (Kelly et al., 2004; Stentz, 2009), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration.
Bundle methods for machine learning
- JMLR
Cited by 24 (8 self)
We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems. We demonstrate in experiments the performance of our approach.
Bundle Methods for Regularized Risk Minimization
Cited by 13 (2 self)
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Gaussian Processes, Logistic Regression, Conditional Random Fields (CRFs), and Lasso amongst others. This paper describes the theory and implementation of a scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data-locality, and can deal with regularizers such as L1 and L2 penalties. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems. We demonstrate the performance of our general purpose solver on a variety of publicly available datasets.
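As a reading aid, the regularized risk functional that this abstract refers to is usually written in the following form. The symbols (w for the parameters, Ω for the regularizer, l for the loss, λ for the trade-off weight, m for the number of training examples) are notation assumed here for illustration and are not taken from the listing itself.

```latex
\min_{w} \; J(w) := \lambda\,\Omega(w) + R_{\mathrm{emp}}(w),
\qquad
R_{\mathrm{emp}}(w) := \frac{1}{m} \sum_{i=1}^{m} l(x_i, y_i, w)
```

For example, choosing \Omega(w) = \tfrac{1}{2}\|w\|_2^2 together with the hinge loss l(x, y, w) = \max(0,\, 1 - y\,\langle w, x\rangle) gives the linear SVM, while \Omega(w) = \|w\|_1 with the squared loss gives the Lasso.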
Slow learners are fast
- In NIPS, 2009
Cited by 13 (2 self)
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design, which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
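To make "delayed updates" concrete, here is a minimal sketch of stochastic gradient descent in which every gradient is applied a fixed number of steps after it was computed, as would happen if several workers computed gradients in parallel. The function name `delayed_sgd`, the squared loss, the fixed delay, and the constant step size are all assumptions made for illustration; the abstract does not specify the algorithm analysed in the paper.

```python
from collections import deque


def delayed_sgd(xs, ys, delay=4, lr=0.02, epochs=500):
    """Least-squares SGD where each gradient is applied `delay` steps late.

    Hypothetical illustration: the gradient computed at step t uses the
    weights available at step t, but only reaches the weights at step
    t + delay, mimicking stale updates in a parallel setting.
    """
    w = [0.0] * len(xs[0])
    pending = deque()  # gradients computed but not yet applied
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # Gradient of 0.5 * (<w, x> - y)^2 at the current (stale) weights.
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            pending.append([err * xj for xj in x])
            if len(pending) > delay:
                grad = pending.popleft()
                w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

Setting `delay=0` recovers ordinary sequential SGD, which makes it easy to compare the two behaviours on the same data.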
Directional Associative Markov Network for 3-D Point Cloud Classification
Cited by 10 (1 self)
In this paper we address the problem of automated three-dimensional point cloud interpretation. This problem is important for various tasks from environment modeling to obstacle avoidance for autonomous robot navigation. In addition to locally extracted features, classifiers need to utilize contextual information in order to perform well. A popular approach to account for context is to utilize the Markov Random Field framework. One recent variant that has successfully been used for the problem considered is the Associative Markov Network (AMN). We extend the AMN model to learn directionality in the clique potentials, resulting in a new anisotropic model that can be efficiently learned using the subgradient method. We validate the proposed approach using data collected from different range sensors and show better performance than standard AMN and Support Vector Machine algorithms.
Onboard Contextual Classification of 3-D Point Clouds with Learned High-order Markov Random Fields
Cited by 10 (1 self)
Contextual reasoning through graphical models such as Markov Random Fields often shows superior performance against local classifiers in many domains. Unfortunately, this performance increase often comes at the cost of time-consuming, memory-intensive learning and slow inference at testing time. Structured prediction for 3-D point cloud classification is one example of such an application. In this paper we present two contributions. First, we show how efficient learning of a random field with higher-order cliques can be achieved using subgradient optimization. Second, we present a context approximation using random fields with high-order cliques designed to make this model usable online, onboard a mobile vehicle for environment modeling. We obtained results with the mobile vehicle on a variety of terrains, at 1/3 Hz for a map 25 × 50 meters and a vehicle speed of 1-2 m/s.
On the Generalization Ability of Online Strongly Convex Programming Algorithms
Cited by 9 (1 self)
This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex. Our main result is a sharp bound, holding with high probability, on the excess risk of the output of an online algorithm in terms of the average regret. This allows one to use recent algorithms with logarithmic cumulative regret guarantees to achieve fast convergence rates for the excess risk with high probability. As a corollary, we characterize the convergence rate of PEGASOS (with high probability), a recently proposed method for solving the SVM optimization problem.
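For readers unfamiliar with PEGASOS, the sketch below shows the basic stochastic subgradient update it is known for: at step t, pick one example, use step size 1/(λt), shrink the weights, and add the example if it violates the margin. This is a simplified illustration, not the analysis from the paper: the function name `pegasos`, the default parameters, and the toy data are assumptions, and the optional projection step and mini-batching are omitted.

```python
import random


def pegasos(xs, ys, lam=0.1, n_iters=2000, seed=0):
    """Simplified Pegasos-style solver for a linear SVM without a bias term.

    Minimizes (lam / 2) * ||w||^2 + average hinge loss using one randomly
    chosen example per step and step size 1 / (lam * t).
    """
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)
        # Margin is evaluated with the weights from before this update.
        margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
        # Subgradient step on the regularizer: shrink the weights.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Subgradient step on the hinge loss, active only inside the margin.
        if margin < 1.0:
            w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


# Toy, linearly separable data (hypothetical).
points = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]
labels = [1, 1, -1, -1]
weights = pegasos(points, labels)
```

Because the objective is strongly convex in w, this is the kind of online procedure to which a regret-to-excess-risk bound of the sort described in the abstract would apply.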
Tighter bounds for structured estimation
- PROC. OF ADV. IN NEURAL INF. PROCESSING SYST, 2008
Cited by 9 (0 self)
Large-margin structured estimation methods minimize a convex upper bound of loss functions. While they allow for efficient optimization algorithms, these convex formulations are not tight and sacrifice the ability to accurately model the true loss. We present tighter non-convex bounds based on generalizing the notion of a ramp loss from binary classification to structured estimation. We show that a small modification of existing optimization algorithms suffices to solve this modified problem. On structured prediction tasks such as protein sequence alignment and web page ranking, our algorithm leads to improved accuracy.
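The binary ramp loss that the abstract takes as its starting point can be illustrated in a few lines. The sketch below uses one common form, the hinge loss clipped at 1; the function names and the exact scaling are assumptions for illustration, and the structured generalization proposed in the paper is not shown.

```python
def hinge_loss(margin):
    """Convex upper bound on the 0-1 loss; `margin` is y * f(x)."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin):
    """Hinge loss clipped at 1: non-convex, but closer to the 0-1 loss."""
    return min(1.0, hinge_loss(margin))


def ramp_as_difference(margin):
    """Same ramp loss written as a difference of two convex functions."""
    return hinge_loss(margin) - max(0.0, -margin)


for m in (-3.0, 0.5, 2.0):
    print(m, hinge_loss(m), ramp_loss(m))
```

A badly misclassified point (large negative margin) contributes an unbounded amount to the hinge loss but at most 1 to the ramp loss, which is why the ramp loss tracks the true error more tightly. The difference-of-convex form is what lets convex solvers be reused with small modifications.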
Imitation learning for locomotion and manipulation
- IEEE-RAS International Conference on Humanoid Robots, 2007
Cited by 9 (3 self)
Decision making in robotics often involves computing an optimal action for a given state, where the space of actions under consideration can potentially be large and state dependent. Many of these decision-making problems can be naturally formalized in the multi-class classification framework, where actions are regarded as labels for states. One powerful approach to multi-class classification relies on learning a function that scores each action; action selection is done by returning the action with maximum score. In this work, our interest is in applying recently developed techniques for large non-linear multi-class learning to problems of imitation learning in robotics. In particular, we apply recently developed functional gradient methods for optimizing a structured margin loss function to problems in robot locomotion and manipulation. In the first case, the problem is to predict next footstep locations greedily given the four-foot configuration over a terrain height map, and the second problem is to predict good grasps of complex free-form objects given an approach direction for a robotic hand.
No-Regret Reductions for Imitation Learning and Structured Prediction
- In AISTATS, 2011
Cited by 9 (3 self)
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
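The abstract does not spell out the algorithm, so the following is only a sketch of one way to train on "the distribution of observations it induces": a dataset-aggregation loop in which the current policy is rolled out, the expert labels the states that were actually visited, and a single stationary policy is refit on all data gathered so far. The toy integer-line environment, the lookup-table learner, and every name below are hypothetical.

```python
def expert(state):
    """Hypothetical expert: step toward the origin on an integer line."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def rollout(policy, start, horizon):
    """Run `policy` from `start` and return the list of visited states."""
    state, visited = start, []
    for _ in range(horizon):
        visited.append(state)
        state += policy(state)
    return visited


def fit(dataset):
    """Toy supervised learner: a lookup table from state to expert action."""
    table = dict(dataset)
    return lambda state: table.get(state, 0)  # unseen states: stay put


def aggregate_and_train(start=3, horizon=8, n_iters=5):
    dataset = []
    policy = expert  # the first iteration rolls out the expert itself
    for _ in range(n_iters):
        visited = rollout(policy, start, horizon)
        # Label the states the *current* policy visited with expert actions.
        dataset.extend((s, expert(s)) for s in visited)
        # Retrain one stationary, deterministic policy on all data so far.
        policy = fit(dataset)
    return policy


learned = aggregate_and_train()
```

The point of the loop is that training states come from the learner's own rollouts rather than only from expert trajectories, which is what addresses the violation of the i.i.d. assumption mentioned in the abstract.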

