Results 1–10 of 46
Accelerated training of conditional random fields with stochastic gradient methods
 In ICML, 2006
Cited by 95 (4 self)
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
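The gain-vector adaptation that SMD performs can be sketched on a toy least-squares problem. This is an illustrative rendering, not the paper's CRF setup: the meta-rate `mu`, trace decay `lam`, and initial gains are assumed values, and the Hessian-vector product is exact only because the toy loss is quadratic.

```python
import numpy as np

# Toy noiseless least-squares problem: recover x_true from rows of A.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
x_true = rng.normal(size=5)
y = A @ x_true

def grad(w, i):
    a, t = A[i], y[i]
    return (a @ w - t) * a          # stochastic gradient of 0.5*(a.w - t)^2

def hess_vec(i, v):
    a = A[i]
    return (a @ v) * a              # exact Hv for the per-sample squared loss

w = np.zeros(5)
eta = np.full(5, 0.05)              # per-parameter gain vector
v = np.zeros(5)                     # gradient trace dw/d(log eta)
mu, lam = 0.02, 0.99                # meta-rate and trace decay (assumed)

for _ in range(5000):
    i = rng.integers(100)
    g = grad(w, i)
    # multiplicative gain update, floored at halving per step
    eta *= np.maximum(0.5, 1.0 - mu * g * v)
    w -= eta * g
    v = lam * v - eta * (g + lam * hess_vec(i, v))

print(float(np.mean((A @ w - y) ** 2)))   # residual after training
```

The gain for a parameter grows when successive gradient components agree in sign (the optimizer is consistently undershooting) and shrinks when they oscillate, which is what makes a shared hand-tuned step size unnecessary.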
Fast iterative alignment of pose graphs with poor initial estimates
 In IEEE Intl. Conf. on Robotics and Automation (ICRA), 2006
Cited by 43 (5 self)
A robot exploring an environment can estimate its own motion and the relative positions of features in the environment. Simultaneous Localization and Mapping (SLAM) algorithms attempt to fuse these estimates to produce a map and a robot trajectory. The constraints are generally nonlinear, so SLAM can be viewed as a nonlinear optimization problem. The optimization can be difficult, due to poor initial estimates arising from odometry data and due to the size of the state space. We present a fast nonlinear optimization algorithm that rapidly recovers the robot trajectory, even when given a poor initial estimate. Our approach uses a variant of Stochastic Gradient Descent on an alternative state-space representation that has good stability and computational properties. We compare our algorithm to several others, using both real and synthetic data sets.
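The constraint-by-constraint relaxation idea can be sketched in one dimension: poses on a line, three odometry constraints, and one inconsistent loop closure, relaxed by stochastic gradient steps on individual constraints. All numbers and the decaying step size are illustrative assumptions; the paper's actual method works on 2-D pose graphs with an incremental state-space parameterization.

```python
import numpy as np

# Each constraint (i, j, d) says "pose j should be d ahead of pose i".
# The loop closure (0, 3, 2.4) conflicts with the three unit odometry steps.
constraints = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.4)]
x = np.array([0.0, 1.5, 2.5, 4.0])   # deliberately poor initial estimate

rng = np.random.default_rng(0)
for t in range(2000):
    i, j, d = constraints[rng.integers(len(constraints))]
    lr = 0.5 / (1.0 + 0.01 * t)      # decaying step size (assumed schedule)
    r = (x[j] - x[i]) - d            # residual of the sampled constraint
    x[i] += lr * r                   # move both endpoints to shrink it
    x[j] -= lr * r

print(x - x[0])                      # relative poses after relaxation
```

With the decaying step size the trajectory settles near the least-squares compromise (gaps of 0.85 rather than 1.0, total span 2.55), spreading the loop-closure error over the whole chain instead of leaving it at one link.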
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
 Neural Computation, 2002
Cited by 38 (14 self)
We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), in fact implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
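One cheap way to get an O(n) curvature matrix-vector product, shown here purely for illustration, is a directional finite difference of the gradient: Hv ≈ (g(w + εv) − g(w))/ε. The paper's products are exact, computed by algorithmic differentiation rather than differencing; the toy loss below is an assumption.

```python
import numpy as np

def loss_grad(w):
    # gradient of the toy separable loss f(w) = sum(w**4 - w**2)
    return 4 * w**3 - 2 * w

def hess_vec_fd(grad_fn, w, v, eps=1e-5):
    # O(n) Hessian-vector product via a directional finite difference:
    # two gradient evaluations, never the n-by-n Hessian itself
    return (grad_fn(w + eps * v) - grad_fn(w)) / eps

w = np.array([1.0, -2.0, 0.5])
v = np.array([1.0, 0.0, 1.0])
hv = hess_vec_fd(loss_grad, w, v)

exact = (12 * w**2 - 2) * v          # Hessian is diagonal for this loss
print(np.allclose(hv, exact, atol=1e-3))
```

The same two-evaluation trick extends to Gauss-Newton and natural-gradient products by differencing the appropriate Jacobian-vector quantities instead of the full gradient.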
Piecewise pseudolikelihood for efficient CRF training
 In International Conference on Machine Learning (ICML), 2007
Cited by 20 (2 self)
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases is nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
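The efficiency argument can be made concrete: pseudolikelihood replaces the global partition function with one local normalizer per variable, each a sum over that variable's values only. A minimal sketch for a binary Ising-style pairwise model (the couplings and configuration are made-up values, and this is plain pseudolikelihood, not the paper's piecewise variant):

```python
import numpy as np

# Symmetric pairwise couplings for three spin variables (assumed values).
J = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, -0.5],
              [0.0, -0.5, 0.0]])
x = np.array([1, 1, -1])             # one observed configuration, spins +/-1

def log_pseudolikelihood(J, x):
    lpl = 0.0
    for i in range(len(x)):
        field = J[i] @ x             # local field from the neighbours of i
        # log P(x_i | x_rest): normalizer sums over only the two spin values,
        # so cost is linear in variable cardinality, with no global Z
        lpl += x[i] * field - np.logaddexp(field, -field)
    return lpl

print(log_pseudolikelihood(J, x))
```

For a variable with K values the local normalizer is a length-K sum, which is the linear-in-cardinality cost the abstract refers to; the full likelihood would instead need the partition function over all joint configurations.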
Neural network-based colonoscopic diagnosis using on-line learning and differential evolution
 Applied Soft Computing 4, 2004
Cited by 14 (5 self)
In this paper, on-line training of neural networks is investigated in the context of computer-assisted colonoscopic diagnosis. A memory-based adaptation of the learning rate for on-line Backpropagation is proposed and used to seed an on-line evolution process that applies a Differential Evolution Strategy to (re)adapt the neural network to modified environmental conditions. Our approach looks at on-line training from the perspective of tracking the changing location of an approximate solution of a pattern-based, and thus dynamically changing, error function. The proposed hybrid strategy is compared with other standard training methods that have traditionally been used for training neural networks off-line. Results in interpreting colonoscopy images and frames of video sequences are promising and suggest that networks trained with this strategy detect malignant regions of interest accurately.
On the role of tracking in stationary environments
 In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), 2007
Cited by 13 (4 self)
It is often thought that learning algorithms that track the best solution, as opposed to converging to it, are important only on nonstationary problems. We present three results suggesting that this is not so. First, we illustrate in a simple concrete example, the Black and White problem, that tracking can perform better than any converging algorithm on a stationary problem. Second, we show the same point on a larger, more realistic problem, an application of temporal-difference learning to computer Go. Our third result suggests that tracking in stationary problems could be important for meta-learning research (e.g., learning to learn, feature selection, transfer). We apply a meta-learning algorithm for step-size adaptation, IDBD (Sutton, 1992a), to the Black and White problem, showing that meta-learning has a dramatic long-term effect on performance whereas, on an analogous converging problem, meta-learning has only a small second-order effect. This result suggests a way of eventually overcoming a major obstacle to meta-learning research: the lack of an independent methodology for task selection.
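IDBD itself is compact enough to sketch. Below is a minimal rendering of its per-weight step-size update for linear regression on a stationary noiseless target; the meta step size `theta`, the initial log step sizes, and the problem setup are assumed values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
w_true = rng.normal(size=n)          # stationary target weights
w = np.zeros(n)
beta = np.full(n, np.log(0.05))      # per-weight log step sizes
h = np.zeros(n)                      # per-weight memory traces
theta = 0.01                         # meta learning rate (assumed)

for _ in range(5000):
    x = rng.normal(size=n)
    delta = w_true @ x - w @ x       # prediction error on this example
    beta += theta * delta * x * h    # grow beta when errors correlate with h
    alpha = np.exp(beta)             # per-weight step sizes
    w += alpha * delta * x
    # decay the trace where the input was active, then accumulate the update
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x

print(float(np.mean((w - w_true) ** 2)))
```

The exponential parameterization keeps each step size positive and lets a fixed meta-rate move small and large step sizes by proportional amounts, which is central to IDBD's robustness.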
Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure
Cited by 13 (4 self)
Graphical models are often used “inappropriately,” with approximations in the topology, inference, and prediction. Yet it is still common to train their parameters to approximately maximize training likelihood. We argue that instead, one should seek the parameters that minimize the empirical risk of the entire imperfect system. We show how to locally optimize this risk using backpropagation and stochastic meta-descent. Over a range of synthetic-data problems, compared to the usual practice of choosing approximate MAP parameters, our approach significantly reduces loss on test data, sometimes by an order of magnitude.
Fast online policy gradient learning with SMD gain vector adaptation
 In Advances in Neural Information Processing Systems 18, 2006
Cited by 11 (1 self)
Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
Step size adaptation in reproducing kernel Hilbert space
 , 2006
Cited by 11 (4 self)
This paper presents an online Support Vector Machine (SVM) that uses the Stochastic Meta-Descent (SMD) algorithm to adapt its step size automatically. We formulate the online learning problem as stochastic gradient descent in a Reproducing Kernel Hilbert Space (RKHS) and translate SMD to the nonparametric setting, where its gradient trace parameter is no longer a coefficient vector but an element of the RKHS. We derive efficient updates that allow us to perform the step size adaptation in linear time. We apply the online SVM framework to a variety of loss functions, and in particular show how to handle structured output spaces and achieve efficient online multiclass classification. Experiments show that our algorithm outperforms more primitive methods for setting the gradient step size.
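The nonparametric setting can be sketched as follows: a stochastic gradient step in an RKHS appends one kernel term per example, so the step size directly scales each new coefficient. This toy uses a fixed step size on a 1-D regression task, where SMD would instead adapt the step online; the kernel width, rate, and target function are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * (a - b) ** 2)

# The hypothesis is a growing kernel expansion f(x) = sum_i c_i k(z_i, x).
centers, coeffs = [], []

def predict(x):
    return sum(c * rbf(z, x) for c, z in zip(coeffs, centers))

rng = np.random.default_rng(2)
eta = 0.5                            # fixed step size (SMD would adapt this)
for _ in range(300):
    x = rng.uniform(-3, 3)
    y = np.sin(x)                    # noiseless toy target
    err = predict(x) - y
    # functional gradient step on squared loss: f <- f - eta * err * k(x, .)
    centers.append(x)
    coeffs.append(-eta * err)

print(abs(predict(1.0) - np.sin(1.0)))
```

Because every step adds a term, practical online kernel methods also need buffer truncation or coefficient decay to bound the expansion; the step size (and hence SMD's adaptation of it) controls how much each new example reshapes the function.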