Results 11–20 of 25
A Very Fast Learning Method for Neural Networks Based On Sensitivity
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... This paper introduces a learning method for twolayer feedforward neural networks based on sensitivity analysis, which uses a linear training algorithm for each of the two layers. First, random values are assigned to the outputs of the first layer; later, these initial values are updated based on ..."
Abstract

Cited by 8 (3 self)
This paper introduces a learning method for two-layer feedforward neural networks based on sensitivity analysis, which uses a linear training algorithm for each of the two layers. First, random values are assigned to the outputs of the first layer; later, these initial values are updated based on sensitivity formulas, which use the weights in each of the layers; the process is repeated until convergence. Since these ...
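The alternating scheme described in this abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: hidden-layer outputs start random, and each layer is then fit by a linear least-squares solve; the real method refines the hidden targets with sensitivity formulas, whereas here we simply re-project through the network.

```python
import numpy as np

def train_two_layer_linear(X, Y, n_hidden=10, n_iter=20, seed=0):
    # Sketch of the two-layer linear-training idea (hypothetical details):
    # Z holds the hidden-layer outputs, initialized at random.
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-0.9, 0.9, size=(X.shape[0], n_hidden))
    for _ in range(n_iter):
        # Layer 1: solving X @ W1 ~ atanh(Z) is linear in W1.
        W1, *_ = np.linalg.lstsq(X, np.arctanh(Z), rcond=None)
        Z = np.tanh(X @ W1)                          # actual hidden outputs
        # Layer 2: the output layer is linear, so W2 is a least-squares solve.
        W2, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return W1, W2
```

Each iteration involves only linear solves, which is what makes this family of methods fast compared with gradient descent on the full nonlinear problem.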
MEG Source Localization using an MLP with a Distributed Output Representation
"... We present a system that takes realistic magnetoencephalographic (MEG) signals and localizes a single dipole to reasonable accuracy in real time. At its heart is a multilayer perceptron (MLP) which takes the sensor measurements as inputs, uses one hidden layer, and generates as outputs the amplitude ..."
Abstract

Cited by 7 (4 self)
We present a system that takes realistic magnetoencephalographic (MEG) signals and localizes a single dipole to reasonable accuracy in real time. At its heart is a multilayer perceptron (MLP) which takes the sensor measurements as inputs, uses one hidden layer, and generates as outputs the amplitudes of receptive fields holding a distributed representation of the dipole location. We trained this SoftMLP on dipolar sources with real brain noise and converted the network's output into an explicit Cartesian coordinate representation of the dipole location using two different decoding strategies. The proposed SoftMLPs are much more accurate than previous networks which output source locations in Cartesian coordinates. Hybrid SoftMLP-start-LM systems, in which the SoftMLP output initializes Levenberg-Marquardt, retained their accuracy of 0.28 cm with a decrease in computation time from 36 ms to 30 ms. We apply the SoftMLP localizer to real MEG data separated by a blind source separation algorithm, and compare the SoftMLP dipole locations to those of a conventional system.
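One plausible decoding strategy for the distributed output representation described above is a center-of-mass readout: treat the receptive-field amplitudes as weights over the field centers and return their weighted average. The function below is an illustrative sketch, not necessarily one of the paper's two decoders.

```python
import numpy as np

def decode_center_of_mass(amplitudes, centers):
    # amplitudes: (K,) receptive-field outputs; centers: (K, 3) field centers.
    # Negative amplitudes are clipped so the weights form a valid average.
    w = np.clip(amplitudes, 0.0, None)
    return (w / w.sum()) @ centers          # (K,) @ (K, 3) -> (3,) location
```

A sharp peak over one receptive field decodes to that field's center; broader activation interpolates between centers, which is what gives the soft code its sub-grid resolution.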
A new margin-based criterion for efficient gradient descent
, 2003
"... Abstract. During the last few decades, several papers were published about secondorder optimization methods for gradient descent based learning algorithms. Unfortunately, these methods usually have a cost in time close to O(n 3) per iteration, and O(n 2) in space, where n is the number of parameter ..."
Abstract

Cited by 1 (1 self)
During the last few decades, several papers were published about second-order optimization methods for gradient-descent-based learning algorithms. Unfortunately, these methods usually have a cost in time close to O(n^3) per iteration, and O(n^2) in space, where n is the number of parameters to optimize, which is intractable for the large optimization systems usually found in real-life problems. Moreover, these methods are usually not easy to implement. Many enhancements have been proposed to overcome these problems, but most of them still cost O(n^2) in time per iteration. Instead of trying to solve a hard optimization problem using complex second-order tricks, we propose to modify the problem itself in order to optimize a simpler one, by simply changing the cost function used during training. Furthermore, we argue that analyzing the Hessian resulting from the choice of various cost functions is very informative and could help in the design of new machine learning algorithms. For instance, we propose in this paper a version of the Support Vector Machines criterion applied to Multi-Layer Perceptrons, which yields very good training and generalization performance in practice. Several empirical comparisons on two benchmark data sets are given to justify this approach.
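A minimal sketch of an SVM-style margin criterion applied to a network's scalar outputs, in the spirit of the abstract (the paper's exact criterion may differ): examples already beyond the margin contribute zero loss and zero gradient, which changes the effective Hessian relative to mean squared error.

```python
import numpy as np

def margin_hinge(scores, labels, margin=1.0):
    # scores: network outputs; labels in {-1, +1}.
    m = margin - labels * scores
    loss = np.mean(np.maximum(0.0, m))
    # Subgradient w.r.t. scores: only margin-violating examples contribute.
    dscores = np.where(m > 0, -labels, 0.0) / len(scores)
    return loss, dscores
```

The gradient with respect to the network weights then follows by backpropagating `dscores`, exactly as with any other cost function.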
A Convergence Analysis of Log-Linear Training
"... Loglinear models are widely used probability models for statistical pattern recognition. Typically, loglinear models are trained according to a convex criterion. In recent years, the interest in loglinear models has greatly increased. The optimization of loglinear model parameters is costly and ..."
Abstract

Cited by 1 (1 self)
Log-linear models are widely used probability models for statistical pattern recognition. Typically, log-linear models are trained according to a convex criterion. In recent years, the interest in log-linear models has greatly increased. The optimization of log-linear model parameters is costly and therefore an important topic, in particular for large-scale applications. Different optimization algorithms have been evaluated empirically in many papers. In this work, we analyze the optimization problem analytically and show that the training of log-linear models can be highly ill-conditioned. We verify our findings on two handwriting tasks. By making use of our convergence analysis, we obtain good results on a large-scale continuous handwriting recognition task with a simple and generic approach.
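The convex criterion in question is the negative log-likelihood of a softmax (log-linear) model. A minimal sketch, with hypothetical names: even though this objective is convex in the parameter matrix, its Hessian can be badly conditioned, which is the phenomenon the abstract analyzes.

```python
import numpy as np

def loglinear_nll_and_grad(Lam, X, y, n_classes):
    # Lam: (d, C) parameter matrix; X: (N, d) features; y: (N,) class indices.
    scores = X @ Lam
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)               # class posteriors
    n = len(X)
    nll = -np.mean(np.log(P[np.arange(n), y]))      # convex in Lam
    grad = X.T @ (P - np.eye(n_classes)[y]) / n
    return nll, grad
```

Because the criterion is convex and smooth, a sufficiently small gradient step always lowers it; how small "sufficiently" must be is governed by the conditioning the paper studies.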
Fast robust MEG source localization using MLPs
 Biomag 2002: 13th International Conference on Biomagnetism
, 2002
"... Source localization from MEG data in real time requires algorithms which are robust, fully automatic, and very fast. We present two neural network systems which are able to localize a single dipole to reasonable accuracy within a fraction of a millisecond, even when the signals are contaminated by c ..."
Abstract

Cited by 1 (0 self)
Source localization from MEG data in real time requires algorithms which are robust, fully automatic, and very fast. We present two neural network systems which are able to localize a single dipole to reasonable accuracy within a fraction of a millisecond, even when the signals are contaminated by considerable noise. The first network is a multilayer perceptron (MLP) which takes the sensor measurements as inputs, uses two hidden layers, and outputs source location in Cartesian coordinates. After training with random dipolar sources contaminated by real noise, localization of a single dipole could be performed within 300 microseconds on an 800 MHz Athlon workstation, with an average localization error of 1.15 cm. To improve the accuracy to 0.28 cm, one can apply a few iterations of conventional Levenberg-Marquardt (LM) minimization using the MLP output as the initial guess. The combined method is about twenty times faster than multi-start LM localization with comparable accuracy. In a second network with only one hidden layer, the outputs were the amplitudes of 193 evenly distributed Gaussian functions holding a soft distributed representation of the dipole location. We trained this network on dipolar sources with real noise, and externally converted the network's output into an explicit Cartesian coordinate representation of the dipole location. This new network had an improved localization accuracy of 0.87 cm, while localization time was lengthened to about 800 microseconds.
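The hybrid step described above (MLP output as the starting point for a few LM iterations) relies on a generic damped Gauss-Newton update. A minimal sketch, where `residual` and `jac` stand in for the MEG forward model (not shown here) and `x0` would be the MLP's fast estimate:

```python
import numpy as np

def lm_refine(residual, jac, x0, n_iter=25, lam=1e-2):
    # Generic Levenberg-Marquardt refinement from a good initial guess.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r, J = residual(x), jac(x)
        # Damped normal equations: (J^T J + lam I) step = -J^T r
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(x)), -J.T @ r)
        if np.sum(residual(x + step) ** 2) < np.sum(r ** 2):
            x, lam = x + step, lam * 0.5     # accept step, trust model more
        else:
            lam *= 2.0                        # reject step, damp harder
    return x
```

Because the MLP already lands close to the true location, only a few such iterations are needed, which is what makes the hybrid so much faster than multi-start LM from scratch.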
On-Line Stochastic Functional Smoothing Optimization for Neural Network Training
, 1997
"... : A set of new algorithms based on an online implementation of a well known global optimization strategy based on stochastic functional smoothing are proposed for training neural networks. These algorithms are different from other online global optimization approaches because they use not only fi ..."
Abstract

Cited by 1 (1 self)
A set of new algorithms, based on an on-line implementation of a well-known global optimization strategy using stochastic functional smoothing, is proposed for training neural networks. These algorithms differ from other on-line global optimization approaches because they use not only first-order but also second-order (Hessian) gradient information. Therefore, they converge faster than first-order gradient descent search methods. Convergence and sensitivity analyses of the proposed method are provided. The on-line algorithms are compared with a second-order gradient method, momentum learning, and conjugate gradients to demonstrate their consistent and global convergence abilities, and with a conventional stochastic global optimization scheme to demonstrate their faster learning rate. Computer simulation results are presented to support the analysis. Keywords: stochastic functional smoothing, on-line algorithms, global convergence, mean square error, ...
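The basic device behind functional smoothing can be sketched as a Monte-Carlo estimate of the gradient of the Gaussian-smoothed objective E[f(x + sigma*u)]. This first-order sketch (with a baseline subtracted for variance reduction) omits the second-order information the paper's on-line variants exploit:

```python
import numpy as np

def smoothed_gradient(f, x, sigma=0.1, n_samples=5000, seed=0):
    # Estimate grad of the smoothed objective E_u[f(x + sigma * u)],
    # u ~ N(0, I), via the score-function identity with a baseline f(x).
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_samples, len(x)))
    f0 = f(x)                                        # baseline (variance reduction)
    fx = np.array([f(x + sigma * ui) - f0 for ui in u])
    return (fx[:, None] * u).mean(axis=0) / sigma
```

Smoothing flattens sharp local minima, which is why descent on the smoothed objective has global-convergence properties that plain gradient descent lacks.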
Fast Robust Subject-Independent Magnetoencephalographic Source Localization Using an Artificial Neural Network
 Human Brain Mapping
, 2005
"... We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the pre ..."
Abstract

Cited by 1 (0 self)
We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP output as its initial guess took 15 ms and improved accuracy to 0.53 cm, which approaches the natural limit on accuracy imposed by noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually assisted commercial software. Hum Brain Mapp 24:21–34, 2005. © 2004 Wiley-Liss, Inc.
MEAN-NORMALIZED STOCHASTIC GRADIENT FOR LARGE-SCALE DEEP LEARNING
"... Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel secondorder stochastic optimization algorithm. The algorithm is based on analytic results showing that a nonzero mean of features is harmful for the optimization. We prove conver ..."
Abstract

Cited by 1 (1 self)
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining our proposed optimization algorithm with this model structure, model size can be reduced by a factor of eight while improvements in recognition error rate are still obtained. Additional gains are obtained by improving the Newbob learning rate strategy. Index Terms: deep learning, optimization, speech recognition, LVCSR
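The core idea, subtracting a running estimate of the feature mean before forming the weight gradient, can be sketched for one linear layer as follows. This is an illustrative simplification, not the paper's exact algorithm:

```python
import numpy as np

def mn_sgd_step(W, b, x, g_out, mu, lr=0.05, rho=0.99):
    # W: (d, k) weights; b: (k,) bias; x: (n, d) input batch;
    # g_out: (n, k) gradient w.r.t. the layer's pre-activations;
    # mu: (d,) running estimate of the input mean.
    mu = rho * mu + (1.0 - rho) * x.mean(axis=0)   # track the feature mean
    xc = x - mu                                    # centered features
    W = W - lr * xc.T @ g_out / len(x)             # update w.r.t. centered inputs
    b = b - lr * g_out.mean(axis=0)                # bias absorbs the mean
    return W, b, mu
```

Centering removes the large shared gradient direction a non-zero feature mean induces, which is the conditioning problem the abstract's analytic results identify.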
An Investigation of the Gradient Descent Process in Neural Networks
, 1996
"... Usually gradient descent is merely a way to find a minimum, abandoned if a more efficient technique is available. Here we investigate the detailed properties of the gradient descent process, and the related topics of how gradients can be computed, what the limitations on gradient descent are, and ..."
Abstract
Usually gradient descent is merely a way to find a minimum, abandoned if a more efficient technique is available. Here we investigate the detailed properties of the gradient descent process, and the related topics of how gradients can be computed, what the limitations on gradient descent are, and how the second-order information that governs the dynamics of gradient descent can be probed. To develop our intuitions, gradient descent is applied to a simple robot arm dynamics compensation problem, using backpropagation on a temporal windows architecture. The results suggest that smooth filters can be easily learned, but that the deterministic gradient descent process can be slow and can exhibit oscillations. Algorithms to compute the gradient of recurrent networks are then surveyed in a general framework, leading to some unifications, a deeper understanding of recurrent networks, and some algorithmic extensions. By regarding deterministic gradient descent as a dynamic system we obtain results concerning its convergence, and a quantitative theory of its behavior ...
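One standard way to probe the second-order information that governs gradient-descent dynamics, sketched here under the assumption that only a gradient oracle is available: power iteration on finite-difference Hessian-vector products estimates the largest Hessian eigenvalue, which bounds the stable learning rate (roughly lr < 2 / lambda_max).

```python
import numpy as np

def top_curvature(grad, x, n_iter=60, eps=1e-4, seed=0):
    # Estimate the largest Hessian eigenvalue at x using only grad().
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(len(x))
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(n_iter):
        # Finite-difference Hessian-vector product: H v ~ (g(x+ev)-g(x-ev))/2e
        hv = (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)
        lam = v @ hv                      # Rayleigh quotient estimate
        v = hv / np.linalg.norm(hv)
    return lam
```

For a quadratic loss this recovers the exact top eigenvalue; near a minimum of a general loss it gives the curvature that determines whether gradient descent converges smoothly or oscillates.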
Neural net architectures for temporal sequence processing
 pp. 243–264, Redwood City, CA: Addison-Wesley Publishing
, 2007
"... I present a general taxonomy of neural net architectures for processing timevarying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes timevarying patterns requir ..."
Abstract
I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes time-varying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events and an associator that uses the short-term memory to classify or predict. My taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar–Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared. Neural networks have proven to be a promising alternative to traditional techniques for nonlinear temporal prediction tasks (e.g., Curtiss, Brandemuehl, & Kreider, 1992; Lapedes & Farber, 1987; Weigend, Huberman, & Rumelhart, 1992). However, temporal prediction is a particularly challenging problem because conventional neural net architectures and algorithms are not well ...
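The simplest short-term memory in such a taxonomy is a tapped delay line: a fixed window of raw past values, with an associator (e.g. an MLP) mapping each window to a prediction of the next value. A minimal sketch of building such windows from a series:

```python
import numpy as np

def delay_line_memory(series, window):
    # Each row of X holds the `window` most recent values; y is the
    # next value the associator should predict from that row.
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - window:t] for t in range(window, len(series))])
    y = series[window:]
    return X, y
```

In the taxonomy's terms this memory has fixed form (a window), raw content (unprocessed past inputs), and no adaptability; richer memories vary each of those dimensions.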