Results 1–10 of 13
Online Learning with a Perceptron
Europhysics Letters, 1994
Abstract

Cited by 3 (0 self)
We study online learning of a linearly separable rule with a simple perceptron. Training utilizes a sequence of uncorrelated, randomly drawn N-dimensional input examples. In the thermodynamic limit the generalization error after training with P such examples can be calculated exactly. For the standard perceptron algorithm it decreases like (N/P)^{1/3} for large P/N, in contrast to the faster (N/P)^{1/2} behavior of so-called Hebbian learning. Furthermore, we show that a specific parameter-free online scheme, the AdaTron algorithm, gives an asymptotic (N/P) decay of the generalization error. This coincides (up to a constant factor) with the bound for any training process based on random examples, including offline learning. Simulations confirm our results. PACS. 87.10, 02.50, 05.90. A very important feature of feedforward neural networks is their ability to learn a rule from examples [1, 2]. Methods known from statistical mechanics have been successfully used to s...
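A minimal sketch of the setting this abstract describes, assuming Gaussian random inputs and a normalized teacher vector (parameter values are illustrative; recovering the exact (N/P)^{1/3} rate would require averaging over many runs):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200                                  # input dimension
P = 20 * N                               # number of training examples

# hypothetical teacher defining the linearly separable rule
teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)

def generalization_error(w):
    """For isotropic Gaussian inputs, eps_g = arccos(overlap) / pi."""
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return 0.5
    overlap = np.clip(w @ teacher / norm, -1.0, 1.0)
    return float(np.arccos(overlap) / np.pi)

w = np.zeros(N)                          # student starts from scratch
for _ in range(P):
    x = rng.standard_normal(N)           # uncorrelated random example
    label = np.sign(teacher @ x)
    if np.sign(w @ x) != label:          # standard perceptron rule:
        w += label * x                   # update only on mistakes

print(f"eps_g after P = {P} examples: {generalization_error(w):.3f}")
```

Each example is seen once and then discarded, which is what distinguishes the online setting from batch training on a fixed set.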
Online Learning with Time-Correlated Examples
, 1998
Abstract

Cited by 3 (0 self)
We study the dynamics of online learning with time-correlated patterns. In this, we make a distinction between "small" networks and "large" networks. "Small" networks have a finite number of input units and are usually studied using tools from stochastic approximation theory in the limit of small learning parameters. "Large" networks have an extensive number of input units. A description in terms of individual weights is no longer useful, and tools from statistical mechanics can be applied to compute the evolution of macroscopic order parameters. We give general derivations for both cases, but in the end focus on the effect of correlations on plateaus. Plateaus are long time spans in which the performance of the networks hardly changes. Learning in both "small" and "large" multilayered perceptrons is often hampered by the presence of plateaus. The effect of correlations, however, appears to be quite different: they can have a huge beneficial effect in small networks, but seem to have ...
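One common way to generate the time-correlated pattern sequences this abstract refers to is an AR(1)-style process; the sketch below is an assumption about the setup, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 50            # number of input units
lam = 0.8         # assumed correlation between successive patterns
T = 2000          # length of the pattern sequence

x = rng.standard_normal(N)
patterns = np.empty((T, N))
for t in range(T):
    # AR(1)-style sequence: each pattern keeps a fraction lam of the
    # previous one, so successive examples are correlated in time while
    # each pattern is marginally a unit-variance Gaussian
    x = lam * x + np.sqrt(1.0 - lam ** 2) * rng.standard_normal(N)
    patterns[t] = x

# empirical per-component overlap between consecutive patterns (~ lam)
c = float(np.mean(np.sum(patterns[:-1] * patterns[1:], axis=1)) / N)
print(f"mean consecutive overlap per component: {c:.2f}")
```

Setting lam = 0 recovers the standard assumption of uncorrelated examples, so the single parameter lam interpolates between the two regimes the paper compares.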
Online Learning From Finite Training Sets
Europhysics Letters, 1998
Abstract

Cited by 2 (1 self)
We analyse online gradient descent learning from finite training sets at non-infinitesimal learning rates η for both linear and nonlinear networks. In the linear case, exact results are obtained for the time-dependent generalization error of networks with a large number of weights N, trained on p = αN examples. This allows us to study in detail the effects of finite training set size α on, for example, the optimal choice of learning rate η. We also compare online and offline learning, for respective optimal settings of η at given final learning time. Online learning turns out to be much more robust to input bias and actually outperforms offline learning when such bias is present; for unbiased inputs, online and offline learning perform almost equally well. Our analysis of online learning for nonlinear networks (namely, soft-committee machines) advances the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; thes...
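The linear case can be sketched directly: a student with N weights trained online by resampling from a fixed set of p = αN examples at a non-infinitesimal learning rate. The parameter values below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 100
alpha = 2.0                  # examples per weight
p = int(alpha * N)           # p = alpha * N training examples

teacher = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((p, N))
y = X @ teacher              # noiseless linear teacher outputs

def gen_error(w):
    # generalization error for Gaussian inputs: 0.5 * |w - teacher|^2
    return 0.5 * float(np.sum((w - teacher) ** 2))

# online learning: each step uses one example drawn at random from the
# finite training set (non-infinitesimal learning rate eta)
eta = 0.5
w = np.zeros(N)
for _ in range(50 * p):
    i = rng.integers(p)
    err = y[i] - w @ X[i]
    w += (eta / N) * err * X[i]

print(f"online gen. error at alpha={alpha}, eta={eta}: {gen_error(w):.4f}")
```

Because the same p examples recur, this differs from the infinite-stream setting above: for too-large η the weights do not converge, which is why the optimal choice of η as a function of α is a central question here.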
Learning in Two-Layered Networks with Correlated Examples
, 1997
Abstract

Cited by 2 (2 self)
Online learning in layered perceptrons is often hampered by plateaus in the time dependence of the performance. Studies on backpropagation in networks with a small number of input units have revealed that correlations between subsequently presented patterns shorten the length of such plateaus. We show how to extend the statistical mechanics framework to quantitatively check the effect of correlations on learning in networks with a large number of input units. The surprisingly compact description we obtain makes it possible to derive properties of online learning with correlations directly from studies on online learning without correlations. 1 Introduction. In recent years, considerable progress has been made in the study of online learning [1, 2, 3, 4]. The usual assumption is that presented examples are uncorrelated in time. This assumption is not only unnatural for biological learning systems, but also for artificial...
Statistical dynamics of online independent component analysis
 The Journal of Machine Learning Research, 4:1393
Abstract

Cited by 2 (0 self)
The learning dynamics of online independent component analysis is analysed in the limit of large data dimension. We study a simple Hebbian learning algorithm that can be used to separate out a small number of non-Gaussian components from a high-dimensional data set. The demixing matrix parameters are confined to a Stiefel manifold of tall, orthogonal matrices and we introduce a natural gradient variant of the algorithm which is appropriate to learning on this manifold. For large input dimension the parameter trajectory of both algorithms passes through a sequence of unstable fixed points, each described by a diffusion process in a polynomial potential. Choosing the learning rate too large increases the escape time from each of these fixed points, effectively trapping the learning in a suboptimal state. In order to avoid these trapping states a very low learning rate must be chosen during the learning transient, resulting in learning timescales of O(N^2) or O(N^3) iterations, where N is the data dimension. Escape from each suboptimal state results in a sequence of symmetry-breaking events as the algorithm learns each source in turn. This is in marked contrast to the learning dynamics displayed by related online learning algorithms for multilayer neural networks and principal component analysis. Although the natural gradient variant of the algorithm has nice asymptotic convergence properties, it has an equivalent transient dynamics to the standard Hebbian algorithm.
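A toy one-unit version of such a Hebbian ICA rule — not the paper's exact algorithm — extracts a single heavy-tailed source by kurtosis-seeking updates with renormalisation (the one-column case of the Stiefel constraint). All parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

N = 10            # data dimension
T = 50_000        # online examples
eta = 0.002       # small learning rate, as the transient analysis requires

# one heavy-tailed (Laplacian) source hidden among Gaussian directions
d = rng.standard_normal(N)
d /= np.linalg.norm(d)

def sample():
    x = rng.standard_normal(N)                   # Gaussian background
    s = rng.laplace(scale=1.0 / np.sqrt(2.0))    # unit-variance Laplace source
    return x + (s - x @ d) * d                   # place s along direction d

# warm start with overlap 0.4 so this toy run escapes the initial plateau
# quickly; random starts linger near unstable fixed points for a long time,
# which is precisely the trapping effect the paper analyses
r = rng.standard_normal(N)
r -= (r @ d) * d
r /= np.linalg.norm(r)
w = 0.4 * d + np.sqrt(1.0 - 0.16) * r

for _ in range(T):
    x = sample()
    y = w @ x
    w += eta * (y ** 3) * x          # Hebbian, kurtosis-seeking update
    w /= np.linalg.norm(w)           # project back onto the unit sphere

overlap = abs(float(w @ d))
print(f"overlap with the non-Gaussian direction: {overlap:.2f}")
```

The cubic nonlinearity makes the update a stochastic ascent of the fourth moment of the projection, which is flat in Gaussian directions and peaked along the Laplacian source.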
Learning Curves of Online and Offline Training
Abstract
The performance of online training is compared with offline or batch training using an unrealizable learning task. In naive offline training this task shows a tendency to strong overfitting; on the other hand, its optimal training scheme is known. In the regime where overfitting occurs, online training outperforms batch training quite easily. Asymptotically, offline training is better, but if the learning rate is chosen carefully online training remains competitive. 1 Introduction. Online training has attracted much attention in recent times; for up-to-date references see [4]. The main advantage of online training is that it uses only the last example and does not retrain with the previous ones. Furthermore, theoretical examinations of online learning are easier. Very recent works like [5] claim that optimized online learning schemes yield the same asymptotic convergence rate as offline or batch training. Online learning is very attractive, since it has seve...
A Statistical Theory of Learning
, 1997
Abstract
[Abstract in Japanese; garbled during text extraction. The recoverable fragments describe a learning system y = f(x; θ) (1) mapping inputs x = (x_1, …, x_m) ∈ R^m to outputs y = (y_1, …, y_l) ∈ R^l, and define a loss function d(x, y; θ) (2), assumed differentiable in the parameters θ, giving the loss incurred when the desired output for input x is y; examples are drawn from a probability distribution P.]
Annealed Online Learning in Multilayer Neural Networks
, 1998
"... In this article we will examine online learning with an annealed learning rate. Annealing the learning rate is necessary if online learning is to reach its optimal solution. With a fixed learning rate, the system will approximate the best solution only up to some fluctuations. These fluctuations are ..."
Abstract
In this article we will examine online learning with an annealed learning rate. Annealing the learning rate is necessary if online learning is to reach its optimal solution. With a fixed learning rate, the system will approximate the best solution only up to some fluctuations. These fluctuations are proportional to the size of the fixed learning rate. It has been shown that an optimal annealing can make online learning asymptotically efficient, meaning that asymptotically it learns as fast as possible. These results have until now only been realized in very simple networks, like single-layer perceptrons (section 3). Even the simplest multilayer network, the committee machine, shows an additional symptom which makes straightforward annealing ineffective. This is because, at the beginning of learning, the committee machine is attracted by a metastable, suboptimal solution (section 4). The system stays in this metastable solution for a long time and can only leave it if the learning rate is not too...
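The fixed-rate fluctuations versus annealing trade-off can be seen in the simplest possible stochastic approximation problem — estimating a mean online. This is a sketch of the general phenomenon, not the paper's committee-machine setting:

```python
import numpy as np

rng = np.random.default_rng(4)

mu = 1.5            # target value the online estimate should reach
T = 20_000

w_fixed, eta = 0.0, 0.1   # fixed rate: converges only up to ~eta fluctuations
w_ann = 0.0               # annealed rate eta_t = 1/t: fluctuations vanish

for t in range(1, T + 1):
    x = mu + rng.standard_normal()      # noisy observation of mu
    w_fixed += eta * (x - w_fixed)
    w_ann += (1.0 / t) * (x - w_ann)    # annealed update (1/t schedule)

print(f"fixed eta : |error| = {abs(w_fixed - mu):.3f}")
print(f"annealed  : |error| = {abs(w_ann - mu):.3f}")
```

With the 1/t schedule the estimate equals the running sample mean, so its error decays as 1/sqrt(t); the fixed-rate estimate keeps fluctuating at a scale set by η. The paper's point is that in multilayer networks naive annealing fails because the learning rate shrinks before the system has escaped its metastable plateau.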
A Dynamical Study of the Generalised Delta Rule
, 2000
"... The generalised delta rule is a powerful nonlinear distributed learning procedure capable of learning arbitrary mappings for artificial neural networks of any topology. Yet, the learning procedure is poorly specified in that it cannot specifically guarantee a solution for all solvable problems. Thi ..."
Abstract
The generalised delta rule is a powerful nonlinear distributed learning procedure capable of learning arbitrary mappings for artificial neural networks of any topology. Yet, the learning procedure is poorly specified in that it cannot specifically guarantee a solution for all solvable problems. This study focuses on developing a benchmarking procedure for the generalised delta rule that provides a visualisation of the complete dynamics of the procedure and allows optimisation of all the variables that comprise the system functions together. A number of dynamical modes of convergence for the procedure are shown, in particular universal convergence to global error minima. In a number of experiments with small networks, the procedure was found to exhibit regions of universal global convergence for particular system parameters. With each problem examined, a particular value or range of values for the learning rate parameter was found that tunes the network for optimal learning success. In conclusion, it was found that small values of the learning rate parameter are not necessarily optimal for obtaining global convergence. It is further conjectured that feedforward generalised delta rule networks have enough representational capacity to map any combinatorial Boolean logic function, that a convergence proof should exist for these problems, and that the long-term behaviour of the procedure should be tractable under universality theory.
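The generalised delta rule is standard backpropagation with sigmoid units; a minimal sketch on the classic XOR benchmark follows. The network size (2-2-1), weight scale, and learning rate are assumed values, and as the thesis notes, success depends on tuning them:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic benchmark for the generalised delta rule
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# 2-2-1 network with biases, random small initial weights
W1 = rng.normal(size=(2, 2))
b1 = np.zeros(2)
W2 = rng.normal(size=2)
b2 = 0.0
eta = 2.0                                        # learning rate (assumed)

for _ in range(20_000):
    h = sigmoid(X @ W1 + b1)                     # hidden layer, shape (4, 2)
    y = sigmoid(h @ W2 + b2)                     # output, shape (4,)
    delta_out = (y - T) * y * (1.0 - y)          # output-layer deltas
    delta_hid = np.outer(delta_out, W2) * h * (1.0 - h)  # backpropagated deltas
    W2 -= eta * h.T @ delta_out                  # generalised delta rule,
    b2 -= eta * delta_out.sum()                  # batch gradient descent
    W1 -= eta * X.T @ delta_hid
    b1 -= eta * delta_hid.sum(axis=0)

h = sigmoid(X @ W1 + b1)
y = sigmoid(h @ W2 + b2)
mse = float(np.mean((y - T) ** 2))
print(f"final mean squared error on XOR: {mse:.4f}")
```

Rerunning this with different seeds and learning rates exhibits exactly the behaviour the study benchmarks: some (seed, η) regions converge to the global minimum, while others stall on plateaus.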
Online Learning From Finite Training Sets and
Neural Computation, 1998
Abstract
In this paper, we give an exact analysis of online learning in a simple model system. Our aim is twofold: (1) to assess how the combination of non-infinitesimal learning rates η and finite training sets (containing α examples per weight) affects online learning, and (2) to compare the generalization performance of online and offline learning. A priori, one ... Online learning can also be used to learn teacher rules that vary in time. The assumption of an infinite set (or `stream') of training examples is then much more plausible, and in fact necessary for continued adaptation of the student. We do not consider this case in the following