Generalization Performance Of Regularized Neural Network Models
- Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, 1994
Architecture optimization is a fundamental problem of neural network modeling. The optimal architecture is defined as the one which minimizes the generalization error. This paper addresses estimation of the generalization performance of regularized, complete neural network models. Regularization normally improves the generalization performance by restricting the model complexity. A formula for the optimal weight decay regularizer is derived. A regularized model may be characterized by an effective number of weights (parameters); however, it is demonstrated that no simple definition is possible. A novel estimator of the average generalization error (called FPER) is suggested and compared to the Final Prediction Error (FPE) and Generalized Prediction Error (GPE) estimators. In addition, comparative numerical studies demonstrate the qualities of the suggested estimator.
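As a rough illustration of two quantities named in this abstract, the sketch below computes Akaike's FPE from a training error, and one commonly used definition of the effective number of parameters under weight decay. The function names, the eigenvalue-based formula, and the scaling of `alpha` relative to the Hessian eigenvalues are assumptions made for illustration. The paper's FPER estimator is not reproduced here, and the abstract itself states that no simple definition of the effective number of weights is possible.

```python
def fpe(train_mse, n_samples, n_params):
    """Akaike's Final Prediction Error for an unregularized model:
    training MSE scaled by (N + m) / (N - m)."""
    if n_samples <= n_params:
        raise ValueError("FPE needs more samples than parameters")
    return train_mse * (n_samples + n_params) / (n_samples - n_params)


def effective_parameters(hessian_eigenvalues, alpha):
    """Assumed definition (not taken from the paper): each direction with
    curvature lam counts as lam / (lam + alpha) of a parameter.
    Requires lam + alpha > 0 for every eigenvalue."""
    return sum(lam / (lam + alpha) for lam in hessian_eigenvalues)


if __name__ == "__main__":
    print(fpe(1.0, n_samples=100, n_params=10))
    print(effective_parameters([1.0, 1.0, 1.0], alpha=1.0))
```

With `alpha = 0` the effective count equals the number of weights, and it shrinks as `alpha` grows, which is the sense in which regularization restricts model complexity.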
Design of Neural Network Filters
- Electronics Institute, Technical University of Denmark, 1993
The subject of the present licentiate thesis is the design of neural network filters. Filters based on neural networks can be viewed as extensions of the classical linear adaptive filter, aimed at modeling nonlinear relationships. The main emphasis is placed on a neural network implementation of the non-recursive, nonlinear adaptive model with additive noise. The aim is to clarify a number of phases involved in the design of neural network architectures for carrying out various "black-box" modeling tasks such as system identification, inverse modeling, and time series prediction. The most significant contributions include the formulation of a neural-network-based canonical filter representation, which forms the basis for developing an architecture classification system. In essence, this concerns a distinction between global and local models. As a result, a number of known neural network architectures can be classified, and the possibility of developing entirely new structures is opened up. In this context, a review of a number of well-known architectures is given. Particular emphasis is placed on the treatment of the multi-layer perceptron neural network.
Upper and lower bounds on the learning curve for Gaussian processes
- Machine Learning, 1999
In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian Processes.
Pruning from Adaptive Regularization
- Neural Computation, 1993
Inspired by the recent upsurge of interest in Bayesian methods, we consider adaptive regularization. A generalization-based scheme for adaptation of regularization parameters is introduced and compared to Bayesian regularization. We show that pruning arises naturally within both adaptive regularization schemes. As a model example we have chosen the simplest possible: estimating the mean of a random variable with known variance. Marked similarities are found between the two methods in that they both involve a "noise limit", below which they regularize with infinite weight decay, i.e., they prune. However, pruning is not always beneficial. We show explicitly that both methods may in some cases increase the generalization error. This corresponds to situations where the underlying assumptions of the regularizer are poorly matched to the environment.
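The model example in this abstract can be sketched in a few lines. The code below assumes the cost is the mean squared error plus `alpha * w**2`, so the regularized estimate of the mean is the sample mean divided by `1 + alpha`. The plug-in rule for choosing `alpha` is an illustrative assumption, not the scheme derived in the paper; it simply shows how a "noise limit" can lead to infinite weight decay, i.e., pruning. All function names are hypothetical.

```python
import math


def ridge_mean(ybar, alpha):
    """Minimizer of mean((y - w)**2) + alpha * w**2, given the sample mean."""
    if math.isinf(alpha):
        return 0.0  # infinite weight decay prunes the parameter
    return ybar / (1.0 + alpha)


def excess_error(alpha, w0, sigma2, n):
    """Expected squared distance between the estimate and the true mean w0:
    variance sigma2/n plus squared bias, both shrunk by 1/(1 + alpha)."""
    if math.isinf(alpha):
        return w0 ** 2
    return (sigma2 / n + alpha ** 2 * w0 ** 2) / (1.0 + alpha) ** 2


def plug_in_alpha(ybar, sigma2, n):
    """Assumed adaptive rule: the error above is minimized at
    alpha = (sigma2/n) / w0**2; replace the unknown w0**2 by the
    unbiased estimate ybar**2 - sigma2/n and prune if it is not positive."""
    w0_sq_est = ybar ** 2 - sigma2 / n
    if w0_sq_est <= 0.0:
        return math.inf  # sample mean is below the noise limit
    return (sigma2 / n) / w0_sq_est


if __name__ == "__main__":
    for ybar in (0.1, 1.0):
        alpha = plug_in_alpha(ybar, sigma2=1.0, n=10)
        print(ybar, alpha, ridge_mean(ybar, alpha))
```

Under this rule a sample mean whose square falls below `sigma2 / n` is set to zero. If the true mean is in fact far from zero, that choice raises the error from `sigma2 / n` to `w0**2`, which is one way pruning can hurt generalization.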
Adaptive Regularization
- Proceedings of the 1994 IEEE NNSP Workshop, 1994
Regularization, e.g., in the form of weight decay, is important for training and optimization of neural network architectures. In this work we provide a tool, based on asymptotic sampling theory, for iterative estimation of weight decay parameters. The basic idea is to do a gradient descent in the estimated generalization error with respect to the regularization parameters. The scheme is implemented in our Designer Net framework for network training and pruning, i.e., is based on the diagonal Hessian approximation. The scheme does not require essential computational overhead in addition to what is needed for training and pruning. The viability of the approach is demonstrated in an experiment concerning prediction of the chaotic Mackey-Glass series. We find that the optimized weight decays are relatively large for densely connected networks in the initial pruning phase, while they decrease as pruning proceeds.
Bayesian Averaging is Well-Temperated
- Proceedings of NIPS 99, 1999
Bayesian predictions are stochastic, just like predictions of any other inference scheme that generalizes from a finite sample. While a simple variational argument shows that Bayes averaging is generalization optimal given that the prior matches the teacher parameter distribution, the situation is less clear if the teacher distribution is unknown. I define a class of averaging procedures, the temperated likelihoods, including both Bayes averaging with a uniform prior and maximum likelihood estimation as special cases. I show that Bayes is generalization optimal in this family for any teacher distribution for two learning problems that are analytically tractable: learning the mean of a Gaussian and asymptotics of smooth learners.
Learning and Generalisation in Radial Basis Function Networks
The two-layer Radial Basis Function network, with fixed centres of the basis functions, is analysed within a stochastic training paradigm. Various definitions of generalisation error are considered, and two such definitions are employed in deriving generic learning curves and generalisation properties, both with and without a weight decay term. The generalisation error is shown analytically to be related to the evidence and, via the evidence, to the prediction error and free energy. The generalisation behaviour is explored; the generic learning curve is found to be inversely proportional to the number of training pairs presented. Optimisation of training is considered by minimising the generalisation error with respect to the free parameters of the training algorithms. Finally, the effect of the joint activations between hidden-layer units is examined and shown to speed training.
Email: jason@cns.ed.ac.uk
Optimal Stopping and Effective Machine Complexity in Learning
- Advances in Neural Information Processing Systems 6, 1994
We study the problem of when to stop learning a class of feedforward networks (networks with linear output neurons and fixed input weights) when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance during learning and, in particular, that the network generalizes better when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the trade-off between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of ne...
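The setting described in this abstract, a model that is linear in its trainable weights and trained by gradient descent, can be illustrated with a minimal early-stopping sketch. Everything here is an assumption for illustration: the synthetic data, the learning rate, and the use of a held-out validation set to pick the stopping time. The paper derives its stopping criterion analytically, and that derivation is not reproduced.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def make_data(n, true_w, noise, rng):
    """Hypothetical synthetic task: Gaussian inputs, noisy linear targets."""
    xs = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    ys = [predict(true_w, x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def train_with_early_stopping(train, valid, dim, lr=0.05, steps=200):
    """Batch gradient descent on the training MSE, remembering the weights
    with the lowest validation MSE seen so far."""
    (xs, ys), (vx, vy) = train, valid
    w = [0.0] * dim
    best_err, best_step, best_w = mse(w, vx, vy), 0, list(w)
    history = []
    for t in range(1, steps + 1):
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / len(ys)
        w = [wi - lr * g for wi, g in zip(w, grad)]
        v = mse(w, vx, vy)
        history.append(v)
        if v < best_err:
            best_err, best_step, best_w = v, t, list(w)
    return best_err, best_step, best_w, history


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [1.0, -0.5, 0.0, 0.0, 0.25]
    train = make_data(10, true_w, noise=1.0, rng=rng)
    valid = make_data(50, true_w, noise=1.0, rng=rng)
    best_err, best_step, _, history = train_with_early_stopping(train, valid, 5)
    print(best_step, best_err, history[-1])
```

Comparing `best_step` with the total number of steps shows whether, on a given draw of data, the validation error bottomed out before the training error reached its minimum.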

