Generalization Performance Of Regularized Neural Network Models
 Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, 1994
Cited by 31 (8 self)
Architecture optimization is a fundamental problem of neural network modeling. The optimal architecture is defined as the one which minimizes the generalization error. This paper addresses estimation of the generalization performance of regularized, complete neural network models. Regularization normally improves the generalization performance by restricting the model complexity. A formula for the optimal weight decay regularizer is derived. A regularized model may be characterized by an effective number of weights (parameters); however, it is demonstrated that no simple definition is possible. A novel estimator of the average generalization error (called FPER) is suggested and compared to the Final Prediction Error (FPE) and Generalized Prediction Error (GPE) estimators. In addition, comparative numerical studies demonstrate the qualities of the suggested estimator. INTRODUCTION One of the fundamental problems involved in design of neural network models is architecture optimizatio...
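For orientation, the classical FPE that the abstract uses as a baseline can be sketched as below. This is the textbook Akaike form for an unregularized model with m parameters fitted on N samples, not the paper's FPER estimator, whose formula is not given in the abstract. The second helper uses one commonly quoted definition of the effective number of parameters under weight decay; it is an assumption here, and the abstract itself warns that no simple definition is fully adequate. Function and argument names are illustrative.

```python
def final_prediction_error(train_mse, n_samples, n_params):
    """Akaike's FPE: training MSE inflated by (N + m) / (N - m)."""
    if n_samples <= n_params:
        raise ValueError("FPE needs more samples than parameters")
    return train_mse * (n_samples + n_params) / (n_samples - n_params)


def effective_parameters(hessian_eigenvalues, weight_decay):
    """One common (assumed) definition: sum of lambda_i / (lambda_i + weight_decay).

    Directions with curvature well above the weight decay count as nearly
    one parameter each; directions well below it count as nearly zero.
    Assumes non-negative eigenvalues and a positive weight decay.
    """
    return sum(lam / (lam + weight_decay) for lam in hessian_eigenvalues)
```

With a weight decay of zero and strictly positive eigenvalues, the second helper returns the plain parameter count, so the two functions agree with the unregularized case.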
Upper and lower bounds on the learning curve for Gaussian processes
 Machine Learning, 1999
Cited by 27 (1 self)
In this paper we introduce and illustrate nontrivial upper and lower bounds on the learning curves for one-dimensional Gaussian processes.
Design of Neural Network Filters
 Electronics Institute, Technical University of Denmark, 1993
Cited by 21 (12 self)
The subject of the present licentiate thesis is the design of neural network filters. Filters based on neural networks can be viewed as extensions of the classical linear adaptive filter, aimed at modeling nonlinear relationships. The main emphasis is on a neural network implementation of the non-recursive, nonlinear adaptive model with additive noise. The aim is to clarify a number of phases associated with the design of neural network architectures for carrying out various "black-box" modeling tasks such as system identification, inverse modeling, and prediction of time series. The most significant contributions include the formulation of a neural-network-based canonical filter representation, which forms the basis for developing an architecture classification system. In essence, this amounts to a distinction between global and local models. As a result, a number of known neural network architectures can be classified, and the possibility of developing entirely new structures is opened up. In this context, a review of a number of well-known architectures is given. Particular emphasis is placed on the treatment of the multilayer perceptron neural network.
Pruning from Adaptive Regularization
 Neural Computation, 1993
Cited by 15 (3 self)
Inspired by the recent upsurge of interest in Bayesian methods, we consider adaptive regularization. A generalization-based scheme for adaptation of regularization parameters is introduced and compared to Bayesian regularization. We show that pruning arises naturally within both adaptive regularization schemes. As a model example we have chosen the simplest possible: estimating the mean of a random variable with known variance. Marked similarities are found between the two methods in that they both involve a "noise limit", below which they regularize with infinite weight decay, i.e., they prune. However, pruning is not always beneficial. We show explicitly that both methods in some cases may increase the generalization error. This corresponds to situations where the underlying assumptions of the regularizer are poorly matched to the environment.
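The "noise limit" can be illustrated in the setting the abstract names, estimating a mean with known noise variance. The sketch below is an illustration under stated assumptions rather than the paper's derivation: it assumes the cost (1/N) Σ (x_i − w)² + α w², for which the estimate is w = x̄ / (1 + α) and the weight decay minimizing the average generalization error is α = (σ²/N) / w_true². The unknown w_true² is replaced by the plug-in estimate x̄² − σ²/N, and the function name is hypothetical.

```python
import math


def adaptive_weight_decay_mean(samples, noise_var):
    """Return (estimate, alpha) for the mean under adaptive weight decay.

    When the squared sample mean does not exceed the noise floor
    noise_var / N, the plug-in optimal weight decay is infinite and the
    estimate is pruned to zero.
    """
    n = len(samples)
    mean = sum(samples) / n
    noise_floor = noise_var / n             # variance of the sample mean
    signal_sq = mean * mean - noise_floor   # plug-in estimate of w_true**2
    if signal_sq <= 0.0:
        return 0.0, math.inf                # below the noise limit: prune
    alpha = noise_floor / signal_sq
    return mean / (1.0 + alpha), alpha
```

For example, four samples all equal to 2.0 with unit noise variance give α = 1/15 and a shrunken estimate of 1.875, while a sample mean of zero is pruned outright.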
Adaptive Regularization
 In Proceedings of the 1994 IEEE NNSP Workshop, 1994
Cited by 8 (3 self)
Regularization, e.g., in the form of weight decay, is important for training and optimization of neural network architectures. In this work we provide a tool based on asymptotic sampling theory, for iterative estimation of weight decay parameters. The basic idea is to do a gradient descent in the estimated generalization error with respect to the regularization parameters. The scheme is implemented in our Designer Net framework for network training and pruning, i.e., is based on the diagonal Hessian approximation. The scheme does not require essential computational overhead in addition to what is needed for training and pruning. The viability of the approach is demonstrated in an experiment concerning prediction of the chaotic Mackey-Glass series. We find that the optimized weight decays are relatively large for densely connected networks in the initial pruning phase, while they decrease as pruning proceeds. INTRODUCTION Learning based on the conventional feedforward net may be ana...
Bayesian Averaging is Well-Temperated
 Proceedings of NIPS 99, 1999
Cited by 4 (1 self)
Bayesian predictions are stochastic, just like the predictions of any other inference scheme that generalizes from a finite sample. While a simple variational argument shows that Bayes averaging is generalization optimal given that the prior matches the teacher parameter distribution, the situation is less clear if the teacher distribution is unknown. I define a class of averaging procedures, the temperated likelihoods, including both Bayes averaging with a uniform prior and maximum likelihood estimation as special cases. I show that Bayes is generalization optimal in this family for any teacher distribution for two learning problems that are analytically tractable: learning the mean of a Gaussian and asymptotics of smooth learners. 1 Introduction Learning is the stochastic process of generalizing from a random finite sample of data. Often a learning problem has a natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e...
Learning and Generalisation in Radial Basis Function Networks
Cited by 1 (1 self)
The two-layer Radial Basis Function network, with fixed centres of the basis functions, is analysed within a stochastic training paradigm. Various definitions of generalisation error are considered, and two such definitions are employed in deriving generic learning curves and generalisation properties, both with and without a weight decay term. The generalisation error is shown analytically to be related to the evidence and, via the evidence, to the prediction error and free energy. The generalisation behaviour is explored; the generic learning curve is found to be inversely proportional to the number of training pairs presented. Optimisation of training is considered by minimising the generalisation error with respect to the free parameters of the training algorithms. Finally, the effect of the joint activations between hidden-layer units is examined and shown to speed training. Email: jason@cns.ed.ac.uk 1 Introduction Within the context of supervised learning in neural network...
Optimal Stopping and Effective Machine Complexity in Learning
 Advances in Neural Information Processing Systems 6, 1994
We study the problem of when to stop learning a class of feedforward networks (networks with linear output neurons and fixed input weights) when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the tradeoff between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of ne...
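The setting described here, a linear output layer on fixed features trained by gradient descent, can be sketched in a few lines. The paper derives its stopping criterion analytically; the sketch below swaps in a plain hold-out check, which is a different and simpler technique, to show that the best stopping step can precede convergence on the training set. All names, the learning rate and the step budget are illustrative assumptions.

```python
def mse(data, weights):
    """Mean squared error of a linear model on (features, target) pairs."""
    total = 0.0
    for features, target in data:
        pred = sum(w * x for w, x in zip(weights, features))
        total += (pred - target) ** 2
    return total / len(data)


def early_stopping_gd(train, valid, lr=0.1, max_steps=200):
    """Run batch gradient descent and remember the step with lowest hold-out error.

    Returns (best_step, best_valid_error, best_weights).
    """
    dim = len(train[0][0])
    weights = [0.0] * dim
    best = (0, mse(valid, weights), list(weights))
    for step in range(1, max_steps + 1):
        grad = [0.0] * dim
        for features, target in train:
            resid = sum(w * x for w, x in zip(weights, features)) - target
            for j in range(dim):
                grad[j] += 2.0 * resid * features[j] / len(train)
        weights = [w - lr * g for w, g in zip(weights, grad)]
        err = mse(valid, weights)
        if err < best[1]:
            best = (step, err, list(weights))
    return best
```

The function keeps training to the full step budget and reports the best step after the fact, which makes it easy to compare the stopped model with the fully trained one.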
Adaptive Regularization of Noisy Linear Inverse Problems
In the Bayesian modeling framework there is a close relation between regularization and the prior distribution over parameters. For prior distributions in the exponential family, we show that the optimal hyperparameter, i.e., the optimal strength of regularization, satisfies a simple relation: The expectation of the regularization function, i.e., takes the same value in the posterior and prior distribution. We present three examples: two simulations, and application in fMRI neuroimaging. 1. LINEAR INVERSE PROBLEMS Noisy linear inverse problems are of interest in data analysis, e.g., in astronomy, computerized tomography, early vision, electrocardiography, mathematical physics and metrology ([2]). Straightforward solutions in terms of matrix in
Pruning from Adaptive Regularization (communicated by David MacKay)
Inspired by the recent upsurge of interest in Bayesian methods, we consider adaptive regularization. A generalization-based scheme for adaptation of regularization parameters is introduced and compared to Bayesian regularization. We show that pruning arises naturally within both adaptive regularization schemes. As a model example we have chosen the simplest possible: estimating the mean of a random variable with known variance. Marked similarities are found between the two methods in that they both involve a "noise limit", below which they regularize with infinite weight decay, i.e., they prune. However, pruning is not always beneficial. We show explicitly that both methods in some cases may increase the generalization error. This corresponds to situations where the underlying assumptions of the regularizer are poorly matched to the environment.