The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems
John E. Moody
Department of Computer Science, Yale University
P.O. Box 2158 Yale Station, New Haven, CT 06520-2158
We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions. The principal result is the following relationship (computed to second order) between the expected test set and training set errors:

$$\langle E_{\mathrm{test}}(\lambda) \rangle_{\xi'} \approx \langle E_{\mathrm{train}}(\lambda) \rangle_{\xi} + 2 \sigma^2_{\mathrm{eff}} \frac{p_{\mathrm{eff}}(\lambda)}{n}. \qquad (1)$$

Here, $n$ is the size of the training sample $\xi$, $\sigma^2_{\mathrm{eff}}$ is the effective noise variance in the response variable(s), $\lambda$ is a regularization or weight decay parameter, and $p_{\mathrm{eff}}(\lambda)$ is the effective number of parameters in the nonlinear model. The expectations $\langle \cdot \rangle$ of training set and test set errors are taken over possible training sets $\xi$ and training and test sets $\xi'$ respectively. The effective number of parameters $p_{\mathrm{eff}}(\lambda)$ usually differs from the true number of model parameters $p$ for nonlinear or regularized models; this theoretical conclusion is supported by M...
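
As a rough numerical illustration of relationship (1), the sketch below adds the complexity penalty to a given training error. The function names are hypothetical. The helper `ridge_effective_parameters` is an assumption that goes beyond the text above: it covers only the special case of a linear model with a quadratic (ridge) penalty, where the trace of the hat matrix is a standard measure of effective parameters. It is not the general nonlinear definition of the effective number of parameters.

```python
def estimated_test_error(train_error, noise_variance, p_eff, n):
    """Apply relationship (1): train error + 2 * sigma_eff^2 * p_eff / n."""
    if n <= 0:
        raise ValueError("n must be a positive training sample size")
    return train_error + 2.0 * noise_variance * p_eff / n


def ridge_effective_parameters(eigenvalues, lam):
    """ASSUMPTION (illustrative special case, not the general definition):
    for linear ridge regression, the trace of the hat matrix equals
    sum_i kappa_i / (kappa_i + lam), where kappa_i are the eigenvalues
    of X^T X and lam >= 0 is the weight decay parameter."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    total = 0.0
    for kappa in eigenvalues:
        denom = kappa + lam
        # A zero eigenvalue with lam == 0 contributes no effective parameter.
        if denom > 0:
            total += kappa / denom
    return total


# Hypothetical numbers chosen only for illustration.
kappas = [4.0, 1.0, 0.25]
p_unregularized = ridge_effective_parameters(kappas, 0.0)  # equals p = 3
p_regularized = ridge_effective_parameters(kappas, 1.0)    # smaller than p
estimate = estimated_test_error(0.10, 0.25, p_regularized, 100)
```

In this special case the effective number of parameters equals the true parameter count when the weight decay parameter is zero and shrinks as it grows, which in turn shrinks the gap between expected test and training error predicted by (1).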