
## Empirical Performance Assessment of Nonlinear Model Selection Techniques

### Citations

6580 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ...e network where the sum runs over all weights and biases: Ω = (1/2) Σᵢ wᵢ². It has been found empirically that a regularizer of this form can lead to significant improvements in network generalization [1]. Prediction Risk measures how well a model predicts the response value of a future observation. It can be estimated either by using resampling methods or algebraically, by using the asymptotic propert...
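The quadratic regularizer quoted in this context is simple enough to state directly. A minimal sketch (function name and data-layout are illustrative assumptions, not from the cited work):

```python
import numpy as np

# Minimal sketch of the quadratic regularizer quoted above,
# Omega = (1/2) * sum_i w_i^2, where the sum runs over all weights and biases.
def weight_decay_penalty(weights):
    """L2 penalty over a list of weight/bias arrays (name is illustrative)."""
    return 0.5 * sum(np.sum(w ** 2) for w in weights)

# Example with one weight matrix and one bias vector:
params = [np.array([[1.0, -2.0]]), np.array([0.5])]
omega = weight_decay_penalty(params)   # 0.5 * (1 + 4 + 0.25) = 2.625
```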

190 | The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems.
- Moody
- 1992
Citation Context: ...GPE(λ) = ε̂_Res + 2σ̂²·p̂_eff(λ)/n, where σ̂² is an estimate of the noise variance on the data and the regularization parameter λ controls the effective number of parameters p_eff(λ) of the solution. As suggested in [6] it is not possible to define a single quantity which expresses the effective number of weights in the model. p_eff(λ) usually differs from the true number of model parameters p and depends upon the am...
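For intuition about how p_eff(λ) falls below the true parameter count p, a linear (or locally linearized) ridge model gives a closed form: p_eff(λ) is the trace of the hat matrix, expressible through the singular values of the design matrix. This is a sketch of that special case only; Moody's p_eff for general nonlinear networks is more involved:

```python
import numpy as np

# Illustrative special case: for a linear model with ridge penalty lambda,
#   p_eff(lambda) = tr(X (X^T X + lambda I)^{-1} X^T)
#                 = sum_i d_i^2 / (d_i^2 + lambda),
# with d_i the singular values of X. As lambda grows, p_eff shrinks
# below the true number of parameters p.
def p_eff(X, lam):
    d = np.linalg.svd(X, compute_uv=False)   # singular values of X
    return np.sum(d ** 2 / (d ** 2 + lam))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
unregularized = p_eff(X, 0.0)    # equals p = 5 when lambda = 0
shrunk = p_eff(X, 10.0)          # strictly smaller than 5
```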

179 | Network information criterion - determining the number of hidden units for an artificial neural network model
- Murata, Yoshizawa, et al.
- 1994
Citation Context: ...nation of p_eff(λ) and σ̂. The effective number of parameters can then be used in a generalization of the AIC for the case of additive noise, denoted by Murata as NIC (Network Information Criterion) [8]. The underlying idea of NIC is to estimate the deviance for a data set of size n, compensating for the fact that the weights were chosen to fit the training set: NIC = n·log(ε̂_Res) + 2·p̂_eff(λ)...
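The NIC-style criterion quoted above can be sketched directly from the formula; function and variable names here are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of the criterion quoted above:
#   NIC = n * log(eps_Res) + 2 * p_eff(lambda)
# where eps_Res is the mean squared resubstitution (training) error and
# p_eff the effective number of parameters. Smaller NIC is preferred.
def nic(residuals, p_eff):
    n = len(residuals)
    eps_res = np.mean(np.asarray(residuals) ** 2)  # resubstitution error
    return n * np.log(eps_res) + 2.0 * p_eff

# A larger model with slightly lower training error but more effective
# parameters can still score worse than a smaller one:
nic_small = nic(np.full(100, 0.50), p_eff=4)    # simple model, mse = 0.25
nic_big = nic(np.full(100, 0.49), p_eff=12)     # complex model, mse = 0.2401
```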

84 | Prediction Risk and Architecture Selection for Neural Networks.
- Moody
- 1994
Citation Context: ... model selection techniques based on the Minimum Prediction Risk principle in regularized neural networks. Section 2 studies the Generalized Prediction Error for nonlinear systems introduced by Moody [7], which is based upon the notion of the effective number of parameters. Since it cannot be directly calculated, algebraic or resampling estimates are reviewed taking into account regularization terms i...

56 | Regression and Time Series Model Selection in Small Samples
- Hurvich, Tsai
- 1989
Citation Context: ...inear models and unbiased nonlinear models, such as Mallows' Cp estimate, the Generalized Cross-Validation (GCV) formula, Akaike's Final Prediction Error (FPE) and Akaike's Information Criterion (AIC) [5], etc. For general nonlinear learning systems, which may be biased and may include weight decay or other regularizers, Moody [7] was the first to introduce an estimate of Prediction Risk (2), the Genera...
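The classical criteria named in this context have simple closed forms for a linear model with p parameters fit to n points. A sketch of common textbook conventions (exact forms vary between references):

```python
import numpy as np

# Illustrative forms of the criteria named above; eps_res is the mean
# squared residual of the fitted model. Conventions vary between texts.
def fpe(eps_res, n, p):   # Akaike's Final Prediction Error
    return eps_res * (n + p) / (n - p)

def gcv(eps_res, n, p):   # Generalized Cross-Validation
    return eps_res / (1.0 - p / n) ** 2

def aic(eps_res, n, p):   # AIC for Gaussian noise, up to additive constants
    return n * np.log(eps_res) + 2 * p

# All three penalize extra parameters at a fixed training error:
more_complex = fpe(0.25, 100, 10)
less_complex = fpe(0.25, 100, 5)
```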

31 | Generalized performances of regularized neural networks models
- Larsen, Hansen
- 1994
Citation Context: ...del. Algebraic estimates are based on the idea that the resubstitution error ε_Res is a biased estimate of the Prediction Risk ε_PR; thus the following equality can be stated: ε_PR = ε_Res + Penalty_Term (3), where the penalty term represents a term which grows with the number of free parameters in the model. Thus, if the model is too simple it will give a large value for the criterion because the residua...
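The decomposition in this context implies a selection rule: estimate the Prediction Risk of each candidate as ε_Res plus a complexity penalty and keep the minimizer. A sketch with made-up numbers and an FPE-style penalty chosen purely for illustration:

```python
# Sketch of selection by minimum estimated Prediction Risk: among candidate
# sizes, pick the one minimizing eps_res + penalty. The penalty form
# eps_res * 2p / (n - p) is used here purely for illustration.
def select_model(candidates, n):
    """candidates: list of (p, eps_res) pairs; returns the chosen p."""
    def risk(p, eps_res):
        return eps_res + eps_res * 2.0 * p / (n - p)
    return min(candidates, key=lambda c: risk(*c))[0]

# Training error keeps falling with model size, but past some point the
# penalty term dominates and the estimated risk rises again:
best_p = select_model([(2, 1.00), (4, 0.40), (6, 0.38), (12, 0.37)], n=50)
```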

18 | Statistical ideas for selecting network architectures,” in Neural Networks
- Ripley
- 1997
Citation Context: ...ms suggests the use of regularization techniques, such as weight decay, in order to reduce the variability of the fit, at the cost of bias, since the fitted curve will be smoother than the true curve [9]. Regularization adds a penalty Ω to the error function ε to give: ε̂ = ε + λΩ (1), where the decay constant λ controls the extent to which the penalty term Ω influences the form of the solution. In pa...
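The regularized objective ε̂ = ε + λΩ with the quadratic penalty Ω = (1/2)‖w‖² explains the name "weight decay": the penalty's gradient adds λw to the data-error gradient, so each descent step shrinks the weights. A minimal sketch (function names and step sizes are illustrative assumptions):

```python
import numpy as np

# Minimal sketch of the regularized objective (1) quoted above,
# eps_hat = eps + lambda * Omega with Omega = (1/2) * ||w||^2.
def regularized_loss(eps, w, lam):
    return eps + lam * 0.5 * np.sum(w ** 2)

def decay_step(w, grad_eps, lam, lr):
    # d(eps_hat)/dw = d(eps)/dw + lambda * w, so the update shrinks w.
    return w - lr * (grad_eps + lam * w)

w = np.array([1.0, -2.0])
w_next = decay_step(w, grad_eps=np.zeros(2), lam=0.1, lr=0.5)
# with a zero data gradient the weights shrink by the factor (1 - lr*lam)
```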

9 | Principles of Neural Model Identification, Selection and Adequacy; With Applications to Financial Econometrics
- Zapranis, Refenes
- 1999
Citation Context: ...d Prediction Risk. While estimating Prediction Risk is important for providing a way of estimating the expected error for predictions made by a model, it is also an important tool for model selection [11]. Despite the huge amount of network theory and the importance of neural networks in applied work, there is still little published work about the assessment of which model selection method works best ...

4 | What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation
- Lawrence, Giles, Tsoi
- 1996
Citation Context: ...ranging from 1 to M. The training algorithm was Levenberg-Marquardt. For a network with H hidden units, the weights for the previously trained network were used to initialise H−1 of the hidden units (4), while the weights for the H-th hidden unit were generated from a pseudorandom normal distribution. The decay constant λ was fixed to 0.002. All simulations were performed 1000 times, each time genera...
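The warm-start initialisation described in this context can be sketched as follows; the weight shapes are assumptions (input-to-hidden matrix of shape (H, d), hidden-to-output vector of length H), not taken from the cited experiment:

```python
import numpy as np

# Hedged sketch of the warm-start scheme described above: when the candidate
# size grows from H-1 to H hidden units, reuse the trained weights for the
# first H-1 units and draw only the new unit's weights from a pseudorandom
# normal distribution.
def grow_network(w_in_prev, w_out_prev, rng):
    _, d = w_in_prev.shape
    w_in = np.vstack([w_in_prev, rng.standard_normal((1, d))])  # new hidden row
    w_out = np.append(w_out_prev, rng.standard_normal())        # new output weight
    return w_in, w_out

rng = np.random.default_rng(0)
w_in, w_out = grow_network(np.ones((2, 3)), np.ones(2), rng)    # H: 2 -> 3
```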

1 | Model Selection for Neural Networks: Comparing MDL and NIC
- Brake, Kok, Vitányi
- 1994
Citation Context: ...eria (AIC) [5], etc. For general nonlinear learning systems which may be biased and may include weight decay or other regularizers, Moody [7] was the first to introduce an estimate of Prediction Risk (2), the Generalized Prediction Error (GPE), which for a data sample of size n can be expressed as: GPE(λ) = ε̂_Res + 2σ̂²·p̂_eff(λ)/n, where σ̂² is an estimate of the noise variance on the data and th...