In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [24]. The idea is to minimize an empirical estimate -- like the cross-validation estimate -- of the generalization error with respect to regularization parameters. This is done by employing a simple iterative gradient descent scheme using virtually no additional programming overhead compared to standard training. Experiments with feed-forward neural network models for time series prediction and classification tasks showed the viability and robustness of the algorithm. Moreover, we provided some simple theoretical examples in order to illustrate the potential and limitations of the proposed regularization framework.

1 Introduction

Neural networks are flexible tools for time series processing and pattern recognition. By increasing the number of hidden neurons in a 2-layer architec...
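The idea stated in the abstract, minimizing a validation estimate of the generalization error with respect to a regularization parameter by gradient descent, can be illustrated with a minimal sketch. This is not the paper's algorithm or code: it assumes a linear model with a single weight-decay parameter (here called kappa) in place of a neural network, a single hold-out validation split, and a fixed step size in the log domain. All function names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, kappa):
    """Minimize 0.5*||Xw - y||^2 + 0.5*kappa*||w||^2 (assumed training cost)."""
    A = X.T @ X + kappa * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ y)
    return w, A

def validation_error(Xv, yv, w):
    """Mean squared validation error (with a factor 1/2)."""
    r = Xv @ w - yv
    return 0.5 * np.mean(r ** 2)

def validation_gradient(Xv, yv, w, A):
    """dV/dkappa by the chain rule, using dw/dkappa = -A^{-1} w."""
    g = Xv.T @ (Xv @ w - yv) / len(yv)   # dV/dw
    return -g @ np.linalg.solve(A, w)

def adapt_kappa(Xt, yt, Xv, yv, kappa=1.0, step=0.5, iters=200):
    """Gradient descent on log(kappa), which keeps kappa positive."""
    log_kappa = np.log(kappa)
    history = []
    for _ in range(iters):
        kappa = np.exp(log_kappa)
        w, A = fit_ridge(Xt, yt, kappa)          # retrain for the current kappa
        history.append(validation_error(Xv, yv, w))
        grad = validation_gradient(Xv, yv, w, A) * kappa  # dV/dlog(kappa)
        log_kappa -= step * grad
    return np.exp(log_kappa), history
```

Each iteration retrains the weights for the current kappa and then takes one gradient step on the validation error. Because the training solution is available in closed form here, the derivative of the weights with respect to kappa is exact; for a nonlinear network it would have to be approximated around the trained weights.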
|
3359
|
Neural Networks for Pattern Recognition
– Bishop
- 1995
|
|
1409
|
Introduction to the Theory of Neural Computation
– Hertz, Krogh, et al.
- 1999
|
|
804
|
System Identification, Theory for the User
– Ljung
- 1987
|
|
617
|
Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Philadelphia
– Dennis, Schnabel
- 1996
|
|
456
|
Cross-validatory choice and assessment of statistical predictions, Journal of the
– Stone
- 1974
|
|
361
|
Optimal brain damage
– LeCun, Denker, et al.
- 1990
|
|
306
|
A practical Bayesian framework for back propagation networks
– MacKay
- 1992
|
|
163
|
Predicting the future: A connectionist approach
– Weigend, Huberman, et al.
- 1990
|
|
140
|
Approximation capabilities of multilayer feedforward networks
– Hornik
- 1991
|
|
113
|
Network information criterion-determining the number of hidden units for an artificial neural network model
– Murata, Yoshizawa, et al.
- 1994
|
|
110
|
Simplifying neural networks by soft weight-sharing
– Nowlan, Hinton
- 1992
|
|
81
|
Fitting autoregressive models for prediction
– Akaike
- 1969
|
|
66
|
Prediction risk and architecture selection for neural networks. In: From Statistics to Neural Networks
– Moody
|
|
47
|
The predictive sample reuse method with applications
– Geisser
- 1975
|
|
46
|
Bayesian regularization and pruning using a Laplace prior
– Williams
- 1995
|
|
43
|
Asymptotic statistical theory of overtraining and cross-validation
– Amari, Murata, et al.
- 1997
|
|
34
|
Curvature-driven smoothing: a learning algorithm for feedforward networks
– Bishop
- 1993
|
|
29
|
Generalization performance of regularized neural network models
– Larsen, Hansen
- 1994
|
|
27
|
Adaptive radial basis function nonlinearities, and the problem of generalisation
– Lowe
- 1989
|
|
22
|
A bound on the error of cross-validation using the approximation and estimation rates, with consequences for the training-test split
– Kearns
- 1996
|
|
18
|
Design of Neural Network Filters
– Larsen
- 1993
|
|
15
|
Linear Unlearning for Cross-Validation
– Hansen, Larsen
- 1996
|
|
15
|
Empirical Generalization Assessment of Neural Network Models
– Larsen
- 1995
|
|
15
|
Non-Linear System Identification with Neural Networks
– Sjoberg
- 1995
|
|
14
|
Note on free lunches and cross-validation
– Goutte
- 1997
|
|
9
|
Adaptive regularization of neural classifiers
– Andersen, Larsen, et al.
- 1997
|
|
9
|
Smoothing regularizers for projective basis function networks
– Moody, Rögnvaldsson
- 1997
|
|
8
|
Adaptive regularization
– Hansen, Rasmussen, et al.
- 1994
|
|
8
|
A Generalization Error Estimate for Nonlinear Systems
– Larsen
- 1992
|
|
8
|
Design and Regularization of Neural Networks: The Optimal Use of A Validation Set
– Larsen, Hansen, et al.
- 1996
|
|
8
|
System Identification: Theory for the User. Englewood Cliffs
– Ljung
- 1987
|
|
8
|
A dynamic neural network architecture by sequential partitioning of the input space
– Shadafan, Niranjan
- 1994
|
|
8
|
Designer networks for time series processing
– Svarer, Hansen, et al.
- 1993
|
|
8
|
The mathematics of search
– Wolpert, Macready
- 1995
|
|
7
|
Adaptive Regularization of Neural Networks Using Conjugate Gradient
– Goutte, Larsen
- 1998
|
|
7
|
A smoothing regularizer for feedforward and recurrent neural networks
– Wu, Moody
- 1996
|
|
6
|
Regularization with a Pruning Prior
– Goutte, Hansen
- 1997
|
|
6
|
Current status of Peterson-Barney vowel formant data
– Watrous
- 1991
|
|
5
|
Design and Evaluation of Neural Classifiers
– Hintz-Madsen, Pedersen, et al.
- 1996
|
|
3
|
Optimal Brain Damage
– Le Cun, Denker, Solla
- 1990
|
|
2
|
Control Methods Used in a Study of the Vowels
– Peterson
- 1952
|
|
2
|
No Free Lunch for Cross Validation. Neural Computation 8(7)
– Zhu, Rohwer
- 1996
|
|
1
|
Improving Generalization Performance in Character Recognition
– Drucker, Le Cun
- 1991
|
|
1
|
Pruning from Adaptive Regularization. Neural Computation 6
– Hansen, Rasmussen
- 1994
|
|
1
|
Design and Evaluation of Neural Classifiers
– Hintz-Madsen, Pedersen, et al.
- 1996
|
|
1
|
Optimal Data Set Split Ratio for Empirical Generalization Error Estimates
– Larsen, et al.
|
|
1
|
Training Recurrent Networks
– Pedersen
- 1997
|
|
1
|
Non-Linear System Identification with Neural Networks
– Sjoberg
- 1995
|