Results 1 - 5 of 5
Prediction risk and architecture selection for neural networks
, 1994
Abstract

Cited by 75 (2 self)
Abstract. We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC), and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD), respectively.
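The algebraic risk estimates named in this abstract penalize training error by model size. A minimal sketch of selecting an architecture by a PSE-style criterion, assuming the simplified form PSE = MSE + 2·σ²·p/n (the function name, noise variance, and candidate numbers here are illustrative, not taken from the paper):

```python
def predicted_squared_error(train_mse, noise_var, n_params, n_samples):
    """PSE-style prediction-risk estimate: training error plus a
    complexity penalty growing with the parameter count p.
    Simplified illustrative form: PSE = MSE + 2 * sigma^2 * p / n."""
    return train_mse + 2.0 * noise_var * n_params / n_samples

# Among hypothetical candidate architectures, pick the lowest estimated risk.
# Larger nets fit the training data better but pay a bigger penalty.
candidates = {
    "3 hidden units": predicted_squared_error(0.040, 0.01, 31, 200),
    "8 hidden units": predicted_squared_error(0.024, 0.01, 81, 200),
    "15 hidden units": predicted_squared_error(0.018, 0.01, 151, 200),
}
best = min(candidates, key=candidates.get)
```

This mirrors the paper's point that estimated prediction risk, rather than training error alone, should drive model selection: the mid-sized net wins even though the largest net has the lowest training MSE.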
Flat Minima
, 1997
Abstract

Cited by 32 (14 self)
This paper (available on the World Wide Web; see our home pages) contains pseudocode of an efficient implementation. It is based on fast multiplication of the Hessian and a vector due to Pearlmutter (1994) and Møller (1993). Acknowledgments
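The fast Hessian-vector multiplication cited above avoids ever forming the full Hessian. Pearlmutter's actual method is an exact R-operator pass through the network; as a rough sketch of the same idea, one can approximate Hv by differencing gradients along direction v, at the cost of only two gradient evaluations (the finite-difference gradient here is an illustrative stand-in for backpropagation):

```python
def grad(f, w, h=1e-5):
    """Central-difference gradient of f at w (stand-in for backprop)."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g

def hessian_vector_product(f, w, v, eps=1e-4):
    """Hv ~ (grad(w + eps*v) - grad(w - eps*v)) / (2*eps):
    two gradient evaluations instead of an O(p^2) Hessian."""
    wp = [wi + eps * vi for wi, vi in zip(w, v)]
    wm = [wi - eps * vi for wi, vi in zip(w, v)]
    gp, gm = grad(f, wp), grad(f, wm)
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

# Quadratic check: f(w) = w0^2 + 3*w1^2 has Hessian diag(2, 6),
# so H @ [1, 1] should come out close to [2, 6].
f = lambda w: w[0] ** 2 + 3 * w[1] ** 2
hv = hessian_vector_product(f, [0.5, -1.0], [1.0, 1.0])
```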
Structural adaptation and generalization in supervised feedforward networks
 Artif. Neural Networks
, 1994
Abstract

Cited by 31 (22 self)
This work explores diverse techniques for improving the generalization ability of supervised feedforward neural networks via structural adaptation, and introduces a new network structure with sparse connectivity. Pruning methods, which start from a large network and proceed by trimming it until a satisfactory solution is reached, are studied first. Then, construction methods, which build a network from a simple initial configuration, are presented. A survey of related results from the disciplines of function approximation theory, nonparametric statistical inference and estimation theory leads to methods for principled architecture selection and estimation of prediction error. A network based on sparse connectivity is proposed as an alternative approach to adaptive networks. The generalization ability of this network is improved by partly decoupling the outputs. We perform numerical simulations and provide comparative results for both classification and regression problems to show the generalization abilities of the sparse network.
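Pruning methods of the kind studied here rank weights by an estimate of how much the error grows when each is deleted. A minimal sketch in the style of Optimal Brain Damage, assuming its diagonal-Hessian saliency 0.5·H_ii·w_i² (the weight and Hessian values below are made up for illustration):

```python
def obd_saliencies(weights, hessian_diag):
    """OBD-style saliency: estimated error increase from deleting
    weight i, using the diagonal-Hessian approximation 0.5 * H_ii * w_i^2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]

def prune(weights, hessian_diag, k):
    """Zero out the k weights with the smallest saliency."""
    sal = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda i: sal[i])
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

# Small weights with moderate curvature are the cheapest to remove.
w = [0.9, -0.05, 1.4, 0.02, -0.6]
h = [1.0, 2.0, 0.5, 4.0, 1.0]
pruned = prune(w, h, 2)
```

In practice a pruning pass like this alternates with retraining, trimming until the estimated prediction error starts to rise.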
Economic Forecasting: Challenges and Neural Network Solutions
 In Proceedings of the International Symposium on Artificial Neural Networks
, 1995
Abstract

Cited by 7 (0 self)
Macroeconomic forecasting is a very difficult task due to the lack of an accurate, convincing model of the economy. The most accurate models for economic forecasting, "black box" time series models, assume little about the structure of the economy. Constructing reliable time series models is challenging due to short data series, high noise levels, nonstationarities, and nonlinear effects. This paper describes these challenges and surveys some neural network solutions to them. Important issues include balancing the bias/variance tradeoff and the noise/nonstationarity tradeoff. The methods surveyed include hyperparameter selection (regularization parameter and training window length), input variable selection and pruning, network architecture selection and pruning, new smoothing regularizers, and committee forecasts. Empirical results are presented for forecasting the U.S. Index of Industrial Production. These demonstrate that, relative to conventional linear time series and regression m...
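Among the methods this paper surveys, committee forecasts are the simplest to sketch: averaging the outputs of several independently trained networks reduces forecast variance, by up to a factor of the committee size when member errors are uncorrelated. A minimal equal-weight version (the member forecasts below are hypothetical numbers, not from the paper):

```python
def committee_forecast(predictions):
    """Equal-weight committee: average the member forecasts at each
    time step. predictions is a list of per-member forecast series."""
    per_step = list(zip(*predictions))
    return [sum(step) / len(step) for step in per_step]

# Three hypothetical one- and two-step-ahead forecasts of an index:
forecasts = [
    [101.2, 103.0],
    [99.8, 102.2],
    [100.6, 101.8],
]
combo = committee_forecast(forecasts)
```

Weighted or trimmed combinations are common refinements, but even the plain average often beats every individual member on noisy economic series.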
Flat Minimum Search Finds Simple Nets
, 1994
Abstract

Cited by 3 (2 self)
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based argument shows that flat minima correspond to low expected overfitting. Although our algorithm requires the computation of second-order derivatives, it has backprop's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms (1) conventional backprop, (2) weight decay, and (3) "optimal brain surgeon" / "optimal brain damage".
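The defining property above, that error stays approximately constant over a large region of weight space, can be probed empirically. The paper's algorithm uses second-order information; the sketch below is only a crude sampling-based proxy (the loss functions and radius are illustrative assumptions): perturb the weights randomly and record the worst error increase, which is small at a flat minimum and large at a sharp one.

```python
import random

def flatness(loss, w, radius=0.1, trials=200, seed=0):
    """Empirical flatness probe: worst loss increase over random weight
    perturbations within the given radius. Flat minima score low."""
    rng = random.Random(seed)
    base = loss(w)
    worst = 0.0
    for _ in range(trials):
        delta = [rng.uniform(-radius, radius) for _ in w]
        shifted = [wi + di for wi, di in zip(w, delta)]
        worst = max(worst, loss(shifted) - base)
    return worst

# A sharp quadratic bowl vs. a flat one, both minimized at the origin:
sharp = lambda w: 50.0 * sum(wi * wi for wi in w)
flat = lambda w: 0.5 * sum(wi * wi for wi in w)
```

With the same seeded perturbations, the sharp bowl's score exceeds the flat one's by the ratio of their curvatures, matching the intuition that flatness is a proxy for low description length and hence low expected overfitting.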