Results 1–10 of 13
Prediction risk and architecture selection for neural networks
, 1994
Abstract

Cited by 75 (2 self)
Abstract. We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC), and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD), respectively.
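The selection loop this abstract describes, estimating prediction risk per candidate model and keeping the best, can be illustrated with ordinary k-fold cross-validation. The sketch below is our own toy example (a mean model versus a line standing in for candidate architectures), not the paper's NCV, PSE, or GPE estimators.

```python
import random

def kfold_risk(xs, ys, fit, k=5, seed=0):
    """Estimate prediction risk (expected squared error on new
    observations) by k-fold cross-validation: hold out each fold,
    fit on the rest, and average the held-out squared errors."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total, count = 0.0, 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            total += (ys[i] - model(xs[i])) ** 2
            count += 1
    return total / count

def fit_mean(xs, ys):
    """Trivial candidate model: predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Slightly richer candidate: one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x
```

On data with a clear linear trend, `kfold_risk` ranks the line below the mean model, so the selection rule would keep the line; an architecture search applies the same comparison across candidate networks.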
Preventing "Overfitting" of Cross-Validation Data
 In Proceedings of the Fourteenth International Conference on Machine Learning
, 1997
Abstract

Cited by 36 (1 self)
Suppose that, for a learning task, we have to select one hypothesis out of a set of hypotheses (that may, for example, have been generated by multiple applications of a randomized learning algorithm). A common approach is to evaluate each hypothesis in the set on some previously unseen cross-validation data, and then to select the hypothesis that had the lowest cross-validation error. But when the cross-validation data is partially corrupted, such as by noise, and the set of hypotheses we are selecting from is large, then "folklore" also warns about "overfitting" the cross-validation data [Klockars and Sax, 1986; Tukey, 1949; Tukey, 1953]. In this paper, we explain how this "overfitting" really occurs, and show the surprising result that it can be overcome by selecting a hypothesis with a higher cross-validation error, over others with lower cross-validation errors. We give reasons for not selecting the hypothesis with the lowest cross-validation error, and propose a new algorithm, L...
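A minimal sketch of the selection rule the abstract starts from, picking the hypothesis with the lowest cross-validation error. The function names are ours, and this is the naive rule the paper argues against, not its proposed algorithm.

```python
def cv_errors(hypotheses, cv_x, cv_y):
    """Fraction of cross-validation points each hypothesis gets wrong."""
    return [sum(h(x) != y for x, y in zip(cv_x, cv_y)) / len(cv_y)
            for h in hypotheses]

def select_lowest_cv(hypotheses, cv_x, cv_y):
    """Naive rule: return the index and CV error of the hypothesis
    with the lowest cross-validation error."""
    errs = cv_errors(hypotheses, cv_x, cv_y)
    best = min(range(len(hypotheses)), key=errs.__getitem__)
    return best, errs[best]
```

With many hypotheses and a small, noisy validation set, the winning error is an optimistic estimate of the winner's true error; that selection bias is the "overfitting" the paper analyzes.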
Flat Minima
, 1997
Abstract

Cited by 32 (14 self)
this paper (available on the World Wide Web; see our home pages) contains pseudocode of an efficient implementation. It is based on fast multiplication of the Hessian and a vector due to Pearlmutter (1994) and Møller (1993).
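For illustration only: a Hessian-vector product of the kind mentioned here can be approximated by a finite difference of gradients. Pearlmutter's method computes it exactly at comparable cost; the helper below is our assumption, not the paper's pseudocode.

```python
def hessian_vector_product(grad, w, v, eps=1e-6):
    """Approximate the Hessian-vector product H v by a finite
    difference of gradients: Hv ~ (grad(w + eps*v) - grad(w)) / eps.
    Costs two gradient evaluations, regardless of dimension."""
    g0 = grad(w)
    g1 = grad([wi + eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]
```

For f(w) = w0^2 + 3*w1^2 the Hessian is diag(2, 6), and with v = [1, 1] the approximation recovers roughly [2, 6].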
Septic Shock Diagnosis by Neural Networks and Rule Based Systems
 in: L.C. Jain: Computational Intelligence Techniques In Medical Diagnosis And Prognosis
, 2001
Abstract

Cited by 4 (4 self)
In intensive care units, physicians are aware of the high lethality rate among septic shock patients. In this contribution we present typical problems and results of a retrospective, data-driven analysis based on two neural network methods applied to the data of two clinical studies.
Density estimation via cross-validation: Model selection point of view
, 2009
Abstract

Cited by 4 (2 self)
The problem of model selection by cross-validation is addressed in the density estimation framework. Extensively used in practice, cross-validation (CV) remains poorly understood, especially in the non-asymptotic setting, which is the main concern of this work. A recurrent problem with CV is the computation time it involves. This drawback is overcome here thanks to closed-form expressions for the CV estimator of the risk for a broad class of widespread estimators: projection estimators. In order to shed new light on CV procedures with respect to the cardinality p of the test set, the CV estimator is interpreted as a penalized criterion with a random penalty. For instance, the amount of penalization is shown to increase with p. A theoretical assessment of the CV performance is carried out thanks to two oracle inequalities applying to bounded and square-integrable densities, respectively. For several collections of models, adaptivity results with respect to Hölder and Besov spaces are derived as well.
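As a concrete instance of such closed-form expressions: the histogram (a projection estimator) admits a leave-one-out CV score that needs no refitting, since removing one observation only changes its own bin count. The sketch below uses the standard least-squares CV identity, with function names of our own choosing.

```python
import math

def hist_loo_cv_score(data, h, x0=0.0):
    """Least-squares leave-one-out CV score for a histogram with bin
    width h and origin x0 (smaller is better).  Closed form:
    score = integral(fhat^2) - (2/n) * sum_i fhat_{-i}(x_i),
    where fhat_{-i} is the histogram refit without observation i."""
    n = len(data)
    counts = {}
    for x in data:
        j = math.floor((x - x0) / h)
        counts[j] = counts.get(j, 0) + 1
    # integral of fhat^2: the density is nu_j/(n*h) on each bin of width h
    int_f2 = sum((nu / (n * h)) ** 2 * h for nu in counts.values())
    # leave-one-out density at x_i depends only on its own bin count:
    # (nu_j - 1) / ((n - 1) * h), so no refitting is needed
    loo = sum((counts[math.floor((x - x0) / h)] - 1) / ((n - 1) * h)
              for x in data)
    return int_f2 - (2.0 / n) * loo

def best_bin_width(data, candidates):
    """Model selection: keep the bin width minimizing the CV score."""
    return min(candidates, key=lambda h: hist_loo_cv_score(data, h))
```

For data confined to [0, 1) with a single bin (h = 1), the estimate is the uniform density, and the score works out to exactly -1.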
Flat Minimum Search Finds Simple Nets
, 1994
Abstract

Cited by 3 (2 self)
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function: a large connected region in weight space where the error remains approximately constant. An MDL-based argument shows that flat minima correspond to low expected overfitting. Although our algorithm requires the computation of second-order derivatives, it has backprop's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms (1) conventional backprop, (2) weight decay, and (3) "optimal brain surgeon" / "optimal brain damage".
Presentation Graphics
Abstract

Cited by 1 (0 self)
KEY WORDS: statistical graphics, charts. This paper briefly surveys the history of presentation graphics, principles of usage, and applications. To appear in the International Encyclopedia of the ... Figure 1 shows a graphic of death rates against birth rates per 100,000 population for 27 selected countries in a 1990 UN databank (Wilkinson, 1999). The contours show two concentrations of countries. One, to the left, ...
Cross-Validation Comparison of ...
Abstract
This paper is an initial report on the work NIST conducted immediately after the COCR conference. As such it is a provisional investigation of database quality. It is reasonable to conclude that the digits of SD3 are cleaner than those of TD1. However, the study is not experimentally flawless: it does not follow that the writers of SD3 wrote characters more neatly than those of TD1, only that the characters ultimately included in the database are cleaner. One reason for this is that SD3 and TD1, both obtained from fields of full-page forms, were arrived at with different segmenters. From a possible 65000 characters on each 500-form set, the final numbers of human-checked characters were 53449 (SD3) and 58646 (TD1). The SD3 segmenter, an old version, produced 9% fewer isolated characters than the updated model used for TD1, the principal reason for failure being connected characters. If the characters from SD3 that were not segmented resemble the difficult images that putatively characterize TD1, then the difference between the two databases may not be writer-letter dependent at all; rather, it would be a function of the writer connectivity that different writer groups use.
Outlier Removal for Prediction of Covariance Matrices, with an Application to Portfolio Optimization
, 2000
Abstract
In this paper a simple algorithm to improve the naive prediction by systematically removing outliers from the data is presented. Outlier detection is a well-established field in statistics; for a thorough introduction refer to Barnett and Lewis (1994). Our method is a leave-one-out approach which has previously been used for outlier detection in regression problems (Peña and Yohai, 1994). However, our algorithm performs repetitive updates of a contamination factor in a way that, to the authors' knowledge, is novel. The application to portfolio optimization is also new. Section 2 gives an introduction to the applicable parts of modern portfolio theory. Section 3 investigates the naive prediction empirically and determines a benchmark for the evaluation of the new algorithm, which is described in Section 4. The empirical test results are presented in Section 5, and Section 6 concludes the report with a summary of results and conclusions.
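In one dimension the leave-one-out idea reads roughly as below. This is a sketch under our own simplifications: sample variance stands in for a covariance matrix, and a fixed relative-reduction threshold stands in for the paper's iteratively updated contamination factor.

```python
def remove_outliers_loo(returns, max_removals=3, threshold=0.5):
    """Iteratively drop the single observation whose removal most
    reduces the sample variance, as long as the relative reduction
    exceeds `threshold` (a simplification of the paper's updated
    contamination factor)."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    data = list(returns)
    for _ in range(max_removals):
        if len(data) <= 2:
            break
        v = var(data)
        if v == 0.0:
            break
        # leave-one-out: variance of the sample with each point removed
        best_v, best_i = min((var(data[:i] + data[i + 1:]), i)
                             for i in range(len(data)))
        if (v - best_v) / v > threshold:
            data.pop(best_i)
        else:
            break
    return data
```

On a series of small returns with one gross value, the gross value is removed first, since dropping it collapses the variance; the loop then stops once no single removal clears the threshold.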