Results 1-10 of 32
Efficient algorithms for minimizing cross validation error
 In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
Cited by 128 (6 self)
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is cheap and human expertise costly. Cross validation can then be a highly effective method for automatic model selection. Large scale cross validation search can, however, be computationally expensive. This paper introduces new algorithms to reduce the computational burden of such searches. We show how experimental design methods can achieve this, using a technique similar to a Bayesian version of Kaelbling’s Interval Estimation. Several improvements are then given, including (1) the use of blocking to quickly spot near-identical models, and (2) schemata search: a new method for quickly finding families of relevant features. Experiments are presented for robot data and noisy synthetic datasets. The new algorithms speed up computation without sacrificing reliability, and in some cases are more reliable than conventional techniques.
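For orientation, the baseline that this abstract sets out to speed up can be sketched as an exhaustive leave-one-out cross validation search. This is a minimal illustration only, not the paper's algorithm: the k-nearest-neighbour regressor, the function names, and the toy data are all assumptions made for the example.

```python
# Exhaustive cross validation search over a small set of candidate models.
# Assumption: the "models" are k-NN regressors that differ only in k.

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loocv_error(data, k):
    """Mean squared leave-one-out error of k-NN regression on data."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        held_in = data[:i] + data[i + 1:]  # every point except the i-th
        total += (knn_predict(held_in, x, k) - y) ** 2
    return total / len(data)

def select_model(data, candidate_ks):
    """Score every candidate with full LOOCV and return the best one."""
    scores = {k: loocv_error(data, k) for k in candidate_ks}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical toy data: y = 2x on ten evenly spaced points.
data = [(float(x), 2.0 * x) for x in range(10)]
best_k, scores = select_model(data, [1, 2, 5])
```

Every candidate is evaluated on every held-out point, so the cost grows with the number of models times the dataset size; that full evaluation is the expense the abstract refers to.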
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
, 1993
Prediction risk and architecture selection for neural networks
, 1994
Cited by 75 (2 self)
We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC) and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD), respectively.
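As a reminder of what an algebraic risk estimate looks like, the predicted squared error is commonly written in the literature in the form below. The notation is an assumption of this note and does not come from the excerpt: ASE is the average squared error on the training set, \hat{\sigma}^2 an estimate of the noise variance, p the number of model parameters, and N the number of training examples.

```latex
\mathrm{PSE} \;=\; \mathrm{ASE} \;+\; 2\,\hat{\sigma}^{2}\,\frac{p}{N},
\qquad
\mathrm{ASE} \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \hat{y}_i\bigr)^{2}
```

The generalized prediction error is usually described as having the same shape, with the parameter count and noise variance replaced by effective counterparts that account for regularization.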
A Model Selection Approach to Assessing the Information in the Term Structure Using Linear Models and Artificial Neural Networks
 Journal of Business and Economic Statistics
, 1992
Cited by 53 (12 self)
We take a model selection approach to the question of whether forward interest rates are useful in predicting future spot rates, using a variety of out-of-sample forecast-based model selection criteria: forecast mean squared error, forecast direction accuracy, and forecast-based trading system profitability. We also examine the usefulness of a class of novel prediction models called "artificial neural networks," and investigate the issue of appropriate window sizes for rolling-window-based prediction methods. Results indicate that the premium of the forward rate over the spot rate helps to predict the sign of future changes in the interest rate. Further, model selection based on an in-sample Schwarz Information Criterion (SIC) does not appear to be a reliable guide to out-of-sample performance, in the case of short-term interest rates. Thus, the in-sample SIC apparently fails to offer a convenient shortcut to true out-of-sample performance measures. Keywords: Artificial Neural Network...
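Two of the out-of-sample criteria named in this abstract, forecast mean squared error and forecast direction accuracy, can be illustrated with a rolling-window evaluation. The sketch below is hypothetical throughout: the forecaster (mean of recent changes), the function names, and the series values are assumptions for illustration, not the paper's models or data.

```python
# Rolling-window, one-step-ahead evaluation with two forecast-based criteria.

def rolling_forecasts(series, window, start):
    """Forecast each change from index `start` on as the mean of the
    previous `window` changes (a deliberately simple stand-in model)."""
    changes = [b - a for a, b in zip(series, series[1:])]
    preds, actuals = [], []
    for t in range(start, len(changes)):
        preds.append(sum(changes[t - window:t]) / window)
        actuals.append(changes[t])
    return preds, actuals

def forecast_mse(preds, actuals):
    """Mean squared error of the out-of-sample forecasts."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)

def direction_accuracy(preds, actuals):
    """Fraction of forecasts that agree with the outcome on whether
    the change is positive."""
    hits = sum(1 for p, a in zip(preds, actuals) if (p > 0) == (a > 0))
    return hits / len(preds)

def compare_windows(series, windows):
    """Evaluate every window size on the same forecast dates."""
    start = max(windows)  # common start so the scores are comparable
    results = {}
    for w in windows:
        preds, actuals = rolling_forecasts(series, w, start)
        results[w] = (forecast_mse(preds, actuals),
                      direction_accuracy(preds, actuals))
    return results

# Hypothetical rate series, for illustration only.
series = [5.0, 5.1, 5.3, 5.2, 5.4, 5.6, 5.5, 5.7, 5.9, 6.0]
results = compare_windows(series, [2, 4])
```

Starting all window sizes at the same forecast date keeps the comparison fair, since a shorter window would otherwise be scored on more forecasts than a longer one.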
Smoothing Spline ANOVA with Component-Wise Bayesian "Confidence Intervals"
 Journal of Computational and Graphical Statistics
, 1992
Cited by 44 (17 self)
We study a multivariate smoothing spline estimate of a function of several variables, based on an ANOVA decomposition as sums of main effect functions (of one variable), two-factor interaction functions (of two variables), etc. We derive the Bayesian "confidence intervals" for the components of this decomposition and demonstrate that, even with multiple smoothing parameters, they can be efficiently computed using the publicly available code RKPACK, which was originally designed just to compute the estimates. We carry out a small Monte Carlo study to see how closely the actual properties of these component-wise confidence intervals match their nominal confidence levels. Lastly, we analyze some lake acidity data as a function of calcium concentration, latitude, and longitude, using both polynomial and thin plate spline main effects in the same model. KEY WORDS: Bayesian "confidence intervals"; Multivariate function estimation; RKPACK; Smoothing spline ANOVA. Chong Gu chong@pop.stat.pur...
A Methodology for Feature Selection Using Multi-Objective Genetic Algorithms for Handwritten Digit String Recognition
 International Journal of Pattern Recognition and Artificial Intelligence
, 2003
Cited by 20 (8 self)
In this paper a methodology for feature selection for handwritten digit string recognition is proposed. Its novelty lies in the use of a multi-objective genetic algorithm where sensitivity analysis and a neural network are employed to allow the use of a representative database to evaluate fitness and the use of a validation database to identify the subsets of selected features that provide a good generalization. Some advantages of this approach include the ability to accommodate multiple criteria such as number of features and accuracy of the classifier, as well as the capacity to deal with huge databases in order to adequately represent the pattern recognition problem. Comprehensive experiments on the NIST SD19 demonstrate the feasibility of the proposed methodology.
Discriminative Training of Hidden Markov Models
, 1998
Cited by 20 (0 self)
Contents: Abbreviations; Notation. 1 Introduction. 2 Hidden Markov Models: 2.1 Definition; 2.2 HMM Modelling Assumptions; 2.3 HMM Topology; 2.4 Finding the Best Transcription; 2.5 Setting the Parameters; 2.6 Summary. 3 Objective Functions: 3.1 Properties of Maximum Likelihood Estimators; 3.2 Maximum Likelihood; 3.3 Maximum Mutual Information; 3.4 Frame Discrimination ...
Statistical Ideas for Selecting Network Architectures
 Invited Presentation, Neural Information Processing Systems 8
, 1995
Cited by 18 (3 self)
Choosing the architecture of a neural network is one of the most important problems in making neural networks practically useful, but accounts of applications usually sweep these details under the carpet. How many hidden units are needed? Should weight decay be used, and if so how much? What type of output units should be chosen? And so on. We address these issues within the framework of statistical theory for model choice, which provides a number of workable approximate answers. This paper is principally concerned with architecture selection issues for feedforward neural networks (also known as multilayer perceptrons). Many of the same issues arise in selecting radial basis function networks, recurrent networks and more widely. These problems occur in a much wider context within statistics, and applied statisticians have been selecting and combining models for decades. Two recent discussions are [4, 5]. References [3, 20, 21, 22] discuss neural networks from a statistical perspecti...
Characterizing the Generalization Performance of Model Selection Strategies
 In ICML97
, 1997
Cited by 15 (4 self)
We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: First, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity; if the postulated and true profiles do not match, then systematic underfitting or overfitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
Feature Selection with Neural Networks
 Behaviormetrika
, 1998
Cited by 15 (0 self)
Features gathered from the observation of a phenomenon are not all equally informative: some of them may be noisy, correlated or irrelevant. Feature selection aims at selecting a feature set that is relevant for a given task. This problem is complex and remains an important issue in many domains. In the field of neural networks, feature selection has been studied for the last ten years and classical as well as original methods have been employed. This paper is a review of neural network approaches to feature selection. We first briefly introduce baseline statistical methods used in regression and classification. We then describe families of methods which have been developed specifically for neural networks. Representative methods are then compared on different test problems. Keywords: Feature Selection, Subset Selection, Variable Sensitivity, Sequential Search. Sélection de Variables et Réseaux de Neurones, Philippe LERAY et Patrick GALLINARI. Résumé: Les données collectées lors de l'obse...
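The "Sequential Search" keyword refers to a family of greedy subset searches. A minimal sketch of one member, sequential forward selection, is given below under stated assumptions: the scoring function is supplied by the caller (for instance a validation accuracy), and the function names, the stopping rule, and the toy scorer are all hypothetical, not taken from the paper.

```python
# Sequential forward selection: grow the feature subset one feature at a
# time, keeping the addition that most improves a caller-supplied score.

def forward_selection(features, score, max_features=None):
    """Greedy forward search.

    score(subset) returns a float where higher is better. The search
    stops when no single addition improves the score or when
    max_features features have been selected.
    """
    selected = []
    best_score = score(selected)
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_best, feature = max(trials, key=lambda t: t[0])
        if trial_best <= best_score:
            break  # no candidate helps; stop early
        selected.append(feature)
        remaining.remove(feature)
        best_score = trial_best
    return selected, best_score

# Hypothetical scorer: "a" and "c" carry signal, and every selected
# feature costs 0.1, standing in for a validation-set measure.
def toy_score(subset):
    useful = {"a": 1.0, "c": 0.5}
    return sum(useful.get(f, 0.0) for f in subset) - 0.1 * len(subset)

selected, value = forward_selection(["a", "b", "c", "d"], toy_score)
```

Because the search is greedy it evaluates far fewer subsets than an exhaustive search, at the price of possibly missing features that are only useful in combination.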