A New Metric-Based Approach to Model Selection
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97)
, 1997
Abstract

Cited by 42 (5 self)
We introduce a new approach to model selection that performs better than the standard complexity-penalization and holdout error estimation techniques in many cases. The basic idea is to exploit the intrinsic metric structure of a hypothesis space, as determined by the natural distribution of unlabeled training patterns, and use this metric as a reference to detect whether the empirical error estimates derived from a small (labeled) training sample can be trusted in the region around an empirically optimal hypothesis. Using simple metric intuitions we develop new geometric strategies for detecting overfitting and performing robust yet responsive model selection in spaces of candidate functions. These new metric-based strategies dramatically outperform previous approaches in experimental studies of classical polynomial curve fitting. Moreover, the technique is simple, efficient, and can be applied to most function learning tasks. The only requirement is access to an auxiliary collection ...
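The central quantities in the abstract above can be illustrated with a short sketch. All function names here are hypothetical; the "metric" is taken to be the mean absolute disagreement between two hypotheses' predictions over a pool of unlabeled inputs, one simple instantiation of the intrinsic metric structure the paper describes, and the trust check uses the triangle inequality: the true errors of two hypotheses can differ by at most their true distance, so an empirical error gap exceeding the unlabeled-data distance signals untrustworthy small-sample estimates.

```python
import random

def prediction_metric(h1, h2, unlabeled_xs):
    """Distance between two hypotheses: mean absolute disagreement
    of their predictions over a pool of unlabeled inputs."""
    return sum(abs(h1(x) - h2(x)) for x in unlabeled_xs) / len(unlabeled_xs)

def empirical_error(h, labeled):
    """Mean absolute error of h on a (small) labeled sample."""
    return sum(abs(h(x) - y) for x, y in labeled) / len(labeled)

def triangle_violation(h1, h2, labeled, unlabeled_xs):
    """Trust check: an empirical error gap larger than the unlabeled-data
    distance between h1 and h2 is geometrically impossible for the true
    errors, so it flags the small-sample error estimates as unreliable."""
    gap = abs(empirical_error(h1, labeled) - empirical_error(h2, labeled))
    return gap > prediction_metric(h1, h2, unlabeled_xs)

# Toy usage: two hypotheses for a 1-d curve-fitting task.
random.seed(0)
unlabeled = [random.uniform(-1, 1) for _ in range(1000)]
labeled = [(x, x ** 2) for x in (-0.5, 0.0, 0.5)]
h_const = lambda x: 0.25   # degree-0 fit
h_quad = lambda x: x * x   # degree-2 fit
print(triangle_violation(h_const, h_quad, labeled, unlabeled))
```

Here the check passes (prints False): the small empirical error gap is comfortably bounded by the hypotheses' distance on the unlabeled pool.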
Metric-Based Methods for Adaptive Model Selection and Regularization
 Machine Learning
, 2001
Abstract

Cited by 20 (0 self)
We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the distribution of unlabeled data. We show how this metric can be used to detect untrustworthy training error estimates, and devise novel model selection strategies that exhibit theoretical guarantees against overfitting (while still avoiding underfitting). We then extend the approach to derive a general training criterion for supervised learning, yielding an adaptive regularization method that uses unlabeled data to automatically set regularization parameters. This new criterion adjusts its regularization level to the specific set of training data received, and performs well on a variety of regression and conditional density estimation tasks. The only proviso for these methods is that s...
Characterizing the Generalization Performance of Model Selection Strategies
 In ICML-97
, 1997
Abstract

Cited by 15 (4 self)
We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: First, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity; if the postulated and true profiles do not match, then systematic underfitting or overfitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
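The bias and variance profiles the abstract refers to can be estimated empirically by refitting each model class on many resampled training sets. The following Monte Carlo sketch (the polynomial task, sample sizes, and helper names are illustrative, not taken from the paper) traces how squared bias shrinks and variance grows over a sequence of polynomial hypothesis classes:

```python
import numpy as np

def bias_variance_profile(target, degrees, n_train=20, trials=200,
                          noise=0.1, seed=0):
    """Monte Carlo estimate of squared bias and variance of polynomial
    least-squares fits, averaged over a fixed test grid, per degree."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1, 1, 50)
    y_true = target(x_test)
    out = {}
    for d in degrees:
        preds = []
        for _ in range(trials):
            # Fresh noisy training sample each trial.
            x = rng.uniform(-1, 1, n_train)
            y = target(x) + rng.normal(0, noise, n_train)
            coeffs = np.polyfit(x, y, d)
            preds.append(np.polyval(coeffs, x_test))
        preds = np.array(preds)
        bias2 = np.mean((preds.mean(axis=0) - y_true) ** 2)
        var = np.mean(preds.var(axis=0))
        out[d] = (bias2, var)
    return out

# Bias falls and variance rises with degree -- the profile shape that
# determines whether a fixed penalty under- or over-penalizes.
profile = bias_variance_profile(np.sin, degrees=[1, 3, 9])
for d, (b2, v) in profile.items():
    print(d, round(b2, 5), round(v, 5))
```

Comparing such an empirical profile against the profile implicitly postulated by a penalty term is exactly the mismatch the abstract describes.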
Model Selection for Small Sample Regression
, 2002
Abstract

Cited by 15 (2 self)
Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimator of the ratio of the expected training error to the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix.
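The key ingredient of such a penalty, the eigenvalue spectrum of the input covariance matrix, is straightforward to compute. This sketch shows only the spectrum computation, not the paper's specific estimator of the training-to-generalization error ratio; the function name and toy data are illustrative:

```python
import numpy as np

def input_covariance_spectrum(X):
    """Eigenvalues of the empirical input covariance matrix, sorted in
    decreasing order. Small-sample penalties of the kind described above
    are functions of this spectrum rather than of a raw parameter count."""
    Xc = X - X.mean(axis=0)            # center the inputs
    cov = Xc.T @ Xc / len(X)           # empirical covariance matrix
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

# Toy inputs with deliberately unequal scales per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
print(input_covariance_spectrum(X))
```

A sharply decaying spectrum indicates that the effective dimensionality of the regression problem is lower than the nominal number of features, which is why spectrum-aware penalties behave better than fixed ones at small sample sizes.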
An Adaptive Regularization Criterion for Supervised Learning
 Proceedings of ICML'2000
, 2000
Abstract

Cited by 8 (2 self)
We introduce a new regularization criterion that exploits unlabeled data to adaptively control hypothesis complexity in general supervised learning tasks. The technique is based on an abstract metric-space view of supervised learning that has been successfully applied to model selection in previous research. The new regularization criterion we introduce involves no free parameters and yet performs well on a variety of regression and conditional density estimation tasks. The only proviso is that sufficient unlabeled training data be available. We demonstrate the effectiveness of our approach on learning radial basis functions and polynomials for regression, and learning logistic regression models for conditional density estimation.
1. Introduction
In the canonical supervised learning task one is given a training set ⟨x_1, y_1⟩, ..., ⟨x_t, y_t⟩ and attempts to infer a hypothesis function h : X → Y that achieves a small prediction error err(h(x), y) on future test exampl...