Results 1 -
7 of
7
A New Metric-Based Approach to Model Selection
- In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97
, 1997
"... We introduce a new approach to model selection that performs better than the standard complexitypenalization and hold-out error estimation techniques in many cases. The basic idea is to exploit the intrinsic metric structure of a hypothesis space, as determined by the natural distribution of unlabel ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
We introduce a new approach to model selection that performs better than the standard complexitypenalization and hold-out error estimation techniques in many cases. The basic idea is to exploit the intrinsic metric structure of a hypothesis space, as determined by the natural distribution of unlabeled training patterns, and use this metric as a reference to detect whether the empirical error estimates derived from a small (labeled) training sample can be trusted in the region around an empirically optimal hypothesis. Using simple metric intuitions we develop new geometric strategies for detecting overfitting and performing robust yet responsive model selection in spaces of candidate functions. These new metric-based strategies dramatically outperform previous approaches in experimental studies of classical polynomial curve fitting. Moreover, the technique is simple, efficient, and can be applied to most function learning tasks. The only requirement is access to an auxiliary collection ...
Metric-Based Methods for Adaptive Model Selection and Regularization
- Machine Learning
, 2001
"... We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the di ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
We present a general approach to model selection and regularization that exploits unlabeled data to adaptively control hypothesis complexity in supervised learning tasks. The idea is to impose a metric structure on hypotheses by determining the discrepancy between their predictions across the distribution of unlabeled data. We show how this metric can be used to detect untrustworthy training error estimates, and devise novel model selection strategies that exhibit theoretical guarantees against over-tting (while still avoiding under- tting). We then extend the approach to derive a general training criterion for supervised learning|yielding an adaptive regularization method that uses unlabeled data to automatically set regularization parameters. This new criterion adjusts its regularization level to the specic set of training data received, and performs well on a variety of regression and conditional density estimation tasks. The only proviso for these methods is that s...
Characterizing the Generalization Performance of Model Selection Strategies
- In ICML-97
, 1997
"... : We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding o ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
: We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: First, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity--- if the postulated and true profiles do not match, then systematic under-fitting or over-fitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
An Adaptive Regularization Criterion for Supervised Learning
- Proceedings of ICML'2000
, 2000
"... We introduce a new regularization criterion that exploits unlabeled data to adaptively control hypothesis-complexity in general supervised learning tasks. The technique is based on an abstract metric-space view of supervised learning that has been successfully applied to model selection in pre ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
We introduce a new regularization criterion that exploits unlabeled data to adaptively control hypothesis-complexity in general supervised learning tasks. The technique is based on an abstract metric-space view of supervised learning that has been successfully applied to model selection in previous research. The new regularization criterion we introduce involves no free parameters and yet performs well on a variety of regression and conditional density estimation tasks. The only proviso is that sucient unlabeled training data be available. We demonstrate the eectiveness of our approach on learning radial basis functions and polynomials for regression, and learning logistic regression models for conditional density estimation. 1. Introduction In the canonical supervised learning task one is given a training set hx 1 ; y 1 i; :::; hx t ; y t i and attempts to infer a hypothesis function h : X ! Y that achieves a small prediction error err(h(x); y) on future test exampl...
Ordering and finding the best of K>2 supervised learning algorithms
- IEEE T. Pattern. Anal
, 2006
"... Abstract—Given a data set and a number of supervised learning algorithms, we would like to find the algorithm with the smallest expected error. Existing pairwise tests allow a comparison of two algorithms only; range tests and ANOVA check whether multiple algorithms have the same expected error and ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Abstract—Given a data set and a number of supervised learning algorithms, we would like to find the algorithm with the smallest expected error. Existing pairwise tests allow a comparison of two algorithms only; range tests and ANOVA check whether multiple algorithms have the same expected error and cannot be used for finding the smallest. We propose a methodology, the MultiTest algorithm, whereby we order supervised learning algorithms taking into account 1) the result of pairwise statistical tests on expected error (what the data tells us), and 2) our prior preferences, e.g., due to complexity. We define the problem in graph-theoretic terms and propose an algorithm to find the “best ” learning algorithm in terms of these two criteria, or in the more general case, order learning algorithms in terms of their “goodness. ” Simulation results using five classification algorithms on 30 data sets indicate the utility of the method. Our proposed method can be generalized to regression and other loss functions by using a suitable pairwise test. Index Terms—Machine learning, classifier design and evaluation, experimental design. æ 1
Characterizing the Generalization Performance of Model Selection Strategies
, 1997
"... : We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding o ..."
Abstract
- Add to MetaCart
: We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: First, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity--- if the postulated and true profiles do not match, then systematic under-fitting or over-fitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
CROSS-VALIDATION FOR SUPERVISED LEARNING
, 2005
"... I would like to thank Prof. Dr. Ethem Alpaydın for his supervision in this Ph.D. study and for his support and advises. I am also grateful to all my friends, my family, my teachers and especially my love AYSEL (ISELL). I would also thank to all Matlab helpers including Oya Aran, Onur Dikmen, Berk Gö ..."
Abstract
- Add to MetaCart
I would like to thank Prof. Dr. Ethem Alpaydın for his supervision in this Ph.D. study and for his support and advises. I am also grateful to all my friends, my family, my teachers and especially my love AYSEL (ISELL). I would also thank to all Matlab helpers including Oya Aran, Onur Dikmen, Berk Gökberk, Itır Karaç, Mehmet Aydın Ula¸s. I also want to thank Arzucan Özgür for her support in printing and submitting this thesis. Without their support, this thesis would not been accomplished.

