Results 11–20 of 30
Characterizing the Generalization Performance of Model Selection Strategies
 In ICML '97
, 1997
Abstract

Cited by 15 (4 self)
We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: First, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity; if the postulated and true profiles do not match, then systematic underfitting or overfitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
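The abstract's central claim can be illustrated with a toy simulation (not from the paper; the bias and variance profiles below are hypothetical): selection by penalized training error recovers the best class only when the penalty matches the true variance profile, and a too-small or too-large penalty overfits or underfits respectively.

```python
import numpy as np

# Toy illustration: bias falls and variance rises with model-class index k;
# the training error is optimistic by roughly the variance.
ks = np.arange(1, 11)
bias2 = 1.0 / ks                 # hypothetical bias profile
var = 0.02 * ks ** 2             # hypothetical variance profile
gen_err = bias2 + var            # true generalization error
train_err = bias2                # optimistic empirical error

def select(penalty):
    """Pick the class minimizing train error + penalty."""
    return ks[np.argmin(train_err + penalty)]

best = ks[np.argmin(gen_err)]    # class a clairvoyant selector would pick
matched = select(var)            # penalty equals the true variance profile
too_small = select(0.1 * var)    # underestimated variance -> overfits
too_large = select(10 * var)     # overestimated variance -> underfits
```

With these profiles the matched penalty recovers the optimal class, while the scaled penalties drift toward more or less complex classes, exactly the systematic over/underfitting the abstract describes.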
Model Selection for Small Sample Regression
, 2002
Abstract

Cited by 15 (2 self)
Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimator of the ratio of the expected training error and the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix.
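The paper's exact estimator involves the eigenvalues of the input covariance matrix; as a simpler classical stand-in for the same idea (correcting training error by an estimate of the training/generalization ratio), the sketch below applies Akaike's final prediction error (FPE) correction to a small-sample least-squares fit. The data and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 5                       # small sample, as in the paper's setting
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Ordinary least squares and its (optimistic) training error.
w = np.linalg.lstsq(X, y, rcond=None)[0]
train_mse = float(np.mean((y - X @ w) ** 2))

# Classical small-sample correction (Akaike's FPE): inflate the training
# error by (n + d) / (n - d) to estimate generalization error.  This is a
# stand-in for the paper's eigenvalue-based ratio estimator, not its method.
fpe = train_mse * (n + d) / (n - d)
```

Selecting the model (e.g., feature subset) minimizing `fpe` rather than `train_mse` penalizes dimensionality more heavily as `d` approaches `n`, which is the regime the paper targets.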
Extensions to Metric-Based Model Selection
 Journal of Machine Learning Research
, 2003
Abstract

Cited by 15 (1 self)
Metric-based methods have recently been introduced for model selection and regularization, often yielding very significant improvements over the alternatives tried (including cross-validation). All these methods require unlabeled data over which to compare functions and detect gross differences in behavior away from the training points. We introduce three new extensions of the metric model selection methods and apply them to feature selection. The first extension takes advantage of the particular case of time-series data in which the task involves prediction with a horizon h. The idea is to use at t the h unlabeled examples that precede t for model selection. The second extension takes advantage of the different error distributions of cross-validation and the metric methods: cross-validation tends to have a larger variance and is unbiased. A hybrid combining the two model selection methods is rarely beaten by either of the two methods. The third extension deals with the case when unlabeled data is not available at all, using an estimated input density. Experiments are described to study these extensions in the context of capacity control and feature subset selection.
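The core mechanism these extensions build on, comparing hypotheses on unlabeled inputs to detect divergence away from the training points, can be sketched as follows. This is a toy rendition of the idea, not any of the paper's algorithms; the polynomial model sequence, data, and threshold-free ratio are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
f_true = lambda x: np.sin(2 * np.pi * x)
x_tr = rng.uniform(size=15)
y_tr = f_true(x_tr) + 0.3 * rng.normal(size=15)
x_un = rng.uniform(size=500)              # unlabeled inputs, no targets needed

def fit(deg):
    """Least-squares polynomial of the given degree (hypothetical model sequence)."""
    c = np.polyfit(x_tr, y_tr, deg)
    return lambda x: np.polyval(c, x)

def dist(g, h, xs):
    """RMS disagreement between two hypotheses over a set of inputs."""
    return float(np.sqrt(np.mean((g(xs) - h(xs)) ** 2)))

# One form of the metric idea: successive hypotheses that agree on the
# training inputs but diverge on unlabeled inputs signal overfitting;
# the ratio below grows when that happens.
ratios = [dist(fit(d), fit(d + 1), x_un) / (dist(fit(d), fit(d + 1), x_tr) + 1e-12)
          for d in range(1, 9)]
```

A selection rule would stop increasing the degree once the unlabeled-to-labeled disagreement ratio blows up, which needs no labeled validation data at all.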
Simple, robust, scalable semi-supervised learning via expectation regularization
 The 24th International Conference on Machine Learning
, 2007
Abstract

Cited by 11 (0 self)
Although semi-supervised learning has been an active area of research, its use in deployed applications is still relatively rare because the methods are often difficult to implement, fragile in tuning, or lacking in scalability. This paper presents expectation regularization, a semi-supervised learning method for exponential-family parametric models that augments the traditional conditional label-likelihood objective function with an additional term that encourages model predictions on unlabeled data to match certain expectations, such as label priors. The method is extremely easy to implement, scales as well as logistic regression, and can handle non-independent features. We present experiments on five different data sets, showing accuracy improvements over other semi-supervised methods.
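A minimal sketch of the augmented objective for binary logistic regression, assuming (hypothetically) a KL-divergence penalty between the label prior and the mean prediction on unlabeled data; this captures the shape of the idea but is not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xr_objective(w, Xl, yl, Xu, prior, lam=1.0):
    """Conditional label log-likelihood plus expectation regularization:
    penalize the KL divergence from the label prior to the mean predicted
    label distribution on unlabeled data (a sketch, not the paper's exact
    form).  Maximize the returned value."""
    p = sigmoid(Xl @ w)                        # P(y=1 | x) on labeled data
    loglik = np.sum(yl * np.log(p) + (1 - yl) * np.log(1 - p))
    q = np.mean(sigmoid(Xu @ w))               # mean predicted P(y=1) on unlabeled
    kl = prior * np.log(prior / q) + (1 - prior) * np.log((1 - prior) / (1 - q))
    return loglik - lam * len(Xu) * kl
```

When the model's average unlabeled prediction `q` equals the prior, the penalty vanishes and the objective reduces to ordinary conditional likelihood, which is why the method adds essentially no implementation burden over logistic regression.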
A Multistrategy Approach to Classifier Learning from Time Series
 Machine Learning
, 2000
Abstract

Cited by 9 (3 self)
We present an approach to inductive concept learning using multiple models for time series. Our objective is to improve the efficiency and accuracy of concept learning by decomposing learning tasks that admit multiple types of learning architectures and mixture estimation methods. The decomposition method adapts attribute subset selection and constructive induction (cluster definition) to define new subproblems. To these problem definitions, we can apply metric-based model selection to select from a database of learning components, thereby producing a specification for supervised learning using a mixture model. We report positive learning results using temporal artificial neural networks (ANNs) on a synthetic, multi-attribute learning problem and on a real-world time series monitoring application.
Time Series Learning with Probabilistic Network Composites
 University of Illinois
, 1998
Abstract

Cited by 9 (9 self)
The purpose of this research is to extend the theory of uncertain reasoning over time through integrated, multistrategy learning. Its focus is on decomposable, concept learning problems for classification of spatiotemporal sequences. Systematic methods of task decomposition using attribute-driven methods, especially attribute partitioning, are investigated. This leads to a novel and important type of unsupervised learning in which the feature construction (or extraction) step is modified to account for multiple sources of data and to systematically search for embedded temporal patterns. This modified technique is combined with traditional cluster definition methods to provide an effective mechanism for decomposition of time series learning problems. The decomposition process interacts with model selection from a collection of probabilistic models such as temporal artificial neural networks and temporal Bayesian networks. Models are chosen using a new quantitative (metric-based) approach that estimates expected performance of a learning architecture, algorithm, and mixture model on a newly defined subproblem. By mapping subproblems to customized configurations of probabilistic networks for time series learning, a hierarchical, supervised learning system with enhanced generalization quality can be automatically built. The system can improve data fusion ...
An Adaptive Regularization Criterion for Supervised Learning
 Proceedings of ICML'2000
, 2000
Abstract

Cited by 8 (2 self)
We introduce a new regularization criterion that exploits unlabeled data to adaptively control hypothesis complexity in general supervised learning tasks. The technique is based on an abstract metric-space view of supervised learning that has been successfully applied to model selection in previous research. The new regularization criterion we introduce involves no free parameters and yet performs well on a variety of regression and conditional density estimation tasks. The only proviso is that sufficient unlabeled training data be available. We demonstrate the effectiveness of our approach on learning radial basis functions and polynomials for regression, and learning logistic regression models for conditional density estimation. 1. Introduction In the canonical supervised learning task one is given a training set ⟨x_1, y_1⟩, ..., ⟨x_t, y_t⟩ and attempts to infer a hypothesis function h : X → Y that achieves a small prediction error err(h(x), y) on future test exampl...
Error Estimation and Model Selection
, 1999
Abstract

Cited by 6 (1 self)
Machine learning algorithms search a space of possible hypotheses and estimate the error of each hypothesis using a sample. Most often, the goal of classification tasks is to find a hypothesis with a low true (or generalization) misclassification probability (or error rate); however, only the sample (or empirical) error rate can actually be measured and minimized. The true error rate of the returned hypothesis is unknown but can, for instance, be estimated using cross-validation, and very general worst-case bounds can be given. This doctoral dissertation addresses a set of related questions on error assessment and the intimately related selection of a "good" hypothesis language, or learning algorithm, for a given problem. In the first
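The two estimation routes the abstract names, cross-validation and worst-case bounds, can both be sketched briefly. The k-fold estimator and the Hoeffding-style bound below are standard textbook forms, shown under hypothetical names; the bound holds for a single fixed hypothesis, not for the minimizer over a hypothesis space.

```python
import numpy as np

def kfold_error(X, y, fit, predict, k=5, seed=0):
    """k-fold cross-validation estimate of the true error rate."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(X[train], y[train])
        errs.append(np.mean(predict(model, X[fold]) != y[fold]))
    return float(np.mean(errs))

def hoeffding_bound(emp_err, n, delta=0.05):
    """Worst-case upper bound on the true error of a single fixed
    hypothesis, via Hoeffding's inequality: with probability 1 - delta,
    true_err <= emp_err + sqrt(ln(1/delta) / (2n))."""
    return emp_err + np.sqrt(np.log(1.0 / delta) / (2.0 * n))
```

The dissertation's point is precisely the gap between these: cross-validation gives a sharp but hypothesis-dependent estimate, while the bound is distribution-free but loose.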
On Model Selection and the Disability of Neural Networks to Decompose Tasks
, 2002
Abstract

Cited by 5 (3 self)
A neural network with fixed topology can be regarded as a parametrization of functions, which determines the correlations between functional variations when parameters are adapted. We propose an analysis, based on a differential-geometry point of view, that allows one to calculate these correlations. In practice, this describes how one response is unlearned while another is trained. For conventional feedforward neural networks, we find that they generically introduce strong correlations, are predisposed to forgetting, and are inappropriate for task decomposition. Perspectives for solving these problems are discussed.
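The correlation the abstract refers to can be probed numerically: for a tiny fixed-topology network, compare the parameter-gradients of the output at two different inputs. A large cosine similarity between them means a training step for one input drags the other input's response along (cross-talk/forgetting). The network shape and inputs below are hypothetical, and the gradients are taken by finite differences rather than the paper's differential-geometric machinery.

```python
import numpy as np

rng = np.random.default_rng(2)

def mlp(theta, x):
    """Tiny fixed-topology net: 1 input, 3 tanh hidden units, 1 output."""
    w1, b1, w2, b2 = theta[:3], theta[3:6], theta[6:9], theta[9]
    return float(w2 @ np.tanh(w1 * x + b1) + b2)

def grad(theta, x, eps=1e-6):
    """Central-difference gradient of the output w.r.t. all parameters."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (mlp(theta + e, x) - mlp(theta - e, x)) / (2 * eps)
    return g

theta = rng.normal(size=10)
g1, g2 = grad(theta, 0.3), grad(theta, -0.8)
# Cosine of the angle between the two gradients: how strongly a gradient
# step fitting x = 0.3 also changes the response at x = -0.8.
corr = float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))
```

A network suited to task decomposition would show near-zero `corr` for inputs belonging to different subtasks; the paper's finding is that standard feedforward nets generically do not.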
Expected Error Analysis for Model Selection
 International Conference on Machine Learning (ICML)
, 1999
Abstract

Cited by 5 (1 self)
In order to select a good hypothesis language (or model) from a collection of possible models, one has to assess the generalization performance of the hypothesis which is returned by a learner that is bound to use some particular model. This paper deals with a new and very efficient way of assessing this generalization performance. We present a new analysis which characterizes the expected generalization error of the hypothesis with least training error in terms of the distribution of error rates of the hypotheses in the model. This distribution can be estimated very efficiently from the data, which immediately leads to an efficient model selection algorithm. The analysis predicts learning curves with very high precision and thus contributes to a better understanding of why and when overfitting occurs. We present empirical studies (controlled experiments on Boolean decision trees and a large-scale text categorization problem) which show that the model selection algorithm leads to err...
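The quantity this abstract analyzes can be illustrated by simulation (a toy setup, not the paper's estimator): draw hypotheses with true error rates from some within-model distribution, observe binomial training errors, pick the empirical minimizer, and measure how much worse its true error is than the model's best hypothesis. That excess is exactly the overfitting effect the paper characterizes analytically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, trials = 50, 20, 2000      # sample size, hypotheses per model, runs

gap = []
for _ in range(trials):
    # Hypothetical error-rate distribution within the model: true error
    # rates of its hypotheses spread uniformly on [0.2, 0.5].
    p = rng.uniform(0.2, 0.5, size=m)
    emp = rng.binomial(n, p) / n          # observed training error rates
    winner = p[np.argmin(emp)]            # hypothesis we actually select
    gap.append(winner - p.min())          # its excess true error

# Expected optimism of minimizing the empirical error within this model.
expected_excess = float(np.mean(gap))
```

A model whose error-rate distribution puts many hypotheses near the minimum yields a large `expected_excess` at small `n`, which is the regime where the paper predicts overfitting.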