Results 11 - 20 of 43
Model Selection for Small Sample Regression
, 2002
Abstract

Cited by 19 (2 self)
Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimator of the ratio of the expected training error and the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix.
Extensions to Metric-Based Model Selection
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
Abstract

Cited by 17 (1 self)
Metric-based methods have recently been introduced for model selection and regularization, often yielding very significant improvements over the alternatives tried (including cross-validation). All these methods require unlabeled data over which to compare functions and detect gross differences in behavior away from the training points. We introduce three new extensions of the metric model selection methods and apply them to feature selection. The first extension takes advantage of the particular case of time-series data in which the task involves prediction with a horizon h. The idea is to use at t the h unlabeled examples that precede t for model selection. The second extension takes advantage of the different error distributions of cross-validation and the metric methods: cross-validation tends to have a larger variance and is unbiased. A hybrid combining the two model selection methods is rarely beaten by either of them. The third extension deals with the case when unlabeled data is not available at all, using an estimated input density. Experiments are described to study these extensions in the context of capacity control and feature subset selection.
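The quantity these metric-based methods compare is easy to sketch concretely. A minimal illustration (function and variable names are ours, and the papers' exact criteria differ): measure two hypotheses' discrepancy on unlabeled inputs; if they agree on the training inputs but diverge far from them, at least one model is behaving wildly away from the training points.

```python
import numpy as np

def discrepancy(f, g, X):
    """Root-mean-square difference between two hypotheses f and g
    evaluated on the inputs X (labels are not needed)."""
    return float(np.sqrt(np.mean((f(X) - g(X)) ** 2)))

# Sketch: two fixed functions stand in for a low-capacity and a
# higher-capacity fit; their discrepancy is measured on unlabeled inputs.
f = lambda X: X[:, 0]
g = lambda X: 2 * X[:, 0]
X_unlabeled = np.array([[1.0], [2.0]])
d_unlabeled = discrepancy(f, g, X_unlabeled)  # sqrt(mean([1, 4]))
```

In the metric-based setting this unlabeled-data discrepancy would be held up against the same discrepancy computed on the training inputs; a large gap between the two flags overfitting.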
Simple, robust, scalable semi-supervised learning via expectation regularization
 The 24th International Conference on Machine Learning
, 2007
Abstract

Cited by 16 (0 self)
Although semi-supervised learning has been an active area of research, its use in deployed applications is still relatively rare because the methods are often difficult to implement, fragile in tuning, or lacking in scalability. This paper presents expectation regularization, a semi-supervised learning method for exponential-family parametric models that augments the traditional conditional label-likelihood objective function with an additional term that encourages model predictions on unlabeled data to match certain expectations, such as label priors. The method is extremely easy to implement, scales as well as logistic regression, and can handle non-independent features. We present experiments on five different data sets, showing accuracy improvements over other semi-supervised methods.
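The augmented objective described here can be written down directly for binary logistic regression. A hedged sketch (function names are ours, and the paper's exact regularizer form and scaling may differ): the usual negative conditional log-likelihood on labeled data, plus a KL term pulling the model's average prediction on unlabeled data toward the label prior.

```python
import numpy as np

def expectation_regularized_objective(w, X_lab, y_lab, X_unlab, prior, lam=1.0):
    """Negative conditional log-likelihood on labeled data plus a KL
    penalty (a sketch of the idea, not the paper's exact form) that
    pushes the mean prediction on unlabeled data toward the label prior."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    # Labeled part: standard logistic-regression negative log-likelihood.
    p_lab = sigmoid(X_lab @ w)
    nll = -np.mean(y_lab * np.log(p_lab + 1e-12)
                   + (1 - y_lab) * np.log(1 - p_lab + 1e-12))
    # Unlabeled part: expected positive rate vs. the prior positive rate.
    p_bar = np.mean(sigmoid(X_unlab @ w))
    q = np.array([1.0 - p_bar, p_bar])
    pr = np.array([1.0 - prior, prior])
    kl = np.sum(pr * np.log(pr / (q + 1e-12)))  # KL(prior || model expectation)
    return float(nll + lam * kl)
```

This objective can then be minimized with any gradient-based optimizer; at w = 0 the model predicts 0.5 everywhere, so with a 0.5 prior only the labeled log-loss contributes.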
Characterizing the Generalization Performance of Model Selection Strategies
 In ICML-97
, 1997
Abstract

Cited by 15 (4 self)
We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: first, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity; if the postulated and true profiles do not match, then systematic underfitting or overfitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
Time Series Learning with Probabilistic Network Composites
 University of Illinois
, 1998
Abstract

Cited by 10 (10 self)
The purpose of this research is to extend the theory of uncertain reasoning over time through integrated, multistrategy learning. Its focus is on decomposable concept learning problems for classification of spatiotemporal sequences. Systematic methods of task decomposition using attribute-driven methods, especially attribute partitioning, are investigated. This leads to a novel and important type of unsupervised learning in which the feature construction (or extraction) step is modified to account for multiple sources of data and to systematically search for embedded temporal patterns. This modified technique is combined with traditional cluster definition methods to provide an effective mechanism for decomposition of time series learning problems. The decomposition process interacts with model selection from a collection of probabilistic models such as temporal artificial neural networks and temporal Bayesian networks. Models are chosen using a new quantitative (metric-based) approach that estimates the expected performance of a learning architecture, algorithm, and mixture model on a newly defined subproblem. By mapping subproblems to customized configurations of probabilistic networks for time series learning, a hierarchical, supervised learning system with enhanced generalization quality can be built automatically. The system can improve data fusion ...
A Multistrategy Approach to Classifier Learning from Time Series
 Machine Learning
, 2000
Abstract

Cited by 10 (4 self)
We present an approach to inductive concept learning using multiple models for time series. Our objective is to improve the efficiency and accuracy of concept learning by decomposing learning tasks that admit multiple types of learning architectures and mixture estimation methods. The decomposition method adapts attribute subset selection and constructive induction (cluster definition) to define new subproblems. To these problem definitions we can apply metric-based model selection to select from a database of learning components, thereby producing a specification for supervised learning using a mixture model. We report positive learning results using temporal artificial neural networks (ANNs) on a synthetic, multi-attribute learning problem and on a real-world time series monitoring application.
An Adaptive Regularization Criterion for Supervised Learning
 Proceedings of ICML'2000
, 2000
Abstract

Cited by 9 (2 self)
We introduce a new regularization criterion that exploits unlabeled data to adaptively control hypothesis complexity in general supervised learning tasks. The technique is based on an abstract metric-space view of supervised learning that has been successfully applied to model selection in previous research. The new regularization criterion we introduce involves no free parameters and yet performs well on a variety of regression and conditional density estimation tasks. The only proviso is that sufficient unlabeled training data be available. We demonstrate the effectiveness of our approach on learning radial basis functions and polynomials for regression, and learning logistic regression models for conditional density estimation. 1. Introduction: In the canonical supervised learning task one is given a training set ⟨x_1, y_1⟩, ..., ⟨x_t, y_t⟩ and attempts to infer a hypothesis function h : X → Y that achieves a small prediction error err(h(x), y) on future test examples ...
Error Estimation and Model Selection
, 1999
Abstract

Cited by 8 (1 self)
Machine learning algorithms search a space of possible hypotheses and estimate the error of each hypothesis using a sample. Most often, the goal of classification tasks is to find a hypothesis with a low true (or generalization) misclassification probability (or error rate); however, only the sample (or empirical) error rate can actually be measured and minimized. The true error rate of the returned hypothesis is unknown but can, for instance, be estimated using cross-validation, and very general worst-case bounds can be given. This doctoral dissertation addresses a compound of questions on error assessment and the intimately related selection of a "good" hypothesis language, or learning algorithm, for a given problem. In the first ...
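The cross-validation estimate mentioned here is straightforward to state in code. A minimal k-fold sketch (names are illustrative; `fit` is any routine that trains on a subsample and returns a predictor):

```python
import numpy as np

def cv_error(fit, X, y, k=5, rng=None):
    """k-fold cross-validation estimate of the true misclassification rate.
    `fit(X_train, y_train)` must return a function mapping inputs to labels."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))           # shuffle, then split into k folds
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[train], y[train])   # train on the other k-1 folds
        errors.append(np.mean(predict(X[test]) != y[test]))
    return float(np.mean(errors))           # average held-out error rate
```

Each point is held out exactly once, which is what makes the estimate (nearly) unbiased for the error of a model trained on a slightly smaller sample.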
Co-validation: Using model disagreement on unlabeled data to validate classification algorithms
 In Proceedings of NIPS. Citeseer
, 2004
Abstract

Cited by 6 (1 self)
In the context of binary classification, we define disagreement as a measure of how often two independently trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the “variance of prediction error”, or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross-validation overestimates error.
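The disagreement measure itself is a one-liner. A sketch (names are ours): train two models on independently drawn training sets, then count how often their predictions differ on a pool of unlabeled points.

```python
import numpy as np

def disagreement(model_a, model_b, X_unlabeled):
    """Fraction of unlabeled points on which two classifiers disagree."""
    return float(np.mean(model_a(X_unlabeled) != model_b(X_unlabeled)))

# Toy illustration: two fixed threshold rules stand in for
# independently trained classifiers.
model_a = lambda X: (X[:, 0] > 0).astype(int)
model_b = lambda X: (X[:, 0] > 1).astype(int)
X_pool = np.array([[-1.0], [0.5], [2.0]])
d = disagreement(model_a, model_b, X_pool)  # the rules differ only at 0.5
```

No labels are consumed, which is the point: the unlabeled pool does the validating.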
Evaluating Model Selection Abilities of Performance Measures
 AAAI-06 Workshop on Evaluation Methods for Machine Learning, AAAI
, 2006
Abstract

Cited by 5 (1 self)
Model selection is an important task in machine learning and data mining. When using the holdout testing method to do model selection, a consensus in the machine learning community is that the same model selection goal should be used to identify the best model based on available data. However, following the preliminary work of (Rosset 2004), we show that this is, in general, not true under highly uncertain situations where only very limited data are available. We thoroughly investigate the model selection abilities of different measures under highly uncertain situations as we vary model selection goals, learning algorithms, and class distributions. The experimental results show that a measure’s model selection ability is relatively stable across model selection goals and class distributions. However, different learning algorithms call for different measures for model selection. For the SVM and KNN learning algorithms, the measures RMS, SAUC, and MXE generally perform best; for decision trees and naive Bayes, the measures RMS, SAUC, MXE, AUC, and APR generally perform best.
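Two of the measures named here can be computed directly; a sketch (function names are ours): RMS as the root-mean-square gap between predicted probabilities and 0/1 labels, and AUC via the Wilcoxon/Mann-Whitney pairwise-ranking statistic.

```python
import numpy as np

def rms(probs, labels):
    """Root-mean-square error of predicted probabilities vs. 0/1 labels."""
    return float(np.sqrt(np.mean((probs - labels) ** 2)))

def auc(scores, labels):
    """AUC as the Wilcoxon/Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, with ties counting half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    correct = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((correct + 0.5 * ties) / (len(pos) * len(neg)))
```

RMS rewards calibrated probabilities while AUC only rewards correct ranking, which is one reason the two can prefer different models for the same data.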