Model Selection for Small Sample Regression
- Machine Learning, 2000
Cited by 12 (2 self)
Introduction. Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right tradeoff between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimate of the ratio of the expected training error and the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix. 2 Risk of the mean square estimator. Given a collection of data $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = f(x_i, \alpha_0) + \xi_i$ and $x_i$, ...
Extensions to Metric-Based Model Selection
- Journal of Machine Learning Research, 2003
Cited by 11 (1 self)
Metric-based methods have recently been introduced for model selection and regularization, often yielding very significant improvements over the alternatives tried (including cross-validation). All these methods require unlabeled data over which to compare functions and detect gross differences in behavior away from the training points. We introduce three new extensions of the metric model selection methods and apply them to feature selection. The first extension takes advantage of the particular case of time-series data in which the task involves prediction with a horizon h. The idea is to use at t the h unlabeled examples that precede t for model selection. The second extension takes advantage of the different error distributions of cross-validation and the metric methods: cross-validation tends to have a larger variance and is unbiased. A hybrid combining the two model selection methods is rarely beaten by either of the two methods. The third extension deals with the case when unlabeled data are not available at all, using an estimated input density. Experiments are described to study these extensions in the context of capacity control and feature subset selection.
Time Series Learning with Probabilistic Network Composites
- University of Illinois, 1998
Cited by 9 (9 self)
The purpose of this research is to extend the theory of uncertain reasoning over time through integrated, multi-strategy learning. Its focus is on decomposable, concept learning problems for classification of spatiotemporal sequences. Systematic methods of task decomposition using attribute-driven methods, especially attribute partitioning, are investigated. This leads to a novel and important type of unsupervised learning in which the feature construction (or extraction) step is modified to account for multiple sources of data and to systematically search for embedded temporal patterns. This modified technique is combined with traditional cluster definition methods to provide an effective mechanism for decomposition of time series learning problems. The decomposition process interacts with model selection from a collection of probabilistic models such as temporal artificial neural networks and temporal Bayesian networks. Models are chosen using a new quantitative (metric-based) approach that estimates expected performance of a learning architecture, algorithm, and mixture model on a newly defined subproblem. By mapping subproblems to customized configurations of probabilistic networks for time series learning, a hierarchical, supervised learning system with enhanced generalization quality can be automatically built. The system can improve data fusion
A Multistrategy Approach to Classifier Learning from Time Series
- Machine Learning, 2000
Cited by 9 (3 self)
Abstract. We present an approach to inductive concept learning using multiple models for time series. Our objective is to improve the efficiency and accuracy of concept learning by decomposing learning tasks that admit multiple types of learning architectures and mixture estimation methods. The decomposition method adapts attribute subset selection and constructive induction (cluster definition) to define new subproblems. To these problem definitions, we can apply metric-based model selection to select from a database of learning components, thereby producing a specification for supervised learning using a mixture model. We report positive learning results using temporal artificial neural networks (ANNs) on a synthetic, multiattribute learning problem and on a real-world time series monitoring application.
Simple, robust, scalable semi-supervised learning via expectation regularization
- The 24th International Conference on Machine Learning, 2007
Cited by 9 (0 self)
Although semi-supervised learning has been an active area of research, its use in deployed applications is still relatively rare because the methods are often difficult to implement, fragile in tuning, or lacking in scalability. This paper presents expectation regularization, a semi-supervised learning method for exponential family parametric models that augments the traditional conditional label-likelihood objective function with an additional term that encourages model predictions on unlabeled data to match certain expectations, such as label priors. The method is extremely easy to implement, scales as well as logistic regression, and can handle non-independent features. We present experiments on five different data sets, showing accuracy improvements over other semi-supervised methods.
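To make the idea of an augmented objective concrete, here is a minimal sketch for binary logistic regression. It is an illustration under stated assumptions, not the paper's implementation: the penalty is taken to be the KL divergence between a given label prior and the model's mean prediction on unlabeled data, and the names `xr_objective`, `prior_pos`, and `lam` are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_pos(w, x):
    """P(y = 1 | x) under a logistic regression model with weights w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def xr_objective(w, labeled, unlabeled, prior_pos, lam=1.0):
    """Negative label log-likelihood plus lam * KL(prior || mean prediction).

    Assumption: KL divergence to the label prior is used as the extra term;
    the abstract only says the term encourages predictions on unlabeled
    data to match expectations such as label priors.
    """
    eps = 1e-12  # guards against log(0)
    nll = 0.0
    for x, y in labeled:
        p = predict_pos(w, x)
        nll -= math.log(p + eps) if y == 1 else math.log(1.0 - p + eps)
    # Average predicted probability of the positive class on unlabeled data.
    mean_pos = sum(predict_pos(w, x) for x in unlabeled) / len(unlabeled)
    kl = (prior_pos * math.log((prior_pos + eps) / (mean_pos + eps))
          + (1.0 - prior_pos)
          * math.log((1.0 - prior_pos + eps) / (1.0 - mean_pos + eps)))
    return nll + lam * kl

# Hypothetical toy data: first feature is a constant bias term.
labeled = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
unlabeled = [[1.0, 0.5], [1.0, -0.5]]
w0 = [0.0, 0.0]
matched = xr_objective(w0, labeled, unlabeled, prior_pos=0.5)
mismatched = xr_objective(w0, labeled, unlabeled, prior_pos=0.9)
```

With zero weights the model predicts 0.5 everywhere, so the penalty vanishes when the prior is 0.5 and grows when the prior is 0.9. In practice the objective would be minimized with a gradient-based optimizer, which is omitted here.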
An Adaptive Regularization Criterion for Supervised Learning
- Proceedings of ICML'2000, 2000
Cited by 7 (2 self)
We introduce a new regularization criterion that exploits unlabeled data to adaptively control hypothesis complexity in general supervised learning tasks. The technique is based on an abstract metric-space view of supervised learning that has been successfully applied to model selection in previous research. The new regularization criterion we introduce involves no free parameters and yet performs well on a variety of regression and conditional density estimation tasks. The only proviso is that sufficient unlabeled training data be available. We demonstrate the effectiveness of our approach on learning radial basis functions and polynomials for regression, and learning logistic regression models for conditional density estimation. 1. Introduction. In the canonical supervised learning task one is given a training set $\langle x_1, y_1 \rangle, \ldots, \langle x_t, y_t \rangle$ and attempts to infer a hypothesis function $h : X \to Y$ that achieves a small prediction error $\mathrm{err}(h(x), y)$ on future test exampl...
On model selection and the disability of neural networks to Decompose Tasks
- 2002
Cited by 5 (3 self)
A neural network with fixed topology can be regarded as a parametrization of functions, which decides on the correlations between functional variations when parameters are adapted. We propose an analysis, based on a differential geometry point of view, that allows these correlations to be calculated. In practice, this describes how one response is unlearned while another is trained. Concerning conventional feed-forward neural networks, we find that they generically introduce strong correlations, are predisposed to forgetting, and are inappropriate for task decomposition. Perspectives to solve these problems are discussed.
Expected Error Analysis for Model Selection
- International Conference on Machine Learning (ICML), 1999
Cited by 4 (1 self)
In order to select a good hypothesis language (or model) from a collection of possible models, one has to assess the generalization performance of the hypothesis which is returned by a learner that is bound to use some particular model. This paper deals with a new and very efficient way of assessing this generalization performance. We present a new analysis which characterizes the expected generalization error of the hypothesis with least training error in terms of the distribution of error rates of the hypotheses in the model. This distribution can be estimated very efficiently from the data which immediately leads to an efficient model selection algorithm. The analysis predicts learning curves with a very high precision and thus contributes to a better understanding of why and when over-fitting occurs. We present empirical studies (controlled experiments on Boolean decision trees and a large-scale text categorization problem) which show that the model selection algorithm leads to err...
Co-validation: Using model disagreement on unlabeled data to validate classification algorithms
- In NIPS, 2004
Cited by 4 (0 self)
In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the “variance of prediction error”, or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross-validation overestimates error.
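The disagreement measure defined above can be sketched in a few lines. This is only an illustration of the definition given in the abstract (how often two models differ on unlabeled data); the function name `disagreement_rate` and the example predictions are hypothetical, and the paper's error bounds are not reproduced here.

```python
def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabeled instances on which two models' labels differ."""
    if len(preds_a) != len(preds_b):
        raise ValueError("both models must predict on the same instances")
    if not preds_a:
        raise ValueError("at least one unlabeled instance is required")
    differing = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return differing / len(preds_a)

# Hypothetical binary predictions on the same eight unlabeled instances,
# from two models assumed to be trained on independent labeled samples.
preds_model_1 = [0, 1, 1, 0, 1, 0, 0, 1]
preds_model_2 = [0, 1, 0, 0, 1, 1, 0, 1]
rate = disagreement_rate(preds_model_1, preds_model_2)
print(rate)  # 0.25: the models differ on two of eight instances
```

No labels are needed to compute this quantity, which is what makes it usable when labeled data are scarce.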
Estimating the Expected Error of Empirical Minimizers for Model Selection
- In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998
Cited by 3 (1 self)
Model selection [e.g., 1] is considered the problem of choosing a hypothesis language which provides an optimal balance between low empirical error and high structural complexity. In this abstract, we discuss the intuition of a new, very efficient approach to model selection. Our approach is inherently Bayesian [e.g., 2], but instead of using priors on target functions or hypotheses, we talk about priors on error values, which leads us to a new mathematical characterization of the expected true error. In the setting of classification learning, a learner is given a sample, drawn according to an unknown distribution of labeled instances, and returns the empirical minimizer (the hypothesis with the least empirical error), which has a certain (unknown) true error. If this process is carried out repeatedly, the true error of the empirical minimizer will vary from run to run, as the empirical minimizer depends on the (randomly drawn) sample. This induces a distribution of true errors of empirical minimizers, over the possible samples drawn according to the unknown distribution. If this distribution were known, one could easily derive the expected true error of the empirical minimizer of a model by integrating over this distribution. This would immediately lead to an optimal model selection algorithm: enumerate the models, calculate the expected error of each model by integrating over the error distribution, and select the model with the least expected error. PAC theory [3] and the VC framework provide worst-case bounds on the chance of drawing a sample such that the true error of the minimizer exceeds some $\varepsilon$; "worst-case" means that they hold for any distribution of instances and any concept in a given class. By contrast, we focus on how to determine this distributi...
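The idealized selection rule described in this abstract can be written compactly. The notation is an assumption made for illustration and does not come from the paper: $p_M(e)$ denotes the density of the true error $e \in [0, 1]$ of the empirical minimizer of model $M$ over random samples, and $\bar{e}(M)$ its expectation.

```latex
\[
  \bar{e}(M) \;=\; \int_0^1 e \, p_M(e) \, \mathrm{d}e ,
  \qquad
  M^{*} \;=\; \operatorname*{arg\,min}_{M} \; \bar{e}(M) .
\]
```

The rule is only as good as the knowledge of $p_M$, which is unknown in practice and has to be estimated.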

