Semi-Supervised Learning Literature Survey
, 2006
Cited by 447 (8 self)
We review the literature on semi-supervised learning, an area of machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e., semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Nonparametric function induction in semi-supervised learning
 In Proc. Artificial Intelligence and Statistics
, 2005
Cited by 41 (5 self)
There has been an increase of interest in semi-supervised learning recently, because of the many datasets with large amounts of unlabeled examples and only a few labeled ones. This paper follows up on proposed nonparametric algorithms which provide an estimated continuous label for the given unlabeled examples. First, it extends them to function induction algorithms that minimize a regularization criterion applied to an out-of-sample example, and happen to have the form of Parzen windows regressors. This makes it possible to predict test labels without again solving a linear system of dimension n (the number of unlabeled and labeled training examples), which can cost O(n^3). Second, this function induction procedure gives rise to an efficient approximation of the training process, reducing the linear system to be solved to m ≪ n unknowns, using only a subset of m examples. An improvement of O(n^2/m^2) in time can thus be obtained. Comparative experiments are presented, showing the good performance of the induction formula and approximation algorithm.
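The abstract above says the induction formula takes the form of a Parzen windows regressor. As a rough illustration only, here is a minimal sketch of a generic Parzen windows (Nadaraya-Watson) regressor. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for this example and are not taken from the paper. The paper's actual regularization criterion is not given in the abstract and is not reproduced here.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian similarity between two points (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def parzen_predict(x, train_points, train_labels, bandwidth=1.0):
    """Predict a label for x as a kernel-weighted average of training labels.

    train_labels may be true labels or continuous labels estimated earlier
    for the unlabeled points; no linear system is solved at prediction time.
    """
    weights = [gaussian_kernel(x, p, bandwidth) for p in train_points]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed to zero: fall back to the mean label.
        return sum(train_labels) / len(train_labels)
    return sum(w * y for w, y in zip(weights, train_labels)) / total
```

Each prediction costs one pass over the n stored examples, which is the practical appeal of an induction formula compared with re-solving an n-dimensional linear system for every new test point.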
Generalization error bounds using unlabeled data
 in Learning Theory: 18th Annual Conference on Learning Theory, COLT 2005
, 2005
Cited by 19 (2 self)
We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that the method is of little use in practice due to its limitation to the realizable case. The idea in our second method is to use unlabeled data to transform bounds for randomized classifiers into bounds for simpler deterministic classifiers. As a concrete example of how the general method works in practice, we apply it to a bound based on cross-validation. The result is a semi-supervised bound for classifiers learned based on all the labeled data. The bound is easy to implement and apply and should be tight whenever cross-validation makes sense. Applying the bound to SVMs on the MNIST benchmark data set gives results that suggest that the bound may be tight enough to be useful in practice.
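One standard way to see why the disagreement probability is useful is the triangle inequality for the 0-1 loss, sketched below. The symbols f, g, D, u_i, and m are notation introduced for this illustration. The abstract does not state the paper's exact bounds, so this shows the general idea only and is not a reproduction of the paper's results.

```latex
% Disagreement probability of two classifiers f and g under the input distribution D
d(f, g) = \Pr_{x \sim D}\left[ f(x) \neq g(x) \right]

% Empirical estimate from m unlabeled examples u_1, \dots, u_m (no labels required)
\hat{d}(f, g) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\left[ f(u_i) \neq g(u_i) \right]

% Triangle inequality: a bound on the error of g transfers to f
\operatorname{err}(f) \le \operatorname{err}(g) + d(f, g)
```

Under this reading, an error bound available for one classifier (for example a randomized one) carries over to another (for example a deterministic one) at the cost of their disagreement, which unlabeled data can estimate.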
Stability of transductive regression algorithms
 In ICML
, 2008
Cited by 13 (1 self)
This paper uses the notion of algorithmic stability to derive novel generalization bounds for several families of transductive regression algorithms, using both convexity and closed-form solutions. Our analysis helps compare the stability of these algorithms. It suggests that several existing algorithms might not be stable but prescribes a technique to make them stable. It also reports the results of experiments with local transductive regression demonstrating the benefit of our stability bounds for model selection, in particular for determining the radius of the local neighborhood used by the algorithm.
Trait Selection for Assessing Beef Meat Quality Using Non-Linear SVM
, 2004
Cited by 7 (4 self)
In this paper we show that it is possible to model sensory impressions of consumers about beef meat. This is not a straightforward task; the reason is that when we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers' ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we had to use a special purpose SVM polynomial kernel. The training data set used collects the ratings of panels of experts and consumers; the meat was provided by 103 bovines of 7 Spanish breeds with different carcass weights and aging periods.
Model Selection: Beyond the Bayesian/Frequentist Divide
Cited by 6 (0 self)
The principle of parsimony, also known as “Ockham’s razor”, has inspired many theories of model selection. Yet such theories, all making arguments in favor of parsimony, are based on very different premises and have developed distinct methodologies to derive algorithms. We have organized challenges and edited a special issue of JMLR and several conference proceedings around the theme of model selection. In this editorial, we revisit the problem of avoiding overfitting in light of the latest results. We note the remarkable convergence of theories as different as Bayesian theory, Minimum Description Length, bias/variance tradeoff, Structural Risk Minimization, and regularization, in some approaches. We also present new and interesting examples of the complementarity of theories leading to hybrid algorithms, neither frequentist, nor Bayesian, or perhaps both frequentist and Bayesian!
Co-validation: Using model disagreement on unlabeled data to validate classification algorithms
 In NIPS
, 2004
Cited by 4 (0 self)
In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the “variance of prediction error”, or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross-validation overestimates error.
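The disagreement measure described above can be illustrated with a few lines of code. This is a minimal sketch that assumes each model's predictions on the same unlabeled pool are already available as equal-length sequences. The function name `disagreement_rate` is hypothetical, and the sketch covers only the basic disagreement statistic, not the paper's error estimates or bounds.

```python
def disagreement_rate(predictions_a, predictions_b):
    """Fraction of unlabeled instances on which two models predict differently.

    predictions_a and predictions_b are the class predictions of two
    independently trained models on the same unlabeled instances, in the
    same order. No true labels are needed.
    """
    if len(predictions_a) != len(predictions_b):
        raise ValueError("both models must be evaluated on the same instances")
    if len(predictions_a) == 0:
        raise ValueError("at least one unlabeled instance is required")
    differing = sum(1 for a, b in zip(predictions_a, predictions_b) if a != b)
    return differing / len(predictions_a)
```

For example, two models that predict `[0, 1, 1, 0]` and `[0, 1, 0, 1]` on four unlabeled instances disagree on the last two, giving a disagreement rate of 0.5.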
C.J.: Automatic model selection by modelling the distribution of residuals
 In: ECCV 2002, LNCS 2353
, 2002
Cited by 3 (0 self)
Many problems in computer vision involve a choice of the most suitable model for a set of data. Typically one wishes to choose a model which best represents the data in a way that generalises to unseen data without overfitting. We propose an algorithm in which the quality of a model match can be determined by calculating how well the distribution of model residuals matches a distribution estimated from the noise on the data. The distribution of residuals has two components: the measurement noise, and the noise caused by the uncertainty in the model parameters. If the model is too complex to be supported by the data, then there will be large uncertainty in the parameters. We demonstrate that the algorithm can be used to select appropriate model complexity in a variety of problems, including polynomial fitting, and selecting the number of modes to match a shape model to noisy data.
Semi-Supervised Model Selection Based on Cross-Validation
, 2005
Cited by 1 (0 self)
We propose a new semi-supervised model selection method that is derived by applying the structural risk minimization principle to a recent semi-supervised generalization error bound. The bound we build on is based on the cross-validation estimate underlying the popular cross-validation model selection heuristic. Thus, the proposed semi-supervised method is closely connected to cross-validation, which makes studying these methods side by side very natural. We evaluate the performance of the proposed method and the cross-validation heuristic empirically on the task of selecting the parameters of support vector machines. The experiments indicate that the models selected by the two methods have roughly the same accuracy. However, whereas the cross-validation heuristic only proposes which classifier to choose, the semi-supervised method also provides a reliable and reasonably tight generalization error guarantee for the chosen classifier. Thus, when unlabeled data is available, the proposed semi-supervised method seems to have an advantage when reliable error guarantees are called for. In addition to the empirical evaluation, we also analyze the theoretical properties of the proposed method and prove that under suitable conditions it converges to the optimal model.
Analyzing Sensory Data Using Non-Linear
, 2004
The quality of food can be assessed from different points of view. In this paper, we deal with those aspects that can be appreciated through sensory impressions. When we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers' ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we propose learning from consumers' preference judgments instead of using an approach based on regression. This requires the use of special purpose kernels and feature subset selection methods. We illustrate the benefits of our approach in two families of real-world databases.