Semi-Supervised Learning Literature Survey
, 2006
Cited by 447 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Semi-supervised Learning by Entropy Minimization
Cited by 81 (2 self)
We consider the semi-supervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which makes it possible to incorporate unlabeled data into standard supervised learning. This regularizer can be applied to any model of posterior probabilities. Our approach provides a new motivation for some existing semi-supervised learning algorithms which are particular or limiting instances of minimum entropy regularization. A series of experiments illustrates that the proposed solution benefits from unlabeled data. The method challenges mixture models when the data are sampled from the distribution class spanned by the generative model. The performances are definitely in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to the violation of the “cluster assumption”. Finally, we also illustrate that the method can be far superior to manifold learning in high-dimensional spaces, and also when the manifolds are generated by moving examples along the discriminating directions.
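As a rough illustration of the regularizer this abstract describes, the sketch below adds the entropy of the predicted class posteriors on unlabeled points to the negative log-likelihood on labeled points. The function names, the weight `lam`, and the array layout (one row of class probabilities per example) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def conditional_entropy(probs, eps=1e-12):
    """Mean Shannon entropy (in nats) over the rows of a posterior matrix."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(probs * np.log(probs), axis=1)))

def entropy_regularized_loss(probs_labeled, labels, probs_unlabeled, lam=0.5):
    """Labeled negative log-likelihood plus lam times the unlabeled entropy.

    lam is a hypothetical weight on the unlabeled term; lam = 0 recovers
    ordinary supervised training.
    """
    picked = probs_labeled[np.arange(len(labels)), labels]
    nll = float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))
    return nll + lam * conditional_entropy(probs_unlabeled)
```

Under this objective, a model that is confident on the unlabeled points is penalized less than one that is uncertain, which is the sense in which unlabeled data enter the training criterion.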
Generalized expectation criteria for semi-supervised learning of conditional random fields
 In Proc. ACL, pages 870–878
, 2008
Cited by 64 (8 self)
This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad hoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF training and other semi-supervised methods when limited human effort is available.
Semi-Supervised Self-Training of Object Detection Models
 Seventh IEEE Workshop on Applications of Computer Vision
, 2005
Cited by 63 (0 self)
The construction of appearance-based object detection systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Semi-supervised training is a means for reducing the effort needed to prepare the training set by training the model with a small number of fully labeled examples and an additional set of unlabeled or weakly labeled examples. In this work we present a semi-supervised approach to training object detection systems based on self-training. We implement our approach as a wrapper around the training process of an existing object detector and present empirical results. The key contributions of this empirical study are to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the detection confidence generated by the detector.
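The wrapper idea in this abstract can be sketched generically as follows. This is a minimal illustration, not the authors' system: a toy nearest-centroid classifier stands in for the object detector, and `self_train`, `score_fn`, `rounds`, and `per_round` are hypothetical names. Passing a `score_fn` corresponds to a selection metric defined independently of the model; leaving it as `None` falls back to the model's own confidence.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy stand-in for a detector: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    """Return predicted labels and a confidence score (needs >= 2 classes)."""
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    ordered = np.sort(dists, axis=1)
    margin = ordered[:, 1] - ordered[:, 0]  # larger gap = more confident
    return classes[np.argmin(dists, axis=1)], margin

def self_train(X_lab, y_lab, X_unlab, rounds=3, per_round=2, score_fn=None):
    """Retrain repeatedly, each round adding the best-scoring pseudo-labels."""
    X_lab, y_lab, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        model = fit_centroids(X_lab, y_lab)
        pseudo, confidence = predict(model, pool)
        # score_fn, if given, ranks candidates without consulting the model
        scores = confidence if score_fn is None else score_fn(pool, X_lab, y_lab)
        chosen = np.argsort(-scores)[:per_round]
        X_lab = np.vstack([X_lab, pool[chosen]])
        y_lab = np.concatenate([y_lab, pseudo[chosen]])
        pool = np.delete(pool, chosen, axis=0)
    return fit_centroids(X_lab, y_lab)
```

Keeping the loop outside the base learner is what makes it a wrapper: any training routine with the same fit and predict shape could be substituted for the centroid classifier.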
Regularization and feature selection in least-squares temporal difference learning (full version). Available at http://ai.stanford.edu/~kolter
, 2009
Cited by 48 (1 self)
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can overfit to the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of l1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the l1-regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
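For orientation, here is a minimal sketch of the LSTD fixed-point solve with a simpler l2 (ridge) penalty swapped in for the l1 case. It does not reproduce the LARS-like algorithm the abstract refers to. The function name `lstd_l2`, the penalty weight `beta`, and the sample layout (feature rows for the current and next states) are assumptions made for illustration.

```python
import numpy as np

def lstd_l2(phi, phi_next, rewards, gamma=0.9, beta=0.0):
    """Solve (A + beta*I) w = b for linear value-function weights w.

    A = Phi^T (Phi - gamma * Phi_next) and b = Phi^T r are built from
    sampled transitions; beta = 0 gives plain, unregularized LSTD.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + beta * np.eye(phi.shape[1]), b)

# Two-state chain with one-hot features: state 0 moves to state 1 with
# reward 0, and state 1 loops on itself with reward 1.
phi = np.eye(2)
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
rewards = np.array([0.0, 1.0])
weights = lstd_l2(phi, phi_next, rewards, gamma=0.5)
```

With one-hot features the weights are the state values themselves, so the result can be checked by hand against the Bellman equation for this chain.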
On semi-supervised classification
 In
, 2005
Cited by 40 (8 self)
A graph-based prior is proposed for parametric semi-supervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of co-training. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the labelled data; (b) the unlabelled data; and (c) the co-training information. Active label query selection is performed using a mutual-information-based criterion that explicitly uses the unlabelled data and the co-training information. Encouraging results are presented on public benchmarks and on measured data from single and multiple sensors.
Multi-conditional learning: generative/discriminative training for clustering and classification
, 2006
Cited by 32 (5 self)
This paper presents multi-conditional learning (MCL), a training criterion based on a product of multiple conditional likelihoods. When combining the traditional conditional probability of “label given input” with a generative probability of “input given label”, the latter acts as a surprisingly effective regularizer. When applied to models with latent variables, MCL combines the structure-discovery capabilities of generative topic models, such as latent Dirichlet allocation and the exponential family harmonium, with the accuracy and robustness of discriminative classifiers, such as logistic regression and conditional random fields. We present results on several standard text data sets showing significant reductions in classification error due to MCL regularization, and substantial gains in precision and recall due to the latent structure discovered under MCL.
On Transductive Regression
, 2006
Cited by 16 (1 self)
In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closed-form solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Cited by 15 (1 self)
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to part-of-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar n-grams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.
Distributed Information Regularization on Graphs
Cited by 11 (0 self)
We provide a principle for semi-supervised learning based on optimizing the rate of communicating labels for unlabeled points with side information.