Semi-Supervised Learning Literature Survey
2006
Abstract

Cited by 760 (8 self)
We review the literature on semi-supervised learning, which is an area of machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Co-Training and Expansion: Towards Bridging Theory and Practice
2004
Abstract

Cited by 76 (5 self)
Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice.
Understanding the Yarowsky Algorithm
Computational Linguistics, 2004
Abstract

Cited by 58 (0 self)
This paper analyzes the Yarowsky algorithm as optimizing an objective function. More specifically, a number of variants of the Yarowsky algorithm (though not the original algorithm itself) are shown to optimize either likelihood or a closely related objective function K ...
Active Learning with Multiple Views
2002
Abstract

Cited by 53 (1 self)
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce Co-Testing, which is the first approach to multi-view active learning. Second, we extend the multi-view learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general or more specific than the target concept. Finally, we empirically show that Co-Testing outperforms existing active learners on a variety of real-world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing.
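A minimal Python sketch of the Co-Testing loop, assuming the usual reading of the idea: train one learner per view on the labeled data, find the contention points (unlabeled examples on which the views disagree), and ask the user to label one of them. The 1-nearest-neighbour base learners, the representation of views as lists of feature indices, and the random choice among contention points are illustrative assumptions, not details taken from the abstract.

```python
import random

def nn_predict(labeled, view, x):
    """1-nearest-neighbour label of x, using only the feature indices in `view`."""
    nearest = min(labeled, key=lambda ex: sum((ex[0][i] - x[i]) ** 2 for i in view))
    return nearest[1]

def contention_points(labeled, unlabeled, view_a, view_b):
    """Unlabeled examples on which the two view-specific classifiers disagree."""
    return [x for x in unlabeled
            if nn_predict(labeled, view_a, x) != nn_predict(labeled, view_b, x)]

def co_testing(labeled, unlabeled, view_a, view_b, oracle, n_queries, seed=0):
    """Repeatedly ask the oracle (the user) to label one contention point."""
    rng = random.Random(seed)
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(n_queries):
        points = contention_points(labeled, unlabeled, view_a, view_b)
        if not points:
            break  # the views agree on every remaining example
        query = rng.choice(points)  # simplest strategy: a random contention point
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return labeled
```

Querying only contention points follows from the multi-view assumption: if each view is sufficient, a disagreement means at least one view is wrong about that example, so its label is informative.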
Efficient co-regularised least squares regression
In ICML’06, 2006
Abstract

Cited by 47 (0 self)
In many applications, unlabelled examples are inexpensive and easy to obtain. Semi-supervised approaches try to utilise such examples to reduce the predictive error. In this paper, we investigate a semi-supervised least squares regression algorithm based on the co-learning approach. Similar to other semi-supervised algorithms, our base algorithm has cubic runtime complexity in the number of unlabelled examples. To be able to handle larger sets of unlabelled examples, we devise a semi-parametric variant that scales linearly in the number of unlabelled examples. Experiments show a significant error reduction by co-regularisation and a large runtime improvement for the semi-parametric approximation. Last but not least, we propose a distributed procedure that can be applied without collecting all data at a single site.
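The co-regularisation idea can be sketched with two linear models, one per view, fitted by plain gradient descent on an objective of the kind the abstract describes: squared loss on labeled data for each view, a ridge penalty, and a term that penalises disagreement between the views on unlabeled data. The weights `nu` and `lam`, the step size, and the use of linear models instead of kernels are assumptions for illustration; this does not reproduce the paper's solver or its semi-parametric variant.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_coreg_ls(labeled, unlabeled, nu=0.01, lam=1.0, lr=0.01, steps=2000):
    """Gradient descent on a co-regularised least-squares objective with two linear views.

    labeled:   list of ((x1, x2), y), where x1 and x2 are the two feature views
    unlabeled: list of (x1, x2)
    Objective: sum over views v of [ sum_labeled (w_v.x_v - y)^2 + nu*|w_v|^2 ]
               + lam * sum_unlabeled (w_1.x_1 - w_2.x_2)^2
    """
    (x1, x2), _ = labeled[0]
    w = [[0.0] * len(x1), [0.0] * len(x2)]
    for _ in range(steps):
        # gradient of the ridge penalty
        grad = [[2 * nu * wi for wi in w[v]] for v in (0, 1)]
        # gradient of the labeled squared loss, per view
        for views, y in labeled:
            for v in (0, 1):
                r = dot(w[v], views[v]) - y
                for i, xi in enumerate(views[v]):
                    grad[v][i] += 2 * r * xi
        # gradient of the disagreement (co-regularisation) penalty on unlabeled data
        for views in unlabeled:
            gap = dot(w[0], views[0]) - dot(w[1], views[1])
            for i, xi in enumerate(views[0]):
                grad[0][i] += 2 * lam * gap * xi
            for i, xi in enumerate(views[1]):
                grad[1][i] -= 2 * lam * gap * xi
        w = [[wi - lr * gi for wi, gi in zip(w[v], grad[v])] for v in (0, 1)]
    return w

def predict(w, views):
    """Average the two view-specific predictions."""
    return 0.5 * (dot(w[0], views[0]) + dot(w[1], views[1]))
```

A larger `lam` forces the views to agree more closely on the unlabeled examples, which is how the unlabeled data constrains the fit.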
Multi-view regression via canonical correlation analysis
In Proc. of Conference on Learning Theory, 2007
Abstract

Cited by 45 (6 self)
In the multi-view regression problem, we have a regression problem where the input variable (which is a real vector) can be partitioned into two different views, where it is assumed that either view of the input is sufficient to make accurate predictions; this is essentially (a significantly weaker version of) the co-training assumption for the regression problem. We provide a semi-supervised algorithm which first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then uses labeled data in a ridge regression algorithm (with this induced norm) to provide the predictor. The unlabeled data is used via canonical correlation analysis (CCA, which is closely related to PCA for two random variables) to derive an appropriate norm over functions. We are able to characterize the intrinsic dimensionality of the subsequent ridge regression problem (which uses this norm) by the correlation coefficients provided by CCA in a rather simple expression. Interestingly, the norm used by the ridge regression algorithm is derived from CCA, unlike in standard kernel methods where a special a priori norm is assumed (i.e. a Banach space is assumed). We discuss how this result shows that unlabeled data can decrease the sample complexity.
Semi-supervised learning for natural language
Master’s thesis, MIT, 2005
Abstract

Cited by 43 (1 self)
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free” in large quantities. Unlabeled data has shown promise in improving the performance of a number of tasks, e.g. word sense disambiguation, information extraction, and natural language parsing. In this thesis, we focus on two segmentation tasks, named-entity recognition and Chinese word segmentation. The goal of named-entity recognition is to detect and classify names of people, organizations, and locations in a sentence. The goal of Chinese word segmentation is to find the word boundaries in a sentence that has been written as a string of characters without spaces. Our approach is as follows: in a preprocessing step, we use raw text to cluster words and calculate mutual information statistics. The output of this step is then used as features in a supervised model, specifically a global linear model trained using ...
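As one example of a mutual information statistic that can be computed from raw, unsegmented text, the sketch below estimates pointwise mutual information (PMI) between adjacent characters. Treating adjacent-character PMI as the statistic is an assumption; the abstract does not say exactly which statistics the thesis computes, and the word-clustering step is not shown.

```python
import math
from collections import Counter

def adjacent_pmi(corpus):
    """PMI of each adjacent character pair, estimated from raw (unsegmented) text.

    PMI(a, b) = log( P(ab) / (P(a) * P(b)) ), with P(ab) estimated over adjacent
    pairs and P(a), P(b) over single characters.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (a, b): math.log((count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), count in bigrams.items()
    }
```

A high PMI for a pair suggests the two characters tend to occur together, i.e. inside one word, so a value such as `pmi.get((s[i], s[i + 1]), 0.0)` could serve as a feature for a candidate boundary between positions `i` and `i + 1` of a sentence `s`.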
Semi-supervised regression with co-training style algorithms
2007
Abstract

Cited by 43 (8 self)
The traditional setting of supervised learning requires a large number of labeled training examples in order to achieve good generalization. However, in many practical applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning has attracted much attention. Previous research on semi-supervised learning mainly focuses on semi-supervised classification. Although regression is almost as important as classification, semi-supervised regression is largely understudied. In particular, although co-training is a main paradigm in semi-supervised learning, few works have been devoted to co-training style semi-supervised regression algorithms. In this paper, a co-training style semi-supervised regression algorithm, COREG, is proposed. This algorithm uses two regressors, each of which labels the unlabeled data for the other regressor, where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
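A compact sketch of the COREG loop described in the abstract, assuming k-nearest-neighbour regressors that differ only in their Minkowski distance order (2 and 5 here) as the two regressors. Each regressor scores a candidate by how much adding its self-labelled version reduces squared error on that candidate's labeled neighbourhood (the sum rather than the mean, which ranks candidates identically for a fixed `k`), and hands its most confident candidate to the other regressor. The values of `k`, the distance orders, the number of rounds, and drawing candidates from the whole unlabeled set are illustrative choices.

```python
def minkowski(a, b, p):
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)

def neighbours(data, x, k, p):
    """The k labeled examples nearest to x under the order-p Minkowski distance."""
    return sorted(data, key=lambda ex: minkowski(ex[0], x, p))[:k]

def knn_predict(data, x, k, p):
    nearest = neighbours(data, x, k, p)
    return sum(y for _, y in nearest) / len(nearest)

def most_confident(data, pool, k, p):
    """Pool example whose self-label most reduces squared error on its labeled neighbourhood."""
    best, best_delta = None, 0.0
    for x in pool:
        y_hat = knn_predict(data, x, k, p)
        augmented = data + [(x, y_hat)]
        delta = sum((yi - knn_predict(data, xi, k, p)) ** 2
                    - (yi - knn_predict(augmented, xi, k, p)) ** 2
                    for xi, yi in neighbours(data, x, k, p))
        if delta > best_delta:  # only accept examples that actually reduce the error
            best, best_delta = (x, y_hat), delta
    return best

def coreg(labeled, unlabeled, k=3, orders=(2, 5), rounds=10):
    """Two kNN regressors (different distance orders) label examples for each other."""
    sets = [list(labeled), list(labeled)]
    pool = list(unlabeled)
    for _ in range(rounds):
        picks = [(j, most_confident(sets[j], pool, k, orders[j])) for j in (0, 1)]
        picks = [(j, c) for j, c in picks if c is not None]
        if not picks:
            break  # neither regressor found an error-reducing example
        for j, (x, y_hat) in picks:
            sets[1 - j].append((x, y_hat))  # regressor j teaches the other one
            if x in pool:
                pool.remove(x)

    def predict(x):
        # final estimate: average of the two regressors
        return 0.5 * sum(knn_predict(sets[j], x, k, orders[j]) for j in (0, 1))

    return predict
```

Because a candidate with no positive error reduction is never selected, the loop stops early once neither regressor finds a useful example.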
Weakly Supervised Natural Language Learning Without Redundant Views
In Proceedings of HLT-NAACL, 2003
Abstract

Cited by 42 (6 self)
We investigate single-view algorithms as an alternative to multi-view algorithms for weakly supervised learning for natural language processing tasks without a natural feature split. In particular, we apply co-training, self-training, and EM to one such task and find that both self-training and FS-EM, a new variation of EM that incorporates feature selection, outperform co-training and are comparatively less sensitive to parameter changes.
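Self-training, one of the single-view algorithms the abstract compares, can be sketched generically: fit a model on the labeled data, label the unlabeled example it is most confident about, add that example to the training set, and repeat. The nearest-centroid classifier and the distance-margin confidence score are hypothetical stand-ins; the abstract does not say which learner or confidence measure the paper uses, and FS-EM is not shown.

```python
def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def classify(centroids, x):
    """Return (label, margin): nearest centroid and its distance gap to the runner-up."""
    ranked = sorted(centroids, key=lambda c: sq_dist(x, centroids[c]))
    if len(ranked) == 1:
        return ranked[0], float("inf")
    return ranked[0], sq_dist(x, centroids[ranked[1]]) - sq_dist(x, centroids[ranked[0]])

def self_train(labeled, unlabeled, rounds=10):
    """labeled: list of (x, label); unlabeled: list of x. Each x is a tuple of floats."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # retrain the single-view model on everything labeled so far
        classes = {y for _, y in labeled}
        centroids = {c: centroid([x for x, y in labeled if y == c]) for c in classes}
        # label the example the model is most confident about, then add it
        best = max(pool, key=lambda x: classify(centroids, x)[1])
        pool.remove(best)
        labeled.append((best, classify(centroids, best)[0]))
    return labeled
```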
A General Model for Multiple View Unsupervised Learning
2008
Abstract

Cited by 39 (2 self)
Multiple view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics and social network analysis. Since different representations could have very different statistical properties, how to learn a consensus pattern from multiple representations is a challenging problem. In this paper, we propose a general model for multiple view unsupervised learning. The proposed model introduces the concept of a mapping function to make the different patterns from different pattern spaces comparable, and hence an optimal pattern can be learned from the multiple patterns of multiple representations. Under this model, we formulate two specific models for ...