Results 1-10 of 94
Semi-Supervised Learning Literature Survey
, 2006
Cited by 454 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e., semi-supervised learning. This document is a chapter excerpt from the author’s
doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest
version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Large scale transductive SVMs
 JMLR
Cited by 62 (5 self)
We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at
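As a rough illustration of the Concave-Convex Procedure named in this abstract (not the paper's Transductive SVM solver), the sketch below applies the procedure to a toy one-dimensional objective, f(x) = x^4 - 2x^2. The objective, the function names and the starting point are all assumptions chosen for illustration. Each step replaces the concave part by its tangent at the current iterate and minimizes the resulting convex surrogate in closed form.

```python
def objective(x):
    """Toy objective: convex part x**4 plus concave part -2*x**2."""
    return x ** 4 - 2.0 * x ** 2


def cccp(x0, iterations=50):
    """Run the Concave-Convex Procedure from a positive start x0.

    Each step linearizes the concave part -2*x**2 at the current point x_t
    (slope -4*x_t) and minimizes the convex surrogate x**4 - 4*x_t*x.
    Setting the derivative 4*x**3 - 4*x_t to zero gives x = x_t ** (1/3).
    Returns the list of all iterates, including x0.
    """
    xs = [x0]
    for _ in range(iterations):
        xs.append(xs[-1] ** (1.0 / 3.0))
    return xs


iterates = cccp(0.1)
values = [objective(x) for x in iterates]
```

Here `values` never increases from one iterate to the next, and the iterates approach the local minimizer x = 1. The procedure gives no guarantee of reaching a global minimum.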
Semi-supervised discriminant analysis
 in Proc. of the IEEE Int’l Conf. on Comp. Vision (ICCV), Rio De Janeiro
, 2007
Cited by 42 (2 self)
Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. In practice, when there are not sufficient training samples, the covariance matrix of each class may not be accurately estimated. In this paper, we propose a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples. The labeled data points are used to maximize the separability between different classes and the unlabeled data points are used to estimate the intrinsic geometric structure of the data. Specifically, we aim to learn a discriminant function which is as smooth as possible on the data manifold. Experimental results on single training image face recognition and relevance feedback image retrieval demonstrate the effectiveness of our algorithm.
Optimization Techniques for Semi-Supervised Support Vector Machines
Cited by 36 (5 self)
Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a nonconvex optimization problem. A suite of algorithms has recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms are studied together, under a common experimental setting.
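To make the nonconvexity mentioned in this abstract concrete, here is a minimal sketch of one common way to write an S3VM objective for a linear classifier. It assumes the standard hinge loss on labeled points and a symmetric hinge ("hat") loss on unlabeled points. The function names and the trade-off parameters `c_labeled` and `c_unlabeled` are illustrative and not taken from the paper.

```python
def hinge(z):
    """Standard hinge loss on a signed margin z = y * f(x)."""
    return max(0.0, 1.0 - z)


def unlabeled_loss(f):
    """Symmetric hinge loss: penalizes unlabeled points inside the margin."""
    return max(0.0, 1.0 - abs(f))


def decision(w, b, x):
    """Linear decision value f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def s3vm_objective(w, b, labeled, unlabeled, c_labeled=1.0, c_unlabeled=1.0):
    """Regularizer plus labeled hinge loss plus unlabeled symmetric hinge loss."""
    reg = 0.5 * sum(wi * wi for wi in w)
    lab = sum(hinge(y * decision(w, b, x)) for x, y in labeled)
    unl = sum(unlabeled_loss(decision(w, b, x)) for x in unlabeled)
    return reg + c_labeled * lab + c_unlabeled * unl


labeled = [([1.0], 1), ([-1.0], -1)]
unlabeled = [[0.0], [0.25]]
value = s3vm_objective([2.0], 0.0, labeled, unlabeled)
```

The unlabeled term is largest at f = 0 and zero at f = -1 and f = +1, so it is not convex in f. This is why the overall problem has multiple local minima.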
Deep learning via semi-supervised embedding
 International Conference on Machine Learning
, 2008
Cited by 34 (5 self)
We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques.
Ranking on graph data
 In ICML
, 2006
Cited by 33 (1 self)
In ranking, one is given examples of order relationships among objects, and the goal is to learn from these examples a real-valued ranking function that induces a ranking or ordering over the object space. We consider the problem of learning such a ranking function when the data is represented as a graph, in which vertices correspond to objects and edges encode similarities between objects. Building on recent developments in regularization theory for graphs and corresponding Laplacian-based methods for classification, we develop an algorithmic framework for learning ranking functions on graph data. We provide generalization guarantees for our algorithms via recent results based on the notion of algorithmic stability, and give experimental evidence of the potential benefits of our framework.
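The kind of objective this abstract describes can be sketched as a ranking loss over the given order examples plus a graph Laplacian smoothness penalty. The sketch below makes two assumptions not stated in the abstract: a pairwise hinge ranking loss and the unnormalized Laplacian, for which f^T L f equals the sum of w_ij (f_i - f_j)^2 over edges. The function names and the weight `lam` are hypothetical.

```python
def laplacian_penalty(scores, edges):
    """Smoothness penalty f^T L f for the unnormalized graph Laplacian.

    edges is a list of (i, j, weight) with each undirected edge listed once.
    """
    return sum(w * (scores[i] - scores[j]) ** 2 for i, j, w in edges)


def ranking_loss(scores, preferences):
    """Pairwise hinge loss; (i, j) means vertex i should rank above vertex j."""
    return sum(max(0.0, 1.0 - (scores[i] - scores[j])) for i, j in preferences)


def graph_ranking_objective(scores, edges, preferences, lam=0.1):
    """Ranking loss on the order examples plus lam times the graph penalty."""
    return ranking_loss(scores, preferences) + lam * laplacian_penalty(scores, edges)


scores = [2.0, 1.0, 0.0]
edges = [(0, 1, 1.0), (1, 2, 0.5)]
preferences = [(0, 2)]
value = graph_ranking_objective(scores, edges, preferences)
```

The penalty pulls the scores of strongly connected vertices together, while the loss term pushes the scores of each ordered pair apart by at least a unit margin.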
Domain Adaptation from Multiple Sources via Auxiliary Classifiers
Cited by 33 (11 self)
We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as target classifier) for label prediction of patterns from the target domain by leveraging a set of precomputed classifiers (referred to as auxiliary/source classifiers) independently learned with the labeled patterns from multiple source domains. We introduce a new data-dependent regularizer based on a smoothness assumption into Least-Squares SVM (LS-SVM), which enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain. In addition, we employ a sparsity regularizer to learn a sparse target classifier. Comprehensive experiments on the challenging TRECVID 2005 corpus demonstrate that DAM outperforms the existing multiple source domain adaptation methods for video concept detection in terms of effectiveness and efficiency.
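One plausible reading of the data-dependent regularizer described in this abstract is a weighted sum of squared differences between target and source decision values on the unlabeled target patterns. The sketch below assumes that form. The per-source relevance weights, the function name and the absence of any scaling constant are assumptions, not details confirmed by the abstract.

```python
def agreement_regularizer(target_values, source_values, relevance):
    """Weighted squared disagreement between target and source classifiers.

    target_values: decision values of the target classifier on unlabeled data.
    source_values: one list of decision values per source classifier.
    relevance: one non-negative weight per source domain (hypothetical).
    """
    total = 0.0
    for weight, values in zip(relevance, source_values):
        total += weight * sum((t - s) ** 2 for t, s in zip(target_values, values))
    return total


target_values = [1.0, -1.0]
source_values = [[1.0, -1.0], [0.0, 0.0]]
relevance = [0.7, 0.3]
penalty = agreement_regularizer(target_values, source_values, relevance)
```

In this toy input the first source agrees exactly with the target classifier and contributes nothing. All of the penalty comes from the second, less relevant source.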
A continuation method for semi-supervised SVMs
 In International Conference on Machine Learning
, 2006
Cited by 31 (3 self)
Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However, their main problem is that the optimization problem is nonconvex and has many local minima, which often results in suboptimal performance. In this paper we propose to use a global optimization technique known as continuation to alleviate this problem. Compared to other algorithms minimizing the same objective function, our continuation method often leads to lower test errors.
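The continuation idea can be illustrated on a toy one-dimensional problem (this is not the paper's S3VM procedure). The sketch minimizes a Gaussian-smoothed version of a nonconvex function and gradually reduces the smoothing width, starting each stage from the previous solution. The objective 0.1 x^2 - cos(3x), the schedule of widths, the step size and the function names are all assumptions chosen so that the smoothed objective has a closed form.

```python
import math


def objective(x):
    """Toy nonconvex objective with its global minimum at x = 0."""
    return 0.1 * x ** 2 - math.cos(3.0 * x)


def smoothed_grad(x, sigma):
    """Gradient of the objective convolved with a Gaussian of std sigma.

    The smoothed objective is
    0.1 * (x**2 + sigma**2) - exp(-4.5 * sigma**2) * cos(3 * x).
    """
    return 0.2 * x + 3.0 * math.exp(-4.5 * sigma ** 2) * math.sin(3.0 * x)


def descend(x, sigma, steps=2000, lr=0.05):
    """Plain gradient descent on the objective smoothed with width sigma."""
    for _ in range(steps):
        x -= lr * smoothed_grad(x, sigma)
    return x


def continuation(x0, sigmas=(2.0, 1.0, 0.5, 0.25, 0.0)):
    """Minimize heavily smoothed versions first, then track the minimizer."""
    x = x0
    for sigma in sigmas:
        x = descend(x, sigma)
    return x


x_plain = descend(4.0, 0.0)   # descent on the original objective only
x_cont = continuation(4.0)    # same start, with continuation
```

From the same starting point, plain descent stays in a nearby local minimum, while the continuation run ends at the global minimizer of this toy objective. No such guarantee holds in general.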
A comparative study of methods for transductive transfer learning
 In ICDM Workshop on Mining and Management of Biological Data
, 2007
Cited by 29 (5 self)
The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. In this paper we address the subproblem of domain adaptation, in which a model trained over a source domain is generalized to perform well on a related target domain, where these two domains’ data are distributed similarly, but not identically. Previous work has studied the supervised version of this problem in which labeled data from both source and target domains are available for training. In this work, however, we study the more challenging problem of unsupervised transductive transfer learning, where no labeled data from the target domain are available at training time, but instead, unlabeled target test data are available during training. We describe some current state-of-the-art inductive and transductive approaches involving three popular learning models, namely the maximum entropy, support vector machines and naive Bayes models. We then adapt these models to the problem of transfer learning for protein name extraction. In the process, we introduce a novel maximum entropy based technique, Iterative Feature Transformation (IFT), and show that it achieves comparable performance with state-of-the-art transductive SVMs. Finally, we compare the relative strengths and weaknesses of these models across the various learning settings, shedding light both on the algorithms examined and the difficulty of the respective problems. In addition, we show how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.
Semi-supervised regression with co-training style algorithms
, 2007
Cited by 28 (5 self)
The traditional setting of supervised learning requires a large amount of labeled training examples in order to achieve good generalization. However, in many practical applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning has attracted much attention. Previous research on semi-supervised learning mainly focuses on semi-supervised classification. Although regression is almost as important as classification, semi-supervised regression is largely understudied. In particular, although co-training is a main paradigm in semi-supervised learning, few works have been devoted to co-training style semi-supervised regression algorithms. In this paper, a co-training style semi-supervised regression algorithm, i.e. COREG, is proposed. This algorithm uses two regressors, each of which labels the unlabeled data for the other regressor, where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean square error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
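The confidence estimate described in this abstract can be sketched for a single regressor as follows. The sketch assumes a k-nearest-neighbor regressor on one-dimensional inputs with absolute distance, and all function names are hypothetical. It shows only how one unlabeled example would be scored and selected, not the full two-regressor loop.

```python
def knn_predict(train, x, k=2):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def labeling_confidence(labeled, x_u, k=2):
    """Return (pseudo-label, mean squared-error reduction) for x_u.

    The reduction is measured on the labeled neighbors of x_u, comparing
    the regressor before and after (x_u, pseudo-label) is added.
    """
    y_u = knn_predict(labeled, x_u, k)
    neighbors = sorted(labeled, key=lambda pair: abs(pair[0] - x_u))[:k]
    augmented = labeled + [(x_u, y_u)]
    delta = 0.0
    for x_i, y_i in neighbors:
        before = (y_i - knn_predict(labeled, x_i, k)) ** 2
        after = (y_i - knn_predict(augmented, x_i, k)) ** 2
        delta += before - after
    return y_u, delta / len(neighbors)


def pick_most_confident(labeled, unlabeled, k=2):
    """Pick the unlabeled point with the largest positive error reduction."""
    scored = [(x_u,) + labeling_confidence(labeled, x_u, k) for x_u in unlabeled]
    best = max(scored, key=lambda item: item[2])
    return best if best[2] > 0 else None


labeled = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
y_u, delta = labeling_confidence(labeled, 2.2)
best = pick_most_confident(labeled, [2.2, 0.5])
```

In a co-training loop, the example chosen by one regressor would be added, with its pseudo-label, to the training set of the other regressor. Selection would stop once no example yields a positive reduction.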