Results 1–10 of 94
Semi-Supervised Learning Literature Survey, 2006
Cited by 757 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
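One of the simplest ideas covered by such surveys is self-training: fit a classifier on the labeled data, pseudo-label the unlabeled points it classifies most confidently, and refit. A minimal sketch — the nearest-centroid classifier and the distance-margin confidence score are illustrative choices, not taken from the survey:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, n_rounds=5, conf=0.5):
    """Toy self-training loop: fit a nearest-centroid classifier on the
    current labeled set, pseudo-label the confidently classified unlabeled
    points, and repeat. Confidence is the gap between the distances to the
    two nearest class centroids."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_rounds):
        if len(X_unlab) == 0:
            break
        classes = np.unique(y_lab)
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
        order = np.argsort(d, axis=1)
        rows = np.arange(len(d))
        margin = d[rows, order[:, 1]] - d[rows, order[:, 0]]
        pick = margin > conf
        if not pick.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[pick]])
        y_lab = np.concatenate([y_lab, classes[order[pick, 0]]])
        X_unlab = X_unlab[~pick]
    return X_lab, y_lab
```

With well-separated clusters and one labeled point per class, the loop absorbs all unlabeled points with the correct pseudo-labels in a single round.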
Learning to combine bottom-up and top-down segmentation
in: European Conference on Computer Vision
Cited by 131 (0 self)
Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground-truth segmentations, we train a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms, which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high-quality segmentations.
A Review of Kernel Methods in Machine Learning, 2006
Cited by 95 (4 self)
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
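A concrete instance of this "linear task in an RKHS" framing is kernel ridge regression with a Gaussian kernel: the estimator f(x) = Σ_i α_i k(x_i, x) lives in the RKHS of the kernel, and α solves a linear system. A minimal sketch (the hyperparameter values `lam` and `gamma` are illustrative):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-3, gamma=1.0):
    # Solve (K + lam*I) alpha = y; lam regularizes the RKHS norm of f.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, gamma=1.0):
    # f(x) = sum_i alpha_i * k(x_i, x), evaluated at each test point
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

The same template — build a Gram matrix, solve a linear problem in the dual coefficients — underlies many of the methods the review covers.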
Maximum margin semi-supervised learning for structured variables
in: Advances in Neural Information Processing Systems 18, 2005
Cited by 65 (0 self)
Many real-world classification problems involve the prediction of multiple interdependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points, and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points.
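A standard way to "utilize the intrinsic geometry of input patterns revealed by unlabeled data" is a graph-Laplacian smoothness penalty. The sketch below uses a least-squares loss and a linear model for brevity — not the paper's maximum-margin structured formulation — and all parameter choices are illustrative:

```python
import numpy as np

def laplacian_rls(X, y_lab, lab_idx, gamma_a=1e-2, gamma_i=1e-1, sigma=1.0):
    """Laplacian-regularized least squares for a linear model f(x) = x.w:
    squared loss on the labeled points plus a graph-smoothness penalty
    f(X).T L f(X) computed over labeled AND unlabeled points."""
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))   # similarity graph over all points
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    J = np.zeros((n, n))
    J[lab_idx, lab_idx] = 1.0            # selects the labeled rows in the loss
    y = np.zeros(n)
    y[lab_idx] = y_lab
    # regularized normal equations for the combined objective
    A = X.T @ J @ X + gamma_a * np.eye(d) + gamma_i * X.T @ L @ X
    return np.linalg.solve(A, X.T @ y)
```

The Laplacian term pulls the predictor to vary smoothly where unlabeled data is dense, which is the same geometric intuition the paper exploits in its max-margin objective.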
Structured prediction, dual extragradient and Bregman projections
Journal of Machine Learning Research, 2006
Cited by 62 (2 self)
We present a simple and scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods based on the dual extragradient algorithm (Nesterov, 2003). The projection step can be solved using dynamic programming or combinatorial algorithms for min-cost convex flow, depending on the structure of the problem. We show that this approach provides a memory-efficient alternative to formulations based on reductions to a quadratic program (QP). We analyze the convergence of the method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
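For intuition, the extragradient idea can be run on the simplest convex-concave saddle-point problem of this family: a bilinear matrix game min_x max_y x'Ay over probability simplexes, where the "projection step" is Euclidean projection onto the simplex. This is not the paper's structured setting, and the step size and iteration count are illustrative:

```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection of v onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def extragradient_game(A, steps=3000, eta=0.1, x0=None, y0=None):
    """Solve min_x max_y x.A.y over simplexes with the extragradient method:
    a look-ahead (predictor) gradient step, then a corrector step using the
    gradients evaluated at the look-ahead point."""
    m, n = A.shape
    x = np.full(m, 1.0 / m) if x0 is None else x0.copy()
    y = np.full(n, 1.0 / n) if y0 is None else y0.copy()
    for _ in range(steps):
        xh = proj_simplex(x - eta * A @ y)    # predictor
        yh = proj_simplex(y + eta * A.T @ x)
        x = proj_simplex(x - eta * A @ yh)    # corrector
        y = proj_simplex(y + eta * A.T @ xh)
    return x, y
```

The look-ahead step is what lets the iterates converge on bilinear problems where plain simultaneous gradient descent/ascent cycles; on rock-paper-scissors the iterates approach the uniform equilibrium.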
Bayesian conditional random fields
in: Conference on Artificial Intelligence and Statistics (AISTATS), 2005
Cited by 51 (1 self)
We propose Bayesian Conditional Random Fields (BCRFs) for classifying interdependent and structured data, such as sequences, images or webs. BCRFs are a Bayesian approach to training and inference with conditional random fields, which were previously trained by maximizing likelihood (ML) (Lafferty et al., 2001). Our framework eliminates the problem of overfitting and offers the full advantages of a Bayesian treatment. Unlike the ML approach, we estimate the posterior distribution of the model parameters during training and average over this posterior during inference. We apply an extension of the expectation propagation (EP) method, the power EP method, to incorporate the partition function. For algorithmic stability and accuracy, we flatten the approximation structures to avoid two-level approximations. We demonstrate the superior prediction accuracy of BCRFs over conditional random fields trained with ML or MAP on synthetic and real datasets.
Minimizing and learning energy functions for side-chain prediction
in: RECOMB, 2007
Cited by 44 (1 self)
Side-chain prediction is an important subproblem of the general protein folding problem. Despite much progress in side-chain prediction, performance is far from satisfactory. As an example, the ROSETTA protocol, which uses simulated annealing to select the minimum-energy conformations, correctly predicts the first two side-chain angles for approximately 72% of the buried residues in a standard data set. Is further improvement more likely to come from better search methods, or from better energy functions? Given that exact minimization of the energy is NP-hard, it is difficult to get a systematic answer to this question. In this paper, we present a novel search method and a novel method for learning energy functions from training data that are both based on Tree-Reweighted Belief Propagation (TRBP). We find that TRBP can find the global optimum of the ROSETTA energy function in a few minutes of computation for approximately 85% of the proteins in a standard benchmark set. TRBP can also effectively bound the partition function, which enables using the Conditional Random Fields (CRF) framework for learning. Interestingly, finding the global minimum does not significantly improve side-chain prediction for …
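For intuition about the underlying inference problem, exact energy minimization by message passing can be shown on a chain-structured model, where dynamic programming (max-product / Viterbi) is exact; TRBP extends this style of message passing to loopy graphs such as residue-interaction networks. A sketch with generic unary and pairwise cost tables (not ROSETTA energies):

```python
import numpy as np

def chain_map(unary, pairwise):
    """Exact MAP (minimum-energy) assignment for a chain model by dynamic
    programming. unary: (T, K) per-node costs; pairwise: (K, K) transition
    costs shared across edges. Exact on chains and trees; loopy graphs need
    approximate schemes such as TRBP."""
    T, K = unary.shape
    cost = unary[0].copy()                 # best cost of each state at node 0
    back = np.zeros((T, K), dtype=int)     # backpointers for decoding
    for t in range(1, T):
        # total[i, j] = best cost ending in state i at t-1, plus edge and node costs
        total = cost[:, None] + pairwise + unary[t][None, :]
        back[t] = np.argmin(total, axis=0)
        cost = np.min(total, axis=0)
    states = np.zeros(T, dtype=int)
    states[-1] = int(np.argmin(cost))
    for t in range(T - 1, 0, -1):          # trace the optimum backwards
        states[t - 1] = back[t, states[t]]
    return states, float(cost.min())
```

On a chain this recovers the same minimum as brute-force enumeration over all K^T assignments, in O(T K^2) time.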
Semi-supervised learning for natural language
Master’s thesis, MIT, 2005
Cited by 43 (1 self)
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free” in large quantities. Unlabeled data has shown promise in improving the performance of a number of tasks, e.g. word sense disambiguation, information extraction, and natural language parsing. In this thesis, we focus on two segmentation tasks, named-entity recognition and Chinese word segmentation. The goal of named-entity recognition is to detect and classify names of people, organizations, and locations in a sentence. The goal of Chinese word segmentation is to find the word boundaries in a sentence that has been written as a string of characters without spaces. Our approach is as follows: in a preprocessing step, we use raw text to cluster words and calculate mutual information statistics. The output of this step is then used as features in a supervised model, specifically a global linear model trained using …
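The mutual-information statistics mentioned above can be illustrated with a pointwise mutual information (PMI) computation over adjacent word pairs in raw text; the thesis's actual preprocessing (word clustering, larger corpora) is more involved, so this is only the flavor:

```python
from collections import Counter
import math

def bigram_pmi(tokens):
    """Pointwise mutual information of adjacent word pairs estimated from a
    raw token stream: PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ). High PMI
    marks pairs that co-occur far more often than chance, e.g. collocations."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    pmi = {}
    for (a, b), c in bigrams.items():
        # p(a,b) / (p(a) p(b)) = (c / n_bi) * n_uni^2 / (count(a) * count(b))
        pmi[(a, b)] = math.log(c * n_uni * n_uni / (n_bi * unigrams[a] * unigrams[b]))
    return pmi
```

Features of this kind, computed once from unlabeled text, can then be fed to a supervised sequence model without any change to its training procedure.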
Structured prediction via the extragradient method
Cited by 31 (2 self)
We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. The estimation problem can be formulated as a quadratic program (QP) that exploits the problem structure to achieve a polynomial number of variables and constraints. However, off-the-shelf QP solvers scale poorly with problem and training sample size. We recast the formulation as a convex-concave saddle-point problem that allows us to use simple projection methods. We show that the projection step can be solved using combinatorial algorithms for min-cost convex flow. We provide linear convergence guarantees for our method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
Augmented statistical models for speech recognition
in: Proc. ICASSP, 2006