Results 1 - 10
of
37
Semi-Supervised Learning Literature Survey
, 2006
"... We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter ..."
Abstract
-
Cited by 268 (7 self)
- Add to MetaCart
We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s
doctoral thesis (Zhu, 2005). However the author plans to update the online version frequently to incorporate the latest development in the field. Please obtain the latest
version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Learning to combine bottom-up and top-down segmentation
- in: European Conference on Computer Vision
"... Abstract. Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has lead to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations t ..."
Abstract
-
Cited by 53 (0 self)
- Add to MetaCart
Abstract. Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has lead to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high quality segmentations. 1
Bayesian conditional random fields
- In Conference on Artificial Intelligence and Statistics (AISTATS), 2005. 193 Yuan
, 2005
"... We propose Bayesian Conditional Random Fields (BCRFs) for classifying interdependent and structured data, such as sequences, images or webs. BCRFs are a Bayesian approach to training and inference with conditional random fields, which were previously trained by maximizing likelihood (ML) (Lafferty e ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
We propose Bayesian Conditional Random Fields (BCRFs) for classifying interdependent and structured data, such as sequences, images or webs. BCRFs are a Bayesian approach to training and inference with conditional random fields, which were previously trained by maximizing likelihood (ML) (Lafferty et al., 2001). Our framework eliminates the problem of overfitting, and offers the full advantages of a Bayesian treatment. Unlike the ML approach, we estimate the posterior distribution of the model parameters during training, and average over this posterior during inference. We apply an extension of EP method, the power EP method, to incorporate the partition function. For algorithmic stability and accuracy, we flatten the approximation structures to avoid two-level approximations. We demonstrate the superior prediction accuracy of BCRFs over conditional random fields trained with ML or MAP on synthetic and real datasets. 1
Structured prediction, dual extragradient and Bregman projections
- Journal of Machine Learning Research
, 2006
"... We present a simple and scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
We present a simple and scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods based on the dual extragradient algorithm (Nesterov, 2003). The projection step can be solved using dynamic programming or combinatorial algorithms for min-cost convex flow, depending on the structure of the problem. We show that this approach provides a memory-efficient alternative to formulations based on reductions to a quadratic program (QP). We analyze the convergence of the method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. 1 1.
A Review of Kernel Methods in Machine Learning
, 2006
"... We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticate ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
Augmented statistical models for speech recognition
- in Proc. ICASSP
, 2006
"... Recently there has been significant interest in developing new acoustic models for speech recognition. One such model, that allows complex dependencies to be represented, is the augmented statistical model. This incorporates additional dependencies by constructing a local exponential expansion of a ..."
Abstract
-
Cited by 16 (9 self)
- Add to MetaCart
Recently there has been significant interest in developing new acoustic models for speech recognition. One such model, that allows complex dependencies to be represented, is the augmented statistical model. This incorporates additional dependencies by constructing a local exponential expansion of a standard HMM. Unfortunately, the resulting model often has an intractable normalisation term, rendering training difficult for all but binary classification tasks. In this paper, conditional augmented (C-Aug) models are proposed as an attractive alternative. Instead of modelling utterance likelihoods and inferring decision boundaries, C-Aug models directly model the posterior probability of class labels, conditioned on the utterance. The resulting model is easy to normalise and can be trained using conditional maximum likelihood estimation. In addition, as a convex model, the optimisation converges to a global maximum. 1.
Semi-supervised learning for structured output variables
- ICML06, 23rd International Conference on Machine Learning
, 2006
"... The problem of learning a mapping between input and structured, interdependent output variables covers sequential, spatial, and relational learning as well as predicting recursive structures. Joint feature representations of the input and output variables have paved the way to leveraging discriminat ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
The problem of learning a mapping between input and structured, interdependent output variables covers sequential, spatial, and relational learning as well as predicting recursive structures. Joint feature representations of the input and output variables have paved the way to leveraging discriminative learners such as SVMs to this class of problems. We address the problem of semi-supervised learning in joint input output spaces. The cotraining approach is based on the principle of maximizing the consensus among multiple independent hypotheses; we develop this principle into a semi-supervised support vector learning algorithm for joint input output spaces and arbitrary loss functions. Experiments investigate the benefit of semisupervised structured models in terms of accuracy and F1 score. 1.
Minimizing and learning energy functions for side-chain prediction
- In RECOMB2007
, 2007
"... Side-chain prediction is an important subproblem of the general protein folding problem. Despite much progress in side-chain prediction, performance is far from satisfactory. As an example, the ROSETTA protocol that uses simulated annealing to select the minimum energy conformations, correctly predi ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
Side-chain prediction is an important subproblem of the general protein folding problem. Despite much progress in side-chain prediction, performance is far from satisfactory. As an example, the ROSETTA protocol that uses simulated annealing to select the minimum energy conformations, correctly predicts the first two side-chain angles for approximately 72 % of the buried residues in a standard data set. Is further improvement more likely to come from better search methods, or from better energy functions? Given that exact minimization of the energy is NP hard, it is difficult to get a systematic answer to this question. In this paper, we present a novel search method and a novel method for learning energy functions from training data that are both based on Tree Reweighted Belief Propagation (TRBP). We find that TRBP can find the global optimum of the ROSETTA energy function in a few minutes of computation for approximately 85 % of the proteins in a standard benchmark set. TRBP can also effectively bound the partition function which enables using the Conditional Random Fields (CRF) framework for learning. Interestingly, finding the global minimum does not significantly improve side-chain prediction for
Semi-supervised learning for natural language
- MASTER’S THESIS, MIT
, 2005
"... Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free ” in large quantities. Unlabeled data has shown p ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free ” in large quantities. Unlabeled data has shown promise in improving the performance of a number of tasks, e.g. word sense disambiguation, information extraction, and natural language parsing. In this thesis, we focus on two segmentation tasks, named-entity recognition and Chinese word segmentation. The goal of named-entity recognition is to detect and classify names of people, organizations, and locations in a sentence. The goal of Chinese word segmentation is to find the word boundaries in a sentence that has been written as a string of characters without spaces. Our approach is as follows: In a preprocessing step, we use raw text to cluster words and calculate mutual information statistics. The output of this step is then used as features in a supervised model, specifically a global linear model trained using
Multiview discriminative sequential learning
- Proceedings of the European Conference on Machine Learning
, 2005
"... 1 Introduction The problem of labeling observation sequences has applications that range fromlanguage processing tasks such as named entity recognition, part-of-speech tagging, and information extraction to biological tasks in which the instances areoften DNA strings. Traditionally, sequence models ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
1 Introduction The problem of labeling observation sequences has applications that range fromlanguage processing tasks such as named entity recognition, part-of-speech tagging, and information extraction to biological tasks in which the instances areoften DNA strings. Traditionally, sequence models such as the hidden Markov model and variants thereof have been applied to the label sequence learningproblem. Learning procedures for generative models adjust the parameters such that the joint likelihood of training observations and label sequences is maxi-mized. By contrast, from the application point of view the true benefit of a label sequence predictor corresponds to its ability to find the correct label sequencegiven an observation sequence.

