Semi-supervised Learning of Classifiers: Theory, Algorithms and Their Application to Human-Computer Interaction
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 47 (14 self)
Automatic classification is one of the basic tasks required in any pattern recognition and human-computer interaction application. In this paper we discuss training probabilistic classifiers with labeled and unlabeled data. We provide a new analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance. We also show that if the conditions are violated, using unlabeled data can be detrimental to classification performance. We discuss the implications of this analysis for a specific type of probabilistic classifier, Bayesian networks, and propose a new structure learning algorithm that can utilize unlabeled data to improve classification. Finally, we show how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection.
Computing Regularization Paths for Learning Multiple Kernels
- 2005
Cited by 34 (10 self)
Learning a sparse conic combination of kernel functions or kernel matrices for classification or regression can be achieved via regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems.
Semi-Supervised Learning of Mixture Models
- ICML-03, 20th International Conference on Machine Learning, 2003
Cited by 32 (4 self)
This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. We present a mathematical analysis of this "degradation" phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data. We discuss the impact of these theoretical results on practical situations.
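The setting analyzed above can be illustrated with a minimal sketch of EM for a two-class, one-dimensional Gaussian mixture that uses both labeled and unlabeled points. This is not the paper's analysis or code; the function names, the Gaussian model, and the initialisation from labeled data are all assumptions chosen for illustration, and the sketch needs at least one labeled point per class.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_iter=50):
    """Fit a two-class 1-D Gaussian mixture by EM (illustrative sketch).

    labeled:   list of (x, y) pairs with y in {0, 1}
    unlabeled: list of x values
    Returns (priors, means, variances), each a list of length 2.
    """
    # Initialise from the labeled data alone (the purely supervised estimate).
    priors, means, variances = [], [], []
    for c in (0, 1):
        xs = [x for x, y in labeled if y == c]
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        priors.append(len(xs) / len(labeled))
        means.append(m)
        variances.append(max(v, 1e-6))  # floor keeps the variance positive

    for _ in range(n_iter):
        # E-step: labeled points keep their known class (responsibility 0 or 1);
        # unlabeled points receive posterior class probabilities.
        points = [(x, [1.0 - y, float(y)]) for x, y in labeled]
        for x in unlabeled:
            joint = [priors[c] * gaussian_pdf(x, means[c], variances[c]) for c in (0, 1)]
            total = sum(joint) or 1e-300
            points.append((x, [j / total for j in joint]))

        # M-step: responsibility-weighted maximum-likelihood updates.
        for c in (0, 1):
            weight = sum(r[c] for _, r in points)
            means[c] = sum(r[c] * x for x, r in points) / weight
            variances[c] = max(sum(r[c] * (x - means[c]) ** 2 for x, r in points) / weight, 1e-6)
            priors[c] = weight / len(points)
    return priors, means, variances


if __name__ == "__main__":
    random.seed(0)
    labeled = [(-2.1, 0), (-1.7, 0), (1.8, 1), (2.3, 1)]
    unlabeled = [random.gauss(-2.0, 1.0) for _ in range(200)]
    unlabeled += [random.gauss(2.0, 1.0) for _ in range(200)]
    print(semi_supervised_em(labeled, unlabeled))
```

In this demo the data really do come from the assumed model, so the unlabeled points help. The degradation discussed in the abstract concerns the case where the assumed model does not match the data-generating distribution.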
Learning hidden variable networks: The information bottleneck approach
- Journal of Machine Learning Research, 2005
Cited by 19 (0 self)
A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in suboptimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving a formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of an easy and smooth target function to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. In addition, it allows us to introduce new hidden variables during model selection and learn their cardinality. We demonstrate the performance of our procedure on several challenging real-life data sets.
Semi-Supervised Learning of Mixture Models and Bayesian Networks
- Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 15 (0 self)
This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. This behavior contradicts several empirical results reported in the literature. We present a mathematical analysis of this "degradation" phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data.
Exploitation of unlabeled sequences in hidden Markov models
- IEEE Trans. on Pattern Analysis and Machine Intelligence, 2003
Cited by 7 (0 self)
This paper presents a method for effectively using unlabeled sequential data in the learning of hidden Markov models (HMMs). With the conventional approach, class labels for unlabeled data are assigned deterministically by HMMs learned from labeled data. Such labeling often becomes unreliable when the amount of labeled data is small. We propose an extended Baum-Welch (EBW) algorithm in which the labeling is undertaken probabilistically and iteratively so that the labeled and unlabeled data likelihoods are improved. Unlike the conventional approach, the EBW algorithm guarantees convergence to a local maximum of the likelihood. Experimental results on gesture data and speech data show that when labeled training data are scarce, by using unlabeled data, the EBW algorithm improves the classification performance of HMMs more robustly than the conventional naive labeling (NL) approach. Index Terms: unlabeled data, sequential data, hidden Markov models, extended Baum-Welch algorithm.
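The contrast between deterministic (naive) labeling and probabilistic labeling can be sketched as follows. This is not the EBW algorithm itself; it only shows, under the assumption that each class model has already produced a log-likelihood for one unlabeled sequence, how a hard assignment differs from posterior class weights. The function names and the uniform default prior are hypothetical.

```python
import math


def hard_label(log_likelihoods):
    """Naive labeling: pick the single class whose model scores the sequence highest."""
    return max(range(len(log_likelihoods)), key=lambda c: log_likelihoods[c])


def soft_label(log_likelihoods, log_priors=None):
    """Probabilistic labeling: posterior class weights from per-class log-likelihoods.

    Subtracting the maximum score before exponentiating (log-sum-exp trick)
    keeps very negative sequence log-likelihoods from underflowing to zero.
    """
    if log_priors is None:
        log_priors = [0.0] * len(log_likelihoods)  # uniform prior (assumption)
    scores = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With nearly tied scores such as `[-120.0, -119.5, -130.0]`, `hard_label` commits fully to class 1, while `soft_label` spreads weight across classes 0 and 1, so a later re-estimation step would not treat an uncertain guess as a confirmed label.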
Semisupervised Learning of Classifiers with Application to Human-Computer Interaction
- Born, Max, Einstein’s Theory of Relativity, 2003
Cited by 5 (5 self)
With the growing use of computers and computing objects in the design of many of the day-to-day tools that humans use, human-computer intelligent interaction is seen as a necessary step toward making computers better aid the human user. There are many tasks involved in designing good interaction between humans and machines. One basic task, related to many such applications, is automatic classification by the machine. Designing a classifier can be done by domain experts or by learning from training data. Training data can be labeled with the different classes or unlabeled. In this work I focus on training probabilistic classifiers with labeled and unlabeled data. I show under what conditions unlabeled data can be used to improve classification performance. I also show that when these conditions are violated, using unlabeled data can often be detrimental to classification performance. I discuss the implications of this analysis when learning a specific type of probabilistic classifier, namely Bayesian networks, and propose structure learning algorithms that can potentially utilize unlabeled data to improve classification. I show how the theory and algorithms are successfully applied in two applications related to human-computer interaction: facial expression recognition and face detection.
U N I V E R S
Recent years have seen the emergence of multiple stochastic language and grammar models that make use of Pitman-Yor processes as Bayesian priors. Thus far, those models have proved very effective for NLP tasks that involve unsupervised inference. The aim of this project is to investigate semi-supervised learning methods and to test their applicability and effectiveness on this class of language models. The original methods have to be adapted, as the usual semi-supervised inference with Expectation-Maximization (EM) is not applicable. The alternative is to fall back on Gibbs sampling. The two major contributions are novel variants of the Stable Mixing method (Corduneanu and Jaakkola, 2002) and of Active Learning (Cohn et al., 1996). Unlike the originals, the new methods are applicable to Gibbs-based inference. On an example word separation task on the Bernstein Ratner corpus, the new Stable Mixing variant improves the word F0-score by 9% in comparison to the baseline approach.
Homotopy-based Semi-Supervised Hidden Markov Models for Sequence Labeling
This paper explores the use of the homotopy method for training a semi-supervised Hidden Markov Model (HMM) used for sequence labeling. We provide a novel polynomial-time algorithm to trace the local maximum of the likelihood function for HMMs from full weight on the labeled data to full weight on the unlabeled data. We present an experimental analysis of different techniques for choosing the best balance between labeled and unlabeled data based on the characteristics observed along this path. Furthermore, experimental results on the field segmentation task in information extraction show that the homotopy-based method significantly outperforms EM-based semisupervised learning, and provides a more accurate alternative to the use of held-out data to pick the best balance for combining labeled and unlabeled data.
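One plausible way to write the interpolation between labeled and unlabeled data described above is a weighted log-likelihood. The abstract does not give the formula, so the notation here is an assumption: $\theta$ for the HMM parameters, $D_\ell$ for the labeled sequences with labels $y$, $D_u$ for the unlabeled sequences, and $\lambda$ for the mixing weight.

```latex
\mathcal{L}_{\lambda}(\theta)
  = (1 - \lambda) \sum_{(x, y) \in D_\ell} \log p(x, y; \theta)
  + \lambda \sum_{x \in D_u} \log p(x; \theta),
  \qquad \lambda \in [0, 1]
```

Under this reading, $\lambda = 0$ puts full weight on the labeled data and $\lambda = 1$ puts full weight on the unlabeled data, and the path-tracing procedure follows a local maximum of $\mathcal{L}_{\lambda}$ as $\lambda$ moves from 0 to 1.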

