Results 1  10
of
79
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 758 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Unsupervised learning of finite mixture models
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2002
"... This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) alg ..."
Abstract

Cited by 415 (22 self)
 Add to MetaCart
This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach.
Semisupervised Learning by Entropy Minimization
"... We consider the semisupervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables to incorporate unlabeled data in the standard supervised learning. This regularizer can be applied to ..."
Abstract

Cited by 101 (2 self)
 Add to MetaCart
(Show Context)
We consider the semisupervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables to incorporate unlabeled data in the standard supervised learning. This regularizer can be applied to any model of posterior probabilities. Our approach provides a new motivation for some existing semisupervised learning algorithms which are particular or limiting instances of minimum entropy regularization. A series of experiments illustrates that the proposed solution benefits from unlabeled data. The method challenges mixture models when the data are sampled from the distribution class spanned by the generative model. The performances are definitely in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to the violation of the “cluster assumption”. Finally, we also illustrate that the method can be far superior to manifold learning in high dimension spaces, and also when the manifolds are generated by moving examples along the discriminating directions.
Variational Extensions to EM and Multinomial PCA
 In ECML 2002
, 2002
"... Several authors in recent years have proposed discrete analogues to principle component analysis intended to handle discrete or positive only data, for instance suited to analyzing sets of documents. Methods include nonnegative matrix factorization, probabilistic latent semantic analysis, and laten ..."
Abstract

Cited by 94 (14 self)
 Add to MetaCart
(Show Context)
Several authors in recent years have proposed discrete analogues to principle component analysis intended to handle discrete or positive only data, for instance suited to analyzing sets of documents. Methods include nonnegative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational extension to the expectation maximization algorithm, and then presents discrete component finding algorithms in that light. Experiments are conducted on both bigram word data and document bagofword to expose some of the subtleties of this new class of algorithms.
Segmentation of musical signals using hidden markov models
 In Proc. 110th Convention of the Audio Engineering Society
, 2001
"... This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request ..."
Abstract

Cited by 62 (8 self)
 Add to MetaCart
(Show Context)
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
Distribution of mutual information
 Advances in Neural Information Processing Systems 14: Proceedings of the 2002 Conference
, 2002
"... expectation and variance of mutual information. The mutual information of two random variables ı and j with joint probabilities {πij} is commonly used in learning Bayesian nets as well as in many other fields. The chances πij are usually estimated by the empirical sampling frequency nij/n leading to ..."
Abstract

Cited by 49 (12 self)
 Add to MetaCart
(Show Context)
expectation and variance of mutual information. The mutual information of two random variables ı and j with joint probabilities {πij} is commonly used in learning Bayesian nets as well as in many other fields. The chances πij are usually estimated by the empirical sampling frequency nij/n leading to a point estimate I(nij/n) for the mutual information. To answer questions like “is I(nij/n) consistent with zero? ” or “what is the probability that the true mutual information is much larger than the point estimate? ” one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(π) comprising prior information about π. From the prior p(π) one can compute the posterior p(πn), from which the distribution p(In) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(In). We concentrate on the mean, variance, skewness, and kurtosis, and noninformative priors. For the mean we also
Supervised and SemiSupervised Separation of Sounds from SingleChannel Mixtures
"... Abstract. In this paper we describe a methodology for modelbased single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in t ..."
Abstract

Cited by 46 (10 self)
 Add to MetaCart
(Show Context)
Abstract. In this paper we describe a methodology for modelbased single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios. One being the case where all sound types in the mixture are known, and the other being being the case where only the target or the interference models are known. The model we propose has close ties to nonnegative decompositions and latent variable models commonly used for semantic analysis. 1
Semisupervised learning for natural language
 MASTER’S THESIS, MIT
, 2005
"... Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free ” in large quantities. Unlabeled data has shown p ..."
Abstract

Cited by 43 (1 self)
 Add to MetaCart
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free ” in large quantities. Unlabeled data has shown promise in improving the performance of a number of tasks, e.g. word sense disambiguation, information extraction, and natural language parsing. In this thesis, we focus on two segmentation tasks, namedentity recognition and Chinese word segmentation. The goal of namedentity recognition is to detect and classify names of people, organizations, and locations in a sentence. The goal of Chinese word segmentation is to find the word boundaries in a sentence that has been written as a string of characters without spaces. Our approach is as follows: In a preprocessing step, we use raw text to cluster words and calculate mutual information statistics. The output of this step is then used as features in a supervised model, specifically a global linear model trained using
Representing hierarchical POMDPs as DBNs for multiscale robot localization
, 2004
"... We explore the advantages of representing hierarchical partially observable Markov decision processes (HPOMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using HPOMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show ..."
Abstract

Cited by 34 (2 self)
 Add to MetaCart
We explore the advantages of representing hierarchical partially observable Markov decision processes (HPOMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using HPOMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show that a DBN representation of HPOMDPs can train significantly faster than the original learning algorithm for HPOMDPs or the equivalent flat POMDP, and requires much less data. In addition, the DBN formulation can easily be extended to parameter tying and factoring of variables, which further reduces the time and sample complexity. This enables us to apply HPOMDP methods to much larger problems than previously possible. 1.
Sparse and shiftinvariant feature extraction from nonnegative data
, 2008
"... In this paper we describe a technique that allows the extraction of multiple local shiftinvariant features from analysis of nonnegative data of arbitrary dimensionality. Our approach employs a probabilistic latent variable model with sparsity constraints. We demonstrate its utility by performing f ..."
Abstract

Cited by 34 (4 self)
 Add to MetaCart
(Show Context)
In this paper we describe a technique that allows the extraction of multiple local shiftinvariant features from analysis of nonnegative data of arbitrary dimensionality. Our approach employs a probabilistic latent variable model with sparsity constraints. We demonstrate its utility by performing feature extraction in a variety of domains ranging from audio to images and video. Index Terms — Feature extraction, Unsupervised learning 1.