Results 1  10
of
46
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 565 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Unsupervised learning of finite mixture models
 IEEE Transactions on pattern analysis and machine intelligence
, 2002
"... AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization ..."
Abstract

Cited by 271 (20 self)
 Add to MetaCart
AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index TermsÐFinite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectationmaximization algorithm, clustering. æ 1
Semisupervised Learning by Entropy Minimization
"... We consider the semisupervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables to incorporate unlabeled data in the standard supervised learning. This regularizer can be applied to ..."
Abstract

Cited by 81 (2 self)
 Add to MetaCart
We consider the semisupervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables to incorporate unlabeled data in the standard supervised learning. This regularizer can be applied to any model of posterior probabilities. Our approach provides a new motivation for some existing semisupervised learning algorithms which are particular or limiting instances of minimum entropy regularization. A series of experiments illustrates that the proposed solution benefits from unlabeled data. The method challenges mixture models when the data are sampled from the distribution class spanned by the generative model. The performances are definitely in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to the violation of the “cluster assumption”. Finally, we also illustrate that the method can be far superior to manifold learning in high dimension spaces, and also when the manifolds are generated by moving examples along the discriminating directions.
Variational Extensions to EM and Multinomial PCA
 In ECML 2002
, 2002
"... Several authors in recent years have proposed discrete analogues to principle component analysis intended to handle discrete or positive only data, for instance suited to analyzing sets of documents. Methods include nonnegative matrix factorization, probabilistic latent semantic analysis, and laten ..."
Abstract

Cited by 80 (13 self)
 Add to MetaCart
Several authors in recent years have proposed discrete analogues to principle component analysis intended to handle discrete or positive only data, for instance suited to analyzing sets of documents. Methods include nonnegative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational extension to the expectation maximization algorithm, and then presents discrete component finding algorithms in that light. Experiments are conducted on both bigram word data and document bagofword to expose some of the subtleties of this new class of algorithms.
Segmentation of musical signals using hidden markov models
 In Proc. 110th Convention of the Audio Engineering Society
, 2001
"... This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request ..."
Abstract

Cited by 53 (8 self)
 Add to MetaCart
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
Distribution of mutual information
 Advances in Neural Information Processing Systems 14: Proceedings of the 2002 Conference
, 2002
"... expectation and variance of mutual information. The mutual information of two random variables ı and j with joint probabilities {πij} is commonly used in learning Bayesian nets as well as in many other fields. The chances πij are usually estimated by the empirical sampling frequency nij/n leading to ..."
Abstract

Cited by 38 (12 self)
 Add to MetaCart
expectation and variance of mutual information. The mutual information of two random variables ı and j with joint probabilities {πij} is commonly used in learning Bayesian nets as well as in many other fields. The chances πij are usually estimated by the empirical sampling frequency nij/n leading to a point estimate I(nij/n) for the mutual information. To answer questions like “is I(nij/n) consistent with zero? ” or “what is the probability that the true mutual information is much larger than the point estimate? ” one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(π) comprising prior information about π. From the prior p(π) one can compute the posterior p(πn), from which the distribution p(In) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(In). We concentrate on the mean, variance, skewness, and kurtosis, and noninformative priors. For the mean we also
Selfsupervised Chinese Word Segmentation
 In F. Homan et al. (Eds.): Advances in Intelligent Data Analysis, Proceedings of the Fourth International Conference (IDA01), LNCS 2189
, 2001
"... We propose a new unsupervised training method for acquiring... ..."
Abstract

Cited by 29 (7 self)
 Add to MetaCart
We propose a new unsupervised training method for acquiring...
Representing hierarchical POMDPs as DBNs for multiscale robot localization
, 2004
"... We explore the advantages of representing hierarchical partially observable Markov decision processes (HPOMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using HPOMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show ..."
Abstract

Cited by 28 (2 self)
 Add to MetaCart
We explore the advantages of representing hierarchical partially observable Markov decision processes (HPOMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using HPOMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show that a DBN representation of HPOMDPs can train significantly faster than the original learning algorithm for HPOMDPs or the equivalent flat POMDP, and requires much less data. In addition, the DBN formulation can easily be extended to parameter tying and factoring of variables, which further reduces the time and sample complexity. This enables us to apply HPOMDP methods to much larger problems than previously possible. 1.
UNSUPERVISED MINING OF STATISTICAL TEMPORAL STRUCTURES IN VIDEO
"... In this paper, we present algorithms for unsupervised mining of structures in video using multiscale statistical models. Video structure are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semanti ..."
Abstract

Cited by 25 (12 self)
 Add to MetaCart
In this paper, we present algorithms for unsupervised mining of structures in video using multiscale statistical models. Video structure are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim at alleviating the burden of labelling and training, as well as providing a scalable solution for generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing unsupervised video structuring works primarily use clustering techniques, while the rich statistical characteristics in the temporal dimension at di#erent granularity remain unexplored. Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multilevel statistical structures with hierarchical hidden Markov models based on a multilevel Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm, we have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact relevant feature sets using hybrid wrapperfilter methods. When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) The au1 tomatically selected feature set for soccer and b...
Supervised and SemiSupervised Separation of Sounds from SingleChannel Mixtures
"... Abstract. In this paper we describe a methodology for modelbased single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in t ..."
Abstract

Cited by 22 (7 self)
 Add to MetaCart
Abstract. In this paper we describe a methodology for modelbased single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios. One being the case where all sound types in the mixture are known, and the other being being the case where only the target or the interference models are known. The model we propose has close ties to nonnegative decompositions and latent variable models commonly used for semantic analysis. 1