Structure Learning in conditional probability models via an entropic prior and parameter extinction (0)

by M Brand
Results 1 - 10 of 41

Dynamic Bayesian Networks: Representation, Inference and Learning

by Kevin Patrick Murphy, 2002
Cited by 393 (4 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data. In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.

Variational Extensions to EM and Multinomial PCA

by Wray Buntine - In ECML 2002, 2002
Cited by 64 (12 self)
Several authors in recent years have proposed discrete analogues to principal component analysis intended to handle discrete or positive-only data, for instance suited to analyzing sets of documents. Methods include non-negative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational extension to the expectation maximization algorithm, and then presents discrete component finding algorithms in that light. Experiments are conducted on both bigram word data and document bag-of-words data to expose some of the subtleties of this new class of algorithms.

Segmentation of musical signals using hidden markov models

by Jean-Julien Aucouturier, Mark S - In Proc. 110th Convention of the Audio Engineering Society, 2001
Cited by 41 (8 self)
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request

Distribution of Mutual Information

by Marcus Hutter - Advances in Neural Information Processing Systems 14, 2001
Cited by 34 (12 self)
The mutual information of two random variables i and j with joint probabilities t_ij is commonly used in learning Bayesian nets as well as in many other fields. The chances t_ij are usually estimated by the empirical sampling frequency n_ij/n, leading to a point estimate I(n_ij/n) for the mutual information. To answer questions like "is I(n_ij/n) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point estimate?" one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(t) comprising prior information about t. From the prior p(t) one can compute the posterior p(t|n), from which the distribution p(I|n) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(I|n). We concentrate on the mean, variance, skewness, and kurtosis, and non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.
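As an illustration of the point estimate mentioned in the abstract above, here is a minimal sketch of the plug-in mutual information computed from a table of counts. It covers only the empirical estimate obtained by substituting sampling frequencies for the true joint probabilities; it does not implement the paper's approximations to the posterior distribution p(I|n). The function name `empirical_mutual_information` and the use of natural logarithms (nats) are assumptions made for this example, not details taken from the paper.

```python
import math


def empirical_mutual_information(counts):
    """Plug-in estimate of mutual information (in nats) from a 2-D count table.

    counts[i][j] is the number of times outcome (i, j) was observed.
    The name and the choice of nats are illustrative assumptions.
    """
    n = sum(sum(row) for row in counts)
    if n <= 0:
        raise ValueError("count table must contain at least one observation")

    row_totals = [sum(row) for row in counts]        # marginal counts for i
    col_totals = [sum(col) for col in zip(*counts)]  # marginal counts for j

    mi = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute nothing (0 * log 0 := 0)
                mi += (n_ij / n) * math.log(n_ij * n / (row_totals[i] * col_totals[j]))
    return mi


if __name__ == "__main__":
    # Counts that factorise exactly into their marginals give an estimate of zero.
    print(empirical_mutual_information([[10, 10], [10, 10]]))
    # Perfectly dependent binary variables give log(2) nats.
    print(empirical_mutual_information([[5, 0], [0, 5]]))
```

With small samples the plug-in value is rarely exactly zero even for independent variables, which is the motivation the abstract gives for looking at a full distribution over the mutual information rather than a single number.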

Self-supervised Chinese Word Segmentation

by Fuchun Peng, Dale Schuurmans - In F. Hoffmann et al. (Eds.): Advances in Intelligent Data Analysis, Proceedings of the Fourth International Conference (IDA-01), LNCS 2189, 2001
Cited by 22 (7 self)
We propose a new unsupervised training method for acquiring...

Representing hierarchical POMDPs as DBNs for multi-scale robot localization

by Georgios Theocharous, Kevin Murphy, Leslie Pack Kaelbling, 2004
Cited by 18 (2 self)
We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using H-POMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or the equivalent flat POMDP, and requires much less data. In addition, the DBN formulation can easily be extended to parameter tying and factoring of variables, which further reduces the time and sample complexity. This enables us to apply H-POMDP methods to much larger problems than previously possible.

Unsupervised Mining of Statistical Temporal Structures

by Lexing Xie, Shih-Fu Chang, Ajay Divakaran, Huifang Sun - Video Mining, Azriel Rosenfeld, David Doermann, Daniel DeMenthon (Eds.), 2003
Cited by 17 (8 self)
In this paper, we present algorithms for unsupervised mining of structures in video using multiscale statistical models. Video structures are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim at alleviating the burden of labelling and training, as well as providing a scalable solution for generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing unsupervised video structuring works primarily use clustering techniques, while the rich statistical characteristics in the temporal dimension at different granularity remain unexplored. Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multi-level statistical structures with hierarchical hidden Markov models based on a multi-level Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm. We have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact relevant feature sets using hybrid wrapper-filter methods. When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) The automatically selected feature set...

Unsupervised selection and estimation of finite mixture models

by Mário A. T. Figueiredo, Anil K. Jain - In Proc. Int. Conf. Pattern Recognition, 2000
Cited by 10 (3 self)
We describe a new method for fitting mixture models to multivariate data which performs component selection and does not require external initialization. The novelty of our approach includes: an MML-like (minimum message length) model selection criterion; inclusion of the criterion into the expectation-maximization (EM) algorithm (increasing its ability to escape from local maxima); an initialization strategy supported on the interpretation of EM as a self-annealing algorithm.

Semi-supervised learning for natural language

by Percy Liang - Master’s thesis, MIT, 2005
Cited by 10 (0 self)
Statistical supervised learning techniques have been successful for many natural language processing tasks, but they require labeled datasets, which can be expensive to obtain. On the other hand, unlabeled data (raw text) is often available “for free” in large quantities. Unlabeled data has shown promise in improving the performance of a number of tasks, e.g. word sense disambiguation, information extraction, and natural language parsing. In this thesis, we focus on two segmentation tasks, named-entity recognition and Chinese word segmentation. The goal of named-entity recognition is to detect and classify names of people, organizations, and locations in a sentence. The goal of Chinese word segmentation is to find the word boundaries in a sentence that has been written as a string of characters without spaces. Our approach is as follows: In a preprocessing step, we use raw text to cluster words and calculate mutual information statistics. The output of this step is then used as features in a supervised model, specifically a global linear model trained using

Sparse and shift-invariant feature extraction from non-negative data

by Paris Smaragdis, Bhiksha Raj, Madhusudana Shashanka, 2008
Cited by 9 (2 self)
In this paper we describe a technique that allows the extraction of multiple local shift-invariant features from analysis of non-negative data of arbitrary dimensionality. Our approach employs a probabilistic latent variable model with sparsity constraints. We demonstrate its utility by performing feature extraction in a variety of domains ranging from audio to images and video. Index Terms: Feature extraction, Unsupervised learning.