Results 1–10 of 80
The Infinite Hidden Markov Model
 Machine Learning
, 2002
Cited by 489 (32 self)
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite: consider, for example, symbols being possible words appearing in English text.
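The abstract does not spell out how three hyperparameters can govern an unbounded transition matrix. A minimal sketch, assuming a hierarchical Pólya-urn form of the prior: `alpha` adds weight to self-transitions (time scale), `beta` controls how often a state defers to a shared "oracle" urn instead of reusing its own past transitions (sparsity), and `gamma` sets how often the oracle opens a new state (number of states). The function names and this exact parameterisation are our illustrative assumptions; emissions and hyperparameter learning are omitted.

```python
import random
from collections import defaultdict


def _draw(rng, weights):
    """Return a key of `weights` chosen with probability proportional to its weight."""
    u = rng.random() * sum(weights.values())
    for key, w in weights.items():
        u -= w
        if u < 0:
            return key
    return key  # guard against floating-point round-off at the upper end


def sample_ihmm_states(T, alpha, beta, gamma, seed=0):
    """Draw T hidden states from a hierarchical Polya-urn prior (illustrative sketch).

    alpha: extra count added to the self-transition of the current state (assumed role)
    beta:  weight of deferring to the shared oracle urn (assumed role)
    gamma: weight, inside the oracle, of creating a brand-new state (assumed role)
    """
    rng = random.Random(seed)
    trans = defaultdict(lambda: defaultdict(int))  # trans[i][j]: counts of i -> j
    oracle = defaultdict(int)                      # oracle[j]: times the oracle returned j
    oracle[0] = 1                                  # the first state seeds the oracle
    states, n_states = [0], 1
    for _ in range(T - 1):
        i = states[-1]
        weights = dict(trans[i])
        weights[i] = weights.get(i, 0) + alpha     # self-transition bonus
        weights["oracle"] = beta
        j = _draw(rng, weights)
        if j == "oracle":
            oracle_weights = dict(oracle)
            oracle_weights["new"] = gamma
            j = _draw(rng, oracle_weights)
            if j == "new":                         # open a previously unseen state
                j, n_states = n_states, n_states + 1
            oracle[j] += 1
        trans[i][j] += 1
        states.append(j)
    return states
```

Because new states are only ever created through the oracle, setting `gamma` to zero confines the sequence to the initial state, while larger `gamma` lets the number of distinct states grow with sequence length.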
Variational Inference for Bayesian Mixtures of Factor Analysers
 In Advances in Neural Information Processing Systems 12
, 2000
Cited by 149 (16 self)
We present an algorithm that infers the model structure of a mixture of factor analysers using an efficient and deterministic variational approximation to full Bayesian integration over model parameters. This procedure can automatically determine the optimal number of components and the local dimensionality of each component (i.e. the number of factors in each factor analyser). Alternatively, it can be used to infer posterior distributions over the number of components and dimensionalities. Since all parameters are integrated out, the method is not prone to overfitting. Using a stochastic procedure for adding components, it is possible to perform the variational optimisation incrementally and to avoid local maxima. Results show that the method works very well in practice and correctly infers the number and dimensionality of nontrivial synthetic examples. By importance sampling from the variational approximation we show how to obtain unbiased estimates of the true evidence, the exa...
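The importance-sampling idea in the last sentence can be illustrated on a toy model where the exact evidence is known. A minimal sketch, assuming a Beta-Bernoulli model in place of a mixture of factor analysers, and a fixed Beta proposal standing in for the variational posterior; the function names are hypothetical. The estimator averages p(data, θ) / q(θ) over draws θ ~ q, which is unbiased for p(data) whenever q covers the posterior's support.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_beta_pdf(theta, a, b):
    """Log density of Beta(a, b) at theta."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_beta(a, b)


def is_evidence(k, n, a, b, q_a, q_b, num_samples=20000, seed=0):
    """Importance-sampling estimate of p(data) for k successes in n Bernoulli trials.

    Prior theta ~ Beta(a, b); proposal q(theta) = Beta(q_a, q_b) plays the role of
    the variational approximation (an assumption for this toy example).
    """
    rng = random.Random(seed)
    log_w = []
    for _ in range(num_samples):
        theta = rng.betavariate(q_a, q_b)
        log_joint = (k * math.log(theta) + (n - k) * math.log(1 - theta)
                     + log_beta_pdf(theta, a, b))
        log_w.append(log_joint - log_beta_pdf(theta, q_a, q_b))
    m = max(log_w)  # log-mean-exp for numerical stability
    return math.exp(m) * sum(math.exp(lw - m) for lw in log_w) / num_samples


def exact_evidence(k, n, a, b):
    """Closed-form marginal likelihood B(a + k, b + n - k) / B(a, b)."""
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))
```

For example, with a uniform prior, 7 successes in 10 trials and a proposal Beta(9, 5) that is deliberately mismatched to the exact posterior Beta(8, 4), `is_evidence(7, 10, 1.0, 1.0, 9.0, 5.0)` should land close to the exact value 1/1320.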
Propagation Algorithms for Variational Bayesian Learning
 In Advances in Neural Information Processing Systems 13
, 2001
Cited by 110 (14 self)
Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set.
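The Kalman smoothing propagation mentioned above is a forward filtering pass followed by a backward (Rauch-Tung-Striebel) pass. A minimal sketch for a scalar state with fixed, known parameters `a`, `c`, `q`, `r`; that is an assumption made for brevity. In the variational Bayesian version the abstract describes, a modified form of these recursions runs with quantities computed from the parameters' variational posteriors, which is not shown here. The function name is hypothetical.

```python
def kalman_rts_1d(ys, a, c, q, r, m0, p0):
    """Kalman filter + Rauch-Tung-Striebel smoother for a scalar linear-Gaussian model.

    x_t = a * x_{t-1} + N(0, q),  y_t = c * x_t + N(0, r),  x_1 ~ N(m0, p0).
    Returns smoothed means and variances of x_t given all observations.
    """
    T = len(ys)
    m_f, p_f = [0.0] * T, [0.0] * T   # filtered moments
    m_p, p_p = [0.0] * T, [0.0] * T   # one-step predicted moments
    for t, y in enumerate(ys):
        # Predict x_t from the filtered estimate of x_{t-1}.
        if t == 0:
            m_p[t], p_p[t] = m0, p0
        else:
            m_p[t] = a * m_f[t - 1]
            p_p[t] = a * a * p_f[t - 1] + q
        # Update with observation y_t.
        gain = p_p[t] * c / (c * c * p_p[t] + r)
        m_f[t] = m_p[t] + gain * (y - c * m_p[t])
        p_f[t] = (1 - gain * c) * p_p[t]
    # Backward pass: fold in information from later observations.
    m_s, p_s = m_f[:], p_f[:]
    for t in range(T - 2, -1, -1):
        j = p_f[t] * a / p_p[t + 1]
        m_s[t] = m_f[t] + j * (m_s[t + 1] - m_p[t + 1])
        p_s[t] = p_f[t] + j * j * (p_s[t + 1] - p_p[t + 1])
    return m_s, p_s
```

Both passes are linear in the sequence length, which is why such propagation algorithms can be reused inside the inference step of an iterative learning procedure.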
Implementing approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations: A manual for the inla-program
, 2008
Cited by 79 (16 self)
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothing-spline models, state-space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes, geostatistical and geoadditive models. In this paper we consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form due to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, both in terms of convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations...
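The basic building block behind a nested Laplace approximation is a Gaussian fitted at the posterior mode, with variance given by the negative inverse curvature there. A minimal one-dimensional sketch, assuming a single latent value with a Gaussian prior and a Poisson response (our choice of example, not taken from the manual); the function name is hypothetical. This is not INLA itself, which additionally integrates over hyperparameters and nests such approximations to obtain each marginal.

```python
import math


def laplace_poisson_log(y, prior_var, x0=0.0, tol=1e-12, max_iter=100):
    """Gaussian (Laplace) approximation to p(x | y) for y ~ Poisson(exp(x)), x ~ N(0, prior_var).

    Newton's method finds the mode of the log posterior; the negative inverse
    Hessian at the mode is the variance of the approximating Gaussian.
    """
    x = x0
    for _ in range(max_iter):
        grad = y - math.exp(x) - x / prior_var   # derivative of the log posterior
        hess = -math.exp(x) - 1.0 / prior_var    # always negative: log-concave target
        step = grad / hess
        x -= step
        if abs(step) < tol:
            break
    return x, 1.0 / (math.exp(x) + 1.0 / prior_var)
```

Because the Poisson log-likelihood with a log link is concave in `x`, Newton's method has a unique mode to converge to; for large counts starting far from the mode it may need more iterations than the default `max_iter`.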
Graphical Models and Variational Methods
, 2001
Cited by 37 (2 self)
We review the use of variational methods of approximating inference and learning in probabilistic graphical models. In particular, we focus on variational approximations to the integrals required for Bayesian learning. For models in the conjugate-exponential family, a generalisation of the EM algorithm is derived that iterates between optimising hyperparameters of the distribution over parameters, and inferring the hidden variable distributions. These approximations make use of available propagation algorithms for probabilistic graphical models. We give two case studies of how the variational Bayesian approach can be used to learn model structure: inferring the number of clusters and dimensionalities in a mixture of factor analysers, and inferring the dimension of the state space of a linear dynamical system. Finally, importance sampling corrections to the variational approximations are discussed, along with their limitations.
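The alternating structure described above can be seen in the simplest conjugate-exponential case, a standard textbook example with no hidden variables: a Gaussian with unknown mean and precision, a Normal-Gamma prior, and a factorised posterior q(mu) q(tau). Each update sets one factor's hyperparameters in closed form given expectations under the other. A minimal sketch under those assumptions; in the general algorithm one of the alternating steps would instead infer hidden-variable distributions, possibly via propagation. The function name and default prior values are hypothetical.

```python
def vb_gaussian(xs, mu0=0.0, lam0=1e-6, a0=1e-6, b0=1e-6, iters=50):
    """Mean-field VB for x_n ~ N(mu, 1/tau) with a Normal-Gamma prior.

    Prior: tau ~ Gamma(a0, b0), mu | tau ~ N(mu0, 1/(lam0 * tau)).
    Posterior factors: q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n).
    """
    n = len(xs)
    xbar = sum(xs) / n
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)   # does not depend on q(tau)
    a_n = a0 + (n + 1) / 2.0                        # also fixed across iterations
    e_tau = 1.0                                     # initial guess for E[tau]
    for _ in range(iters):
        lam_n = (lam0 + n) * e_tau                  # update q(mu) given E[tau]
        e_sq = sum((x - mu_n) ** 2 for x in xs) + n / lam_n     # E_q(mu)[sum (x - mu)^2]
        e_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)      # E_q(mu)[lam0 (mu - mu0)^2]
        b_n = b0 + 0.5 * (e_sq + e_prior)           # update q(tau) given q(mu)
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n
```

For the data 1, 2, 3, 4, 5 with a nearly flat prior, the iteration settles at an expected precision `a_n / b_n` of about 0.5.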
A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers
Cited by 34 (3 self)
There is growing interest in applying Bayesian techniques to NLP problems. There are a number of different estimators for Bayesian models, and it is useful to know what kinds of tasks each does well on. This paper compares a variety of different Bayesian estimators for Hidden Markov Model POS taggers with various numbers of hidden states on data sets of different sizes. Recent papers have given contradictory results when comparing Bayesian estimators to Expectation Maximization (EM) for unsupervised HMM POS tagging, and we show that the difference in reported results is largely due to differences in the size of the training data and the number of states in the HMM. We investigate a variety of samplers for HMMs, including some that these earlier papers did not study. We find that all of the Gibbs samplers do well with small data sets and few states, and that Variational Bayes does well on large data sets and is competitive with the Gibbs samplers. In terms of convergence time, we find that Variational Bayes was the fastest of all the estimators, especially on large data sets, and that explicit Gibbs samplers (both pointwise and sentence-blocked) were generally faster than their collapsed counterparts on large data sets.
Variational Bayesian grammar induction for natural language
 In International Colloquium on Grammatical Inference
, 2006
Cited by 25 (0 self)
This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). One approach to PCFG induction is based on parameter estimation. Following this approach, we apply variational Bayes to PCFGs. Variational Bayes (VB) is an approximation to Bayesian learning. It has been empirically shown that VB is less likely to cause overfitting. Moreover, the free energy of VB has been successfully used in model selection. Our algorithm can be seen as a generalization of previously proposed PCFG induction algorithms. In the experiments, we empirically show that induced grammars achieve better parsing results than those of other PCFG induction algorithms. Based on the better parsing results, we give examples of recursive grammatical structures found by the proposed algorithm.
Beam Sampling for the Infinite Hidden Markov Model
Cited by 25 (5 self)
The infinite hidden Markov model is a nonparametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each time step to a finite number, with dynamic programming, which samples whole state trajectories efficiently. Our algorithm typically outperforms the Gibbs sampler and is more robust. We present applications of iHMM inference using the beam sampler on changepoint detection and text prediction problems.
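The combination of slice sampling and dynamic programming can be sketched on an ordinary HMM. Given the current trajectory, draw a slice variable u_t uniformly below the probability of each transition it uses; then run forward filtering that only admits transitions whose probability exceeds u_t, and sample a whole new trajectory backward. A minimal sketch, assuming a fixed finite number of states K; in the iHMM it is the slice variables that make the set of admissible states finite, which this sketch does not model. Function and argument names are our own.

```python
import random


def _draw_index(rng, weights):
    """Index sampled with probability proportional to the non-negative weights."""
    u = rng.random() * sum(weights)
    for k, w in enumerate(weights):
        u -= w
        if u < 0:
            return k
    return max(k for k, w in enumerate(weights) if w > 0)  # round-off guard


def beam_sample(states, pi0, pi, emit, obs, rng):
    """One beam-sampling sweep for a K-state HMM (finite illustration).

    pi0[k]: initial probabilities; pi[j][k]: transition j -> k;
    emit[k][o]: emission probabilities; obs: observation indices.
    """
    T, K = len(obs), len(pi0)
    # 1. Slice variables u_t ~ Uniform(0, p(s_t | s_{t-1})) under the current path.
    u = [rng.random() * pi0[states[0]]]
    u += [rng.random() * pi[states[t - 1]][states[t]] for t in range(1, T)]
    # 2. Forward filtering restricted to transitions with probability above u_t.
    first = [emit[k][obs[0]] if pi0[k] > u[0] else 0.0 for k in range(K)]
    z = sum(first)
    fwd = [[p / z for p in first]]
    for t in range(1, T):
        cur = [emit[k][obs[t]] * sum(fwd[t - 1][j] for j in range(K) if pi[j][k] > u[t])
               for k in range(K)]
        z = sum(cur)
        fwd.append([c / z for c in cur])
    # 3. Backward sampling of an entire new trajectory.
    new = [0] * T
    new[T - 1] = _draw_index(rng, fwd[T - 1])
    for t in range(T - 2, -1, -1):
        w = [fwd[t][j] if pi[j][new[t + 1]] > u[t + 1] else 0.0 for j in range(K)]
        new[t] = _draw_index(rng, w)
    return new
```

The current trajectory always satisfies its own slice constraints, so every normaliser above is positive as long as the emission probabilities along it are non-zero.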
Why doesn't EM find good HMM POS-taggers?
 In EMNLP
, 2007
Cited by 24 (2 self)
This paper investigates why the HMMs estimated by Expectation-Maximization (EM) produce such poor results as Part-of-Speech (POS) taggers. We find that the HMMs estimated by EM generally assign a roughly equal number of word tokens to each hidden state, while the empirical distribution of tokens to POS tags is highly skewed. This motivates a Bayesian approach using a sparse prior to bias the estimator toward such a skewed distribution. We investigate Gibbs Sampling (GS) and Variational Bayes (VB) estimators and show that VB converges faster than GS for this task and that VB significantly improves 1-to-1 tagging accuracy over EM. We also show that EM does nearly as well as VB when the number of hidden HMM states is dramatically reduced. We also point out the high variance in all of these estimators, and that they require many more iterations to approach convergence than usually thought.
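One way a sparse prior biases VB toward skewed distributions can be seen in the multinomial update. Assuming a symmetric Dirichlet(alpha) prior on each multinomial (our assumption for this sketch), the VB update replaces EM's normalised counts n_k / N by exp(digamma(n_k + alpha) - digamma(N + K * alpha)). Since exp(digamma(n)) is roughly n - 0.5 for large n and much smaller than n for small n, rare outcomes are discounted proportionally more than frequent ones. The function names are hypothetical, and `digamma` is a small self-contained implementation because the Python standard library does not provide one.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (upward recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (
        1.0 / 12 - f * (1.0 / 120 - f * (1.0 / 252 - f * (1.0 / 240 - f / 132))))


def ml_weights(counts):
    """EM / maximum-likelihood multinomial estimate: normalised expected counts."""
    total = sum(counts)
    return [c / total for c in counts]


def vb_weights(counts, alpha):
    """VB weights exp(E[log theta_k]) under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts) + alpha * len(counts)
    return [math.exp(digamma(c + alpha) - digamma(total)) for c in counts]
```

With counts [100, 1] and alpha = 0.1, the VB weight of the rare outcome relative to the common one falls below the maximum-likelihood ratio of 1/100, and the VB weights sum to slightly less than one.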
An application of the variational Bayesian approach to probabilistic context-free grammars
 In International Joint Conference on Natural Language Processing Workshop Beyond Shallow Analyses
, 2004
Cited by 23 (12 self)
We present an efficient learning algorithm for probabilistic context-free grammars based on the variational Bayesian approach. Although the maximum likelihood method has traditionally been used for learning probabilistic language models, Bayesian learning is, in principle, less likely to cause overfitting problems than the maximum likelihood method. We show that the computational complexity of our algorithm is equal to that of the Inside-Outside algorithm. We also report experiments comparing the precision of the Inside-Outside algorithm and our algorithm.
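The complexity claim rests on the chart-based dynamic program shared with the Inside-Outside algorithm. A minimal sketch of the inside pass for a PCFG in Chomsky normal form, which runs in O(n^3 * |rules|) for a sentence of length n; the grammar encoding and function name are our own. Our assumption about how a VB variant would reuse it: rule probabilities are replaced by sub-normalised weights such as exp(E[log theta]), leaving the loops unchanged; the abstract does not state this detail.

```python
from collections import defaultdict


def inside_probability(words, binary_rules, lexical_rules, start="S"):
    """Inside pass (CKY-style) for a PCFG in Chomsky normal form.

    binary_rules:  {(A, B, C): p} for rules A -> B C
    lexical_rules: {(A, word): p} for rules A -> word
    Returns P(start =>* words).
    """
    n = len(words)
    # chart[i][j][A] = total probability that A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                chart[i][i + 1][A] += p
    for span in range(2, n + 1):                 # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                left, right = chart[i][k], chart[k][j]
                for (A, B, C), p in binary_rules.items():
                    if B in left and C in right:
                        chart[i][j][A] += p * left[B] * right[C]
    return chart[0][n][start]
```

For a toy grammar with S -> NP VP, VP -> V NP, NP -> "she" | "fish" (0.5 each) and V -> "eats", the sentence "she eats fish" gets inside probability 0.25.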