Results 1  10
of
149
Latent dirichlet allocation
 Journal of Machine Learning Research
, 2003
"... We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a threelevel hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, ..."
Abstract

Cited by 2399 (63 self)
 Add to MetaCart
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a threelevel hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model. 1.
Learning in graphical models
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 614 (10 self)
 Add to MetaCart
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing. Key words and phrases: Probabilistic graphical models, junction tree algorithm, sumproduct algorithm, Markov chain Monte Carlo, variational inference, bioinformatics, errorcontrol coding.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 565 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Modeling annotated data
 In Proc. of the 26th Intl. ACM SIGIR Conference
, 2003
"... We consider the problem of modeling annotated data—data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models that are aimed at such data, culminating in the Cor ..."
Abstract

Cited by 334 (11 self)
 Add to MetaCart
We consider the problem of modeling annotated data—data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models that are aimed at such data, culminating in the CorrLDA model, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type. We take an empirical Bayes approach to finding parameter estimates and conduct experiments in heldout likelihood, automatic annotation, and textbased image retrieval using the Corel database of images and captions. 1
Competitive Coevolution through Evolutionary Complexification
 Journal of Artificial Intelligence Research
, 2002
"... Two major goals in machine learning are the discovery of complex multidimensional solutions and continual improvement of existing solutions. In this paper, we argue that complexification, i.e. the incremental elaboration of solutions through adding new structure, achieves both these goals. We demons ..."
Abstract

Cited by 149 (57 self)
 Add to MetaCart
Two major goals in machine learning are the discovery of complex multidimensional solutions and continual improvement of existing solutions. In this paper, we argue that complexification, i.e. the incremental elaboration of solutions through adding new structure, achieves both these goals. We demonstrate the power of complexification through the NeuroEvolution of Augmenting Topologies (NEAT) method, which evolves increasingly complex neural network architectures. NEAT is applied to an openended coevolutionary robot duel domain where robot controllers compete head to head. Because the robot duel domain supports a wide range of sophisticated strategies, and because coevolution benefits from an escalating arms race, it serves as a suitable testbed for observing the effect of evolving increasingly complex controllers. The result is an arms race of increasingly sophisticated strategies. When compared to the evolution of networks with fixed structure, complexifying networks discover significantly more sophisticated strategies. The results suggest that in order to realize the full potential of evolution, and search in general, solutions must be allowed to complexify as well as optimize.
Variational inference for Dirichlet process mixtures
 Bayesian Analysis
, 2005
"... Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of MonteCarlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis prob ..."
Abstract

Cited by 130 (16 self)
 Add to MetaCart
Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of MonteCarlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a largescale image analysis problem.
Propagation Algorithms for Variational Bayesian Learning
 In Advances in Neural Information Processing Systems 13
, 2001
"... Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugateexponential graphical models. We show how the belief propagation and the junction tree algorithms ..."
Abstract

Cited by 110 (14 self)
 Add to MetaCart
Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugateexponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linearGaussian statespace models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the statespace model in a variety of synthetic problems and one real highdimensional data set.
Variational message passing
 Journal of Machine Learning Research
, 2005
"... This paper presents Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to a Bayesian Network. Like belief propagation, Variational Message Passing proceeds by passing messages between nodes in the graph and updating posterior beliefs using local operati ..."
Abstract

Cited by 83 (6 self)
 Add to MetaCart
This paper presents Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to a Bayesian Network. Like belief propagation, Variational Message Passing proceeds by passing messages between nodes in the graph and updating posterior beliefs using local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugateexponential models because it uses a factorised variational approximation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing nonconjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational Message Passing has been implemented in the form of a general purpose inference engine called VIBES (‘Variational Inference for BayEsian networkS’) which allows models to be specified graphically and then solved variationally without recourse to coding.
Implementing approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations: A manual for the inlaprogram
, 2008
"... Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothingspline models, statespace models, semiparametric regression, spatial and spatiotemp ..."
Abstract

Cited by 79 (16 self)
 Add to MetaCart
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothingspline models, statespace models, semiparametric regression, spatial and spatiotemporal models, logGaussian Coxprocesses, geostatistical and geoadditive models. In this paper we consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with nonGaussian response variables. The posterior marginals are not available in closed form due to the nonGaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, both in terms of convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations
Online Learning for Latent Dirichlet Allocation
"... We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collection ..."
Abstract

Cited by 61 (8 self)
 Add to MetaCart
We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1