Results 1-10 of 61
Latent Dirichlet allocation
Journal of Machine Learning Research, 2003
Cited by 2399 (63 self)
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
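As an illustration of the generative view in this abstract, the following is a minimal sketch of how a toy corpus could be drawn from an LDA-style model. The function name `sample_lda_corpus`, the fixed document length, and the parameter names `alpha` (Dirichlet parameter over topics) and `beta` (per-topic word distributions) are assumptions made for this example; the abstract itself does not specify them, and no inference is performed here.

```python
import numpy as np


def sample_lda_corpus(n_docs, doc_len, alpha, beta, seed=0):
    """Draw a toy corpus from an LDA-style generative process.

    alpha: length-K Dirichlet parameter over topics (assumed name).
    beta:  K x V array; row k is topic k's distribution over words (assumed name).
    Returns a list of word-index arrays and the per-document topic proportions.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n_topics, vocab_size = beta.shape

    docs, thetas = [], []
    for _ in range(n_docs):
        # Per-document mixture weights over the K topics.
        theta = rng.dirichlet(alpha)
        # One topic assignment per word position.
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # Each word is drawn from the distribution of its assigned topic.
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)
```

For example, `sample_lda_corpus(5, 20, [0.5, 0.5], [[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]])` draws five 20-word documents over a four-word vocabulary with two topics. The returned topic proportions are the "explicit representation of a document" the abstract refers to.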
A theory of cortical responses
2005
Cited by 101 (21 self)
This article concerns the nature of evoked brain responses and the principles underlying their generation. We start with the premise that the sensory brain has evolved to represent or infer the causes of changes in its sensory inputs. The problem of inference is well formulated in statistical terms. The statistical fundaments of inference may therefore afford important constraints on neuronal implementation. By formulating the original ideas of Helmholtz on perception, in terms of modern-day statistical theories, one arrives at a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts. It turns out that the problems of inferring the causes of sensory input (perceptual inference) and learning the relationship between input and cause (perceptual learning) can be resolved using exactly the same principle. Specifically, both inference and learning rest on minimizing the brain’s free energy, as defined in statistical physics. Furthermore, inference and learning can proceed in a biologically plausible fashion. Cortical responses can be seen as the brain’s attempt to minimize the free energy induced by a stimulus and thereby encode the most likely cause of that stimulus. Similarly, learning emerges from changes in synaptic efficacy that minimize the free energy, averaged over all stimuli encountered. The underlying scheme rests on empirical Bayes and hierarchical models
Classical and Bayesian inference in neuroimaging: Theory
NeuroImage, 2002
Cited by 99 (37 self)
This paper reviews hierarchical observation models, used in functional neuroimaging, in a Bayesian light. It emphasizes the common ground shared by classical and Bayesian methods to show that conventional analyses of neuroimaging data can be usefully extended within an empirical Bayesian framework. In particular we formulate the procedures used in conventional data analysis in terms of hierarchical linear models and establish a connection between classical inference and parametric empirical Bayes (PEB) through covariance component estimation. This estimation is based on an expectation maximization or EM algorithm. The key point is that hierarchical models not only provide for appropriate inference at the highest level but that one can revisit lower levels suitably
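To make the link between EM and parametric empirical Bayes concrete, here is a minimal sketch for the simplest two-level Gaussian model, y_i = theta_i + e_i with known noise variance and theta_i ~ N(mu, tau2). This toy model, the function name `peb_em`, and its arguments are assumptions chosen for illustration; it is not the hierarchical linear model or covariance component scheme developed in the paper.

```python
import numpy as np


def peb_em(y, noise_var, n_iter=200):
    """EM estimates of the prior mean and variance in a two-level Gaussian model.

    Assumed toy model: y_i ~ N(theta_i, noise_var), theta_i ~ N(mu, tau2).
    Returns (mu, tau2, posterior means of theta_i).
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    tau2 = max(y.var(), 1e-12)  # floor avoids division by zero for constant data

    for _ in range(n_iter):
        # E-step: Gaussian posterior of each theta_i under the current prior.
        post_var = 1.0 / (1.0 / noise_var + 1.0 / tau2)
        post_mean = post_var * (y / noise_var + mu / tau2)
        # M-step: re-estimate the prior (second-level) parameters.
        mu = post_mean.mean()
        tau2 = np.mean((post_mean - mu) ** 2) + post_var

    # Final shrinkage estimates under the fitted empirical prior.
    post_var = 1.0 / (1.0 / noise_var + 1.0 / tau2)
    post_mean = post_var * (y / noise_var + mu / tau2)
    return mu, tau2, post_mean
```

In this toy setting the prior variance is the single covariance component being estimated, and the returned posterior means are the first-level estimates revisited in light of the fitted second level.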
Honest Exploration of Intractable Probability Distributions Via Markov Chain Monte Carlo
Statistical Science, 2001
Cited by 68 (18 self)
Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burn-in? and (Q2) How long should the sampling continue after burn-in? Developing rigorous answers to these questions presently requires a detailed study of the convergence properties of the underlying Markov chain. Consequently, in most practical applications of MCMC, exact answers to (Q1) and (Q2) are not sought. The goal of this paper is to demystify the analysis that leads to honest answers to (Q1) and (Q2). The authors hope that this article will serve as a bridge between those developing Markov chain theory and practitioners using MCMC to solve practical problems. The ability to formally address (Q1) and (Q2) comes from establishing a drift condition and an associated minorization condition, which together imply that the underlying Markov chain is geometrically ergodic. In this paper, we explain exactly what drift and minorization are as well as how and why these conditions can be used to form rigorous answers to (Q1) and (Q2). The basic ideas are as follows. The results of Rosenthal (1995) and Roberts and Tweedie (1999) allow one to use drift and minorization conditions to construct a formula giving an analytic upper bound on the distance to stationarity. A rigorous answer to (Q1) can be calculated using this formula. The desired characteristics of the target distribution are typically estimated using ergodic averages. Geometric ergodicity of the underlying Markov chain implies that there are central limit theorems available for ergodic averages (Chan and Geyer 1994). The regenerative simulation technique (Mykland, Tierney and Yu 1995, Robert 1995) can be used to get a consistent estimate of the variance of the asymptotic nor...
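The two questions in this abstract can be illustrated with a small sketch: a random-walk Metropolis chain, a discarded burn-in, and a standard error for the ergodic average. Two simplifications are assumed here and should be read as such: the burn-in length is picked by hand instead of being derived from a drift and minorization bound, and the standard error uses batch means, a simpler stand-in for the regenerative simulation technique the abstract mentions. All function names are hypothetical.

```python
import numpy as np


def rw_metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis chain for a one-dimensional target density."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, lp = x0, log_target(x0)
    for t in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[t] = x
    return chain


def batch_means_se(draws, n_batches=30):
    """Batch-means standard error of the ergodic average of `draws`."""
    draws = np.asarray(draws, dtype=float)
    batch_len = len(draws) // n_batches
    trimmed = draws[: batch_len * n_batches]
    means = trimmed.reshape(n_batches, batch_len).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)


# Usage: standard normal target, chain started far from the mode.
chain = rw_metropolis(lambda x: -0.5 * x * x, x0=10.0, n_steps=20000, step=2.0)
kept = chain[2000:]  # heuristic burn-in (Q1); not a rigorous bound
estimate, se = kept.mean(), batch_means_se(kept)  # run-length check (Q2)
```

The standard error is only meaningful when a central limit theorem holds for the chain, which is the role geometric ergodicity plays in the abstract.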
Classical and Bayesian inference in neuroimaging: applications
NeuroImage
Cited by 57 (12 self)
introduced empirical Bayes as a potentially useful way to estimate and make inferences about effects in hierarchical models. In this paper we present a series of models that exemplify the diversity of problems that can be addressed within this framework. In hierarchical linear observation models, both classical and empirical Bayesian approaches can be framed in terms of covariance component estimation (e.g., variance partitioning). To illustrate the use of the expectation–maximization (EM) algorithm in covariance component estimation we focus first on two important problems in fMRI: nonsphericity induced by (i) serial or temporal correlations among errors and (ii) variance components caused by the hierarchical nature of multisubject studies. In hierarchical observation models,
Spatially-adaptive penalties for spline fitting
Australian and New Zealand Journal of Statistics, 2000
Cited by 34 (6 self)
We study spline fitting with a roughness penalty that adapts to spatial heterogeneity in the regression function. Our estimates are pth degree piecewise polynomials with p − 1 continuous derivatives. A large and fixed number of knots is used and smoothing is achieved by putting a quadratic penalty on the jumps of the pth derivative at the knots. To be spatially adaptive, the logarithm of the penalty is itself a linear spline but with relatively few knots and with values at the knots chosen to minimize GCV. This locally-adaptive spline estimator is compared with other spline estimators in the literature such as cubic smoothing splines and knot-selection techniques for least-squares regression. Our estimator can be interpreted as an empirical Bayes estimate for a prior allowing spatial heterogeneity. In cases of spatially heterogeneous regression functions,
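The penalized fit described here can be sketched with a truncated power basis, where the coefficient of each term (x - knot)_+^p equals the jump in the pth derivative at that knot divided by p!, so a quadratic penalty on those coefficients penalizes the jumps. The sketch below assumes degree p >= 1 and takes the per-knot penalties as given; choosing them through a log-linear spline tuned by GCV, as the abstract describes, is not implemented. Function names are hypothetical.

```python
import numpy as np


def truncated_power_basis(x, knots, degree):
    """Columns 1, x, ..., x^p followed by (x - knot)_+^p for each knot (p >= 1)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.vander(x, degree + 1, increasing=True)
    trunc = np.clip(x[:, None] - knots[None, :], 0.0, None) ** degree
    return np.hstack([poly, trunc])


def fit_penalized_spline(x, y, knots, penalties, degree=3):
    """Least squares with a quadratic penalty on the truncated-power coefficients.

    penalties: scalar or one non-negative value per knot (assumed to be given).
    Returns the coefficient vector (polynomial part first, then one per knot).
    """
    X = truncated_power_basis(x, knots, degree)
    lam = np.broadcast_to(np.asarray(penalties, dtype=float), (len(knots),))
    # Polynomial coefficients are left unpenalized.
    root_pen = np.diag(np.sqrt(np.concatenate([np.zeros(degree + 1), lam])))
    # Solve the ridge-type problem as an augmented least-squares system.
    A = np.vstack([X, root_pen])
    b = np.concatenate([np.asarray(y, dtype=float), np.zeros(X.shape[1])])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef
```

Passing a vector of penalties that varies across knots gives a spatially varying amount of smoothing; a single scalar recovers an ordinary globally penalized spline.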
Learning and inference in the brain
2003
Cited by 31 (7 self)
This article is about how the brain data mines its sensory inputs. There are several architectural principles of functional brain anatomy that have emerged from careful anatomic and physiologic studies over the past century. These principles are considered in the light of representational learning to see if they could have been predicted a priori on the basis of purely theoretical considerations. We first review the organisation of hierarchical sensory cortices, paying special attention to the distinction between forward and backward connections. We then review various approaches to representational learning as special cases of generative models, starting with supervised learning and ending with learning based upon empirical Bayes. The latter predicts many features, such as a hierarchical cortical system, prevalent top-down backward influences and functional asymmetries between forward and backward connections that are seen in the real brain. The key points made in this article are: (i) hierarchical generative models enable the learning of empirical priors and eschew prior assumptions about the causes of sensory input that are inherent in nonhierarchical models. These assumptions are necessary for learning schemes based on information theory and efficient or sparse coding, but are not necessary in a hierarchical context. Critically, the anatomical infrastructure that may implement generative models in the brain is hierarchical. Furthermore, learning based on empirical Bayes can proceed in a biologically plausible way. (ii) The second point is that backward connections are essential if the processes generating inputs cannot be inverted, or the inversion cannot be parameterised. Because these processes involve many-to-one mappings, are nonlinear and dynamic in nature, they are generally noninvertible. This enforces an explicit parameterisation of generative models (i.e. backward
Geometric Ergodicity of Gibbs and Block Gibbs Samplers for a Hierarchical Random Effects Model
1998
Cited by 28 (8 self)
We consider fixed scan Gibbs and block Gibbs samplers for a Bayesian hierarchical random effects model with proper conjugate priors. A drift condition given in Meyn and Tweedie (1993, Chapter 15) is used to show that these Markov chains are geometrically ergodic. Showing that a Gibbs sampler is geometrically ergodic is the first step towards establishing central limit theorems, which can be used to approximate the error associated with Monte Carlo estimates of posterior quantities of interest. Thus, our results will be of practical interest to researchers using these Gibbs samplers for Bayesian data analysis.
Key words and phrases: Bayesian model, Central limit theorem, Drift condition, Markov chain, Monte Carlo, Rate of convergence, Variance Components
AMS 1991 subject classifications: Primary 60J27, secondary 62F15
1 Introduction
Gelfand and Smith (1990, Section 3.4) introduced the Gibbs sampler for the hierarchical one-way random effects model with proper conjugate priors. Rosen...
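A fixed-scan Gibbs sampler for a one-way random effects model with proper conjugate priors might look like the sketch below. The specific parameterization is an assumption for illustration: a balanced design, y_ij ~ N(theta_i, 1/lam_e), theta_i ~ N(mu, 1/lam_theta), a normal prior on mu, and gamma priors on the two precisions. The hyperparameter defaults, the update order, and the name `gibbs_one_way` are likewise hypothetical and may differ from the samplers analysed in the paper.

```python
import numpy as np


def gibbs_one_way(y, n_iter=2000, mu0=0.0, lam0=1e-4,
                  a1=0.01, b1=0.01, a2=0.01, b2=0.01, seed=0):
    """Fixed-scan Gibbs sampler for a balanced one-way random effects model.

    y: array of shape (n_groups, n_per_group).
    Returns draws of (mu, 1/lam_theta, 1/lam_e), one row per iteration.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n_groups, n_per = y.shape
    ybar = y.mean(axis=1)

    mu, lam_theta, lam_e = ybar.mean(), 1.0, 1.0
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # theta_i | rest: normal, precision-weighted mix of group mean and mu.
        prec = n_per * lam_e + lam_theta
        mean = (n_per * lam_e * ybar + lam_theta * mu) / prec
        theta = mean + rng.normal(size=n_groups) / np.sqrt(prec)

        # mu | rest: normal, combining the random effects with the prior.
        prec_mu = n_groups * lam_theta + lam0
        mean_mu = (lam_theta * theta.sum() + lam0 * mu0) / prec_mu
        mu = mean_mu + rng.normal() / np.sqrt(prec_mu)

        # Precisions | rest: gamma (NumPy takes shape and scale = 1 / rate).
        rate_theta = b1 + 0.5 * np.sum((theta - mu) ** 2)
        lam_theta = rng.gamma(a1 + 0.5 * n_groups, 1.0 / rate_theta)
        rate_e = b2 + 0.5 * np.sum((y - theta[:, None]) ** 2)
        lam_e = rng.gamma(a2 + 0.5 * y.size, 1.0 / rate_e)

        draws[t] = mu, 1.0 / lam_theta, 1.0 / lam_e
    return draws
```

Averages of the returned draws are the Monte Carlo estimates of posterior quantities whose error the abstract's central limit theorems are meant to quantify.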