Results 1 - 10
of
62
Gibbs Sampling Methods for Stick-Breaking Priors
"... ... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling meth ..."
Abstract
-
Cited by 160 (16 self)
- Add to MetaCart
... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Polya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stick-breaking priors with a known P'olya urn characterization; that is priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on a entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach as it works without requiring an explicit prediction rule. We find that the blocked Gibbs avoids some of the limitations seen with the Polya urn approach and should be simpler for non-experts to use.
Hierarchical topic models and the nested Chinese restaurant process
- Advances in Neural Information Processing Systems
, 2004
"... We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested ..."
Abstract
-
Cited by 127 (19 self)
- Add to MetaCart
We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
Infinite Latent Feature Models and the Indian Buffet Process
, 2005
"... We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution ..."
Abstract
-
Cited by 122 (30 self)
- Add to MetaCart
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution
The Infinite Gaussian Mixture Model
- In Advances in Neural Information Processing Systems 12
, 2000
"... In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done usin ..."
Abstract
-
Cited by 122 (6 self)
- Add to MetaCart
In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done using an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.
Multi-task learning for classification with dirichlet process priors
- Journal of Machine Learning Research
, 2007
"... Multi-task learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asy ..."
Abstract
-
Cited by 57 (6 self)
- Add to MetaCart
Multi-task learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as single-task learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.
Modelling heterogeneity with and without the Dirichlet process
, 2001
"... We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP partition distribution is a limiting case of a Dirichlet± multinomial allocation model. Comparisons of ..."
Abstract
-
Cited by 49 (3 self)
- Add to MetaCart
We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP partition distribution is a limiting case of a Dirichlet± multinomial allocation model. Comparisons of posterior performance of DP and allocation models are made in the Bayesian paradigm and illustrated in the context of univariate mixture models. It is shown in particular that the unbalancedness of the allocation distribution, present in the prior DP model, persists a posteriori. Exploiting the model connections, a new MCMC sampler for general DP based models is introduced, which uses split/merge moves in a reversible jump framework. Performance of this new sampler relative to that of some traditional samplers for DP processes is then explored.
Posterior consistency of Dirichlet mixtures in density estimation
- Ann. Statist
, 1999
"... A Dirichlet mixture of normal densities is a useful choice for a prior distribution on densities in the problem of Bayesian density estimation. In the recent years, efficient Markov chain Monte Carlo method for the computation of the posterior distribution has been developed. The method has been app ..."
Abstract
-
Cited by 47 (17 self)
- Add to MetaCart
A Dirichlet mixture of normal densities is a useful choice for a prior distribution on densities in the problem of Bayesian density estimation. In the recent years, efficient Markov chain Monte Carlo method for the computation of the posterior distribution has been developed. The method has been applied to data arising from different fields of interest. The important issue of consistency was however left open. In this paper, we settle this issue in affirmative. 1. Introduction. Recent
Bayesian density regression
- JOURNAL OF THE ROYAL STATISTICAL SOCIETY B
, 2007
"... This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response dis-tribution is expressed as a nonparametric mixture of parametric densities, with the mixture distri-bution changing acc ..."
Abstract
-
Cited by 27 (17 self)
- Add to MetaCart
This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response dis-tribution is expressed as a nonparametric mixture of parametric densities, with the mixture distri-bution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. This specifica-tion results in a coherent prior for the joint measure, with the marginal random measure at each location being a finite mixture of DP basis measures. Integrating out the infinite-dimensional col-lection of mixing measures, we obtain a simple expression for the conditional distribution of the subject-specific random variables, which generalizes the Pólya urn scheme. Properties are consid-ered and a simple Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated using simulated data examples and epidemiologic studies.
Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities
- Ann. Statist
, 2001
"... We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is ..."
Abstract
-
Cited by 23 (9 self)
- Add to MetaCart
We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is also assumed to lie in this class with the true mixing distribution either compactly supported or having sub-Gaussian tails. We obtain bounds for Hellinger bracketing entropies for this class, and from these bounds, we deduce the convergence rates of (sieve) MLEs in Hellinger distance. The rate turns out to be �log n � κ / √ n, where κ ≥ 1 is a constant that depends on the type of mixtures and the choice of the sieve. Next, we consider a Dirichlet mixture of normals as a prior on the unknown density. We estimate the prior probability of a certain Kullback-Leibler type neighborhood and then invoke a general theorem that computes the posterior convergence rate in terms the growth rate of the Hellinger entropy and the concentration rate of the prior. The posterior distribution is also seen to converge at the rate �log n � κ / √ n in, where κ now depends on the tail behavior of the base measure of the Dirichlet process. 1. Introduction. A
Issues in Bayesian Analysis of Neural Network Models
, 1998
"... This paper discusses these issues exploring the potentiality of Bayesian ideas in the analysis of NN models. Buntine and Weigend (1991) and MacKay (1992) have provided frameworks for their Bayesian analysis based on Gaussian approximations and Neal (1993) has applied hybrid Monte Carlo ideas. Ripley ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
This paper discusses these issues exploring the potentiality of Bayesian ideas in the analysis of NN models. Buntine and Weigend (1991) and MacKay (1992) have provided frameworks for their Bayesian analysis based on Gaussian approximations and Neal (1993) has applied hybrid Monte Carlo ideas. Ripley (1993) and Cheng and Titterington (1994) have dwelt on the power of these ideas, specially as far as interpretation and architecture selection are concerned. See MacKay (1995) for a recent review. From a statistical modeling point of view NN's are a special instance of mixture models. Many issues about posterior multimodality and computational strategies in NN modeling are of relevance in the wider class of mixture models. Related recent references in the Bayesian literature on mixture models include Diebolt and Robert (1994), Escobar and West (1994), Robert and Mengersen (1995), Roeder and Wasserman (1995), West (1994), West and Cao (1993), West, Muller and Escobar (1994), and West and Turner (1994). We concentrate on approximation problems, though many of our suggestions can be translated to other areas. For those problems, NN's are viewed as highly nonlinear (semiparametric) approximators, where parameters are typically estimated by least squares. Applications of interest for practicioners include nonlinear regression, stochastic optimisation and regression metamodels for simulation output. The main issue we address here is how to undertake a Bayesian analysis of a NN model, and the uses of it we may make. Our contributions include: an evaluation of computational approaches to Bayesian analysis of NN models, including a novel Markov chain Monte Carlo scheme; a suggestion of a scheme for handling a variable architecture model and a scheme for combining NN models with more ...

