Results 1–10 of 27
Estimating the integrated likelihood via posterior simulation using the harmonic mean identity
 Bayesian Statistics
, 2007
Abstract

Cited by 26 (2 self)
The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. While this is an unbiased and simulation-consistent estimator, its reciprocal can have infinite variance and so it is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first one, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heavier-tailed densities, thus resulting in a finite variance estimator. The resulting
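The harmonic mean identity above can be checked numerically. The following is a minimal sketch on a toy conjugate model (y_i ~ N(theta, 1) with prior theta ~ N(0, 1), invented here for illustration) where the integrated likelihood has a closed form to compare against; it is not the stabilized estimator from the paper, only the simple harmonic mean it starts from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate setup where the integrated likelihood is known exactly:
# y_i ~ N(theta, 1) for i = 1..n, with prior theta ~ N(0, 1).
n = 20
y = rng.normal(0.5, 1.0, size=n)

# Exact posterior: theta | y ~ N(sum(y) / (n + 1), 1 / (n + 1)).
draws = rng.normal(y.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=100_000)

# Log-likelihood of the full sample at each posterior draw.
ll = (-0.5 * n * np.log(2 * np.pi)
      - 0.5 * ((y[:, None] - draws[None, :]) ** 2).sum(axis=0))

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# Harmonic mean identity: 1 / p(y) = E_posterior[ 1 / p(y | theta) ],
# so log p(y) is estimated as -(logsumexp(-ll) - log M), computed stably.
log_ml_hm = -(logsumexp(-ll) - np.log(len(ll)))

# Closed-form check: marginally, y ~ N_n(0, I + 11^T).
Sigma = np.eye(n) + np.ones((n, n))
_, logdet = np.linalg.slogdet(Sigma)
log_ml_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * logdet
                - 0.5 * y @ np.linalg.solve(Sigma, y))

print(log_ml_hm, log_ml_exact)
```

In this benign one-dimensional setting the two values agree closely; the instability the abstract warns about shows up in higher dimensions, where the reciprocal likelihood has heavy tails under the posterior.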
Functional clustering by Bayesian wavelet methods
 Journal of the Royal Statistical Society B
, 2006
Abstract

Cited by 22 (0 self)
Summary. We propose a nonparametric Bayes wavelet model for clustering of functional data. The wavelet-based methodology is aimed at the resolution of generic global and local features during clustering and is suitable for clustering high-dimensional data. Based on the Dirichlet process, the nonparametric Bayes model extends the scope of traditional Bayes wavelet methods to functional clustering and allows the elicitation of prior belief about the regularity of the functions and the number of clusters by suitably mixing the Dirichlet processes. Posterior inference is carried out by Gibbs sampling with conjugate priors, which makes the computation straightforward. We use simulated as well as real data sets to illustrate the suitability of the approach over other alternatives.
NONPARAMETRIC FUNCTIONAL DATA ANALYSIS THROUGH BAYESIAN DENSITY ESTIMATION
, 2007
Abstract

Cited by 15 (5 self)
In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. Some examples are conductivity-temperature-depth (CTD) data in oceanography, dose-response models in epidemiology and time-course microarray experiments in biology and medicine. In this paper we propose a hierarchical model that allows us to simultaneously estimate multiple curves nonparametrically by using dependent Dirichlet Process mixtures of Gaussians to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of CTD data in the north Atlantic.
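The step of inducing a function estimate from a joint mixture of Gaussians can be sketched as follows. The component weights, means, and covariances below are hypothetical fixed values standing in for posterior draws; the point is only the mechanics of turning a joint (x, y) mixture into the conditional mean E[y | x].

```python
import numpy as np

# Hypothetical fitted mixture of two bivariate Gaussians over (x, y):
# weights w, means m[k] = (mu_x, mu_y), covariances S[k].
w = np.array([0.5, 0.5])
m = np.array([[-1.0, -1.0], [1.0, 1.0]])
S = np.array([[[0.5, 0.2], [0.2, 0.5]],
              [[0.5, -0.1], [-0.1, 0.5]]])

def cond_mean(x):
    # Responsibility of each component at x (the x-marginal of each
    # component is Gaussian with mean m[k, 0] and variance S[k, 0, 0]).
    dens = np.array([
        np.exp(-0.5 * (x - m[k, 0]) ** 2 / S[k, 0, 0])
        / np.sqrt(2 * np.pi * S[k, 0, 0])
        for k in range(len(w))
    ])
    r = w * dens
    r /= r.sum()
    # Per-component conditional mean of y given x for a bivariate Gaussian:
    # mu_y + (cov_xy / var_x) * (x - mu_x).
    mu_y = np.array([
        m[k, 1] + S[k, 0, 1] / S[k, 0, 0] * (x - m[k, 0])
        for k in range(len(w))
    ])
    return float(r @ mu_y)

print(cond_mean(-1.0), cond_mean(1.0))
```

Because the component weights are re-normalized at each x, the induced curve can change shape across the predictor space rather than being a single global regression line.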
Bayesian model-based clustering procedures
 Journal of Computational and Graphical Statistics
Abstract

Cited by 12 (1 self)
This article establishes a general formulation for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. The notational framework is rich enough to encompass a variety of existing procedures, including some recently discussed methods involving stochastic search or hierarchical clustering, but more importantly allows the formulation of clustering procedures that are optimal with respect to a specified loss function. Our focus is on loss functions based on pairwise coincidences, that is, whether pairs of items are clustered into the same subset or not. Optimization of the posterior expected loss function can be formulated as a binary integer programming problem, which can be readily solved by standard software when clustering a modest number of items, but quickly becomes impractical as problem scale increases. To combat this, a new heuristic item-swapping algorithm is introduced. This performs well in our numerical experiments, on both simulated and real data examples. The article includes a comparison of the statistical performance of the (approximate) optimal clustering with earlier methods that are model-based but ad hoc in their detailed definition.
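The pairwise-coincidence loss and an item-swapping heuristic of the kind described above can be sketched in a few lines. The simulated partition draws, noise level, and unit pairwise costs below are illustrative assumptions, not the paper's experimental setup; the search is a plain greedy hill-climb, one simple instance of an item-swapping scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake posterior partition draws: rows are MCMC iterations, columns are
# items, entries are cluster labels, built around a true 2-group structure
# with a little label noise.
n, draws = 12, 200
truth = np.repeat([0, 1], n // 2)
S = np.tile(truth, (draws, 1))
noise = rng.random(S.shape) < 0.05
S[noise] = rng.integers(0, 2, noise.sum())

# Posterior pairwise coincidence probabilities P[i, j] = Pr(i, j together).
P = np.zeros((n, n))
for s in S:
    P += (s[:, None] == s[None, :])
P /= draws

def pairwise_loss(labels, P):
    """Posterior expected pairwise-coincidence loss with unit costs:
    penalize pairs clustered together that are probably apart, and
    pairs kept apart that are probably together."""
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(labels), k=1)
    return (same[iu] * (1 - P[iu]) + (~same[iu]) * P[iu]).sum()

# Greedy item-swapping search: repeatedly move each item to the label
# (existing or brand new) that most reduces the expected loss.
labels = np.zeros(n, dtype=int)           # start with everything together
for _ in range(20):
    improved = False
    for i in range(n):
        best = pairwise_loss(labels, P)
        for k in range(labels.max() + 2):  # existing clusters plus one new
            old = labels[i]
            labels[i] = k
            cur = pairwise_loss(labels, P)
            if cur < best - 1e-12:
                best, improved = cur, True
            else:
                labels[i] = old
    if not improved:
        break

print(labels)
```

On this tiny example the hill-climb recovers the two underlying groups; unlike the binary integer program, it carries no optimality guarantee, which is the trade-off the abstract describes.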
Nonparametric Bayes conditional distribution modeling with variable selection
 Journal of the American Statistical Association
, 2009
Abstract

Cited by 11 (7 self)
This article considers methodology for flexibly characterizing the relationship between a response and multiple predictors. Goals are (1) to estimate the conditional response distribution addressing the distributional changes across the predictor space, and (2) to identify important predictors for the response distribution change both within local regions and globally. We first introduce the probit stick-breaking process (PSBP) as a prior for an uncountable collection of predictor-dependent random probability measures and propose a PSBP mixture (PSBPM) of normal regressions for modeling the conditional distributions. A global variable selection structure is incorporated to discard unimportant predictors, while allowing estimation of posterior inclusion probabilities. Local variable selection is conducted relying on the conditional distribution estimates at different predictor points. An efficient stochastic search sampling algorithm is proposed for posterior computation. The methods are illustrated through simulation and applied to an epidemiologic study.
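The probit stick-breaking construction mentioned above can be sketched directly: the stick is broken with probabilities obtained by passing Gaussian quantities through the standard normal CDF. The linear form alpha_h(x) = mu_h + beta_h * x below is a hypothetical choice, used only to show how the weights can depend on a predictor.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def norm_cdf(x):
    # Standard normal CDF via erf (no SciPy dependency).
    return np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in np.atleast_1d(x)])

H = 10                                    # truncation level of the stick
mu = rng.normal(0, 1, H)                  # hypothetical Gaussian intercepts
beta = rng.normal(0, 1, H)                # hypothetical predictor effects

def psbp_weights(x):
    """Probit stick-breaking weights at predictor value x:
    v_h = Phi(mu_h + beta_h * x), w_h = v_h * prod_{l<h} (1 - v_l)."""
    v = norm_cdf(mu + beta * x)
    w = np.empty(H)
    remaining = 1.0
    for h in range(H - 1):
        w[h] = v[h] * remaining
        remaining *= 1.0 - v[h]
    w[H - 1] = remaining                  # last weight absorbs the remainder
    return w

w = psbp_weights(0.3)
print(w, w.sum())
```

Because the breaking proportions are probit transforms of Gaussians, standard data-augmentation machinery for probit models can be reused in posterior computation, which is part of the construction's appeal.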
Nonparametric Bayes applications to biostatistics
 In Bayesian Nonparametrics: Principles and Practice
, 2010
Abstract

Cited by 10 (0 self)
Biomedical research has clearly evolved at a dramatic rate in the past decade, with improvements in technology leading to a fundamental shift in the way in which data are collected and analyzed. Before this paradigm shift, studies were most commonly designed to be simple and to focus on relationships among a few variables of primary interest. For example, in
Bayesian model-based clustering procedures
 Journal of Computational and Graphical Statistics
, 2006
Abstract

Cited by 9 (0 self)
This paper establishes a general framework for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. It is rich enough to encompass a variety of existing procedures, including some recently discussed methodologies involving stochastic search or hierarchical clustering, but more importantly allows the formulation of clustering procedures that are optimal with respect to a specified loss function. Our focus is on loss functions based on pairwise coincidences, that is, whether pairs of items are clustered into the same subset or not. Optimisation of the posterior expected loss function can be formulated as a binary integer programming problem, which can be readily solved, for example by the simplex method, when clustering a modest number of items, but quickly becomes impractical as problem scale increases. To combat this, a new heuristic item-swapping algorithm is introduced. This performs well in our numerical experiments, on both simulated and real data examples. The paper includes a comparison of the statistical performance of the (approximate) optimal clustering with earlier methods that are model-based but ad hoc in their detailed definition.
Fast Bayesian Inference in Dirichlet Process Mixture Models
, 2008
Abstract

Cited by 5 (0 self)
There has been increasing interest in applying Bayesian nonparametric methods in large samples and high dimensions. As Markov chain Monte Carlo (MCMC) algorithms are often infeasible, there is a pressing need for much faster algorithms. This article proposes a fast approach for inference in Dirichlet process mixture (DPM) models. Viewing the partitioning of subjects into clusters as a model selection problem, we propose a sequential greedy search algorithm for selecting the partition. Then, when conjugate priors are chosen, the resulting posterior, conditional on the selected partition, is available in closed form. This approach allows testing of parametric models versus nonparametric alternatives based on Bayes factors. We evaluate the approach using simulation studies and compare it with four other fast nonparametric methods in the literature. We apply the proposed approach to three datasets including one from a large epidemiologic study. Matlab codes for the simulation and data analyses using the proposed approach are available online in the supplemental materials.
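A sequential greedy partition search of the general flavor described above can be sketched for a conjugate normal-normal DPM with known unit variance: each point is assigned to the existing cluster (or a new one) that maximizes the CRP-weighted predictive density, rather than sampling that choice as an MCMC algorithm would. The data, concentration parameter, and prior below are illustrative assumptions, not the paper's algorithm in detail.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data from two well-separated Gaussians.
y = np.concatenate([rng.normal(-3, 1, 25), rng.normal(3, 1, 25)])
rng.shuffle(y)

alpha, tau0 = 1.0, 1.0      # DP concentration; N(0, 1/tau0) prior on means
sigma2 = 1.0                # known unit observation variance

def log_pred(yi, members):
    """Log predictive density of yi under a cluster with the given members,
    for the conjugate normal-normal model (closed form)."""
    n = len(members)
    prec = tau0 + n / sigma2
    mean = (np.sum(members) / sigma2) / prec
    var = sigma2 + 1.0 / prec
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (yi - mean) ** 2 / var

# Sequential greedy search: deterministically assign each point to the
# option (existing cluster or new cluster) with the highest score.
clusters = []
for yi in y:
    scores = [np.log(len(c)) + log_pred(yi, c) for c in clusters]
    scores.append(np.log(alpha) + log_pred(yi, []))   # open a new cluster
    k = int(np.argmax(scores))
    if k == len(clusters):
        clusters.append([yi])
    else:
        clusters[k].append(yi)

print([len(c) for c in clusters])
```

One pass over the data yields a single partition; conditional on it, all posterior quantities are closed-form, which is what makes this style of algorithm fast relative to MCMC.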
Estimation of semiparametric models in the presence of endogeneity and sample selection
 Journal of Computational and Graphical Statistics
, 2009
Abstract

Cited by 4 (0 self)
We analyze a semiparametric model for data that suffer from the problems of incidental truncation, where some of the data are observed for only part of the sample with a probability that depends on a selection equation, and of endogeneity, where a covariate is correlated with the disturbance term. The introduction of nonparametric functions in the model permits significant flexibility in the way covariates affect response variables. We present an efficient Bayesian method for the analysis of such models that allows us to consider general systems of outcome variables and endogenous regressors that are continuous, binary, censored, or ordered. Estimation is computationally inexpensive as it does not require data augmentation for the missing outcomes, thus reducing computational demands and enhancing the mixing of the Markov chain Monte Carlo simulation algorithm. The methods are applied in a model of women’s labor force participation and log-wage determination that accounts for endogeneity, incidental truncation, and nonlinear covariate effects.
Additive cubic spline regression with Dirichlet process mixture errors
 Journal of Econometrics
, 2010
Abstract

Cited by 3 (0 self)
The goal of this paper is to develop a flexible Bayesian analysis of regression models for continuous and categorical outcomes. In the models we study, covariate (or regression) effects are modeled additively by cubic splines, and the error distribution (that of the latent outcomes in the case of categorical data) is modeled as a Dirichlet process mixture. We employ a relatively unexplored but attractive basis in which the spline coefficients are the unknown function ordinates at the knots. We exploit this feature to develop a proper prior distribution on the coefficients that involves the first and second differences of the ordinates, quantities about which one may have prior knowledge. We also discuss the problem of comparing models with different numbers of knots or different error distributions through marginal likelihoods and Bayes factors, which are computed within the framework of Chib (1995) as extended to DPM models by Basu and Chib (2003). The techniques are illustrated with simulated and real data.
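A prior on spline ordinates built from first and second differences, of the general kind described above, can be sketched as a Gaussian precision matrix. The difference weights and the small ridge term below are hypothetical choices; the ridge makes this particular precision strictly positive definite and hence a proper prior, which the paper achieves through its own construction.

```python
import numpy as np

# Spline coefficients are the function ordinates at K knots; penalize
# their first and second differences to encode prior smoothness beliefs.
K = 8
D1 = np.diff(np.eye(K), n=1, axis=0)     # first-difference operator, (K-1) x K
D2 = np.diff(np.eye(K), n=2, axis=0)     # second-difference operator, (K-2) x K

lam0, lam1, lam2 = 0.01, 1.0, 4.0        # hypothetical ridge/smoothness weights
# Implied Gaussian prior precision on the ordinates; the small ridge term
# lam0 * I anchors the overall level so the prior is proper.
Q = lam0 * np.eye(K) + lam1 * D1.T @ D1 + lam2 * D2.T @ D2
cov = np.linalg.inv(Q)                   # prior covariance of the ordinates
print(np.linalg.eigvalsh(Q).min())
```

Larger lam2 shrinks the second differences toward zero, i.e. toward a locally linear function, which is exactly the sort of prior knowledge about the ordinates the abstract refers to.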