Results 1–6 of 6
Nonlinear Models Using Dirichlet Process Mixtures
Abstract

Cited by 15 (0 self)
We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, nonparametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into subpopulations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data.
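The mechanism this abstract describes (componentwise-linear relationships combining into an overall nonlinear one) can be sketched with a small simulation. The two-component setup, slopes, and noise level below are illustrative, and the component labels are treated as known rather than inferred by a Dirichlet process mixture as in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-component mixture: within each component the
# relationship between y and x is linear, but the components have
# different slopes, so the overall relationship is nonlinear.
n = 1000
z = rng.integers(0, 2, size=n)               # latent component label
x = rng.normal(loc=np.where(z == 0, -2.0, 2.0), scale=1.0)
slope = np.where(z == 0, 0.5, -1.5)          # component-specific slopes
y = slope * x + rng.normal(scale=0.3, size=n)

# A single linear fit cannot capture both regimes ...
a, b = np.polyfit(x, y, 1)
mse_linear = np.mean((y - (a * x + b)) ** 2)

# ... while fitting each component separately recovers the
# piecewise-linear truth.
mse_mix = np.mean([
    np.mean((y[z == k]
             - np.polyval(np.polyfit(x[z == k], y[z == k], 1),
                          x[z == k])) ** 2)
    for k in (0, 1)
])
```

With well-separated components, `mse_mix` is close to the noise variance while the single-line fit's error is far larger, which is the sense in which the mixture is "nonlinear overall, linear within components".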
Variable Selection in Nonparametric Random Effects Models
Abstract

Cited by 2 (2 self)
In analyzing longitudinal or clustered data with a mixed effects model (Laird and Ware, 1982), one may be concerned about violations of normality. Such violations can potentially impact subset selection for the fixed and random effects components of the model, inferences on the heterogeneity structure, and the accuracy of predictions. This article focuses on Bayesian methods for subset selection in nonparametric random effects models in which one is uncertain about the predictors to be included and the distribution of their random effects. We characterize the unknown distribution of the individual-specific regression coefficients using a weighted sum of Dirichlet process (DP)-distributed latent variables. By using carefully chosen mixture priors for coefficients in the base distributions of the component DPs, we allow fixed and random effects to be effectively dropped out of the model. A stochastic search Gibbs sampler is developed for posterior computation, and the methods are illustrated using simulated data and real data from a multi-laboratory bioassay study.
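A draw from a Dirichlet process is commonly simulated by truncated stick-breaking. The following minimal sketch (the truncation level and standard-normal base measure are illustrative, and it does not implement the paper's weighted-sum construction or its Gibbs sampler) shows how DP weights arise:

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(alpha, num_atoms, rng):
    """Truncated stick-breaking weights of a Dirichlet process draw."""
    betas = rng.beta(1.0, alpha, size=num_atoms)      # stick fractions
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * leftover                           # w_k = beta_k * prod_{j<k}(1 - beta_j)

# A (truncated) DP draw is the discrete measure sum_k w[k] * delta(atoms[k]).
w = stick_breaking(alpha=2.0, num_atoms=50, rng=rng)
atoms = rng.normal(size=50)    # atoms drawn i.i.d. from a N(0, 1) base measure
```

The discreteness of such draws is what lets DP priors cluster individual-specific coefficients into a few shared values.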
MODEL SELECTION, COVARIANCE SELECTION AND BAYES CLASSIFICATION VIA SHRINKAGE
, 2006
Abstract
The naive Bayes classifier (NB) has exhibited its “mysterious” but outstanding classification ability in practice, in spite of its often unrealistic conditional independence assumption. This simple assumption implies the adoption of a diagonal structure for the underlying class-specific precision matrices. However, the NB leaves covariate interrelationships unrevealed. In this dissertation, we extend the NB from the perspectives of covariance modeling and classification. Due to the positive-definiteness constraint and the number of parameters growing rapidly with dimension, covariance estimation in a multivariate normal population has been a classic but challenging statistical problem. Sparse shrinkage has been adopted as an important principle in covariance/precision matrix estimation. However, many existing models can only shrink the covariance/precision matrix toward a predefined diagonal structure. We model a precision matrix via its Cholesky decomposition in terms of a compositional regression coefficient matrix and error precisions. Our approach aims at estimating
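The Cholesky route to precision-matrix modeling mentioned here has a standard regression interpretation: each variable is regressed on its predecessors, giving a unit lower-triangular coefficient matrix T and diagonal innovation variances d with precision T'D⁻¹T. A minimal numeric sketch (the 4×4 covariance is arbitrary and no shrinkage prior is applied, so this illustrates the decomposition only, not the dissertation's estimator):

```python
import numpy as np

rng = np.random.default_rng(2)

# An arbitrary 4x4 symmetric positive-definite covariance matrix.
A = rng.normal(size=(4, 4))
sigma = A @ A.T + 4.0 * np.eye(4)

# Regress each variable on its predecessors: T is unit lower-triangular
# (negated regression coefficients below the diagonal) and d holds the
# innovation (prediction-error) variances.
p = sigma.shape[0]
T = np.eye(p)
d = np.empty(p)
d[0] = sigma[0, 0]
for j in range(1, p):
    phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
    T[j, :j] = -phi
    d[j] = sigma[j, j] - sigma[:j, j] @ phi

# The precision matrix factorizes as T' D^{-1} T.
precision = T.T @ np.diag(1.0 / d) @ T
```

Because the entries of T and d are unconstrained (beyond d being positive), shrinkage can act on them directly without endangering positive-definiteness.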
Robust Estimation of the Correlation Matrix of Longitudinal … (Statistics and Computing)
Abstract
We propose a double-robust procedure for modeling the correlation matrix of a longitudinal dataset. It is based on an alternative Cholesky decomposition of the form Σ = DLL⊤D, where D is a diagonal matrix proportional to the square roots of the diagonal entries of Σ and L is a unit lower-triangular matrix determining solely the correlation matrix. The first robustness is with respect to model misspecification for the innovation variances in D, and the second is robustness to outliers in the data. The latter is handled using heavy-tailed multivariate t-distributions with unknown degrees of freedom. We develop a Fisher scoring algorithm for computing the maximum likelihood estimator of the parameters when the non-redundant and unconstrained entries of (L, D) are modeled parsimoniously using covariates. We compare our results with those based on the modified Cholesky decomposition of the form LD²L⊤ using simulations and a real dataset.
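The two factorizations compared in this abstract differ only in whether the Cholesky factor of Σ is scaled by columns or by rows. A minimal sketch of that contrast (for compactness D is taken from the diagonal of the Cholesky factor here, rather than normalized to the square roots of diag(Σ) as in the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
sigma = A @ A.T + 4.0 * np.eye(4)        # an arbitrary SPD matrix

C = np.linalg.cholesky(sigma)            # sigma = C @ C.T, C lower-triangular
d = np.diag(C)

# Modified Cholesky, sigma = L D^2 L': scale the *columns* of C, so the
# diagonal of D^2 holds innovation variances.
L_mod = C / d                            # unit lower-triangular

# Alternative form, sigma = D L L' D: scale the *rows* of C instead, so
# L alone carries the correlation structure.
L_alt = C / d[:, None]                   # also unit lower-triangular
D_alt = np.diag(d)
```

Both L_mod and L_alt are unit lower-triangular and reproduce Σ exactly; the modeling consequences differ because the two D matrices carry different interpretations (innovation variances versus marginal scales).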
BAYESIAN METHODS TO IMPUTE MISSING COVARIATES FOR CAUSAL INFERENCE AND MODEL SELECTION
, 2008
Abstract
This thesis presents new approaches to deal with missing covariate data in two situations: matching in observational studies and model selection for generalized linear models. In observational studies, inferences about treatment effects are often affected by confounding covariates. Analysts can reduce bias due to differences in control and treated units’ observed covariates using propensity score matching, which results in a matched control group with similar characteristics to the treated group. Propensity scores are typically estimated from the data using a logistic regression. When covariates are partially observed, missing values can be filled in using multiple imputation. Analysts can estimate propensity scores from the imputed data sets to find a matched control set. Typically, in observational studies, covariates are spread thinly over a large space. It is not always clear what an appropriate imputation model for the missing data should be. Implausible imputations can influence the matches selected and hence the estimate of the treatment effect. In propensity score matching, units tend to be selected from among those lying in the treated units’ covariate space.
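Propensity-score matching as outlined here can be sketched end to end on simulated, fully observed data; everything below (the one-confounder setup, the Newton-fitted logistic propensity score, and one-to-one nearest-neighbor matching with replacement) is illustrative rather than the thesis's method:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated observational data: a single confounder x raises both the
# chance of treatment and the outcome; the true treatment effect is 2.
n = 500
x = rng.normal(size=n)
t = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 0.5)))   # true propensity
y = 2.0 * t + 1.5 * x + rng.normal(scale=0.5, size=n)

naive = y[t].mean() - y[~t].mean()       # confounded by x

# Fit the propensity score by logistic regression (Newton's method).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (t - p))
score = 1.0 / (1.0 + np.exp(-X @ beta))

# Match each treated unit to its nearest control on the estimated score
# (one-to-one, with replacement).
controls = np.flatnonzero(~t)
nearest = np.abs(score[controls][None, :] - score[t][:, None]).argmin(axis=1)
matched = y[t].mean() - y[controls[nearest]].mean()
# `matched` should land much closer to the true effect of 2 than `naive`.
```

With missing covariates, the thesis's concern is that the imputation step sits upstream of everything above: implausible imputed x values shift the estimated scores and hence which controls are matched.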
Bayesian Models for Variable Selection that Incorporate Biological Information
Abstract
Variable selection has been the focus of much research in recent years. Bayesian methods have found many successful applications, particularly in situations where the number of measured variables can be much greater than the number of observations. One such example is the analysis of genomics data. In this paper we first review Bayesian variable selection methods for linear settings, including regression and classification models. We focus in particular on recent prior constructions that have been used for the analysis of genomic data and briefly describe two novel applications that integrate different sources of biological information into the analysis of experimental data. Next, we address variable selection for a different modeling context, i.e., mixture models. We address both clustering and discriminant analysis settings and conclude with an application to gene expression data for patients affected by leukemia.