Results 1–10 of 32
Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 2010.
"... ar ..."
(Show Context)
Bayesian Adaptive Sampling for Variable Selection and Model Averaging
"... For the problem of model choice in linear regression, we introduce a Bayesian adaptive sampling algorithm (BAS), that samples models without replacement from the space of models. For problems that permit enumeration of all models BAS is guaranteed to enumerate the model space in 2 p iterations where ..."
Abstract

Cited by 24 (5 self)
 Add to MetaCart
(Show Context)
For the problem of model choice in linear regression, we introduce a Bayesian adaptive sampling algorithm (BAS) that samples models without replacement from the space of models. For problems that permit enumeration of all models, BAS is guaranteed to enumerate the model space in 2^p iterations, where p is the number of potential variables under consideration. For larger problems where sampling is required, we provide conditions under which BAS provides perfect samples without replacement. When the sampling probabilities in the algorithm are the marginal variable inclusion probabilities, BAS may be viewed as sampling models “near” the median probability model of Barbieri and Berger. As marginal inclusion probabilities are not known in advance, we discuss several strategies to estimate the marginal inclusion probabilities adaptively within BAS. We illustrate the performance of the algorithm using simulated and real data and show that BAS can outperform Markov chain Monte Carlo methods. The algorithm is implemented in the R package BAS, available on CRAN.
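The without-replacement idea described in this abstract can be sketched for tiny p by enumerating all models and renormalizing after each draw. The names below (`bas_sketch`, `incl_probs`) are hypothetical, and the brute-force enumeration is a simplification for illustration, not the paper's actual algorithm, which avoids enumerating the space up front:

```python
import itertools
import random

def bas_sketch(p, incl_probs, n_draws, seed=0):
    """Toy sketch: weight each binary inclusion vector m by
    prod_j g_j^{m_j} (1 - g_j)^{1 - m_j}, draw one model, remove it,
    renormalize, and repeat. Only feasible for small p."""
    rng = random.Random(seed)
    models = list(itertools.product([0, 1], repeat=p))

    def weight(m):
        w = 1.0
        for g, mj in zip(incl_probs, m):
            w *= g if mj else 1.0 - g
        return w

    remaining = {m: weight(m) for m in models}
    draws = []
    for _ in range(min(n_draws, len(models))):
        total = sum(remaining.values())
        r = rng.random() * total
        acc, chosen = 0.0, None
        for m, w in remaining.items():
            acc += w
            if acc >= r:
                chosen = m
                break
        draws.append(chosen)       # each model can be drawn at most once
        del remaining[chosen]
    return draws
```

With n_draws = 2^p this necessarily enumerates the whole model space, matching the guarantee quoted in the abstract.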
Searching for convergence in phylogenetic Markov chain Monte Carlo. Syst. Biol.
"... Abstract. — Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
(Show Context)
Abstract. — Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a “metachain” to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as …
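The "metachain" estimate described above is just pooled relative frequency across replicated runs. A minimal sketch (the function name and the bipartition labels are illustrative, not from the paper):

```python
from collections import Counter

def metachain_probs(runs):
    """Pool samples from several short, replicated runs into one
    'metachain' and estimate each sampled item's posterior probability
    by its pooled relative frequency (items could be tree bipartitions)."""
    pooled = [s for run in runs for s in run]
    counts = Counter(pooled)
    n = len(pooled)
    return {k: c / n for k, c in counts.items()}
```

For example, pooling two runs `[["AB|CD", "AB|CD", "AC|BD"], ["AB|CD"]]` gives the bipartition "AB|CD" an estimated posterior probability of 3/4.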
Automatic Bayesian model averaging for linear regression and applications in Bayesian curve fitting. Statist. Sinica, 2001.
"... Abstract: With the development of MCMC methods, Bayesian methods play a more and more important role in model selection and statistical prediction. However, the sensitivity of the methods to prior distributions has caused much difficulty to users. In the context of multiple linear regression, we pro ..."
Abstract

Cited by 10 (1 self)
 Add to MetaCart
Abstract: With the development of MCMC methods, Bayesian methods play an increasingly important role in model selection and statistical prediction. However, the sensitivity of these methods to prior distributions has caused much difficulty for users. In the context of multiple linear regression, we propose an automatic prior setting in which no parameter needs to be specified by users. Under this prior setting, we show that sampling from the posterior distribution is approximately equivalent to sampling from a Boltzmann distribution defined on C_p values. The numerical results show that the Bayesian model averaging procedure resulting from the automatic prior setting provides a significant improvement in predictive performance over two other procedures proposed in the literature. The procedure is extended to the problem of Bayesian curve fitting with regression splines. Evolutionary Monte Carlo is used to sample from the posterior distributions. Key words and phrases: Bayesian model averaging, curve fitting, evolutionary …
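A Boltzmann distribution over C_p values simply assigns each candidate model a probability proportional to exp(-C_p / T). The sketch below is an illustration of that form only; the temperature T = 2 is an arbitrary assumption, not the paper's calibration of the posterior/Boltzmann correspondence:

```python
import math

def boltzmann_weights(cp_values, temperature=2.0):
    """Probabilities proportional to exp(-Cp / T): models with smaller
    Mallows' Cp receive exponentially larger weight."""
    ws = [math.exp(-c / temperature) for c in cp_values]
    z = sum(ws)  # normalizing constant
    return [w / z for w in ws]
```

Smaller C_p means higher model probability, so model averaging under this distribution is dominated by the best-fitting parsimonious models.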
Sequential Monte Carlo on large binary sampling spaces. Statist. Comput., 2011.
"... A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric fam ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
(Show Context)
A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces. A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo. Raw versions of Sequential Monte Carlo are easily implemented using binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family that takes correlations into account, analogously to the multivariate normal distribution on continuous spaces. We provide a review of models for binary data and make one of them work in the context of Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.
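One standard way to encode correlations in a binary parametric family, in the spirit of the models this abstract alludes to, is a chain of logistic conditionals: each component is Bernoulli with a success probability that depends on the components already drawn. The function and parameterization below are a hypothetical sketch, not the paper's specific family:

```python
import math
import random

def sample_logistic_conditionals(coefs, rng):
    """Draw one binary vector from a chain of logistic conditionals:
    P(x_j = 1 | x_1..x_{j-1}) = sigmoid(a_j + sum_k b_jk * x_k),
    where coefs[j] = (a_j, [b_j1, ..., b_j,j-1])."""
    x = []
    for a, bs in coefs:
        eta = a + sum(b * xk for b, xk in zip(bs, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        x.append(1 if rng.random() < p else 0)
    return x
```

With a strongly negative cross-coefficient, e.g. `coefs = [(0.0, []), (5.0, [-10.0])]`, the two components become highly anti-correlated, which an independent-component proposal cannot represent.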
Darwinian Evolution in Parallel Universes: A Parallel Genetic Algorithm for Variable Selection
"... The need to identify a few important variables that affect a certain outcome of interest commonly arises in various industrial engineering applications. The genetic algorithm (GA) appears to be a natural tool for solving such a problem. In this article we first demonstrate that the GA is actually no ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
(Show Context)
The need to identify a few important variables that affect a certain outcome of interest commonly arises in various industrial engineering applications. The genetic algorithm (GA) appears to be a natural tool for solving such a problem. In this article we first demonstrate that the GA is actually not a particularly effective variable selection tool, and then propose a very simple modification. Our idea is to run a number of GAs in parallel without allowing each GA to fully converge, and to consolidate the information from all the individual GAs in the end. We call the resulting algorithm the parallel genetic algorithm (PGA). Using a number of both simulated and real examples, we show that the PGA is an interesting, highly competitive, and easy-to-use variable selection tool.
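The PGA idea of many deliberately short runs, consolidated at the end, can be sketched as follows. The GA operators here (truncation selection, uniform crossover, single-bit mutation) and all names are illustrative assumptions, not the authors' exact design:

```python
import random

def short_ga(fitness, p, rng, pop=20, gens=5):
    """One deliberately short GA run over length-p binary chromosomes:
    keep the top half, breed children by uniform crossover, flip one
    random bit per child. Intentionally stopped before convergence."""
    popn = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        parents = popn[: pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child[rng.randrange(p)] ^= 1  # one-bit mutation
            children.append(child)
        popn = parents + children
    return max(popn, key=fitness)

def parallel_ga(fitness, p, runs=30, seed=0):
    """Consolidate the best chromosome of each short run into
    per-variable inclusion frequencies (the PGA consolidation step)."""
    freqs = [0.0] * p
    for r in range(runs):
        best = short_ga(fitness, p, random.Random(seed + r))
        for j, bit in enumerate(best):
            freqs[j] += bit / runs
    return freqs
```

On a toy fitness where only variables 0 and 2 carry signal, the consolidated frequencies separate signal variables from noise far more cleanly than any single run's best chromosome.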
Bayesian Model Search and Multilevel Inference for SNP Association Studies. Annals of Applied Statistics, 2010.
"... Technological advances in genotyping have given rise to hypothesisbased association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis inclu ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization, and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene, and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA’s statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally “validated” in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
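The multilevel summaries mentioned above can be illustrated with a simple Monte Carlo estimate: a SNP-level inclusion probability is the fraction of sampled models containing that SNP, and one natural gene-level probability is the fraction containing any of the gene's SNPs. This is a simplified sketch of the idea, with hypothetical names, not MISA's exact definitions:

```python
def multilevel_probs(model_samples, gene_map):
    """Estimate SNP-level and gene-level inclusion probabilities from
    sampled models (each model = set of included SNP identifiers).
    Gene-level probability = frequency that any of its SNPs appears."""
    n = len(model_samples)
    snp, gene = {}, {}
    for g, snps in gene_map.items():
        gene[g] = sum(any(s in m for s in snps) for m in model_samples) / n
        for s in snps:
            snp[s] = sum(s in m for m in model_samples) / n
    return snp, gene
```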
Distributed Evolutionary Monte Carlo with Applications to Bayesian Analysis, 2005.
"... Sampling from multimodal and high dimensional target distribution posits a great challenge in Bayesian analysis. This paper combines the attractive features of the distributed genetic algorithm and the Markov Chain Monte Carlo, resulting in a new Monte Carlo algorithm Distributed Evolutionary Monte ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
Sampling from multimodal and high-dimensional target distributions poses a great challenge in Bayesian analysis. This paper combines the attractive features of the distributed genetic algorithm and Markov chain Monte Carlo, resulting in a new Monte Carlo algorithm, Distributed Evolutionary Monte Carlo (DEMC), for real-valued problems. DEMC evolves a population of Markov chains through genetic operators to explore the target function efficiently. The promising potential of the DEMC algorithm is illustrated by applying it to multimodal sampling problems, Bayesian neural networks, and logistic regression inference.
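The population-of-chains idea can be illustrated with a minimal sketch: several tempered random-walk Metropolis chains plus an exchange (swap) move between adjacent chains. This shows only the population/exchange flavour that evolutionary Monte Carlo methods build on; it is not DEMC's full set of genetic operators, and all names here are illustrative:

```python
import math
import random

def population_mcmc_sketch(logpi, temps, steps, rng):
    """Each chain i runs random-walk Metropolis on pi^(1/T_i); each
    iteration also proposes swapping the states of one random adjacent
    pair, accepted with the usual exchange ratio. Returns the cold
    (T = temps[0]) chain's trajectory."""
    states = [0.0 for _ in temps]
    cold_samples = []
    for _ in range(steps):
        # local random-walk Metropolis update for every chain
        for i, t in enumerate(temps):
            prop = states[i] + rng.gauss(0.0, 2.0)
            if math.log(rng.random() + 1e-300) < (logpi(prop) - logpi(states[i])) / t:
                states[i] = prop
        # exchange move between one random adjacent pair of chains
        i = rng.randrange(len(temps) - 1)
        a = (logpi(states[i + 1]) - logpi(states[i])) * (1.0 / temps[i] - 1.0 / temps[i + 1])
        if math.log(rng.random() + 1e-300) < a:
            states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

On a well-separated bimodal target, the hot chains cross between modes easily and the swaps propagate those crossings down to the cold chain, which a single random-walk chain at T = 1 would struggle to do.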
Automatic smoothing for discontinuous regression functions: Supporting document, 2002. Download: http://www.stat.colostate.edu/∼tlee/PSfiles/support.ps.gz
"... Abstract: This article proposes an automatic smoothing method for recovering discontinuous regression functions. The method models the target regression function with a series of disconnected cubic regression splines which partition the function’s domain. In this way discontinuity points can be inc ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
Abstract: This article proposes an automatic smoothing method for recovering discontinuous regression functions. The method models the target regression function with a series of disconnected cubic regression splines which partition the function’s domain. In this way discontinuity points can be incorporated in a fitted curve simply as the boundary points between adjacent splines. Three objective criteria are constructed and compared for choosing the number and placement of these discontinuity points as well as the amount of smoothing. These criteria are derived from three fundamentally different model selection methods: AIC, GCV and the MDL principle. Practical optimization of these criteria is done by genetic algorithms. Simulation results show that the proposed method is superior to many existing smoothing methods when the target function is nonsmooth. The method is further made robust by using a Gaussian mixture approach to model outliers.
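The criterion-based choice of discontinuity points described above can be shown in miniature with piecewise-constant fits and an AIC-like score (sum of squared errors plus a per-parameter penalty). This is a toy with at most one breakpoint and a hypothetical penalty of 2 per parameter; the paper's method uses cubic splines, three different criteria, and a genetic-algorithm search:

```python
def best_breakpoint(xs, ys, penalty=2.0):
    """Compare a no-breakpoint constant fit against every single-breakpoint
    piecewise-constant fit, scoring each by SSE + penalty * n_params.
    Returns the chosen breakpoint location, or None for no break."""
    def sse(seg):
        if not seg:
            return 0.0
        m = sum(seg) / len(seg)
        return sum((y - m) ** 2 for y in seg)

    best_score, best_bp = sse(ys) + penalty * 1, None  # one mean parameter
    for k in range(1, len(xs)):
        score = sse(ys[:k]) + sse(ys[k:]) + penalty * 2  # two mean parameters
        if score < best_score:
            best_score, best_bp = score, xs[k]
    return best_bp
```

On data with a clean jump, e.g. five zeros followed by five tens, the penalized score picks exactly the jump location, while on constant data the penalty makes the no-breakpoint model win.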
Variable selection in regression mixture modeling for the discovery of gene regulatory networks. Journal of the American Statistical Association, 2007.
"... The profusion of genomic data through genome sequencing and gene expression microarray technology has facilitated statistical research in determining gene interactions regulating a biological process. Current methods generally consist of a twostage procedure: clustering gene expression measurement ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
(Show Context)
The profusion of genomic data through genome sequencing and gene expression microarray technology has facilitated statistical research in determining gene interactions regulating a biological process. Current methods generally consist of a two-stage procedure: clustering gene expression measurements, and searching for regulatory “switches”, typically short, conserved sequence patterns (motifs) in the DNA sequence adjacent to the genes. This process often leads to misleading conclusions, as incorrect cluster selection may lead to missing important regulatory motifs or making many false discoveries. Treating cluster memberships as known, rather than estimated, introduces bias into analyses and prevents uncertainty about cluster parameters from being propagated. Further, the available data are underutilized, as the sequence information is ignored for purposes of expression clustering and vice versa. We propose a way to address these issues by combining gene clustering and motif discovery in a unified framework, a mixture of hierarchical regression models, with unknown components representing the latent gene clusters, and genomic sequence features linked to the resultant gene expression …