Results 1 - 10 of 46
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by . . .
, 2003
Cited by 851 (14 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. [Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDP-II project.] The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. Moreover, current probabilist...
Bayesian phylogenetic inference via Markov chain Monte Carlo methods
 Biometrics
, 1999
Cited by 85 (3 self)
SUMMARY. We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cophenetic matrix form suggests a simple and effective proposal distribution for selecting candidate trees close to the current tree in the chain. We illustrate the algorithm with restriction site data on 9 plant species, then extend to DNA sequences from 32 species of fish. The algorithm mixes well in both examples from random starting trees, generating reproducible estimates and credible sets for the path of evolution.
Empirical and Hierarchical Bayesian Estimation of Ancestral States
 SYST. BIOL. 50(3):351–366
, 2001
Cited by 26 (4 self)
Several methods have been proposed to infer the states at the ancestral nodes on a phylogeny. These methods assume a specific tree and set of branch lengths when estimating the ancestral character state. Inferences of the ancestral states, then, are conditioned on the tree and branch lengths being true. We develop a hierarchical Bayes method for inferring the ancestral states on a tree. The method integrates over uncertainty in the tree, branch lengths, and substitution model parameters by using Markov chain Monte Carlo. We compare the hierarchical Bayes inferences of ancestral states with inferences of ancestral states made under the assumption that a specific tree is correct. We find that the methods are correlated, but that accommodating uncertainty in parameters of the phylogenetic model can make inferences of ancestral states even more uncertain than they would be in an empirical Bayes analysis.
Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference
 Bioinformatics
, 2004
Cited by 25 (0 self)
Motivation: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis coupled MCMC [(MC)³], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. Results: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets. Availability: MrBayes v3.0 is available at
Properties of phylogenetic trees generated by Yule-type speciation models
 Math. Biosci
, 2001
Cited by 17 (1 self)
We investigate some discrete structural properties of evolutionary trees generated under simple null models of speciation, such as the Yule model. These models have been used as priors in Bayesian approaches to phylogenetic analysis, and also to test hypotheses concerning the speciation process. In this paper we describe new results for three properties of trees generated under such models. Firstly, for a rooted tree generated by the Yule model we describe the probability distribution on the depth (number of edges from the root) of the most recent common ancestor of a random subset of k species. Next we show that, for trees generated under the Yule model, the approximate position of the root can be estimated from the associated unrooted tree, even for trees with a large number of leaves. Finally, we analyse a biologically motivated extension of the Yule model and describe its distribution on tree shapes when speciation occurs
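The first property in the abstract above (the depth of the most recent common ancestor of a random subset of k species) can be explored empirically. The following is only a simulation sketch, not the paper's analytical result: it assumes the usual discrete description of the Yule model, in which a tree grows by repeatedly splitting a uniformly chosen leaf. The names `yule_tree`, `path_to_root`, `mrca_depth`, and `simulate_depths` are hypothetical helpers introduced for illustration.

```python
import random
from collections import Counter

def yule_tree(n_leaves, rng):
    """Grow a rooted binary tree by uniform leaf splitting (assumed Yule process).

    Returns (parent, leaves): a dict mapping node -> parent (root maps to None)
    and the list of current leaf ids.
    """
    parent = {0: None}  # node 0 is the root
    leaves = [0]
    next_id = 1
    while len(leaves) < n_leaves:
        # Pick a leaf uniformly at random and replace it with two child leaves.
        idx = rng.randrange(len(leaves))
        node = leaves[idx]
        left, right = next_id, next_id + 1
        next_id += 2
        parent[left] = node
        parent[right] = node
        leaves[idx] = left
        leaves.append(right)
    return parent, leaves

def path_to_root(node, parent):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def mrca_depth(subset, parent):
    """Depth (edges from the root) of the most recent common ancestor of `subset`."""
    paths = [path_to_root(v, parent) for v in subset]
    shortest = min(len(p) for p in paths)
    depth = 0
    for level in range(shortest):
        if all(p[level] == paths[0][level] for p in paths):
            depth = level  # all paths still agree at this level
        else:
            break
    return depth

def simulate_depths(n_leaves, k, n_trees, seed=0):
    """Empirical counts of MRCA depth for a random k-subset over simulated trees."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trees):
        parent, leaves = yule_tree(n_leaves, rng)
        counts[mrca_depth(rng.sample(leaves, k), parent)] += 1
    return counts
```

Calling `simulate_depths(50, 3, 1000)` returns a histogram of observed depths that could be compared against a closed-form distribution such as the one the paper describes.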
Parallel Metropolis-Coupled Markov Chain Monte Carlo for Bayesian Phylogenetic Inference
, 2003
Cited by 16 (1 self)
Motivation: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov Chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. A variant of MCMC, known as Metropolis-Coupled MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. Results: This paper presents a parallel algorithm for Metropolis-Coupled MCMC. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets. Availability: MrBayes v3.0 is available at http://morphbank.ebc.uu.se/mrbayes3/.
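To make the idea of Metropolis-Coupled MCMC concrete, here is a minimal serial sketch on a toy problem. It is not the parallel algorithm from the paper and does not operate on trees: the target is an assumed one-dimensional bimodal density standing in for a posterior with two peaks, and the heating schedule `1 / (1 + delta * i)` and all function names are illustrative assumptions.

```python
import math
import random

def log_target(x):
    """Toy unnormalized log-density: equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mc3(n_iter=20000, n_chains=4, delta=1.0, step=1.0, seed=1):
    """Run coupled chains; return the samples of the cold chain (beta = 1)."""
    rng = random.Random(seed)
    # Chain i samples from target ** beta_i; smaller beta flattens the landscape.
    betas = [1.0 / (1.0 + delta * i) for i in range(n_chains)]
    states = [-4.0] * n_chains  # every chain starts in the left mode
    cold_samples = []
    for _ in range(n_iter):
        # Within-chain Metropolis step with a symmetric random-walk proposal.
        for i in range(n_chains):
            proposal = states[i] + rng.gauss(0.0, step)
            log_ratio = betas[i] * (log_target(proposal) - log_target(states[i]))
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < log_ratio:
                states[i] = proposal
        # Propose exchanging the states of one randomly chosen adjacent pair.
        i = rng.randrange(n_chains - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (
            log_target(states[j]) - log_target(states[i])
        )
        if math.log(1.0 - rng.random()) < log_swap:
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

Only the cold chain's samples are kept; the heated chains exist to cross the low-probability region between peaks and pass their states down through swaps. Because the within-chain updates in one iteration are independent of each other, they are the natural unit to distribute across processors, which is the kind of parallelism the abstract refers to.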
Markov Chain Monte Carlo For The Bayesian Analysis Of Evolutionary Trees From Aligned Molecular Sequences
 Proceedings of the AMS-IMS-SIAM Joint Summer Research Conference on Statistics and Molecular Biology
, 1998
Cited by 15 (4 self)
Introduction. Stochastic models have long been considered useful for describing variation in the molecular sequences of extant populations (e.g., Jukes and Cantor, 1969; Felsenstein, 1973; Kimura, 1980). Parameters in such models include the phylogeny, which encodes the pattern of evolutionary relationships among populations, and substitution rates, which describe how molecules change over time within populations. It seems quite natural to infer these parameters using the induced likelihood function in some way, but such inference has been difficult in practice because computations can be prohibitively expensive. Owing to the Markovian nature of the standard models, evaluation of the likelihood function follows straightforward recursive equations, and so evaluation is not the difficult part. The difficulty arises with optimization, since the likelihood resides over a complicated parameter space, and seems to admit no simple representation (Felsenstein, 1981, 1983; Goldman, 1990
Bayesian Analysis of Molecular Evolution using MrBayes
, 2004
Cited by 12 (0 self)
Stochastic models of evolution play a prominent role in the field of molecular evolution; they are used in applications as far ranging as phylogeny estimation, uncovering the pattern of DNA substitution, identifying amino acids under directional selection, and in inferring the history of a population using
Parallel algorithms for Bayesian phylogenetic inference
 Journal of Parallel Distributed Computing
, 2003
Cited by 11 (3 self)
The combination of a Markov chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies is becoming a popular alternative to direct likelihood optimization. However, MCMC, like maximum likelihood, is a computationally expensive method. To approximate the posterior distribution of phylogenies, a Markov chain is constructed, using the Metropolis algorithm, such that the chain has the posterior distribution of the parameters of phylogenies as its stationary distribution. This paper describes parallel algorithms and their MPI-based parallel implementation for MCMC-based Bayesian phylogenetic inference. Bayesian phylogenetic inference is computationally expensive both in time and in memory requirements. Our variations on MCMC and their implementation were done to permit the study of large phylogenetic problems. In our approach, we can distribute either entire chains or parts of a chain to different processors, since in current models the columns of the data are independent. Evaluations on a 32-node Beowulf cluster suggest the problem scales well. A number of important points are identified, including a superlinear speedup due to more effective cache usage and the point at which additional processors slow down the process due to communication overhead.
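The remark that the columns of the data are independent means the log-likelihood is a sum over alignment columns, so the columns can be split into blocks and each block evaluated separately. The sketch below illustrates only that decomposition, under simplifying assumptions that are not from the paper: two sequences rather than a tree, the Jukes-Cantor (JC69) model, and Python threads in place of MPI processes. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def jc69_site_loglik(a, b, d):
    """Log-likelihood of one aligned column (a, b) at distance d under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    # Transition probability: same base vs. one specific different base.
    p = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    return math.log(0.25 * p)  # 0.25 is the stationary base frequency

def block_loglik(args):
    """Sum of column log-likelihoods over one block of the alignment."""
    seq1, seq2, d = args
    return sum(jc69_site_loglik(a, b, d) for a, b in zip(seq1, seq2))

def parallel_loglik(seq1, seq2, d, n_workers=4):
    """Split the columns into blocks, evaluate each block, and add the results."""
    size = max(1, math.ceil(len(seq1) / n_workers))
    blocks = [
        (seq1[i:i + size], seq2[i:i + size], d)
        for i in range(0, len(seq1), size)
    ]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(block_loglik, blocks))
```

Threads are used here only to keep the example self-contained; in CPython they will not speed up this arithmetic, and a real implementation would place each block on a separate process or node, as the MPI-based approach in the abstract does.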
Oh Brother, Where Art Thou? A Bayes Factor Test for Recombination with Uncertain Heritage
, 2002
Cited by 10 (2 self)
Current methods to identify recombination between subtypes of human immunodeficiency virus 1 (HIV-1) fall into a sequential testing trap, in which significance is assessed conditional on parental representative sequences and crossover points (COPs) that maximize the same test statistic. We overcame this shortfall by testing for recombination while inferring parental heritage and COPs using an extended Bayesian multiple change-point model. The model assumes that aligned molecular sequence data consist of an unknown number of contiguous segments that may support alternative topologies or varying evolutionary pressures. We allowed for heterogeneity in the substitution process and specifically tested for inter-subtype recombination using Bayes factors. We also developed a new class of priors to assess significance across a wide range of support for recombination in the data. We applied our method to three putative gag gene recombinants. HIV-1 isolate RW024 decisively supported recombination with an inferred parental heritage of AD and a COP 95% Bayesian credible interval of (1152, 1178) using the HXB2 numbering scheme. HIV-1 isolate VI557 barely supported recombination. HIV-1 isolate RF decisively rejected recombination as expected, given that the sequence is commonly used as a reference sequence for subtype B. We employed scaled regeneration quantile plots to assess convergence and found this approach convenient to use even for our