Results 1 - 10 of 29
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by . . .
, 2003
Cited by 381 (5 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. [Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.] The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. Moreover, current probabilist...
Bayesian phylogenetic inference via Markov chain Monte Carlo methods
- Biometrics
, 1999
Cited by 46 (3 self)
SUMMARY. We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cophenetic matrix form suggests a simple and effective proposal distribution for selecting candidate trees close to the current tree in the chain. We illustrate the algorithm with restriction site data on 9 plant species, then extend to DNA sequences from 32 species of fish. The algorithm mixes well in both examples from random starting trees, generating reproducible estimates and credible sets for the path of evolution.
Parallel Metropolis-Coupled Markov Chain Monte Carlo for Bayesian Phylogenetic Inference
, 2003
Cited by 11 (1 self)
Motivation: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov Chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. A variant of MCMC, known as Metropolis-Coupled MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. Results: This paper presents a parallel algorithm for Metropolis-Coupled MCMC. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets. Availability: MrBayes v3.0 is available at http://morphbank.ebc.uu.se/mrbayes3/.
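The coupling idea in this abstract can be illustrated on a toy problem. The following is a minimal serial sketch, not the MrBayes implementation: it assumes a one-dimensional bimodal target in place of a posterior over trees, and the function names (`log_target`, `swap_log_ratio`, `mc3`), the heating values and the step size are all hypothetical choices made for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalised log density of a toy bimodal target (modes at -4 and +4).

    Assumption: this stands in for the log posterior of a tree.
    """
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp trick to avoid underflow far from the modes
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_log_ratio(beta_i, beta_j, logp_i, logp_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (logp_j - logp_i)


def accept(log_ratio, rng):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_steps, betas=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=0):
    """Run coupled chains; return the states visited by the cold chain (beta = 1)."""
    rng = random.Random(seed)
    states = [-4.0] * len(betas)
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain update: chain k targets the density raised to the power betas[k].
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(proposal) - log_target(states[k]))
            if accept(log_ratio, rng):
                states[k] = proposal
        # Between-chain update: propose swapping one random adjacent pair.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_ratio = swap_log_ratio(
            betas[i], betas[j], log_target(states[i]), log_target(states[j])
        )
        if accept(log_ratio, rng):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

In this sketch the within-chain updates do not depend on one another between swap proposals, which is the kind of independence a parallel version could exploit; how the paper actually divides the work is not shown here.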
Properties of phylogenetic trees generated by Yule-type speciation models
- Math. Biosci
, 2001
Cited by 9 (0 self)
We investigate some discrete structural properties of evolutionary trees generated under simple null models of speciation, such as the Yule model. These models have been used as priors in Bayesian approaches to phylogenetic analysis, and also to test hypotheses concerning the speciation process. In this paper we describe new results for three properties of trees generated under such models. Firstly, for a rooted tree generated by the Yule model we describe the probability distribution on the depth (number of edges from the root) of the most recent common ancestor of a random subset of k species. Next we show that, for trees generated under the Yule model, the approximate position of the root can be estimated from the associated unrooted tree, even for trees with a large number of leaves. Finally, we analyse a biologically motivated extension of the Yule model and describe its distribution on tree shapes when speciation occurs
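The first property mentioned in this abstract can be explored by simulation. The sketch below is an illustration under stated assumptions, not the paper's derivation: it grows a rooted binary tree by repeatedly splitting a uniformly chosen leaf (the usual discrete description of the Yule model) and measures the depth, in edges from the root, of the most recent common ancestor of a chosen subset of leaves. The function names and the parent-map representation are hypothetical.

```python
import random


def yule_tree(n_leaves, rng):
    """Grow a rooted binary tree by splitting a uniformly chosen leaf each step.

    Returns (parent, leaves): parent maps node id -> parent id (None for the root).
    """
    parent = {0: None}
    leaves = [0]
    next_id = 1
    while len(leaves) < n_leaves:
        idx = rng.randrange(len(leaves))
        node = leaves[idx]
        left, right = next_id, next_id + 1
        next_id += 2
        parent[left] = node
        parent[right] = node
        leaves[idx] = left       # the split leaf is replaced by its first child
        leaves.append(right)     # and the second child becomes a new leaf
    return parent, leaves


def path_to_root(node, parent):
    """List of nodes from `node` up to and including the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def mrca_depth(subset, parent):
    """Depth (edges from the root) of the most recent common ancestor of `subset`."""
    paths = [path_to_root(v, parent)[::-1] for v in subset]  # root -> node
    shared = 0
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        shared += 1
    return shared - 1  # the root itself is always shared and has depth 0


def sample_mrca_depths(n_leaves, k, n_trees, seed=0):
    """Empirical MRCA depths for a random k-subset across simulated trees."""
    rng = random.Random(seed)
    depths = []
    for _ in range(n_trees):
        parent, leaves = yule_tree(n_leaves, rng)
        depths.append(mrca_depth(rng.sample(leaves, k), parent))
    return depths
```

A histogram of `sample_mrca_depths(50, 3, 1000)` gives an empirical counterpart to the distribution the abstract describes; no particular shape is claimed here.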
Parallel algorithms for Bayesian phylogenetic inference
- Journal of Parallel Distributed Computing
, 2003
Cited by 7 (2 self)
The combination of a Markov chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies is becoming a popular alternative to direct likelihood optimization. However, MCMC, like maximum likelihood, is a computationally expensive method. To approximate the posterior distribution of phylogenies, a Markov chain is constructed, using the Metropolis algorithm, such that the chain has the posterior distribution of the parameters of phylogenies as its stationary distribution. This paper describes parallel algorithms and their MPI-based parallel implementation for MCMC-based Bayesian phylogenetic inference. Bayesian phylogenetic inference is computationally expensive both in time and in memory requirements. Our variations on MCMC and their implementation were done to permit the study of large phylogenetic problems. In our approach, we can distribute either entire chains or parts of a chain to different processors, since in current models the columns of the data are independent. Evaluations on a 32-node Beowulf cluster suggest the problem scales well. A number of important points are identified, including a superlinear speedup due to more effective cache usage and the point at which additional processors slow down the process due to communication overhead.
Markov Chain Monte Carlo For The Bayesian Analysis Of Evolutionary Trees From Aligned Molecular Sequences
- Proceedings of the AMS-IMS-SIAM Joint Summer Research Conference on Statistics and Molecular Biology
, 1998
Cited by 7 (3 self)
Introduction. Stochastic models have long been considered useful for describing variation in the molecular sequences of extant populations (e.g., Jukes and Cantor, 1969; Felsenstein, 1973; Kimura, 1980). Parameters in such models include the phylogeny, which encodes the pattern of evolutionary relationships among populations, and substitution rates, which describe how molecules change over time within populations. It seems quite natural to infer these parameters using the induced likelihood function in some way, but such inference has been difficult in practice because computations can be prohibitively expensive. Owing to the Markovian nature of the standard models, evaluation of the likelihood function follows straightforward recursive equations, and so evaluation is not the difficult part. The difficulty arises with optimization, since the likelihood resides over a complicated parameter space, and seems to admit no simple representation (Felsenstein, 1981, 1983; Goldman, 1990
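The "straightforward recursive equations" for evaluating the likelihood can be sketched for a single alignment column. This is a minimal illustration assuming the Jukes-Cantor model, a uniform distribution at the root, and a tree written as nested tuples; the representation and the function names are hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"


def jc69(t):
    """Jukes-Cantor transition matrix for a branch of length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial(node):
    """Conditional likelihoods of the data below `node`, one per base at `node`.

    A leaf is a one-letter string; an internal node is a tuple
    (left_subtree, left_branch_length, right_subtree, right_branch_length).
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    p_left, p_right = jc69(t_left), jc69(t_right)
    c_left, c_right = partial(left), partial(right)
    return [
        sum(p_left[x][y] * c_left[y] for y in range(4))
        * sum(p_right[x][y] * c_right[y] for y in range(4))
        for x in range(4)
    ]


def site_likelihood(tree):
    """Likelihood of one column, averaging over a uniform root distribution."""
    return sum(0.25 * v for v in partial(tree))


def log_likelihood(trees_per_site):
    """Sum of per-column log likelihoods, treating columns as independent."""
    return sum(math.log(site_likelihood(t)) for t in trees_per_site)
```

For example, `site_likelihood((("A", 0.1, "C", 0.1), 0.05, "A", 0.2))` evaluates one column on a three-leaf tree. Each evaluation is cheap; the abstract's point is that the difficulty lies in searching over topologies and branch lengths, which this sketch does not attempt.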
Likelihoods on coalescents: a Monte Carlo sampling approach to inferring parameters from population samples of molecular data
- In Statistics in Molecular Biology and Genetics
, 1999
Cited by 6 (1 self)
When population samples of molecular data, such as sequences, are taken, the members of the sample are related by a gene tree whose shape is affected by the population processes, such as genetic drift, change of population size, and migration. Genetic parameters such as recombination also affect that genealogy. Likelihood inference of these parameters involves summing over all possible genealogies. There is a vast number of these, so that exact computation is not possible. Griffiths and Tavaré have proposed computing these likelihoods by Monte Carlo integration. Our group is doing this by the Metropolis-Hastings method of Markov Chain Monte Carlo integration. We now have, in our LAMARC package, programs to do this for constant-sized and growing populations, and for geographically structured populations. The bias of the estimator of population growth rate is discussed. One can also allow for samples stratified in time, as with fossil DNA or sequential samples from the population of a virus in a patient. A program for recombining
PBPI: a High Performance Implementation of Bayesian Phylogenetic Inference
- In Proc. of Supercomputing’2006
, 2006
Cited by 5 (2 self)
This paper describes the implementation and performance of PBPI, a parallel implementation of Bayesian phylogenetic inference method for DNA sequence data. By combining the Markov Chain Monte Carlo (MCMC) method with likelihood-based assessment of phylogenies, Bayesian phylogenetic inferences can incorporate complex statistic models into the process of phylogenetic tree estimation. However, Bayesian analyses are extremely computationally expensive. PBPI uses algorithmic improvements and parallel processing to achieve significant performance improvement over comparable Bayesian phylogenetic inference programs. We evaluated the performance and accuracy of PBPI using a simulated dataset on System X, a terascale supercomputer at Virginia Tech. Our results show that PBPI identifies equivalent tree estimates 1424 times faster on 256 processors than a widely-used, best-available (albeit sequential), Bayesian phylogenetic inference program. PBPI also achieves linear speedup with the number of processors for large problem sizes. Most importantly, the PBPI framework enables Bayesian phylogenetic analysis of large datasets previously impracticable.
Oh Brother, Where Art Thou? - A Bayes Factor Test for Recombination with Uncertain Heritage
Cited by 5 (1 self)
Current methods to identify recombination between genetic variants of human immunodeficiency virus-1 (HIV-1) fall into a sequential testing trap, in which significance is assessed conditional on the parentals and crossover points (COPs) that maximize the same test statistic. We overcome this shortfall by testing for recombination while simultaneously inferring parentals and COPs using an extended Bayesian multiple change-point model. The model assumes that aligned molecular sequence data consist of an unknown number of contiguous segments that may support alternative topologies or varying evolutionary pressures. We test for recombination using Bayes factors and develop a new class of priors that allows us to assess significance across a wide range of support. We apply our method to three putative, gag gene recombinants. HIV-1 isolate RW024 decisively supports recombination with an inferred parental heritage of AD and a COP 95% Bayesian credible interval of (1152; 1178) using the HXB2 numb...
Bayesian Analysis of Molecular Evolution using MrBayes
, 2004
Cited by 4 (0 self)
Stochastic models of evolution play a prominent role in the field of molecular evolution; they are used in applications as far ranging as phylogeny estimation, uncovering the pattern of DNA substitution, identifying amino acids under directional selection, and in inferring the history of a population using

