## Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data (2002)

Venue: Genome Informatics

Citations: 19 (0 self)

### BibTeX

@ARTICLE{Waddell02veryfast,

author = {Peter J. Waddell and Hirohisa Kishino and Rissa Ota},

title = {Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data},

journal = {Genome Informatics},

year = {2002},

volume = {13},

pages = {82--92}

}

### Abstract

Evolutionary trees sit at the core of all realistic models describing a set of related sequences, including alignment, homology search, ancestral protein reconstruction and 2D/3D structural change. It is important to assess the stochastic error when estimating a tree, including under the most realistic likelihood-based models, yet computation times for such analyses may be many days or weeks. If so, the bootstrap is computationally prohibitive. Here we show that the extremely fast “resampling of estimated log likelihoods” (RELL) method behaves well under more general circumstances than previously examined. RELL approximates the bootstrap proportions (BP) of trees better than some bootstrap methods that rely on fast heuristics to search the tree space. The BIC approximation of the Bayesian posterior probability (BPP) of trees is made more accurate by including an additional term related to the determinant of the information matrix (which may also be obtained as a product of gradient or score vectors). Such estimates are shown to be very close to MCMC chain values. Our analysis of mammalian mitochondrial amino acid sequences suggests that when model breakdown occurs, as it typically does for sequences separated by more than a few million years, the BPP values are far too peaked and the real fluctuations in the likelihood of the data
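The RELL idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that per-site log likelihoods have already been computed once for each candidate tree at its ML parameter estimates (for example, exported from an ML program); each bootstrap replicate then only resamples and sums those stored values instead of re-optimizing the tree. The function name `rell_proportions` and the input layout `site_lnl` are hypothetical choices for this illustration and are not taken from the paper or from any particular package.

```python
import numpy as np

def rell_proportions(site_lnl, n_replicates=10000, seed=0):
    """Approximate bootstrap proportions of trees by RELL.

    site_lnl: array-like of shape (n_trees, n_sites); entry [t, s] is the
              log likelihood of site s under tree t (assumed precomputed).
    Returns an array of length n_trees with the fraction of replicates in
    which each tree had the highest resampled total log likelihood.
    """
    site_lnl = np.asarray(site_lnl, dtype=float)
    n_trees, n_sites = site_lnl.shape
    rng = np.random.default_rng(seed)
    wins = np.zeros(n_trees)
    for _ in range(n_replicates):
        # Draw n_sites site indices with replacement (one pseudo-replicate).
        idx = rng.integers(0, n_sites, size=n_sites)
        # Re-scoring a tree is just summing its resampled site values.
        totals = site_lnl[:, idx].sum(axis=1)
        wins[np.argmax(totals)] += 1
    return wins / n_replicates
```

Because no likelihood is re-optimized inside the loop, the cost per replicate is a single indexed sum over sites, which is the source of the speed advantage over a full bootstrap.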

### Citations

2307 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ... PAUP was used to convert these into edge support values. The results are illustrated using the RNA alignment from [3]. Quantifying the effect of the information matrix upon the BIC =-=[18]-=- approximation as used for trees [15, 28], we used the same DNA alignment as [28], the RNA data set above [3], and a larger mammalian mitochondrial amino-acid alignment from Figure 1 of [26]. The covar... |

1482 |
PAUP *: phylogenetic analysis using parsimony (*and other methods), v. 4.0
- Swofford
- 1997
Citation Context .... The effect of ancestral polymorphism upon bootstrap support for a tree is also examined. 2 Materials and Methods To assess the suitability of RELL on trees of more than four taxa, the program PAUP* =-=[21]-=- was used to estimate bootstrap proportions with different tree search strategies under the HKY model. MOLPHY [1] was used to estimate the RELL proportion for a set of trees, and PAUP was used to co... |

1115 |
Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39
- Felsenstein
- 1985
Citation Context ...her random fluctuations that can occur). A favored approach to assessing stochastic variation is to randomly resample the columns of aligned data with replacement to create pseudo-replicate data sets =-=[5, 7, 22]-=-, each of which is then analyzed in exactly the same way as the original data and the number of times an edge was in the optimal tree of replicates recorded (the bootstrap proportion, BP). Unfortunate... |

706 |
PAML: a program package for phylogenetic analysis by maximum likelihood
- Yang
- 1997
Citation Context ...nt as [28], the RNA data set above [3], and a larger mammalian mitochondrial amino-acid alignment from Figure 1 of [26]. The covariance matrix and other support values were output by the program PAML =-=[31]-=- using the mtAA rate matrix of that program. Direct posterior probabilities were estimated with MrBayes 2.01 [12], while adjusted bootstrap values used CONSEL [20]. The parametric bootstrap used Seq-G... |

685 |
MRBAYES: Bayesian inference of phylogenetic trees
- Huelsenbeck, Ronquist
- 2001
Citation Context ...Markov Chain (MCMC) methods; that is the model is randomly modified and it “walks” through the parameter space guided only by the shape of the space and an acceptance step based on a likelihood ratio =-=[12]-=-. Such integrations are typically married to Bayesian statistics. The computational load of both methods can be very high (e.g. one week on the fastest available single processor to evaluate a single ... |

612 |
The jackknife, the bootstrap, and other resampling plans
- Efron
- 1982
Citation Context ...her random fluctuations that can occur). A favored approach to assessing stochastic variation is to randomly resample the columns of aligned data with replacement to create pseudo-replicate data sets =-=[5, 7, 22]-=-, each of which is then analyzed in exactly the same way as the original data and the number of times an edge was in the optimal tree of replicates recorded (the bootstrap proportion, BP). Unfortunate... |

261 |
Cases in which parsimony or compatibility methods will be positively misleading
- Felsenstein
- 1978
Citation Context ...rom site patterns and so higher likelihood may tend to have the largest variance. This relationship can break down in cases of long edges attract (and hence possible inconsistency of tree estimation) =-=[6, 30]-=-. If generally true this suggests that BIC values will tend to be slightly more conservative than the true Bayesian posterior probabilities for trees under typical conditions (not long edges attract).... |

208 |
Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea
- Kishino, Hasegawa
- 1989
Citation Context ... 3.2 Better Approximating Bayesian Posterior Probabilities. Bayesian posterior probabilities (BPPs) for trees may be approximated by exponentiation of the likelihood of the ML trees =-=[13, 15]-=-. Waddell et al. [28] showed that this approach gave closer estimates of the BPPs than the bootstrap. In this BIC approximation of the BPP a term that is dropped from the Laplace asymptotic result wa... |

178 |
Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees
- Rambaut, Grassly
- 1997
Citation Context ...ing the mtAA rate matrix of that program. Direct posterior probabilities were estimated with MrBayes 2.01 [12], while adjusted bootstrap values used CONSEL [20]. The parametric bootstrap used Seq-Gen =-=[17]-=- and Pseq-Gen [8] to simulate datasets under the ML parameters of the model, which were then analyzed under the same model using PAUP* or MOLPHY for amino acids. Combined bootstrap/MCMC chain results ... |

154 |
An approximately unbiased test of phylogenetic tree selection. Syst. Biol
- Shimodaira
Citation Context ...an MCMC chain, although each chain length may be shortened as there is some robustness gained from multiple random starts. It is interesting that NP, a bootstrap corrected for curvature of boundaries =-=[19, 20]-=-, gives values very close to the standard bootstrap. This suggests curvature is a minor problem for this data and does not explain the big differences between non-parametric bootstrap and Bayesian sup... |

145 |
Evolutionary relationship of DNA sequences in finite populations
- Tajima
- 1983
Citation Context ...there is a sufficiently close trifurcation in the tree that the tree for a site and the species tree are different due to ancestral polymorphism (a special case of the gene-tree, species-tree problem =-=[10, 23]-=- studied by [24]). A concern is whether this will make either bootstrap proportions or BPP values poor at estimating the species tree. Another way of asking the question is whether recombination see e... |

127 |
CONSEL: for assessing the confidence of phylogenetic tree selection
- Shimodaira, Hasegawa
- 2001
Citation Context ...lues were output by the program PAML [31] using the mtAA rate matrix of that program. Direct posterior probabilities were estimated with MrBayes 2.01 [12], while adjusted bootstrap values used CONSEL =-=[20]-=-. The parametric bootstrap used Seq-Gen [17] and Pseq-Gen [8] to simulate datasets under the ML parameters of the model, which were then analyzed under the same model using PAUP* or MOLPHY for amino a... |

124 |
Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method
- Yang, Rannala
- 1997 |

80 |
MOLPHY version 2.3. (Programs for molecular phylogenetics based on maximum likelihood). Distributed by the authors
- Adachi, Hasegawa
- 1996
Citation Context ...o assess the suitability of RELL on trees of more than four taxa, the program PAUP* [21] was used to estimate bootstrap proportions with different tree search strategies under the HKY model. MOLPHY =-=[1]-=- was used to estimate the RELL proportion for a set of trees, and PAUP was used to convert these into edge support values. The results are illustrated using the RNA alignment from [3]... |

72 |
Maximum likelihood inference of protein phylogeny and the origin of chloroplasts
- Kishino, Miyata, et al.
- 1990
Citation Context ...e most promising ways to cut this expense is to resample estimated log likelihoods (RELL) of sites, so the recalculation of support for a tree on a replicate is as fast as summing a vector of numbers =-=[14]-=-. To date its behavior has only been explored on a 4-taxon data set. RELL has also been used in applications such as the centered bootstrapping of sequence data [27]. While Bayesian methods output pos... |

33 |
Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA
- Adachi, Waddell, et al.
- 2000
Citation Context ...h such approaches can rectify the summation to be over the bootstrap frequency distribution of trees and not just those returned by the MCMC chain, e.g. see the method for estimating rate matrices in =-=[2]-=-). All present indicators are that bootstrap continues to give much more reasonable estimates of the error about trees under circumstances in which the model does not fit well, and thus seems a more a... |

26 |
Empirical and hierarchical Bayesian estimation of ancestral states
- Huelsenbeck, Bollback
- 2001
Citation Context ...xpect. This is an important point for all those that use trees in some part of their genomic studies. For example, for those who will use posterior probabilities of ancestral sequence reconstructions =-=[11]-=-, the accuracy of the ancestral sequence may be far less than the model suggests (although such approaches can rectify the summation to be over the bootstrap frequency distribution of trees and not ju... |

20 |
Towards resolving the interordinal relationships of placental mammals. Systematic Biology
- Waddell, Okada, et al.
- 1999
Citation Context ...ximations to the tree in Figure 1 of [26]. This is one of the trees used in the recent reclassification of placental mammals =-=[29]-=-. The results are given in Table 2 for all rearrangements of the tree that do not decrease the lnL value massively. “Cetung” indicates the clade (Cetartiodactyla, Perissodactyla), ... |

17 |
A phylogenetic foundation for comparative mammalian genomics
- Waddell, Kishino, et al.
- 2001
Citation Context ...ochastic variability are dependent upon the model’s accuracy. If the model of sequence evolution is erroneous the BPP value for an edge can be much further from reality than the bootstrap (BP) values =-=[28]-=-. The latter are based on the real data and not a model that will often overestimate the length and hence support for an edge in the tree. It is possible to use the bootstrap and start a MCMC run for ... |

16 |
Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree
- Hasegawa, Kishino
- 1994
Citation Context ...paper we present new results and algorithms to help approximate bootstrap support for ML and MCMC trees with minimal computational effort. Results include demonstrating that the very fast RELL method =-=[9, 14]-=- is accurate in analyses of more than four taxa. We extend and evaluate previous approximations of BPP values using ML values. Methods of combining the bootstrap resampling procedures with MCMC chains ar... |

14 |
Assessing the cretaceous superordinal divergence times within birds and placental mammals by using whole mitochondrial protein sequences and an extended statistical framework. Syst Biol, 48(1):119–37
- Waddell, Cao, et al.
- 1999
Citation Context ...on the BIC [18] approximation as used for trees [15, 28] we used the same DNA alignment as [28], the RNA data set above [3], and a larger mammalian mitochondrial amino-acid alignment from Figure 1 of =-=[26]-=-. The covariance matrix and other support values were output by the program PAML [31] using the mtAA rate matrix of that program. Direct posterior probabilities were estimated with MrBayes 2.01 [12], ... |

11 |
Testing the constant-rate neutral allele model with protein sequence data
- Hudson
- 1983
Citation Context ...there is a sufficiently close trifurcation in the tree that the tree for a site and the species tree are different due to ancestral polymorphism (a special case of the gene-tree, species-tree problem =-=[10, 23]-=- studied by [24]). A concern is whether this will make either bootstrap proportions or BPP values poor at estimating the species tree. Another way of asking the question is whether recombination see e... |

9 |
PSeq-Gen: An application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees
- Grassly, Adachi, et al.
- 1997
Citation Context ...matrix of that program. Direct posterior probabilities were estimated with MrBayes 2.01 [12], while adjusted bootstrap values used CONSEL [20]. The parametric bootstrap used Seq-Gen [17] and Pseq-Gen =-=[8]-=- to simulate datasets under the ML parameters of the model, which were then analyzed under the same model using PAUP* or MOLPHY for amino acids. Combined bootstrap/MCMC chain results done manually usi... |

8 |
Statistical methods of phylogenetic analysis: including Hadamard conjugations, LogDet transforms, and maximum likelihood
- Waddell
- 1995
Citation Context ...tly close trifurcation in the tree that the tree for a site and the species tree are different due to ancestral polymorphism (a special case of the gene-tree, species-tree problem [10, 23] studied by =-=[24]-=-). A concern is whether this will make either bootstrap proportions or BPP values poor at estimating the species tree. Another way of asking the question is whether recombination see estimates of the ... |

4 |
Biological sequence analysis: probabilistic models of proteins and nucleic acids
- Durbin
- 1998
Citation Context ...a simple example, the statistically optimal weighting of a set of training sequences for a Hidden Markov Model (for example, of gene search) should use a tree if the genes evolved according to a tree =-=[4]-=-. Any other weighting of the training sequences (e.g. maximum entropy weights without the biological tree) may be computationally attractive, but they cannot fully represent the information in the seq... |

3 |
Comments on the quartet puzzling method for finding maximum likelihood tree topologies
- Cao, Adachi, et al.
- 1998
Citation Context ...KY model. MOLPHY [1] was used to estimate the RELL proportion for a set of trees, and PAUP was used to convert these into edge support values. The results are illustrated using the RNA alignment from =-=[3]-=-. Quantifying the effect of the information matrix upon the BIC [18] approximation as used for trees [15, 28], we used the same DNA alignment as [28], the RNA data set above [3], and ... |

3 |
The sampling distributions and covariance matrix of phylogenetic spectra, Molecular Biology and Evolution
- Waddell, Penny, et al.
- 1994
Citation Context ...mption of sites evolving independently (the standard assumption of ML models at present), the marginal variance of a site pattern is binomially distributed with variance proportional to its frequency =-=[30]-=-. Thus, the parameters (edges) with most support from site patterns and so higher likelihood may tend to have the largest variance. This relationship can break down in cases of long edges attract (and... |

2 |
Bayesian approaches to phylogenetics: Relationships between likelihood ratios and posterior probabilities
- Kishino, Waddell, et al.
- 2000
Citation Context ...ut this incurs the full cost of multiple runs. When the model does fit, BPP values tend to closely follow the ML values given sufficient data (and given that the priors are treated in equivalent ways =-=[15, 28]-=-). In this paper we present new results and algorithms to help approximate bootstrap support for ML and MCMC trees with minimal computational effort. Results include demonstrating that the very fast R... |

2 |
Statistical distribution for testing the resolved tree against the star tree
- Ota, Waddell, et al.
- 1999
Citation Context .... of ∆lnL between trees is ∼ 6 − 10 lnL units. Under the model, asymptotically as sequence lengths go to infinity, trees that are close in likelihood are close to expansions of a partly resolved tree =-=[16]-=-. If a tree is resolved about one edge when the true tree is the same but unresolved on just this edge, then the likelihood difference between these trees should follow a chi-bar distribution with d.f... |

2 |
The consistency of ML plus other “Predictive” methods of phylogenetic analysis and the role of
- Waddell
- 1998 |

2 |
Rapid evaluation of the phylogenetic congruence of sequence data using likelihood ratio tests, Molecular Biology and Evolution
- Waddell, Kishino, et al.
- 2000
Citation Context ...ast as summing a vector of numbers [14]. To date its behavior has only been explored on a 4-taxon data set. RELL has also been used in applications such as the centered bootstrapping of sequence data =-=[27]-=-. While Bayesian methods output posterior probabilities (BPP) as part of their usual runs [12], these measures of stochastic variability are dependent upon the model’s accuracy. If the model of sequen... |
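
The abstract and several citation contexts above describe approximating tree posterior probabilities from ML values, first with the BIC and then with an extra term involving the determinant of the information matrix. The sketch below is one way to make that concrete, under stated assumptions: it uses the textbook Laplace approximation, ln p(D | T) ≈ lnL + (k/2) ln 2π − ½ ln |I|, with I the information matrix for the whole alignment, a flat prior on parameters (so the prior density term is omitted) and equal prior weight on trees. The exact form of the correction used by the authors is not shown on this page, so this should not be read as their formula. All function names are hypothetical.

```python
import numpy as np

def information_from_scores(site_scores):
    """Outer-product estimate of the information matrix.

    site_scores: (n_sites, k) array; row s is the gradient (score vector)
    of site s's log likelihood at the ML estimate. Returns sum_s g_s g_s^T.
    """
    g = np.asarray(site_scores, dtype=float)
    return g.T @ g

def log_marginal_bic(lnl, k, n_sites):
    """BIC-style approximation: lnL - (k/2) ln n."""
    return lnl - 0.5 * k * np.log(n_sites)

def log_marginal_laplace(lnl, info):
    """Laplace approximation keeping the determinant term (assumed form)."""
    info = np.asarray(info, dtype=float)
    k = info.shape[0]
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        raise ValueError("information matrix must have a positive determinant")
    return lnl + 0.5 * k * np.log(2.0 * np.pi) - 0.5 * logdet

def posterior_from_log_marginals(log_marginals):
    """Normalize exp(log marginal) over trees, assuming equal tree priors."""
    x = np.asarray(log_marginals, dtype=float)
    w = np.exp(x - x.max())  # subtract the max for numerical stability
    return w / w.sum()
```

When the information matrix is exactly n times the identity, the two approximations differ only by the constant (k/2) ln 2π, which cancels across trees with the same number of parameters; the determinant term matters when the curvature of the likelihood differs between trees.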