Results 1-10 of 1,486
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by . . .
, 2003
Cited by 1332 (22 self)
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. [Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.] The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. Moreover, current probabilist...
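The core computation behind a maximum-likelihood program of this kind is evaluating a tree's likelihood, which the abstract describes improving by hill-climbing. The following is a minimal sketch, assuming the JC69 substitution model, a tiny hypothetical four-taxon alignment, and a fixed topology: it computes the log-likelihood with Felsenstein's pruning algorithm, then hill-climbs on branch lengths only. It is not PHYML's algorithm. PHYML adjusts topology and branch lengths simultaneously, whereas the hypothetical `hill_climb` helper here only rescales one branch at a time.

```python
import math

BASES = "ACGT"


def jc_prob(same: bool, t: float) -> float:
    """JC69 transition probability P(i -> j) after branch length t; `same` means i == j."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial(node, tree, lengths, column):
    """Felsenstein pruning: conditional likelihoods of the subtree below `node`."""
    if node in column:  # leaf: indicator vector for the observed base
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child in tree[node]:
        child_l = partial(child, tree, lengths, column)
        t = lengths[child]  # length of the branch above `child`
        for i in range(4):
            result[i] *= sum(jc_prob(i == j, t) * child_l[j] for j in range(4))
    return result


def log_likelihood(tree, root, lengths, alignment):
    """Sum of per-site log-likelihoods, with uniform JC69 base frequencies at the root."""
    total = 0.0
    for column in alignment:
        root_l = partial(root, tree, lengths, column)
        total += math.log(sum(0.25 * x for x in root_l))
    return total


def hill_climb(tree, root, lengths, alignment, rounds=20):
    """Hypothetical helper: rescale one branch at a time, keeping only improvements."""
    best = log_likelihood(tree, root, lengths, alignment)
    for _ in range(rounds):
        improved = False
        for branch in list(lengths):
            for factor in (0.5, 0.9, 1.1, 2.0):
                old = lengths[branch]
                lengths[branch] = max(old * factor, 1e-6)
                ll = log_likelihood(tree, root, lengths, alignment)
                if ll > best:
                    best, improved = ll, True
                else:
                    lengths[branch] = old  # undo a move that did not help
        if not improved:
            break
    return best


# Hypothetical four-taxon alignment and unrooted topology (t1,t2),(t3,t4).
seqs = {"t1": "ACGTACGTAA", "t2": "ACGTACGTAC", "t3": "ACGAACGTTT", "t4": "ACGAACCTTT"}
alignment = [{name: s[i] for name, s in seqs.items()} for i in range(10)]
tree = {"u": ["t1", "t2", "v"], "v": ["t3", "t4"]}
lengths = {"t1": 0.1, "t2": 0.1, "v": 0.1, "t3": 0.1, "t4": 0.1}

start = log_likelihood(tree, "u", lengths, alignment)
end = hill_climb(tree, "u", lengths, alignment)
print(f"log-likelihood: {start:.3f} -> {end:.3f}")
```

A real program would also propose topology moves such as nearest neighbor interchanges and optimize each branch length numerically rather than by fixed multiplicative steps.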
Model selection and model averaging in phylogenetics: Advantages of the AIC and Bayesian approaches over likelihood ratio tests
 Syst. Biol
, 2004
Cited by 270 (5 self)
Abstract.—Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or non-nested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). [AIC; Bayes factors; BIC; likelihood ratio tests; model averaging; model uncertainty; model selection; multimodel inference.] It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role
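Akaike weights make AIC-based model averaging concrete: each model's AIC is converted into a relative likelihood exp(-Δ/2), and these are normalized to sum to one. The sketch below uses hypothetical maximized log-likelihoods and parameter counts K for four substitution models; `akaike_weights` and `model_average` are illustrative helpers, and the averaging step renormalizes the weights over the models that actually contain the parameter being averaged.

```python
import math


def akaike_weights(models):
    """models maps a name to (maximized log-likelihood, number of free parameters K)."""
    aic = {name: -2.0 * lnl + 2.0 * k for name, (lnl, k) in models.items()}
    best = min(aic.values())
    # Relative likelihood of each model, exp(-Delta_i / 2), normalized to sum to 1.
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(rel.values())
    return aic, {name: r / total for name, r in rel.items()}


def model_average(estimates, weights):
    """Average a parameter over the models that contain it, renormalizing their weights."""
    importance = sum(weights[name] for name in estimates)
    return sum(weights[name] * value for name, value in estimates.items()) / importance


# Hypothetical values for illustration only.
models = {
    "JC69": (-3050.0, 0),
    "K80": (-3010.0, 1),
    "HKY85": (-3005.0, 4),
    "GTR": (-3002.0, 8),
}
aic, weights = akaike_weights(models)
kappa = model_average({"K80": 4.1, "HKY85": 3.8}, weights)  # hypothetical kappa estimates
for name in models:
    print(f"{name}: AIC={aic[name]:.1f}, weight={weights[name]:.3f}")
print(f"model-averaged kappa: {kappa:.3f}")
```

The sum of weights of the models containing a parameter (`importance` above) is one way to express that parameter's relative importance.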
Likelihood-based tests of topologies in phylogenetics
 Syst. Biol
, 2000
Cited by 165 (2 self)
Abstract.—Likelihood-based statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino–Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino–Hasegawa test place important restrictions on its applicability. In particular, it is only valid when the topologies being compared are specified a priori. Unfortunately, this means that the Kishino–Hasegawa test may be severely biased in many cases in which it is now commonly used: for example, in any case in which one of the competing topologies has been selected for testing because it is the maximum likelihood topology for the data set at hand. We review the theory of the Kishino–Hasegawa test and contend that for the majority of popular applications this test should not be used. Previously published results from invalid applications of the Kishino–Hasegawa test should be treated extremely cautiously, and future applications should use appropriate alternative tests instead. We review such alternative tests, both nonparametric and parametric, and give two examples which illustrate the importance of our contentions. [Kishino–Hasegawa test; maximum likelihood; phylogeny; Shimodaira–Hasegawa test; statistical tests; tree topology.] Hasegawa and Kishino (1989) and Kishino and Hasegawa (1989) developed methods for estimating the standard error and confidence intervals for the difference in log-likelihoods between two topologically distinct phylogenetic trees representing hypotheses that might explain particular aligned sequence data sets. The method initially was introduced to compute confidence intervals on posterior probabilities for topologies in a
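For reference, the Kishino–Hasegawa statistic itself is simple to compute from per-site log-likelihoods. This is a sketch of the normal-approximation version, assuming the site log-likelihoods of both trees are already available; `kh_test` is a hypothetical helper and the four-site data are made up (real alignments have many more sites). As the abstract stresses, the p-value is only meaningful when both topologies were specified a priori, not when one of them is the maximum-likelihood tree for the same data.

```python
import math


def kh_test(site_lnl_1, site_lnl_2):
    """Two-sided Kishino-Hasegawa test (normal approximation).

    Inputs are per-site log-likelihoods of the same alignment under two
    topologies chosen a priori. Returns (delta, z, p).
    """
    n = len(site_lnl_1)
    diffs = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    delta = sum(diffs)  # difference in total log-likelihood
    mean = delta / n
    # Estimated variance of the summed difference: n times the sample variance of the d_i.
    var = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    z = delta / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return delta, z, p


# Hypothetical per-site log-likelihoods for two a-priori topologies.
tree_a = [-1.0, -2.0, -1.5, -1.2]
tree_b = [-1.1, -2.2, -1.4, -1.5]
delta, z, p = kh_test(tree_a, tree_b)
print(f"delta lnL = {delta:.3f}, z = {z:.3f}, p = {p:.3f}")
```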
Full reconstruction of Markov models on evolutionary trees: identifiability and consistency
 Math. Biosci
, 1996
Cited by 86 (0 self)
A Markov model of evolution of characters on a phylogenetic tree consists of a tree topology together with a specification of probability transition matrices on the edges of the tree. Previous work has shown that under mild conditions, the tree topology may be reconstructed, in the sense that the topology is identifiable from knowledge of the joint distribution of character states at pairs of terminal nodes of the tree. Also, the method of maximum likelihood is statistically consistent for inferring the tree topology. In this paper we answer the analogous questions for reconstruction of the full model, including the edge transition matrices: under mild conditions, such full reconstruction is achievable, not by using pairs of terminal nodes, but rather by using triples of terminal nodes. The identifiability result generalizes previous results that were restricted either to characters having two states or to transition matrices having special structure. The proof develops matrix relationships that may be exploited to identify the model. We also use the identifiability result to prove that the method of maximum likelihood is consistent for reconstructing the full model.
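A classical illustration of the pairwise identifiability idea (not the triple-based construction of this paper) is the paralinear/LogDet family of distances, which are additive on the tree under a general Markov model. The sketch below computes the paralinear distance d = -(1/k) ln[det J / sqrt(det D_x det D_y)] from a hypothetical k×k joint frequency matrix J, where D_x and D_y are diagonal matrices of the two sequences' base frequencies. The helper names are illustrative, and the formula assumes det J > 0.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def paralinear_distance(joint):
    """Paralinear distance from a k x k joint frequency matrix J of two aligned sequences.

    d = -(1/k) * ln( det J / sqrt(det D_x * det D_y) ), where D_x and D_y are the
    diagonal matrices of base frequencies in each sequence (row and column sums of J).
    """
    k = len(joint)
    freq_x = [sum(row) for row in joint]
    freq_y = [sum(row[j] for row in joint) for j in range(k)]
    ratio = det(joint) / math.sqrt(math.prod(freq_x) * math.prod(freq_y))
    return -math.log(ratio) / k


# Hypothetical joint matrix generated by a JC69 process with 30% differing sites.
p = 0.3
joint = [[0.25 * ((1 - p) if i == j else p / 3) for j in range(4)] for i in range(4)]
print(paralinear_distance(joint))  # matches the JC69 distance -0.75 * ln(1 - 4p/3)
```

For a JC69-generated joint matrix, det J / sqrt(det D_x det D_y) equals (1 - 4p/3)^3, so the paralinear distance reduces to the JC69 distance.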
Inferring Evolutionary Trees with Strong Combinatorial Evidence
 Theoretical Computer Science
, 1997
Cited by 77 (13 self)
We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Given a set Q of resolved quartets defined on the studied species, the method computes the unique maximum subset Q* of Q which is equivalent to a tree and outputs the corresponding tree as an estimate of the species' phylogeny. We use a characterization of the subset Q* due to [6] to provide an O(n⁴) incremental algorithm for this variant of the NP-hard quartet consistency problem. Moreover, when choosing the resolution of the quartets by the Four-Point Method (FPM) and considering the Cavender-Farris model of evolution, we show that the convergence rate of the Q* method is at worst polynomial when the maximum evolutionary distance between two species is bounded. We complement these theoretical results with an experimental study on real and simulated data sets. The results ...
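The Four-Point Method resolves each quartet from a distance matrix by picking the pairing whose two within-pair distances have the smallest sum. The following is a minimal sketch with a hypothetical `four_point_quartet` helper and made-up additive distances; it covers only quartet resolution, not the subsequent search for the maximum tree-compatible subset of quartets. Ties are broken arbitrarily by `min`.

```python
def four_point_quartet(dist, w, x, y, z):
    """Resolve the quartet {w, x, y, z} with the four-point condition.

    `dist` maps an unordered leaf pair (stored in either order) to a distance.
    Returns the split whose two within-pair distances have the smallest sum.
    """
    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    candidates = {
        ((w, x), (y, z)): d(w, x) + d(y, z),
        ((w, y), (x, z)): d(w, y) + d(x, z),
        ((w, z), (x, y)): d(w, z) + d(x, y),
    }
    return min(candidates, key=candidates.get)  # ties resolved by insertion order


# Hypothetical additive distances from the tree ((a,b),(c,d)): pendant edges 1, internal edge 2.
dist = {
    ("a", "b"): 2.0, ("c", "d"): 2.0,
    ("a", "c"): 4.0, ("a", "d"): 4.0, ("b", "c"): 4.0, ("b", "d"): 4.0,
}
print(four_point_quartet(dist, "a", "c", "b", "d"))  # (('a', 'b'), ('c', 'd'))
```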
Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle
 Journal of Computational Biology
, 2002
Cited by 75 (6 self)
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n³) time, where n is the number of taxa, while the current implementations of the second are in O(pn³) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n²) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n² + pn), i.e., O(n²) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard “unweighted” OLS, where
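Under the balanced weighting scheme, the minimum-evolution length of a topology can be computed directly from the distances with Pauplin's formula, l(T) = Σ_{i<j} 2^(1-τ_ij) d_ij, where τ_ij is the number of edges on the path between leaves i and j. The sketch below assumes a tree given as a hypothetical adjacency dictionary and distances keyed by leaf pairs; it scores a fixed topology only and does not implement the greedy insertion or NNI search of GME.

```python
from collections import deque


def edge_counts(adj, source):
    """Number of edges on the path from `source` to every node (BFS on a tree)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


def balanced_length(adj, leaves, dist):
    """Balanced ME tree length via Pauplin's formula: sum of 2**(1 - tau_ij) * d_ij."""
    total = 0.0
    for i, u in enumerate(leaves):
        tau = edge_counts(adj, u)
        for v in leaves[i + 1:]:
            total += 2.0 ** (1 - tau[v]) * dist[frozenset((u, v))]
    return total


# Hypothetical additive distances from ((a:1, b:2):5, (c:3, d:4)); total branch length 15.
pairs = {("a", "b"): 3.0, ("c", "d"): 7.0, ("a", "c"): 9.0,
         ("a", "d"): 10.0, ("b", "c"): 10.0, ("b", "d"): 11.0}
dist = {frozenset(k): v for k, v in pairs.items()}
leaves = ["a", "b", "c", "d"]
true_tree = {"a": ["x"], "b": ["x"], "x": ["a", "b", "y"],
             "y": ["x", "c", "d"], "c": ["y"], "d": ["y"]}
alt_tree = {"a": ["x"], "c": ["x"], "x": ["a", "c", "y"],
            "y": ["x", "b", "d"], "b": ["y"], "d": ["y"]}
print(balanced_length(true_tree, leaves, dist))  # 15.0
print(balanced_length(alt_tree, leaves, dist))   # 17.5: ME prefers the shorter, true tree
```

On additive distances the formula recovers the true total branch length (15 here), and the incorrect topology scores longer.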
Phylogenetic Tree Construction Using Markov Chain Monte Carlo
, 1999
Cited by 74 (0 self)
We describe a Bayesian method based on Markov chain simulation to study the phylogenetic relationship in a group of DNA sequences. Under simple models of mutational events, our method produces a Markov chain whose stationary distribution is the conditional distribution of the phylogeny given the observed sequences. Our algorithm strikes a reasonable balance between the desire to move globally through the space of phylogenies and the need to make computationally feasible moves in areas of high probability. Since phylogenetic information is described by a tree, we have created new diagnostics to handle this type of data structure. An important byproduct of the Markov chain Monte Carlo phylogeny building technique is that it provides estimates and corresponding measures of variability for any aspect of the phylogeny under study.
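To illustrate the Metropolis-Hastings mechanics behind such samplers without the tree moves, the sketch below samples a single evolutionary distance between two sequences rather than a whole phylogeny. It assumes the JC69 model, a hypothetical exponential prior with rate 10, and a uniform random-walk proposal reflected at zero (which keeps the proposal symmetric). The sample mean and standard deviation give the kind of estimate and variability measure the abstract mentions. This is not the authors' algorithm.

```python
import math
import random


def jc_log_likelihood(t, n_sites, n_diff):
    """JC69 log-likelihood, up to a constant, of n_diff differing sites out of n_sites."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # probability that a site differs
    return n_diff * math.log(p / 3.0) + (n_sites - n_diff) * math.log(1.0 - p)


def sample_distance(n_sites, n_diff, prior_rate=10.0, steps=20000, step=0.05, seed=1):
    """Metropolis-Hastings sampler for the distance t under an Exponential(prior_rate) prior."""
    rng = random.Random(seed)
    t = 0.1
    log_post = jc_log_likelihood(t, n_sites, n_diff) - prior_rate * t
    samples = []
    for _ in range(steps):
        # Uniform random walk reflected at zero: still a symmetric proposal.
        proposal = abs(t + rng.uniform(-step, step))
        if proposal > 0.0:
            new_log_post = jc_log_likelihood(proposal, n_sites, n_diff) - prior_rate * proposal
            if rng.random() < math.exp(min(0.0, new_log_post - log_post)):
                t, log_post = proposal, new_log_post
        samples.append(t)
    return samples


# Hypothetical data: 20 differences among 100 aligned sites.
samples = sample_distance(100, 20)[2000:]  # discard burn-in
mean = sum(samples) / len(samples)
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
print(f"posterior mean {mean:.3f}, sd {sd:.3f}")
```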
Modelling the Covarion Hypothesis of Nucleotide Substitution
 Math. Biosci
, 1998
Cited by 59 (3 self)
A "covarion" model for nucleotide substitution which allows sites to turn "on" and "off" with time was proposed 27 years ago by Fitch and Markowitz. It has been argued recently that evidence supports such models over later, alternative models which postulate a static distribution of rates across sites. However, in contrast to these latter well-studied models, little is known about the analytic properties of the former model. Here we analyse a covarion-style model and show (i) how to obtain the evolutionary distance between two species from the expected proportion of sites where two species differ, (ii) that the covarion model gives identical results to a suitably chosen rates-across-sites model if several sequences are compared in pairs using only the expected proportion of sites at which they differ, and give conditions under which the two models will give identical results if the full joint probability matrix is examined, (iii) that the two models can, in principle, be distinguished when there are at least four monophyletic groups of species. This last result is based on a distance measure which is tree-additive under certain versions of the covarion model but which, in general, will not be additive under a rates-across-sites model. The measure constructed does not require knowledge of the parameters of the model and so shows that sequences generated by the covarion model do in fact contain information about the underlying tree.
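For pairwise comparisons, the abstract shows that a covarion model gives the same results as a suitably chosen rates-across-sites model. As one concrete rates-across-sites distance, the standard gamma-corrected JC69 formula converts an observed proportion p of differing sites into d = (3/4) α [(1 - 4p/3)^(-1/α) - 1], which approaches the plain JC69 distance as the shape α grows. The sketch assumes gamma-distributed rates, which need not be the rate distribution that matches a particular covarion model; the function names are illustrative.

```python
import math


def jc_distance(p):
    """JC69 distance from the proportion p of differing sites (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """JC69 distance with gamma-distributed rates across sites (shape alpha)."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.3  # hypothetical proportion of differing sites
for alpha in (0.5, 1.0, 10.0, 1e6):
    print(alpha, round(jc_gamma_distance(p, alpha), 4))
print("no rate variation:", round(jc_distance(p), 4))
```

Smaller α (stronger rate variation) inflates the estimated distance for the same observed p.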
Identification of a novel Gammaretrovirus in prostate tumors of patients homozygous for R462Q RNASEL variant
 PLoS Pathog 2: e25
, 2006
Cited by 47 (7 self)
Ribonuclease L (RNase L) is an important effector of the innate antiviral response. Mutations or variants that impair function of RNase L, particularly R462Q, have been proposed as susceptibility factors for prostate cancer. Given the role of this gene in viral defense, we sought to explore the possibility that a viral infection might contribute to prostate cancer in individuals harboring the R462Q variant. A viral detection DNA microarray composed of oligonucleotides corresponding to the most conserved sequences of all known viruses identified the presence of gammaretroviral sequences in cDNA samples from seven of 11 R462Q-homozygous (QQ) cases, and in one of eight heterozygous (RQ) and homozygous wild-type (RR) cases. An expanded survey of 86 tumors by specific RT-PCR detected the virus in eight of 20 QQ cases (40%), compared with only one sample (1.5%) among 66 RQ and RR cases. The full-length viral genome was cloned and sequenced independently from three positive QQ cases. The virus, named XMRV, is closely related to xenotropic murine leukemia viruses (MuLVs), but its sequence is clearly distinct from all known members of this group.