Results 1  10
of
174
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by . . .
, 2003
"... The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The ..."
Abstract

Cited by 851 (14 self)
 Add to MetaCart
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximumlikelihood principle, which clearly satisfies these requirements. The core of this method is a simple hillclimbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distancebased method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximumlikelihood programs and much higher than the performance of distancebased and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximumlikelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distancebased and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. [Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.] The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. Moreover, current probabilist...
Invitation to FixedParameter Algorithms
, 2002
"... Contents 1. Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Keep the Parameter Fixed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Preliminaries and Agreements . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..."
Abstract

Cited by 293 (73 self)
 Add to MetaCart
Contents 1. Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Keep the Parameter Fixed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Preliminaries and Agreements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Parameterized Complexitya Brief Overview . . . . . . . . . . . . . . 6 1.3.1 Basic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3.2 Interpreting FixedParameter Tractability . . . . . . . . . . . 9 1.4 Vertex Cover  an Illustrative Example . . . . . . . . . . . . . . . . . 11 1.4.1 Parameterize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4.2 Specialize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.4.3 Generalize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.4.4 Count or Enumerate . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Likelihoodbased tests of topologies in phylogenetics. Syst. Biol
, 2000
"... Abstract.—Likelihoodbased statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino–Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino–Hasegawa ..."
Abstract

Cited by 75 (3 self)
 Add to MetaCart
Abstract.—Likelihoodbased statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino–Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino–Hasegawa test place important restrictions on its applicability. In particular, it is only valid when the topologies being compared are speci�ed a priori. Unfortunately, this means that the Kishino–Hasegawa test may be severely biased in many cases in which it is now commonly used: for example, in any case in which one of the competing topologies has been selected for testing because it is the maximum likelihood topology for the data set at hand. We review the theory of the Kishino–Hasegawa test and contend that for the majority of popular applications this test should not be used. Previously published results from invalid applications of the Kishino–Hasegawa test should be treated extremely cautiously, and future applications should use appropriate alternative tests instead. We review such alternative tests, both nonparametric and parametric, and give two examples which illustrate the importance of our contentions. [Kishino– Hasegawa test; maximum likelihood; phylogeny; Shimodaira–Hasegawa test; statistical tests; tree topology.] Hasegawa and Kishino (1989) and Kishino and Hasegawa(1989)developed methods for estimating the standard error and con�dence intervals for the difference in loglikelihoods between two topologically distinct phylogenetic trees representing hypotheses that might explain particular aligned sequence data sets. The method initially was introduced to compute con�dence intervals on posterior probabilities for topologies in a
Inferring Evolutionary Trees with Strong Combinatorial Evidence
 THEORETICAL COMPUTER SCIENCE
, 1997
"... We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Let Q be a set of resolved quartets defined on the studied species, the method computes th ..."
Abstract

Cited by 71 (11 self)
 Add to MetaCart
We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Let Q be a set of resolved quartets defined on the studied species, the method computes the unique maximum subset Q of Q which is equivalent to a tree and outputs the corresponding tree as an estimate of the species' phylogeny. We use a characterization of the subset Q due to [6] to provide an O(n 4 ) incremental algorithm for this variant of the NPhard quartet consistency problem. Moreover, when chosing the resolution of the quartets by the FourPoint Method (FPM) and considering the CavenderFarris model of evolution, we show that the convergence rate of the Q method is at worst polynomial when the maximum evolutive distance between two species is bounded. We complete these theoretical results by an experimental study on real and simulated data sets. The results ...
A Practical Algorithm for Recovering the Best Supported Edges of an Evolutionary Tree (Extended Abstract)
, 2000
"... ) Vincent Berry David Bryant y Tao Jiang z Paul Kearney x Ming Li  Todd Wareham k Haoyong Zhang Abstract It is now routine for biologists to conduct evolutionary analyses of large DNA and protein sequence datasets. A computational bottleneck in these analyses is the recovery of the t ..."
Abstract

Cited by 46 (7 self)
 Add to MetaCart
) Vincent Berry David Bryant y Tao Jiang z Paul Kearney x Ming Li  Todd Wareham k Haoyong Zhang Abstract It is now routine for biologists to conduct evolutionary analyses of large DNA and protein sequence datasets. A computational bottleneck in these analyses is the recovery of the topology of the evolutionary tree for a set of sequences. This paper presents a practical solution to this challenging problem. In particular, a new technique, called hypercleaning, is presented that can be combined with various treebuilding algorithms to efficiently reconstruct from sequence data the best supported edges of the evolutionary tree. More precisely, the hypercleaning technique computes from sequence data a small subset of edges that is likely to contain most edges of the correct Address: D'epartement d'Informatique Fondamentale et Applications, LIRMM, Universit'e de Montpellier II, France. Part of this work was done at the D'epartement de Math'ematiques,EURISE, Universit'e de S...
Quartet Cleaning: Improved Algorithms and Simulations
, 1999
"... A critical step in all quartet methods for constructing evolutionary trees is the inference of the topology for each set of four species (i.e. quartet). It is a wellknown fact that all quartet topology inference methods make mistakes that result in the incorrect inference of quartet topology. These ..."
Abstract

Cited by 42 (2 self)
 Add to MetaCart
A critical step in all quartet methods for constructing evolutionary trees is the inference of the topology for each set of four species (i.e. quartet). It is a wellknown fact that all quartet topology inference methods make mistakes that result in the incorrect inference of quartet topology. These mistakes are called quartet errors. In this paper, two efficient algorithms for correcting bounded numbers of quartet errors are presented. These "quartet cleaning" algorithms are shown to be optimal in that no algorithm can correct more quartet errors. An extensive simulation study reveals that sets of quartet topologies inferred by three popular methods (Neighbor Joining [15], Ordinal Quartet [14] and Maximum Parsimony [10]) almost always contain quartet errors and that a large portion of these quartet errors are corrected by the quartet cleaning algorithms.
Orchestrating Quartets: Approximation and Data Correction (Extended Abstract)
 Proceedings of the 39th IEEE Symposium on Foundations of Computer Science
, 1998
"... Inferring evolutionary trees has long been a challenging problem both for biologists and computer scientists. In recent years research has concentrated on the quartet method paradigm for inferring evolutionary trees. Quartet methods proceed by first inferring the evolutionary history for every set o ..."
Abstract

Cited by 37 (5 self)
 Add to MetaCart
Inferring evolutionary trees has long been a challenging problem both for biologists and computer scientists. In recent years research has concentrated on the quartet method paradigm for inferring evolutionary trees. Quartet methods proceed by first inferring the evolutionary history for every set of four species (resulting in a set Q of inferred quartet topologies) and then recombining these inferred quartet topologies to form an evolutionary tree. This paper presents two results on the quartet method paradigm. The first is a polynomial time approximation scheme (PTAS) for recombining the inferred quartet topologies optimally. This is an important result since, to date, there have been no polynomialtime algorithms with performance guarantees for quartet methods. In fact, this is the first known PTAS for inferring evolutionary trees under any paradigm. To achieve this result the natural denseness of the set Q is exploited. The second...
Models of molecular evolution and phylogeny
 Genome Res
, 1998
"... Phylogenetic reconstruction is a fastgrowing field that is enriched by different statistical approaches and by findings and applications in a broad range of biological areas. Fundamental to these are the mathematical models used to describe the patterns of DNA base substitution and amino acid repla ..."
Abstract

Cited by 33 (0 self)
 Add to MetaCart
Phylogenetic reconstruction is a fastgrowing field that is enriched by different statistical approaches and by findings and applications in a broad range of biological areas. Fundamental to these are the mathematical models used to describe the patterns of DNA base substitution and amino acid replacement. These may become some of the basic models for comparative genome research. We discuss these models, including the analysis of observed DNA base and amino acid mutation patterns, the concept of site heterogeneity, and the incorporation of structural biology data, all of which have become particularly important in recent years. We also describe the use of such models in phylogenetic reconstruction and statistical methods for the comparison of different models. PCR has deeply transformed and boosted phylogenetic studies. At the same time, the statistical analysis of evolutionary relationships among species has recently revealed important biotechnological uses. For example, the understanding of viral quasispecies variation allows us to trace routes of infectious disease transmission. The analysis of the host–
A Polynomial Time Approximation Scheme for Inferring Evolutionary Trees from Quartet Topologies and Its Application
 SIAM Journal on Computing
, 2000
"... . Inferring evolutionary trees has long been a challenging problem both for biologists and computer scientists. In recent years research has concentrated on the quartet method paradigm for inferring evolutionary trees. Quartet methods proceed by first inferring the evolutionary history for every set ..."
Abstract

Cited by 31 (1 self)
 Add to MetaCart
. Inferring evolutionary trees has long been a challenging problem both for biologists and computer scientists. In recent years research has concentrated on the quartet method paradigm for inferring evolutionary trees. Quartet methods proceed by first inferring the evolutionary history for every set of four species (resulting in a set Q of inferred quartet topologies) and then recombining these inferred quartet topologies to form an evolutionary tree. This paper presents two results on the quartet method paradigm. The first is a polynomial time approximation scheme (PTAS) for recombining the inferred quartet topologies optimally. This is an important result since, to date, there have been no polynomial time algorithms with performance guarantees for quartet methods. To achieve this result the natural denseness of the set Q is exploited. The second result is a new technique, called quartet cleaning, that detects and corrects errors in the set Q with performance guarantees. This result h...
Scaling up accurate phylogenetic reconstruction from geneorder data
, 2002
"... Motivation: Phylogenetic reconstruction from geneorder data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distancebased methods (such as neighborjoining), parsimony methods using sequencebased encod ..."
Abstract

Cited by 30 (14 self)
 Add to MetaCart
Motivation: Phylogenetic reconstruction from geneorder data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distancebased methods (such as neighborjoining), parsimony methods using sequencebased encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a twostep approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated diskcovering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCMGRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on geneorder data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu Contact: