
A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst (2003)

by S Guindon, O Gascuel

A mitochondrial protein compendium elucidates complex I disease biology

by David J. Pagliarini, Sarah E. Calvo, Betty Chang, Sunil A. Sheth, Scott B. Vafai, Shao-en Ong, Geoffrey A. Walford, Canny Sugiana, Avihu Boneh, William K. Chen - Cell, 2008
Cited by 81 (5 self)
Mitochondria are complex organelles whose dysfunction underlies a broad spectrum of human diseases. Identifying all of the proteins resident in this organelle and understanding how they integrate into pathways represent major challenges in cell biology. Toward this goal, we performed mass spectrometry, GFP tagging, and machine learning to create a mitochondrial compendium of 1098 genes and their protein expression across 14 mouse tissues. We link poorly characterized proteins in this inventory to known mitochondrial pathways by virtue of shared evolutionary history. Using this approach, we predict 19 proteins to be important for the function of complex I (CI) of the electron transport chain. We validate a subset of these predictions using RNAi, including C8orf38, which we further show harbors an inherited mutation in a lethal, infantile CI deficiency. Our results have important implications for understanding CI function and pathogenesis and, more generally, illustrate how our compendium can serve as a foundation for systematic investigations of mitochondria.

Citation Context

...-3. Mouse genes with zero or one bacterial homologs were called “eukaryotic innovations.” We built a rooted phylogenetic tree of 42 eukaryotic species and a bacterial outgroup (E. coli) with PhyML (Guindon and Gascuel, 2003) (JTT matrix, four substitution rate categories) on the basis of ClustalW multiple alignments of six well-conserved mouse proteins (Rps16, Ak2, Drg1, Dpm1, Cct7, and Psmc3) that were concatenated and...

Estimating Species Phylogenies Using Coalescence Times among Sequences

by Liang Liu, Lili Yu, Dennis K. Pearl, Scott V. Edwards, 2009
Cited by 63 (9 self)
The estimation of species trees (phylogenies) is one of the most important problems in evolutionary biology, and recently, there has been greater appreciation of the need to estimate species trees directly rather than using gene trees as a surrogate. A Bayesian method constructed under the multispecies coalescent model can consistently estimate species trees but involves intensive computation, which can hinder its application to the phylogenetic analysis of large-scale genomic data. Many summary statistics–based approaches, such as shallowest coalescences (SC) and Global LAteSt Split (GLASS), have been developed to infer species phylogenies for multilocus data sets. In this paper, we propose 2 methods, species tree estimation using average ranks of coalescences (STAR) and species tree estimation using average coalescence times (STEAC), based on the summary statistics of coalescence times. It can be shown that the 2 methods are statistically consistent under the multispecies coalescent model. STAR uses the ranks of coalescences and is thus resistant to variable substitution rates along the branches in gene trees. A simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when the substitution rates among lineages are highly variable. Two real genomic data sets were analyzed by the 2 methods and produced species trees that are consistent with previous results. [Coalescent model; gene tree; species tree.]

Phylogenomics and the reconstruction of the tree of life

by Frédéric Delsuc, Henner Brinkmann - Nat Rev Genet, 2005
Cited by 54 (2 self)
As more complete genomes are sequenced, phylogenetic analysis is entering a new era: that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms based on the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to a number of fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life may prove difficult, if not impossible, to resolve with confidence. Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. The notion of phylogeny follows directly from the theory of evolution presented by Charles Darwin in “The Origin of Species”: the only illustration in his famous book is the first representation of evolutionary relationships among species, in the form of a

FastTree: computing large minimum evolution trees with profiles instead of a distance matrix

by Morgan N. Price, Paramvir S. Dehal, Adam P. Arkin - Mol. Biol. Evol., 2009
Cited by 52 (0 self)
As DNA sequencing accelerates, gene families are growing rapidly, but standard methods for inferring phylogenies become computationally demanding for alignments of thousands of sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining, and uses heuristics to quickly identify candidate joins. FastTree then refines the topology with nearest-neighbor interchanges according to the minimum-evolution criterion. Compared to using a distance matrix, FastTree reduces the memory required from $O(N^2)$ to $O(NLa + N\sqrt{N})$ and reduces the computation time from $O(N^2 L)$ to $O(N\sqrt{N}\log(N)La)$, where N is the number of sequences, L is the width of the alignment, and a is the size of the alphabet. To estimate the tree’s reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over distance matrix approaches. FastTree constructed trees, including support values, for biological alignments with 39,092 or 158,022 distinct sequences in less time than it takes to compute a distance matrix and in a fraction of the space. Traditional neighbor joining with 100 bootstraps would be 10,000 times slower and would require 50 gigabytes of memory. In simulations, FastTree is slightly more accurate than other minimum-evolution methods such as neighbor joining, BIONJ, or FastME, and on genuine alignments, FastTree produces topologies with higher likelihoods. FastTree is available at

Citation Context

...alignment positions that were over 25% gaps. For alignments of up to 1,250 sequences, we inferred a maximum-likelihood phylogeny using PhyML 3.0 with the JTT model and no rate variation across sites (Guindon and Gascuel, 2003). Given the topology, we inferred the evolutionary rate of each site using proml from the phylip package, gamma-distributed rates (8 categories), and a coefficient of variation of 1 (http://evolution...
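
To make the memory bounds quoted in the FastTree abstract concrete, the sketch below compares the two leading terms, N^2 for a full distance matrix and NLa + N√N for profiles, as raw entry counts. It is only an illustration of the asymptotic expressions, not FastTree's actual memory accounting: the function names are hypothetical, constant factors and bytes per entry are ignored, and the alignment width of 1,000 columns is an assumed value rather than one taken from the paper.

```python
import math


def distance_matrix_entries(n: int) -> int:
    """Entry count of a full N x N distance matrix (the N^2 term)."""
    return n * n


def profile_entries(n: int, length: int, alphabet: int) -> float:
    """Entry count suggested by the profile-based bound, N*L*a + N*sqrt(N)."""
    return n * length * alphabet + n * math.sqrt(n)


# N is the larger alignment size mentioned in the abstract.
# L and a are assumptions made for this illustration only.
n_sequences = 158_022
alignment_width = 1_000   # assumed, not from the source
alphabet_size = 4         # nucleotides

ratio = distance_matrix_entries(n_sequences) / profile_entries(
    n_sequences, alignment_width, alphabet_size
)
print(f"distance matrix needs about {ratio:.1f}x as many entries as profiles")
```

Because the profile term grows roughly linearly in N for a fixed alignment width, the gap widens as more sequences are added.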

Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of metazoa, and consequences for phylogenetic inferences

by Alexandre Hassanin, Jean Deutsch - Syst. Biol., 2005
Cited by 50 (0 self)
Mitochondrial DNA (mtDNA) sequences are commonly used for inferring phylogenetic relationships. However, the strand-specific bias in the nucleotide composition of the mtDNA, which is thought to reflect asymmetric mutational constraints, combined with the important compositional heterogeneity among taxa, are known to be highly problematic for phylogenetic analyses. Here, nucleotide composition was compared across 49 species of Metazoa (34 arthropods, 2 annelids, 2 molluscs, and 11 deuterostomes), and analyzed for a mtDNA fragment including six protein-coding genes, i.e., atp6, atp8, cox1, cox2, cox3, and nad2. The analyses show that most metazoan species present a clear strand asymmetry, where one strand is biased in favor of A and C, whereas the other strand has a reverse bias, i.e., in favor of T and G. The origin of this strand bias can be related to asymmetric mutational constraints involving deaminations of A and C nucleotides during the replication and/or transcription processes. The analyses reveal that six unrelated genera are characterized by a reversal of the usual strand bias, i.e., Argiope (Araneae), Euscorpius (Scorpiones), Tigriopus (Maxillopoda), Branchiostoma (Cephalochordata), Florometra (Echinodermata), and Katharina (Mollusca). It is proposed that asymmetric mutational constraints have been independently reversed in these six genera, through an inversion of the control region, i.e., the region that contains most regulatory elements for replication and transcription of the mtDNA. We show that reversals of asymmetric mutational

Citation Context

...so obtained under the ML method by using the program SEQBOOT in the PHYLIP package Version 3.6b (Felsenstein, 2004) for generating 100 bootstrapped data sets, and by analysing the latters with PHYML (Guindon and Gascuel, 2003). RESULTS Nucleotide Composition at Synonymous Third Codon Positions The nucleotide compositions at synonymous third codon positions of the mt fragment including the six coding genes atp6 and atp8, c...
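
The strand asymmetry described in the Hassanin and Deutsch abstract (one strand biased toward A and C, the other toward T and G) is commonly summarized with skew statistics of the form (X − Y)/(X + Y). The sketch below computes AT and CG skews for a single sequence under that convention. It is a generic illustration and should not be read as the authors' exact procedure, which according to the citation context worked on synonymous third codon positions; the function names and the example sequence are hypothetical.

```python
from collections import Counter


def skew(seq: str, first: str, second: str) -> float:
    """Return (first - second) / (first + second) for two nucleotides."""
    counts = Counter(seq.upper())
    x, y = counts[first], counts[second]
    if x + y == 0:
        return 0.0  # neither nucleotide present, so no measurable bias
    return (x - y) / (x + y)


def strand_skews(seq: str) -> dict:
    """AT and CG skews; positive values indicate a bias toward A and C."""
    return {"AT": skew(seq, "A", "T"), "CG": skew(seq, "C", "G")}


# Hypothetical toy sequence, chosen only to show the sign convention.
example = strand_skews("AAATCCCG")
print(example)
```

Under this sign convention, a strand with the usual bias gives two positive skews, and a reversal of the kind reported for the six genera would show up as both values turning negative.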

Phylogenetic models of rate heterogeneity: A high performance computing perspective

by Alexandros Stamatakis - In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS), 2006
Cited by 43 (9 self)
Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, the computation of the likelihood function for huge trees of more than 1,000 organisms is computationally intensive due to a large amount of floating point operations and high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Γ and CAT models. The intention of this paper is to show that, from a purely empirical point of view, CAT can be used instead of Γ. The main advantage of CAT over Γ consists in significantly lower memory consumption and faster inference times. An experimental study using RAxML has been performed on 19 real-world datasets comprising 73 up to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Γ and, surprisingly enough, also yields trees with slightly superior Γ likelihood values. The usage of the CAT model decreases the average number of L2 and L3 cache misses by a factor of 8.55.

Citation Context

...rithmic complexity and the high computational cost of the ML function, significant progress has been achieved with the release of fast and accurate sequential and parallel programs such as e.g. PHYML [8], IQPNNI [16], MetaPIGA [13], TreeFinder [11], GAML [2], TREE-PUZZLE [22] and RAxML [23]. Typically, these programs allow for inference of 1,000 taxon trees on a single CPU in reasonable times. Since ...

A specific genetic background is required for acquisition and expression of virulence factors in Escherichia coli

by Patricia Escobar-Páramo, Olivier Clermont, Anne-Béatrice Blanc-Potard, Hung Bui, Chantal Le Bouguénec, Erick Denamur - Molecular Biology and Evolution, 2004
Cited by 43 (13 self)
In bacteria, the evolution of pathogenicity seems to be the result of the constant arrival of virulence factors (VFs) into the bacterial genome. However, the integration, retention, and/or expression of these factors may be the result of the interaction between the newly arriving genes and the bacterial genomic background. To test this hypothesis, a phylogenetic analysis was done on a collection of 98 Escherichia coli/Shigella strains representing the pathogenic and commensal diversity of the species. The distribution of 17 VFs associated with the different E. coli pathovars was superimposed on the phylogenetic tree. Three major types of VFs can be recognized: (1) VFs that arrive and are expressed in different genetic backgrounds (such as VFs associated with the pathovars of mild chronic diarrhea: enteroaggregative, enteropathogenic, and diffusely-adhering E. coli), (2) VFs that arrive in different genetic backgrounds but are preferentially found, associated with a specific pathology, in only one particular background (such as VFs associated with extraintestinal diseases), and (3) VFs that require a particular genetic background for the arrival and expression of their virulence potential (such as VFs associated with pathovars typical of severe acute diarrhea: enterohemorrhagic, enterotoxigenic, and enteroinvasive E. coli strains). The possibility of a single arrival of VFs by chance, followed by a vertical transmission, was ruled out by comparing the evolutionary histories of some of these VFs to the strain phylogeny. This evidence suggests that important changes in the genome of E. coli have occurred during the diversification of the species, allowing the virulence factors associated with severe acute diarrhea to arrive in the population. Thus, the E. coli genome seems to be formed by an "ancestral" and a "derived" background, each one responsible for the acquisition and expression of different virulence factors.

Citation Context

...quences were aligned using the Clustal program (Higgins, Bleasby, and Fuchs 1992) from the Sequence Navigator package. Neighbor-joining analyses were performed using the BioNJ method of PAUP* version 4.0 (Swofford 2002). The semistrict consensus trees, as well as the bootstrap trees, were obtained using maximum parsimony as the optimality criteria, with the heuristic search of PAUP* 4.0 with 1,000 iterations. The starting tree for the analyses was constructed via stepwise addition with the TBR branch-swapping algorithm. Maximum-likelihood and Bayesian analyses were performed using the PHYML (Guindon and Gascuel 2003) and MrBayes version 2.01 (Huelsenbeck and Ronquist 2001) programs, respectively. Results The Strain Phylogeny The general topology of the semistrict consensus tree of the 98 E. coli/Shigella strains analyzed (with E. fergusonii as the outgroup) based on simultaneous analysis of the sequence data of the six essential genes (trpA, trpB, pabB, putP, icd, and polB) is shown in figure 1. Six major groups of E. coli (A, B1, C, E, D, and B2), in addition to the different Shigella monophyletic groups form the core of the E. coli species. Groups A, B1, D, and B2 have been reported previously as the ma...

TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations

by Federico Abascal, Rafael Zardoya, Maximilian J. Telford, 2010
Cited by 42 (4 self)

Citation Context

...le 2). Next, we tested the reliability of the different alignments by analysing their phylogenetic performance. Phylogenetic trees were inferred using the maximum likelihood-based software Phyml v3.0 (19) using the best-fit model GTR+I+G as identified by ModelTest (20). In spite of the differences between alignments, the topology of tree recovered was stable. To measure the benefits of TranslatorX fur...

Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene

by Nicolas Rodrigue, Nicolas Lartillot, David Bryant, Hervé Philippe, 2005
Cited by 41 (11 self)
Standard likelihood-based frameworks in phylogenetics consider the process of evolution of a sequence site by site. Assuming that sites evolve independently greatly simplifies the required calculations. However, this simplification is known to be incorrect in many cases. Here, a computational method that allows for general dependence between sites of a sequence is investigated. Using this method, measures acting as sequence fitness proxies can be considered over a phylogenetic tree. In this work, a set of statistically derived amino acid pairwise potentials, developed in the context of protein threading, is used to account for what we call the structural fitness of a sequence. We describe a model combining statistical potentials with an empirical amino acid substitution matrix. We propose such a combination as a useful way of capturing the complexity of protein evolution. Finally, we outline features of the model using three datasets and show the approach's sensitivity to different tree topologies.

Citation Context

... (CAA27994), Nannospalax ehrenbergi (P04248), Homo sapiens (CAA25109), Gorilla gorilla (P02147), Ornithorhynchus anatinus (P02196) and Tachyglossus aculeatus (P02195). MYO4-153: This is also a dataset of myoglobin sequences, here taken from the 4 species P. catodon (P02185), O. orca (P02173), Graptemys geographica (P02201) and Chelonia mydas caranigra (MYTTG). We worked under a fixed tree topology for all datasets. Topologies were obtained by ML, under a JTT+F model, with gamma+invariant distributed rates across sites, using the PhyML program (Guindon and Gascuel, 2003). Protein structures are assumed constant throughout the tree. In practice, we used as a reference the structure of one of the sequences in the dataset, as determined by X-ray crystallography. The structure of E. coli (PDB code: 1HKA) was used as a reference for the PPK10-158 dataset, and that of P. catodon (PDB code: 1MBD) for both MYO10-153 and MYO4-153 datasets. Imposing the structure of a reference sequence on other sequences is simplified when these are of identical length. Therefore, we constructed alignments without gaps. This was accomplished using sequences of the same length as the r...

Initial Experiences Porting a Bioinformatics Application to a Graphics Processor

by Maria Charalambous, Pedro Trancoso, Alexandros Stamatakis - Lecture Notes in Computer Science 3746, 2005
Cited by 41 (3 self)
Bioinformatics applications are among the most relevant and compute-demanding applications today. While normally these applications are executed on clusters or dedicated parallel systems, in this work we explore the use of an alternative architecture. We focus on exploiting the compute-intensive characteristics offered by the graphics processors (GPU) in order to accelerate a bioinformatics application. The GPU is a good match for these applications as it is an inexpensive, high-performance SIMD architecture. In our initial experiments we evaluate the use of a regular graphics card to improve the performance of RAxML, a bioinformatics program for phylogenetic tree inference. In this paper we focus on porting to the GPU the most time-consuming loop, which accounts for nearly 50% of the total execution time. The preliminary results show that the loop code achieves a speedup of 3x, while the whole application, with a single loop optimization, achieves a speedup of 1.2x.

Citation Context

...on. This means that a phylogenetic analysis with an elaborate model such as ML requires significantly more time but yields trees with superior accuracy than Neighbor Joining [16] or Maximum Parsimony [17, 18]. However, due to the higher accuracy it is desirable to infer complex large trees with ML. The current version of RAxML incorporates novel fast hill climbing and simulated annealing heuristics and is...
