A mitochondrial protein compendium elucidates complex I disease biology
Cell, 2008
Cited by 81 (5 self)
Mitochondria are complex organelles whose dysfunction underlies a broad spectrum of human diseases. Identifying all of the proteins resident in this organelle and understanding how they integrate into pathways represent major challenges in cell biology. Toward this goal, we performed mass spectrometry, GFP tagging, and machine learning to create a mitochondrial compendium of 1098 genes and their protein expression across 14 mouse tissues. We link poorly characterized proteins in this inventory to known mitochondrial pathways by virtue of shared evolutionary history. Using this approach, we predict 19 proteins to be important for the function of complex I (CI) of the electron transport chain. We validate a subset of these predictions using RNAi, including C8orf38, which we further show harbors an inherited mutation in a lethal, infantile CI deficiency. Our results have important implications for understanding CI function and pathogenesis and, more generally, illustrate how our compendium can serve as a foundation for systematic investigations of mitochondria.
Estimating Species Phylogenies Using Coalescence Times among Sequences
2009
Cited by 63 (9 self)
The estimation of species trees (phylogenies) is one of the most important problems in evolutionary biology, and recently, there has been greater appreciation of the need to estimate species trees directly rather than using gene trees as a surrogate. A Bayesian method constructed under the multispecies coalescent model can consistently estimate species trees but involves intensive computation, which can hinder its application to the phylogenetic analysis of large-scale genomic data. Many summary statistics–based approaches, such as shallowest coalescences (SC) and Global LAteSt Split (GLASS), have been developed to infer species phylogenies for multilocus data sets. In this paper, we propose 2 methods, species tree estimation using average ranks of coalescences (STAR) and species tree estimation using average coalescence times (STEAC), based on the summary statistics of coalescence times. It can be shown that the 2 methods are statistically consistent under the multispecies coalescent model. STAR uses the ranks of coalescences and is thus resistant to variable substitution rates along the branches in gene trees. A simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when the substitution rates among lineages are highly variable. Two real genomic data sets were analyzed by the 2 methods and produced species trees that are consistent with previous results. [Coalescent model; gene tree; species tree.]
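The averaging step behind an average-coalescence-time method such as STEAC can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one sequence per species, assumes the pairwise coalescence times have already been read off each gene tree, and uses hypothetical names (`average_coalescence_times`, `closest_pair`) and toy data. A full method would pass the averaged matrix to a distance-based tree builder.

```python
def average_coalescence_times(gene_tables):
    """Average pairwise coalescence times over loci.

    gene_tables: list of dicts, one per gene tree, mapping
    frozenset({species_a, species_b}) -> coalescence time in that tree.
    """
    totals, counts = {}, {}
    for table in gene_tables:
        for pair, time in table.items():
            totals[pair] = totals.get(pair, 0.0) + time
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


def closest_pair(avg_times):
    """Species pair with the smallest average coalescence time."""
    return min(avg_times, key=avg_times.get)


# Toy data (hypothetical): three loci for species A, B, C.
# The third locus has a discordant gene tree in which B and C coalesce first.
loci = [
    {frozenset("AB"): 1.0, frozenset("AC"): 3.0, frozenset("BC"): 3.0},
    {frozenset("AB"): 2.0, frozenset("AC"): 2.5, frozenset("BC"): 2.5},
    {frozenset("AB"): 3.5, frozenset("AC"): 3.5, frozenset("BC"): 1.5},
]
avg = average_coalescence_times(loci)
print(sorted(closest_pair(avg)))
```

In this toy example the averaged matrix still groups A with B despite the discordant third locus. A rank-based variant in the spirit of STAR would replace the times with ranks of the coalescence events before averaging.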
Phylogenomics and the reconstruction of the tree of life
Nat Rev Genet, 2005
Cited by 54 (2 self)
As more complete genomes are sequenced, phylogenetic analysis is entering a new era, that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms based on the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to a number of fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life may prove difficult, if not impossible, to resolve with confidence.
(halsde-00193293, version 1, 3 Dec 2007)
Introductory paragraph
Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. The notion of phylogeny follows directly from the theory of evolution presented by Charles Darwin in “The Origin of Species” [1]: the only illustration in his famous book is the first representation of evolutionary relationships among species, in the form of a
FastTree: computing large minimum evolution trees with profiles instead of a distance matrix
Mol. Biol. Evol., 2009
Cited by 52 (0 self)
As DNA sequencing accelerates, gene families are growing rapidly, but standard methods for inferring phylogenies become computationally demanding for alignments of thousands of sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining, and uses heuristics to quickly identify candidate joins. FastTree then refines the topology with nearest-neighbor interchanges according to the minimum-evolution criterion. Compared to using a distance matrix, FastTree reduces the memory required from O(N²) to O(NLa + N√N) and reduces the computation time from O(N²L) to O(N√N log(N)La), where N is the number of sequences, L is the width of the alignment, and a is the size of the alphabet. To estimate the tree’s reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over distance matrix approaches. FastTree constructed trees, including support values, for biological alignments with 39,092 or 158,022 distinct sequences in less time than it takes to compute a distance matrix and in a fraction of the space. Traditional neighbor joining with 100 bootstraps would be 10,000 times slower and would require 50 gigabytes of memory. In simulations, FastTree is slightly more accurate than other minimum-evolution methods such as neighbor joining, BIONJ, or FastME, and on genuine alignments, FastTree produces topologies with higher likelihoods. FastTree is available at
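The idea of storing profiles instead of a distance matrix can be illustrated with a small sketch. It is not FastTree's code: the function names are hypothetical, and it assumes ungapped, unweighted sequences and an uncorrected mismatch distance. Under those assumptions, the distance between two profiles equals the mean pairwise mismatch fraction between the two groups of sequences, so each group can be summarized by an L × a table of frequencies.

```python
from collections import Counter


def profile(seqs):
    """Per-column character frequencies for equal-length aligned sequences."""
    n = len(seqs)
    return [{ch: c / n for ch, c in Counter(col).items()} for col in zip(*seqs)]


def profile_distance(p, q):
    """Mean over columns of the probability that characters drawn
    independently from the two profiles differ."""
    per_column = [
        1.0 - sum(freq * qc.get(ch, 0.0) for ch, freq in pc.items())
        for pc, qc in zip(p, q)
    ]
    return sum(per_column) / len(per_column)


def mean_pairwise_mismatch(a_seqs, b_seqs):
    """Reference value: average mismatch fraction over all cross-group pairs."""
    total = sum(
        sum(x != y for x, y in zip(a, b)) / len(a)
        for a in a_seqs
        for b in b_seqs
    )
    return total / (len(a_seqs) * len(b_seqs))


# Toy alignment (hypothetical data).
group_a = ["ACGT", "ACGA"]
group_b = ["ACCT", "TCGT"]
print(profile_distance(profile(group_a), profile(group_b)))
print(mean_pairwise_mismatch(group_a, group_b))
```

Both calls give the same value on this toy input, which is why a node's profile can stand in for all pairwise distances to the sequences beneath it.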
Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of metazoa, and consequences for phylogenetic inferences
Syst. Biol., 2005
Cited by 50 (0 self)
Mitochondrial DNA (mtDNA) sequences are commonly used for inferring phylogenetic relationships. However, the strand-specific bias in the nucleotide composition of the mtDNA, which is thought to reflect asymmetric mutational constraints, combined with the important compositional heterogeneity among taxa, is known to be highly problematic for phylogenetic analyses. Here, nucleotide composition was compared across 49 species of Metazoa (34 arthropods, 2 annelids, 2 molluscs, and 11 deuterostomes), and analyzed for a mtDNA fragment including six protein-coding genes, i.e., atp6, atp8, cox1, cox2, cox3, and nad2. The analyses show that most metazoan species present a clear strand asymmetry, where one strand is biased in favor of A and C, whereas the other strand has a reverse bias, i.e., in favor of T and G. The origin of this strand bias can be related to asymmetric mutational constraints involving deaminations of A and C nucleotides during the replication and/or transcription processes. The analyses reveal that six unrelated genera are characterized by a reversal of the usual strand bias, i.e., Argiope (Araneae), Euscorpius (Scorpiones), Tigriopus (Maxillopoda), Branchiostoma (Cephalochordata), Florometra (Echinodermata), and Katharina (Mollusca). It is proposed that asymmetric mutational constraints have been independently reversed in these six genera, through an inversion of the control region, i.e., the region that contains most regulatory elements for replication and transcription of the mtDNA. We show that reversals of asymmetric mutational
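Strand asymmetry of this kind is commonly quantified with skew indices. The sketch below assumes the usual definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the abstract does not state which statistic the authors used, so the indices, function names, and toy sequence are assumptions. A strand biased toward A and C has positive AT skew and negative GC skew, and its reverse complement shows the opposite signs.

```python
def strand_skews(seq):
    """Return (AT skew, GC skew) for a nucleotide sequence."""
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy sequence (hypothetical), rich in A and C.
fragment = "AAACCCGT"
print(strand_skews(fragment))
print(strand_skews(reverse_complement(fragment)))
```

Comparing the sign pattern of these two indices across taxa is one simple way to flag a lineage whose strand bias appears reversed.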
Phylogenetic models of rate heterogeneity: A high performance computing perspective
In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS), 2006
Cited by 43 (9 self)
Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, the computation of the likelihood function for huge trees of more than 1,000 organisms is computationally intensive due to a large number of floating point operations and high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Γ and CAT models. The intention of this paper is to show that, from a purely empirical point of view, CAT can be used instead of Γ. The main advantage of CAT over Γ consists in significantly lower memory consumption and faster inference times. An experimental study using RAxML has been performed on 19 real-world datasets comprising 73 to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Γ and, surprisingly enough, also yields trees with slightly superior Γ likelihood values. The usage of the CAT model reduces the average number of L2 and L3 cache misses by a factor of 8.55.
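A back-of-the-envelope estimate shows where a memory difference of this kind can come from. The sketch assumes that a discrete Γ model evaluates every site under 4 rate categories while CAT assigns each site a single rate, and that one likelihood vector of 8-byte values is kept per inner node. The category count, the alignment length, and the function name are assumptions for illustration, not figures from the abstract.

```python
def likelihood_vector_bytes(n_taxa, n_sites, n_rate_categories,
                            n_states=4, bytes_per_value=8):
    """Rough size of the conditional likelihood vectors at all inner nodes."""
    inner_nodes = n_taxa - 2  # an unrooted binary tree with n tips has n - 2 inner nodes
    return inner_nodes * n_sites * n_states * n_rate_categories * bytes_per_value


N_TAXA = 1663   # largest dataset size mentioned in the abstract
N_SITES = 1000  # hypothetical alignment length, not from the abstract

gamma_bytes = likelihood_vector_bytes(N_TAXA, N_SITES, n_rate_categories=4)  # assumed 4 Γ categories
cat_bytes = likelihood_vector_bytes(N_TAXA, N_SITES, n_rate_categories=1)    # one rate per site
print(f"Gamma: {gamma_bytes / 2**20:.1f} MiB, CAT: {cat_bytes / 2**20:.1f} MiB")
```

Under these assumptions the vectors are four times smaller with CAT. Actual savings depend on the implementation.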
A specific genetic background is required for acquisition and expression of virulence factors in Escherichia coli
Molecular Biology and Evolution, 2004
Cited by 43 (13 self)
In bacteria, the evolution of pathogenicity seems to be the result of the constant arrival of virulence factors (VFs) into the bacterial genome. However, the integration, retention, and/or expression of these factors may be the result of the interaction between the newly arriving genes and the bacterial genomic background. To test this hypothesis, a phylogenetic analysis was done on a collection of 98 Escherichia coli/Shigella strains representing the pathogenic and commensal diversity of the species. The distribution of 17 VFs associated with the different E. coli pathovars was superimposed on the phylogenetic tree. Three major types of VFs can be recognized: (1) VFs that arrive and are expressed in different genetic backgrounds (such as VFs associated with the pathovars of mild chronic diarrhea: enteroaggregative, enteropathogenic, and diffusely-adhering E. coli), (2) VFs that arrive in different genetic backgrounds but are preferentially found, associated with a specific pathology, in only one particular background (such as VFs associated with extraintestinal diseases), and (3) VFs that require a particular genetic background for the arrival and expression of their virulence potential (such as VFs associated with pathovars typical of severe acute diarrhea: enterohemorrhagic, enterotoxigenic, and enteroinvasive E. coli strains). The possibility of a single arrival of VFs by chance, followed by vertical transmission, was ruled out by comparing the evolutionary histories of some of these VFs to the strain phylogeny. This evidence suggests that important changes in the genome of E. coli have occurred during the diversification of the species, allowing the virulence factors associated with severe acute diarrhea to arrive in the population. Thus, the E. coli genome seems to be formed by an "ancestral" and a "derived" background, each one responsible for the acquisition and expression of different virulence factors.
TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations
2010
Site interdependence attributed to tertiary structure in amino acid sequence evolution
Gene, 2005
Cited by 41 (11 self)
Standard likelihood-based frameworks in phylogenetics consider the process of evolution of a sequence site by site. Assuming that sites evolve independently greatly simplifies the required calculations. However, this simplification is known to be incorrect in many cases. Here, a computational method that allows for general dependence between sites of a sequence is investigated. Using this method, measures acting as sequence fitness proxies can be considered over a phylogenetic tree. In this work, a set of statistically derived amino acid pairwise potentials, developed in the context of protein threading, is used to account for what we call the structural fitness of a sequence. We describe a model combining statistical potentials with an empirical amino acid substitution matrix. We propose such a combination as a useful way of capturing the complexity of protein evolution. Finally, we outline features of the model using three datasets and show the approach's sensitivity to different tree topologies.
Initial Experiences Porting a Bioinformatics Application to a Graphics Processor
Lecture Notes in Computer Science 3746, 2005
Cited by 41 (3 self)
Bioinformatics applications are among the most relevant and compute-demanding applications today. While normally these applications are executed on clusters or dedicated parallel systems, in this work we explore the use of an alternative architecture. We focus on exploiting the compute-intensive characteristics offered by graphics processors (GPUs) in order to accelerate a bioinformatics application. The GPU is a good match for these applications as it is an inexpensive, high-performance SIMD architecture. In our initial experiments we evaluate the use of a regular graphics card to improve the performance of RAxML, a bioinformatics program for phylogenetic tree inference. In this paper we focus on porting to the GPU the most time-consuming loop, which accounts for nearly 50% of the total execution time. The preliminary results show that the loop code achieves a speedup of 3x, while the whole application, with a single loop optimization, achieves a speedup of 1.2x.
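The two speedup figures can be related through Amdahl's law, which bounds the overall speedup when only a fraction of the run time is accelerated. The sketch below plugs in the abstract's approximate numbers (a loop taking about 50% of the time, accelerated 3x); the function name is hypothetical.

```python
def overall_speedup(accelerated_fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only part of it is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction)
                  + accelerated_fraction / local_speedup)


# Values taken from the abstract: ~50% of the time in the loop, 3x on the loop.
ideal = overall_speedup(0.5, 3.0)
print(f"ideal overall speedup: {ideal:.2f}x")
```

The ideal figure is 1.5x, so the reported 1.2x sits below the bound. That would be consistent with costs outside the accelerated loop, although the abstract does not say what they are.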