Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. (2001)

by M Remm, CEV Storm, ELL Sonnhammer
Venue: J Mol Biol

Citing documents (results 1-10 of 310):

Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world

by Eugene V. Koonin, Yuri I. Wolf, 2008
Cited by 112 (3 self)
Abstract not found

Orthologs, paralogs, and evolutionary genomics

by Eugene V Koonin - Annual Review of Genetics, 2005
Cited by 112 (12 self)
Abstract Orthologs and paralogs are two fundamentally different types of homologous genes that evolved, respectively, by vertical descent from a single ancestral gene and by duplication. Orthology and paralogy are key concepts of evolutionary genomics. A clear distinction between orthologs and paralogs is critical for the construction of a robust evolutionary classification of genes and reliable functional annotation of newly sequenced genomes. Genome comparisons show that orthologous relationships with genes from taxonomically distant species can be established for the majority of the genes from each sequenced genome. This review examines in depth the definitions and subtypes of orthologs and paralogs, outlines the principal methodological approaches employed for identification of orthology and paralogy, and considers evolutionary and functional implications of these concepts.

Citation Context

...roteins that also may artificially bridge unrelated COGs. Several other approaches for identification of orthologs, based on either specially designed clustering procedures or on explicit phylogenetic analysis, have been developed to overcome these problems and better disentangle orthologs and paralogs. In particular, the INPARANOID procedure developed by Sonnhammer and coworkers identifies clusters of orthologs, including co-orthologous sets of inparalogs, for pairs of genomes, by first detecting SymBets and then incorporating additional inparalogs according to developed statistical criteria (67, 82). High accuracy of identification of inparalogs seems to be achievable with this approach, but the inability to handle multiple genomes simultaneously is a serious limitation. Another method for ortholog detection developed by the same group involves comparison of gene trees with species trees, with the goal of direct identification of orthologs (86). The parts of the gene tree that have the same topology as the species tree are inferred to include orthologs. In principle, this and similar phylogenomic [i.e., applying phylogenetic analysis on genome scale (19)] methods are supposed to provide ...
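
The SymBet-then-in-paralog procedure described in this excerpt can be sketched as follows. This is a minimal illustration under simplifying assumptions, not INPARANOID itself: the function name `cluster_orthologs` and the frozenset-keyed score table are hypothetical, and the sketch omits INPARANOID's statistical confidence values and its resolution of overlapping clusters. It keeps reciprocal best hits between two genomes as seed pairs, then adds a same-species gene as an in-paralog when it scores higher against the seed than the two seeds score against each other, a simplified version of the inclusion rule of Remm et al. (2001).

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Scores = Dict[FrozenSet[str], float]


def cluster_orthologs(genes_a: List[str], genes_b: List[str],
                      scores: Scores) -> List[Tuple[Set[str], Set[str]]]:
    """Simplified INPARANOID-style clustering for one pair of genomes (sketch)."""

    def s(x: str, y: str) -> float:
        # Scores are symmetric; a missing pair counts as "no hit".
        return scores.get(frozenset((x, y)), 0.0)

    def best(x: str, targets: List[str]) -> Optional[str]:
        hits = [t for t in targets if s(x, t) > 0]
        return max(hits, key=lambda t: s(x, t)) if hits else None

    clusters = []
    for a in genes_a:
        b = best(a, genes_b)
        # Keep only symmetrical best hits (SymBets) as seed ortholog pairs.
        if b is None or best(b, genes_a) != a:
            continue
        seed = s(a, b)
        # In-paralogs: same-species genes closer to the seed than the seeds are to each other.
        in_a = {a} | {x for x in genes_a if x != a and s(a, x) > seed}
        in_b = {b} | {y for y in genes_b if y != b and s(b, y) > seed}
        clusters.append((in_a, in_b))
    return clusters
```

For example, if `a1`/`b1` form a SymBet scoring 100 and `a1` scores 150 against its same-species paralog `a2`, then `a2` joins the `a1`/`b1` cluster; `a2` does not seed its own cluster because its best hit `b1` prefers `a1`.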

OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

by Feng Chen, Aaron J. Mackey, Christian J. Stoeckert, David S. Roos - Nucleic Acids Res, 2006
Cited by 96 (12 self)

Citation Context

...ese inparalogous relationships, and incorporating edges connecting the resulting co-orthologs, overcomes the inability of simple reciprocal best hit approaches to detect many-to-many relationships (3,4). Edge weights are then adjusted to account for genome-to-genome similarity averages, and the resulting graph is clustered using the MCL algorithm (5), reducing large clusters containing weak singl...
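
The normalize-then-cluster step described in this excerpt can be sketched as below. This is an illustrative assumption, not OrthoMCL's code: the hypothetical `normalize_by_species_pair` simply divides each edge weight by the mean weight of its species pair (OrthoMCL's own normalization differs in detail), and `mcl` is a small dense implementation of Markov clustering (expansion by matrix squaring, inflation by an element-wise power) suitable only for toy graphs; real MCL runs use sparse matrices and pruning.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], float]


def normalize_by_species_pair(edges: Edges, species: Dict[str, str]) -> Edges:
    """Divide each weight by the mean weight of its species pair (hypothetical helper)."""
    def pair(u: str, v: str) -> Tuple[str, ...]:
        return tuple(sorted((species[u], species[v])))

    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    counts: Dict[Tuple[str, ...], int] = defaultdict(int)
    for (u, v), w in edges.items():
        totals[pair(u, v)] += w
        counts[pair(u, v)] += 1
    return {(u, v): w / (totals[pair(u, v)] / counts[pair(u, v)])
            for (u, v), w in edges.items()}


def _col_normalize(m: List[List[float]]) -> List[List[float]]:
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= s
    return m


def mcl(nodes: List[str], edges: Edges, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[frozenset]:
    """Dense Markov Cluster (MCL) sketch: alternate expansion and inflation."""
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    # Self-loops (weight 1.0) keep every column sum positive.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        m[idx[u]][idx[v]] = m[idx[v]][idx[u]] = w
    m = _col_normalize(m)
    for _ in range(max_iter):
        # Expansion: square the column-stochastic matrix (two-step random walk).
        e = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then re-normalize columns.
        e = _col_normalize([[x ** inflation for x in row] for row in e])
        delta = max(abs(e[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = e
        if delta < tol:
            break
    # Rows that keep non-zero mass (attractors) define the clusters.
    clusters: List[frozenset] = []
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For example, with three species whose raw weights sit on different scales (A-B edges 2.0, A-C edges 4.0, B-C edges 0.5), normalization maps every edge to 1.0, and MCL on two disjoint triangles returns the two triangles as clusters.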

Phytozome: a comparative platform for green plant genomics

by David M. Goodstein, Shengqiang Shu, Russell Howson, Rochak Neupane, Richard D. Hayes, Joni Fazo, Therese Mitros, William Dirks, Uffe Hellsten, Nicholas Putnam, Daniel S. Rokhsar, 2011
Cited by 83 (1 self)
Abstract not found

Citation Context

...le from Phytozome’s GBrowse genome browser. GENE FAMILY CONSTRUCTION. Large-scale, automated gene family construction is typically based on distance methods [Phytome (33), PlantTribes (34), InParanoid (35), OrthoMCL (36)] or, less frequently, distance-plus-character methods [OrthologID (37), TreeFam (38)], using a single peptide per locus in each genome under consideration. These distance-based methods...

Probabilistic model of the human protein-protein interaction network

by Daniel R Rhodes, Scott A Tomlins, Sooryanarayana Varambally, Vasudeva Mahavisno, Terrence Barrette, Shanker Kalyana-Sundaram, Debashis Ghosh, Akhilesh Pandey, Arul M Chinnaiyan - Nat Biotechnol 23(8), 2005
Cited by 55 (0 self)
A catalog of all human protein-protein interactions would provide scientists with a framework to study protein deregulation in complex diseases such as cancer. Here we demonstrate that a probabilistic analysis integrating model organism interactome data, protein domain data, genome-wide gene expression data and functional annotation data predicts nearly 40,000 protein-protein interactions in humans, a result comparable to those obtained with experimental and computational approaches in model organisms. We validated the accuracy of the predictive model on an independent test set of known interactions and also experimentally confirmed two predicted interactions relevant to human cancer, implicating uncharacterized proteins in definitive pathways. We also applied the human interactome network to cancer genomics data and identified several interaction subnetworks activated in cancer. This integrative analysis provides a comprehensive framework for exploring the human protein interaction network. We began by assembling a collection of genomic and proteomic data potentially useful in predicting human protein-protein interactions that included model organism protein-protein interactions (1), protein domain assignments (2), gene expression measurements in human tissue samples (3) and biological function annotations (4).
A gold standard positive set (GSP) of 11,678 distinct protein-protein interactions among 5,505 proteins was queried from the Human Protein Reference Database (HPRD) (12), a resource that contains known protein-protein interactions manually curated from the literature by expert biologists. A gold standard negative set (GSN) of 3,106,928 protein pairs was defined, in which one protein was assigned to the plasma membrane cellular component and the other to the nuclear cellular component by the Gene Ontology Consortium (4). Although it is known that membrane proteins can occasionally interact with nuclear proteins, we demonstrated that there are far fewer known interactions within GSN than would be expected by chance.
Model organism protein-protein interactions. From the Database of Interacting Proteins (DIP) (1), we queried high-throughput interactome data from three model organisms: Saccharomyces cerevisiae
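
One way to integrate several evidence types against the GSP and GSN sets is a naive Bayes likelihood-ratio scheme. The abstract does not spell out the model, so treat this as an assumption: each evidence type is binned, its likelihood ratio is LR = P(bin | GSP) / P(bin | GSN), and ratios from different evidence types are multiplied, which presumes they are conditionally independent. Posterior odds of interaction are then the prior odds times the combined LR. The function names are hypothetical.

```python
from collections import Counter
from math import prod
from typing import Dict, Hashable, List


def likelihood_ratios(gsp_bins: List[Hashable],
                      gsn_bins: List[Hashable]) -> Dict[Hashable, float]:
    """LR(bin) = P(bin | GSP) / P(bin | GSN) for one evidence type."""
    pos, neg = Counter(gsp_bins), Counter(gsn_bins)
    # Bins never seen in GSN would give an infinite LR; they are skipped here.
    return {b: (pos[b] / len(gsp_bins)) / (neg[b] / len(gsn_bins))
            for b in pos if neg[b] > 0}


def combined_lr(tables: Dict[str, Dict[Hashable, float]],
                evidence: Dict[str, Hashable]) -> float:
    """Naive Bayes: multiply per-evidence LRs, assuming conditional independence."""
    return prod(tables[name][value] for name, value in evidence.items())
```

For instance, if 5 of 10 GSP pairs but only 5 of 20 GSN pairs fall in a "high co-expression" bin, that bin has LR 2.0; combined with a shared-domain LR of 4.0, a pair with both features scores 8.0.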

Pairwise alignment of protein interaction networks

by Mehmet Koyutürk, Yohan Kim, Umut Topkara, Shankar Subramaniam, Wojciech Szpankowski, Ananth Grama - Journal of Computational Biology, 2006
Cited by 51 (4 self)
With an ever-increasing amount of available data on protein–protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein–protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. 
Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracies and computational cost. Key words: protein–protein interactions, network alignment, evolutionary models.
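
The match/mismatch/duplication idea can be made concrete with a toy scoring function. The additive form, the weights `mu`, `nu` and `delta`, and the name `alignment_score` are hypothetical choices for illustration; the paper's actual scoring function is not reproduced here. Interactions conserved in both networks earn `mu`, interactions present in only one network cost `nu`, and each paralogous pair retained inside one species' aligned subnet costs `delta`.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Edge = FrozenSet[str]


def alignment_score(pairs: List[Tuple[str, str]], edges1: Set[Edge], edges2: Set[Edge],
                    paralogs: Set[Edge], mu: float = 1.0, nu: float = 0.5,
                    delta: float = 0.25) -> Tuple[float, int, int, int]:
    """Score a candidate alignment of two PPI networks (hypothetical weights).

    pairs: aligned (protein in network 1, protein in network 2) matches.
    Returns (score, matches, mismatches, duplications).
    """
    matches = mismatches = 0
    for (u1, u2), (v1, v2) in combinations(pairs, 2):
        in1 = frozenset((u1, v1)) in edges1
        in2 = frozenset((u2, v2)) in edges2
        if in1 and in2:
            matches += 1      # interaction conserved in both species
        elif in1 or in2:
            mismatches += 1   # interaction present in only one species
    nodes1 = {u for u, _ in pairs}
    nodes2 = {v for _, v in pairs}
    # Duplications: paralogous pairs that both fall inside one species' aligned subnet.
    duplications = sum(1 for p in paralogs if p <= nodes1 or p <= nodes2)
    score = mu * matches - nu * mismatches - delta * duplications
    return score, matches, mismatches, duplications
```

Aligning network 1 (edges x1-y1, y1-z1) with network 2 (edges x2-y2, x2-z2) under the mapping x1→x2, y1→y2, z1→z2, with y1 and z1 flagged as paralogs, gives one match, two mismatches and one duplication, for a score of -0.25.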

Citation Context

...ralogs will be likely to have more common interactions than out-paralogs. Here, we use the terms in-paralog and out-paralog for proteins that are duplicated after and before speciation, respectively (Remm et al., 2001). While comparatively analyzing the proteome and interactome, it is important to distinguish in-paralogs from out-paralogs since the former are more likely to be functionally related. This, however, ...
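
Following Remm et al. (2001), whether two paralogs count as in-paralogs depends on when they duplicated relative to the speciation event separating the two species being compared: a duplication after that speciation gives in-paralogs, one before it gives out-paralogs. A small sketch with hypothetical ages (time before present, so larger means older) and a hypothetical function name makes the direction explicit:

```python
def classify_paralogs(duplication_age: float, speciation_age: float) -> str:
    """Classify a paralog pair relative to one speciation event.

    Ages are hypothetical times before present (larger = older), in any common unit.
    """
    if duplication_age < speciation_age:
        return "in-paralogs"   # duplication happened after the speciation
    if duplication_age > speciation_age:
        return "out-paralogs"  # duplication predates the speciation
    raise ValueError("duplication coincides with speciation; classification undefined")
```

With a speciation 90 time units ago, a duplication 30 units ago yields in-paralogs and one 400 units ago yields out-paralogs. The labels are relative: the same pair can be in-paralogs in one species comparison and out-paralogs in another.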

Graph-based analysis and visualization of experimental results with ONDEX

by Jacob Köhler, Jan Baumbach, Jan Taubert, Michael Specht, Andre Skusa, Er Rüegg, Chris Rawlings, Paul Verrier - Bioinformatics, 2006
Cited by 48 (8 self)
doi:10.1093/bioinformatics/btl081

Assignment of orthologous genes via genome rearrangement

by Xin Chen, Jie Zheng, Zheng Fu, Peng Nan, Yang Zhong, Stefano Lonardi, Tao Jiang - IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005
Cited by 47 (4 self)
Abstract: The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs. Index Terms: Ortholog, paralog, gene duplication, genome rearrangement, reversal, comparative genomics.
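
SOAR's signed reversal distance with duplicates is computed by heuristics not shown here. As a simpler illustration of the underlying operations, the sketch below assumes a duplicate-free signed permutation of 1..n and uses hypothetical function names: it applies one signed reversal and counts breakpoints. Because one reversal removes at most two breakpoints, ceil(b/2) is a classic lower bound on the reversal distance to the identity; this bound is a swapped-in textbook measure, not SOAR's algorithm.

```python
from typing import List


def reverse(genome: List[int], i: int, j: int) -> List[int]:
    """Signed reversal of positions i..j (inclusive): reverse the order and flip signs."""
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def breakpoints(genome: List[int]) -> int:
    """Count breakpoints of a signed permutation of 1..n relative to the identity."""
    framed = [0] + genome + [len(genome) + 1]  # frame with 0 and n+1
    # (a, b) is an adjacency iff b == a + 1; this also covers reversed runs like -3, -2.
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(genome: List[int]) -> int:
    """Each reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return (breakpoints(genome) + 1) // 2
```

Reversing positions 1 to 3 of [1, 2, 3, 4, 5] gives [1, -4, -3, -2, 5], which has 2 breakpoints, so at least one reversal is needed to restore the identity, and one suffices.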

Pairwise Local Alignment of Protein Interaction Networks Guided by Models of Evolution

by Mehmet Koyutürk, Ananth Grama, Wojciech Szpankowski - In RECOMB, 2005
Cited by 45 (5 self)
Abstract. With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Recent algorithms on aligning PPI networks target simplified structures such as conserved pathways to render these problems computationally tractable. However, since conserved structures that are parts of functional modules and protein complexes generally correspond to dense subnets of the network, algorithms that are able to extract conserved patterns in terms of general graphs are necessary. With this motivation, we focus here on discovering protein sets that induce subnets that are highly conserved in the interactome of a pair of species. For this purpose, we develop a framework that formally defines the pairwise local alignment problem for PPI networks, models the problem as a graph optimization problem, and presents fast algorithms for this problem. In order to capture the underlying biological processes correctly, we base our framework on duplication/divergence models that focus on understanding the evolution of PPI networks. Experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively (in terms of accuracies and computational cost). While we focus on pairwise local alignment of PPI networks in this paper, the proposed algorithm can be easily adapted to finding matches for a subnet query in a database of PPI networks.
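
The duplication/divergence models that motivate this framework can be illustrated with a common simulation variant: pick a random protein, duplicate it, keep each of its interactions on the copy with probability `keep_prob`, and link the copy to its source with probability `link_prob`. The function name, parameter names and defaults are hypothetical and not taken from the paper; the seeded generator makes runs reproducible.

```python
import random
from typing import Dict, Set


def duplication_divergence(n_final: int, keep_prob: float = 0.6,
                           link_prob: float = 0.1, seed: int = 0) -> Dict[int, Set[int]]:
    """Grow an undirected network by repeated duplication and divergence (sketch)."""
    if n_final < 2:
        raise ValueError("n_final must be at least 2")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {0: {1}, 1: {0}}  # start from a single interaction
    while len(adj) < n_final:
        source = rng.choice(sorted(adj))  # protein to duplicate
        new = len(adj)                    # nodes are numbered 0..n-1
        adj[new] = set()
        # Divergence: the copy inherits each interaction only with probability keep_prob.
        for nbr in sorted(adj[source]):
            if rng.random() < keep_prob:
                adj[new].add(nbr)
                adj[nbr].add(new)
        # Optionally, the duplicate still interacts with its source.
        if rng.random() < link_prob:
            adj[new].add(source)
            adj[source].add(new)
    return adj
```

A call such as `duplication_divergence(50, seed=1)` returns a symmetric adjacency map over nodes 0 to 49; lowering `keep_prob` makes interactions diverge faster after each duplication.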

Citation Context

...ment, only duplications that correspond to proteins that are duplicated after the split of species are of interest. Such protein pairs are called in-paralogs, while the others are called out-paralogs [28]. Unfortunately, distinguishing between in-paralogs and out-paralogs is not trivial. Therefore, we assign similarity scores to protein pairs conservatively by detecting orthologs and in-paralogs using ...

Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana.

by Insuk Lee, Bindu Ambaru, Pranjali Thakkar, Edward M Marcotte, Seung Y Rhee - Nat. Biotechnol., 2010
Cited by 45 (10 self)
We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional network and targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647 (73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations are predictive for diverse biological pathways, and outperform predictions derived only from literature-based protein interactions, achieving 21% precision for 55% of genes. AraNet prioritizes genes for limited-scale functional screening, resulting in a hit-rate tenfold greater than screens of random insertional mutants, when applied to early seedling development as a test case. By interrogating network neighborhoods, we identify AT1G80710 (now DROUGHT SENSITIVE 1; DRS1) and AT3G05090 (now LATERAL ROOT STIMULATOR 1; LRS1) as regulators of drought sensitivity and lateral root development, respectively. AraNet (http://www.functionalnet.org/aranet/) provides a resource for plant gene function identification and genetic dissection of plant traits.
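
Prioritizing genes by interrogating network neighborhoods is often done by guilt by association. The sketch below assumes a simple rule, not necessarily AraNet's: each non-seed gene is scored by the summed weight of its links to known seed genes (the weights could be, for instance, log-likelihood scores), and genes are then ranked. The gene identifiers and the `prioritize` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def prioritize(edges: Dict[Tuple[str, str], float],
               seeds: Set[str]) -> List[Tuple[str, float]]:
    """Rank non-seed genes by the summed weight of their links to seed genes."""
    score: Dict[str, float] = defaultdict(float)
    for (u, v), w in edges.items():
        if u in seeds and v not in seeds:
            score[v] += w
        elif v in seeds and u not in seeds:
            score[u] += w
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
```

With seeds s1 and s2, a gene g1 linked to both (weights 2.0 and 1.5) ranks above g2 (3.0), while g3, linked only to g2, receives no score.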

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University