Results 1 - 10 of 22
Models of molecular evolution and phylogeny
- Genome Res, 1998
Cited by 24 (0 self)
Phylogenetic reconstruction is a fast-growing field that is enriched by different statistical approaches and by findings and applications in a broad range of biological areas. Fundamental to these are the mathematical models used to describe the patterns of DNA base substitution and amino acid replacement. These may become some of the basic models for comparative genome research. We discuss these models, including the analysis of observed DNA base and amino acid mutation patterns, the concept of site heterogeneity, and the incorporation of structural biology data, all of which have become particularly important in recent years. We also describe the use of such models in phylogenetic reconstruction and statistical methods for the comparison of different models. PCR has deeply transformed and boosted phylogenetic studies. At the same time, the statistical analysis of evolutionary relationships among species has recently revealed important biotechnological uses. For example, the understanding of viral quasispecies variation allows us to trace routes of infectious disease transmission. The analysis of the host–
Predicting functional gene links from phylogenetic-statistical analyses of whole genomes
- PLoS Comput Biol
Cited by 22 (2 self)
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene’s presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks. Citation: Barker D, Pagel M (2005) Predicting functional gene links from phylogenetic-statistical analyses of whole genomes. PLoS Comp Biol 1(1): e3.
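The across-species premise stated in this abstract, that functionally linked proteins share a pattern of presence and absence across genomes, can be illustrated with a small sketch. This is not the maximum likelihood phylogenetic method the abstract describes; it is the conventional, tree-free comparison that the abstract says produces many false positives. The toy profiles and the function name `phi_coefficient` are hypothetical.

```python
from math import sqrt


def phi_coefficient(profile_a, profile_b):
    """Correlation between two presence/absence (1/0) profiles over the same genomes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    n11 = pairs.count((1, 1))  # both proteins present
    n10 = pairs.count((1, 0))  # only the first present
    n01 = pairs.count((0, 1))  # only the second present
    n00 = pairs.count((0, 0))  # both absent
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # a protein that is present (or absent) everywhere carries no signal
    return (n11 * n00 - n10 * n01) / denom


# Toy profiles over six hypothetical genomes (1 = present, 0 = absent).
protein_x = [1, 1, 0, 0, 1, 0]
protein_y = [1, 1, 0, 0, 1, 0]  # same pattern as protein_x
protein_z = [0, 0, 1, 1, 0, 1]  # complementary pattern

same_pattern_score = phi_coefficient(protein_x, protein_y)
opposite_pattern_score = phi_coefficient(protein_x, protein_z)
```

A score like this treats every genome as an independent observation. The abstract's point is that counting independent gain and loss events on a phylogenetic tree avoids that assumption.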
Coevolving protein residues: maximum likelihood identification and relationship to structure
- J. Mol. Biol, 1999
"... There has been a great deal of recent research on ..."
Detecting the Coevolution of Biosequences – an example of RNA interaction prediction
, 2007
Cited by 1 (1 self)
Abstract. A probabilistic graphical model is proposed in order to detect the coevolution between different sites in biological sequences. The model extends the continuous-time Markov process of sequence substitution for single nucleic or amino acids and imposes general constraints regarding simultaneous changes on the substitution rate matrix. Given a multiple sequence alignment for each molecule of interest and a phylogenetic tree, the model can predict potential interactions within or between nucleic acids and proteins. Initial validation of the model is carried out using tRNA and 16S rRNA sequence data. The model accurately identifies the secondary interactions of tRNA as well as several known tertiary interactions. In addition, results on 16S rRNA data indicate this general and simple coevolutionary model outperforms several other parametric and non-parametric methods in predicting secondary interactions. Furthermore, the majority of the putative predictions exhibit either direct contact or proximity of the nucleotide pairs in the 3D structure of the T. thermophilus ribosomal small subunit. The results on RNA data suggest a general model of coevolution might be applied to other types of interactions between protein, DNA and RNA molecules.
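As a point of reference for the single-site continuous-time Markov process that this model extends, here is a minimal sketch using the Jukes-Cantor (JC69) model, chosen only because its transition probabilities have a closed form. It is not the coevolution model of the abstract; the function name `jc69_transition_prob` and the rate parameter `alpha` are assumptions made for illustration.

```python
from math import exp

NUCLEOTIDES = "ACGU"


def jc69_transition_prob(src, dst, t, alpha=1.0):
    """P(dst at time t | src at time 0) when each specific substitution occurs at rate alpha."""
    decay = exp(-4.0 * alpha * t)
    if src == dst:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# One row of the transition matrix: probabilities of each nucleotide after t = 0.1,
# starting from "A". The row sums to 1.
row_from_a = [jc69_transition_prob("A", base, t=0.1) for base in NUCLEOTIDES]
```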
Phylogenetic dependency networks: Inferring patterns of adaptation in HIV
, 2009
Cited by 1 (1 self)
This is to certify that I have examined this copy of a doctoral dissertation by
Estimating a Binary Character’s Effect on Speciation and Extinction
, 2007
Cited by 1 (0 self)
Determining whether speciation and extinction rates depend on the state of a particular character has been of long-standing interest to evolutionary biologists. To assess the effect of a character on diversification rates using likelihood methods requires that we be able to calculate the probability that a group of extant species would have evolved as observed, given a particular model of the character’s effect. Here we describe how to calculate this probability for a phylogenetic tree and a two-state (binary) character under a simple model of evolution (the “BiSSE” model, binary-state speciation and extinction). The model involves six parameters, specifying two speciation rates (rate when the lineage is in state 0; rate when in state 1), two extinction rates (when in state 0; when in state 1), and two rates of character state change (from 0 to 1, and from 1 to 0). Using these probability calculations, we can do maximum likelihood inference to estimate the model’s parameters and perform hypothesis tests (e.g., is the rate of speciation elevated for one character state over the other?). We demonstrate the application of the method using simulated data with known parameter values. [Birth-death process; branching process; cladogenesis; extinction; key innovation; macroevolution; phylogeny; speciation; speciose; statistical inference.]
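The six parameters listed in this abstract can be organized as in the sketch below. It only shows the parameter structure, the competing event rates for a lineage in a given state, and one common way to set up a constrained model for the speciation-rate test. It does not compute the BiSSE likelihood. The class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BiSSEParams:
    lambda0: float  # speciation rate in state 0
    lambda1: float  # speciation rate in state 1
    mu0: float      # extinction rate in state 0
    mu1: float      # extinction rate in state 1
    q01: float      # character change rate from 0 to 1
    q10: float      # character change rate from 1 to 0

    def event_rates(self, state):
        """Rates of the three events a lineage currently in `state` can undergo."""
        if state == 0:
            return {"speciation": self.lambda0, "extinction": self.mu0, "transition": self.q01}
        if state == 1:
            return {"speciation": self.lambda1, "extinction": self.mu1, "transition": self.q10}
        raise ValueError("state must be 0 or 1")

    def next_event_probs(self, state):
        """Probability that each event type is the next to occur (competing exponential clocks)."""
        rates = self.event_rates(state)
        total = sum(rates.values())
        if total == 0:
            raise ValueError("all rates are zero; no event can occur")
        return {event: rate / total for event, rate in rates.items()}


def equal_speciation(params):
    """Constrained model with lambda1 forced equal to lambda0 (five free parameters)."""
    return replace(params, lambda1=params.lambda0)


# Illustrative values only; they are not taken from the paper.
example = BiSSEParams(lambda0=0.1, lambda1=0.2, mu0=0.03, mu1=0.03, q01=0.01, q10=0.01)
probs_state1 = example.next_event_probs(1)
null_model = equal_speciation(example)
```

Comparing the fitted likelihoods of the six-parameter and constrained models is one way to approach the hypothesis test the abstract mentions.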
A Pair-to-Pair Amino Acids Substitution Matrix and its Applications for Protein Structure Prediction
Cited by 1 (1 self)
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large amount of integrated high quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. The P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue–residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue–residue contacts from sequence information alone by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25–60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets (300–2000) of protein decoys. On the basis of evolutionary information alone, our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of the sets the native structure is ranked in the top 10% of the decoys. The method can thus be used to assist in filtering wrong models, complementing traditional scoring functions.
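The figure of 160,000 is consistent with treating both the original and the substituted residue pair as ordered pairs over the 20 standard amino acids: 20 × 20 = 400 pairs, and 400 × 400 = 160,000 substitutions. The sketch below checks that arithmetic and shows one hypothetical way to index such a matrix. The pair ordering and all names are assumptions, not the layout used by P2PMAT.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

# All ordered residue pairs, e.g. "AA", "AC", ..., "YY".
residue_pairs = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
pair_index = {pair: i for i, pair in enumerate(residue_pairs)}


def substitution_index(pair_from, pair_to):
    """Row and column of a pair-to-pair substitution in a 400 x 400 matrix."""
    return pair_index[pair_from], pair_index[pair_to]


n_pairs = len(residue_pairs)          # 20 * 20
n_substitutions = n_pairs * n_pairs   # 400 * 400
```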
Graphical Models of Residue Coupling in Protein Families
- IEEE Transactions on Computational Biology and Bioinformatics (preprint, to appear)
Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
Index Terms: Correlated mutations, graphical models, evolutionary covariation, sequence-structure-function relationships, functional classification
The Ancestral Distance Test: What Relatedness can Reveal about Correlated Evolution in Large Lineages with Missing Character Data and Incomplete Phylogenies
- Syst. Biol., 2006
The ancestral distance test is introduced to detect correlated evolution between two binary traits in large phylogenies that may lack resolved subclades, branch lengths, and/or comparative data. We define the ancestral distance as the time separating a randomly sampled taxon from its most recent ancestor (MRA) with extant descendants that have an independent trait. The sampled taxon either has (target sample) or lacks (nontarget sample) a dependent trait. Modeled as a Markov process, we show that the distribution of ancestral distances for the target sample is identical to that of the nontarget sample when characters are uncorrelated, whereas ancestral distances are smaller on average for the target sample when characters are correlated. Simulations suggest that the ancestral distance can be estimated using the time, total branch length, taxonomic rank, or number of speciation events between a sampled taxon and the MRA. These results are shown to be robust to deviations from Markov assumptions. A Monte Carlo technique estimates P-values when fully resolved phylogenies with branch lengths are available, and we evaluate the Monte Carlo approach using a data set with known correlation. Measures of relatedness were found to provide a robust means to test hypotheses of correlated character evolution.
Syst. Biol. 52(1):55-65, 2003
Systematists expect their hypotheses to be asymptotically precise. As the number of phylogenetically informative characters for a set of taxa increases, the relationships implied should stabilize on some topology. If true, this increasing stability should clearly manifest itself if an index of congruence is plotted against the accumulating number of characters.

