Combining phylogenetic and hidden Markov models in biosequence analysis
J. Comput. Biol., 2004
Cited by 135 (13 self)
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site. Besides improving the realism of ordinary phylogenetic models, they are potentially very powerful tools for inference and prediction, for example for gene finding or prediction of secondary structure. In this paper, we review progress on combined phylogenetic and hidden Markov models and present some extensions to previous work. Our main result is a simple and efficient method for accommodating higher-order states in the HMM, which allows for context-sensitive models of substitution, that is, models that consider the effects of neighboring bases on the pattern of substitution. We present experimental results indicating that higher-order states, autocorrelated rates, and multiple functional categories all lead to significant improvements in the fit of a combined phylogenetic and hidden Markov model, with the effect of higher-order states being particularly pronounced.
Modeling compositional heterogeneity
Syst Biol, 2004
Cited by 74 (0 self)
Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, and that the data may not need to be modeled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, the model does not fit and incorrect trees are found; but when composition is accommodated, the model fits and the known correct phylogenies are obtained. [Compositional heterogeneity; Markov chain Monte Carlo; maximum likelihood; model assessment; model selection; phylogenetics.] Markov process models used for phylogenetic analysis
Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior
Mol Biol Evol, 2004
Cited by 65 (11 self)
The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of the protein. When using probabilistic methods for site-specific rate inference, few alternatives are possible. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study the dependence of inference accuracy on such parameters as number of sequences, branch lengths, the shape of the rate distribution, and sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior that is incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better first to estimate branch lengths and then to estimate site-specific rates. This procedure was found to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site-conservation for the apoptosis regulator protein Bcl-xL.
Computational Identification of Evolutionarily Conserved Exons
2004
Cited by 62 (13 self)
Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means for addressing a multispecies version of the ab initio gene prediction problem. These models allow sequence divergence, a phylogeny, patterns of substitution, and base composition all to be considered simultaneously, in a single unified probabilistic model. Here, we apply phylo-HMMs to a restricted version of the gene prediction problem in which individual exons are sought that are evolutionarily conserved across a diverse set of species. We discuss two new methods for improving prediction performance: (1) the use of context-dependent phylogenetic models, which capture phenomena such as a strong CpG effect in noncoding regions and a preference for synonymous rather than nonsynonymous substitutions in coding regions; and (2) a novel strategy for incorporating insertions and deletions (indels) into the state-transition structure of the model, which captures the different characteristic patterns of alignment gaps in coding and noncoding regions. We also discuss the technique, previously used in pairwise gene predictors, of explicitly modeling conserved noncoding sequence to help reduce false positive predictions. These methods have been incorporated into an exon prediction program called ExoniPhy, and tested with two large datasets. Experimental results indicate that all three methods produce significant improvements in prediction performance. In combination, they lead to prediction accuracy comparable to that of some of the best available gene predictors, despite several limitations of our current models.
Phylogenomics and the reconstruction of the tree of life
Nat Rev Genet, 2005
Cited by 54 (2 self)
As more complete genomes are sequenced, phylogenetic analysis is entering a new era: that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms based on the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to a number of fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life may prove difficult, if not impossible, to resolve with confidence.
Introductory paragraph: Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. The notion of phylogeny follows directly from the theory of evolution presented by Charles Darwin in “The Origin of Species” [1]: the only illustration in his famous book is the first representation of evolutionary relationships among species, in the form of a
Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and Glaucophytes
2005
"... The monophyly of plastids is supported by several ..."
Phylogenetic hidden Markov models
In Statistical Methods in Molecular Evolution, 2005
Cited by 37 (6 self)
Phylogenetic hidden Markov models, or phylo-HMMs, are probabilistic models that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way this process changes from one site to the next. By treating molecular evolution as a combination of two Markov processes, one that operates in the dimension of space (along a genome) and one that operates in the dimension of time (along the branches of a phylogenetic tree), these models allow aspects of both sequence structure and sequence evolution to be captured. Moreover, as we will discuss, they permit key computations to be performed exactly and efficiently. Phylo-HMMs allow evolutionary information to be brought to bear on a wide variety of problems of sequence “segmentation,” such as gene prediction and the identification of conserved elements. Phylo-HMMs were first proposed as a way of improving phylogenetic models that allow for variation among sites in the rate of substitution [8, 52]. Soon afterward, they were adapted for the problem of secondary structure
Testing alternative vicariance scenarios in Western Mediterranean discoglossid frogs
Molecular Phylogenetics and Evolution, 2004
Profiling Phylogenetic Informativeness
2007
Cited by 36 (2 self)
The resolution of four controversial topics in phylogenetic experimental design hinges upon the informativeness of characters about the historical relationships among taxa. These controversies regard the power of different classes of phylogenetic character, the relative utility of increased taxonomic versus character sampling, the differentiation between lack of phylogenetic signal and a historical rapid radiation, and the design of taxonomically broad phylogenetic studies optimized by taxonomically sparse genome-scale data. Quantification of the informativeness of characters for resolution of phylogenetic hypotheses during specified historical epochs is key to the resolution of these controversies. Here, such a measure of phylogenetic informativeness is formulated. The optimal rate of evolution of a character to resolve a dated four-taxon polytomy is derived. By scaling the asymptotic informativeness of a character evolving at a nonoptimal rate by the derived asymptotic optimum, and by normalizing so that net phylogenetic informativeness is equivalent for all rates when integrated across all of history, an informativeness profile across history is derived. Calculation of the informativeness per base pair allows estimation of the cost-effectiveness of character sampling. Calculation of the informativeness per million years allows comparison across historical radiations of the utility of a gene for the inference of rapid adaptive radiation. The theory is applied to profile the phylogenetic informativeness of the genes BRCA1, RAG1, GHR, and c-myc from a muroid rodent sequence data set. Bounded integrations of the phylogenetic profile of these genes over four epochs comprising the diversifications of the muroid rodents, the mammals, the lobe-limbed vertebrates, and the early metazoans demonstrate the