
## Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol, (2004)

Citations: | 65 - 11 self |

### Citations

6493 |
The neighbor-joining method: a new method for reconstructing phylogenetic trees
- Saitou, Nei
Citation Context ...ngle tree, or to a specific MSA, seven different data sets were tested (table 1). In four data sets (1–4 in table 1) the dependency between the number of categories and the accuracy of the inferred rates was tested on the phylogenetic trees as in figure 2. The rate at each position was drawn from a Gamma distribution with a given value of $\alpha$. Two values of $\alpha$ were considered: $\alpha = 0.3$ represents a severe among-site rate variation while $\alpha = 1.0$ is an example of little among-site rate variation. In data sets 5–7 (table 1), the trees used for the simulations were based on neighbor-joining (NJ) trees (Saitou and Nei 1987) inferred from real data sets. In this case, the rate at each position was drawn from a rate distribution that was obtained by analyzing the three real data sets using ML. In all cases 500 positions were simulated. Accuracy as a Function of Rate Variation To study the effect of different levels of rate variation, the simulated rates were drawn from a 24-category discrete Gamma distribution with a specified $\alpha$ parameter. Fifteen different values of $\alpha$ were checked, ranging from 0.1 to 1.5 at equal intervals. This range appears to cover most of the values estimated from real data sets (Sullivan, H... |

1361 |
The Neutral Theory of Molecular Evolution
- Kimura
- 1983
Citation Context ... to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site-conservation for the apoptosis regulator protein Bcl-xL. Introduction Rates of evolution in proteins are expected to vary among sites due to different selective constraints. Under the neutral theory of molecular evolution, amino acid positions that are under stringent selective constraints are expected to be highly conserved; positions that are more tolerant to replacement are most often variable (Kimura 1983). Conserved sites may point to functionally and structurally important regions involved in such activities as ligand binding, enzymatic activity, protein-protein interactions, or folding (Lichtarge and Sowa 2002). Numerous site-specific conservation scores have been proposed over the years (reviewed in Valdar 2002; see also del Sol Mesa, Pazos, and Valencia 2003, Yao et al. 2003). Though evolution is the driving force that determines site conservation, none of these methods make full use of either the information contained in the phylogenetic tree or the stochastic nature of amino acid replace... |

348 |
Phylogenetic inference
- Swofford, Olsen, et al.
- 1996
Citation Context ...where with no effect on the calculations (Felsenstein 1981). Given $r$, the likelihood $P(\text{data} \mid r, T)$ can be calculated using Felsenstein’s (1981) postorder tree traversal algorithm. The ML rate estimate is the rate that maximizes the likelihood function $P(\text{data} \mid r, T)$. In the rare case where all the characters at the leaves are different, the ML value of $r$ is infinite (see also Nielsen 1997). To avoid this, we set an upper bound on $r$ ($r_{\max} = 20.0$). Empirical Bayesian Estimation of Evolutionary Rates In the Bayesian case, a prior Gamma distribution over the rates is assumed (Jin and Nei 1990; Swofford et al. 1996; Yang 1996). The Gamma distribution with parameters $\alpha$ and $\beta$ has a mean $\alpha/\beta$ and variance $\alpha/\beta^2$. We set $\alpha = \beta$ so that the mean rate over all sites is 1.0 and the variance is $1/\alpha$. The shape of the Gamma distribution is then determined by $\alpha$. When $\alpha > 1$, the distribution is bell-shaped, suggesting little rate heterogeneity. When $\alpha \to \infty$, there is a single rate for all sites. In the case of $\alpha < 1$, the distribution is highly skewed and is L-shaped. This situation indicates high levels of rate variation. Within the Bayesian framework, the posterior probability is obtained from the likelihood function and ... |
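
The properties of the Gamma prior quoted in this context can be checked numerically. The following is a minimal sketch, not code from the paper: the function names `gamma_pdf` and `prior_moments`, the integration range, and the step size are all assumptions chosen for illustration. It sets the rate parameter equal to the shape parameter and confirms that the mean is close to 1.0 and the variance close to 1/α.

```python
import math

def gamma_pdf(r, alpha, beta):
    """Gamma density with shape alpha and rate beta (mean alpha/beta, variance alpha/beta**2)."""
    return beta ** alpha / math.gamma(alpha) * r ** (alpha - 1) * math.exp(-beta * r)

def prior_moments(alpha, upper=200.0, step=0.002):
    """Midpoint-rule estimates of the mean and variance of the prior with beta set equal to alpha.

    The integration limit and step are illustrative assumptions, not values from the source.
    """
    first = second = 0.0
    for i in range(int(upper / step)):
        r = (i + 0.5) * step
        weight = gamma_pdf(r, alpha, alpha) * step
        first += r * weight        # accumulates E[r]
        second += r * r * weight   # accumulates E[r^2]
    return first, second - first ** 2

for alpha in (0.3, 1.0, 2.0):
    mean, variance = prior_moments(alpha)
    print(f"alpha={alpha}: mean={mean:.3f}, variance={variance:.3f}, 1/alpha={1 / alpha:.3f}")
```

Evaluating `gamma_pdf` at a few points also shows the two shapes described in the excerpt: with α = 0.3 the density falls monotonically from zero (L-shaped), while with α = 2 it peaks at an interior rate (bell-shaped).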

276 |
Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites.
- Yang
- 1993
Citation Context ...re not known in advance. We conclude with an illustrative biological example. Materials and Methods Maximum-Likelihood Estimation of Evolutionary Rates The branch lengths of the phylogenetic tree represent the average evolutionary rate across all sites. A site-specific rate, $r$, indicates how fast this site evolves relative to the average. A rate of 2.0 indicates a site that evolves two times faster than the average. Thus, site-specific rates inferred here are not absolute evolutionary rates that require knowledge of divergence times, but rather they represent a comparative quantity. We follow Yang (1993) and present the likelihood computation using an example tree shown in figure 1. We assume here that the tree T (s, t), defined by its tree topology s and associated branch lengths t, is known in advance. Nodes are labeled as in figure 1. The probability of the data given the rate parameter $r$ is $P(\text{data} \mid r, T) = \sum_{X_1, X_2 \in \{\text{amino acids}\}} \pi_{X_1} \times P_{X_1,M}(r t_1) \times P_{X_2,G}(r t_2) \times P_{X_2,M}(r t_3) \times P_{X_1,I}(r t_4) \times P_{X_1,X_2}(r t_5)$ (1), where $\pi_{X_1}$ is the frequency of amino acid $X_1$, and $P_{X_1,X_2}(rt)$ is the probability that amino acid $X_1$ will be replaced by amino acid $X_2$ along a branch of length $t$, given that the evolutionary rate... |
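
Equation (1) in this context sums over the two unobserved internal nodes of a four-leaf tree. The sketch below evaluates that sum directly and then finds the ML rate by a bounded grid search. Several parts are assumptions made only for illustration: the replacement probabilities come from a toy equal-rates model over 20 states (the excerpt does not say which amino acid model the paper used), the branch lengths are invented, and the names `p_replace`, `site_likelihood`, and `ml_rate` are hypothetical. The upper bound of 20.0 follows the bound on the rate mentioned elsewhere on this page.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)

def p_replace(x, y, d):
    """ASSUMPTION: toy equal-rates model, not the paper's replacement model.

    Probability that state x is replaced by state y over evolutionary distance d.
    """
    decay = math.exp(-K * d / (K - 1))
    return 1.0 / K + (K - 1) / K * decay if x == y else 1.0 / K - decay / K

def site_likelihood(r, t, leaves=("M", "G", "M", "I")):
    """Equation (1): sum over the internal states X1 and X2 with uniform frequencies 1/K."""
    t1, t2, t3, t4, t5 = t
    total = 0.0
    for x1 in AMINO_ACIDS:
        for x2 in AMINO_ACIDS:
            total += (1.0 / K
                      * p_replace(x1, leaves[0], r * t1)
                      * p_replace(x2, leaves[1], r * t2)
                      * p_replace(x2, leaves[2], r * t3)
                      * p_replace(x1, leaves[3], r * t4)
                      * p_replace(x1, x2, r * t5))
    return total

def ml_rate(t, leaves, r_max=20.0, n_grid=200):
    """Grid search for the rate maximizing the likelihood, capped at r_max."""
    grid = [r_max * i / n_grid for i in range(1, n_grid + 1)]
    return max(grid, key=lambda r: site_likelihood(r, t, leaves))

branch_lengths = (0.1, 0.2, 0.1, 0.3, 0.15)  # hypothetical values for t1..t5
print(ml_rate(branch_lengths, ("M", "G", "M", "I")))
```

A grid search is used here only because it is easy to verify; a real implementation would more likely use a one-dimensional optimizer and Felsenstein's postorder traversal for larger trees.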

234 | Bayesian inference of phylogeny and its impact on evolutionary biology.
- Huelsenbeck, Ronquist, et al.
- 2001
Citation Context ...n one part of the tree but are variable in the other. Such rate shifts may indicate change in the selection intensity at specific sites during evolution (reviewed in Gaucher et al. 2002). Rate shifts can also be inferred using an empirical Bayesian approach (Susko et al. 2002; Blouin, Boucher, and Roger 2003) or by using ML (Knudsen and Miyamoto 2001; Pupko and Galtier 2002). In our simulations we assumed that the tree topology is known a priori. In cases where this is not the case, one might use the Markov chain Monte Carlo technique to take the uncertainty of the tree topology into account (Huelsenbeck et al. 2001). Bayesian methods in phylogeny were recently criticized by Suzuki, Glazko, and Nei (2002) in the context of overestimation of Bayesian support for internal nodes. In our case, however, we limited the Bayesian part to a Gamma prior over the evolutionary rates, which is not the case with Bayesian methods that aim at inferring phylogenies. When using a discrete approximation to the Gamma distribution, as in EB-EXP, the number of discrete categories must be specified. Yang (1994, 1995) suggested that four rate categories are sufficient to provide an optimum or near-optimum fit by the model to the... |

155 | A likelihood approach to estimating phylogeny from discrete morphological character data.
- Lewis
- 2001
Citation Context ...r. MSE measures the deviation of the inferred rate from its true value for each site independently from the other sites. The correlation coefficient, however, measures to what extent the inferred and simulated rates vary together. Thus, when the rates are nearly homogenous (i.e., high $\alpha$ values), rates with extreme values are rare and the inference is more accurate (low MSE). Correlation coefficients, however, are expected to be relatively low. Another shortcoming of the ML method is that its point estimates tend to adopt extreme values when the amount of data drops below a critical threshold (Lewis 2001). Thus, when the data are scarce, as was the case when rates were inferred from less than 12 sequences, ML resulted in very rough estimates (MSE = 2.92 and 2.0 for six and 12 sequences, respectively, compared with 0.51 and 0.32, respectively, for EB-EXP). Figure 9a and b show scatter plots of inferred rates obtained using the ML and EB-EXP methods versus the simulated values. Whereas several extreme values were observed using the ML method (fig. 9a), the inferred rates of the EB-EXP method were clustered close to the $y = x$ line (fig. 9b). When a large amount of sequences is available, one could ... |
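
The contrast drawn in this context between MSE and the correlation coefficient can be made concrete with a small sketch. The rate vectors below are invented for illustration and are not data from the paper; the function names `mse` and `pearson` are likewise hypothetical. The first pair mimics nearly homogeneous rates (small errors, but inferred and simulated rates do not vary together), and the second mimics heterogeneous rates with a constant offset.

```python
import math

def mse(inferred, simulated):
    """Mean squared error: per-site deviation of the inferred rate from the simulated rate."""
    return sum((x - y) ** 2 for x, y in zip(inferred, simulated)) / len(simulated)

def pearson(inferred, simulated):
    """Pearson correlation: how far inferred and simulated rates vary together."""
    n = len(simulated)
    mean_x = sum(inferred) / n
    mean_y = sum(simulated) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inferred, simulated))
    var_x = sum((x - mean_x) ** 2 for x in inferred)
    var_y = sum((y - mean_y) ** 2 for y in simulated)
    return cov / math.sqrt(var_x * var_y)

# Made-up example rates (assumptions for illustration only).
homog_true = [0.95, 1.0, 1.05, 1.0]
homog_inferred = [1.0, 0.95, 1.0, 1.05]
hetero_true = [0.1, 0.5, 2.0, 4.0]
hetero_inferred = [0.6, 1.0, 2.5, 4.5]

print(mse(homog_inferred, homog_true), pearson(homog_inferred, homog_true))
print(mse(hetero_inferred, hetero_true), pearson(hetero_inferred, hetero_true))
```

In this toy case the homogeneous pair has the lower MSE but essentially no correlation, while the heterogeneous pair has the higher MSE and a correlation of about 1, which is the pattern the excerpt describes.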

140 |
Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18(Suppl 1):S71–S77
- Pupko, RE, et al.
- 2002
Citation Context ...e methods make full use of either the information contained in the phylogenetic tree or the stochastic nature of amino acid replacements. This deficit may lead to erroneous predictions. For example, when branch lengths are ignored, a replacement on a short branch will be given the same weight as one occurring on a long branch. However, an amino acid replacement between two divergent sequences is less surprising than one occurring between two closely related sequences. The incorporation of advanced evolutionary models was proved to greatly increase the accuracy of site-specific rate inference (Pupko et al. 2002). Evolutionary rates are commonly measured as number of replacements per amino acid site per year. The term site-specific evolutionary rate in the context of our conservation scores is different. Here, the rate is relative to the average evolutionary rate across all sites and hence is unitless. In addition, for each site we assume that the rate is constant across all lineages. Finally, in this paper we limit our discussion to site-specific rate inference that is based on probabilistic evolutionary models. Currently, likelihood methods are considered state-of-the-art phylogenetic techniques, all... |

103 |
Structure of Bcl-xL-Bak peptide complex: recognition between regulators of apoptosis. Science
- Sattler, Liang, et al.
- 1997
Citation Context ...l this data set BCL-BIG. A smaller MSA consisting of the 14 closest homologs of Bcl-xL was also constructed. This set only includes representatives from the anti-apoptotic family. We call this data set BCL-SMALL. For both data sets, an NJ tree was inferred using pairwise distances estimated by ML. Branch lengths in the resulting tree were then optimized using ML. The trees and the MSAs were given as input to the EB-EXP and the ML rate-inference methods. The inferred rates were then projected onto the three dimensional structure of a complex between Bcl-xL and a Bak BH3 fragment (PDB ID: 1bxl; Sattler et al. 1997). In this step, the continuous evolutionary rates were partitioned into a discrete scale of 9 bins. The range of each bin varied such that each one contained 1/9 of the positions. Bin 9 contained the most conserved positions and bin 1 contained the most variable positions. The conservation pattern obtained by both EB-EXP and ML using the BCL-BIG set of homologs yielded two main surface patches of conserved residues (fig. 8a and b). The first patch corresponds to a hydrophobic groove, formed by residues from the BH1, BH2, and BH3 regions. This patch is the binding site for the Bak peptide. The ... |
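
The binning step described in this context (nine bins of equal occupancy, with bin 9 the most conserved and bin 1 the most variable) can be sketched as follows. This is an illustrative assumption about how such a partition might be coded, not the authors' implementation: the function name `conservation_bins` is hypothetical, and ties between equal rates are broken by position order, which the excerpt does not specify.

```python
def conservation_bins(rates, n_bins=9):
    """Assign each position a bin from n_bins (slowest, most conserved) down to 1 (fastest).

    Positions are ranked by rate and split into n_bins groups of (near-)equal size.
    """
    order = sorted(range(len(rates)), key=lambda i: rates[i])  # slowest sites first
    bins = [0] * len(rates)
    for rank, position in enumerate(order):
        bins[position] = n_bins - (rank * n_bins) // len(rates)
    return bins

# Hypothetical rates for 18 positions: a shuffled sequence of 0.1, 0.2, ..., 1.8.
example_rates = [(i * 7 % 18 + 1) / 10 for i in range(18)]
print(conservation_bins(example_rates))
```

With 18 positions and nine bins, each bin receives exactly two positions; when the number of positions is not a multiple of nine, the bins differ in size by at most one.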

92 | Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. - Whelan, Lio, et al. - 2001 |

81 |
Scoring residue conservation.
- Valdar
- 2002
Citation Context ...vary among sites due to different selective constraints. Under the neutral theory of molecular evolution, amino acid positions that are under stringent selective constraints are expected to be highly conserved; positions that are more tolerant to replacement are most often variable (Kimura 1983). Conserved sites may point to functionally and structurally important regions involved in such activities as ligand binding, enzymatic activity, protein-protein interactions, or folding (Lichtarge and Sowa 2002). Numerous site-specific conservation scores have been proposed over the years (reviewed in Valdar 2002; see also del Sol Mesa, Pazos, and Valencia 2003, Yao et al. 2003). Though evolution is the driving force that determines site conservation, none of these methods make full use of either the information contained in the phylogenetic tree or the stochastic nature of amino acid replacements. This deficit may lead to erroneous predictions. For example, when branch lengths are ignored, a replacement on a short branch will be given the same weight as one occurring on a long branch. However, an amino acid replacement between two divergent sequences is less surprising than one occurring between two ... |

80 |
Evolutionary predictions of binding surfaces and interactions.
- Lichtarge, Sowa
- 2002
Citation Context ...onservation for the apoptosis regulator protein Bcl-xL. Introduction Rates of evolution in proteins are expected to vary among sites due to different selective constraints. Under the neutral theory of molecular evolution, amino acid positions that are under stringent selective constraints are expected to be highly conserved; positions that are more tolerant to replacement are most often variable (Kimura 1983). Conserved sites may point to functionally and structurally important regions involved in such activities as ligand binding, enzymatic activity, protein-protein interactions, or folding (Lichtarge and Sowa 2002). Numerous site-specific conservation scores have been proposed over the years (reviewed in Valdar 2002; see also del Sol Mesa, Pazos, and Valencia 2003, Yao et al. 2003). Though evolution is the driving force that determines site conservation, none of these methods make full use of either the information contained in the phylogenetic tree or the stochastic nature of amino acid replacements. This deficit may lead to erroneous predictions. For example, when branch lengths are ignored, a replacement on a short branch will be given the same weight as one occurring on a long branch. However, an am... |

68 | Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. - Suzuki, Glazko, et al. - 2002 |

58 |
Limitations of the evolutionary parsimony method of phylogenetic analysis.
- Jin, Nei
- 1990
Citation Context ... have its root anywhere with no effect on the calculations (Felsenstein 1981). Given $r$, the likelihood $P(\text{data} \mid r, T)$ can be calculated using Felsenstein’s (1981) postorder tree traversal algorithm. The ML rate estimate is the rate that maximizes the likelihood function $P(\text{data} \mid r, T)$. In the rare case where all the characters at the leaves are different, the ML value of $r$ is infinite (see also Nielsen 1997). To avoid this, we set an upper bound on $r$ ($r_{\max} = 20.0$). Empirical Bayesian Estimation of Evolutionary Rates In the Bayesian case, a prior Gamma distribution over the rates is assumed (Jin and Nei 1990; Swofford et al. 1996; Yang 1996). The Gamma distribution with parameters $\alpha$ and $\beta$ has a mean $\alpha/\beta$ and variance $\alpha/\beta^2$. We set $\alpha = \beta$ so that the mean rate over all sites is 1.0 and the variance is $1/\alpha$. The shape of the Gamma distribution is then determined by $\alpha$. When $\alpha > 1$, the distribution is bell-shaped, suggesting little rate heterogeneity. When $\alpha \to \infty$, there is a single rate for all sites. In the case of $\alpha < 1$, the distribution is highly skewed and is L-shaped. This situation indicates high levels of rate variation. Within the Bayesian framework, the posterior probability is obtained from the li... |

57 |
Bayesian methods: An analysis for statisticians and interdisciplinary researchers
- Leonard, Hsu
- 1999
Citation Context ...viations for amino acids. 1782 Mayrose et al. when analyzing real data sets. If $\alpha$ is unknown and only the branch lengths are known a priori, one may estimate $\alpha$ by maximizing $P(\text{data} \mid \alpha, T)$ using a discrete distribution to approximate the Gamma distribution (Yang 1994; Yang and Wang 1995). The estimated $\alpha$ can then be used in the prior Gamma distribution for the Bayesian method. The replacement of $\alpha$ by its estimate has an empirical Bayesian justification (Yang and Wang 1995). Empirical Bayesian approaches differ from other Bayesian methods in that the prior is determined, in part, by the data (Leonard and Hsu 1999). Computing the rate estimate using equation (3) with an empirical Bayesian estimate of $\alpha$ is referred here as EB-EXP. Estimating Branch Lengths When the branch lengths are unknown one may estimate the branch lengths using the classical ML approach and then treat these branch lengths as known for the task of rate estimation, using either the ML or the Bayesian method. When inferring the branch length in this case we assumed a Gamma distribution and found the ML estimates of the $\alpha$ parameter and the branch lengths simultaneously. Alternatively, in the maximum-likelihood framework one can consider... |
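
Equation (3) is not reproduced in this excerpt, so the sketch below rests on an assumption: that the EB-EXP estimate is the posterior expectation of the rate, that is, the rate averaged over likelihood times prior and divided by the same integral without the rate. Everything else is also hypothetical and chosen for brevity: the likelihood is a toy two-sequence, equal-rates model over 20 states, the shape parameter is passed in as if it had already been estimated from the data, the integral is approximated on a fine evenly spaced grid rather than with Yang's discrete Gamma categories, and the names `pair_likelihood`, `gamma_pdf`, and `eb_exp_rate` are invented.

```python
import math

K = 20  # number of amino acid states

def pair_likelihood(r, t, same):
    """ASSUMPTION: toy equal-rates model for one site in two sequences a distance t apart."""
    decay = math.exp(-K * r * t / (K - 1))
    p_same = 1.0 / K + (K - 1) / K * decay
    p_diff = 1.0 / K - decay / K
    return (1.0 / K) * (p_same if same else p_diff)

def gamma_pdf(r, alpha):
    """Gamma prior with rate parameter equal to the shape alpha, so the prior mean is 1."""
    return alpha ** alpha / math.gamma(alpha) * r ** (alpha - 1) * math.exp(-alpha * r)

def eb_exp_rate(t, same, alpha, r_max=50.0, n_grid=20000):
    """Posterior mean of the rate, approximated with a midpoint grid on (0, r_max)."""
    step = r_max / n_grid
    numerator = denominator = 0.0
    for i in range(n_grid):
        r = (i + 0.5) * step
        weight = pair_likelihood(r, t, same) * gamma_pdf(r, alpha)
        numerator += r * weight
        denominator += weight
    return numerator / denominator

# alpha = 1.0 stands in for a value estimated beforehand; it is not from the paper.
print(eb_exp_rate(0.2, same=True, alpha=1.0))
print(eb_exp_rate(0.2, same=False, alpha=1.0))
```

In this toy setting the ML rate for a site where the two residues differ would be unbounded, whereas the posterior mean stays finite and is pulled toward the prior mean of 1, which is consistent with the behaviour the surrounding excerpts attribute to the Bayesian estimate.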

57 | An accurate, sensitive, and scalable method to identify functional sites in protein structures.
- Yao, DM, et al.
- 2003
Citation Context ... the neutral theory of molecular evolution, amino acid positions that are under stringent selective constraints are expected to be highly conserved; positions that are more tolerant to replacement are most often variable (Kimura 1983). Conserved sites may point to functionally and structurally important regions involved in such activities as ligand binding, enzymatic activity, protein-protein interactions, or folding (Lichtarge and Sowa 2002). Numerous site-specific conservation scores have been proposed over the years (reviewed in Valdar 2002; see also del Sol Mesa, Pazos, and Valencia 2003, Yao et al. 2003). Though evolution is the driving force that determines site conservation, none of these methods make full use of either the information contained in the phylogenetic tree or the stochastic nature of amino acid replacements. This deficit may lead to erroneous predictions. For example, when branch lengths are ignored, a replacement on a short branch will be given the same weight as one occurring on a long branch. However, an amino acid replacement between two divergent sequences is less surprising than one occurring between two closely related sequences. The incorporation of advanced evolutiona... |

31 |
From structure to function: Approaches and limitations. Nat Struct Mol Biol
- Thornton
Citation Context ...es and $d = 0.1$ for (a) ML and (b) EB-EXP. The grey line marks the $y = x$ line. Comparison of Site-Specific Rate-Inference Methods 1789 A robust evolutionary analysis can provide means for the identification of patches of conserved residues on the protein surface, which are essential for the protein’s function. The bottleneck for the in silico identification of these functional patches appears to be the availability of sequence data (Bell and Ben-Tal 2003). Too little variation in the MSA caused by too few sequences or too little diversity among them can render evolutionary analysis meaningless (Thornton et al. 2000). Ten available homologous proteins appear to be the sensitivity threshold when using ML (Bell and Ben-Tal 2003). Our study implies that these are exactly the conditions where EB-EXP is distinctly better than ML. Acknowledgments We thank Karen B. Avraham for introducing us to genes involved in hearing loss and Yossi Rosenberg for his help incorporating EB-EXP into the ConSurf server. N.B.-T. was supported by a Research Career Development Award from the Israel Cancer Research Fund. T.P. was supported by a grant in Complexity Science from the Yeshaia Horvitz Association. We thank three anonymous ... |

28 |
A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins.
- Knudsen, Miyamoto
- 2001
Citation Context ...ecific rate-inference techniques. We also studied the effect of various parameters on the accuracy of each method. One basic assumption in this study was that the rate at each site is constant during evolution. However, one might also try to find sites that are conserved in one part of the tree but are variable in the other. Such rate shifts may indicate change in the selection intensity at specific sites during evolution (reviewed in Gaucher et al. 2002). Rate shifts can also be inferred using an empirical Bayesian approach (Susko et al. 2002; Blouin, Boucher, and Roger 2003) or by using ML (Knudsen and Miyamoto 2001; Pupko and Galtier 2002). In our simulations we assumed that the tree topology is known a priori. In cases where this is not the case, one might use the Markov chain Monte Carlo technique to take the uncertainty of the tree topology into account (Huelsenbeck et al. 2001). Bayesian methods in phylogeny were recently criticized by Suzuki, Glazko, and Nei (2002) in the context of overestimation of Bayesian support for internal nodes. In our case, however, we limited the Bayesian part to a Gamma prior over the evolutionary rates, which is not the case with Bayesian methods that aim at inferring p... |

25 | The effect of topology on estimates of among-site rate variation. - Sullivan, Holsinger, et al. - 1996 |

21 | A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes.
- Pupko, Galtier
- 2002
Citation Context ...iques. We also studied the effect of various parameters on the accuracy of each method. One basic assumption in this study was that the rate at each site is constant during evolution. However, one might also try to find sites that are conserved in one part of the tree but are variable in the other. Such rate shifts may indicate change in the selection intensity at specific sites during evolution (reviewed in Gaucher et al. 2002). Rate shifts can also be inferred using an empirical Bayesian approach (Susko et al. 2002; Blouin, Boucher, and Roger 2003) or by using ML (Knudsen and Miyamoto 2001; Pupko and Galtier 2002). In our simulations we assumed that the tree topology is known a priori. In cases where this is not the case, one might use the Markov chain Monte Carlo technique to take the uncertainty of the tree topology into account (Huelsenbeck et al. 2001). Bayesian methods in phylogeny were recently criticized by Suzuki, Glazko, and Nei (2002) in the context of overestimation of Bayesian support for internal nodes. In our case, however, we limited the Bayesian part to a Gamma prior over the evolutionary rates, which is not the case with Bayesian methods that aim at inferring phylogenies. When using a ... |

20 |
ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res.
- Gibson
- 1994
Citation Context ...e is known. Bcl-xL contains all four BH domains, whereas distantly related proteins that promote apoptosis possess only BH3. The BH1, BH2, and BH3 domains strongly influence homo- and heterodimerization of Bcl-xL. BH4 has been shown to be essential for Bcl-xL to prevent apoptotic mitochondrial changes (Shimizu et al. 2000). Homologous sequences were obtained from the SwissProt database (www.expasy.org/sprot/). Since only five orthologous sequences were obtained, we supplemented the alignment with 26 paralogous sequences. An MSA of these homologs was built using ClustalW (Thompson, Higgins, and Gibson 1994). We call this data set BCL-BIG. A smaller MSA consisting of the 14 closest homologs of Bcl-xL was also constructed. This set only includes representatives from the anti-apoptotic family. We call this data set BCL-SMALL. For both data sets, an NJ tree was inferred using pairwise distances estimated by ML. Branch lengths in the resulting tree were then optimized using ML. The trees and the MSAs were given as input to the EB-EXP and the ML rate-inference methods. The inferred rates were then projected onto the three-dimensional structure of a complex between Bcl-xL and a Bak BH3 fragment (PDB ID... |

17 |
Testing for differences in rates-across-sites distributions in phylogenetic subtrees.
- Susko, Inagaki, et al.
- 2002
Citation Context ...we used simulations to compare the empirical Bayesian and ML site-specific rate-inference techniques. We also studied the effect of various parameters on the accuracy of each method. One basic assumption in this study was that the rate at each site is constant during evolution. However, one might also try to find sites that are conserved in one part of the tree but are variable in the other. Such rate shifts may indicate change in the selection intensity at specific sites during evolution (reviewed in Gaucher et al. 2002). Rate shifts can also be inferred using an empirical Bayesian approach (Susko et al. 2002; Blouin, Boucher, and Roger 2003) or by using ML (Knudsen and Miyamoto 2001; Pupko and Galtier 2002). In our simulations we assumed that the tree topology is known a priori. In cases where this is not the case, one might use the Markov chain Monte Carlo technique to take the uncertainty of the tree topology into account (Huelsenbeck et al. 2001). Bayesian methods in phylogeny were recently criticized by Suzuki, Glazko, and Nei (2002) in the context of overestimation of Bayesian support for internal nodes. In our case, however, we limited the Bayesian part to a Gamma prior over the evolutionar... |

15 |
Site-by-site estimation of the rate of substitution and the correlation of rates in mitochondrial DNA.
- Nielsen
- 1997
Citation Context ...o not. Both approaches have solid statistical foundations and are closely related, as they use the same models of evolution and operate within the same statistical framework. The ML approach for estimating site-specific conservation scores chooses the rate that yields the highest probability to the observed data. The first site-specific rate estimation using ML was the DNArates program developed in the early 1990s by Gary Olsen. A paper describing DNArates was never published, but documentation can be found at http://geta.life.uiuc.edu/~gary/programs/DNArates.html (see also Felsenstein 2001). Nielsen (1997) also studied ML based estimation for DNA sequences and suggested incorporating a Gamma prior to avoid cases where the ML estimate is infinite. Using the same ML methodology, Pupko et al. (2002) developed the Rate4Site tool for the identification of functional regions in proteins. Rate4Site was embedded in the ConSurf server (Glaser et al. 2003; http://consurf.tau.ac.il) and successfully identified functional residues at the contact interface of several proteins (Donaudy et al. 2003; Mella et al. 2003; Ramelot et al. 2003; RamShankar et al. 2003). Bayesian inference is based on the posterior p... |

14 | Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA. - Meyer, Weiss, et al. - 1999 |

14 |
Mixed model analysis of DNA sequence evolution.
- Yang, Wang
- 1995
Citation Context ...ol for the identification of functional regions in proteins. Rate4Site was embedded in the ConSurf server (Glaser et al. 2003; http://consurf.tau.ac.il) and successfully identified functional residues at the contact interface of several proteins (Donaudy et al. 2003; Mella et al. 2003; Ramelot et al. 2003; RamShankar et al. 2003). Bayesian inference is based on the posterior probability distribution, which is directly proportional to the product of the prior distribution and the likelihood. A Bayesian approach, assuming a Gamma prior for DNA sequences, was suggested by Yang and collaborators (Yang and Wang 1995; Excoffier and Yang 1999). Computing a Bayesian estimate based on a continuous Gamma distribution is computationally impracticable for even a modest number of sequences (Yang 1996). Yang (1994) suggested the discrete Gamma model as a... [Key words: rate variation among sites, evolutionary conservation, empirical Bayesian methods, bioinformatics, Bcl-xL. E-mail: dgraur@uh.edu. Mol. Biol. Evol. 21(9):1781–1791. 2004 doi:10.1093/molbev/msh194 Advance Access publication June 16, 2004 Molecular Biology and Evolution vol. 21 no. 9 Society for Molecular Biology and Evolution 2004; all rights reserved.] |

7 | Identifying site-specific substitution rates. - Meyer, Haeseler - 2003 |

5 |
An empirical analysis of mt 16S rRNA covarion-like evolution in insects: Site-specific rate variation is clustered and frequently detected.
- Misof, Anderson, et al.
- 2002
Citation Context .... This is unfortunate, since these are exactly the rates we seek to identify when predicting functionally important sites. We note, however, that Yang’s (1994) emphasis was either phylogenetic tree reconstruction or estimating the shape of the Gamma distribution, which may not change dramatically with the number of categories. In contrast, here we were interested in the rates themselves. The discrete Gamma method with eight categories was recently used by Susko et al. (2002) to infer rate shifts in different subtrees and by Excoffier and Yang (1999), Meyer, Weiss, and von Haeseler (1999), and Misof et al. (2002) to infer substitution rates per site. In light of our findings, choosing 16 categories instead of eight may improve the results. The simulation results showed that EB-EXP performs better than ML. Since both methods use the same likelihood function in their computations, the differences between EB-EXP and ML must be due to the incorporation of the prior distribution, which reduces the posterior probability of extreme unfavorable observed rates in EB-EXP. It can be claimed that the superiority of the Bayesian approach depends on how well the prior function fits the data. An empirical Bayesian app... |

5 |
Solution structure of Vibrio cholerae protein VC0424: A variation of the ferredoxin-like fold. Protein Sci.
- Ramelot, Ni, et al.
- 2003
Citation Context ...geta.life.uiuc.edu/~gary/programs/DNArates.html (see also Felsenstein 2001). Nielsen (1997) also studied ML based estimation for DNA sequences and suggested incorporating a Gamma prior to avoid cases where the ML estimate is infinite. Using the same ML methodology, Pupko et al. (2002) developed the Rate4Site tool for the identification of functional regions in proteins. Rate4Site was embedded in the ConSurf server (Glaser et al. 2003; http://consurf.tau.ac.il) and successfully identified functional residues at the contact interface of several proteins (Donaudy et al. 2003; Mella et al. 2003; Ramelot et al. 2003; RamShankar et al. 2003). Bayesian inference is based on the posterior probability distribution, which is directly proportional to the product of the prior distribution and the likelihood. A Bayesian approach, assuming a Gamma prior for DNA sequences, was suggested by Yang and collaborators (Yang and Wang 1995; Excoffier and Yang 1999). Computing a Bayesian estimate based on a continuous Gamma... |

2 |
Information transfer in the penta-EF-hand protein sorcin does not operate via the canonical structural/ functional pairing. A study with site-specific mutants.
- Mella, Colotti, et al.
- 2003
Citation Context ...be found at http://geta.life.uiuc.edu/~gary/programs/DNArates.html (see also Felsenstein 2001). Nielsen (1997) also studied ML based estimation for DNA sequences and suggested incorporating a Gamma prior to avoid cases where the ML estimate is infinite. Using the same ML methodology, Pupko et al. (2002) developed the Rate4Site tool for the identification of functional regions in proteins. Rate4Site was embedded in the ConSurf server (Glaser et al. 2003; http://consurf.tau.ac.il) and successfully identified functional residues at the contact interface of several proteins (Donaudy et al. 2003; Mella et al. 2003; Ramelot et al. 2003; RamShankar et al. 2003). Bayesian inference is based on the posterior probability distribution, which is directly proportional to the product of the prior distribution and the likelihood. A Bayesian approach, assuming a Gamma prior for DNA sequences, was suggested by Yang and collaborators (Yang and Wang 1995; Excoffier and Yang 1999). Computing a Bayesian estimate based on a continuous Gamma... |

2 |
Contribution of connexin26 (GJB2) mutations and founder effect to non-syndromic hearing loss in India.
- RamShankar, Girirajan, et al.
- 2003
Citation Context ...ary/programs/DNArates.html (see also Felsenstein 2001). Nielsen (1997) also studied ML based estimation for DNA sequences and suggested incorporating a Gamma prior to avoid cases where the ML estimate is infinite. Using the same ML methodology, Pupko et al. (2002) developed the Rate4Site tool for the identification of functional regions in proteins. Rate4Site was embedded in the ConSurf server (Glaser et al. 2003; http://consurf.tau.ac.il) and successfully identified functional residues at the contact interface of several proteins (Donaudy et al. 2003; Mella et al. 2003; Ramelot et al. 2003; RamShankar et al. 2003). Bayesian inference is based on the posterior probability distribution, which is directly proportional to the product of the prior distribution and the likelihood. A Bayesian approach, assuming a Gamma prior for DNA sequences, was suggested by Yang and collaborators (Yang and Wang 1995; Excoffier and Yang 1999). Computing a Bayesian estimate based on a continuous Gamma... |

1 | Associate Editor Accepted - Pamilo - 2004 |