## Structured Probabilistic Models of Proteins across Spatial and Fitness Landscapes (2011)

### Download Links

- [reports-archive.adm.cs.cmu.edu]

### BibTeX

@MISC{Kamichetty11structuredprobabilistic,
    author = {Hetunandan Kamisetty and Jaime Carbonell},
    title = {Structured Probabilistic Models of Proteins across Spatial and Fitness Landscapes},
    year = {2011}
}

### Abstract

representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government

### Citations

4079 | Convex Optimization - Boyd, Vandenberghe - 2004
Citation Context ...m can be solved extremely efficiently (in linear time) using an algorithm described in Schmidt et al. [2008]. Methods based on projected gradients are guaranteed to converge to a stationary point [Boyd and Vandenberghe, 2004], and convexity ensures that this stationary point is globally optimal. In order to scale the method to significantly larger domains, we can sub-divide the structure learning problem into two steps. ... |

955 | Profile hidden Markov models - Eddy - 1998
Citation Context ...ucture using a variety of techniques, including: (i) our algorithm, GREMLIN; (ii) the greedy algorithm of Thomas et al. [2005, 2008b], denoted ‘GMRC method’; and (iii) Profile Hidden Markov Models [Eddy, 1998] used by Bateman et al. [2002]. We note that the GMRC method only considers edges that meet certain coupling criteria (see Thomas et al. [2005, 2008b] for details). In particular, we found that it re... |

932 | Optimizing search engines using clickthrough data - Joachims - 2002
Citation Context ...me loss-function L between G and y on B. Many approaches have been developed for the task of learning to rank, especially in IR tasks like document retrieval [Herbrich et al., 2000] and web-search [Joachims, 2002]. These tasks differ in their choice of the loss function L and the algorithms used to minimize it. While initial approaches to ranking approached the ranking problem as a large number of pair-wise c... |

842 | The Protein data bank - Berman, Westbrook, et al. - 2000 |

747 | Information theory and statistical mechanics - Jaynes - 1957
Citation Context ...ases and as $n \to \infty$, $S \to \infty$ and is completely unconnected to $S_{\mathrm{physical}}$. This problem arises in many scenarios, most notably for our purposes, in information-theoretic treatments of statistical physics [Jaynes, 1963, 1968]. Fortunately, a solution to this problem is available, which to the best of our knowledge is due to E.T. Jaynes [Jaynes, 1963]. By using a measure (i.e. a possibly unnormalized probability dis... |

567 | A novel genetic system to detect protein-protein interactions - Fields, Song - 1989
Citation Context ... of the cell; transient or persistent complexes mediate processes including regulation, signaling, transport, and catalysis. While coarse-grained, high-throughput techniques such as yeast two-hybrid [Fields and Song, 1989] are primarily focused on which proteins interact, finer-grained techniques based on structural analysis address questions of how and why these interactions occur. By modeling the physical interactio... |

519 | The genetical evolution of social behaviour - Hamilton - 1964 |

469 | The Pfam protein families database. Nucleic Acids Res - Bateman, Birney, et al. |

411 | Subjectivity and Correlation in Randomized Strategies - Aumann - 1974
Citation Context ... of a mixed strategy profile specifies that each player samples from $\pi_i$ independent of other players. Relaxing this requirement of independence results in equilibria called correlated equilibria (CE) [Aumann, 1974]. Thus, a CE is any joint distribution $\pi$ over the players’ actions such that $\forall i,\ \forall a_i, a'_i \in A_i$: $\sum_{a_{-i}} \pi(a_i, a_{-i})\, u_i(a_i, a_{-i}) \ge \sum_{a_{-i}} \pi(a_i, a_{-i})\, u_i(a'_i, a_{-i})$. It is easy to see that... |
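
The correlated-equilibrium condition quoted in this context can be checked numerically. The following is a minimal sketch, not code from the thesis: the function name `is_correlated_equilibrium`, the dictionary-based game representation, and the "Battle of the Sexes" payoff values are all assumptions chosen for illustration.

```python
import itertools


def is_correlated_equilibrium(pi, payoffs, tol=1e-9):
    """Check the CE inequalities for a finite game (hypothetical helper).

    pi:      dict mapping every joint action tuple to its probability.
    payoffs: list of dicts; payoffs[i][joint] is player i's utility.
    """
    n_players = len(payoffs)
    # Action set of each player, read off the joint actions in pi.
    actions = [sorted({joint[i] for joint in pi}) for i in range(n_players)]

    # pi must be a probability distribution over joint actions.
    if any(p < -tol for p in pi.values()):
        return False
    if abs(sum(pi.values()) - 1.0) > tol:
        return False

    # For every player i and every pair (a_i, a_alt): switching from the
    # recommended a_i to a_alt must not raise the expected utility.
    for i in range(n_players):
        for a_i, a_alt in itertools.product(actions[i], repeat=2):
            gain = 0.0
            for joint, p in pi.items():
                if joint[i] != a_i:
                    continue
                deviated = joint[:i] + (a_alt,) + joint[i + 1:]
                gain += p * (payoffs[i][deviated] - payoffs[i][joint])
            if gain > tol:
                return False
    return True


# Assumed payoffs for a "Battle of the Sexes" game with actions "O" and "F".
payoffs = [
    {("O", "O"): 2, ("O", "F"): 0, ("F", "O"): 0, ("F", "F"): 1},
    {("O", "O"): 1, ("O", "F"): 0, ("F", "O"): 0, ("F", "F"): 2},
]

# A fair coin flip between the two coordinated outcomes.
coin_flip = {("O", "O"): 0.5, ("O", "F"): 0.0, ("F", "O"): 0.0, ("F", "F"): 0.5}
# Independent uniform play by both players.
uniform = {joint: 0.25 for joint in payoffs[0]}

print(is_correlated_equilibrium(coin_flip, payoffs))  # True
print(is_correlated_equilibrium(uniform, payoffs))    # False
```

Under these assumed payoffs the coin flip satisfies every inequality, while independent uniform play does not, because a player recommended "F" would gain by switching to "O".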

388 | Learning to rank using gradient descent - Burges, Shaked, et al. - 2005
Citation Context ...pproaches have shown the utility of using loss-functions based on the entire rank, or the so-called “list-wise” approaches [Cao et al., 2007, Xia et al., 2008]. Further, a “soft” approach to ranking [Burges et al., 2005] has allowed the use of gradient-based continuous optimization techniques instead of combinatorial optimization. We use a “list-wise” soft-ranking approach to ranking since it has been shown to have ... |

380 | Inferring phylogenies. Sinauer Associates - Felsenstein - 2004
Citation Context ...n associated with a sequence-only approach to learning a statistical model for a domain family is that the correlations observed in the MSA can be inflated due to phylogeny [Pollock and Taylor, 1997, Felsenstein, 2003]. A pair of co-incident mutations at the root of the tree can appear as a significant dependency even though they correspond to just one co-incident mutation event. To test if this was the case with... |

324 | Correlated Equilibrium as an Expression of Bayesian Rationality. Econometrica 55 - Aumann - 1987
Citation Context ..., $a_{-i}$); $\forall a \in A,\ \pi(a) \ge 0$; $\sum_{a \in A} \pi(a) = 1$. CE have several properties that make them more attractive than NE: they can lead to more efficient outcomes; they can be viewed as a Bayesian alternative to NE [Aumann, 1987]; they are easier to compute than NE (which are PPAD-complete [Daskalakis et al., 2009]). Finally, there exist natural algorithms that allow players of a game to converge to a CE [Foster and Vohra, 19... |

319 | Large Margin Rank Boundaries for Ordinal Regression - Herbrich, Graepel, et al. - 2000
Citation Context ... score for each model that minimizes some loss-function L between G and y on B. Many approaches have been developed for the task of learning to rank, especially in IR tasks like document retrieval [Herbrich et al., 2000] and web-search [Joachims, 2002]. These tasks differ in their choice of the loss function L and the algorithms used to minimize it. While initial approaches to ranking approached the ranking problem ... |

260 | Approximating probabilistic inference in Bayesian belief networks is NP-hard - Dagum, Luby - 1993
Citation Context ... in Eq. 3.4 is straightforward for any given configuration of the random variables. Computing the partition function in Eq. 3.5, on the other hand, is computationally intractable in the general case [Dagum and Chavez, 1993] because it involves summing over every state. However, a number of rigorous approximation algorithms have been devised for performing inference in MRFs. Significantly, it has been shown that math... |

238 | The Complexity of Computing a Nash Equilibrium - Daskalakis, Goldberg, et al. - 2009 |

147 | Learning to rank: from pairwise approach to listwise approach - Cao, Qin, et al. - 2007
Citation Context ... pair-wise classifications [Herbrich et al., 2000, Joachims, 2002], recent approaches have shown the utility of using loss-functions based on the entire rank, or the so-called “list-wise” approaches [Cao et al., 2007, Xia et al., 2008]. Further, a “soft” approach to ranking [Burges et al., 2005] has allowed the use of gradient-based continuous optimization techniques instead of combinatorial optimization. We use ... |

123 | Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations - Guerois, Nielsen, et al. - 2002 |

122 | On the Cut Polytope - Barahona, Mahjoub - 1985
Citation Context ...ersal must be different from at the beginning, a contradiction). The advantage of these constraints is that their violation can be detected and a violated constraint identified in graphs in polytime [Barahona and Mahjoub, 1986] by computing shortest paths in a related graph. Violated cycle inequalities are incorporated incrementally into the constraint set and the LP is re-solved until no more violations occur. Sontag and ... |

110 | Efficiency of pseudo-likelihood estimation for simple Gaussian fields. Biometrika 64 - Besag - 1977 |

90 | Correlated mutations and residue contacts in proteins - Gobel, Sander, et al. - 1994
Citation Context ...of much interest due to its wide utility. Much of the early work focused on detecting such pairs in order to predict contacts in a protein in the absence of a solved structure [Altschuh et al., 1988, Göbel et al., 1994] and to perform fold recognition. The pioneering work of Lockless and Ranganathan [1999] used an approach to determine probabilistic dependencies they call SCA and observed that analyzing such patter... |

60 | A new look at the statistical model identification. Automatic Control - Akaike - 1974
Citation Context ...an Information Criterion (BIC) [Schwarz, 1978], is used to select parsimonious models and is known to be asymptotically consistent in selecting the true model. The Akaike Information Criterion (AIC) [Akaike, 2003], typically selects denser models than the BIC, but is known to be asymptotically consistent in selecting the model with lowest predictive error (risk). In general, they do not however select the sam... |
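
The contrast between the two criteria in this context can be made concrete with their standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The sketch below is illustrative only: the function names and all numeric values (log-likelihoods, parameter counts, sample size) are invented assumptions, not results from the thesis.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fits: a sparse model and a denser model on the same data.
n_samples = 240
sparse = {"log_likelihood": -1000.0, "n_params": 10}
dense = {"log_likelihood": -985.0, "n_params": 20}

aic_prefers_dense = aic(**dense) < aic(**sparse)
bic_prefers_sparse = (
    bic(n_samples=n_samples, **sparse) < bic(n_samples=n_samples, **dense)
)
```

With these made-up numbers the AIC favors the denser model and the BIC the sparser one, since the BIC penalty per parameter (ln 240, about 5.48) exceeds the AIC penalty of 2.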

50 | Protein design by binary patterning of polar and nonpolar amino acids - Kamtekar, Schiffer, et al. - 1993 |

48 | Statistical theory of superlattices - Bethe - 1935
Citation Context ...Bethe [1935], Kikuchi [1951], Morita [1991], Morita et al. [1994]). For example, it is now known that Pearl’s Belief Propagation (BP) algorithm [Pearl, 1986] is equivalent to the Bethe approximation [Bethe, 1935] of the free energy. Unless otherwise specified, we use Belief Propagation for inference in the following sections. The term ‘belief’ in both BP refers to the marginal distributions over the random v... |

41 | Markov random fields in statistics - Clifford - 1990 |

36 | pKa's of ionizable groups in proteins: atomic detail from a continuum electrostatic model - Bashford, Karplus - 1990
Citation Context ...s an example of a binary-valued graphical model where the pKa is equal to the log partition function of this graphical model. Classical approaches to this problem include a naive mean field approach [Bashford and Karplus, 1990], brute-force summation with some pruning [Bashford and Karplus, 1991] and even Structured Mean Field techniques [Gilson, 1993]. The examples described in the previous two subsections develop graphica... |

36 | Correlated Equilibria in Graphical Games - Kakade, Kearns, et al. - 2003 |

27 | Free energy estimates of all-atom protein structures using generalized belief propagation - Kamisetty, Xing, et al. - 2008
Citation Context ...5, Kingsford et al., 2005, Canutescu et al., 2003], these rotamer libraries have also been used in computing free energies and conformational entropies of protein structures [Koehl and Delarue, 1994, Kamisetty et al., 2007, 2008, Lilien et al., 2005]. This approach of using a set of discrete rotameric states to compute the entropy faces a subtle problem. To understand this, let us consider an imaginary protein with exa... |

23 | Consistency of maximum likelihood and pseudo-likelihood estimators for Gibbs Distributions - Gidas - 1988
Citation Context ...eover, this approximation is known to yield a consistent estimate of the parameters under fairly general conditions if the generating distribution is in fact a pairwise MRF defined by a graph over S [Gidas, 1988]. That is, under these conditions, as the number of samples increases, parameter estimates using pseudolikelihood converge to the true parameters. 6.3.2 L1 Regularization The study of convex approxim... |

22 | in Molecular Dynamics. I. General Method - WAINWRIGHT - 1959 |

20 | Dunbrack Jr., "A graph theory algorithm for protein side-chain prediction - Canutescu, Shelenkov, et al. - 2003 |

19 | A semiempirical free energy force field with charge-based desolvation - Huey, Morris, et al. - 2007 |

17 | Coordinated amino acid changes in homologous protein families - Altschuh - 1988
Citation Context ...ins has been a problem of much interest due to its wide utility. Much of the early work focused on detecting such pairs in order to predict contacts in a protein in the absence of a solved structure [Altschuh et al., 1988, Göbel et al., 1994] and to perform fold recognition. The pioneering work of Lockless and Ranganathan [1999] used an approach to determine probabilistic dependencies they call SCA and observed that a... |

17 | Multiple-site titration and molecular modelling: two rapid methods for computing energies and forces for ionizable groups in proteins, Proteins - Gilson - 1993
Citation Context ...the protein. These interactions are naturally modelled as a graphical model. The probabilistic interpretation of the graphical model has a natural physical basis in terms of ionization free energies [Gilson, 1993, Bashford and Karplus, 1991]. The pKa of a protein is defined as follows: $\mathrm{p}K_a \propto -\beta \sum_{i=1}^{n} \sum_{x_i=0}^{1} x_i G_i + \sum_{i=1}^{n} \sum_{j>i} \sum_{x_j=0}^{1} x_i x_j G_{ij}$ where $x_i$ refers to the ionization state (neutral, or charged) an... |

14 | Multiple-site titration curves of proteins: an analysis of exact and approximate methods for their calculation - Bashford, Karplus - 1991
Citation Context ...hese interactions are naturally modelled as a graphical model. The probabilistic interpretation of the graphical model has a natural physical basis in terms of ionization free energies [Gilson, 1993, Bashford and Karplus, 1991]. The pKa of a protein is defined as follows: $\mathrm{p}K_a \propto -\beta \sum_{i=1}^{n} \sum_{x_i=0}^{1} x_i G_i + \sum_{i=1}^{n} \sum_{j>i} \sum_{x_j=0}^{1} x_i x_j G_{ij}$ where $x_i$ refers to the ionization state (neutral, or charged) and $G_i$, $G_{ij}$ are contributions ... |

14 | Computational design of a new hydrogen bond network and at least a 300-fold specificity switch at a protein-protein interface - Joachimiak, Kortemme, et al. - 2006 |

9 | Computing highly correlated positions using mutual information and graph theory for g protein-coupled receptors - Fatakia, Costanzi, et al. |

9 | A simple model of backbone flexibility improves modeling of side-chain conformational variability - Friedland, Linares, et al. |

9 | Ligand-dependent dynamics and intramolecular signaling in a PDZ domain - Fuentes, Der, et al. - 2004 |

9 | Modeling and Inference of Sequence-Structure Specificity - Kamisetty, Ghosh, et al. - 2009 |

8 | Predicting free energy changes using structural ensembles. Nat. Methods - Benedix, Becker - 2009 |

8 | Mapping of two networks of residues that exhibit structural and dynamical changes upon binding in a pdz domain protein - Dhulesia, Gsponer, et al. - 2008
Citation Context ...ively, in multiple studies, using a wide range of techniques ranging from computational approaches based on statistical coupling ([Lockless and Ranganathan, 1999]) and Molecular Dynamics simulations [Dhulesia et al., 2008], to NMR based experimental studies ([Fuentes et al., 2004]). We use the MSA from Lockless and Ranganathan [1999]. The MSA is an alignment of 240 non-redundant sequences, with 92 positions. We chose ... |

8 | On evolutionary conservation of thermodynamic coupling in proteins - Fodor, Aldrich - 2004
Citation Context ... could provide insights into the allosteric behavior of the proteins and be used to design new sequences [Socolich et al., 2005]. Others have since developed similar methods [Fatakia et al., 2009, Fodor and Aldrich, 2004, Fuchs et al., 2007]. By focusing on co-variation or probabilistic dependencies between residues, such methods conflate direct and indirect influences and can lead to incorrect estimates. In contrast... |

8 | Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space - Fromer, Yanover |

6 | The control of apoptosis and drug resistance in ovarian cancer: influence of p53 and bcl-2 - Eliopoulos, Kerr, et al. - 1995
Citation Context ... resistance to second- and third line agents is seriously compromising treatment outcome. . . .” It is particularly common in scenarios where rapid evolution is possible – cancer [Sakai et al., 2008, Eliopoulos et al., 1995] and HIV [Clavel and Hance, 2004] for example – although bacteria and other microorganisms can also exhibit drug resistance from prolonged exposure [Soulsby, 2005]. To address this problem, this chap... |

5 | Statistical theory for protein ensembles with designed energy landscapes - Biswas, Zou, et al. |

4 | Matan Kalman, Sarel Fleishman, Nir BenTal, and Dmitrij Frishman. Co-evolving residues in membrane proteins - Fuchs, Martin-Galiano - 2007 |

2 | Calibrated Learning and Correlated Equilibrium - Foster, Vohra - 1997
Citation Context ... to NE [Aumann, 1987], they are easier to compute than NE (which are PPAD-complete [Daskalakis et al., 2009]). Finally, there exist natural algorithms that allow players of a game to converge to a CE [Foster and Vohra, 1997]; no such algorithms are known to exist for NE in general. Fig. 8.1 shows the CE and NE constraints for the “Battle of the Sexes” game. Each point in the green polytope corresponds to a valid CE. The... |

1 | HIV drug resistance. The New England journal of medicine - Clavel, Hance - 2004 |

1 | Prior probabilities. Systems Science and Cybernetics - Jaynes - 1968 |

1 | A bayesian approach to protein model quality assessment - Kamisetty, Langmead - 2009
Citation Context ...the properties of two loss-functions: the negative log-likelihood and the cross-entropy. Each loss-function is minimized by a quasi-Newton method. Details of these metrics are available in our paper [Kamisetty and Langmead, 2009]. 4.2 Results We studied the efficacy of our approach on a database of 32 proteins selected from Wroblewska and Skolnick [2007]. For each protein, this database contains a set of 50 plausible models ... |