Results 1-10 of 27
Protein Design by Sampling an Undirected Graphical Model of Residue Constraints
Cited by 11 (2 self)
Protein engineering seeks to produce amino acid sequences with desired characteristics, such as specified structure [1] or function [4]. This is a difficult problem due to interactions among residues; choosing an amino acid type at one position may constrain the possibilities at others, in order for the resulting protein to have proper structure and activity. To account for the dependence of some residues and take advantage of the independence of others, we have developed a new approach to protein design based on undirected probabilistic graphical models (Fig. 1). Our approach first constructs a graphical model that encodes residue constraints, and then uses the model generatively to produce new sequences optimized to meet the constraints. We focus here on constraints due to residue coupling, common pairs of amino acid types at particular pairs of positions, also known as correlated mutations or coevolving residues. Recently, Ranganathan and colleagues showed that accounting for residue coupling, in addition to conservation, was to some extent both necessary and sufficient for viability of new WW domains [6, 5]. We have previously developed an approach for learning an undirected graphical model encapsulating conservation and coupling constraints in a protein family [7]. Our model provides a formal probabilistic semantics for reasoning about amino acid choices, defining a probability distribution function measuring how well a new sequence satisfies coupling constraints observed in the extant sequences of a family. Thus in order to design
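Using an undirected model "generatively", as the abstract above describes, can be illustrated with Gibbs sampling over a pairwise model. This is a minimal sketch under stated assumptions, not the authors' procedure: the two-letter alphabet, the log-potential tables `phi` (per-position conservation) and `psi` (pairwise coupling), and the function name `gibbs_sample` are all hypothetical.

```python
import math
import random


def gibbs_sample(n_pos, phi, psi, alphabet, sweeps, rng):
    """Draw one sequence from a pairwise undirected model by Gibbs sampling.

    phi[i][a]        -- assumed log-potential for residue a at position i
    psi[(i, j)][a,b] -- assumed log-potential for the pair (a at i, b at j)
    """
    seq = [rng.choice(alphabet) for _ in range(n_pos)]
    for _ in range(sweeps):
        for i in range(n_pos):
            weights = []
            for a in alphabet:
                score = phi[i][a]
                # Add every coupling term that touches position i,
                # holding the other position at its current residue.
                for (p, q), table in psi.items():
                    if p == i:
                        score += table[(a, seq[q])]
                    elif q == i:
                        score += table[(seq[p], a)]
                weights.append(math.exp(score))
            # Resample position i from its conditional distribution.
            seq[i] = rng.choices(alphabet, weights=weights)[0]
    return "".join(seq)


# Toy model: two positions that strongly prefer to carry the same residue.
toy_phi = [{"A": 0.0, "V": 0.0}, {"A": 0.0, "V": 0.0}]
toy_psi = {(0, 1): {("A", "A"): 5.0, ("V", "V"): 5.0,
                    ("A", "V"): 0.0, ("V", "A"): 0.0}}
print(gibbs_sample(2, toy_phi, toy_psi, "AV", 20, random.Random(0)))
```

With the strong coupling in the toy model, sampled sequences almost always carry the same residue at both positions, which is the sense in which sampling respects coupling constraints.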
Hypergraph model of multi-residue interactions in proteins: sequentially-constrained partitioning algorithms for optimization of site-directed protein recombination
In Proc. RECOMB, 2006
Cited by 9 (6 self)
Abstract. Relationships among amino acids determine stability and function and are also constrained by evolutionary history. We develop a probabilistic hypergraph model of residue relationships that generalizes traditional pairwise contact potentials to account for the statistics of multi-residue interactions. Using this model, we detect nonrandom associations in protein families and in the protein database. We also use this model in optimizing site-directed recombination experiments to preserve significant interactions and thereby increase the frequency of generating useful recombinants. We formulate the optimization as a sequentially-constrained hypergraph partitioning problem; the quality of recombinant libraries with respect to a set of breakpoints is characterized by the total perturbation to edge weights. We prove this problem to be NP-hard in general, but develop exact and heuristic polynomial-time algorithms for a number of important cases. Application to the beta-lactamase family demonstrates the utility of our algorithms in planning site-directed recombination.
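The idea of scoring a set of breakpoints by the perturbation to hyperedge weights can be sketched as follows. This is a deliberately simplified illustration, not the paper's scoring function or algorithms: it assumes a hyperedge is fully perturbed whenever its residues fall into more than one contiguous fragment, and it searches breakpoint sets by brute force. The names `perturbation` and `best_breakpoints` are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def perturbation(edges, breakpoints):
    """Total weight of hyperedges split across fragments.

    edges       -- list of (positions, weight); positions is a tuple of indices
    breakpoints -- sorted positions b; a cut falls between b - 1 and b
    """
    total = 0.0
    for positions, weight in edges:
        # The fragment index of a position is the number of breakpoints <= it.
        fragments = {bisect_right(breakpoints, p) for p in positions}
        if len(fragments) > 1:
            total += weight
    return total


def best_breakpoints(edges, length, k):
    """Exhaustively pick the k breakpoints with the smallest perturbation."""
    candidates = range(1, length)
    return min(combinations(candidates, k),
               key=lambda bps: perturbation(edges, bps))


# Toy example: six residues, one three-residue hyperedge and two pairwise edges.
toy_edges = [((0, 1, 2), 3.0), ((2, 3), 1.0), ((4, 5), 2.0)]
print(best_breakpoints(toy_edges, 6, 1))
```

Exhaustive search is only feasible for tiny inputs; it is used here to make the objective concrete, whereas the abstract reports exact and heuristic polynomial-time algorithms for important special cases.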
MODELING AND INFERENCE OF SEQUENCE-STRUCTURE SPECIFICITY
2009
Cited by 9 (7 self)
In order to evaluate protein sequences for simultaneous satisfaction of evolutionary and physical constraints, this paper develops a graphical model approach integrating sequence information from the evolutionary record of a protein family with structural information based on a molecular mechanics force field. Nodes in the graphical model represent choices for the backbone (native vs. nonnative), amino acids (conservation analysis), and sidechain conformations (rotamer library). Edges capture dependence relationships, in both the sequence (correlated mutations) and the structure (direct physical interactions). The sequence and structure components of the model are complementary, in that the structure component may support choices that were not present in the sequence record due to bias and artifacts, while the sequence component may capture other constraints on protein viability, such as permitting an efficient folding pathway. Inferential procedures enable computation of the joint probability of a sequence-structure pair, thereby assessing the quality of the sequence with respect to both the protein family and the specificity of its energetic preference for the native structure against alternate backbone structures. In a case study of WW domains, we show that by using the joint model and evaluating specificity, we obtain better prediction of foldedness of designed proteins (AUC of 0.85) than either a sequence-only or a structure-only model, and gain insights into how, where, and why the sequence and structure components complement each other.
Algorithms for Joint Optimization of Stability and Diversity in Planning Combinatorial Libraries of Chimeric Proteins
Cited by 5 (3 self)
Abstract. In engineering protein variants by constructing and screening combinatorial libraries of chimeric proteins, two complementary and competing goals are desired: the new proteins must be similar enough to the evolutionarily selected wild-type proteins to be stably folded, and they must be different enough to display functional variation. We present here the first method, Staversity, to simultaneously optimize stability and diversity in selecting sets of breakpoint locations for site-directed recombination. Our goal is to uncover all “undominated” breakpoint sets, for which no other breakpoint set is better in both factors. Our first algorithm finds the undominated sets serving as the vertices of the lower envelope of the two-dimensional (stability and diversity) convex hull containing all possible breakpoint sets. Our second algorithm identifies additional breakpoint sets in the concavities that are either undominated or dominated only by undiscovered breakpoint sets within a distance bound computed by the algorithm. Both algorithms are efficient, requiring only
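The notion of an "undominated" breakpoint set can be made concrete with a brute-force filter. This sketch is not either of the two algorithms described in the abstract; it simply applies the stated definition to a small explicit list, assuming both scores are oriented so that larger is better. The plan names and the function `undominated` are hypothetical.

```python
def undominated(plans):
    """Return the names of plans that no other plan beats in both factors.

    plans -- dict mapping a plan name to (stability, diversity),
             with larger values assumed better for both scores
    """
    keep = []
    for name, (stability, diversity) in plans.items():
        dominated = any(
            s2 > stability and d2 > diversity
            for other, (s2, d2) in plans.items()
            if other != name
        )
        if not dominated:
            keep.append(name)
    return sorted(keep)


# Toy example: plan C is worse than plan B in both stability and diversity.
toy_plans = {
    "A": (0.9, 0.1),
    "B": (0.5, 0.5),
    "C": (0.4, 0.4),
    "D": (0.1, 0.9),
}
print(undominated(toy_plans))
```

The pairwise comparison costs quadratic time in the number of candidate plans, so it only suits small enumerated sets; the number of possible breakpoint sets grows combinatorially, which is why dedicated algorithms are needed in practice.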
A graphical model approach for predicting free energies of association for protein-protein interactions under backbone . . .
2008
Accounting for conformational entropy in predicting binding free energies of protein-protein interactions
PLoS Comput Biol (under review), 2010
Cited by 3 (2 self)
Protein-protein interactions are governed by the change in free energy upon binding, ∆G = ∆H − T∆S. These interactions are often marginally stable, so one must examine the balance between the change in enthalpy, ∆H, and the change in entropy, ∆S, when investigating known complexes, characterizing the effects of mutations, or designing optimized variants. In order to perform a large-scale study into the contribution of conformational entropy to binding free energy, we developed a technique called GOBLIN (Graphical mOdel for BiomoLecular INteractions) that performs physics-based free energy calculations for protein-protein complexes under both sidechain and backbone flexibility. GOBLIN uses a probabilistic graphical model that exploits conditional independencies in the Boltzmann distribution and employs variational inference techniques that approximate the free energy of binding in only a few minutes. We examined the role of conformational entropy on a benchmark set of more than 700 mutants in eight large, well-studied complexes. Our findings suggest that conformational entropy is important in protein-protein interactions ...
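The balance between enthalpy and entropy in ∆G = ∆H − T∆S can be seen in a short calculation. The numbers below are made up for illustration and are not taken from the paper; the units (kcal/mol for ∆H, kcal/(mol·K) for ∆S, kelvin for T) and the function name `binding_free_energy` are assumptions.

```python
def binding_free_energy(delta_h, delta_s, temperature):
    """Return delta G = delta H - T * delta S.

    delta_h     -- enthalpy change in kcal/mol (assumed units)
    delta_s     -- entropy change in kcal/(mol*K) (assumed units)
    temperature -- absolute temperature in kelvin
    """
    return delta_h - temperature * delta_s


# Illustrative values only: a favorable enthalpy change partly offset
# by an unfavorable (negative) entropy change at room temperature.
delta_g = binding_free_energy(delta_h=-20.0, delta_s=-0.04, temperature=298.15)
print(round(delta_g, 3))
```

In this toy case the entropic penalty cancels more than half of the enthalpic gain, which shows why ignoring ∆S can badly misestimate the stability of a marginally stable complex.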
Structure Learning for Generative Models of Protein Fold Families
2009
Cited by 3 (2 self)
Statistical models of the amino acid composition of the proteins within a fold family are widely
Protein Fragment Swapping: A Method for Asymmetric, Selective Site-directed Recombination
Cited by 1 (1 self)
Abstract. This paper presents a new approach to site-directed recombination, swapping combinations of selected discontiguous fragments from a source protein in place of corresponding fragments of a target protein. By being both asymmetric (differentiating source and target) and selective (swapping discontiguous fragments), our method focuses experimental effort on a more restricted portion of sequence space, constructing hybrids that are more likely to have the properties that are the objective of the experiment. Furthermore, since the source and target need to be structurally homologous only locally (rather than overall), our method supports swapping fragments from functionally important regions of a source into a target “scaffold”; e.g., to humanize an exogenous therapeutic protein. A protein fragment swapping plan is defined by the residue position boundaries of the fragments to be swapped; it is assessed by an average potential score over the resulting hybrid library, with singleton and pairwise terms evaluating the importance and fit of the swapped residues. While we prove that it is NP-hard to choose an optimal set of fragments under such a potential score, we develop an integer programming approach, which we call SWAGMER, that works very well in practice. We demonstrate the effectiveness of our method in two types of swapping problem: selective recombination between beta-lactamases and activity swapping between glutathione transferases. We show that the selective recombination approach generates a better plan (in terms of resulting potential score) than a traditional site-directed recombination approach. We also show that in both cases the optimized experiment is significantly better than one that would result from stochastic methods.
Learning Generative Models for Protein Fold Families
We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from multiple sequence alignments either make strong, and often inappropriate, assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models), or else use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, thus guaranteeing that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method outperforms an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly outperform Hidden Markov Models in terms of predictive accuracy.
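The two kinds of statistics named in the abstract, position-specific conservation and correlated mutations, can be illustrated on a toy alignment. The sketch below uses per-column frequencies and mutual information between columns, a common descriptive measure of covariation; it is not GREMLIN's learning procedure, and the toy alignment and function names are hypothetical.

```python
import math
from collections import Counter


def column_frequencies(msa, i):
    """Relative frequency of each residue in column i of the alignment."""
    counts = Counter(seq[i] for seq in msa)
    n = len(msa)
    return {residue: count / n for residue, count in counts.items()}


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of the alignment."""
    n = len(msa)
    freq_i = column_frequencies(msa, i)
    freq_j = column_frequencies(msa, j)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / (freq_i[a] * freq_j[b]))
    return mi


# Toy alignment: columns 0 and 1 always change together; column 2 is conserved.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
print(column_frequencies(toy_msa, 2))
print(mutual_information(toy_msa, 0, 1), mutual_information(toy_msa, 0, 2))
```

Columns 0 and 1 are perfectly coupled in the toy alignment, so their mutual information is positive, while the fully conserved column 2 shares no information with any other column. A graphical model goes further than such pairwise summaries by separating direct couplings from indirect ones.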