An expectation/maximization nuclear vector replacement algorithm for automated NMR resonance assignments
- J. Biomol. NMR
Cited by 27 (9 self)
High-throughput NMR structural biology can play an important role in structural genomics. We report an automated procedure for high-throughput NMR resonance assignment for a protein of known structure, or of a homologous structure. These assignments are a prerequisite for probing protein–protein interactions, protein–ligand binding, and dynamics by NMR. Assignments are also the starting point for structure determination and refinement. A new algorithm, called Nuclear Vector Replacement (NVR), is introduced to compute assignments that optimally correlate experimentally measured NH residual dipolar couplings (RDCs) to a given a priori whole-protein 3D structural model. The algorithm requires only uniform 15N-labeling of the protein and processes unassigned HN-15N HSQC spectra, HN-15N RDCs, and sparse HN-HN NOEs (dNNs), all of which can be acquired in a fraction of the time needed to record the traditional suite of experiments used to perform resonance assignments. NVR runs in minutes and efficiently assigns the (HN, 15N) backbone resonances as well as the dNNs of the 3D 15N-NOESY spectrum, in O(n^3) time. The algorithm is demonstrated on NMR data from a 76-residue protein, human ubiquitin, matched to four structures, including
A random graph approach to NMR sequential assignment
- In Proceedings of The International Conference on Computational Molecular Biology (RECOMB)
, 2004
Cited by 15 (5 self)
Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics and interactions in solution. A necessary first step for such applications is determining the resonance assignment, mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type information. This paper focuses on the information content available in connectivity alone and develops a novel random-graph theoretic framework and algorithm for connectivity-driven NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple and natural randomized algorithm for finding optimal assignments as sets of connected fragments in NMR graphs. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. By analyzing our algorithm under our random graph model, we show that it can provably tolerate relatively large ambiguity while still giving expected optimal performance in polynomial time. We present results from practical applications of the algorithm to experimental datasets from a variety of proteins and experimental set-ups. We demonstrate that our approach is able to overcome significant noise and local ambiguity in identifying significant fragments of sequential assignments. Key words: nuclear magnetic resonance (NMR) spectroscopy, automated sequential resonance assignment, random graph model, randomized algorithm, Hamiltonian path.
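To illustrate how chemical shift degeneracy creates connectivity ambiguity, here is a minimal sketch. It is not the paper's algorithm. It assumes each spin system is reduced to two hypothetical values (its own shift and the shift it observes for the preceding residue) and that two values "match" when they differ by at most an assumed tolerance `tol`. The names, shifts, and tolerance are all invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical spin system: (own shift, shift observed for the preceding residue), in ppm.
SpinSystem = Tuple[float, float]


def connectivity_graph(spins: Dict[str, SpinSystem], tol: float = 0.2) -> Dict[str, List[str]]:
    """Add an edge u -> v when u's own shift matches v's predecessor shift within tol."""
    graph: Dict[str, List[str]] = {u: [] for u in spins}
    for u, (own_u, _) in spins.items():
        for v, (_, prev_v) in spins.items():
            if u != v and abs(own_u - prev_v) <= tol:
                graph[u].append(v)
    return graph


def ambiguous_nodes(graph: Dict[str, List[str]]) -> List[str]:
    """Nodes with more than one candidate successor, i.e. locally ambiguous choices."""
    return sorted(u for u, successors in graph.items() if len(successors) > 1)


# Invented data: "b" and "c" report nearly identical predecessor shifts (58.1 vs 58.2),
# so "a" has two plausible successors.
spins = {
    "a": (58.1, 45.0),
    "b": (62.3, 58.1),
    "c": (55.0, 58.2),
    "d": (45.0, 62.3),
}
graph = connectivity_graph(spins)
ambiguous = ambiguous_nodes(graph)
```

In this toy graph only node "a" is ambiguous. Resolving such choices requires looking at longer paths rather than single edges, which is the kind of global consistency the abstract refers to.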
Rapid Protein Structure Detection and Assignment using Residual Dipolar Couplings
, 2002
Cited by 13 (1 self)
Motivation: High-throughput structural proteomics requires fast, robust algorithms for extracting protein structure from sparse experimental data. Current approaches are too slow. Determining the 3D structure of an unknown protein may require 6–12 months, mainly for data interpretation. Determining ligand-induced changes in structure of a previously known protein may still require weeks of effort. This second problem is of great interest to drug designers, and is our main focus in this paper. A key step is the resonance assignment problem, in which observed NMR peaks must be matched to a protein’s atoms. Contributions: This paper describes two novel procedures, together called PEPMORPH, for inferring structure and assigning resonances: (1) A method for extracting combinatorial protein substructures directly from sparse NMR experiments; (2) A method for matching experimental to known substructures by exploiting the orientational constraint of residual dipolar coupling (RDC). PEPMORPH reverses the traditional approach, in which NMR resonances are assigned prior to
3D Structural Homology Detection via Unassigned Residual Dipolar Couplings
- Proc. IEEE Computer Society Bioinformatics Conference (CSB)
, 2003
Cited by 10 (5 self)
Recognition of a protein’s fold provides valuable information about its function. While many sequence-based homology prediction methods exist, an important challenge remains: two highly dissimilar sequences can have similar folds. How can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies the 3D structural models in a protein structural database whose geometries best fit the unassigned experimental NMR data. It does not use sequence information and is thus not limited by sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or sequence homology. The algorithm runs in O(pnk^3) time, where p is the number of proteins in the database, n is the number of residues in the target protein, and k is the resolution of a rotation search. The method requires only uniform 15N-labelling of the protein and processes unassigned HN-15N residual dipolar couplings, which can be acquired in a couple of hours. Our experiments on NMR data from 5 different proteins demonstrate that the method identifies closely related protein folds, despite low sequence homology between the target protein and the computed model.
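As background for how a structural model can be compared against RDC data, the sketch below back-computes a coupling from a bond vector using the standard relation D = D_max · vᵀ S v, where v is the unit bond vector and S is a symmetric, traceless order (Saupe) matrix. It is not the procedure from the paper. The function name, the tensor values, and the placeholder scaling constant `d_max = 1.0` are assumptions chosen only for illustration.

```python
import math
from typing import Sequence


def back_computed_rdc(
    bond: Sequence[float],
    saupe: Sequence[Sequence[float]],
    d_max: float = 1.0,  # placeholder scaling constant, not a physical value
) -> float:
    """Return d_max * v^T S v for the normalized bond vector v."""
    norm = math.sqrt(sum(c * c for c in bond))
    v = [c / norm for c in bond]
    return d_max * sum(v[i] * saupe[i][j] * v[j] for i in range(3) for j in range(3))


# Invented axially symmetric, traceless order matrix in its principal frame.
SAUPE = [
    [-0.0005, 0.0, 0.0],
    [0.0, -0.0005, 0.0],
    [0.0, 0.0, 0.001],
]

along_z = back_computed_rdc((0.0, 0.0, 2.0), SAUPE)  # bond parallel to the principal axis
along_x = back_computed_rdc((1.0, 0.0, 0.0), SAUPE)  # bond perpendicular to it
```

Because the coupling depends only on the bond orientation relative to the tensor frame, a rotation search over candidate frames (the role of k in the running time above) changes the back-computed values without needing any resonance assignment.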
An Efficient and Accurate Algorithm for Assigning Nuclear Overhauser Effect Restraints Using a Rotamer Library Ensemble and Residual Dipolar Couplings
- IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS (CSB) CONFERENCE. STANFORD, CA 2005
, 2005
Cited by 10 (6 self)
Nuclear Overhauser effect (NOE) distance restraints are the main experimental data from protein nuclear magnetic resonance (NMR) spectroscopy for computing a complete three dimensional solution structure including sidechain conformations. In general, NOE restraints must be assigned before they can be used in a structure determination program. NOE assignment is very time-consuming to do manually, challenging to fully automate, and has become a key bottleneck for high-throughput NMR structure determination. The difficulty in automated NOE assignment is ambiguity: there can be tens of possible different assignments for an NOE peak based solely on its chemical shifts. Previous automated NOE assignment approaches rely on an ensemble of structures, computed from a subset of all the NOEs, to iteratively filter ambiguous assignments. These algorithms are heuristic in nature, provide no guarantees on solution quality or running time, and are slow in practice. In this paper we present an accurate, efficient NOE assignment algorithm. The algorithm first invokes the algorithm in [30, 29] to compute an accurate backbone structure using only two backbone residual dipolar couplings (RDCs) per residue. The algorithm then filters ambiguous NOE assignments by merging an ensemble of intra-residue vectors from a protein rotamer database, together with internuclear vectors from the computed backbone structure. The protein rotamer database was built from ultra-high resolution structures (<1.0 Å) in the Protein Data Bank (PDB). The algorithm has been successfully applied to assign more than 1,700 NOE distance restraints with better than 90% accuracy on the protein human ubiquitin using real experimentally-recorded NMR data. The algorithm assigns these NOE restraints in less than one second on a single-processor workstation.
Protein similarity from knot theory and geometric convolution
- J Comput Biol
, 2004
Cited by 7 (0 self)
RIBRA: An error-tolerant algorithm for the NMR backbone assignment problem
- Journal of Computational Biology
Cited by 6 (0 self)
We develop an iterative relaxation algorithm, called RIBRA, for NMR protein backbone assignment. RIBRA applies nearest neighbor and weighted maximum independent set algorithms to solve the problem. To deal with noisy NMR spectral data, RIBRA is executed in an iterative fashion based on the quality of spectral peaks. We first produce spin system pairs using the spectral data without missing peaks, then the data group with one missing peak, and finally, the data group with two missing peaks. We test RIBRA on two real NMR datasets, hbSBD and hbLBD, on perfect BMRB data (902 proteins), and on four synthetic BMRB datasets that simulate four kinds of errors. The accuracies of RIBRA on hbSBD and hbLBD are 91.4% and 83.6%, respectively. The average accuracy of RIBRA on the perfect BMRB datasets is 98.28%, and it is 98.28%, 95.61%, 98.16%, and 96.28% on the four kinds of synthetic datasets, respectively.
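The weighted maximum independent set problem mentioned above can be illustrated on a toy instance. The sketch below is an exhaustive search, exponential in the number of nodes and suitable only for tiny inputs. It is not RIBRA's algorithm. The candidate names, weights, and conflict pairs are invented, with a conflict meaning that two candidates cannot both be accepted.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def max_weight_independent_set(
    weights: Dict[str, float],
    conflicts: Iterable[Tuple[str, str]],
) -> Tuple[Set[str], float]:
    """Exhaustively find the conflict-free subset of nodes with the largest total weight."""
    nodes = sorted(weights)
    conflict_set = {frozenset(edge) for edge in conflicts}
    best: Tuple[str, ...] = ()
    best_weight = 0.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # Skip subsets that contain any conflicting pair.
            if any(frozenset(pair) in conflict_set for pair in combinations(subset, 2)):
                continue
            total = sum(weights[n] for n in subset)
            if total > best_weight:
                best, best_weight = subset, total
    return set(best), best_weight


# Invented candidates with confidence-like weights and pairwise conflicts.
weights = {"p1": 3.0, "p2": 2.0, "p3": 2.5, "p4": 1.0}
conflicts = [("p1", "p2"), ("p1", "p3"), ("p3", "p4")]
chosen, total_weight = max_weight_independent_set(weights, conflicts)
```

Here the heaviest single candidate, "p1", is not part of the optimum: choosing "p2" and "p3" together gives a larger total, which is why a greedy pick by weight is not sufficient in general.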
Reducing Mass Degeneracy in SAR by MS by Stable Isotopic Labeling
- J. Comp. Bio
, 2001
Cited by 5 (3 self)
Mass spectrometry (MS) promises to be an invaluable tool for functional genomics, by supporting low-cost, high-throughput experiments. However, large-scale MS faces the potential problem of mass degeneracy: indistinguishable masses for multiple biopolymer fragments (e.g., from a limited proteolytic digest). This paper studies the tasks of planning and interpreting MS experiments that use selective isotopic labeling, thereby substantially reducing potential mass degeneracy. Our algorithms support an experimental-computational protocol called structure-activity relation by mass spectrometry (SAR by MS) for elucidating the function of protein-DNA and protein-protein complexes. SAR by MS enzymatically cleaves a crosslinked complex and analyzes the resulting mass spectrum for mass peaks of hypothesized fragments. Depending on binding mode, some cleavage sites will be shielded; the absence of anticipated peaks implicates corresponding fragments as either part of the interaction region or inaccessible due to conformational change upon binding. Thus, different mass spectra provide evidence for different structure-activity relations. We address combinatorial and algorithmic questions in the areas of data analysis (constraining binding mode based on mass signature) and experiment planning (determining an isotopic labeling strategy to reduce mass degeneracy and aid data analysis). We explore the computational complexity of these problems, obtaining upper and lower bounds. We report experimental results from implementations of our algorithms.
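A small sketch can make mass degeneracy concrete. It groups fragments whose masses fall within an assumed instrument tolerance of their nearest neighbour, then shows how shifting one fragment's mass (as a label might) separates a group. The fragment names, masses, tolerance, and the size of the shift are all invented, and the grouping rule is a simplification rather than the paper's formulation.

```python
from typing import Dict, List


def degenerate_groups(masses: Dict[str, float], tol: float = 0.5) -> List[List[str]]:
    """Group fragments whose sorted masses differ from the previous one by at most tol."""
    ordered = sorted(masses, key=lambda name: masses[name])
    groups: List[List[str]] = []
    current: List[str] = []
    for name in ordered:
        if current and masses[name] - masses[current[-1]] > tol:
            groups.append(current)
            current = []
        current.append(name)
    if current:
        groups.append(current)
    # Only groups with two or more fragments are degenerate.
    return [g for g in groups if len(g) > 1]


# Invented fragment masses in daltons.
unlabeled = {"f1": 1000.2, "f2": 1000.5, "f3": 1500.0, "f4": 2100.7, "f5": 2100.9}
before = degenerate_groups(unlabeled)

# Assume a label that adds a hypothetical 6.0 Da to fragment f2 only.
labeled = dict(unlabeled, f2=unlabeled["f2"] + 6.0)
after = degenerate_groups(labeled)
```

In this toy case the label resolves the f1/f2 pair but leaves f4/f5 indistinguishable, which hints at why choosing a labeling strategy is itself a planning problem.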
High-throughput 3D structural homology detection via NMR resonance assignment
- in Proc. CSB
, 2004
Cited by 5 (2 self)
One goal of the structural genomics initiative is the identification of new protein folds. Sequence-based structural homology prediction methods are an important means for prioritizing unknown proteins for structure determination. However, an important challenge remains: two highly dissimilar sequences can have similar folds. How can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure, called HD, for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies 3D models in a protein structural database whose geometries best fit the unassigned experimental NMR data. HD does not use, and is thus not limited by, sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or homology modelling. The algorithm runs in O(pn + pn^(5/2) log(cn) + p log p) time, where p is the number of proteins in the database, n is the number of residues in the target protein and c is the maximum edge weight in an integer-weighted bipartite graph. Our experiments on real NMR data from 3 different proteins against a database of 4,500 representative folds demonstrate that the method identifies closely related protein folds, including sub-domains of larger proteins, with as little as 10-30% sequence homology between the target protein (or sub-domain) and the computed model. In particular, we report no false-negatives or false-positives despite significant percentages of missing experimental data.
IPASS: error tolerant NMR backbone resonance assignment by linear programming
, 2009
Cited by 5 (4 self)
The automation of the entire NMR protein structure determination process requires a superior error-tolerant backbone resonance assignment method. Although a variety of assignment approaches have been developed, none works well on noisy, automatically picked peaks. IPASS is proposed as a novel integer linear programming (ILP) based assignment method. In order to reduce the size of the problem, IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the 15N-edited NOESY peaks, which are then used to fix reliable fragments. The experimental results demonstrate that IPASS significantly outperforms the previous assignment methods on the synthetic data sets. It achieves an average of 99% precision and 96% recall on the synthesized spin systems, and an average of 96% precision and 90% recall on the synthesized peak lists. When applied to automatically picked peaks from experimentally derived data sets, it achieves an average precision and recall of 78% and 67%, respectively. In contrast, the next best method, MARS, achieved an average precision and recall of 50% and 40%, respectively. Availability: IPASS is available upon request, and the web server for IPASS is under construction.
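For readers unfamiliar with the two metrics quoted above, the following sketch shows one common way to compute them for an assignment. It assumes precision is the fraction of predicted assignments that are correct and recall is the fraction of reference assignments that were recovered. The abstract does not spell out its exact definitions, and the residue and spin system names here are invented.

```python
from typing import Dict, Tuple


def precision_recall(predicted: Dict[str, str], reference: Dict[str, str]) -> Tuple[float, float]:
    """Compare predicted residue -> spin system assignments against a reference."""
    correct = sum(1 for residue, spin in predicted.items() if reference.get(residue) == spin)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Invented example: five residues in the reference, four assigned, three of them correctly.
reference = {"R1": "s1", "R2": "s2", "R3": "s3", "R4": "s4", "R5": "s5"}
predicted = {"R1": "s1", "R2": "s2", "R3": "s4", "R5": "s5"}
precision, recall = precision_recall(predicted, reference)
```

A method that leaves uncertain residues unassigned can raise its precision at the cost of recall, so the two numbers are best read together.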

