The HHpred interactive server for protein homology detection and structure prediction
- Nucleic Acids Res, 2005
doi:10.1093/nar/gki408
The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics
- Nucleic Acids Res 37 (Database issue), 2009
Cited by 124 (0 self)
The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates. As of September 2008, the database describes the present knowledge on 113 glycoside hydrolase, 91 glycosyltransferase, 19 polysaccharide lyase, 15 carbohydrate esterase and 52 carbohydrate-binding module families. These families are created based on experimentally characterized proteins and are populated with sequences from public databases that show significant similarity. Protein biochemical information is continuously curated based on the available literature and structural information. Over 6400 proteins have assigned EC numbers and 700 proteins have a PDB structure. The classification (i) reflects the structural features of these enzymes better than substrate specificity alone, (ii) helps to reveal the evolutionary relationships between these enzymes and (iii) provides a convenient framework for understanding mechanistic properties. This resource has been available to the scientific community for over 10 years, contributing to information dissemination and providing a transversal nomenclature to glycobiologists. More recently, this resource has been used to improve the quality of functional predictions in a number of genome projects by providing expert annotation. The CAZy resource resides at
PDB2PQR: an automated pipeline for the setup of Poisson–Boltzmann electrostatics calculations
- Nucleic Acids Res, 2004
Cited by 95 (3 self)
Continuum solvation models, such as Poisson–Boltzmann and Generalized Born methods, have become increasingly popular tools for investigating the influence of electrostatics on biomolecular structure, energetics and dynamics. However, the use of such methods requires accurate and complete structural data as well as force field parameters such as atomic charges and radii. Unfortunately, the limiting step in continuum electrostatics calculations is often the addition of missing atomic coordinates to molecular structures from the Protein Data Bank and the assignment of parameters to biomolecular structures. To address this problem, we have developed the PDB2PQR web service
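PDB2PQR's output is a PQR file, a PDB-like format in which each ATOM/HETATM record carries an atomic charge and radius in place of the occupancy and temperature-factor columns. As a minimal sketch, assuming whitespace-delimited records whose last five fields are x, y, z, charge and radius (the chain ID may be absent), the code below reads such records and sums the net charge. `PqrAtom`, `parse_pqr_line` and `net_charge` are hypothetical names, not part of PDB2PQR.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PqrAtom:
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    charge: float  # partial charge assigned by the force field
    radius: float  # atomic radius assigned by the force field


def parse_pqr_line(line: str) -> Optional[PqrAtom]:
    """Parse one whitespace-delimited PQR record; return None for non-atom lines."""
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        return None
    # Count from the end so an absent chain ID does not shift the numeric fields.
    x, y, z, charge, radius = map(float, fields[-5:])
    res_seq = int(fields[-6])
    return PqrAtom(fields[2], fields[3], res_seq, x, y, z, charge, radius)


def net_charge(lines: Iterable[str]) -> float:
    """Sum the charges of all atom records (hypothetical helper)."""
    atoms = (parse_pqr_line(line) for line in lines)
    return sum(a.charge for a in atoms if a is not None)
```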
PISCES: recent improvements to a PDB sequence culling server
- Bioinformatics, 2003
Blind men and elephants: what do citation summaries tell us about a research article?
- Journal of the American Society for Information Science and Technology, 2008
Cited by 33 (9 self)
The old Asian legend about the blind men and the elephant comes to mind when looking at how different authors of scientific papers describe a piece of related prior work. It turns out that different citations to the same paper often focus on different aspects of that paper and that none of them alone describes the paper's full set of contributions. In this paper we describe our investigation of this phenomenon. We studied citation summaries in the context of research papers in the biomedical domain. A citation summary is the set of citing sentences for a given article and can be used as a surrogate for the actual article in a variety of scenarios. It contains information that peers deemed important. Our study shows that citation summaries overlap to some extent with the abstracts of the papers, but also differ from them in that they focus on different aspects of these papers than the abstracts do. In addition, co-cited articles (pairs of articles cited together by another article) tend to be similar. We show results based on a lexical similarity metric called cohesion to justify our claims.
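The abstract does not define cohesion. One common reading, assumed here, is the average pairwise cosine similarity among a set of texts (for example, the sentences of a citation summary). The sketch below uses raw term-frequency vectors with simple whitespace tokenization; the function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion(sentences: list[str]) -> float:
    """Assumed definition: mean cosine similarity over all sentence pairs."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

Under this reading, a set of near-identical citing sentences scores close to 1, while citations that each describe a different aspect of the paper pull the score toward 0.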
Type-safe Computation with Heterogeneous Data
- 2007
Cited by 2 (2 self)
Computation with large-scale heterogeneous data typically requires universal traversal to search for all occurrences of a substructure that matches a possibly complex search pattern, whose context may differ from place to place within the data. Both aspects cause difficulty for existing general-purpose programming languages, because these languages are designed for homogeneous data and have problems typing both the different substructures in heterogeneous data and the complex patterns to be matched against them. Programmers must either hard-code the structures and search patterns, which prevents programs from being reusable and scalable, or resort to low-level untyped programming or special-purpose query languages, which opens the door to type mismatches and carries a high risk of correctness and security problems. This thesis introduces the concept of pattern structures and proposes a general solution to the above problems: a programming technique using pattern structures. In this solution, well-typed pattern structures are defined to represent complex search patterns, and pattern searching over heterogeneous data is programmed with pattern parameters, in a statically typed language that supports first-class typing of structures and patterns. The resulting programs are statically typed, highly reusable across different data structures and different patterns, and highly scalable with respect to the complexity of data structures and patterns. Adding new kinds of patterns for an application no longer requires changing the language in use or creating a new one; it is only a programming task. The thesis demonstrates the application of this approach, and its advantages, in two important examples of computation with heterogeneous data, namely XML data processing and Java bytecode analysis.
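Python has no static, first-class typing of patterns, so the following is only a loose dynamic analogue of the idea, not the thesis's technique. A hypothetical `Pattern` value (a type plus an optional predicate) is passed as a parameter to one universal traversal, `find_all`, so the same traversal serves different data shapes and different patterns without being rewritten.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Pattern(Generic[T]):
    """Hypothetical pattern: match values of type `kind` that also satisfy `where`."""
    kind: type
    where: Callable[[Any], bool] = lambda _: True

    def matches(self, value: Any) -> bool:
        return isinstance(value, self.kind) and self.where(value)


def find_all(data: Any, pattern: Pattern[T]) -> Iterator[T]:
    """Universal traversal: yield every substructure of `data` that matches `pattern`."""
    if pattern.matches(data):
        yield data
    # Descend into container values; strings and other scalars are leaves.
    if isinstance(data, dict):
        children = data.values()
    elif isinstance(data, (list, tuple, set, frozenset)):
        children = data
    else:
        children = ()
    for child in children:
        yield from find_all(child, pattern)
```

For example, `find_all({"a": [1, {"b": 2}], "d": (3, 4)}, Pattern(int, lambda v: v > 1))` yields 2, 3 and 4; only the pattern argument changes when the search target changes. Here type errors surface at run time, which is exactly the weakness the thesis addresses with static typing.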
Real-time Triangulation of Molecular Surfaces
Cited by 2 (0 self)
A protein consists of a set of atoms. Given a protein, its molecular surface is defined with respect to a probe approximating a solvent molecule. This paper presents an algorithm, efficient enough to run in real time, that triangulates the blending surfaces, which are the most critical subset of a molecular surface. For quick evaluation of points on the surface, the proposed algorithm uses masks similar in concept to those used in subdivision surfaces. More fundamentally, the proposed algorithm takes advantage of the concise representation of the topology among atoms stored in the β-shape, which is already used to compute the blending surface itself. Given the blending surfaces and the corresponding β-shape, the proposed algorithm triangulates the blending surfaces in O(c · m) time in the worst case, where m is the number of boundary atoms in the protein and c is the number of point evaluations on a patch of the blending surface.
Discovering novel interacting motif pairs from large protein–protein interaction datasets
- In Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04), 2004
Cited by 2 (0 self)
Current motif discovery methods can only detect individual motifs in groups of protein sequences; they do not discover the potentially interacting motif pairs underlying the interactions between the proteins. Such interacting motif pairs can be useful for the design and discovery of new drugs. Recent technological advances have made available large datasets of experimentally detected protein-protein interactions. The functionally induced co-occurring patterns inherent in the pairwise protein interaction data can be exploited to discover novel interacting motif pairs. In this work, we present an automated method to discover novel interacting motif pairs from large datasets of protein-protein interactions. Using our method, we discovered 9,045 novel interacting motif pairs from a large dataset of 78,390 interacting yeast proteins. Our method was able to discover motif pairs that are highly deterministic of protein interaction, with many of the motifs corresponding to structural contact sites in protein complexes or to experimentally determined binding sites reported in the literature.
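As a deliberately simplified illustration of mining co-occurring patterns from pairwise interaction data (not the authors' method, which the abstract does not detail), assume motifs are fixed-length substrings (k-mers) and score a candidate motif pair by the number of interactions in which one motif occurs in one partner and the other motif in the other partner. `k` and `min_support` are hypothetical parameters.

```python
from collections import Counter
from itertools import product
from typing import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cooccurring_motif_pairs(
    interactions: Iterable[tuple[str, str]], k: int = 3, min_support: int = 2
) -> dict[tuple[str, str], int]:
    """Count, per unordered k-mer pair, the interactions it spans; keep frequent ones."""
    counts: Counter = Counter()
    for a, b in interactions:
        # Collect pairs per interaction first so each interaction counts a pair once,
        # even when (x, y) and (y, x) both occur across the two partners.
        pairs = {tuple(sorted(p)) for p in product(kmers(a, k), kmers(b, k))}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Raw co-occurrence counts favor frequent, uninformative k-mers, so a practical method would need a statistical significance test and some notion of motif degeneracy on top of this counting step.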
Duplicate Detection in Biological Data using Association Rule
- Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics
Recent advances in biotechnology have produced a massive amount of raw biological data, which is accumulating at an exponential rate. Errors, redundancy and discrepancies are prevalent in the raw data, and there is a serious need for systematic approaches to biological data cleaning. This work examines the extent of redundancy in biological data and proposes a method for detecting duplicates in it. Duplicate relations in a real-world biological dataset are modeled as association rules, so that these duplicate relations, or rules, can be induced by association rule mining from data with known duplicates. Our approach of using association rule induction to find duplicate relations is new. Evaluation of our method on a real-world dataset shows that our duplicate association rules can accurately identify up to 96.8% of the duplicates in the dataset, with 0.3% false positives and 0.0038% false negatives.
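A minimal sketch of how a candidate duplicate rule could be scored, assuming each record pair is described by a set of field-agreement features and labeled as duplicate or not. It uses support and confidence, the standard association-rule measures; the feature names and helper names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPair:
    features: frozenset[str]  # hypothetical field-agreement features, e.g. "same_seq"
    is_duplicate: bool        # known label from the training data


def rule_stats(antecedent: frozenset[str], pairs: list[LabeledPair]) -> tuple[float, float]:
    """Support and confidence of the rule `antecedent => duplicate`."""
    covered = [p for p in pairs if antecedent <= p.features]  # pairs the rule fires on
    hits = sum(p.is_duplicate for p in covered)               # of those, true duplicates
    support = hits / len(pairs) if pairs else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence
```

For example, a rule that fires on three labeled pairs out of four, two of which are duplicates, has support 2/4 and confidence 2/3. Rules that clear chosen support and confidence thresholds would then be kept as duplicate detectors.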