Scalable Algorithms for String Kernels with Inexact Matching
Cited by 3 (1 self)
We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times.
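As a point of reference for the mismatch similarity measure mentioned in this abstract, here is a minimal brute-force sketch in Python of a (k, m)-mismatch kernel. It is not the linear-time algorithm the paper presents: it enumerates every k-mer over the alphabet, so it is only practical for tiny alphabets and short k. The function names and the toy sequences are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def mismatch_features(seq, k, m, alphabet):
    """Map seq to counts indexed by every k-mer beta over the alphabet.

    The count for beta is the number of k-mers in seq that lie within
    Hamming distance m of beta (brute force, exponential in k).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    features = {}
    for letters in product(alphabet, repeat=k):
        beta = "".join(letters)
        count = sum(1 for kmer in kmers if hamming(kmer, beta) <= m)
        if count:
            features[beta] = count
    return features


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(value * fy.get(beta, 0) for beta, value in fx.items())


# Toy example over a two-letter alphabet (hypothetical data).
print(mismatch_kernel("ABAB", "AB", k=2, m=0, alphabet="AB"))  # 2
print(mismatch_kernel("ABAB", "AB", k=2, m=1, alphabet="AB"))  # 8
```

With m = 0 the sketch reduces to counting exactly shared k-mers. The cost of the enumeration grows as the alphabet size raised to the power k, which is the kind of scaling in alphabet size and mismatch count that the abstract says its algorithms avoid.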
DOMAC: an accurate, hybrid protein domain prediction server
2007
Cited by 3 (1 self)
Protein domain prediction is important for protein structure prediction, structure determination, function annotation, mutagenesis analysis and protein engineering. Here we describe an accurate protein domain prediction server (DOMAC) combining both template-based and ab initio methods. The preliminary version of the server was ranked among the top domain prediction servers in the seventh edition of Critical
Conditional Graphical Models for Protein Structure Prediction
2005
Cited by 1 (0 self)
It is widely believed that the protein structures play key roles in determining the functions, activity, stability and subcellular localization of the proteins, and the mechanisms of protein-protein interactions in cells. However, it is extremely labor-expensive and sometimes even impossible to experimentally determine the structures for hundreds of thousands of protein sequences. In this thesis, we aim at designing computational methods to predict the protein structures from sequences. Since the protein structures involve many aspects, we focus on predicting the general protein structural topologies (as opposed to specific 3-D coordinates) from different levels, including secondary structures, super-secondary structures and quaternary folds for homogeneous multimers. Specifically, given a protein sequence, our goal is to predict what are the secondary structure elements, how they arrange themselves in three-dimensional space, and how multiple chains associate into complexes. Traditional approaches for protein structure prediction are sequence-based, i.e. searching the database using PSI-BLAST or matching against a hidden Markov model (HMM) profile built from sequences with similar structures. These methods work well for simple conserved structures with strong sequence similarities, but fail when the similarity across proteins is poor and/or there exist long-range interactions,
Spatially-constrained sample kernel for sequence classification
Kernel-based learning methods provide some of the most accurate results in many sequence analysis and prediction tasks [1, 2, 4, 6]. However, the improved accuracy is often achieved at the cost of high computational complexity of training and prediction. We propose a new family of string-based kernel classification methods for sequence analysis tasks that offer low computational cost and display state-of-the-art performance. We illustrate our approach on protein remote homology classification problems [2, 3, 5, 7] under supervised and semi-supervised settings. In contrast to traditional string kernels, spatially-constrained sample kernels sample the sequence features at multiple resolutions, establishing the similarity measure across different scales, with potentially highly diverse mutation/insertion/deletion processes. In particular, the kernels K(·, · | k, t, d) have the following form K(X, Y | k, t, d) =
A multi-template combination algorithm for protein comparative modeling
BMC Structural Biology (BioMed Central), Methodology article, 2008
Exploring protein structural dissimilarity to facilitate structure classification
BMC Structural Biology (BioMed Central), Research article, 2009
Natural Computing Methods in Bioinformatics: A Survey
Data analysis problems in Bioinformatics often concern the fusion of multisensor outputs or of multi-source information, where one must integrate different kinds of biological data. Natural computing provides several possibilities in Bioinformatics, especially by presenting interesting nature-inspired methodologies for handling such complex problems. In this article we survey the role of natural computing in the domains of protein structure prediction, microarray data analysis and gene regulatory network generation. We utilize the learning ability of neural networks for adaptation, the uncertainty-handling capacity of fuzzy sets and rough sets for modeling ambiguity, and the search potential of genetic algorithms for efficiently traversing large search spaces.
DescFold: A web server for protein fold recognition
BMC Bioinformatics (BioMed Central), Methodology article, 2009
© 2009 Yan et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License
PROTEINS: Structure, Function, Bioinformatics. Domain: Prediction Assessment of predictions submitted for the
This paper details the assessment process and evaluation results for the Critical Assessment of Protein Structure Prediction (CASP7) domain prediction category. Domain predictions were assessed using the Normalized Domain Overlap score introduced in CASP6 and the accuracy of prediction of domain break points. The results of the analysis clearly demonstrate that the best methods are able to make consistently reliable predictions when the target has a structural template, although they are less good when the domain break occurs in a region not covered by a template. The conditions of the experiment meant that it was impossible to draw any conclusions about domain prediction for free modeling targets and it was also difficult to draw many distinctions between the best groups. Two thirds of the targets submitted were single domains and hence regarded as easy to predict. Even those targets defined as having multiple domains always had at least one domain with a similar template structure.

