Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions
- J. MOL. BIOL
, 1997
Cited by 190 (62 self)
We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment and residue pair specific contributions to the scoring functions appear as the first two terms in a series expansion for the residue probability distributions in the protein database; the decoupling of the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated annealing procedure rapidly and frequently generates native-like structures for small helical proteins and better-than-random structures for small β-sheet-containing proteins. Most of the simulated structures have native-like solvent accessibility and secondary structure patterns, and thus ensembles of these structures provide a particularly challenging set of decoys for evaluating scoring functions. We investigate the effects of multiple sequence information and different types of conformational constraints on the overall performance of the method, and the ability of a variety of recently developed scoring functions to recognize the native-like conformations in the ensembles of simulated structures.
Approaches to the Automatic Discovery of Patterns in Biosequences
, 1995
Cited by 125 (21 self)
This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which are the most frequently used in molecular bioinformatics. A formulation is given of the problem of the automatic discovery of such patterns from a set of sequences, and an analysis presented of the ways in which an assessment can be made of the significance and usefulness of the discovered patterns. It is shown that this problem is related to problems studied in the field of machine learning. The largest part of this paper comprises a review of a number of existing methods developed to solve this problem and how these relate to each other, focusing on the algorithms underlying the approaches. A comparison is given of the algorithms, and examples are given of patterns that have been discovered...
GenTHREADER: An Efficient and Reliable Protein Fold Recognition Method for Genomic Sequences
- J. Mol. Biol
, 1999
Cited by 118 (8 self)
Ouzounis et al., 1993; Abagyan et al., 1994; Nishikawa & Matsuo, 1994; Flöckner et al., 1995; Lathrop & Smith, 1996; Madej et al., 1995; Fischer & Eisenberg, 1996; Defay & Cohen, 1996; Russell et al., 1996). Blind testing has shown that fold recognition methods can be very effective (Shortle, 1997), and so it is surprising that they are not being more widely applied to genome analysis. Three problems with fold recognition methods probably contribute to their lack of use: their slowness, the requirement for human intervention to interpret the results and the inaccuracy of sequence-structure alignments produced. Different methods suffer from each of these problems to differing degrees. Of the three problems, the lack of automation in the fold recognition process is perhaps the biggest problem in the application of threading methods to genomic sequence analysis. Whilst it is reasonable to require some human intervention when predicting the structure of just a few sequences, this is clearl
Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology
, 1996
Cited by 105 (20 self)
This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, such that remotely related family members can be more reliably recognized by the model. Dirichlet mixtures have been shown to outperform substitution matrices and other methods for computing these expected amino acid distributions in database search, resulting in fewer false positives and false negatives for the families tested. This paper corrects a previously p...
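The combination of a Dirichlet mixture with observed amino acid counts described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the three-letter toy alphabet, and the two-component mixture are all assumptions made for the example. Each component's posterior weight is taken to be proportional to its mixture coefficient times the ratio of multivariate Beta functions B(n + alpha) / B(alpha), and the estimate is the posterior-weighted average of the per-component pseudocount estimates.

```python
import math


def log_beta(vec):
    """Log of the multivariate Beta function of a vector of positive reals."""
    return sum(math.lgamma(v) for v in vec) - math.lgamma(sum(vec))


def dirichlet_mixture_estimate(counts, mix_weights, alphas):
    """Estimate residue probabilities for one column (hypothetical helper).

    counts      -- observed residue counts for the column
    mix_weights -- mixture coefficients, one per component
    alphas      -- one Dirichlet parameter vector per component
    """
    # Log posterior weight of each component given the counts.
    # The multinomial coefficient is shared by all components and cancels.
    log_post = []
    for q, alpha in zip(mix_weights, alphas):
        combined = [n + a for n, a in zip(counts, alpha)]
        log_post.append(math.log(q) + log_beta(combined) - log_beta(alpha))

    # Normalize in a numerically stable way.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    post = [u / total for u in unnorm]

    # Posterior-weighted average of the per-component pseudocount estimates.
    n_total = sum(counts)
    probs = []
    for i, n_i in enumerate(counts):
        p = sum(w * (n_i + alpha[i]) / (n_total + sum(alpha))
                for w, alpha in zip(post, alphas))
        probs.append(p)
    return probs


if __name__ == "__main__":
    # Toy 3-letter alphabet; both components are invented for illustration.
    toy_alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
    toy_weights = [0.5, 0.5]
    print(dirichlet_mixture_estimate([5, 0, 0], toy_weights, toy_alphas))
```

With no observations the estimate reduces to the mixture's prior mean; as counts accumulate, the component that best explains them dominates and the estimate moves toward the observed frequencies.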
PROBCONS: Probabilistic consistency-based multiple sequence alignment
- Genome Res
, 2005
Cited by 84 (5 self)
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROBCONS achieves statistically significant improvement over other leading methods while maintaining practical speed. PROBCONS is publicly available as a web resource. Source code and executables are available under the GNU Public License at
RSEARCH: Finding homologs of single structured RNA sequences
- BMC Bioinformatics
, 2003
Cited by 83 (0 self)
Background: Many trans-acting noncoding RNA genes and cis-acting RNA regulatory elements conserve secondary structure rather than primary sequence. Most homology search tools only look at the primary sequence level, however.
Hmmstr: a hidden markov model for local sequence-structure correlations in proteins
- Journal of Molecular Biology
, 2000
Gibbs motif sampling: detection of bacterial outer membrane protein repeats
- Protein Science
, 1995
Cited by 76 (10 self)
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric β-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning β-strands. These β-strands occur on the membrane interface (as opposed to the trimeric interface) of the β-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles.
Position-based sequence weights
- J. Mol. Biol
, 1994
Cited by 73 (3 self)
Sequence weighting methods have been used to reduce redundancy and emphasize diversity in multiple sequence alignment and searching applications. Each of these methods is based on a notion of distance between a sequence and an ancestral or generalized sequence. We describe a different approach, which bases weights on the diversity observed at each position in the alignment, rather than on a sequence distance measure. These position-based weights make minimal assumptions, are simple to compute, and perform well in comprehensive evaluations. Redundancy is a common feature of sequence databanks, where a typical gene or protein family is represented by a highly non-random sample of sequences. For example, an ancient protein family might be represented by a few highly diverged microbial and invertebrate sequences plus many mammalian sequences that form a closely related subgroup. This situation can be detrimental in sequence alignment and searching applications, where it is usually desirable to represent the diversity among related sequences. Since closely related sequences are largely redundant, they provide less information in a multiple sequence alignment than their distant cousins. Sequence weighting methods have been introduced to compensate for over-representation
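The idea of basing weights on per-position diversity can be sketched in a few lines. The abstract does not state the formula, so the rule used here is an assumption: in each column, a sequence earns 1 / (k * n), where k is the number of distinct residue types in that column and n is the number of sequences sharing this sequence's residue there. The function name is hypothetical, gap characters are treated as just another residue type, and the final weights are normalized to sum to 1.

```python
from collections import Counter


def position_based_weights(alignment):
    """Per-sequence weights from column diversity (illustrative sketch).

    alignment -- list of equal-length aligned sequences (strings)
    """
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(n_cols):
        column = [seq[col] for seq in alignment]
        freq = Counter(column)   # occurrences of each residue type
        k = len(freq)            # number of distinct residue types
        for idx, residue in enumerate(column):
            # Rare residues in diverse columns contribute the most.
            weights[idx] += 1.0 / (k * freq[residue])

    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two identical sequences share weight; the divergent one gets more.
    print(position_based_weights(["AAA", "AAA", "CCC"]))
```

Under this rule every column distributes the same total amount of weight, so closely related sequences split the credit for the residues they share while a divergent sequence keeps its share to itself.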
Structure Comparison and Structure Patterns
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 1999
Cited by 69 (2 self)
This article investigates different aspects of pairwise and multiple structure comparison, and the problem of automatically discovering common patterns in a set of structures. Descriptions and representations of structures and patterns are investigated, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature are developed, and many methods are reviewed and placed into this framework.

