Results 1 - 10 of 73
RSEARCH: Finding homologs of single structured RNA sequences
BMC Bioinformatics, 2003
Cited by 131 (2 self)
Abstract
Background: Many trans-acting noncoding RNA genes and cis-acting RNA regulatory elements conserve secondary structure rather than primary sequence. Most homology search tools only look at the primary sequence level, however.
The CATH database: an extended protein family resource for structural and functional genomics
Nucleic Acids Res., 2003
"... The CATH database of protein domain structures ..."
A New Generation of Homology Search Tools Based on Probabilistic Inference
2009
Cited by 36 (2 self)
Abstract
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST’s programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST’s speed while further improving the power of probabilistic-inference-based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
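The abstract contrasts scores "summed over alignment uncertainty" with scoring "a single optimal alignment". A minimal sketch of that distinction, assuming a hypothetical two-state toy HMM rather than HMMER's profile architecture: the Forward recursion sums path probabilities (log-sum-exp), Viterbi keeps only the best path (max), and subtracting an i.i.d. uniform null model turns each into a log-odds score. HMMER3's acceleration and E-value machinery are not reproduced here.

```python
import math

def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_and_viterbi(obs, start, trans, emit):
    """Return (Forward log P(obs), Viterbi log P(best path, obs)).

    start[s], trans[s][t] and emit[s][symbol] are plain probabilities.
    Forward sums over every state path; Viterbi keeps only the best one.
    """
    states = list(start)
    fwd = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    vit = dict(fwd)
    for symbol in obs[1:]:
        fwd = {t: logsumexp([fwd[s] + log(trans[s][t]) for s in states])
                  + log(emit[t][symbol]) for t in states}
        vit = {t: max(vit[s] + log(trans[s][t]) for s in states)
                  + log(emit[t][symbol]) for t in states}
    return logsumexp(list(fwd.values())), max(vit.values())

# Hypothetical toy model: a "homolog-like" state H and a background-like state R.
start = {"H": 0.5, "R": 0.5}
trans = {"H": {"H": 0.9, "R": 0.1}, "R": {"H": 0.1, "R": 0.9}}
emit = {"H": {"A": 0.8, "C": 0.2}, "R": {"A": 0.5, "C": 0.5}}

obs = "AACA"
fwd, vit = forward_and_viterbi(obs, start, trans, emit)
null = len(obs) * math.log(0.5)  # i.i.d. uniform null model over {A, C}
print(f"Forward log-odds: {fwd - null:.3f} nats")
print(f"Viterbi log-odds: {vit - null:.3f} nats")
```

Because the Forward total includes the Viterbi path's probability plus every other path's, the summed score can never fall below the single-best-path score.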
Approximate p-values for local sequence alignments
Ann. Statist., 2000
Cited by 30 (1 self)
Abstract
Siegmund and Yakir (2000) have given an approximate p-value when two independent, identically distributed sequences from a finite alphabet are optimally aligned based on a scoring system that rewards similarities according to a general scoring matrix and penalizes gaps (insertions and deletions). The approximation involves an infinite sequence of difficult-to-compute parameters. In this paper, it is shown by numerical studies that these reduce to essentially two numerically distinct parameters, which can be computed as one-dimensional numerical integrals. For an arbitrary scoring matrix and affine gap penalty, this modified approximation is easily evaluated. Comparison with published numerical results shows that it is reasonably accurate. Key words: local alignment, affine gap penalty, p-value, Markov renewal theory.
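For comparison, the widely used Karlin-Altschul (Gumbel) form of a local-alignment p-value can be sketched as follows. This is not the Siegmund-Yakir approximation or its two-parameter reduction: it simply assumes the distribution parameters λ and K are already known (the values in the usage line are placeholders, not taken from the paper) and converts a score into an E-value and a p-value.

```python
import math

def local_alignment_pvalue(score, m, n, lam, K):
    """Approximate P(S >= score) for the best local alignment score S of two
    random sequences of lengths m and n, in Karlin-Altschul (Gumbel) form.

    lam and K are the distribution parameters, assumed known here.
    Returns (E-value, p-value).
    """
    evalue = K * m * n * math.exp(-lam * score)  # expected number of chance hits
    pvalue = -math.expm1(-evalue)                # 1 - exp(-E), accurate when E is tiny
    return evalue, pvalue

# Placeholder parameters for illustration only.
e, p = local_alignment_pvalue(score=60, m=300, n=300, lam=0.27, K=0.04)
print(f"E = {e:.3g}, p = {p:.3g}")
```

For small E the p-value and E-value are nearly equal; they diverge only when chance hits become likely.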
Efficient Tree-Matching Methods for Accurate Carbohydrate Database Queries
Genome Informatics, 2003
Cited by 25 (13 self)
Abstract
One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence, but a tree structure. Although there has been much work in the development of sequence databases and matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, and further, does not allow for the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized...
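The abstract breaks off before describing the authors' method, so the following is only a generic illustration of why glycan queries call for tree matching rather than sequence alignment. It assumes a hypothetical `Node` type with monosaccharide labels and scores two glycans by the size of their largest top-down common subtree, treating sibling branches as unordered and trying every child pairing by brute force; that is only reasonable for the small branching factors assumed here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import permutations

@dataclass
class Node:
    """A glycan residue; children are the residues attached to it (hypothetical)."""
    label: str
    children: list[Node] = field(default_factory=list)

def match_score(a: Node, b: Node) -> int:
    """Number of nodes in the largest common subtree rooted at both a and b.

    Labels must agree at every matched node; children may be paired in any
    order, and every pairing is tried by brute force.
    """
    if a.label != b.label:
        return 0
    small, large = sorted((a.children, b.children), key=len)
    best = 0
    for chosen in permutations(large, len(small)):
        best = max(best, sum(match_score(x, y) for x, y in zip(small, chosen)))
    return 1 + best

# Two small hypothetical glycans: branch order differs and g1 has one extra Man.
g1 = Node("GlcNAc", [Node("GlcNAc", [Node("Man", [Node("Man"), Node("Man")])]),
                     Node("Fuc")])
g2 = Node("GlcNAc", [Node("Fuc"), Node("GlcNAc", [Node("Man", [Node("Man")])])])
print(match_score(g1, g2))  # 5: every residue of g2 is matched
```

A sequence aligner would have to linearize both trees first, and different linearizations of the same glycan would then score differently; matching on the tree avoids that.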
Rapid Significance Estimation in Local Sequence Alignment with Gaps
2001
Cited by 21 (1 self)
Abstract
In order to assess the significance of sequence alignments, it is crucial to know the distribution of alignment scores of pairs of random sequences. For gapped local alignment, it is empirically known that this distribution has the Gumbel form. However, determining the parameters of this distribution is computationally very expensive. We present a new algorithmic approach that estimates the more important of the Gumbel parameters at least five times faster than the traditional methods. With runtimes ranging from less than a second to a few minutes on a workstation, our algorithm brings significance estimation into the realm of interactive applications.
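The traditional methods the abstract compares against estimate the Gumbel parameters by simulating many random sequence pairs and fitting the resulting scores. A minimal sketch of the fitting step, assuming the method of moments (one common choice; the paper's faster algorithm is not shown) and substituting synthetic Gumbel draws for real alignment scores:

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(scores):
    """Method-of-moments estimates (mu, lam) for a Gumbel distribution with
    CDF F(x) = exp(-exp(-lam * (x - mu))).

    Uses mean = mu + gamma / lam and variance = pi^2 / (6 * lam^2).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6.0))
    mu = mean - EULER_GAMMA / lam
    return mu, lam

def sample_gumbel(mu, lam, n, rng):
    """Stand-in for simulated alignment scores: n Gumbel draws via the inverse CDF."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # avoid log(0)
            draws.append(mu - math.log(-math.log(u)) / lam)
    return draws

rng = random.Random(0)
scores = sample_gumbel(mu=40.0, lam=0.3, n=20000, rng=rng)
mu_hat, lam_hat = fit_gumbel_moments(scores)
print(f"mu ~ {mu_hat:.2f}, lambda ~ {lam_hat:.3f}")
```

The expensive part in practice is not this fit but producing thousands of optimal gapped alignment scores to feed it, which is the cost the paper targets.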
Bootstrapping and Normalization for Enhanced Evaluations of Pairwise Sequence Comparison
Proc. IEEE, 2002
Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models
J. Comp. Biol., 2001
Cited by 18 (3 self)
Abstract
The score statistics of probabilistic gapped local alignment of random sequences are investigated both analytically and numerically. The full probabilistic algorithm (e.g., the “local” version of the maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified “semi-probabilistic” alignment, a hybrid of Smith–Waterman and probabilistic alignment, is then proposed and studied in detail. The score statistics of the hybrid algorithm are predicted to follow the universal Gumbel form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring systems and parameters. A simple recipe for computing the “relative entropy”, and from it the finite-size correction to λ, is also given. These predictions compare well with direct numerical simulations for sequences of lengths between 100 and 1,000, examined using various PAM substitution scores and affine gap functions. The sensitivity of the hybrid method in detecting sequence homology is also studied using correlated sequences generated from toy mutation models. It is found to be comparable to that of Smith–Waterman alignment and significantly better than that of the Viterbi version of probabilistic alignment.
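The "relative entropy" above has a standard ungapped counterpart that shows where λ and H come from. A minimal sketch, assuming ungapped scoring with residue frequencies p and a score matrix s (this is not the paper's hybrid-alignment recipe or its finite-size correction): λ is the positive root of Σ p_i p_j exp(λ s_ij) = 1, found here by bisection, and H = λ Σ q_ij s_ij with target frequencies q_ij = p_i p_j exp(λ s_ij).

```python
import math

def karlin_lambda(scores, freqs, tol=1e-12):
    """Positive root lam of sum_ij p_i p_j exp(lam * s_ij) = 1 (ungapped case).

    Requires a negative expected score and at least one positive score.
    """
    k = range(len(freqs))
    expected = sum(freqs[i] * freqs[j] * scores[i][j] for i in k for j in k)
    if expected >= 0 or max(max(row) for row in scores) <= 0:
        raise ValueError("need negative expected score and some positive score")

    def f(lam):
        return sum(freqs[i] * freqs[j] * math.exp(lam * scores[i][j])
                   for i in k for j in k) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:        # expand until the positive root is bracketed
        hi *= 2.0
    while hi - lo > tol:     # f <= 0 on (0, root], f > 0 beyond it
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def relative_entropy(scores, freqs, lam):
    """H = sum_ij q_ij * lam * s_ij in nats, with q_ij = p_i p_j exp(lam * s_ij)."""
    k = range(len(freqs))
    return sum(freqs[i] * freqs[j] * math.exp(lam * scores[i][j]) * lam * scores[i][j]
               for i in k for j in k)

# Toy DNA scoring: +1 match, -1 mismatch, uniform base frequencies.
dna = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
p = [0.25] * 4
lam = karlin_lambda(dna, p)
print(f"lambda = {lam:.4f}, H = {relative_entropy(dna, p, lam):.4f} nats")
```

For this +1/-1 uniform example, λ works out analytically to ln 3 and H to (ln 3)/2, which makes a convenient check.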
Statistics of local multiple alignments
Bioinformatics, 2005
Cited by 14 (2 self)
Abstract
Summary: BLAST statistics have proved extremely useful for finding significant similarity hits among amino acid and nucleotide sequences. Although these statistics are well understood for pairwise comparisons, there has been little success in developing statistical scores for multiple alignments. In particular, there is no well-founded score for multiple alignment that is treated as a standard. We extend the BLAST theory to multiple alignments. Under some simple assumptions, we present and justify a significance score for multiple segments of a local multiple alignment. We demonstrate its usefulness in distinguishing high- and moderate-quality multiple alignments from low-quality ones, with supporting experiments on orthologous vertebrate promoter sequences.
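For reference, the pairwise BLAST statistics the paper sets out to extend reduce to two conversions: a raw score S becomes a bit score S' = (λS - ln K)/ln 2, and the E-value is m n 2^(-S'). A minimal sketch, with λ and K treated as known inputs (the values below are placeholders); the paper's multiple-alignment significance score itself is not reproduced:

```python
import math

def bit_score(raw_score, lam, K):
    """Normalized score in bits: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2.0)

def evalue_from_bits(bits, m, n):
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)

# Placeholder parameters and sequence lengths, for illustration only.
lam, K = 0.27, 0.04
s_bits = bit_score(raw_score=60, lam=lam, K=K)
print(f"{s_bits:.1f} bits, E = {evalue_from_bits(s_bits, m=300, n=10_000):.3g}")
```

Because 2^(-S') = K exp(-λS), this agrees with the raw-score formula E = K m n exp(-λS); the bit score simply factors λ and K out of comparisons between scoring systems.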
Hybrid Alignment: High-Performance with Universal Statistics
Bioinformatics, 2002
Cited by 11 (3 self)
Abstract
The score statistics of a recently introduced "hybrid alignment" algorithm are studied in detail numerically. An extensive survey across the 2,216 models of protein domains contained in the Pfam v5.4 database (Bateman et al. 2000. Nucleic Acids Res. 28:263-266) verifies the theoretical predictions: for the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter λ taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores. The performance of the hybrid algorithm in detecting sequence homology is also studied, using protein sequences from the SCOP (Murzin et al. 1995. J. Mol. Biol. 247:536-540) and Pfam-A databases. The performance is found to be comparable to the best of the existing methods. Hybrid alignment is thereby established as a high-performance alignment algorithm with well-characterized, universal statistics.
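One way to see why a universal λ matters: once the scale parameter is fixed in advance (λ = 1, taken here as an assumption about the abstract's score units), the remaining Gumbel location μ has a closed-form maximum-likelihood estimate and p-values follow directly. A minimal sketch under that assumption; the function names are hypothetical and this is not the paper's procedure:

```python
import math

def gumbel_location_mle(scores, lam=1.0):
    """Closed-form MLE of the Gumbel location mu when lam is known:
    mu = (ln n - ln sum_i exp(-lam * x_i)) / lam, via a stable log-sum-exp.
    """
    xs = [-lam * x for x in scores]
    top = max(xs)
    log_sum = top + math.log(sum(math.exp(v - top) for v in xs))
    return (math.log(len(scores)) - log_sum) / lam

def gumbel_pvalue(score, mu, lam=1.0):
    """P(S >= score) when S ~ Gumbel with CDF exp(-exp(-lam * (x - mu)))."""
    return -math.expm1(-math.exp(-lam * (score - mu)))

# Stand-in null scores: evenly spaced quantiles of a Gumbel with mu = 20, lam = 1.
n = 1000
null_scores = [20.0 - math.log(-math.log((i + 0.5) / n)) for i in range(n)]
mu = gumbel_location_mle(null_scores)
print(f"mu = {mu:.2f}, p(score >= 25) = {gumbel_pvalue(25.0, mu):.2e}")
```

With λ known, no iterative fitting of the scale is needed; only a single location parameter remains.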