RSEARCH: Finding homologs of single structured RNA sequences
- BMC Bioinformatics
, 2003
Cited by 83 (0 self)
Background: Many trans-acting noncoding RNA genes and cis-acting RNA regulatory elements conserve secondary structure rather than primary sequence. Most homology search tools only look at the primary sequence level, however.
Efficient Tree-Matching Methods for Accurate Carbohydrate Database Queries
- Genome Informatics
, 2003
Cited by 19 (10 self)
One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence, but a tree structure. Although there has been much work in the development of sequence databases and matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, and further, does not allow for the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized...
Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models
- J. COMP. BIOL
, 2001
Cited by 14 (1 self)
The score statistics of probabilistic gapped local alignment of random sequences is investigated both analytically and numerically. The full probabilistic algorithm (e.g., the “local” version of maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified “semi-probabilistic” alignment consisting of a hybrid of Smith–Waterman and probabilistic alignment is then proposed and studied in detail. It is predicted that the score statistics of the hybrid algorithm is of the Gumbel universal form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring systems and parameters. A simple recipe for the computation of the “relative entropy,” and from it the finite size correction to λ, is also given. These predictions compare well with direct numerical simulations for sequences of lengths between 100 and 1,000 examined using various PAM substitution scores and affine gap functions. The sensitivity of the hybrid method in the detection of sequence homology is also studied using correlated sequences generated from toy mutation models. It is found to be comparable to that of the Smith–Waterman alignment and significantly better than the Viterbi version of the probabilistic alignment.
Rapid Significance Estimation in Local Sequence Alignment with Gaps
, 2001
Cited by 13 (1 self)
To assess the significance of sequence alignments, it is crucial to know the distribution of alignment scores of pairs of random sequences. For gapped local alignment it is empirically known that the shape of this distribution is of the Gumbel form. However, determining the parameters of this distribution is computationally very expensive. We present a new algorithmic approach that estimates the more important of the Gumbel parameters at least five times faster than the traditional methods. Actual runtimes of our algorithm, ranging from less than a second to a few minutes on a workstation, bring significance estimation into the realm of interactive applications.
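As an illustration of why the Gumbel parameters matter, the following minimal sketch converts a local alignment score into a p-value using the standard Karlin/Altschul form, P(S ≥ x) = 1 − exp(−K·m·n·e^(−λx)). It is not the estimation algorithm from the paper above; the function name `gumbel_pvalue` and all numeric parameter values are assumptions chosen for illustration only.

```python
import math

def gumbel_pvalue(score, lam, K, m, n):
    """P(S >= score) for the best local alignment score of two random
    sequences of lengths m and n, assuming Gumbel-distributed scores."""
    # Expected number of random hits scoring at least `score` (the E-value).
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-E), computed with expm1 to stay accurate when E is tiny.
    return -math.expm1(-expected_hits)

# Illustrative values only (hypothetical, not taken from any paper listed here).
p = gumbel_pvalue(score=60, lam=0.27, K=0.04, m=300, n=300)
print(f"p-value: {p:.3g}")
```

Because λ sits in the exponent, a small error in its estimate changes the p-value by orders of magnitude, which is why it is described as the more important of the two parameters.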
Bootstrapping and Normalization for Enhanced Evaluations of Pairwise Sequence Comparison
- PROC. IEEE
, 2002
Statistics of local multiple alignments
- BIOINFORMATICS
, 2005
Cited by 10 (2 self)
Summary: BLAST statistics have been shown to be extremely useful for searching for significant similarity hits, for amino acid and nucleotide sequences. Although these statistics are well understood for pairwise comparisons, there has been little success developing statistical scores for multiple alignments. In particular, there is no score for multiple alignment that is well founded and treated as a standard. We extend the BLAST theory to multiple alignments. Following some simple assumptions, we present and justify a significance score for multiple segments of a local multiple alignment. We demonstrate its usefulness in distinguishing high and moderate quality multiple alignments from low quality ones, with supporting experiments on orthologous vertebrate promoter sequences.
Hybrid Alignment: High-Performance with Universal Statistics
- Bioinformatics
, 2002
Cited by 7 (1 self)
The score statistics of a recently introduced "hybrid alignment" algorithm is studied in detail numerically. An extensive survey across the 2,216 models of protein domains contained in the Pfam v5.4 database (Bateman et al. 2000. Nucleic Acids Res. 28:263-266) verifies the theoretical predictions: For the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores. The performance of the hybrid algorithm in detecting sequence homology is also studied, using protein sequences from the SCOP (Murzin et al. 1995. J. Mol. Biol. 247:536-540) and PfamA databases. The performance is found to be comparable to the best of the existing methods. Hybrid alignment is thereby established as a high performance alignment algorithm with well-characterized, universal statistics.
Local sequence alignments statistics: Deviations from Gumbel statistics in the rare-event tail. To be published, 2006. [23
, 2003
Cited by 3 (1 self)
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
A NEW GENERATION OF HOMOLOGY SEARCH TOOLS BASED ON PROBABILISTIC INFERENCE
, 2009
Cited by 3 (0 self)
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST’s programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST’s speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
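The distinction between a score summed over alignment uncertainty and the score of a single optimal alignment can be sketched on a toy example. The code below is not HMMER3 code: it assumes a hypothetical, already-enumerated list of log-probabilities, one per candidate alignment, whereas real implementations obtain both quantities by dynamic programming (the Forward and Viterbi algorithms) without enumerating alignments.

```python
import math

def viterbi_score(log_probs):
    """Log-probability of the single best alignment."""
    return max(log_probs)

def forward_score(log_probs):
    """Log of the total probability summed over all alignments,
    computed with the log-sum-exp trick to avoid underflow."""
    top = max(log_probs)
    return top + math.log(sum(math.exp(lp - top) for lp in log_probs))

# Hypothetical log-probabilities for three alternative alignments.
candidates = [math.log(0.020), math.log(0.015), math.log(0.005)]
print("best single alignment:", viterbi_score(candidates))
print("summed over alignments:", forward_score(candidates))
```

The summed score is never lower than the best single-alignment score, and the gap grows when several alignments are nearly equally plausible, which is the case where scoring only the optimal alignment discards the most evidence.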

