RSEARCH: Finding homologs of single structured RNA sequences
BMC Bioinformatics, 2003
Cited by 121 (1 self)
Background: Many trans-acting noncoding RNA genes and cis-acting RNA regulatory elements conserve secondary structure rather than primary sequence. Most homology search tools only look at the primary sequence level, however.
The CATH database: an extended protein family resource for structural and functional genomics
Nucleic Acids Res, 2003
"... The CATH database of protein domain structures ..."
A NEW GENERATION OF HOMOLOGY SEARCH TOOLS BASED ON PROBABILISTIC INFERENCE
2009
Cited by 33 (2 self)
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST’s programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST’s speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
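The distinction the abstract draws between a score "summed over alignment uncertainty" and the score of "a single optimal alignment" can be illustrated on a toy hidden Markov model. The sketch below is not HMMER3's profile model or its implementation: the two-state model, its probabilities, the uniform null model, and the names hmm_log_score, start, trans, and emit are all assumptions made for illustration. The same dynamic-programming recursion gives a Viterbi-style score when paths are combined with max and a Forward-style score when they are combined with log-sum-exp.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def hmm_log_score(obs, start, trans, emit, combine):
    """Log score of obs under a generic HMM.

    combine=max scores only the single best state path (Viterbi);
    combine=logsumexp sums over all state paths (Forward).
    """
    states = list(start)
    prev = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    for sym in obs[1:]:
        prev = {
            s: math.log(emit[s][sym])
            + combine([prev[r] + math.log(trans[r][s]) for r in states])
            for s in states
        }
    return combine(list(prev.values()))

# Hypothetical toy model: an A-rich "motif" state M and a uniform background state B.
start = {"M": 0.5, "B": 0.5}
trans = {"M": {"M": 0.8, "B": 0.2}, "B": {"M": 0.2, "B": 0.8}}
emit = {
    "M": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "B": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

obs = "AAGAAT"
null_log_prob = len(obs) * math.log(0.25)  # assumed i.i.d. uniform null model

# Log-odds scores (in nats) relative to the null model.
forward_log_odds = hmm_log_score(obs, start, trans, emit, logsumexp) - null_log_prob
viterbi_log_odds = hmm_log_score(obs, start, trans, emit, max) - null_log_prob
print(f"Forward log-odds: {forward_log_odds:.3f}")
print(f"Viterbi log-odds: {viterbi_log_odds:.3f}")
```

Because the Forward score adds the probability of every path, including the best one, it can never be lower than the Viterbi score for the same model and sequence.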
Efficient Tree-Matching Methods for Accurate Carbohydrate Database Queries
Genome Informatics, 2003
Cited by 23 (12 self)
One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence, but a tree structure. Although there has been much work in the development of sequence databases and matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, and further, does not allow for the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized...
Rapid Significance Estimation in Local Sequence Alignment with Gaps
2001
Cited by 18 (1 self)
In order to assess the significance of sequence alignments it is crucial to know the distribution of alignment scores of pairs of random sequences. For gapped local alignment it is empirically known that the shape of this distribution is of the Gumbel form. However, the determination of the parameters of this distribution is a computationally very expensive task. We present a new algorithmic approach which allows the more important of the Gumbel parameters to be estimated at least five times faster than with the traditional methods. Actual runtimes of our algorithm, between less than a second and a few minutes on a workstation, bring significance estimation into the realm of interactive applications.
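Once the Gumbel parameters are known, turning a local alignment score into a significance estimate is a short calculation. The sketch below uses the standard Karlin-Altschul form, E = K·m·n·exp(-λS) and P(S' ≥ S) = 1 - exp(-E); it does not implement the parameter-estimation algorithm the abstract describes. The function names and the numeric values of λ, K, and the sequence lengths are illustrative assumptions, not results from the paper.

```python
import math

def alignment_evalue(score, lam, K, m, n):
    """Expected number of chance local alignments scoring at least `score`
    between random sequences of lengths m and n (Karlin-Altschul form)."""
    return K * m * n * math.exp(-lam * score)

def gumbel_pvalue(score, lam, K, m, n):
    """P(best chance score >= score) under the Gumbel form: 1 - exp(-E)."""
    e = alignment_evalue(score, lam, K, m, n)
    return -math.expm1(-e)  # equals 1 - exp(-e), accurate when e is tiny

# Placeholder parameters, chosen only for illustration.
lam, K = 0.267, 0.041
m, n = 300, 300

for s in (30, 50, 70):
    e = alignment_evalue(s, lam, K, m, n)
    p = gumbel_pvalue(s, lam, K, m, n)
    print(f"score={s}  E-value={e:.3g}  p-value={p:.3g}")
```

For small E-values the p-value and the E-value nearly coincide, which is why the two are often used interchangeably for strong hits.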
Bootstrapping and Normalization for Enhanced Evaluations of Pairwise Sequence Comparison
PROC. IEEE, 2002
Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models
J. COMP. BIOL, 2001
Cited by 17 (2 self)
The score statistics of probabilistic gapped local alignment of random sequences is investigated both analytically and numerically. The full probabilistic algorithm (e.g., the “local” version of maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified “semi-probabilistic” alignment consisting of a hybrid of Smith–Waterman and probabilistic alignment is then proposed and studied in detail. It is predicted that the score statistics of the hybrid algorithm is of the Gumbel universal form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring systems and parameters. A simple recipe for the computation of the “relative entropy,” and from it the finite size correction to λ, is also given. These predictions compare well with direct numerical simulations for sequences of lengths between 100 and 1,000 examined using various PAM substitution scores and affine gap functions. The sensitivity of the hybrid method in the detection of sequence homology is also studied using correlated sequences generated from toy mutation models. It is found to be comparable to that of the Smith–Waterman alignment and significantly better than the Viterbi version of the probabilistic alignment.
Statistics of local multiple alignments
BIOINFORMATICS, 2005
Cited by 14 (2 self)
Summary: BLAST statistics have been shown to be extremely useful for searching for significant similarity hits, for amino acid and nucleotide sequences. Although these statistics are well understood for pairwise comparisons, there has been little success developing statistical scores for multiple alignments. In particular, there is no score for multiple alignment that is well founded and treated as a standard. We extend the BLAST theory to multiple alignments. Following some simple assumptions, we present and justify a significance score for multiple segments of a local multiple alignment. We demonstrate its usefulness in distinguishing high and moderate quality multiple alignments from low quality ones, with supporting experiments on orthologous vertebrate promoter sequences.
Pairwise Statistical Significance of Local Sequence Alignment Using Substitution Matrices with Sequence-Pair-Specific Distance
Proc. Int’l Conf. Information Technology (ICIT ’08), 2008
Cited by 10 (5 self)
Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution were conducted. The results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and better than the database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH on a benchmark database; with pretrained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.
Index Terms: Database statistical significance, homologs, pairwise statistical significance, position-specific scoring matrices (PSSMs), sequence alignment, substitution matrices.
Hybrid Alignment: High-Performance with Universal Statistics
Bioinformatics, 2002
Cited by 8 (2 self)
The score statistics of a recently introduced "hybrid alignment" algorithm is studied in detail numerically. An extensive survey across the 2,216 models of protein domains contained in the Pfam v5.4 database (Bateman et al. 2000. Nucl. Acid Res. 28:263-266) verifies the theoretical predictions: For the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter λ taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores. The performance of the hybrid algorithm in detecting sequence homology is also studied, using protein sequences from the SCOP (Murzin et al. 1995. J. Mol. Biol. 247:536-540) and Pfam-A databases. The performance is found to be comparable to the best of the existing methods. Hybrid alignment is thereby established as a high performance alignment algorithm with well-characterized, universal statistics.