Results 1 - 10 of 10
Stochastic pairwise alignments
Bioinformatics, 2002
Cited by 16 (1 self)
Motivation: The level of sequence conservation between related nucleic acids or proteins often varies considerably along the sequence. Both regions with high variability (mutational hotspots) and regions of almost perfect sequence identity may occur in the same pair of molecules. The reliability of an alignment therefore strongly depends on the level of local sequence similarity. Especially in regions of high variability, many alignments of almost equal quality exist, and the optimal alignment is highly arbitrary. Results: We discuss two approaches which deal with the inherent ambiguity of the alignment problem based on the computation of the partition function over all canonical pairwise alignments. The ensemble of possible alignments can be described by the probabilities Pij of a match between position i in the first and position j in the second sequence. Alternatively, we introduce a probabilistic backtracking procedure that generates ensembles of suboptimal alignments with correct statistical weights. A comparison between structure based alignments and large samples of stochastic alignments shows that the ensemble contains correct alignments with significant probabilities even though the optimal alignment deviates significantly from the structural alignment. Ensembles of suboptimal alignments obtained by stochastic backtracking can be used as input to any bioinformatics method based on pairwise alignment in order to gain reliability information not available from a single optimal alignment. Availability: The software described in this contribution is available for downloading at
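The partition-function idea in this abstract can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper: linear gap costs, illustrative score values, and a single-table recursion that counts every alignment path rather than only canonical alignments. The names `partition_table`, `match_probabilities`, and `sample_alignment` are hypothetical.

```python
import math
import random

# Illustrative scoring parameters (assumptions, not taken from the paper).
MATCH, MISMATCH, GAP, T = 1.0, -1.0, -2.0, 1.0


def pair_weight(x, y):
    """Boltzmann weight of aligning residue x with residue y."""
    return math.exp((MATCH if x == y else MISMATCH) / T)


def partition_table(a, b):
    """Z[i][j] = sum of Boltzmann weights over all alignments of a[:i] and b[:j]."""
    gap_w = math.exp(GAP / T)
    Z = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    Z[0][0] = 1.0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i > 0 and j > 0:
                Z[i][j] += Z[i - 1][j - 1] * pair_weight(a[i - 1], b[j - 1])
            if i > 0:
                Z[i][j] += Z[i - 1][j] * gap_w
            if j > 0:
                Z[i][j] += Z[i][j - 1] * gap_w
    return Z


def match_probabilities(a, b):
    """P[i][j]: probability that a[i] is aligned with b[j] (0-based indices)."""
    n, m = len(a), len(b)
    fwd = partition_table(a, b)
    # Suffix partition functions, obtained by running the same recursion
    # on the reversed sequences (valid because this model is symmetric).
    bwd = partition_table(a[::-1], b[::-1])
    total = fwd[n][m]
    return [[fwd[i][j] * pair_weight(a[i], b[j]) * bwd[n - i - 1][m - j - 1] / total
             for j in range(m)] for i in range(n)]


def sample_alignment(a, b, Z, rng):
    """Draw one alignment with probability (its weight) / Z by stochastic backtracking."""
    gap_w = math.exp(GAP / T)
    i, j, cols = len(a), len(b), []
    while i > 0 or j > 0:
        moves = []  # (weight of paths through this move, step in a, step in b)
        if i > 0 and j > 0:
            moves.append((Z[i - 1][j - 1] * pair_weight(a[i - 1], b[j - 1]), 1, 1))
        if i > 0:
            moves.append((Z[i - 1][j] * gap_w, 1, 0))
        if j > 0:
            moves.append((Z[i][j - 1] * gap_w, 0, 1))
        r = rng.random() * Z[i][j]
        for w, di, dj in moves:
            r -= w
            if r <= 0:
                break
        cols.append((a[i - 1] if di else "-", b[j - 1] if dj else "-"))
        i, j = i - di, j - dj
    cols.reverse()
    return "".join(c[0] for c in cols), "".join(c[1] for c in cols)
```

Calling `sample_alignment` repeatedly with the same table yields an ensemble of suboptimal alignments; the frequency with which a given pair of positions is matched in that ensemble should approach the corresponding entry of `match_probabilities`.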
Calibrating E-values for hidden Markov models with reverse-sequence null models
Bioinformatics, 2005
Cited by 7 (3 self)
Motivation: Hidden Markov models (HMMs) calculate the probability that a sequence was generated by a given model. Log-odds scoring provides a context for evaluating this probability, by considering it in relation to a null hypothesis. We have found that using a reverse-sequence null model effectively removes biases due to sequence length and composition and reduces the number of false positives in a database search. Any scoring system is an arbitrary measure of the quality of database matches. Significance estimates of scores are essential, because they eliminate model and method-dependent
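One reading of the reverse-sequence null model is that a sequence is scored by the log ratio of its probability under the model to the probability of its reversal under the same model. The sketch below illustrates that reading on a toy two-state HMM; the model parameters and the names `log_forward` and `reverse_null_log_odds` are hypothetical and are not taken from the paper.

```python
import math


def log_forward(seq, start, trans, emit):
    """Log-probability of seq under an HMM, using the scaled forward algorithm."""
    states = list(start)
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    loglik = 0.0
    for symbol in seq[1:]:
        scale = sum(alpha.values())
        loglik += math.log(scale)  # accumulate the factor removed by rescaling
        alpha = {
            t: sum(alpha[s] / scale * trans[s][t] for s in states) * emit[t][symbol]
            for t in states
        }
    return loglik + math.log(sum(alpha.values()))


def reverse_null_log_odds(seq, start, trans, emit):
    """Log-odds of seq against its own reversal scored by the same model."""
    forward_score = log_forward(seq, start, trans, emit)
    reverse_score = log_forward(seq[::-1], start, trans, emit)
    return forward_score - reverse_score


# Toy model (an assumption for illustration): prefers 'a' followed by 'b'.
START = {"A": 0.9, "B": 0.1}
TRANS = {"A": {"A": 0.1, "B": 0.9}, "B": {"A": 0.1, "B": 0.9}}
EMIT = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}
```

Because a sequence and its reversal have the same length and composition, any score contribution that depends only on those two properties cancels in the difference, which is consistent with the bias removal the abstract describes. A palindromic sequence scores exactly zero under this definition.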
The partition function variant of Sankoff’s algorithm
In ICCS 2004 Proceedings, 2004
Cited by 7 (0 self)
Abstract. Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Sankoff's algorithm can be used to construct such structure-based alignments of RNA sequences in polynomial time. Here we extend the approach to a probabilistic one by explicitly computing the partition function of all pairwisely aligned sequences with a common set of base pairs. Stochastic backtracking can then be used to compute e.g. the probability that a prescribed sequence-structure pattern is conserved between two RNA sequences. The reliability of the alignment itself can be assessed in terms of the probabilities of each possible match. 1 Introduction Sankoff's algorithm [1] simultaneously predicts a consensus structure for two (or, in its general version, more) RNA secondary structures and at the same time constructs their alignment. It is quite expensive in both CPU and memory requirements, O(N^6) and O(N^4), respectively. A further complication is that it requires the implementation of the full loop-based RNA energy model
Local sequence alignments statistics: Deviations from Gumbel statistics in the rare-event tail
, 2007
ESTIMATING THE GUMBEL SCALE PARAMETER FOR LOCAL ALIGNMENT OF RANDOM SEQUENCES BY IMPORTANCE SAMPLING WITH STOPPING TIMES
, 909
Cited by 4 (3 self)
The gapped local alignment score of two random sequences follows a Gumbel distribution. If computers could estimate the parameters of the Gumbel distribution within one second, the use of arbitrary alignment scoring schemes could increase the sensitivity of searching biological sequence databases over the web. Accordingly, this article gives a novel equation for the scale parameter of the relevant Gumbel distribution. We speculate that the equation is exact, although present numerical evidence is limited. The equation involves ascending ladder variates in the global alignment of random sequences. In global alignment simulations, the ladder variates yield stopping times specifying random sequence lengths. Because of the random lengths, and because our trial distribution for importance sampling occurs on a different sample space from our target distribution, our study led to a mapping theorem, which led naturally in turn to an efficient dynamic programming algorithm for the importance sampling weights. Numerical studies using several popular alignment scoring schemes then examined the efficiency and accuracy of the resulting simulations. 1. Introduction. Sequence
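To show how Gumbel parameters are used once they have been estimated, the sketch below evaluates the widely used tail formula P(S >= x) = 1 - exp(-K m n exp(-lambda x)) for a local alignment score x between sequences of lengths m and n. It does not reproduce the article's equation for the scale parameter; the function names and the parameter values in the usage note are hypothetical.

```python
import math


def gumbel_evalue(score, lam, K, m, n):
    """Expected number of chance alignments scoring at least `score`."""
    return K * m * n * math.exp(-lam * score)


def gumbel_pvalue(score, lam, K, m, n):
    """Probability that at least one chance alignment reaches `score`."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-gumbel_evalue(score, lam, K, m, n))
```

For example, with the placeholder values `lam=0.25`, `K=0.05`, a query of length 300, and a database of 1,000,000 residues, a score of 100 gives an E-value far below 1, so the P-value is almost equal to the E-value, while a score of 50 gives a P-value close to 1.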
From Protein Interactions to Functional Annotation: Graph Alignment in Herpes
, 2008
Cited by 2 (0 self)
Sequence alignment forms the basis of many methods for functional annotation by phylogenetic comparison, but becomes unreliable in the “twilight” regions of high sequence divergence and short gene length. Here we perform a cross-species comparison of two herpesviruses, VZV and KSHV, with a hybrid method called graph alignment. The method is based jointly on the similarity of protein interaction networks and on sequence similarity. In our alignment, we find open reading frames for which interaction similarity concurs with a low level of sequence similarity, thus confirming the evolutionary relationship. In addition, we find high levels of interaction similarity between open reading frames without any detectable sequence similarity. The functional predictions derived from this alignment are consistent with genomic position and gene expression data.
150 No. 75 Yu and Hwa The Statistics of Semi-Probabilistic Alignment
Computer-assisted sequence comparison has become an integral part of modern molecular biology. Two types of algorithms have been used: those which search for the optimal alignment (as exemplified by the Smith-Waterman algorithm [1]), and those which identify likely alignments (as exemplified by the HMM-based “Sequence Alignment Modules” [2]). In each case, the quality of alignment is
unknown title
, 2005
The Gumbel prefactor k for gapped local alignment can be estimated from simulations of global alignment
unknown title
Vol. 24 ISMB 2008, pages i15–i23 doi:10.1093/bioinformatics/btn171 The effectiveness of position- and composition-specific gap costs for protein similarity searches
A Global Credibility Measure in Pairwise Sequence Alignment
, 2008
Abstract – This project is an attempt to evaluate the credibility limits from the pairwise sequence alignment of orthologous human and rodent gene sequence pairs through a modified implementation of BALSA (Bayesian algorithm for local sequence alignment). The modification adds centroid alignment and Hamming distance, neither of which is in BALSA, alongside the alignment sampling already implemented by Webb (2001). The currently tested data set is a group of upstream DNA sequences of 24 pairs of orthologous human and rodent genes. Key words: credibility limit, centroid, sampling. I.