Results 1 - 10
of
179
A greedy algorithm for aligning DNA sequences
- J. COMPUT. BIOL
, 2000
"... For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy a ..."
Abstract
-
Cited by 133 (6 self)
- Add to MetaCart
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
Optimal alignments in linear space
- CABIOS
, 1988
"... Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the newer proposals, bo ..."
Abstract
-
Cited by 130 (3 self)
- Add to MetaCart
Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the newer proposals, both in theory and in practice. The goal of this note is to give Hirschberg’s idea the visibility it deserves by developing a linear-space version of Gotoh’s algorithm, which accommodates affine gap penalties. A portable C-software package implementing this algorithm is available on the BIONET free of charge.
PROBCONS: Probabilistic consistency-based multiple sequence alignment
- Genome Res
, 2005
"... To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objec ..."
Abstract
-
Cited by 84 (5 self)
- Add to MetaCart
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce prob-abilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on prob-abilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROB-CONS achieves statistically significant improvement over other leading methods while maintain-ing practical speed. PROBCONS is publicly available as a web resource. Source code and execu-tables are available under the GNU Public License at
A New Algorithm for the Alignment of Phonetic Sequences
, 2000
"... Alignment of phonetic sequences is a necessary step in many applications in computational phonology. After discussing various approaches to phonetic alignment, I present a new algorithm that combines a number of techniques developed for sequence comparison with a scoring scheme for computing phoneti ..."
Abstract
-
Cited by 40 (8 self)
- Add to MetaCart
Alignment of phonetic sequences is a necessary step in many applications in computational phonology. After discussing various approaches to phonetic alignment, I present a new algorithm that combines a number of techniques developed for sequence comparison with a scoring scheme for computing phonetic similarity on the basis of multivalued features. The algorithm performs better on cognate alignment, in terms of accuracy and efficiency, than other algorithms reported in the literature.
Fast evaluation of internal loops in RNA secondary structure prediction
, 1999
"... Motivation: Though not as abundant in known biological processes as proteins, RNA molecules serve as more than mere intermediaries between DNA and proteins. Research in the last 15 years demonstrates that RNA molecules serve in many roles, including catalysis. Furthermore, RNA secondary structure pr ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
Motivation: Though not as abundant in known biological processes as proteins, RNA molecules serve as more than mere intermediaries between DNA and proteins. Research in the last 15 years demonstrates that RNA molecules serve in many roles, including catalysis. Furthermore, RNA secondary structure prediction based on free energy rules for stacking and loop formation remains one of the few major breakthroughs in the eld of structure prediction, as minimum free energy structures and related quantities can be computed with full mathematical rigor. However, with the current energy parameters, the algorithms used hitherto suer the disadvantage of either employing heuristics that risk (though highly unlikely) missing the optimal structure or becoming prohibitively time-consuming for moderate to large sequences. Results: We present a new method to evaluate internal loops utilizing currently used energy rules. This method reduces the time complexity of this part of the structure prediction f...
Sequence Comparison Significance and Poisson Approximation
- Stat. Sci
, 1994
"... The Chen-Stein method of Poisson approximation has been used to establish theorems about comparison of two DNA or protein sequences. The most useful result for sequence alignment applies to alignment scoring for aligned letters and no gaps. However there has not been a valid method to assign statist ..."
Abstract
-
Cited by 31 (4 self)
- Add to MetaCart
The Chen-Stein method of Poisson approximation has been used to establish theorems about comparison of two DNA or protein sequences. The most useful result for sequence alignment applies to alignment scoring for aligned letters and no gaps. However there has not been a valid method to assign statistical significance to alignment scores with gaps. In this paper we extend Poisson approximation techniques using the Aldous clumping heuristic to a practical method of estimating statistical significance.
Scoring Hidden Markov Models
"... Motivation: Statistical sequence comparison techniques, such as hidden Markov models and generalized pro les, calculate the probability that a sequence was generated by a given model. Log-odds scoring is a means of evaluating this probability by comparing it to a null hypothesis, usually a simpler s ..."
Abstract
-
Cited by 31 (5 self)
- Add to MetaCart
Motivation: Statistical sequence comparison techniques, such as hidden Markov models and generalized pro les, calculate the probability that a sequence was generated by a given model. Log-odds scoring is a means of evaluating this probability by comparing it to a null hypothesis, usually a simpler statistical model intended to represent the universe of sequences as a whole, rather than the group of interest. Such scoring leads to two immediate questions: what should the null model be, and what threshold of log-odds score should be deemed a match to the model. Results: This paper experimentally analyses these two issues. Within the context of the Sequence Alignment and Modeling software suite (SAM), we consider a variety ofnull models and suitable thresholds. Additionally, we consider HMMer's log-odds scoring and SAM's original Zscoring method. Among the null model choices, a simple looping null model that emits characters according to the geometric mean of the character probabilities in the columns modeled by the HMM performs well or best across all four discrimination experiments.
Accurate formula for p-values of gapped local sequence and profile alignments
- J. Mol. Biol
, 2000
"... A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing significance of comparisons between two protein sequences or a sequence and a profile. The approximation takes account of the scoring scheme (ie gap penalty and substitution matr ..."
Abstract
-
Cited by 31 (1 self)
- Add to MetaCart
A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing significance of comparisons between two protein sequences or a sequence and a profile. The approximation takes account of the scoring scheme (ie gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of data-bank searches. The method is based on the theoretical ideas introduced in (Mott & Tribe, 1999). Extensive simulation studies show that score-thresholds produced by the method are accurate to within ±5 % 95 % of the time. We also investigate factors which affect the accuracy of alignment statistics, and show that any method based on asymptotic theory is limited because asymptotic behaviour is not strictly achieved for many real protein sequences, due to extreme composition effects. Consequently it may not be practicable to find a general formula that is significantly more accurate until the sub-asymptotic behaviour of alignments is better understood.
Name that tune: A pilot study in finding a melody from a sung query
- Journal of the American Society for Information Science and Technology
, 2004
"... We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned t ..."
Abstract
-
Cited by 25 (7 self)
- Add to MetaCart
We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned to the user in order of similarity to the sung theme. We describe, in detail, two different approaches to measuring similarity between database themes and the sung query. In the first, queries are compared to database themes using standard string-alignment algorithms. Here, similarity between target and query is determined by edit cost. In the second approach, pieces in the database are represented as hidden Markov models (HMMs). In this approach, the query is treated as an observation sequence and a target is judged similar to the query if its HMM has a high likelihood of generating the query. In this article we report our approach to the construction of a target database of themes, encoding, and transcription of user queries, and the results of preliminary experimentation with a set of sung queries. Our experiments show that while no approach is clearly superior to the other system, string matching has a slight advantage. Moreover, neither approach surpasses human performance.
Parallel Hardware for Sequence Comparison and Alignment
- CABIOS
, 1996
"... Sequence comparison, a vital research tool in computational biology, is based on a simple O(n 2 ) algorithm that easily maps to a linear array of processors. This paper reviews and compares high-performance sequence analysis on general-purpose supercomputers and single-purpose, reconfigurable, and ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
Sequence comparison, a vital research tool in computational biology, is based on a simple O(n 2 ) algorithm that easily maps to a linear array of processors. This paper reviews and compares high-performance sequence analysis on general-purpose supercomputers and single-purpose, reconfigurable, and programmable co-processors. The difficulty of comparing hardware from published performance figures is also noted. Introduction The vast databases produced by the Human Genome Project demand innovative tools for fast sequence and database analysis. Because sequence databases contain billions of characters it is important to locate areas of interest within a database or genome quickly. Dynamic programming organizes sequence comparison by comparing shorter subsequences first, so their costs can be made available in a table (Figure 1) for the next longer subsequence comparisons. The final entry becomes the comparison rating. Exact sequence comparison is an O(n 2 )-time dynamic programming a...

