Results 1-10 of 40
A greedy algorithm for aligning DNA sequences
 J. COMPUT. BIOL
, 2000
Cited by 241 (15 self)
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
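The greedy idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm (which works with match, mismatch and indel scores); it is the simpler unit-cost variant, assuming every substitution, insertion and deletion costs 1. For each error count e it records how far each diagonal can be followed, sliding greedily along matching characters. The function name `greedy_edit_distance` is hypothetical.

```python
def greedy_edit_distance(a, b):
    """Unit-cost edit distance by greedy diagonal extension (illustrative sketch)."""
    m, n = len(a), len(b)
    target = n - m  # diagonal (j - i) on which the end cell (m, n) lies

    # Zero errors: slide along the main diagonal while characters match.
    i = 0
    while i < m and i < n and a[i] == b[i]:
        i += 1
    if target == 0 and i == m:
        return 0
    prev = {0: i}  # diagonal -> furthest row reached with e - 1 errors

    for e in range(1, m + n + 1):
        cur = {}
        for k in range(-e, e + 1):
            best = -1
            if k in prev:          # substitution: stay on diagonal k
                best = max(best, prev[k] + 1)
            if k - 1 in prev:      # insertion: consume one character of b
                best = max(best, prev[k - 1])
            if k + 1 in prev:      # deletion: consume one character of a
                best = max(best, prev[k + 1] + 1)
            best = min(best, m, n - k)  # stay inside the alignment grid
            if best < 0 or best + k < 0:
                continue
            i = best
            # Greedy step: extend through matches at no extra cost.
            while i < m and i + k < n and a[i] == b[i + k]:
                i += 1
            cur[k] = i
            if k == target and i == m:
                return e
        prev = cur
    return m + n  # not reached: the distance never exceeds m + n


print(greedy_edit_distance("ACGT", "AGT"))
```

The work grows with the number of differences rather than with the product of the sequence lengths, which is why this style of algorithm suits sequences that differ only by a few errors.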
Optimal alignments in linear space
 CABIOS
, 1988
Cited by 181 (3 self)
Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the newer proposals, both in theory and in practice. The goal of this note is to give Hirschberg’s idea the visibility it deserves by developing a linear-space version of Gotoh’s algorithm, which accommodates affine gap penalties. A portable C software package implementing this algorithm is available on the BIONET free of charge.
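Hirschberg's divide-and-conquer idea can be sketched as follows. This is a simplified illustration, not the linear-space Gotoh variant the abstract describes: it assumes a linear gap penalty and the fixed scores below (match +1, mismatch -1, gap -1), and all names are hypothetical. Only two rows of the dynamic programming table are held at any time.

```python
MATCH, MISMATCH, GAP = 1, -1, -1  # assumed scores; the base case relies on MISMATCH >= 2 * GAP


def nw_last_row(a, b):
    """Last row of the global alignment score table, keeping two rows in memory."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            cur[j] = max(prev[j - 1] + s, prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev


def hirschberg(a, b):
    """Return an optimal global alignment of a and b as two gapped strings."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        j = b.find(a)
        if j < 0:
            j = 0  # no match available: pair with the first character of b
        return "-" * j + a + "-" * (len(b) - j - 1), b
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b)                # prefix scores
    right = nw_last_row(a[mid:][::-1], b[::-1])   # suffix scores, computed on reversed strings
    n = len(b)
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    a1, b1 = hirschberg(a[:mid], b[:split])
    a2, b2 = hirschberg(a[mid:], b[split:])
    return a1 + a2, b1 + b2


def alignment_score(x, y):
    """Score a gapped alignment column by column."""
    return sum(GAP if "-" in (p, q) else MATCH if p == q else MISMATCH
               for p, q in zip(x, y))


x, y = hirschberg("AGTACGCA", "TATGC")
print(x)
print(y)
```

The recursion splits the first sequence in half, finds where the optimal path crosses the middle row from the forward and reverse score rows, and then solves the two smaller subproblems independently.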
A Robust Model for Finding Optimal Evolutionary Trees
, 1993
Cited by 78 (14 self)
Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species and seeks to find an edge-weighted tree T in which the distance d^T_ij in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d_ij. When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in O(n^2) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distanc...
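Additivity of a distance matrix can be tested with the classical four-point condition: for every four species, the two largest of the three pairwise sums must be equal. The sketch below is a brute-force O(n^4) check for illustration only, not the O(n^2) tree construction mentioned in the abstract. It assumes the input is a symmetric matrix that already satisfies the metric axioms; the function name and the example matrix are hypothetical.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point condition check on a symmetric distance matrix (list of lists)."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The two largest sums must coincide for a tree metric.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Path lengths in the tree ((A:1, B:2):5, (C:3, D:4)), leaves ordered A, B, C, D.
tree_metric = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_additive(tree_metric))
```

Perturbing a single entry, as measurement noise would, typically breaks the condition, which is the situation the best-fit models in the abstract are meant to handle.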
A Subquadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices
, 2002
Cited by 56 (4 self)
The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n^2) time. We address the challenge of computing the similarity of two strings in subquadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global alignment computations. The speedup is achieved by dividing the dynamic programming matrix into variable sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(hn^2 / log n), where h ≤ 1 is the entropy of the text.
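The variable-sized blocks come from a Lempel-Ziv parse of each string. As a rough illustration, the sketch below uses an LZ78-style parse, assuming each phrase is the longest previously seen phrase extended by one character; the exact parsing variant and the block-processing step of the paper are not reproduced here, and both function names are hypothetical.

```python
def lz78_phrases(text):
    """Split text into LZ78-style phrases: longest known phrase plus one new character."""
    seen = {""}
    phrases = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)  # trailing phrase that repeats an earlier one
    return phrases


def block_boundaries(phrases):
    """End offsets of the phrases, i.e. where the table would be cut along one axis."""
    ends, pos = [], 0
    for p in phrases:
        pos += len(p)
        ends.append(pos)
    return ends


phrases = lz78_phrases("abababab")
print(phrases, block_boundaries(phrases))
```

Repetitive strings yield fewer, longer phrases, so the table is cut into fewer blocks, which matches the abstract's remark that the running time improves for low-entropy texts.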
Accurate formula for p-values of gapped local sequence and profile alignments
 J. Mol. Biol
, 2000
Cited by 37 (1 self)
A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing significance of comparisons between two protein sequences or a sequence and a profile. The approximation takes account of the scoring scheme (i.e., gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of databank searches. The method is based on the theoretical ideas introduced in (Mott & Tribe, 1999). Extensive simulation studies show that score thresholds produced by the method are accurate to within ±5% 95% of the time. We also investigate factors which affect the accuracy of alignment statistics, and show that any method based on asymptotic theory is limited because asymptotic behaviour is not strictly achieved for many real protein sequences, due to extreme composition effects. Consequently it may not be practicable to find a general formula that is significantly more accurate until the sub-asymptotic behaviour of alignments is better understood.
Recent developments in linear-space alignment methods: A survey
 J. Comput. Biol
, 1994
Cited by 26 (5 self)
A dynamic-programming strategy for sequence alignment first proposed in 1975 by Dan Hirschberg can be adapted to yield a number of extremely space-efficient algorithms. Specifically, these algorithms align two sequences using only “linear space”, i.e., an amount of computer memory that is proportional to the sum of the lengths of the two sequences being aligned. This paper begins by reviewing the basic idea, as it applies to the global (i.e., end-to-end) alignment of two DNA or protein sequences. Three of our recent extensions of the technique are then outlined. The first extension computes an optimal alignment subject to the constraint that each position, i, of the first sequence must be aligned somewhere between positions L[i] and U[i] of the second sequence, for given values of L and U. The second finds all aligned position pairs (i.e., potential columns of the alignment) that occur in an alignment whose score exceeds a given threshold. The third treats the case where each of the two sequences is allowed to be an alignment (e.g., a sequence of aligned pairs), using a sensitive scoring scheme. We also describe two linear-space methods for computing k best local (i.e., involving only a part of each sequence) alignments, where k ≥ 1. One is a linear-space version of the algorithm of Waterman and Eggert (1987), and the other is based on the strategy proposed by Wilbur and Lipman (1983). Finally, we describe programs that implement various combinations of these techniques to provide a multi-sequence alignment method that is especially suited to handling a few very long sequences. The utility of these programs is illustrated by analysis of the locus control region of the β-like globin gene cluster of several mammals.
On the Common Substring Alignment Problem
Cited by 23 (2 self)
The Common Substring Alignment Problem is defined as follows: Given a set of one or more strings and a target string. is a common substring of all strings, that is. The goal is to compute the similarity of all strings with, without computing the part of again and again. Using the classical dynamic programming tables, each appearance of in a source string would require the computation of all the values in a dynamic programming table of size where is the size of. Here we describe an algorithm which is composed of an encoding stage and an alignment stage. During the first stage, a data structure is constructed which encodes the comparison of with. Then, during the alignment stage, for each comparison of a source with, the precompiled data structure is used to speed up the part of. We show how to reduce the alignment work, for each appearance of the common substring in a source string, to at the cost of encoding work, which is executed only once.
Speeding up Dynamic Programming
 In Proc. 29th Symp. Foundations of Computer Science
, 1988
Cited by 19 (0 self)
In this paper we consider the problem of computing two similar recurrences: the one-dimensional case
Automata-Theoretic Models of Mutation and Alignment
 In International Conference on Intelligent Systems in Molecular Biology
, 1995
Cited by 16 (3 self)
Finite-state automata called transducers, which have both input and output, can be used to model simple mechanisms of biological mutation. We present a methodology whereby numerically weighted versions of such specifications can be mechanically adapted to create string edit machines that are essentially equivalent to recurrence relations of the sort that characterize dynamic programming alignment algorithms. Based on this, we have developed a visual programming system for designing new alignment algorithms in a rapid-prototyping fashion. 1 Introduction Finite-state automata have an important place in computer science, often representing simple models of computation as the recognition or generation of strings of symbols. A wide variety of such automata have been intensively studied, including weighted automata which have numbers associated with transitions between states, and transducers which have both input and output. Allison and coworkers [2] have proposed the use of finite-stat...
Linear-Space Algorithms that Build Local Alignments from Fragments
 Algorithmica
, 1995
Cited by 12 (7 self)
This paper presents practical algorithms for building an alignment of two long sequences from a collection of "alignment fragments," such as all occurrences of identical 5-tuples in each of two DNA sequences. We first combine a time-efficient algorithm developed by Galil and coworkers with a space-saving approach of Hirschberg to obtain a local alignment algorithm that uses O((M + N + F log N) log M) time and O(M + N) space to align sequences of lengths M and N from a pool of F alignment fragments. Ideas of Huang and Miller are then employed to develop a time- and space-efficient algorithm that computes n best nonintersecting alignments for any n > 1. An example illustrates the utility of these methods.