Results 1-10 of 29
A fast bit-vector algorithm for approximate string matching based on dynamic programming
J. ACM, 1999
Cited by 185 (1 self)
Abstract. The approximate string matching problem is to find all locations at which a query of length m matches a substring of a text of length n with k or fewer differences. Simple and practical bit-vector algorithms have been designed for this problem, most notably the one used in agrep. These algorithms compute a bit representation of the current state set of the k-difference automaton for the query, and asymptotically run in either O(nmk/w) or O(nm log σ/w) time, where w is the word size of the machine (e.g., 32 or 64 in practice) and σ is the size of the pattern alphabet. Here we present an algorithm of comparable simplicity that requires only O(nm/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the problem. Thus, the algorithm's performance is independent of k, and it is found to be more efficient than the previous results for many choices of k and small m. Moreover, because the algorithm is not dependent on k, it can be used to rapidly compute blocks of the dynamic programming matrix as in the 4-Russians algorithm of Wu et al. [1996]. This gives rise to an O(kn/w) expected-time algorithm for the case where m may be arbitrarily large. In practice this new algorithm, which computes a region of the dynamic programming (d.p.) matrix w entries at a time using the basic algorithm as a subroutine, is significantly faster than our previous 4-Russians algorithm, which computes the same region 4 or 5 entries at a time using table lookup. This performance improvement yields a code that is either superior to or competitive with all existing algorithms except for some filtration algorithms that are superior when k/m is sufficiently small.
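The core idea in the abstract above, encoding one column of the dynamic programming matrix as bit vectors of vertical differences and updating it with a constant number of word operations per text character, can be sketched as follows. This is a minimal illustration of the commonly presented form of the bit-vector column update, not the paper's own code: the function name myers_search and its return format are hypothetical, and Python's unbounded integers stand in for a single machine word, so the block structure used for large m is not shown.

```python
def myers_search(pattern, text, k):
    """Report (end_index, distance) for every text position where the
    pattern matches a substring ending there with at most k differences.

    Hypothetical helper: one Python int plays the role of a machine word.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    mask = (1 << m) - 1      # keeps values to m bits
    high = 1 << (m - 1)      # bit for the last pattern row
    pv, mv = mask, 0         # vertical deltas: +1 everywhere, -1 nowhere
    score = m                # value of the bottom cell of the column
    hits = []

    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask    # horizontal delta +1
        mh = pv & xh                     # horizontal delta -1

        # Track the bottom cell using the horizontal delta of the last row.
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1

        # Shift in 0: the top row is all zeros when searching a text.
        ph <<= 1
        mh <<= 1
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv & mask

        if score <= k:
            hits.append((j, score))
    return hits
```

For example, myers_search("abc", "xxabcxx", 0) reports a single exact occurrence ending at index 4. The running time of this sketch is proportional to n word operations only when m fits in one word; with Python integers the cost of each operation grows with m.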
PipMaker: A Web Server for Aligning Two Genomic DNA Sequences
2000
Cited by 101 (5 self)
In this paper we describe an automated server for generating alignments and pips. A pip shows the position in one sequence of each aligning gap-free segment and plots its percent identity. As a complementary display, we also provide a plot of the position of each aligning segment in both species. We refer to these as dot plots, even though matches shown in conventional dot plots need not be contained within a statistically significant alignment and those in our plots are. Both displays allow rich annotation to be plotted along the appropriate axes to aid in correlating aligning segments with functional or structural features of the sequence. We provide examples of the application of PipMaker for finding exons and candidate regulatory elements in mammalian, nematode, and bacterial sequences. The server is able to compare a completed sequence from one species with an incomplete sequence from a second.
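The quantity plotted for each gap-free segment can be illustrated with a few lines of code. This is only a sketch of the usual definition of percent identity (matching columns divided by segment length); the function name percent_identity is hypothetical and nothing here reflects how the PipMaker server itself is implemented.

```python
def percent_identity(segment_a, segment_b):
    """Percent of columns that match in a gap-free aligned segment.

    Hypothetical helper: both arguments are the aligned stretches of the
    two sequences, so they must have the same non-zero length.
    """
    if len(segment_a) != len(segment_b) or not segment_a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(segment_a, segment_b) if x == y)
    return 100.0 * matches / len(segment_a)
```

A pip would then draw each segment as a horizontal line spanning its coordinates in the first sequence, at a height given by this value.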
A subquadratic sequence alignment algorithm for unrestricted scoring matrices
SIAM J. Comput
Cited by 76 (5 self)
The classical algorithm for computing the similarity between two sequences ... Our algorithm applies to both local and global alignment computations. The speedup is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(h n^2 / log n), where h ≤ 1 is the entropy of the text.
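The parsing step that induces the variable-sized blocks can be sketched as below. The abstract only says "Lempel-Ziv parsing"; this example assumes an LZ78-style parse, in which each new phrase is a previously seen phrase extended by one character, and the function name lz78_parse is hypothetical. It shows the parsing alone, not the block-wise alignment computation.

```python
def lz78_parse(s):
    """Split s into LZ78-style phrases (assumed variant, see lead-in).

    Each phrase is the longest previously seen phrase plus one character;
    a final incomplete phrase, if any, is emitted as-is.
    """
    trie = {}        # (parent phrase id, char) -> phrase id; id 0 is the root
    phrases = []
    node = 0         # id of the phrase matched so far
    start = 0        # index where the current phrase begins
    for i, ch in enumerate(s):
        key = (node, ch)
        if key in trie:
            node = trie[key]              # keep extending the match
        else:
            trie[key] = len(phrases) + 1  # register the new phrase
            phrases.append(s[start:i + 1])
            node = 0
            start = i + 1
    if start < len(s):
        phrases.append(s[start:])         # leftover repeats an old phrase
    return phrases
```

For a string of length n over a constant-size alphabet, a parse of this kind yields O(n / log n) phrases, which is the source of the logarithmic saving quoted in the abstract.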
Progressive Multiple Alignment with Constraints
J. Computational Biology, 1996
Cited by 27 (0 self)
A progressive alignment algorithm produces a multi-alignment of a set of sequences by repeatedly aligning pairs of sequences and/or previously generated alignments. We describe a method for guaranteeing that the alignment generated by a progressive alignment strategy satisfies a user-specified collection of constraints about where certain sequence positions should appear relative to others. Given a collection of C constraints over K sequences whose total length is N, our algorithm takes O(K(N^2 + KC)) time. An alignment of the β-like globin gene clusters of several mammals illustrates the practicality of the method. Key words: multiple-sequence alignment, constrained alignment, dynamic programming. 1 Introduction. It is straightforward to extend the dynamic programming alignment algorithm (Needleman and Wunsch 1970) to the simultaneous alignment of K > 2 sequences. However, the O(2^K N^K) execution time for sequences of length N makes it impractical to align more than three seque...
Improved Gapped Alignment in BLAST
Cited by 27 (4 self)
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centres, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step, semi-gapped alignment, compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimisation of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at
Memory-efficient A* heuristics for multiple sequence alignment
In National Conference on Artificial Intelligence (AAAI-02), 2002
Cited by 18 (2 self)
The time and space needs of an A* search are strongly influenced by the quality of the heuristic evaluation function. Usually there is a tradeoff since better heuristics may require more time and/or space to evaluate. Multiple sequence alignment is an important application for single-agent search. The traditional heuristic uses multiple pairwise alignments that require relatively little space. Three-way alignments produce better heuristics, but they are not used in practice due to the large space requirements. This paper presents a memory-efficient way to represent three-way heuristics as an octree. The required portions of the octree are computed on demand. The octree-supported three-way heuristics result in such a substantial reduction to the size of the A* open list that they offset the additional space and time requirements for the three-way alignments. The resulting multiple sequence alignments are computed faster and with less memory than when using A* with traditional pairwise heuristics.
Sequence alignment using FastLSA
In Proceedings of the 2000 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS 2000), 2000
Cited by 17 (4 self)
For two strings of length m and n (m ≤ n), optimal sequence alignment (as a function of the alignment scoring function) takes time and space proportional to mn to compute. The time actually consists of two parts: computing the score of the best alignment (calculating (m+1)(n+1) values), and then extracting the alignment (by reading the computed values). The space requirement is usually prohibitive. Hirschberg's algorithm reduces the space needs to roughly 2m, but doubles the cost of computing and extracting the alignment. This paper introduces the FastLSA algorithm, which adapts to the amount of space available. At one extreme, it uses linear space, while at the other it uses quadratic space. Based on the memory resources available, the algorithm saves the maximum amount of information to achieve the lowest extraction cost. The algorithm is shown to be analytically and experimentally superior to Hirschberg's algorithm.
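The first of the two parts described above, computing the score of the best alignment, needs only two rows of the matrix at a time, which is where the "roughly 2m" space figure comes from. The sketch below illustrates that score-only pass for global alignment. It is neither FastLSA nor Hirschberg's full algorithm (no alignment is extracted), and the function name alignment_score and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score using two rows of the d.p. matrix.

    Hypothetical helper with an assumed linear gap penalty; it still
    computes all (m+1)(n+1) values but stores only about 2m of them.
    """
    if len(a) < len(b):
        a, b = b, a                      # make b the shorter string
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]                 # first column: i gaps
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr                      # discard the older row
    return prev[-1]
```

Because rows are discarded as soon as they are used, recovering the alignment itself requires recomputation, which is the extraction cost that the abstract says FastLSA trades against available memory.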
Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions
Bioinformatics, 2006
"... doi:10.1093/bioinformatics/bti828 ..."