## A Fast and Practical Bit-Vector Algorithm for the Longest Common Subsequence Problem (2000)

Venue: Information Processing Letters

Citations: 26 (2 self)

### BibTeX

@ARTICLE{Crochemore00afast,
    author = {Maxime Crochemore and Costas S. Iliopoulos and Yoan J. Pinzon and James F. Reid},
    title = {A Fast and Practical Bit-Vector Algorithm for the Longest Common Subsequence Problem},
    journal = {Information Processing Letters},
    year = {2000},
    volume = {80},
    pages = {279--285}
}

### Abstract

This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of length $m$ and $n$, $n \ge m$, we present an algorithm which determines the length $p$ of an LCS in $O(nm/w)$ time and $O(m/w)$ space, where $w$ is the number of bits in a machine word. This algorithm can be thought of as a column-wise "parallelization" of the classical dynamic programming approach. Our algorithm is very efficient in practice: computing the length of an LCS of two strings can be done in linear time and constant (additional/working) space, assuming that $m \le w$.
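To make the idea in the abstract concrete, here is a minimal Python sketch that compares the classical dynamic programming computation of the LCS length with a bit-vector variant. This record does not reproduce the paper's recurrence, so the update `V = (V + (V & M)) | (V & ~M)` is an assumption based on the form commonly cited for this family of algorithms. The names `llcs_dp`, `llcs_bitvector`, and `match` are hypothetical. Python integers are arbitrary precision, so a single integer stands in for the machine word and the $m \le w$ restriction is not modeled.

```python
def llcs_dp(x: str, y: str) -> int:
    """Classical DP for the LCS length, keeping only one row at a time."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def llcs_bitvector(x: str, y: str) -> int:
    """Bit-vector LCS length (assumed update rule, see the lead-in).

    Bit i of every vector corresponds to position i of x.
    """
    m = len(x)
    full = (1 << m) - 1  # m one-bits; also used to truncate to m bits

    # match[c] has bit i set exactly when x[i] == c.
    match = {}
    for i, ch in enumerate(x):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = full  # start with all ones: no matches recorded yet
    for ch in y:
        mask = match.get(ch, 0)
        # One addition and a few bitwise operations replace a whole DP column.
        v = ((v + (v & mask)) | (v & ~mask)) & full

    # The LCS length is the number of zero bits among the low m bits.
    return m - bin(v).count("1")
```

As a usage example, with the strings quoted in the citation contexts on this page, `llcs_bitvector("tccagatg", "aaagtgacctagcccg")` and `llcs_dp` on the same inputs both return 6, which matches the length of the LCS "tccagg" mentioned there. Under the assumption that x fits in one word, the loop does a constant number of word operations per character of y, which is where the linear-time claim comes from.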

### Citations

1198 |
Binary codes capable of correcting deletions, insertions, and reversals
- Levenshtein
- 1966
Citation Context ...at w is a subsequence of both x and y of maximum possible length. The LCS problem is related to two well-known metrics for measuring the similarity (distance) of two strings: the Levenshtein distance [7] and the edit distance [14]. The Levenshtein distance is defined as the minimum number of character insertions and/or deletions required to transform a string x into a string y. The edit distance of tw...

654 |
The string-to-string correction problem
- Wagner, Fischer
- 1974
Citation Context ...oth x and y of maximum possible length. The LCS problem is related to two well-known metrics for measuring the similarity (distance) of two strings: the Levenshtein distance [7] and the edit distance [14]. The Levenshtein distance is defined as the minimum number of character insertions and/or deletions required to transform a string x into a string y. The edit distance of two strings x and y is a gene...

317 |
Fast text searching allowing errors
- Wu, Manber
- 1992
Citation Context ... matching case and an $O(nm \log k / w)$ algorithm for the k-mismatches problem, where w is the number of bits in a machine word, n the length of the text and m the length of the pattern. Wu and Manber in [16] showed an $O(nkm/w)$ algorithm for the k-differences problem. Furthermore, Wright ([15]) presented an $O(n \log|\Sigma|\, m/w)$ bit-vector style algorithm where $|\Sigma|$ is the size of the alphabet for the pattern. Re...

270 |
A linear space algorithm for computing maximal common subsequences
- Hirschberg
- 1975
Citation Context ...gs be $x = x_1 \cdots x_m$ and $y = y_1 \cdots y_n$ and let $L[i, j]$ denote the length of an LCS (LLCS) for the prefixes $x_1 \cdots x_i$ and $y_1 \cdots y_j$. The following simple recurrence formula by Hirschberg ([5]) computes $p = L[m, n]$ in $O(nm)$ time and space.

$$
L[i, j] =
\begin{cases}
0, & \text{if either } i = 0 \text{ or } j = 0 \\
1 + L[i-1, j-1], & \text{if } x_i = y_j \\
\max\{L[i-1, j],\; L[i, j-1]\}, & \text{if } x_i \ne y_j
\end{cases}
\tag{1}
$$

In fact, only linear space is neede...

224 | A new approach to text searching
- Baeza-Yates, Gonnet
- 1992
Citation Context ...way of computing an LCS of two strings by using bit-vector operations which is really fast in practice. The idea of using the bits of the computer word has been used extensively in the last years. In [4], Baeza-Yates and Gonnet presented an $O(nm/w)$ algorithm for the exact matching case and an $O(nm \log k / w)$ algorithm for the k-mismatches problem, where w is the number of bits in a machine word, n the ...

167 |
A faster algorithm computing string edit distances
- Masek, Paterson
- 1980
Citation Context ... "equal-unequal" comparisons must take $\Omega(nm)$ in the worst case ([1]). The fastest general solution for the LCS problem is the corresponding solution to the string editing problem by Masek and Paterson [9] taking $O(n^2 \log\log n / \log n)$ for unbounded alphabet size and $O(n^2 / \log n)$ for bounded alphabet size. Due to the fact that the best worst-case algorithm is still sub-quadratic in the input size, m...

138 | A fast bit-vector algorithm for approximate string matching based on dynamic programming
- Myers
- 1999
Citation Context ...m/w) algorithm for the k-differences problem. Furthermore, Wright ([15]) presented an $O(n \log|\Sigma|\, m/w)$ bit-vector style algorithm where $|\Sigma|$ is the size of the alphabet for the pattern. Recently, Myers ([10]) developed a particularly practical method to compute the edit distance in $O(nm/w)$. Related work to the computation of the length of an LCS can be found in [2], where Allison and Dix present an O(nm/...

64 | Bounds on the complexity of the longest common subsequence problem
- Aho, Hirschberg, et al.
- 1976
Citation Context ...ar time, according to whether the size of the alphabet is unbounded or bounded ([6]). For unbounded alphabets, any algorithm using only "equal-unequal" comparisons must take $\Omega(nm)$ in the worst case ([1]). The fastest general solution for the LCS problem is the corresponding solution to the string editing problem by Masek and Paterson [9] taking $O(n^2 \log\log n / \log n)$ for unbounded alphabet size and...

48 | A subquadratic algorithm for approximate limited expression matching
- Wu, Manber, et al.
- 1996
Citation Context ...itions the poset into the minimum possible number of antichains, see e.g. [3]. The computation of the LCS for x = "tccagatg" and y = "aaagtgacctagcccg" is depicted in Figure 2. The chain from L[1, 5] to L[8, 16] spells out the LCS w = "tccagg". 3 A Simple Bit-Vector Algorithm. Here we will make use of word-level parallelism in order to compute the matrix L more efficiently, similar to the manner used by Myers in [...

43 |
Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal (Eds.)
- 1983
Citation Context ... $p = \frac{1}{2}(n + m - d)$. Finding the longest common subsequence or the edit distance of two strings are problems with numerous applications ranging from text editing to molecular sequence analysis, see e.g. [13]. A vast number of efficient algorithms have been designed over the last two decades to solve these problems. The lower bounds for the LCS problem are time $\Omega(n \log m)$ or linear time, according to whether ...

31 |
A bit-string longest-common-subsequence algorithm
- Allison, Dix
- 1986
Citation Context ...abet for the pattern. Recently, Myers ([10]) developed a particularly practical method to compute the edit distance in $O(nm/w)$. Related work to the computation of the length of an LCS can be found in [2], where Allison and Dix present an $O(nm/w)$ algorithm using a bit-vector formula with five bit-wise operations (after optimization of the expression given in the paper). In this paper, we present a simil...

31 |
Introductory combinatorics
- Bogart
- 1990
Citation Context ...blem translates to finding a longest chain in the poset of matches induced by R. A decomposition of a poset into antichains partitions the poset into the minimum possible number of antichains, see e.g. [3]. The computation of the LCS for x = "tccagatg" and y = "aaagtgacctagcccg" is depicted in Figure 2. The chain from L[1, 5] to L[8, 16] spells out the LCS w = "tccagg". 3 A Simple Bit-Vector Algorithm. Here w...

25 | Approximate String Matching using Within-Word Parallelism
- Wright
Citation Context ...s the number of bits in a machine word, n the length of the text and m the length of the pattern. Wu and Manber in [16] showed an $O(nkm/w)$ algorithm for the k-differences problem. Furthermore, Wright ([15]) presented an $O(n \log|\Sigma|\, m/w)$ bit-vector style algorithm where $|\Sigma|$ is the size of the alphabet for the pattern. Recently, Myers ([10]) developed a particularly practical method to compute the edit di...

10 |
An information theoretic lower bound for the longest common subsequence problem
- Hirschberg
- 1978
Citation Context ...ned over the last two decades to solve these problems. The lower bounds for the LCS problem are time $\Omega(n \log m)$ or linear time, according to whether the size of the alphabet is unbounded or bounded ([6]). For unbounded alphabets, any algorithm using only "equal-unequal" comparisons must take $\Omega(nm)$ in the worst case ([1]). The fastest general solution for the LCS problem is the corresponding solution ...

4 |
diversions and maximal chains in partially ordered sets
- Sankoff, Sellers, et al.
- 1973
Citation Context ...R constitutes a chain relative to the partial order relation R. A set of matches such that in any pair neither element of the pair precedes the other in R constitutes an antichain. Sankoff and Sellers [12] observed that the LCS problem translates to finding a longest chain in the poset of matches induced by R. A decomposition of a poset into antichains partitions the poset into the minimum possible numbe...