Abstract:
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D^2) time variation.
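The edit-graph view summarized above can be illustrated with a minimal sketch of the greedy forward search, which computes only D, the size of a shortest edit script made of insertions and deletions. This is an illustration under stated assumptions and not the paper's own listing: the function name `shortest_edit_distance` and the dictionary `v` (furthest x reached on each diagonal k = x - y) are hypothetical choices made for this example, and the linear-space and suffix-tree refinements are not shown.

```python
def shortest_edit_distance(a, b):
    """Return D, the number of insertions plus deletions in a shortest
    edit script turning sequence a into sequence b (illustrative sketch)."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] holds the furthest x reached so far on diagonal k, where k = x - y.
    # The entry v[1] = 0 is a sentinel that seeds the first step (d = 0).
    v = {1: 0}
    for d in range(max_d + 1):
        # Only diagonals -d, -d+2, ..., d are reachable with exactly d edits.
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # vertical edge: take an element of b (insertion)
            else:
                x = v[k - 1] + 1    # horizontal edge: skip an element of a (deletion)
            y = x - k
            # Follow the "snake": diagonal edges cost nothing while elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d            # reached the bottom-right corner of the edit graph
    return max_d


if __name__ == "__main__":
    print(shortest_edit_distance("ABCABBA", "CBABAC"))
```

Because the two problems are dual, the length of a longest common subsequence follows from the result as (len(a) + len(b) - D) / 2 when only insertions and deletions are counted.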