## An O(ND) Difference Algorithm and Its Variations (1986)

### Other Repositories/Bibliography

Venue: Algorithmica

Citations: 155 (4 self)

### BibTeX

@ARTICLE{Myers86ano(nd),
  author = {Eugene W. Myers},
  title = {An O(ND) Difference Algorithm and Its Variations},
  journal = {Algorithmica},
  year = {1986},
  volume = {1},
  pages = {251--266}
}

### Abstract

The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D²) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D²) time variation.
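The greedy idea summarized in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's own listing: the function name `shortest_edit_distance` and the dictionary `v` are assumptions made for this example. It computes only D (the number of insertions plus deletions), not the edit script itself, by tracking the furthest-reaching point on each diagonal of the edit graph.

```python
def shortest_edit_distance(a, b):
    """Return D, the size of a shortest insert/delete edit script from a to b.

    Illustrative sketch of the greedy furthest-reaching-path idea; the
    names used here are hypothetical, not taken from the paper.
    """
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # move down: take a symbol from b (insertion)
            else:
                x = v[k - 1] + 1    # move right: drop a symbol of a (deletion)
            y = x - k
            # Follow the "snake": free diagonal moves over matching symbols.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    raise AssertionError("unreachable: D never exceeds len(a) + len(b)")


if __name__ == "__main__":
    print(shortest_edit_distance("ABCABBA", "CBABAC"))
```

The outer loop runs D + 1 times and each pass touches at most 2D + 1 diagonals, so the work outside the snakes is bounded by O(D²); the snakes account for the remaining cost, which is why similar inputs are handled quickly.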

### Citations

706 | Data Structures and Algorithms - Aho, Hopcroft, et al. - 1983 |

656 |
The string-to-string correction problem
- Wagner, Fischer
- 1974
Citation Context ... The problem of determining the differences between two sequences of symbols has been studied extensively [1,8,11,13,16,19,20]. Algorithms for the problem have numerous applications, including spelling correction systems, file comparison tools, and the study of genetic evolution [4,5,17,18]. Formally, the problem statement i... |

548 |
A space–economical suffix tree construction algorithm
- McCreight
- 1976
Citation Context ... O(D²) time but not in O(D²) space. Finally, an O((M + N) lg(M + N) + D²) worst-case time variation is obtained by speeding up the traversal of snakes with some previously developed techniques [6,14]. The variation is impractical due to the sophistication of these underlying methods but its superior asymptotic worst-case complexity is of theoretical interest. 4a. A Probabilistic Analysis Consider... |

382 |
Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
- Sankoff, Kruskal
- 1983
Citation Context ... been studied extensively [1,8,11,13,16,19,20]. Algorithms for the problem have numerous applications, including spelling correction systems, file comparison tools, and the study of genetic evolution [4,5,17,18]. Formally, the problem statement is to find a longest common subsequence or, equivalently, to find the minimum "script" of symbol deletions and insertions that transform one sequence into the other... |

359 |
The Art of Computer Programming, Vol. 3: Sorting and Searching
- Knuth
- 1973
Citation Context ...at a total of O(M+N) space is needed. 4c. An O((M + N) lg(M + N) + D²) Worst-Case Variation The final topic involves two previous results, each of which are just sketched here. First, suffix trees [12,14] are used to efficiently record the common sublists of the sequences being compared. The term sublist is used as opposed to subsequence to emphasize that the symbols must be contiguous. Second, a rece... |

330 |
Fast algorithms for finding nearest common ancestors
- Harel, Tarjan
Citation Context ... O(D²) time but not in O(D²) space. Finally, an O((M + N) lg(M + N) + D²) worst-case time variation is obtained by speeding up the traversal of snakes with some previously developed techniques [6,14]. The variation is impractical due to the sophistication of these underlying methods but its superior asymptotic worst-case complexity is of theoretical interest. 4a. A Probabilistic Analysis Consider... |

270 |
A linear space algorithm for computing maximal common subsequences
- Hirschberg
- 1975
Citation Context ...e of the earliest algorithms is by Wagner & Fischer [20] and takes O(N²) time and space to solve a generalization they call the string-to-string correction problem. A later refinement by Hirschberg [7] delivers a longest common subsequence using only linear space. When algorithms are over arbitrary alphabets, use "equal-unequal" comparisons, and are characterized in terms of the size of their inp... |

256 |
The Source Code Control System
- Rochkind
- 1975
Citation Context ... been studied extensively [1,8,11,13,16,19,20]. Algorithms for the problem have numerous applications, including spelling correction systems, file comparison tools, and the study of genetic evolution [4,5,17,18]. Formally, the problem statement is to find a longest common subsequence or, equivalently, to find the minimum "script" of symbol deletions and insertions that transform one sequence into the other... |

176 | Algorithms for the longest common subsequence problem
- Hirschberg
- 1977
Citation Context ... The problem of determining the differences between two sequences of symbols has been studied extensively [1,8,11,13,16,19,20]. Algorithms for the problem have numerous applications, including spelling correction systems, file comparison tools, and the study of genetic evolution [4,5,17,18]. Formally, the problem statement i... |

167 |
A fast algorithm for computing longest common subsequences
- Hunt, Szymanski
- 1977
Citation Context ... The problem of determining the differences between two sequences of symbols has been studied extensively [1,8,11,13,16,19,20]. Algorithms for the problem have numerous applications, including spelling correction systems, file comparison tools, and the study of genetic evolution [4,5,17,18]. Formally, the problem statement i... |

167 |
A faster algorithm computing string edit distances
- Masek, Paterson
- 1980 |

143 |
A Note on Two
- Dijkstra
- 1959
Citation Context ...oblem can be viewed as an instance of the single-source shortest paths problem on a weighted edit graph. This suggests that an efficient algorithm can be obtained by specializing Dijkstra's algorithm [3]. A basic exercise [2: 207-208] shows that the algorithm takes O(E lg V) time where E is the number of edges and V is the number of vertices in the subject graph. For an edit graph E < 3V since each poi... |
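
The shortest-path view quoted above can be made concrete with a small sketch. This is an illustration under stated assumptions, not code from the paper: the function name `edit_graph_distance` is hypothetical, horizontal and vertical edges are given weight 1, and diagonal edges (present only where symbols match) are given weight 0. Dijkstra's algorithm is run with Python's standard `heapq` module.

```python
import heapq


def edit_graph_distance(a, b):
    """Shortest-path cost from (0, 0) to (len(a), len(b)) in the edit graph.

    Hypothetical helper for illustration: unit-cost horizontal/vertical
    edges (delete/insert) and zero-cost diagonal edges on matching symbols.
    """
    n, m = len(a), len(b)
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # entries are (cost, x, y)
    while heap:
        d, x, y = heapq.heappop(heap)
        if (x, y) == (n, m):
            return d
        if d > dist[(x, y)]:
            continue  # stale heap entry
        edges = []
        if x < n:
            edges.append((x + 1, y, 1))      # delete a[x]
        if y < m:
            edges.append((x, y + 1, 1))      # insert b[y]
        if x < n and y < m and a[x] == b[y]:
            edges.append((x + 1, y + 1, 0))  # match: free diagonal
        for nx, ny, w in edges:
            nd = d + w
            if nd < dist.get((nx, ny), n + m + 1):
                dist[(nx, ny)] = nd
                heapq.heappush(heap, (nd, nx, ny))
    raise AssertionError("unreachable: the sink is always reachable")


if __name__ == "__main__":
    print(edit_graph_distance("ABCABBA", "CBABAC"))
```

Each vertex has at most three outgoing edges, which matches the E < 3V observation in the quoted passage. This generic version can visit every vertex of the graph in the worst case, which is the cost a specialized method aims to avoid.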

106 | An Algorithm for Differential File Comparison
- Hunt, McIlroy
- 1976
Citation Context ...rved as the basis for a new implementation of the UNIX diff program [15]. This version usually runs two to four times faster than the System 5 implementation based on the Hunt and Szymanski algorithm [10]. However, there are cases when D is large where their algorithm is superior (e.g. for files that are completely different, R=0 and D=2N). The linear space refinement is roughly twice as slow as the ba... |

74 |
The string-to-string correction problem with block moves
- Tichy
- 1984 |

64 | Bounds on the complexity of the longest common subsequence problem
- Aho, Hirschberg, et al.
- 1976 |

25 |
A longest common subsequence algorithm suitable for similar text strings
- Nakatsu, Kambayashi, et al.
- 1982 |

8 |
A redisplay algorithm
- Gosling
- 1981
Citation Context ... been studied extensively [1,8,11,13,16,19,20]. Algorithms for the problem have numerous applications, including spelling correction systems, file comparison tools, and the study of genetic evolution [4,5,17,18]. Formally, the problem statement is to find a longest common subsequence or, equivalently, to find the minimum "script" of symbol deletions and insertions that transform one sequence into the other... |

6 |
An information-theoretic lower bound for the longest common subsequence problem
- Hirschberg
- 1978
Citation Context ...e existence of faster algorithms using other comparison formats is still open. Indeed, for algorithms that use "less than-equal-greater than" comparisons, Ω(N lg N) time is the best lower bound known [9]. * This work was supported in part by the National Science Foundation under Grant MCS82-10096. Recent work improves upon the basic O(N²) time arbitrary alpha... |