Results 1-10 of 23
An O(ND) Difference Algorithm and Its Variations
 Algorithmica, 1986
Cited by 156 (4 self)
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed, where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D²) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D²) time variation.
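The greedy "furthest-reaching path" idea behind the O(ND) bound can be sketched as follows. This is a minimal Python illustration that returns only D (the number of insertions plus deletions), not the edit script itself. The function name `shortest_edit_length` and the dictionary `v` are illustrative choices rather than names taken from the paper, and the linear-space and suffix-tree refinements are not shown.

```python
def shortest_edit_length(a, b):
    """Return D, the size of a shortest insert/delete script turning a into b."""
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    # The sentinel v[1] = 0 lets the d = 0 round start from x = 0.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # move down: take a symbol from b
            else:
                x = v[k - 1] + 1      # move right: drop a symbol from a
            y = x - k
            # Follow the diagonal ("snake") while the sequences agree.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: the loop always terminates by d = n + m


if __name__ == "__main__":
    print(shortest_edit_length("ABCABBA", "CBABAC"))
```

Only the frontier of one furthest point per diagonal is stored, so each round d touches at most d + 1 diagonals; this is what makes the method fast when the two inputs are similar.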
Identifying the Semantic and Textual Differences Between Two Versions of a Program
 Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, 1990
Cited by 97 (6 self)
Text-based file comparators (e.g., the Unix utility diff) are very general tools that can be applied to arbitrary files. However, using such tools to compare programs can be unsatisfactory because their only notion of change is based on program text rather than program behavior. This paper describes a technique for comparing two versions of a program, determining which program components represent changes, and classifying each changed component as representing either a semantic or a textual change.
This work was supported in part by the Defense Advanced Research Projects Agency, monitored by the Office of Naval Research under contract N0001488K, by the National Science Foundation under grant CCR8958530, and by grants from Xerox, Kodak, and Cray. Author's address: Computer Sciences Department, Univ. of Wisconsin, 1210 W. Dayton St., Madison, WI 53706.
Identifying Syntactic Differences Between Two Programs
 Software: Practice and Experience, 1991
Cited by 80 (0 self)
This paper is organized into five sections, as follows. The internal form of a program, which is a variant of a parse tree, is discussed in the next section. Then the tree-matching algorithm and the synchronous pretty-printing technique are described. Experience with the comparator for the C language and some performance measurements are also presented. The last section discusses related work and concludes this paper.
A file comparison program
 Software: Practice and Experience, 1985
Cited by 60 (3 self)
This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command. KEY WORDS: Edit distance, Edit script, File comparison
Incremental String Comparison
 SIAM Journal on Computing, 1995
Cited by 38 (3 self)
The problem of comparing two sequences A and B to determine their LCS or the edit distance between them has been much studied. In this paper we consider the following incremental version of these problems: given an appropriate encoding of a comparison between A and B, can one incrementally compute the answer for A and bB, and the answer for A and Bb, with equal efficiency, where b is an additional symbol? Our main result is a theorem exposing a surprising relationship between the dynamic programming solutions for two such "adjacent" problems. Given a threshold k on the number of differences to be permitted in an alignment, the theorem leads directly to an O(k) algorithm for incrementally computing a new solution from an old one, as contrasted with the O(k²) time required to compute a solution from scratch. We further show with a series of applications that this algorithm is indeed more powerful than its nonincremental counterpart by solving the applications with greater asymptotic ef...
Longest Common Subsequences
 In Proc. of 19th MFCS, number 841 in LNCS, 1994
Cited by 29 (1 self)
The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations/errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the sub- and supersequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings.
1 Introduction. The set Σ* of finite strings over an unordered finite alphabet Σ admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to string-matching. An excellent survey is given by Aho in [1]. In this talk however we will focus on the `subsequence' partial order. We say that u = u_1 ··· u_m is a subsequence of ...
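The relationship between the LLCS and the edit distance mentioned above can be made concrete for the simplest case, where only insertions and deletions are allowed: the distance then equals len(u) + len(v) - 2·LLCS(u, v). The following Python sketch uses the textbook quadratic dynamic program, which is only one of the many algorithms such a survey would cover; the function names `llcs` and `indel_distance` are illustrative assumptions.

```python
def llcs(u, v):
    """Length of a longest common subsequence, by row-by-row dynamic programming."""
    prev = [0] * (len(v) + 1)          # LLCS of the empty prefix of u with each prefix of v
    for x in u:
        cur = [0]
        for j, y in enumerate(v, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))   # skip a symbol of u or of v
        prev = cur
    return prev[-1]


def indel_distance(u, v):
    """Insert/delete-only edit distance, derived from the LLCS."""
    return len(u) + len(v) - 2 * llcs(u, v)


if __name__ == "__main__":
    print(llcs("ABCABBA", "CBABAC"), indel_distance("ABCABBA", "CBABAC"))
```

Every symbol outside a longest common subsequence must be either deleted from u or inserted from v, which is where the identity comes from. It does not hold when substitutions are counted as single steps.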
New efficient algorithms for LCS and constrained LCS problem
 In Broersma et al
Cited by 11 (2 self)
In this paper, we study the classic and well-studied longest common subsequence (LCS) problem and a recent variant of it, namely the constrained LCS (CLCS) problem. In the CLCS problem, the computed LCS must also be a supersequence of a third given string. In this paper, we first present an efficient algorithm for the traditional LCS problem that runs in O(R log log n + n) time, where R is the total number of ordered pairs of positions at which the two strings match and n is the length of the two given strings. Then, using this algorithm, we devise an algorithm for the CLCS problem having time complexity O(pR log log n + n) in the worst case, where p is the length of the third string.
On the safety and efficiency of firewall policy deployment
 Proc. of IEEE Symposium on Security and Privacy, 2007
Cited by 9 (0 self)
Firewall policy management is challenging and error-prone. While ample research has led to tools for policy specification, correctness analysis, and optimization, few researchers have paid attention to firewall policy deployment: the process where a management tool edits a firewall’s configuration to make it run the policies specified in the tool. In this paper, we provide the first formal definition and theoretical analysis of safety in firewall policy deployment. We show that naive deployment approaches can easily create a temporary security hole by permitting illegal traffic, or interrupt service by rejecting legal traffic during the deployment. We define safe and most-efficient deployments, and introduce the shuffling theorem as a formal basis for constructing deployment algorithms and proving their safety. We present efficient algorithms for constructing most-efficient deployments in popular policy editing languages. We show that in certain widely installed policy editing languages, a safe deployment is not always possible. We also show how to leverage existing diff algorithms to guarantee a safe, most-efficient, and monotonic deployment in other editing languages.
Measuring the Accuracy of Page-Reading Systems
 Ph.D. dissertation, UNLV, Las Vegas, 1996
Cited by 9 (3 self)
Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This “OCR-generated” text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
Bit-Parallel LCS-length Computation Revisited
 In Proc. 15th Australasian Workshop on Combinatorial Algorithms (AWOCA), 2004
Cited by 8 (4 self)
The longest common subsequence (LCS) is a classic and well-studied measure of similarity between two strings A and B. This problem has two variants: determining the length of the LCS (LLCS), and recovering an LCS itself. In this paper we address the first of these two. Let m and n denote the lengths of the strings A and B, respectively, and w denote the computer word size. First we give a slightly improved formula for the bit-parallel O(⌈m/w⌉n) LLCS algorithm of Crochemore et al. [4]. Then we discuss the relative performance of the bit-parallel algorithms and compare our variant against one of the best conventional LLCS algorithms. Finally we propose and evaluate an O(⌈d/w⌉n) version of the algorithm, where d is the simple (indel) edit distance between A and B.
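A bit-parallel LLCS computation of this general kind can be sketched in Python as below. The update rule used here is the commonly cited form V' = (V + (V & M)) | (V & ~M) associated with Crochemore et al., not the improved formula or the O(⌈d/w⌉n) variant proposed in the paper. Python's arbitrary-length integers stand in for the ⌈m/w⌉ machine words, so the word-level blocking is an assumption left out of the sketch, and the name `llcs_bit_parallel` is illustrative.

```python
def llcs_bit_parallel(a, b):
    """LLCS of a and b using one bit per position of a."""
    m = len(a)
    mask = (1 << m) - 1                # keeps the bit-vector at exactly m bits

    # Match masks: bit i of pm[ch] is set when a[i] == ch.
    pm = {}
    for i, ch in enumerate(a):
        pm[ch] = pm.get(ch, 0) | (1 << i)

    v = mask                           # all ones: no common subsequence found yet
    for ch in b:
        match = pm.get(ch, 0)
        # The addition propagates carries through runs of ones, clearing one
        # bit for each position where the LLCS value increases.
        v = ((v + (v & match)) | (v & ~match)) & mask

    # Each zero bit in v marks one unit of LLCS.
    return m - bin(v).count("1")


if __name__ == "__main__":
    print(llcs_bit_parallel("ABCABBA", "CBABAC"))
```

Each symbol of B costs a constant number of operations on an m-bit vector, which corresponds to the O(⌈m/w⌉n) bound when the vector is split into words of size w.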