Identifying the Semantic and Textual Differences Between Two Versions of a Program
Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, 1990
Cited by 86 (5 self)
Text-based file comparators (e.g., the Unix utility diff) are very general tools that can be applied to arbitrary files. However, using such tools to compare programs can be unsatisfactory because their only notion of change is based on program text rather than program behavior. This paper describes a technique for comparing two versions of a program, determining which program components represent changes, and classifying each changed component as representing either a semantic or a textual change.
This work was supported in part by the Defense Advanced Research Projects Agency, monitored by the Office of Naval Research under contract N00014-88-K, by the National Science Foundation under grant CCR8958530, and by grants from Xerox, Kodak, and Cray. Author's address: Computer Sciences Department, Univ. of Wisconsin, 1210 W. Dayton St., Madison, WI 53706.
B.B.: On aligning curves
IEEE TPAMI, 2003
Cited by 69 (2 self)
Abstract—We present a novel approach to finding a correspondence (alignment) between two curves. The correspondence is based on a notion of an alignment curve which treats both curves symmetrically. We then define a similarity metric based on the alignment curve using two intrinsic properties of the curve, namely, length and curvature. The optimal correspondence is found by an efficient dynamic-programming method both for aligning pairs of curve segments and pairs of closed curves, and is effective in the presence of a variety of transformations of the curve. Finally, the correspondence is shown in application to handwritten character recognition, prototype formation, and object recognition, and is potentially useful in other applications such as registration and tracking. Index Terms—Curve alignment, recognition, dynamic programming, prototypes, correspondence.
On the Editing Distance between Undirected Acyclic Graphs
1995
Cited by 67 (6 self)
We consider the problem of comparing CUAL graphs (Connected, Undirected, Acyclic graphs with nodes being Labeled). This problem is motivated by the study of information retrieval for bio-chemical and molecular databases. Suppose we define the distance between two CUAL graphs G1 and G2 to be the weighted number of edit operations (insert node, delete node and relabel node) to transform G1 to G2. By reduction from exact cover by 3-sets, one can show that finding the distance between two CUAL graphs is NP-complete. In view of the hardness of the problem, we propose a constrained distance metric, called the degree-2 distance, by requiring that any node to be inserted (deleted) have no more than 2 neighbors. With this metric, we present an efficient algorithm to solve the problem. The algorithm runs in time O(N_1 N_2 D²) for general weighting edit operations and in time O(N_1 N_2 D √D log D) for integral weighting edit operations, where N_i, i = 1, 2, is the number of nodes in G_i, D = min{d_1, d_2} and d_i is the maximum degree of G_i.
Computing the edit-distance between unrooted ordered trees
In Proceedings of the 6th Annual European Symposium on Algorithms (ESA), 1998
Cited by 67 (0 self)
Abstract. An ordered tree is a tree in which each node’s incident edges are cyclically ordered; think of the tree as being embedded in the plane. Let A and B be two ordered trees. The edit distance between A and B is the minimum cost of a sequence of operations (contract an edge, uncontract an edge, modify the label of an edge) needed to transform A into B. We give an O(n³ log n) algorithm to compute the edit distance between two ordered trees.
Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons
In Proceedings of the Third Workshop on Very Large Corpora, 1995
Cited by 65 (20 self)
This paper shows how to induce an N-best translation lexicon from a bilingual text corpus using statistical properties of the corpus together with four external knowledge sources. The knowledge sources are cast as filters, so that any subset of them can be cascaded in a uniform framework. A new objective evaluation measure is used to compare the quality of lexicons induced with different filter cascades. The best filter cascades improve lexicon quality by up to 137% over the plain vanilla statistical method, and approach human performance. Drastically reducing the size of the training corpus has a much smaller impact on lexicon quality when these knowledge sources are used. This makes it practical to train on small hand-built corpora for language pairs where large bilingual corpora are unavailable. Moreover, three of the four filters prove useful even when used with large training corpora.
Identifying Syntactic Differences Between Two Programs
Software - Practice and Experience, 1991
Cited by 64 (0 self)
This paper is organized into five sections, as follows. The internal form of a program, which is a variant of a parse tree, is discussed in the next section. Then the tree-matching algorithm and the synchronous pretty-printing technique are described. Experience with the comparator for the C language and some performance measurements are also presented. The last section discusses related work and concludes this paper.
Approximate string matching over suffix trees
Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, number 684 in Lecture Notes in Computer Science, 1993
Cited by 53 (1 self)
The classical approximate string-matching problem of finding the locations of approximate occurrences P′ of pattern string P in text string T such that the edit distance between P and P′ is at most k is considered. We concentrate on the special case in which T is available for preprocessing before the searches with varying P and k. It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree. Three variations of the search algorithm are developed with running times O(mq + n), O(mq log q + size of the output), and O(m...
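The problem statement above can be illustrated with a much simpler baseline than the suffix-tree method the abstract describes. The following is a minimal sketch of the classical column-by-column dynamic programming scan over the text, which needs no preprocessing and runs in O(mn) time. The function name `approximate_matches` and the reading of the threshold as "edit distance at most k" are assumptions made for illustration, not details taken from the paper.

```python
def approximate_matches(pattern, text, k):
    """Return the 1-based end positions in `text` where some substring
    ending there is within edit distance k of `pattern`.

    Plain dynamic programming over the text (no suffix tree); unit cost
    for insertion, deletion, and substitution is assumed.
    """
    m = len(pattern)
    # column[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = column[0]  # value of column[i - 1] at the previous position
        column[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            current = min(
                prev_diag + cost,   # match or substitute
                column[i] + 1,      # extra character in the text
                column[i - 1] + 1,  # pattern character left unmatched
            )
            prev_diag = column[i]
            column[i] = current
        if column[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approximate_matches("abc", "xxabcxx", 0))
    print(approximate_matches("abc", "xabdx", 1))
```

Because the whole text is rescanned for every query, this baseline is the kind of cost that preprocessing T is meant to avoid when P and k vary.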
A file comparison program
Software: Practice and Experience, 1985
Cited by 51 (3 self)
This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command.
Key words: edit distance, edit script, file comparison
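To make "a shortest sequence of insertion and deletion commands" concrete, here is a minimal sketch that derives such a script from a longest-common-subsequence table. This is the textbook quadratic-time approach, not the method of the paper, whose advantage lies in running faster when the two files differ only slightly. The function name `edit_script` and the tuple format of the commands are assumptions made for illustration.

```python
def edit_script(a, b):
    """Return a shortest list of insert/delete commands turning sequence
    `a` into sequence `b`.

    Commands are ("delete", index_in_a, item) or ("insert", index_in_b, item).
    The script length equals len(a) + len(b) - 2 * LCS(a, b).
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of a longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, keeping common items and emitting
    # a command for every item that is not part of the chosen LCS.
    script = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            script.append(("delete", i, a[i]))
            i += 1
        else:
            script.append(("insert", j, b[j]))
            j += 1
    script.extend(("delete", p, a[p]) for p in range(i, n))
    script.extend(("insert", q, b[q]) for q in range(j, m))
    return script


if __name__ == "__main__":
    old_lines = ["a", "b", "c", "d"]
    new_lines = ["a", "c", "d", "e"]
    for command in edit_script(old_lines, new_lines):
        print(command)
```

In a file comparator the sequence items would be lines of text, so each command corresponds to one deleted or inserted line.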
Comparing hierarchical data in external memory
In 25th Very Large Data Base Conference (VLDB), 1999
Cited by 45 (2 self)
We present an external-memory algorithm for computing a minimum-cost edit script between two rooted, ordered, labeled trees. The I/O, RAM, and CPU costs of our algorithm are, respectively, 4mn + 7m + 5n, 6S, and O(MN + (M + N)S^1.5), where M and N are the input tree sizes, S is the block size, m = M/S, and n = N/S. This algorithm can make effective use of surplus RAM capacity to quadratically reduce I/O cost. We extend to trees the commonly used mapping from sequence comparison problems to shortest-path problems in edit graphs.
Block Edit Models for Approximate String Matching
Theoretical Computer Science, 1997
Cited by 44 (4 self)
In this paper we examine string block edit distance, in which two strings A and B are compared by extracting collections of substrings and placing them into correspondence. This model accounts for certain phenomena encountered in important real-world applications, including pen computing and molecular biology. The basic problem admits a family of variations depending on whether the strings must be matched in their entireties, and whether overlap is permitted. We show that several variants are NP-complete, and give polynomial-time algorithms for solving the remainder.
Keywords: block edit distance, approximate string matching, sequence comparison, approximate ink matching, dynamic programming.
1 Introduction
The edit distance model for string comparison [Lev66, NW70, WF74] has found widespread application in fields ranging from molecular biology to bird song classification [SK83]. A great deal of research has been devoted to this area, and numerous algorithms have been proposed for com...
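For readers unfamiliar with the classical edit distance model that the introduction refers to, the following is a minimal sketch of the standard character-level dynamic programming computation with unit costs. It illustrates only the baseline model, not the block edit distance studied in the paper. The function name `edit_distance` and the unit-cost assumption are illustrative choices.

```python
def edit_distance(a, b):
    """Classical edit distance between sequences `a` and `b`, assuming
    unit cost for each insertion, deletion, and substitution."""
    # prev[j] = distance between the prefix of `a` processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # keep or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each step edits a single character, which is why this model cannot charge a moved or repeated substring as one operation; block edit models address that case.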

