Results 1-10 of 48
An O(ND) Difference Algorithm and Its Variations
Algorithmica, 1986
Cited by 155 (4 self)
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D^2) time variation.
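A minimal Python sketch of the greedy, furthest-reaching search over edit-graph diagonals that the abstract describes, assuming D counts only insertions and deletions. The function name `shortest_edit_distance` and the dictionary `v` are illustrative choices, not taken from the paper, and the sketch returns only D rather than the edit script itself.

```python
def shortest_edit_distance(a, b):
    """Return D, the number of insertions plus deletions needed to turn a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the furthest x reached so far on diagonal k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # vertical edge: insert a symbol of b
            else:
                x = v[k - 1] + 1      # horizontal edge: delete a symbol of a
            y = x - k
            # Follow the "snake" of matching symbols at no cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(shortest_edit_distance("ABCABBA", "CBABAC"))
```

Because each round d touches at most d + 1 diagonals, the work grows with D, which is why similar inputs are handled quickly.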
A survey on tree edit distance and related problems
Theor. Comput. Sci., 2005
Cited by 112 (2 self)
We survey the problem of comparing labeled trees based on simple local operations of deleting, inserting, and relabeling nodes. These operations lead to the tree edit distance, alignment distance, and inclusion problem. For each problem we review the results available and present, in detail, one or more of the central algorithms for solving the problem. Keywords: tree matching, edit distance
A file comparison program
Software: Practice and Experience, 1985
Cited by 59 (3 self)
This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command. KEY WORDS: Edit distance, Edit script, File comparison
Clickstream Clustering Using Weighted Longest Common Subsequences
In Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining, 2001
Cited by 53 (5 self)
Categorizing visitors based on their interactions with a website is a key problem in web usage mining. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customized content. In this paper, we propose a novel and effective algorithm for clustering web users based on a function of the longest common subsequence of their clickstreams that takes into account both the trajectory taken through a website and the time spent at each page. Results are presented on weblogs of www.sulekha.com to illustrate the techniques.
Comparing hierarchical data in external memory
In 25th Very Large Data Base Conference (VLDB), 1999
Cited by 51 (2 self)
We present an external-memory algorithm for computing a minimum-cost edit script between two rooted, ordered, labeled trees. The I/O, RAM, and CPU costs of our algorithm are, respectively, 4mn + 7m + 5n, 6S, and O(MN + (M + N)S^1.5), where M and N are the input tree sizes, S is the block size, m = M/S, and n = N/S. This algorithm can make effective use of surplus RAM capacity to quadratically reduce I/O cost. We extend to trees the commonly used mapping from sequence comparison problems to shortest-path problems in edit graphs.
Secure and Private Sequence Comparisons
In WPES'03: Proceedings of the 2003 ACM Workshop on Privacy in the Electronic Society, 2003
Cited by 31 (7 self)
We give an efficient protocol for sequence comparisons of the edit-distance kind, such that neither party reveals anything about their private sequence to the other party (other than what can be inferred from the edit distance between their two sequences, which is unavoidable because computing that distance is the purpose of the protocol). The amount of communication done by our protocol is proportional to the time complexity of the best-known algorithm for performing the sequence comparison.
Longest Common Subsequences
In Proc. of 19th MFCS, number 841 in LNCS, 1994
Cited by 29 (1 self)
The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the 'edit distance', or number of mutations/errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the sub- and supersequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings.
1 Introduction
The set Σ* of finite strings over an unordered finite alphabet Σ admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to string-matching. An excellent survey is given by Aho in [1]. In this talk however we will focus on the 'subsequence' partial order. We say that u = u_1 ··· u_m is a subsequence of ...
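The relation between LLCS and edit distance mentioned above can be sketched in Python with the classical dynamic-programming recurrence. This is a sketch under one assumption: the edit distance counts only insertions and deletions, in which case it equals len(u) + len(v) - 2 * LLCS(u, v). The names `llcs` and `indel_distance` are illustrative and do not come from the paper.

```python
def llcs(u, v):
    """Length of a longest common subsequence of u and v (row-by-row DP)."""
    prev = [0] * (len(v) + 1)
    for ch in u:
        curr = [0]
        for j, other in enumerate(v, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)               # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))     # drop a symbol from u or v
        prev = curr
    return prev[-1]


def indel_distance(u, v):
    """Insertions plus deletions needed to turn u into v (assumed cost model)."""
    return len(u) + len(v) - 2 * llcs(u, v)


print(llcs("ABCBDAB", "BDCABA"), indel_distance("ABCABBA", "CBABAC"))
```

Keeping only two rows keeps the memory proportional to the length of `v`, at the price of not recovering the subsequence itself.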
On the Parameterized Complexity of the fixed Alphabet Shortest Common Supersequence and Longest Common Subsequence Problems
2003
Cited by 26 (0 self)
INTRODUCTION The Shortest Common Supersequence (SCS) and the Longest Common Subsequence (LCS) are classical problems in computer science. Shortest Common Supersequence (SCS) integer . is a supersequence Longest Common Subsequence (LCS) integer . is a subsequence The LCS and (not so much) the SCS problems have been extensively studied over the last 30 years (see [7] and references). They are both known to be NP-complete [8, 9]. In particular the case where the number of sequences is 2 has been studied in detail (see [7] and references). A string a is a supersequence of a string b if we can delete some characters in a such that the remaining string is equal to b, e.g. "1234" is a supersequence of "13". A string a is a subsequence of a string b if b is a supersequence of a, e.g. "13" is a subsequence of "1234". 1.1. Sequence Comparison in Bioinformatics With the recent availability of large amounts of molecular sequence data, the LCS and related problems received
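The subsequence and supersequence definitions given in this abstract can be checked directly. The following Python sketch uses the hypothetical helper names `is_subsequence` and `is_supersequence`, which are not from the paper, and reuses the abstract's own "13" and "1234" example.

```python
def is_subsequence(a, b):
    """True if a can be obtained from b by deleting zero or more characters."""
    remaining = iter(b)
    # `ch in remaining` advances the iterator past the first match,
    # so the characters of a must appear in b in the same order.
    return all(ch in remaining for ch in a)


def is_supersequence(a, b):
    """True if a is a supersequence of b, i.e. b is a subsequence of a."""
    return is_subsequence(b, a)


print(is_supersequence("1234", "13"), is_subsequence("13", "1234"))
```

The check is a single left-to-right scan of the longer string, so testing one candidate pair is cheap; the hardness result quoted above concerns finding an optimal common sequence over many strings.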
A Fast and Practical Bit-Vector Algorithm for the Longest Common Subsequence Problem
Information Processing Letters, 2000
Cited by 25 (2 self)
This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of length m and n, n ≥ m, we present an algorithm which determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. This algorithm can be thought of as column-wise "parallelization" of the classical dynamic programming approach. Our algorithm is very efficient in practice, where computing the length of an LCS of two strings can be done in linear time and constant (additional/working) space by assuming that m ≤ w.
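A Python sketch of a bit-parallel LLCS computation in this spirit. It uses a commonly described form of the column update, V' = (V + (V AND M)) OR (V AND NOT M), which may differ in presentation from the paper's own. Python's unbounded integers stand in for machine words here, so the sketch illustrates the recurrence rather than the O(nm/w) word-level cost; the name `llcs_bitvector` is illustrative.

```python
def llcs_bitvector(x, y):
    """Length of a longest common subsequence, one big-integer update per symbol."""
    if len(x) > len(y):
        x, y = y, x                     # keep the bit vector on the shorter string
    m = len(x)
    mask = (1 << m) - 1
    # match[c] has bit i set exactly when x[i] == c.
    match = {}
    for i, ch in enumerate(x):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask                            # all ones: no matches recorded yet
    for ch in y:
        u = v & match.get(ch, 0)
        # v - u equals v AND NOT match[ch] because u is a subset of v.
        v = ((v + u) | (v - u)) & mask
    # Each zero bit of v marks one increase in the DP column.
    return m - bin(v).count("1")


print(llcs_bitvector("ABCBDAB", "BDCABA"))
```

When the shorter string fits in one machine word, each loop iteration corresponds to a constant number of word operations, which matches the linear-time remark in the abstract.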
Matching for Run-Length Encoded Strings
1999
Cited by 22 (2 self)
In this paper, we develop significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems. A string S is run-length encoded if it is described as an ordered sequence of pairs (σ, i), each consisting of an alphabet symbol σ and an integer i. Each pair corresponds to a run in S consisting of i consecutive occurrences of σ. For example, the string