Results 1  10
of
10
Longest Common Subsequences
 In Proc. of 19th MFCS, number 841 in LNCS
, 1994
"... . The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations /errors/editing steps required in passing from one string to the other. In this talk, we explore ..."
Abstract

Cited by 29 (1 self)
 Add to MetaCart
. The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations /errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the suband supersequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings. 1 Introduction The set \Sigma of finite strings over an unordered finite alphabet \Sigma admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to stringmatching. An excellent survey is given by Aho in [1]. In this talk however we will focus on the `subsequence' partial order. We say that u = u 1 \Delta \Delta \Delta um is a subsequence of ...
Expected Length of Longest Common Subsequences
"... Contents 1 Introduction 1 2 Notation and preliminaries 4 2.1 Notation and basic definitions : : : : : : : : : : : : : : : : : : 4 2.2 Longest common subsequences : : : : : : : : : : : : : : : : : : 7 2.3 Computing longest common subsequences : : : : : : : : : : : 10 2.4 Expected length of longest c ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
Contents 1 Introduction 1 2 Notation and preliminaries 4 2.1 Notation and basic definitions : : : : : : : : : : : : : : : : : : 4 2.2 Longest common subsequences : : : : : : : : : : : : : : : : : : 7 2.3 Computing longest common subsequences : : : : : : : : : : : 10 2.4 Expected length of longest common subsequences : : : : : : : 14 3 Lower Bounds 20 3.1 Css machines : : : : : : : : : : : : : : : : : : : : : : : : : : : 20 3.2 Analysis of css machines : : : : : : : : : : : : : : : : : : : : : 26 3.3 Design of css machines : : : : : : : : : : : : : : : : : : : : : : 31 3.4 Labeled css machines : : : : : : : : : : : : : : : : : : : : : : : 38 4 Upper bounds 45 4.1 Collations : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 45 4.2 Previous upper bounds : : : : : : : : : : : : : : : : : : : : : : 51 4.3 Simple upper bound (binary alphabet) : : : : : : : : : : : : : 55 4.4 Simple upper bound (alphabet size 3) : : : : : : : : : : : : : : 59 4.5 Upper bounds for binary alphabet : :
Serial Computations of Levenshtein Distances
, 1997
"... sequence (LCS) of those strings. If D is the simple Levenshtein distance between two strings having lengths m and n, SES is the length of the shortest edit sequence between the strings, and L is the length of an LCS of the strings, then SES = D and L = (m + n 0D)=2. We will focus on the problem of ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
sequence (LCS) of those strings. If D is the simple Levenshtein distance between two strings having lengths m and n, SES is the length of the shortest edit sequence between the strings, and L is the length of an LCS of the strings, then SES = D and L = (m + n 0D)=2. We will focus on the problem of determining the length of an LCS and also on the related problem of recovering an LCS. Another related problem, which will be discussed in Chapter 7, is that of approximate string matching, in which it is desired to locate all positions within string y which begin an approximation to string x containing at most D errors (insertions or deletions). 124 SERIAL COMPUTATIONS OF LEVENSHTEIN DISTANCES procedure CLASSIC( x,<
Measuring the Accuracy of PageReading Systems
 PH.D. DISSERTATION, UNLV, LAS VEGAS
, 1996
"... Given a bitmapped image of a page from any document, a pagereading system identifies the characters on the page and stores them in a text file. This “OCRgenerated” text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
Given a bitmapped image of a page from any document, a pagereading system identifies the characters on the page and stores them in a text file. This “OCRgenerated” text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of pagereading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, nonstopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of a LCS can be made faster by a lineartime preprocessing step.
Sequence Comparison: Some Theory and Some Practice
, 1988
"... A brief survey of the theory and practice of sequence comparison is made focusing on diff, the UNIX 1 file difference utility. 1 Sequence comparison Sequence comparison is a deep and fascinating subject in Computer Science, both theoretical and practical. However, in our opinion, neither the theo ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
A brief survey of the theory and practice of sequence comparison is made focusing on diff, the UNIX 1 file difference utility. 1 Sequence comparison Sequence comparison is a deep and fascinating subject in Computer Science, both theoretical and practical. However, in our opinion, neither the theoretical nor the practical aspects of the problem are well understood and we feel that their mastery is a true challenge for Computer Science. The central problem can be stated very easily: find an algorithm, as efficient and practical as possible, to compute a longest common subsequence (lcs for short) of two given sequences 2 . As usual, a subsequence of a sequence is another sequence obtained from it by deleting some (not necessarily contiguous) terms. Thus, both en/pri and en/pai are longest common subsequences of sequence/comparison and theory/and/practice. Part of this work was done while the author was visiting the Universit'e de Rouen, in 1987. That visit was partially supported...
New Algorithms for the Longest Common Subsequence Problem
, 1994
"... Given two sequences A = a 1 a 2 : : : am and B = b 1 b 2 : : : b n , m n, over some alphabet \Sigma, a common subsequence C = c 1 c 2 : : : c l of A and B is a sequence that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols. Finding a common subsequenc ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
Given two sequences A = a 1 a 2 : : : am and B = b 1 b 2 : : : b n , m n, over some alphabet \Sigma, a common subsequence C = c 1 c 2 : : : c l of A and B is a sequence that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols. Finding a common subsequence of maximal length is called the Longest CommonSubsequence (LCS) Problem. Two new algorithms based on the wellknown paradigm of computing minimal matches are presented. One runs in time O(ns+minfds; pmg) and the other runs in time O(ns +minfp(n \Gamma p); pmg) where s = j\Sigmaj is the alphabet size, p is the length of a longest common subsequence and d is the number of minimal matches. The ns term is charged by a standard preprocessing phase. When m n both algorithms are fast in situations when a LCS is expected to be short as well as in situations when a LCS is expected to be long. Further they show a much smaller degeneration in intermediate situations, especially the second al...
Speedingup Hirschberg and HuntSzymanski LCS Algorithms
, 2003
"... Two algorithms are presented that solve the problem of recovering the longest common subsequence of two strings. The first algorithm is an improvement of Hirschberg’s divideandconquer algorithm. The second algorithm is an improvement of HuntSzymanski algorithm based on an efficient computation of ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
Two algorithms are presented that solve the problem of recovering the longest common subsequence of two strings. The first algorithm is an improvement of Hirschberg’s divideandconquer algorithm. The second algorithm is an improvement of HuntSzymanski algorithm based on an efficient computation of all dominant match points. These two algorithms use bitvector operations and are shown to work very efficiently in practice.
Bounds on the number of longest common subsequences
"... This paper performs the analysis necessary to bound the running time of known, efficient algorithms for generating all longest common subsequences. That is, we bound the running time as a function of input size for algorithms with time essentially proportional to the output size. This paper consider ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
This paper performs the analysis necessary to bound the running time of known, efficient algorithms for generating all longest common subsequences. That is, we bound the running time as a function of input size for algorithms with time essentially proportional to the output size. This paper considers both the case of computing all distinct LCSs and the case of computing all LCS embeddings. Also included is an analysis of how much better the efficient algorithms are than the standard method of generating LCS embeddings. A full analysis is carried out with running times measured as a function of the total number of input characters, and much of the analysis is also provided for cases in which the two input sequences are of the same specified length or of two independently specified lengths.
Fast and simple computation of all longest common subsequences
 Eprint arXiv:cs.DS/0211001, Comp. Sci. Res. Repository
, 2002
"... This paper shows that a simple algorithm produces the allprefixesLCSsgraph in O(mn) time for two input sequences of size m and n. Given any prefix p of the first input sequence and any prefix q of the second input sequence, all longest common subsequences (LCSs) of p and q can be generated in tim ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
This paper shows that a simple algorithm produces the allprefixesLCSsgraph in O(mn) time for two input sequences of size m and n. Given any prefix p of the first input sequence and any prefix q of the second input sequence, all longest common subsequences (LCSs) of p and q can be generated in time proportional to the output size, once the allprefixesLCSsgraph has been constructed. The problem can be solved in the context of generating all the distinct character strings that represent an LCS or in the context of generating all ways of embedding an LCS in the two input strings.
A New Practical Linear Space Algorithm for the Longest Common Subsequence Problem
"... This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths m and n, m, on an alphabet of size s, we first present an algorithm which determines the length p of an LCS in O(ns + min{mp, p(n p)}) time and O(ns) space. ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths m and n, m, on an alphabet of size s, we first present an algorithm which determines the length p of an LCS in O(ns + min{mp, p(n p)}) time and O(ns) space.