Longest Common Subsequences
- In Proc. of 19th MFCS, number 841 in LNCS
, 1994
Cited by 25 (1 self)
The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations/errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the sub- and super-sequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings.
1 Introduction
The set $\Sigma^*$ of finite strings over an unordered finite alphabet $\Sigma$ admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to string-matching. An excellent survey is given by Aho in [1]. In this talk however we will focus on the `subsequence' partial order. We say that $u = u_1 \cdots u_m$ is a subsequence of ...
Expected Length of Longest Common Subsequences
Cited by 17 (2 self)
Contents
1 Introduction
2 Notation and preliminaries
2.1 Notation and basic definitions
2.2 Longest common subsequences
2.3 Computing longest common subsequences
2.4 Expected length of longest common subsequences
3 Lower Bounds
3.1 Css machines
3.2 Analysis of css machines
3.3 Design of css machines
3.4 Labeled css machines
4 Upper bounds
4.1 Collations
4.2 Previous upper bounds
4.3 Simple upper bound (binary alphabet)
4.4 Simple upper bound (alphabet size 3)
4.5 Upper bounds for binary alphabet ...
Serial Computations of Levenshtein Distances
, 1997
Cited by 13 (0 self)
sequence (LCS) of those strings. If $D$ is the simple Levenshtein distance between two strings having lengths $m$ and $n$, SES is the length of the shortest edit sequence between the strings, and $L$ is the length of an LCS of the strings, then SES $= D$ and $L = (m + n - D)/2$. We will focus on the problem of determining the length of an LCS and also on the related problem of recovering an LCS. Another related problem, which will be discussed in Chapter 7, is that of approximate string matching, in which it is desired to locate all positions within string y which begin an approximation to string x containing at most D errors (insertions or deletions).
procedure CLASSIC( x,<
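The identity relating the LCS length to the distance can be checked numerically. The sketch below is not the CLASSIC procedure that the excerpt begins to list; it assumes that the "simple" Levenshtein distance counts only single-character insertions and deletions, as the parenthetical in the abstract suggests, and the function names `lcs_length` and `indel_distance` are illustrative choices rather than names from the source.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the textbook row-by-row DP."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(x: str, y: str) -> int:
    """Minimum number of insertions and deletions turning x into y
    (assumed here to be the 'simple' Levenshtein distance D)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1])
            else:
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Check L = (m + n - D) / 2 on a small pair of strings.
x, y = "ABCBDAB", "BDCABA"
m, n = len(x), len(y)
assert 2 * lcs_length(x, y) == m + n - indel_distance(x, y)
```

Both functions keep only two rows of the table, so they use O(n) space, at the cost of not being able to recover an actual LCS.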
Measuring the Accuracy of Page-Reading Systems
- PH.D. DISSERTATION, UNLV, LAS VEGAS
, 1996
Cited by 8 (3 self)
Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This “OCR-generated” text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of a LCS can be made faster by a linear-time preprocessing step.
Sequence Comparison: Some Theory and Some Practice
, 1988
Cited by 7 (0 self)
A brief survey of the theory and practice of sequence comparison is made focusing on diff, the UNIX file difference utility.
1 Sequence comparison
Sequence comparison is a deep and fascinating subject in Computer Science, both theoretical and practical. However, in our opinion, neither the theoretical nor the practical aspects of the problem are well understood and we feel that their mastery is a true challenge for Computer Science. The central problem can be stated very easily: find an algorithm, as efficient and practical as possible, to compute a longest common subsequence (lcs for short) of two given sequences. As usual, a subsequence of a sequence is another sequence obtained from it by deleting some (not necessarily contiguous) terms. Thus, both en/pri and en/pai are longest common subsequences of sequence/comparison and theory/and/practice. Part of this work was done while the author was visiting the Université de Rouen, in 1987. That visit was partially supported...
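The definition of a subsequence given above translates directly into a short check. The following is a minimal sketch; the function name `is_subsequence` is an illustrative choice, and the test strings are copied exactly as they appear in the abstract, slashes included.

```python
def is_subsequence(u: str, v: str) -> bool:
    """True if u can be obtained from v by deleting zero or more
    (not necessarily contiguous) characters."""
    remaining = iter(v)
    # `ch in remaining` advances the iterator past the first match,
    # so the characters of u must be found in order.
    return all(ch in remaining for ch in u)


first, second = "sequence/comparison", "theory/and/practice"
for candidate in ("en/pri", "en/pai"):
    assert is_subsequence(candidate, first)
    assert is_subsequence(candidate, second)
```

This confirms only that both candidates are common subsequences of the two strings; showing that no longer common subsequence exists requires an LCS computation.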
New Algorithms for the Longest Common Subsequence Problem
, 1994
Cited by 6 (0 self)
Given two sequences $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$, $m \le n$, over some alphabet $\Sigma$, a common subsequence $C = c_1 c_2 \ldots c_l$ of $A$ and $B$ is a sequence that can be obtained from both $A$ and $B$ by deleting zero or more (not necessarily adjacent) symbols. Finding a common subsequence of maximal length is called the Longest Common Subsequence (LCS) Problem. Two new algorithms based on the well-known paradigm of computing minimal matches are presented. One runs in time $O(ns + \min\{ds, pm\})$ and the other runs in time $O(ns + \min\{p(n - p), pm\})$, where $s = |\Sigma|$ is the alphabet size, $p$ is the length of a longest common subsequence and $d$ is the number of minimal matches. The $ns$ term is charged by a standard preprocessing phase. When m n both algorithms are fast in situations when a LCS is expected to be short as well as in situations when a LCS is expected to be long. Further they show a much smaller degeneration in intermediate situations, especially the second al...
Speeding-up Hirschberg and Hunt-Szymanski LCS Algorithms
, 2003
Cited by 5 (0 self)
Two algorithms are presented that solve the problem of recovering the longest common subsequence of two strings. The first algorithm is an improvement of Hirschberg’s divide-and-conquer algorithm. The second algorithm is an improvement of the Hunt-Szymanski algorithm based on an efficient computation of all dominant match points. These two algorithms use bit-vector operations and are shown to work very efficiently in practice.
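To give a feel for what bit-vector operations buy in this setting, here is a hedged sketch of a commonly described bit-parallel recurrence for the LCS length. It is not either of the two algorithms of this paper: it computes only the length, does not recover a subsequence, and the function name `llcs_bitparallel` is an illustrative choice. Python integers stand in for machine words, so an explicit mask keeps the vector to `len(a)` bits.

```python
def llcs_bitparallel(a: str, b: str) -> int:
    """LCS length via one add, one subtract and two bitwise ops per character of b."""
    m = len(a)
    mask = (1 << m) - 1

    # match[c] has bit i set exactly when a[i] == c.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        # The carry from the addition clears one bit for each new match step.
        v = ((v + u) | (v - u)) & mask

    # Each zero bit among the low m bits corresponds to one unit of LCS length.
    return m - bin(v).count("1")


assert llcs_bitparallel("ABCBDAB", "BDCABA") == 4
```

When `a` fits in a machine word, each character of `b` costs a constant number of word operations, which is the source of the practical speed-up such methods report.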

