Longest Common Subsequences
In Proc. of 19th MFCS, number 841 in LNCS, 1994
Cited by 29 (1 self)
The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the 'edit distance', or number of mutations/errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the sub- and supersequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings.
1 Introduction
The set $\Sigma^*$ of finite strings over an unordered finite alphabet $\Sigma$ admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to string matching. An excellent survey is given by Aho in [1]. In this talk, however, we will focus on the 'subsequence' partial order. We say that $u = u_1 \cdots u_m$ is a subsequence of ...
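The relation between the LLCS and edit distance mentioned in the abstract above can be made concrete with a minimal sketch. It assumes the textbook O(mn) dynamic program (not any specific algorithm surveyed in the talk) and an edit distance that allows only single-character insertions and deletions, for which the distance equals m + n - 2·LLCS. The function names `llcs` and `indel_distance` are illustrative, not taken from the paper.

```python
def llcs(a: str, b: str) -> int:
    """Length of a longest common subsequence via the textbook O(mn) table."""
    m, n = len(a), len(b)
    # table[i][j] holds the LLCS of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def indel_distance(a: str, b: str) -> int:
    """Edit distance when only insertions and deletions are allowed (assumption)."""
    return len(a) + len(b) - 2 * llcs(a, b)


print(llcs("ABCBDAB", "BDCABA"))            # 4
print(indel_distance("ABCBDAB", "BDCABA"))  # 5
```

Every character outside a fixed longest common subsequence is either deleted from the first string or inserted from the second, which is where the m + n - 2·LLCS count comes from.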
Expected Length of Longest Common Subsequences
Cited by 19 (2 self)
Contents
1 Introduction
2 Notation and preliminaries
  2.1 Notation and basic definitions
  2.2 Longest common subsequences
  2.3 Computing longest common subsequences
  2.4 Expected length of longest common subsequences
3 Lower Bounds
  3.1 Css machines
  3.2 Analysis of css machines
  3.3 Design of css machines
  3.4 Labeled css machines
4 Upper bounds
  4.1 Collations
  4.2 Previous upper bounds
  4.3 Simple upper bound (binary alphabet)
  4.4 Simple upper bound (alphabet size 3)
  4.5 Upper bounds for binary alphabet
Cache-oblivious dynamic programming
In Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’06, 2006
Cited by 16 (5 self)
We present efficient cache-oblivious algorithms for several fundamental dynamic programs. These include new algorithms with improved cache performance for longest common subsequence (LCS), edit distance, gap (i.e., edit distance with gaps), and least weight subsequence. We present a new cache-oblivious framework called the Gaussian Elimination Paradigm (GEP) for Gaussian elimination without pivoting that also gives cache-oblivious algorithms for Floyd-Warshall all-pairs shortest paths in graphs and ‘simple DP’, among other problems.
OCRSpell: an interactive spelling correction system for OCR errors in text
International Journal of Document Analysis and Recognition, 2001
Cited by 16 (1 self)
In this paper, we describe a spelling correction system designed specifically for OCR-generated text that selects candidate words through the use of information gathered from multiple knowledge sources. This system for text correction is based on static and dynamic device mappings, approximate string matching, and n-gram analysis. Our statistically based, Bayesian system incorporates a learning feature that collects confusion information at the collection and document levels. An evaluation of the new system is presented as well.
Key words: OCRSpell checkers – Information retrieval – Error correction – Scanning
Differential Compression: A Generalized Solution For Binary Files
1996
Cited by 15 (0 self)
Differential Compression: A Generalized Solution for Binary Files, by Randal C. Burns. This work presents the development and analysis of a family of algorithms for generating differentially compressed output from binary sources. The algorithms all perform the same fundamental task: given two versions of the same data as input streams, generate and output a compact encoding of one of the input streams by representing it as a set of changes with respect to the other input stream. Differential compression provides a computationally efficient compression technique for applications that generate versioned data, and we often expect differencing to produce a significantly more compact file than more traditional compression techniques. The greedy algorithm for file differencing is presented, and this algorithm is proven to produce the optimally compressed differential output. However, this algorithm requires execution time quadratic in the size of the input files. We next present an algorithm...
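The task described above, encoding one version as a set of changes against another, can be illustrated with a small sketch. This is not Burns's greedy algorithm or its encoding: the command format (`"copy"`, `"add"`), the `min_match` threshold, and both function names are assumptions made for illustration, and the brute-force match search is slower than the quadratic bound quoted in the abstract.

```python
def greedy_delta(reference: bytes, version: bytes, min_match: int = 4):
    """Encode `version` as copy/add commands against `reference` (naive sketch)."""
    commands = []
    literal = bytearray()  # bytes with no sufficiently long match in the reference
    i = 0
    while i < len(version):
        best_off, best_len = 0, 0
        # Brute-force search for the longest match starting at version[i].
        for off in range(len(reference)):
            length = 0
            while (off + length < len(reference)
                   and i + length < len(version)
                   and reference[off + length] == version[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        if best_len >= min_match:
            if literal:
                commands.append(("add", bytes(literal)))
                literal = bytearray()
            commands.append(("copy", best_off, best_len))
            i += best_len
        else:
            literal.append(version[i])
            i += 1
    if literal:
        commands.append(("add", bytes(literal)))
    return commands


def apply_delta(reference: bytes, commands) -> bytes:
    """Rebuild the version file from the reference and the command list."""
    out = bytearray()
    for cmd in commands:
        if cmd[0] == "copy":
            _, off, length = cmd
            out += reference[off:off + length]
        else:
            out += cmd[1]
    return bytes(out)


ref = b"the quick brown fox jumps"
ver = b"the quick red fox jumps high"
delta = greedy_delta(ref, ver)
assert apply_delta(ref, delta) == ver
```

Only the literal bytes and the (offset, length) pairs need to be stored, which is why a delta can be much smaller than the version file when the two inputs share long runs.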
Measuring the Accuracy of Page-Reading Systems
Ph.D. dissertation, UNLV, Las Vegas, 1996
Cited by 9 (3 self)
Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This “OCR-generated” text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
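As a rough illustration of a measure defined in terms of string edit distance, the sketch below computes a character accuracy score. Two things here are assumptions not stated in the abstract: unit costs for insertion, deletion, and substitution, and the formula (n - errors) / n, where n is the length of the correct string. The function names are hypothetical.

```python
def edit_distance(correct: str, generated: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), one row at a time."""
    prev = list(range(len(generated) + 1))
    for i, c in enumerate(correct, 1):
        cur = [i]
        for j, g in enumerate(generated, 1):
            cur.append(min(prev[j] + 1,              # delete c
                           cur[j - 1] + 1,           # insert g
                           prev[j - 1] + (c != g)))  # substitute or match
        prev = cur
    return prev[-1]


def character_accuracy(correct: str, generated: str) -> float:
    """(n - errors) / n, with n the length of the correct text (assumed formula)."""
    n = len(correct)
    if n == 0:
        raise ValueError("correct text must be non-empty")
    return (n - edit_distance(correct, generated)) / n


print(edit_distance("kitten", "sitting"))  # 3
print(character_accuracy("recognition", "rec0gnition"))
```

Under this formula the score can fall below zero when the generated text contains many spurious characters, since each of them counts as an error.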
New Algorithms for the Longest Common Subsequence Problem
1994
Cited by 6 (0 self)
Given two sequences $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$, $m \le n$, over some alphabet $\Sigma$, a common subsequence $C = c_1 c_2 \ldots c_l$ of $A$ and $B$ is a sequence that can be obtained from both $A$ and $B$ by deleting zero or more (not necessarily adjacent) symbols. Finding a common subsequence of maximal length is called the Longest Common Subsequence (LCS) Problem. Two new algorithms based on the well-known paradigm of computing minimal matches are presented. One runs in time $O(ns + \min\{ds, pm\})$ and the other runs in time $O(ns + \min\{p(n - p), pm\})$, where $s = |\Sigma|$ is the alphabet size, $p$ is the length of a longest common subsequence and $d$ is the number of minimal matches. The $ns$ term is charged by a standard preprocessing phase. When m n both algorithms are fast in situations when an LCS is expected to be short as well as in situations when an LCS is expected to be long. Further they show a much smaller degeneration in intermediate situations, especially the second al...
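The definition of a common subsequence given above can be checked directly. The sketch below is only a test of the definition, not one of the two algorithms the abstract announces; the function names are illustrative.

```python
def is_subsequence(c: str, s: str) -> bool:
    """True if c can be obtained from s by deleting zero or more symbols."""
    remaining = iter(s)
    # `ch in remaining` consumes the iterator up to the first match, so
    # successive symbols of c must appear in s in the same left-to-right order.
    return all(ch in remaining for ch in c)


def is_common_subsequence(c: str, a: str, b: str) -> bool:
    """True if c is a subsequence of both a and b."""
    return is_subsequence(c, a) and is_subsequence(c, b)


print(is_common_subsequence("BCBA", "ABCBDAB", "BDCABA"))  # True
print(is_common_subsequence("BDCB", "ABCBDAB", "BDCABA"))  # False
```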
Bounds on the number of longest common subsequences
Cited by 4 (2 self)
This paper performs the analysis necessary to bound the running time of known, efficient algorithms for generating all longest common subsequences. That is, we bound the running time as a function of input size for algorithms with time essentially proportional to the output size. This paper considers both the case of computing all distinct LCSs and the case of computing all LCS embeddings. Also included is an analysis of how much better the efficient algorithms are than the standard method of generating LCS embeddings. A full analysis is carried out with running times measured as a function of the total number of input characters, and much of the analysis is also provided for cases in which the two input sequences are of the same specified length or of two independently specified lengths.
Fast and simple computation of all longest common subsequences
Eprint arXiv:cs.DS/0211001, Comp. Sci. Res. Repository, 2002
Cited by 3 (2 self)
This paper shows that a simple algorithm produces the all-prefixes-LCSs-graph in O(mn) time for two input sequences of size m and n. Given any prefix p of the first input sequence and any prefix q of the second input sequence, all longest common subsequences (LCSs) of p and q can be generated in time proportional to the output size, once the all-prefixes-LCSs-graph has been constructed. The problem can be solved in the context of generating all the distinct character strings that represent an LCS or in the context of generating all ways of embedding an LCS in the two input strings.
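For the first of the two contexts above, generating all distinct LCS strings, a minimal sketch follows. It is not the paper's all-prefixes-LCSs-graph construction and does not run in time proportional to the output size: it fills the standard O(mn) length table and then backtracks with memoized sets of strings. The name `all_distinct_lcs` is hypothetical, and the recursion is only suitable for short inputs.

```python
from functools import lru_cache


def all_distinct_lcs(a: str, b: str) -> set:
    """Return the set of distinct longest common subsequences of a and b."""
    m, n = len(a), len(b)
    # length[i][j] is the LCS length of the prefixes a[:i] and b[:j].
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    @lru_cache(maxsize=None)
    def collect(i: int, j: int) -> frozenset:
        if i == 0 or j == 0:
            return frozenset({""})
        if a[i - 1] == b[j - 1]:
            # When the last symbols agree, every LCS of the prefixes ends with it.
            return frozenset(s + a[i - 1] for s in collect(i - 1, j - 1))
        found = set()
        if length[i - 1][j] == length[i][j]:
            found |= collect(i - 1, j)
        if length[i][j - 1] == length[i][j]:
            found |= collect(i, j - 1)
        return frozenset(found)

    return set(collect(m, n))


print(sorted(all_distinct_lcs("ABCBDAB", "BDCABA")))  # ['BCAB', 'BCBA', 'BDAB']
```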
A New Practical Linear Space Algorithm for the Longest Common Subsequence Problem
Cited by 2 (0 self)
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n \ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns + \min\{mp, p(n - p)\})$ time and $O(ns)$ space.
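To show what computing the length p in linear space can look like, here is a sketch of the standard two-row dynamic program. It is not the algorithm of this paper: it takes O(mn) time rather than the bound quoted above, and it uses space proportional to the shorter string, with no dependence on the alphabet size. The name `llcs_linear_space` is illustrative.

```python
def llcs_linear_space(a: str, b: str) -> int:
    """LCS length using two rows of the DP table (space linear in the shorter input)."""
    if len(b) > len(a):
        a, b = b, a  # make b the shorter string so each row is as small as possible
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur  # the row before `prev` is never needed again
    return prev[-1]


print(llcs_linear_space("ABCBDAB", "BDCABA"))  # 4
```

Keeping only two rows gives the length but discards the information needed to recover an actual LCS, which requires additional technique beyond this sketch.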