Results 1 - 9 of 9
Differential Compression: A Generalized Solution For Binary Files
, 1996
Cited by 15 (0 self)
By Randal C. Burns. This work presents the development and analysis of a family of algorithms for generating differentially compressed output from binary sources. The algorithms all perform the same fundamental task: given two versions of the same data as input streams, generate and output a compact encoding of one of the input streams by representing it as a set of changes with respect to the other input stream. Differential compression provides a computationally efficient compression technique for applications that generate versioned data, and we often expect differencing to produce a significantly more compact file than more traditional compression techniques. The greedy algorithm for file differencing is presented, and this algorithm is proven to produce the optimally compressed differential output. However, this algorithm requires execution time quadratic in the size of the input files. We next present an algorithm...
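As an illustration of the task this abstract describes, the following is a minimal sketch of a greedy, quadratic-time differencer: at each position of the new version it searches the whole reference for the longest match and emits either a copy or a literal. The operation names (`copy`, `add`), the `min_match` threshold, and both function names are assumptions made for this example; this is not the encoding or the exact algorithm from the thesis.

```python
def greedy_delta(reference: bytes, version: bytes, min_match: int = 4):
    """Encode `version` as copy/add operations against `reference`.

    Hypothetical sketch: every position of `version` is compared against
    every offset of `reference`, which gives quadratic running time.
    """
    ops = []
    literal = bytearray()
    i = 0
    while i < len(version):
        best_len, best_off = 0, 0
        # Exhaustively look for the longest match starting at version[i].
        for off in range(len(reference)):
            k = 0
            while (off + k < len(reference) and i + k < len(version)
                   and reference[off + k] == version[i + k]):
                k += 1
            if k > best_len:
                best_len, best_off = k, off
        if best_len >= min_match:
            # Flush pending literal bytes, then emit a copy from the reference.
            if literal:
                ops.append(("add", bytes(literal)))
                literal.clear()
            ops.append(("copy", best_off, best_len))
            i += best_len
        else:
            # Too short to be worth a copy; keep the byte as a literal.
            literal.append(version[i])
            i += 1
    if literal:
        ops.append(("add", bytes(literal)))
    return ops


def apply_delta(reference: bytes, ops) -> bytes:
    """Rebuild the version from the reference and the operation list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += reference[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Calling `apply_delta(reference, greedy_delta(reference, version))` should reproduce `version`; the delta is compact whenever the two inputs share long runs.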
Bounding the Expected Length of Longest Common Subsequences and Forests
 Proc. of WSP'96
, 1999
Cited by 14 (0 self)
We present two techniques to find lower and upper bounds for the expected length of longest common subsequences and forests of two random sequences of the same length, over a fixed size, uniformly distributed alphabet. We emphasize the power of the methods used, which are Markov chains and Kolmogorov complexity. As a corollary, we obtain some new lower and upper bounds for the problems mentioned.
1 Introduction
The longest common subsequence (LCS) of two strings is one of the main problems in combinatorial pattern matching. The LCS problem is related to DNA or protein alignments, file comparison, speech recognition, etc. We say that x is a subsequence of u if we can obtain x by deleting zero or more characters of u. The LCS of two strings u and v of length n is defined as the longest subsequence x common to u and v. For example, the LCS of "longest" and "large" is "lge". An open problem related to the LCS is its expected length for two random strings of length n over a uniformly distrib...
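The definition of the LCS given in this abstract can be made concrete with the textbook dynamic-programming method. The sketch below is a standard formulation, not a technique from the paper itself, and the function name `lcs` is chosen for illustration.

```python
def lcs(u: str, v: str) -> str:
    """Return one longest common subsequence of u and v (textbook DP)."""
    m, n = len(u), len(v)
    # table[i][j] holds the LCS length of u[:i] and v[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if u[i - 1] == v[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if u[i - 1] == v[j - 1]:
            out.append(u[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("longest", "large"))  # prints "lge", the example from the abstract
```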
Serial Computations of Levenshtein Distances
, 1997
Cited by 14 (0 self)
sequence (LCS) of those strings. If D is the simple Levenshtein distance between two strings having lengths m and n, SES is the length of the shortest edit sequence between the strings, and L is the length of an LCS of the strings, then SES = D and L = (m + n - D)/2. We will focus on the problem of determining the length of an LCS and also on the related problem of recovering an LCS. Another related problem, which will be discussed in Chapter 7, is that of approximate string matching, in which it is desired to locate all positions within string y which begin an approximation to string x containing at most D errors (insertions or deletions). procedure CLASSIC( x,<
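The identity L = (m + n - D)/2, where D counts insertions and deletions only, can be checked with a short sketch. The code below is a generic row-by-row dynamic program written for this illustration; it is not the CLASSIC procedure from the excerpt, and both function names are assumptions.

```python
def indel_distance(x: str, y: str) -> int:
    """Simple Levenshtein distance allowing only insertions and deletions."""
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            if cx == cy:
                cur.append(prev[j - 1])                  # match costs nothing
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))  # delete or insert
        prev = cur
    return prev[-1]


def llcs_from_distance(x: str, y: str) -> int:
    """Length of an LCS via L = (m + n - D) / 2."""
    return (len(x) + len(y) - indel_distance(x, y)) // 2


print(indel_distance("longest", "large"))      # 6
print(llcs_from_distance("longest", "large"))  # 3
```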
Experimenting an Approximation Algorithm for the LCS
 Discrete Applied Mathematics
, 1998
Cited by 9 (0 self)
The problem of finding the longest common subsequence (lcs) of a given set of sequences over an alphabet Σ occurs in many interesting contexts, such as data compression and molecular biology, in order to measure the "similarity degree" among biological sequences. Since the problem is NP-complete in its decision version (i.e. does there exist an lcs of length at least k, for a given k?) even over a fixed alphabet, polynomial algorithms which give approximate solutions have been proposed. Among them, Long Run (LR) is the only one with guaranteed constant performance ratio.
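Long Run is usually described as returning the longest string made of a single repeated symbol that is a common subsequence of all inputs. The sketch below follows that common description and is an assumption about the heuristic, not code taken from the paper; the function name and the tie-breaking rule (smallest symbol wins) are illustrative choices.

```python
from collections import Counter


def long_run(sequences):
    """Return a^k with k maximal such that a^k is a subsequence of every input."""
    if not sequences:
        return ""
    counts = [Counter(s) for s in sequences]
    best_sym, best_len = "", 0
    # Only symbols of the first sequence can occur in all sequences.
    for sym in sorted(set(sequences[0])):
        # a^k is a common subsequence iff every sequence has at least k copies.
        run = min(c[sym] for c in counts)
        if run > best_len:
            best_sym, best_len = sym, run
    return best_sym * best_len


print(long_run(["abab", "baba", "aabb"]))  # "aa"
```

Since some symbol must account for at least a 1/|Σ| share of any common subsequence, the output is within a factor |Σ| of the optimum, which is constant for a fixed alphabet.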
Bit-Parallel LCS-length Computation Revisited
 In Proc. 15th Australasian Workshop on Combinatorial Algorithms (AWOCA)
, 2004
Cited by 7 (4 self)
The longest common subsequence (LCS) is a classic and well-studied measure of similarity between two strings A and B. This problem has two variants: determining the length of the LCS (LLCS), and recovering an LCS itself. In this paper we address the first of these two. Let m and n denote the lengths of the strings A and B, respectively, and w denote the computer word size. First we give a slightly improved formula for the bit-parallel O(⌈m/w⌉n) LLCS algorithm of Crochemore et al. [4]. Then we discuss the relative performance of the bit-parallel algorithms and compare our variant against one of the best conventional LLCS algorithms. Finally we propose and evaluate an O(⌈d/w⌉n) version of the algorithm, where d is the simple (indel) edit distance between A and B.
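To show what a bit-parallel LLCS computation looks like, here is a minimal sketch of a widely used bit-vector recurrence, V' = (V + (V & M)) | (V - (V & M)), where M is the match mask of the current character of B. Whether this matches the exact formula given in the paper is an assumption. Python integers are unbounded, so a single integer plays the role of the ⌈m/w⌉ machine words and is masked to m bits after each step.

```python
def llcs_bitparallel(a: str, b: str) -> int:
    """Length of an LCS of a and b using one bit-vector update per char of b."""
    m = len(a)
    mask = (1 << m) - 1
    # match[c] has bit i set exactly when a[i] == c.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no LCS increments recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        # Addition propagates carries through runs of ones; since u is a
        # subset of v, the subtraction equals v & ~match[ch].
        v = ((v + u) | (v - u)) & mask
    # Each zero bit among the low m bits marks one unit of LCS length.
    return m - bin(v).count("1")


print(llcs_bitparallel("longest", "large"))  # 3
```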
A Syntactic Approach for Searching Similarities within Sentences
, 2002
Cited by 3 (1 self)
Textual data is the main electronic form of knowledge representation. Sentences, meant as logic units of meaningful word sequences, can be considered its backbone. In this paper, we propose a solution based on a purely syntactic approach for searching similarities within sentences, sequence matching. This process being very time consuming, efficiency in retrieving the most similar parts available in large repositories of textual data is ensured by making use of new filtering techniques. As far as the design of the system is concerned, we chose a solution that allows us to deploy approximate sub matching without changing the underlying database.
A New Practical Linear Space Algorithm for the Longest Common Subsequence Problem
Cited by 2 (0 self)
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths m and n, n ≥ m, on an alphabet of size s, we first present an algorithm which determines the length p of an LCS in O(ns + min{mp, p(n - p)}) time and O(ns) space.
String comparison by transposition networks
, 903
Cited by 1 (0 self)
Computing string or sequence alignments is a classical method of comparing strings and has applications in many areas of computing, such as signal processing and bioinformatics. Semilocal string alignment is a recent generalisation of this method, in which the alignment of a given string and all substrings of another string are computed simultaneously at no additional asymptotic cost. In this paper, we show that there is a close connection between semilocal string alignment and a certain class of traditional comparison networks known as transposition networks. The transposition network approach can be used to represent different string comparison algorithms in a unified form, and in some cases provides generalisations or improvements on existing algorithms. This approach allows us to obtain new algorithms for sparse semilocal string comparison and for comparison of highly similar and highly dissimilar strings, as well as of run-length compressed strings. We conclude that the transposition network method is a very general and flexible way of understanding and improving different string comparison algorithms, as well as their efficient implementation.