Bounding the Expected Length of Longest Common Subsequences and Forests
Proc. of WSP'96, 1999
Cited by 23 (0 self)
We present two techniques to find lower and upper bounds for the expected length of longest common subsequences and forests of two random sequences of the same length, over a fixed size, uniformly distributed alphabet. We emphasize the power of the methods used, which are Markov chains and Kolmogorov complexity. As a corollary, we obtain some new lower and upper bounds for the problems mentioned.
1 Introduction
The longest common subsequence (LCS) of two strings is one of the main problems in combinatorial pattern matching. The LCS problem is related to DNA or protein alignments, file comparison, speech recognition, etc. We say that x is a subsequence of u if we can obtain x by deleting zero or more characters of u. The LCS of two strings u and v of length n is defined as the longest subsequence x common to u and v. For example, the LCS of longest and large is lge. An open problem related to the LCS is its expected length for two random strings of length n over a uniformly distrib...
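The definition of a subsequence and the longest/large example above can be checked with a short sketch. This is the textbook quadratic dynamic program, not the Markov chain or Kolmogorov complexity techniques of the paper; the function name `lcs` is chosen here for illustration only.

```python
def lcs(u: str, v: str) -> str:
    """Return one longest common subsequence of u and v (textbook DP)."""
    m, n = len(u), len(v)
    # length[i][j] holds the LCS length of the suffixes u[i:] and v[j:].
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    # Walk the table from the top-left corner to recover one optimal answer.
    out = []
    i = j = 0
    while i < m and j < n:
        if u[i] == v[j]:
            out.append(u[i])
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


print(lcs("longest", "large"))  # lge
```

The sketch uses O(|u||v|) time and space; when several subsequences of maximal length exist it returns just one of them.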
A BSP/CGM algorithm for the all-substrings longest common subsequence problem
In Proceedings of the 17th IEEE/ACM IPDPS, 2003
Cited by 5 (0 self)
Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algorithm takes $O(mn)$ time and $O(n)$ space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with $p < \sqrt{m}$ processors that takes $O(mn/p)$ time and $O(n\sqrt{m})$ space per processor, with $O(\log p)$ communication rounds. The proposed parallel algorithm also solves the well-known LCS problem. To our knowledge this is the best BSP/CGM algorithm for the ALCS problem in the literature.
Sequential and Parallel Algorithms for the All-Substrings Longest Common Subsequence Problem
Cited by 3 (1 self)
Given two strings A and B of lengths $n_a$ and $n_b$, respectively, the All-substrings Longest Common Subsequence (ALCS) problem obtains, for any substring $B'$ of B, the length of the longest string that is a subsequence of both A and $B'$. The sequential algorithm takes $O(n_a n_b)$ time and $O(n_b)$ space. We present a parallel algorithm for the ALCS on the Coarse Grained Multicomputer (BSP/CGM) model with $p < \sqrt{n_a}$ processors, that takes $O(n_a n_b / p)$ time and $O(n_b \sqrt{n_a})$ space per processor, with $O(\log p)$ communication rounds. The proposed algorithm also solves the basic Longest Common Subsequence (LCS) Problem that finds the longest string (and not only its length) that is a subsequence of both A and B. To our knowledge, this is the best BSP/CGM algorithm for the LCS and ALCS problems in the literature.
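To make the ALCS problem statement concrete, here is a naive baseline that tabulates the LCS length of A against every substring of B. It is only an illustration of what the problem asks for, not the sequential or BSP/CGM algorithm of the paper; the name `alcs_lengths` and the `(i, j)` key convention are assumptions made for this sketch.

```python
from typing import Dict, Tuple


def alcs_lengths(a: str, b: str) -> Dict[Tuple[int, int], int]:
    """Map (i, j), 0 <= i < j <= len(b), to the LCS length of a and b[i:j].

    Naive baseline: one ordinary LCS table per starting position i,
    so roughly O(len(a) * len(b)**2) time overall.
    """
    table: Dict[Tuple[int, int], int] = {}
    for i in range(len(b)):
        suffix = b[i:]
        # prev[k] = LCS length of the processed prefix of a and suffix[:k].
        prev = [0] * (len(suffix) + 1)
        for ch in a:
            cur = [0]
            for k, bk in enumerate(suffix, start=1):
                if ch == bk:
                    cur.append(prev[k - 1] + 1)
                else:
                    cur.append(max(prev[k], cur[k - 1]))
            prev = cur
        # After all of a is processed, prev[k] answers the substring b[i:i+k].
        for k in range(1, len(suffix) + 1):
            table[(i, i + k)] = prev[k]
    return table


lengths = alcs_lengths("abc", "bac")
print(lengths[(0, 3)])  # LCS length of "abc" and the whole of "bac"
```

The baseline recomputes a full table for each start position, which is the redundancy that algorithms meeting the $O(n_a n_b)$ bound quoted above have to avoid.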
A New Practical Linear Space Algorithm for the Longest Common Subsequence Problem
Cited by 2 (0 self)
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths m and n, $n \ge m$, on an alphabet of size s, we first present an algorithm which determines the length p of an LCS in $O(ns + \min\{mp, p(n - p)\})$ time and $O(ns)$ space.
Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets
Cited by 1 (0 self)
Contents
1 Introduction
2 Theoretical Background
2.1 Basic Results
2.2 A Linear Space MOC Algorithm
2.3 Computing All Longest Chains
3 Rick's Algorithm
4 The New Decomposition Method
4.1 Dualization of Rick's Method
4.3 Program Code
5 Linear Space LCS Construction
5.1 Locating Two Centered Matches
5.2 Program Code
6 The AMOC Problem
6.1 Generating the AMOCGraph
6.2 Border LCS
6.3 Program Code
6.4 Complexity Analysis
7 Experimental Results
Bibliography
Introduction
This work is co
Efficient Dominant Point Algorithms for the Multiple Longest Common Subsequence (MLCS) Problem
Finding the longest common subsequence of multiple strings is a classical computer science problem and has many applications in the areas of bioinformatics and computational genomics. In this paper, we present a new sequential algorithm for the general case of the MLCS problem, and its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points. When applied to find an MLCS of 3 strings, our general algorithm is shown to exhibit the same performance as the best existing MLCS algorithm by Hakata and Imai, designed specifically for the case of 3 strings. Moreover, we show that for the general case of more than 3 strings, the algorithm is significantly faster than the best existing sequential approaches, reaching up to 2-3 orders of magnitude faster on the large-size problems. Finally, we propose a parallel implementation of the algorithm. Evaluating the parallel algorithm on a benchmark set of both random and biological sequences reveals a near-linear speedup with respect to the sequential algorithm.
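For orientation, the 3-string case of the MLCS problem can be solved by a plain cubic dynamic program, sketched below. This is the straightforward baseline, not the dominant point algorithm described in the abstract or the Hakata and Imai method; the function name `mlcs3_length` is an assumption for this sketch.

```python
def mlcs3_length(x: str, y: str, z: str) -> int:
    """Length of a longest common subsequence of three strings (plain DP)."""
    # d[i][j][k] = LCS length of the prefixes x[:i], y[:j], z[:k].
    d = [[[0] * (len(z) + 1) for _ in range(len(y) + 1)]
         for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            for k in range(1, len(z) + 1):
                if x[i - 1] == y[j - 1] == z[k - 1]:
                    # All three prefixes end in the same character: extend.
                    d[i][j][k] = d[i - 1][j - 1][k - 1] + 1
                else:
                    # Otherwise drop the last character of one prefix.
                    d[i][j][k] = max(d[i - 1][j][k],
                                     d[i][j - 1][k],
                                     d[i][j][k - 1])
    return d[len(x)][len(y)][len(z)]


print(mlcs3_length("AGGT12", "12TXAYB", "12XBA"))  # 2
```

The table has one dimension per string, so its size grows exponentially with the number of strings. Dominant point methods aim to avoid filling the whole table by tracking only a subset of its cells.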
Comparison of Genomes using High-Performance Parallel Computing
Comparison of the DNA sequences and genes of two genomes can be useful to investigate the common functionalities of the corresponding organisms and get a better understanding of how the genes or groups of genes are organized and involved in several functions. In this paper we use high-performance parallel computing to compare the whole genomes of two organisms, namely Xanthomonas axonopodis pv. citri and Xanthomonas campestris pv. campestris, each with more than five million base pairs. Our purpose is twofold. First, we intend to exploit the high-performance power of a cluster of low-cost microcomputers, propose a parallel solution to this problem, and show its feasibility with implementation and performance results. Second, we do additional comparisons of the two genomes by locating and comparing not only the homologous genes (expressed in terms of the 20-letter amino acids) but also the regions or gaps (in terms of the 4-letter DNA nucleotides) between the corresponding homologous genes. We have implemented the proposed comparison strategy to compare the two genomes Xanthomonas axonopodis pv. citri (Xac) and Xanthomonas campestris pv. campestris (Xcc). The parallel platform used is a Beowulf cluster of 64 nodes consisting of low-cost microcomputers. Xac has 5,175,554 base pairs and 4,313 protein-coding genes while Xcc has 5,076,187 base pairs and 4,182 protein-coding genes. The parallel solution is based on the dynamic programming approach and presents not only less processing time, but also better quality results as compared to approaches based on Blast and EGG.
A New Index-Based Parallel Algorithm for finding Longest Common Subsequence in Multiple DNA Sequences
This paper presents a new parallel algorithm for computing a Longest Common Subsequence (LCS) in multiple DNA sequences. It uses a heuristic approach. Although a lot of research has been carried out on finding the LCS of two or more given sequences of protein, DNA, RNA, etc., not many parallel methods exist for finding the LCS of multiple sequences. Normally, in existing algorithms, the time complexity of finding the LCS increases linearly with the number of sequences. This is an attempt to give an effective parallel algorithm to find the LCS of any given number of DNA sequences. The significance of this algorithm is that its time complexity does not increase linearly with the number of sequences. The algorithm can also be applied to protein sequences with the same effectiveness, though the number of processors required will go up.