Results 1  10
of
46
Linear Approximation of Shortest Superstrings
, 1991
"... We consider the following problem: given a collection of strings s 1 ; . . . ; s m , find the shortest string s such that each s i appears as a substring (a consecutive block) of s. Although this problem is known to be NPhard, a simple greedy procedure appears to do quite well and is routinely used ..."
Abstract

Cited by 74 (5 self)
 Add to MetaCart
We consider the following problem: given a collection of strings s 1 ; . . . ; s m , find the shortest string s such that each s i appears as a substring (a consecutive block) of s. Although this problem is known to be NPhard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of distinct strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomialtime algorithm is a recent O(n log n) result. We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the sup...
Combinatorial algorithms for DNA sequence assembly
 Algorithmica
, 1993
"... The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The seq ..."
Abstract

Cited by 44 (3 self)
 Add to MetaCart
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NPhard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover it uses a limited form ...
A new algorithm for DNA sequence assembly
 Journal of Computational Biology
, 1995
"... Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a ‘ wellestablished biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are base ..."
Abstract

Cited by 40 (2 self)
 Add to MetaCart
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a ‘ wellestablished biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion.of pairwisk fragment overlap. * While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper,. we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
Reconstructing Strings from Substrings
 Journal of Computational Biology
, 1993
"... this paper, we consider a variety of problems with application to sequencing by hybridization. First, we develop a theory of interactive sequencing by hybridization, based on ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
this paper, we consider a variety of problems with application to sequencing by hybridization. First, we develop a theory of interactive sequencing by hybridization, based on
Rotation of Periodic Strings and Short Superstrings
, 1996
"... This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios 2 2 3 ( 2:67) and 2 25 42 ( 2:596), improving the best previously published 2 3 4 approximation. The framework of our improved algorithms is similar to that of previous a ..."
Abstract

Cited by 25 (0 self)
 Add to MetaCart
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios 2 2 3 ( 2:67) and 2 25 42 ( 2:596), improving the best previously published 2 3 4 approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semiinfinite string ff = a1a2 \Delta \Delta \Delta of period q, there exists an integer k, such that for any (finite) string s of period p which is inequivalent to ff, the overlap between s and the rotation ff[k] = ak ak+1 \Delta \Delta \Delta is at most p+ 1 2 q. Moreover, if p q, then the overlap between s and ff[k] is not larger than 2 3 (p+q). In the previous shortes...
Expected Length of Longest Common Subsequences
"... Contents 1 Introduction 1 2 Notation and preliminaries 4 2.1 Notation and basic definitions : : : : : : : : : : : : : : : : : : 4 2.2 Longest common subsequences : : : : : : : : : : : : : : : : : : 7 2.3 Computing longest common subsequences : : : : : : : : : : : 10 2.4 Expected length of longest c ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
Contents 1 Introduction 1 2 Notation and preliminaries 4 2.1 Notation and basic definitions : : : : : : : : : : : : : : : : : : 4 2.2 Longest common subsequences : : : : : : : : : : : : : : : : : : 7 2.3 Computing longest common subsequences : : : : : : : : : : : 10 2.4 Expected length of longest common subsequences : : : : : : : 14 3 Lower Bounds 20 3.1 Css machines : : : : : : : : : : : : : : : : : : : : : : : : : : : 20 3.2 Analysis of css machines : : : : : : : : : : : : : : : : : : : : : 26 3.3 Design of css machines : : : : : : : : : : : : : : : : : : : : : : 31 3.4 Labeled css machines : : : : : : : : : : : : : : : : : : : : : : : 38 4 Upper bounds 45 4.1 Collations : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 45 4.2 Previous upper bounds : : : : : : : : : : : : : : : : : : : : : : 51 4.3 Simple upper bound (binary alphabet) : : : : : : : : : : : : : 55 4.4 Simple upper bound (alphabet size 3) : : : : : : : : : : : : : : 59 4.5 Upper bounds for binary alphabet : :
Conjunctive Query Containment over Trees
 DBPL 2007. LNCS
, 2007
"... The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following. For the containment problem a trichotomy is presented, classifying the problem ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
The complexity of containment and satisfiability of conjunctive queries over finite, unranked, labeled trees is studied with respect to the axes Child, NextSibling, their transitive and reflexive closures, and Following. For the containment problem a trichotomy is presented, classifying the problems as in PTIME, coNPcomplete, or Π P 2complete. For the satisfiability problem most problems are classified as either in PTIME or NPcomplete.
A 2 2/3Approximation Algorithms for the Shortest Superstring Problem
 DIMACS WORKSHOP ON SEQUENCING AND MAPPING
, 1995
"... Given a collection of strings S = fs1; : : : ; sng over an alphabet, a superstring of S is a string containing each si as a substring; that is, for each i, 1 i n, contains a block of jsij consecutive characters that match si exactly. The shortest superstring problem is the problem of nding a superst ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
Given a collection of strings S = fs1; : : : ; sng over an alphabet, a superstring of S is a string containing each si as a substring; that is, for each i, 1 i n, contains a block of jsij consecutive characters that match si exactly. The shortest superstring problem is the problem of nding a superstring of minimum length. The shortest superstring problem has applications in both data compression and computational biology. In data compression, the problem is a part of a general model of string compression proposed by Gallant, Maier and Storer (JCSS '80). Much of the recent interest in the problem is due to its application to DNA sequence assembly. The problem has been shown to be NPhard; in fact, it was shown by Blum et al.(JACM '94) to be MAX SNPhard. The rst O(1)approximation was also due to Blum et al., who gave an algorithm that always returns a superstring no more than 3 times the length of an optimal solution. Several researchers have published results that improve on the approximation ratio; of these, the best previous result is our algorithm ShortString, which achieves a 2 3
Parallel and Sequential Approximations of Shortest Superstrings
 In Proceedings of Fourth Scandinavian Workshop on Algorithm Theory
, 1994
"... Abstract. Superstrings have many applications in data compression and genetics. However the decision version of the shortest superstring problem is N P�complete. In this paper we examine the complexity of approximating a shortest superstring. There are two basic measures of the approximations� the c ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
Abstract. Superstrings have many applications in data compression and genetics. However the decision version of the shortest superstring problem is N P�complete. In this paper we examine the complexity of approximating a shortest superstring. There are two basic measures of the approximations� the compression ratio and the approximation ratio. The well known and practical approximation algorithm is the sequential algorithm GREEDY. It approximates the shortest superstring with the compression ratio of 1 2 and with the approximation ratio of 4. Our main results are� �1 � An N C algorithm which achieves the compression ratio of 1 4� �. �2 � The proof that the algorithm GREEDY is not parallelizable � the com� putation of its output is P�complete. �3 � An improved sequential algorithm � the approximation ratio is reduced to 2.83. Previously it was reduced by Teng and Yao from 3 to 2.89. �4 � The design of an RN C algorithm with constant approximation ratio and an N C algorithm with logarithmic approximation ratio. 1
Computability of models for sequence assembly
 In WABI
, 2007
"... pashadag,cgeorg,brudno¥ Abstract. Graphtheoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
pashadag,cgeorg,brudno¥ Abstract. Graphtheoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NPhard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NPhardness of overlap graphs, this demonstrates that all of the popular graphtheoretic sequence assembly paradigms are NPhard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the doublestrandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest doublestranded DNA sequence which contains a given set of ¦long words. This algorithm has applications to sequencing by hybridization and short read assembly. 1