Linear Approximation of Shortest Superstrings
, 1991
Cited by 73 (5 self)
We consider the following problem: given a collection of strings s_1, ..., s_m, find the shortest string s such that each s_i appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of distinct strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n) result. We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the sup...
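The greedy procedure described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `overlap` and `greedy_superstring` are hypothetical, ties between equally good pairs are broken by scan order (an assumption), and strings contained in other strings are dropped up front.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Repeatedly merge the pair of distinct strings with maximum overlap."""
    # Drop duplicates and strings that already occur inside another string.
    uniq = list(dict.fromkeys(strings))
    pool = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    while len(pool) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left string, index of right string)
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = pool[i] + pool[j][k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)] + [merged]
    return pool[0] if pool else ""


print(greedy_superstring(["abc", "bcd", "cde"]))  # abcde
```

The quadratic scan for the best pair keeps the sketch short; it is not meant to reflect how the procedure would be implemented at sequencing scale.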
Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs
 Proc. 44th Annual Symposium on Foundations of Computer Science (FOCS)
, 2003
Cited by 51 (1 self)
A directed multigraph is said to be d-regular if the indegree and outdegree of every vertex is exactly d. By Hall’s theorem one can represent such a multigraph as a combination of at most n^2 cycle covers, each taken with an appropriate multiplicity. We prove that if the d-regular multigraph does not contain more than ⌊d/2⌋ copies of any 2-cycle then we can find a similar decomposition into n^2 pairs of cycle covers where each 2-cycle occurs in at most one component of each pair. Our proof is constructive and gives a polynomial algorithm to find such a decomposition. Since our applications only need one such pair of cycle covers whose weight is at least the average weight of all pairs, we also give an alternative, simpler algorithm to extract a single such pair. This combinatorial theorem then comes in handy in rounding a fractional solution of an LP relaxation of the maximum Traveling Salesman Problem (TSP). The first stage of the rounding procedure obtains two cycle covers that do not share a 2-cycle, with weight at least twice the weight of the optimal solution. Then we show how to extract a tour from the two cycle covers, whose weight is at least 2/3 of the weight of the longest tour. This improves upon the previous
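The Hall's-theorem step mentioned in this abstract can be illustrated with a small sketch. It peels one cycle cover at a time off a d-regular directed multigraph by finding a perfect matching between arc tails and arc heads, yielding d covers. This is only the basic decomposition, not the paper's 2-cycle-avoiding pairing; the function name, the edge-list input format, and the use of simple augmenting-path matching are all assumptions made for illustration. Self-loops are treated as cycles of length one.

```python
from collections import Counter


def cycle_cover_decomposition(n, edges):
    """Split a d-regular directed multigraph on vertices 0..n-1 into d cycle covers.

    edges is a list of arcs (u, v), repeated once per copy of the arc.
    Each cover is returned as a dict mapping every vertex to its successor.
    """
    mult = Counter(edges)  # remaining multiplicity of each arc
    d = len(edges) // n
    covers = []
    for _ in range(d):
        tail_of = [-1] * n  # tail_of[v] = u means arc u -> v is in the current cover

        def augment(u, seen):
            # Augmenting-path search: try to give u an unused head v.
            for v in range(n):
                if mult[(u, v)] > 0 and v not in seen:
                    seen.add(v)
                    if tail_of[v] == -1 or augment(tail_of[v], seen):
                        tail_of[v] = u
                        return True
            return False

        for u in range(n):
            if not augment(u, set()):
                raise ValueError("input is not a regular multigraph")
        cover = {u: v for v, u in enumerate(tail_of)}
        for arc in cover.items():
            mult[arc] -= 1  # the remaining graph is (d-1)-regular
        covers.append(cover)
    return covers


arcs = [(0, 1), (1, 2), (2, 0), (0, 2), (2, 1), (1, 0)]  # a 2-regular example
for cover in cycle_cover_decomposition(3, arcs):
    print(cover)
```

Every cover assigns each vertex exactly one outgoing and one incoming arc, which is what makes it a union of vertex-disjoint cycles.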
Combinatorial algorithms for DNA sequence assembly
 Algorithmica
, 1993
Cited by 42 (3 self)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover it uses a limited form ...
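To make the reverse-complement complication concrete, here is a small sketch. It assumes fragments over the alphabet A, C, G, T and exact matching, so it deliberately ignores the sequencing errors the abstract also mentions; both function names are hypothetical.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(fragment: str) -> str:
    """Complement every base, then reverse the result."""
    return fragment.translate(_COMPLEMENT)[::-1]


def occurs_in_either_orientation(fragment: str, sequence: str) -> bool:
    """True if the fragment or its reverse complement is a substring of sequence."""
    return fragment in sequence or reverse_complement(fragment) in sequence


print(reverse_complement("ATGC"))                         # GCAT
print(occurs_in_either_orientation("GCAT", "CCATGCC"))    # True
```

In the second call the fragment itself does not occur in the sequence, but its reverse complement ATGC does, which is why an assembler has to consider both orientations of every fragment.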
Toward Simplifying and Accurately Formulating Fragment Assembly
 JOURNAL OF COMPUTATIONAL BIOLOGY
, 1995
Cited by 37 (1 self)
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the 2-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties, and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations is given that dramatically reduces the size of the graph to be explored in practical instances of the problem. This reduction is ...
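For readers unfamiliar with the 2-sided Kolmogorov-Smirnov statistic, the sketch below computes it for a set of fragment start positions against a uniform distribution over the reconstruction. The abstract does not say exactly which quantity the paper tests, so comparing start positions with a uniform distribution is an assumption made here for illustration, as is the function name.

```python
def ks_statistic_uniform(starts, length):
    """Two-sided KS statistic of start positions against Uniform(0, length).

    Returns the largest gap between the empirical distribution function of the
    (rescaled) start positions and the uniform distribution function.
    """
    xs = sorted(s / length for s in starts)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


print(ks_statistic_uniform([25, 50, 75], 100))  # evenly spread starts
print(ks_statistic_uniform([0, 1, 2], 100))     # starts piled up at one end
```

Evenly spread start positions give a small statistic, while positions piled into one region give a value close to 1. An overcompressed reconstruction of a repetitive target would tend to look like the second case.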
Rotation of Periodic Strings and Short Superstrings
, 1996
Cited by 25 (0 self)
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios 2 2/3 (≈ 2.67) and 2 25/42 (≈ 2.596), improving the best previously published 2 3/4 approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semi-infinite string α = a_1 a_2 ⋯ of period q, there exists an integer k, such that for any (finite) string s of period p which is inequivalent to α, the overlap between s and the rotation α[k] = a_k a_{k+1} ⋯ is at most p + q/2. Moreover, if p ≤ q, then the overlap between s and α[k] is not larger than (2/3)(p + q). In the previous shortes...
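The quantities in the overlap bound can be explored with a short sketch. It assumes that "overlap" means the longest suffix of s that is a prefix of the rotated semi-infinite string, and it represents the semi-infinite string by one period word w repeated forever; both choices, and the function names, are assumptions for illustration rather than the paper's definitions.

```python
def smallest_period(s: str) -> int:
    """Smallest p >= 1 such that s[i] == s[i + p] wherever both positions exist."""
    for p in range(1, len(s) + 1):
        if all(s[i] == s[i + p] for i in range(len(s) - p)):
            return p
    return 0  # empty string


def overlap_with_rotation(s: str, w: str, k: int) -> int:
    """Longest suffix of s that is a prefix of w w w ... read from 1-based position k."""
    q = len(w)
    for length in range(len(s), 0, -1):
        suffix = s[len(s) - length:]
        if all(suffix[i] == w[(k - 1 + i) % q] for i in range(length)):
            return length
    return 0


s, w = "aab", "ab"
print(smallest_period(s), smallest_period(w))                      # p and q
print([overlap_with_rotation(s, w, k) for k in range(1, len(w) + 1)])
```

Trying every rotation k for a fixed pair shows how the choice of starting position changes the overlap, which is the degree of freedom the bound exploits.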
Expected Length of Longest Common Subsequences
Cited by 19 (2 self)
Contents
1 Introduction
2 Notation and preliminaries
  2.1 Notation and basic definitions
  2.2 Longest common subsequences
  2.3 Computing longest common subsequences
  2.4 Expected length of longest common subsequences
3 Lower bounds
  3.1 Css machines
  3.2 Analysis of css machines
  3.3 Design of css machines
  3.4 Labeled css machines
4 Upper bounds
  4.1 Collations
  4.2 Previous upper bounds
  4.3 Simple upper bound (binary alphabet)
  4.4 Simple upper bound (alphabet size 3)
  4.5 Upper bounds for binary alphabet
Parallel and Sequential Approximations of Shortest Superstrings
 In Proceedings of Fourth Scandinavian Workshop on Algorithm Theory
, 1994
Cited by 13 (1 self)
Superstrings have many applications in data compression and genetics. However, the decision version of the shortest superstring problem is NP-complete. In this paper we examine the complexity of approximating a shortest superstring. There are two basic measures of the approximations: the compression ratio and the approximation ratio. The well-known and practical approximation algorithm is the sequential algorithm GREEDY. It approximates the shortest superstring with the compression ratio of 1/2 and with the approximation ratio of 4. Our main results are: (1) An NC algorithm which achieves the compression ratio of 1/(4+ε). (2) The proof that the algorithm GREEDY is not parallelizable: the computation of its output is P-complete. (3) An improved sequential algorithm: the approximation ratio is reduced to 2.83. Previously it was reduced by Teng and Yao from 3 to 2.89. (4) The design of an RNC algorithm with constant approximation ratio and an NC algorithm with logarithmic approximation ratio.
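The two measures named in this abstract can be contrasted with a small numeric sketch. The abstract does not spell out the definitions, so the ones used here are assumptions based on common usage: compression is the total input length minus the superstring length, the compression ratio compares the achieved compression with the optimal one, and the approximation ratio compares the superstring lengths. The function name and the numbers are hypothetical.

```python
def measures(total_input_length: int, found_length: int, optimal_length: int):
    """Return (compression ratio, approximation ratio) for a found superstring."""
    optimal_compression = total_input_length - optimal_length
    found_compression = total_input_length - found_length
    # With nothing to compress, any superstring trivially matches the optimum.
    compression_ratio = (
        found_compression / optimal_compression if optimal_compression else 1.0
    )
    approximation_ratio = found_length / optimal_length
    return compression_ratio, approximation_ratio


# Strings totalling 100 characters, optimal superstring 40, found superstring 70.
print(measures(100, 70, 40))  # (0.5, 1.75)
```

The same solution scores 1/2 on one scale and 1.75 on the other, which shows why a guarantee on the compression ratio does not translate directly into a guarantee on the approximation ratio.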
A 2 2/3-Approximation Algorithm for the Shortest Superstring Problem
 DIMACS WORKSHOP ON SEQUENCING AND MAPPING
, 1995
Cited by 13 (0 self)
Given a collection of strings S = {s_1, ..., s_n} over an alphabet, a superstring of S is a string containing each s_i as a substring; that is, for each i, 1 ≤ i ≤ n, the superstring contains a block of |s_i| consecutive characters that match s_i exactly. The shortest superstring problem is the problem of finding a superstring of minimum length. The shortest superstring problem has applications in both data compression and computational biology. In data compression, the problem is a part of a general model of string compression proposed by Gallant, Maier and Storer (JCSS '80). Much of the recent interest in the problem is due to its application to DNA sequence assembly. The problem has been shown to be NP-hard; in fact, it was shown by Blum et al. (JACM '94) to be MAX SNP-hard. The first O(1)-approximation was also due to Blum et al., who gave an algorithm that always returns a superstring no more than 3 times the length of an optimal solution. Several researchers have published results that improve on the approximation ratio; of these, the best previous result is our algorithm ShortString, which achieves a 2 3
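The definition of a superstring given in this abstract translates directly into a one-line check. The function name `is_superstring` is hypothetical, and the example strings are made up for illustration.

```python
def is_superstring(candidate: str, strings: list[str]) -> bool:
    """True if every string occurs in candidate as a block of consecutive characters."""
    return all(s in candidate for s in strings)


S = ["cat", "atta", "tac"]
print(is_superstring("cattac", S))     # True: a superstring of length 6
print(is_superstring("".join(S), S))   # True: plain concatenation always works
print(is_superstring("catac", S))      # False: "atta" is missing
```

Concatenating the inputs always yields a superstring, so the difficulty of the problem lies entirely in making it short.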
Sequential and Parallel Approximation of Shortest Superstrings
, 1997
Cited by 7 (1 self)
Superstrings have many applications in data compression and genetics. However, the decision version of the shortest superstring problem is NP-complete. In this paper we examine the complexity of approximating shortest superstrings. There are two basic measures of the approximations: the length factor and the compression factor. The well-known and practical approximation algorithm is the sequential algorithm GREEDY. It approximates the shortest superstring with the compression factor of 1/2 and with the length factor of 4. Our main results are: (1) A sequential length approximation algorithm which achieves a length factor of 2.83. This result improves the best previously known bound of 2.89 due to Teng and Yao. Very recently, this bound was improved by Kosaraju, Park, and Stein to 2.79, and by Armen and Stein to 2.75. (2) A proof that the algorithm GREEDY is not paralleliz
Coevolving Solutions to the Shortest Common Superstring Problem
, 2004
Cited by 5 (0 self)
The Shortest Common Superstring (SCS) problem, known to be NP-complete, seeks the shortest string that contains all strings from a given set. In this paper we compare four approaches for finding solutions to the SCS problem: a standard genetic algorithm, a novel cooperative-coevolutionary algorithm, a benchmark greedy algorithm, and a parallel coevolutionary-greedy approach. We show the coevolutionary approach produces the best results, and discuss directions for future research.
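Search-based approaches to SCS need a way to score a candidate solution. The sketch below shows one plausible encoding, assumed here rather than taken from the paper: a candidate is an ordering of the input strings, and its fitness is the length of the string obtained by merging them in that order with maximal overlaps. A seeded random search stands in for the evolutionary loop; all names are hypothetical and the input list is assumed to be non-empty.

```python
import random


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(strings, order):
    """Merge strings in the given order, overlapping each with the result so far."""
    result = strings[order[0]]
    for idx in order[1:]:
        nxt = strings[idx]
        if nxt not in result:  # already-covered strings add nothing
            result += nxt[overlap(result, nxt):]
    return result


def random_search(strings, trials=200, seed=0):
    """Keep the shortest merge found among randomly shuffled orderings."""
    rng = random.Random(seed)
    order = list(range(len(strings)))
    best = merge_in_order(strings, order)
    for _ in range(trials):
        rng.shuffle(order)
        candidate = merge_in_order(strings, order)
        if len(candidate) < len(best):
            best = candidate
    return best


print(merge_in_order(["abc", "bcd", "cde"], [0, 1, 2]))  # abcde
print(merge_in_order(["abc", "bcd", "cde"], [2, 1, 0]))  # cdebcdabc
```

The two orderings of the same strings give superstrings of length 5 and 9, so the ordering alone determines solution quality under this encoding. A genetic or coevolutionary method would replace the random shuffles with selection and recombination of orderings.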