Results 1  10
of
42
Linear Approximation of Shortest Superstrings
, 1991
"... We consider the following problem: given a collection of strings s 1 ; . . . ; s m , find the shortest string s such that each s i appears as a substring (a consecutive block) of s. Although this problem is known to be NPhard, a simple greedy procedure appears to do quite well and is routinely used ..."
Abstract

Cited by 79 (5 self)
 Add to MetaCart
We consider the following problem: given a collection of strings s 1 ; . . . ; s m , find the shortest string s such that each s i appears as a substring (a consecutive block) of s. Although this problem is known to be NPhard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of distinct strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomialtime algorithm is a recent O(n log n) result. We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the sup...
Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs
, 2006
"... A directed multigraph is said to be dregular if the indegree and outdegree of every vertexis exactly d. By Hall's theorem one can represent such a multigraph as a combination of atmost n2 cycle covers each taken with an appropriate multiplicity. We prove that if the dregular multigraph does ..."
Abstract

Cited by 68 (2 self)
 Add to MetaCart
(Show Context)
A directed multigraph is said to be dregular if the indegree and outdegree of every vertexis exactly d. By Hall's theorem one can represent such a multigraph as a combination of atmost n2 cycle covers each taken with an appropriate multiplicity. We prove that if the dregular multigraph does not contain more than b d/2c copies of any 2cycle then we can find asimilar decomposition into n2 pairs of cycle covers where each 2cycle occurs in at most onecomponent of each pair. Our proof is constructive and gives a polynomial algorithm to find such a decomposition. Since our applications only need one such a pair of cycle covers whoseweight is at least the average weight of all pairs, we also give an alternative, simpler algorithm to extract a single such pair.This combinatorial theorem then comes handy in rounding a fractional solution of an LP relaxation of the maximum Traveling Salesman Problem (TSP) problem. The first stage of therounding procedure obtains 2cycle covers that do not share a 2cycle with weight at least twice the weight of the optimal solution. Then we show how to extract a tour from the 2 cycle covers,whose weight is at least 2 /3 of the weight of the longest tour. This improves upon the previous5/8 approximation with a simpler algorithm. Utilizing a reduction from maximum TSP to the shortest superstring problem we obtain a 2.5approximation algorithm for the latter problemwhich is again much simpler than the previous one. For minimum asymmetric TSP the same technique gives 2cycle covers, not sharing a 2cycle, with weight at most twice the weight of the optimum. Assuming triangle inequality, we then show how to obtain from this pair of cycle covers a tour whose weight is at most0.842 log2 n larger than optimal. This improves upon a previous approximation algorithm with approximation guarantee of 0.999 log2 n. Other applications of the rounding procedure are approximation algorithms for maximum 3cycle cover (factor 2/3, previously 3/5) and maximum
Toward Simplifying and Accurately Formulating Fragment Assembly
 JOURNAL OF COMPUTATIONAL BIOLOGY
, 1995
"... The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequence ..."
Abstract

Cited by 63 (1 self)
 Add to MetaCart
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximumlikelihood reconstruction with respect to the 2sided KolmogorovSmirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graphtheoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximallylikely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is ...
Combinatorial algorithms for DNA sequence assembly
 Algorithmica
, 1993
"... The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The seq ..."
Abstract

Cited by 62 (3 self)
 Add to MetaCart
(Show Context)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NPhard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover it uses a limited form ...
Rotation of Periodic Strings and Short Superstrings
, 1996
"... This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios 2 2 3 ( 2:67) and 2 25 42 ( 2:596), improving the best previously published 2 3 4 approximation. The framework of our improved algorithms is similar to that of previous a ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
(Show Context)
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios 2 2 3 ( 2:67) and 2 25 42 ( 2:596), improving the best previously published 2 3 4 approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semiinfinite string ff = a1a2 \Delta \Delta \Delta of period q, there exists an integer k, such that for any (finite) string s of period p which is inequivalent to ff, the overlap between s and the rotation ff[k] = ak ak+1 \Delta \Delta \Delta is at most p+ 1 2 q. Moreover, if p q, then the overlap between s and ff[k] is not larger than 2 3 (p+q). In the previous shortes...
Expected Length of Longest Common Subsequences
"... Contents 1 Introduction 1 2 Notation and preliminaries 4 2.1 Notation and basic definitions : : : : : : : : : : : : : : : : : : 4 2.2 Longest common subsequences : : : : : : : : : : : : : : : : : : 7 2.3 Computing longest common subsequences : : : : : : : : : : : 10 2.4 Expected length of longest c ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
Contents 1 Introduction 1 2 Notation and preliminaries 4 2.1 Notation and basic definitions : : : : : : : : : : : : : : : : : : 4 2.2 Longest common subsequences : : : : : : : : : : : : : : : : : : 7 2.3 Computing longest common subsequences : : : : : : : : : : : 10 2.4 Expected length of longest common subsequences : : : : : : : 14 3 Lower Bounds 20 3.1 Css machines : : : : : : : : : : : : : : : : : : : : : : : : : : : 20 3.2 Analysis of css machines : : : : : : : : : : : : : : : : : : : : : 26 3.3 Design of css machines : : : : : : : : : : : : : : : : : : : : : : 31 3.4 Labeled css machines : : : : : : : : : : : : : : : : : : : : : : : 38 4 Upper bounds 45 4.1 Collations : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 45 4.2 Previous upper bounds : : : : : : : : : : : : : : : : : : : : : : 51 4.3 Simple upper bound (binary alphabet) : : : : : : : : : : : : : 55 4.4 Simple upper bound (alphabet size 3) : : : : : : : : : : : : : : 59 4.5 Upper bounds for binary alphabet : :
A 2 2/3Approximation Algorithms for the Shortest Superstring Problem
 DIMACS WORKSHOP ON SEQUENCING AND MAPPING
, 1995
"... Given a collection of strings S = fs1; : : : ; sng over an alphabet, a superstring of S is a string containing each si as a substring; that is, for each i, 1 i n, contains a block of jsij consecutive characters that match si exactly. The shortest superstring problem is the problem of nding a superst ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
(Show Context)
Given a collection of strings S = fs1; : : : ; sng over an alphabet, a superstring of S is a string containing each si as a substring; that is, for each i, 1 i n, contains a block of jsij consecutive characters that match si exactly. The shortest superstring problem is the problem of nding a superstring of minimum length. The shortest superstring problem has applications in both data compression and computational biology. In data compression, the problem is a part of a general model of string compression proposed by Gallant, Maier and Storer (JCSS '80). Much of the recent interest in the problem is due to its application to DNA sequence assembly. The problem has been shown to be NPhard; in fact, it was shown by Blum et al.(JACM '94) to be MAX SNPhard. The rst O(1)approximation was also due to Blum et al., who gave an algorithm that always returns a superstring no more than 3 times the length of an optimal solution. Several researchers have published results that improve on the approximation ratio; of these, the best previous result is our algorithm ShortString, which achieves a 2 3
Parallel and Sequential Approximations of Shortest Superstrings
 In Proceedings of Fourth Scandinavian Workshop on Algorithm Theory
, 1994
"... Abstract. Superstrings have many applications in data compression and genetics. However the decision version of the shortest superstring problem is N P�complete. In this paper we examine the complexity of approximating a shortest superstring. There are two basic measures of the approximations� the c ..."
Abstract

Cited by 14 (1 self)
 Add to MetaCart
Abstract. Superstrings have many applications in data compression and genetics. However the decision version of the shortest superstring problem is N P�complete. In this paper we examine the complexity of approximating a shortest superstring. There are two basic measures of the approximations� the compression ratio and the approximation ratio. The well known and practical approximation algorithm is the sequential algorithm GREEDY. It approximates the shortest superstring with the compression ratio of 1 2 and with the approximation ratio of 4. Our main results are� �1 � An N C algorithm which achieves the compression ratio of 1 4� �. �2 � The proof that the algorithm GREEDY is not parallelizable � the com� putation of its output is P�complete. �3 � An improved sequential algorithm � the approximation ratio is reduced to 2.83. Previously it was reduced by Teng and Yao from 3 to 2.89. �4 � The design of an RN C algorithm with constant approximation ratio and an N C algorithm with logarithmic approximation ratio. 1
Efficient Processing of Distance Queries in Large Graphs: A Vertex Cover Approach
, 2012
"... We propose a novel diskbased index for processing singlesource shortest path or distance queries. The index is useful in a wide range of important applications (e.g., network analysis, routing planning, etc.). Our index is a treestructured index constructed based on the concept of vertex cover. W ..."
Abstract

Cited by 13 (4 self)
 Add to MetaCart
(Show Context)
We propose a novel diskbased index for processing singlesource shortest path or distance queries. The index is useful in a wide range of important applications (e.g., network analysis, routing planning, etc.). Our index is a treestructured index constructed based on the concept of vertex cover. We propose an I/Oefficient algorithm to construct the index when the input graph is too large to fit in main memory. We give detailed analysis of I/O and CPU complexity for both index construction and query processing, and verify the efficiency of our index for query processing in massive realworld graphs.
An 8/13approximation algorithm for the asymmetric maximum TSP
 In Proc. 13th Ann. ACMSIAM Symp. on Discrete Algorithms (SODA
, 2002
"... We present a polynomial time approximation algorithm for the asymmetric maximum traveling salesperson problem that achieves performance ratio 8 ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
We present a polynomial time approximation algorithm for the asymmetric maximum traveling salesperson problem that achieves performance ratio 8