Results 1-10 of 15
Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs
, 2006
Abstract

Cited by 68 (2 self)
A directed multigraph is said to be $d$-regular if the indegree and outdegree of every vertex is exactly $d$. By Hall's theorem one can represent such a multigraph as a combination of at most $n^2$ cycle covers, each taken with an appropriate multiplicity. We prove that if the $d$-regular multigraph does not contain more than $\lfloor d/2 \rfloor$ copies of any 2-cycle, then we can find a similar decomposition into $n^2$ pairs of cycle covers where each 2-cycle occurs in at most one component of each pair. Our proof is constructive and gives a polynomial algorithm to find such a decomposition. Since our applications only need one such pair of cycle covers whose weight is at least the average weight of all pairs, we also give an alternative, simpler algorithm to extract a single such pair. This combinatorial theorem then comes in handy in rounding a fractional solution of an LP relaxation of the maximum Traveling Salesman Problem (TSP). The first stage of the rounding procedure obtains two cycle covers that do not share a 2-cycle and whose total weight is at least twice the weight of the optimal solution. Then we show how to extract from the two cycle covers a tour whose weight is at least 2/3 of the weight of the longest tour. This improves upon the previous 5/8 approximation with a simpler algorithm. Utilizing a reduction from maximum TSP to the shortest superstring problem, we obtain a 2.5-approximation algorithm for the latter problem, which is again much simpler than the previous one. For minimum asymmetric TSP the same technique gives two cycle covers, not sharing a 2-cycle, with total weight at most twice the weight of the optimum. Assuming the triangle inequality, we then show how to obtain from this pair of cycle covers a tour whose weight is within a factor of $0.842 \log_2 n$ of optimal. This improves upon a previous approximation algorithm with approximation guarantee $0.999 \log_2 n$. Other applications of the rounding procedure are approximation algorithms for maximum 3-cycle cover (factor 2/3, previously 3/5) and maximum
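The Hall's-theorem step can be illustrated with a small sketch. View the $d$-regular multigraph as a bipartite multigraph between out-copies and in-copies of the vertices: a perfect matching there is exactly a cycle cover, and removing it leaves a $(d-1)$-regular multigraph, so Hall's theorem lets the process repeat $d$ times. The Python below is a naive illustration under our own assumptions, not the paper's algorithm: the multigraph is a hypothetical $n \times n$ multiplicity matrix `mult` with no self-loops, matchings are found with Kuhn's augmenting-path method, and neither the $\lfloor d/2 \rfloor$ 2-cycle condition nor the pairing of cycle covers is handled. Identical covers are grouped at the end so each appears once with its multiplicity.

```python
from collections import Counter


def find_cycle_cover(mult):
    """Return succ with succ[u] = v for one chosen arc u->v per vertex, i.e. a
    perfect matching of out-copies to in-copies (a cycle cover), or None."""
    n = len(mult)
    owner = [-1] * n  # owner[v] = u means arc u->v is matched into v

    def augment(u, seen):
        # Kuhn's augmenting-path search starting from out-copy u.
        for v in range(n):
            if mult[u][v] > 0 and not seen[v]:
                seen[v] = True
                if owner[v] == -1 or augment(owner[v], seen):
                    owner[v] = u
                    return True
        return False

    for u in range(n):
        if not augment(u, [False] * n):
            return None
    succ = [0] * n
    for v, u in enumerate(owner):
        succ[u] = v
    return succ


def decompose_regular(mult):
    """Split a d-regular multigraph (hypothetical matrix form, no self-loops)
    into d cycle covers, grouped as {cover: multiplicity}."""
    d = sum(mult[0])
    remaining = [row[:] for row in mult]
    covers = []
    for _ in range(d):
        succ = find_cycle_cover(remaining)
        # Hall's theorem guarantees a perfect matching while the graph is regular.
        assert succ is not None
        for u, v in enumerate(succ):
            remaining[u][v] -= 1  # the multigraph becomes (d-1)-regular
        covers.append(tuple(succ))
    return Counter(covers)
```

For the 3-regular matrix `[[0, 2, 1], [1, 0, 2], [2, 1, 0]]`, this returns `Counter({(1, 2, 0): 2, (2, 0, 1): 1})`: the 3-cycle 0→1→2→0 with multiplicity 2 and the reverse 3-cycle with multiplicity 1. Peeling off one cover at a time can take $d$ rounds; grouping into at most $n^2$ distinct covers, as the abstract states, needs a smarter procedure.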
Rotation of Periodic Strings and Short Superstrings
, 1996
Abstract

Cited by 27 (1 self)
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios $2\frac{2}{3}$ ($\approx 2.67$) and $2\frac{25}{42}$ ($\approx 2.596$), improving the best previously published $2\frac{3}{4}$ approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semi-infinite string $\alpha = a_1 a_2 \cdots$ of period $q$, there exists an integer $k$ such that for any (finite) string $s$ of period $p$ which is inequivalent to $\alpha$, the overlap between $s$ and the rotation $\alpha[k] = a_k a_{k+1} \cdots$ is at most $p + \frac{1}{2}q$. Moreover, if $p \le q$, then the overlap between $s$ and $\alpha[k]$ is not larger than $\frac{2}{3}(p+q)$. In the previous shortes...
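The bounds above are stated in terms of two primitive quantities: the overlap of $s$ and $t$ (the longest $v$ with $s = uv$ and $t = vw$, $u$ and $w$ non-empty) and the smallest period of a string. A minimal Python sketch of both, plus a helper for rotations of a periodic semi-infinite string; the function names are hypothetical, $\alpha$ is represented by one period `root` repeated forever, and the quadratic scans are chosen for clarity rather than speed.

```python
def overlap(s, t):
    """Length of the longest v with s = u v and t = v w, u and w non-empty."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0


def period(s):
    """Smallest p >= 1 with s[i] == s[i + p] for every valid i."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[i] == s[i + p] for i in range(n - p)):
            return p
    return n  # only reached for the empty string


def rotation_prefix(root, k, length):
    """First `length` letters of alpha[k] = a_k a_{k+1} ..., where alpha is the
    semi-infinite string root root root ... (k is 1-indexed, as in the text)."""
    q = len(root)
    return "".join(root[(k - 1 + i) % q] for i in range(length))
```

For example, `overlap("abcab", "cabd")` is 3, `period("abcabcab")` is 3, and `rotation_prefix("abc", 2, 5)` is `"bcabc"`.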
Greedy Algorithms For The Shortest Common Superstring That Are Asymptotically Optimal
, 1997
Abstract

Cited by 7 (5 self)
There has recently been a resurgence of interest in the shortest common superstring problem due to its important applications in molecular biology (e.g., recombination of DNA) and data compression. The problem is NP-hard, but it has been known for some time that greedy algorithms work well for this problem. More precisely, it was proved in a recent sequence of papers that in the worst case a greedy algorithm produces a superstring that is at most $\beta$ times ($2 \le \beta \le 4$) worse than optimal. We analyze the problem in a probabilistic framework and consider the optimal total overlap $O_n^{opt}$ and the overlap $O_n^{gr}$ produced by various greedy algorithms. These turn out to be asymptotically equivalent. We show that with high probability $\lim_{n\to\infty} \frac{O_n^{opt}}{n \log n} = \lim_{n\to\infty} \frac{O_n^{gr}}{n \log n} = \frac{1}{H}$, where $n$ is the number of original strings and $H$ is the entropy of the underlying alphabet. Our results hold under the condition that the lengths of all strings are not too short.
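The greedy algorithm discussed here repeatedly merges the two strings with the largest overlap until one string remains. A minimal Python sketch, assuming duplicates and strings contained in other strings are dropped first and that ties are broken by scan order; it is the textbook greedy, not necessarily the exact variant analyzed in the paper, and `greedy_superstring` is a hypothetical name.

```python
def overlap(s, t):
    """Length of the longest v with s = u v and t = v w, u and w non-empty."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0


def greedy_superstring(strings):
    # Drop duplicates and strings that occur inside other strings.
    strs = list(dict.fromkeys(strings))
    strs = [s for s in strs if not any(s != t and s in t for t in strs)]
    if not strs:
        return ""
    while len(strs) > 1:
        # Find the ordered pair (i, j) with maximum overlap of strs[i] onto strs[j].
        best = (-1, 0, 1)
        for i, s in enumerate(strs):
            for j, t in enumerate(strs):
                if i != j:
                    k = overlap(s, t)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strs[i] + strs[j][k:]
        strs = [x for idx, x in enumerate(strs) if idx not in (i, j)] + [merged]
    return strs[0]
```

On `["cde", "abc", "eab"]` the sketch first merges "eab" and "abc" (overlap 2) into "eabc", then appends it to "cde" (overlap 1), giving "cdeabc". Every input string stays inside some working string after each merge, so the result is always a superstring even when greedy is not optimal.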
ASSEMBLY ALGORITHMS FOR NEXTGENERATION SEQUENCE DATA
, 2009
Abstract

Cited by 3 (0 self)
Next-generation sequencing is revolutionizing genomics, promising higher coverage at a lower cost per base than Sanger sequencing. The shorter reads and higher error rates of these new instruments necessitate the development of new algorithms and software. This dissertation describes approaches to some problems in genome assembly with these short fragments. We describe YASRA (Yet Another Short Read Assembler), which performs comparative assembly of short reads using a reference genome that can differ substantially from the genome being sequenced. We explain the algorithm and present the results of assembling one ancient mitochondrial and one plastid dataset. Comparing the performance of YASRA with the AMOScmp-shortReads and Newbler mapping assemblers (version 2.0.00.17) as template genomes are varied, we find that YASRA generates fewer contigs with higher coverage and fewer errors. We also analyze situations where the use of comparative assembly outperforms de novo assembly, and
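Comparative assembly, at its simplest, places each read where it best matches the reference and then calls a consensus base at each position. The sketch below is a deliberately naive illustration of that idea, not YASRA's algorithm: it assumes substitutions only (no indels), places each read at the offset with the fewest mismatches, discards reads above a hypothetical `max_mismatches` threshold, and takes a majority vote per column, writing `N` where no read covers the reference.

```python
from collections import Counter


def best_offset(read, reference):
    """(offset, mismatches) where `read` best matches `reference`, no indels."""
    best = None
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        mism = sum(a != b for a, b in zip(read, window))
        if best is None or mism < best[1]:
            best = (off, mism)
    return best


def comparative_consensus(reads, reference, max_mismatches=2):
    """Hypothetical helper: majority-vote consensus of reads placed on a reference."""
    columns = [Counter() for _ in reference]
    for read in reads:
        placed = best_offset(read, reference)
        if placed is None or placed[1] > max_mismatches:
            continue  # read too divergent from the reference: leave it unplaced
        off, _ = placed
        for i, base in enumerate(read):
            columns[off + i][base] += 1
    # Majority base where reads cover a position; 'N' where coverage is zero.
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)
```

With reference `"ACGTACGTTT"` and reads `["ACGAAC", "GAACGT", "ACGTTT"]`, the consensus is `"ACGAACGTTT"`: the reads, not the reference, decide position 3. Real comparative assemblers must also handle indels, repeats, and regions where the target diverges too far for any read to be placed.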
Approximating the Shortest Superstring Problem Using de Bruijn Graphs
, 2013
Algorithms for Three Versions of the Shortest Common Superstring Problem
Abstract

Cited by 2 (0 self)
The input to the Shortest Common Superstring (SCS) problem is a set $S$ of $k$ words of total length $n$. In the classical version the output is an explicit word $SCS(S)$ in which each $s \in S$ occurs at least once. In our paper we consider two versions with multiple occurrences, in which the input includes additional numbers (multiplicities), given in binary. Our output is the word $SCS(S)$ given implicitly in a compact form, since its actual size could be exponential. We also consider the case when all input words have length two, where our main algorithmic tool is a compact representation of Eulerian cycles in multigraphs. Because edge multiplicities can be exponential, such cycles can have exponential length, so a compact representation is needed. Other tools used in our paper are a polynomial-time case of integer linear programming and the min-plus product of matrices.
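For the special case where every input word has length two, the graph view gives a simple formula in the classical setting (each word required once, without the paper's binary multiplicities). Treat letters as vertices and words as arcs. A superstring splits into maximal walks along arcs, and a walk using $e$ arcs spells $e + 1$ letters, so the optimum is $|E|$ plus the minimum number of walks covering all arcs, which is $\max(1, \sum_v \max(0, \mathrm{out}(v) - \mathrm{in}(v)))$ per weakly connected component. The sketch below computes that length; it is our illustration of the standard Eulerian-trail argument, `scs_length_pairs` is a hypothetical name, and it does not implement the paper's compact representation for exponential multiplicities.

```python
from collections import defaultdict


def scs_length_pairs(words):
    """Length of a shortest superstring of distinct length-2 words."""
    arcs = set(words)
    assert all(len(w) == 2 for w in arcs)

    parent = {}  # union-find over letters, for weakly connected components

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    balance = defaultdict(int)  # out-degree minus in-degree of each letter
    for u, v in arcs:
        balance[u] += 1
        balance[v] -= 1
        parent[find(u)] = find(v)

    # Each component needs one walk per unit of surplus out-degree, and at least one.
    walks = defaultdict(int)
    for x in list(parent):
        walks[find(x)] += max(0, balance[x])
    return len(arcs) + sum(max(1, w) for w in walks.values())
```

For example, `{"ab", "bc", "ca"}` forms one balanced cycle and gives 4 (`"abca"`), while `{"ab", "ac", "bc"}` has surplus 2 at `a` and gives 5 (for instance `"abcac"`).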
be a set of strings over some alphabet $\Sigma$. A
Abstract
ily a good approximation for the maximum overlap in the superstring, and vice versa. The first constant-factor approximation algorithm for the length of the shortest superstring was given by Blum et al. [4], who discovered a 3-approximation algorithm and proved that the "Greedy" algorithm by Tarhio and Ukkonen [9] achieves a 4-approximation. Their algorithms and analysis rely on the close relation between the shortest superstring problem, which Turner [11] showed to be reducible to the travelling salesman problem, and the cycle cover problem. The same relation was exploited in subsequent papers [10] ($\approx 2.89$), [5] ($\approx 2.83$), [7] ($\approx 2.79$) and [1, 2] ($\approx 2.75$). Armen and Stein [3] have also recently obtained a $2\frac{2}{3}$-approximation algorithm, independently of our work. Here we continue this li
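The relation to the travelling salesman problem can be seen directly. For a set in which no string occurs inside another, a superstring that visits the strings in order $\pi$ has length $\sum_i |s_i| - \sum_i \mathrm{ov}(s_{\pi(i)}, s_{\pi(i+1)})$, so a shortest superstring is a maximum-overlap Hamiltonian path in the overlap graph. The Python sketch below enumerates every order, so it is only usable for a handful of strings; it illustrates the reduction rather than any of the cited approximation algorithms, and `shortest_superstring` is a hypothetical name.

```python
from itertools import permutations


def overlap(s, t):
    """Length of the longest v with s = u v and t = v w, u and w non-empty."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0


def shortest_superstring(strings):
    # Reduce to a substring-free set, as assumed by the TSP formulation.
    strs = list(dict.fromkeys(strings))
    strs = [s for s in strs if not any(s != t and s in t for t in strs)]
    if not strs:
        return ""
    best = None
    for order in permutations(strs):  # brute-force Hamiltonian paths
        merged = order[0]
        for prev, nxt in zip(order, order[1:]):
            merged += nxt[overlap(prev, nxt):]  # merge with maximum overlap
        if best is None or len(merged) < len(best):
            best = merged
    return best
```

On `["cde", "abc", "eab"]` the best path has total overlap 3, so the shortest superstring has length $9 - 3 = 6$; the sketch returns `"cdeabc"`.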
Scoring-and-Unfolding Trimmed Tree Assembler: Algorithms for Assembling Genome Sequences Accurately and Efficiently
, 2011
Abstract
My family ... lowed me to clarify several important aspects of the thesis for the general reader. I thank Professor Michael Schatz for making sure that the theoretical framework of the dissertation was correctly presented and extensively related to the relevant prior art. I also express my thanks to Professor Alan Siegel for improving the presentation and style of the thesis. He has been not just a scientific mentor but also a tutoring figure by virtue of his high commitment to the value of education. Finally, I would like to thank Professor Raul Rabadan for suggesting many areas of application of the tools developed in this thesis. One of the contributions of the dissertation (TotalReCaller) is the result of joint work with Fabian Menges. I am particularly grateful to him for this collaboration as well as for all the valuable and energetic discussions that we had on many of the topics presented in this thesis. I also would like to thank all the members of the NYU Bioinformatics Group for creating a unique and
Sequential and Parallel Algorithms for the Shortest Common Superstring Problem
Abstract
We design sequential and parallel genetic algorithms, simulated annealing algorithms and improved greedy algorithms for the shortest common superstring (SCS) problem, which is to find the shortest string that contains all strings from a given set of strings. The SCS problem is NP-complete [7]. It is even MAX SNP-hard [2], i.e., unless P = NP, there is no polynomial-time approximation scheme: no polynomial-time algorithm can approximate the optimum to within an arbitrary predetermined constant factor. We compare these algorithms on several randomly generated test cases. The test results show the superiority of the parallel island genetic algorithm.
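Of the heuristics listed, simulated annealing is the easiest to show in a few lines. A minimal sketch under our own assumptions, not the paper's operators or parameters: a candidate solution is an ordering of the strings, its cost is the length of the superstring obtained by merging consecutive strings with maximum overlap, neighbours are produced by swapping two positions, and a worse neighbour is accepted with probability $e^{-\Delta/T}$ under geometric cooling. The names and the values of `steps`, `t0`, `cooling` and `seed` are arbitrary illustrative choices.

```python
import math
import random


def overlap(s, t):
    """Length of the longest v with s = u v and t = v w, u and w non-empty."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0


def merge_order(order):
    """Superstring obtained by merging consecutive strings with maximum overlap."""
    merged = order[0]
    for prev, nxt in zip(order, order[1:]):
        merged += nxt[overlap(prev, nxt):]
    return merged


def anneal_superstring(strings, steps=5000, t0=2.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    order = list(dict.fromkeys(strings))
    if not order:
        return ""
    current = best = merge_order(order)
    temp = t0
    for _ in range(steps):
        if len(order) < 2:
            break
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # propose a swap
        candidate = merge_order(order)
        delta = len(candidate) - len(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate  # accept (always if not worse)
            if len(current) < len(best):
                best = current
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling
    return best
```

Every ordering yields a valid superstring, so the search only trades length; the returned string is the shortest one visited, which is not guaranteed to be optimal.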
MIHAI POP
Abstract
Shotgun sequencing is the most widely used technique for determining the DNA sequence of organisms. It involves breaking up the DNA into many small pieces that can be read by automated sequencing machines, then piecing together the original genome using specialized software programs called assemblers. Due to the large amounts of data being generated and to the complex structure of most