## Computability of models for sequence assembly (2007)

Venue: WABI

Citations: 9 (1 self)

### BibTeX

@INPROCEEDINGS{Medvedev07computabilityof,
  author = {Paul Medvedev and Konstantinos Georgiou and Gene Myers and Michael Brudno},
  title = {Computability of models for sequence assembly},
  booktitle = {WABI},
  year = {2007},
  pages = {289--301}
}

### Abstract

Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
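To make the objective in the second result concrete, the following is a minimal Python sketch of what it means for a double-stranded DNA sequence to contain a set of k-long words: each word must occur on the given strand or on its reverse complement. This is not the paper's algorithm (it does not solve the Chinese Postman Problem or build a bidirected graph); it only checks a candidate answer. The function names are hypothetical, and treating the sequence as linear rather than circular is an assumption made for illustration.

```python
from typing import Iterable, Set

# Watson-Crick base pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-long substrings of a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def covers_double_stranded(seq: str, words: Iterable[str], k: int) -> bool:
    """Check that every k-long word occurs on at least one strand of seq.

    Assumption (illustrative only): seq is linear, and a word counts as
    contained if it appears in seq or in the reverse complement of seq.
    """
    words = set(words)
    if any(len(w) != k for w in words):
        raise ValueError("all words must have length k")
    both_strands = kmers(seq, k) | kmers(reverse_complement(seq), k)
    return words <= both_strands
```

Under these assumptions, `covers_double_stranded("AAC", ["GTT"], 3)` holds because `GTT` is the reverse complement of `AAC`, so a single-stranded check would miss it. The paper's contribution is finding the shortest such sequence in polynomial time, which this checker does not attempt.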

### Citations

10959 | Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979

Citation Context: ... The Shortest Common Superstring (SCS) problem is to find the shortest superstring of ... It was proven to be NP-hard for ... [4, 5]. We define the de Bruijn graph as a ... Fig. 2. This is an example of a bidirected graph and its incidence matrix. We draw an edge that is positive incident to a vertex using ...

120 | An Eulerian path approach to DNA fragment assembly
- Pevzner, Tang, et al.

Citation Context: ...-collapsing the repeats. One way of solving this problem is to build representative strings or structures for each repeat, and allow the assembly algorithm to use these multiple times. Pevzner et al. [12] had the insight that by dividing the reads into shorter k-long stretches (called k-mers), all of the instances of a repeat collapse into a single set of vertices. They represent each read as a walk...

78 | Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement
- Kececioglu, Sankoff
- 1995

Citation Context: ... edges correspond to the three possible ways in which the overlap can occur (see Figure 1B & C). Bidirected graphs were further used for sequence assembly in [9, 10] and to model breakpoint graphs in [7]. Remarkably, however, bidirected graphs have been studied within the context of graph theory already in the 1960s when Edmonds formulated the problem of bidirected flow (a generalization of network f...

66 | l-Tuple DNA sequencing: computer analysis
- Pevzner
- 1989

Citation Context: ... extend Gabow’s and Edmonds’ work to give a polynomial time algorithm for solving the Chinese Postman Problem in bidirected graphs. By combining this algorithm with Pevzner’s work on de Bruijn graphs [11, 12] and Kececioglu’s work on modeling strandedness with bidirected graphs [8], we show how it can be used to find the shortest (double-stranded) DNA sequence with a given set of k-long DNA fragments. To...

43 | On finding minimal length superstrings
- Gallant, Maier, et al.
- 1980

Citation Context: ...est sequence that contains every read as a substring. This assumption led to the casting of the genome assembly problem as the Shortest Common Superstring (SCS) problem, which is known to be NP-hard [4]. The problem of modeling genome assembly as the SCS problem is that most genomes have repeats – multiple identical, or nearly identical, stretches of DNA, while the SCS solution would represent each ...

43 | Combinatorial algorithms for DNA sequence assembly
- Kececioglu, Myers
- 1995

Citation Context: ...ead. The EULER method [12] uses both the reads and their reverse-complements to build the de Bruijn graph and searches heuristically for two “complementary” paths. In the work of Kececioglu and Myers [6] strand selection for a read is formulated as the NP-hard maximum weight cut problem. In 1992, Kececioglu [8] introduced an elegant method for dealing with double-strandedness by modeling overlaps be...

43 | De novo repeat classification and fragment assembly. Genome Research
- Pevzner, Tang, et al.
- 2004

39 | An efficient reduction technique for degree-constrained subgraph and bidirected network flow problems
- Gabow
- 1983

Citation Context: ...dmonds formulated the problem of bidirected flow (a generalization of network flow to bidirected graphs) and showed it equivalent to perfect b-matchings [1]. Edmonds’ work was later extended by Gabow [3], who gave the fastest to-date algorithm for bidirected flow. In our second result, we extend Gabow’s and Edmonds’ work to give a polynomial time algorithm for solving the Chinese Postman Problem in b...

37 | Toward simplifying and accurately formulating fragment assembly
- Myers
- 1995

Citation Context: ...h endpoints. The three types of bidirected edges correspond to the three possible ways in which the overlap can occur (see Figure 1B & C). Bidirected graphs were further used for sequence assembly in [9, 10] and to model breakpoint graphs in [7]. Remarkably, however, bidirected graphs have been studied within the context of graph theory already in the 1960s when Edmonds formulated the problem of bidirect...

27 | Exact and Approximation Algorithms for DNA Sequence Reconstruction
- Kececioglu
- 1991

Citation Context: ...earches heuristically for two “complementary” paths. In the work of Kececioglu and Myers [6] strand selection for a read is formulated as the NP-hard maximum weight cut problem. In 1992, Kececioglu [8] introduced an elegant method for dealing with double-strandedness by modeling overlaps between DNA molecules using a bidirected graph. Each read is represented by a single node, and each overlap (edge...

26 | Building fragment assembly string graphs
- Myers

Citation Context: ...y. This approach was later expanded to A-Bruijn graphs [13], where the initial subdivision into k-mers is not necessary, but the basic algorithmic problem of searching for a superwalk remains. Myers [10] provides for an alternative model of sequence assembly, using a string graph. Instead of dividing the reads into k-mers, he builds an overlap graph – a graph where nodes correspond to reads and edge...

2 | An introduction to matching. Notes of engineering summer conference
- Edmonds
- 1967

Citation Context: ...ext of graph theory already in the 1960s when Edmonds formulated the problem of bidirected flow (a generalization of network flow to bidirected graphs) and showed it equivalent to perfect b-matchings [1]. Edmonds’ work was later extended by Gabow [3], who gave the fastest to-date algorithm for bidirected flow. In our second result, we extend Gabow’s and Edmonds’ work to give a polynomial time algorit...