Results 1 - 10 of 24
Trie-Based Data Structures for Sequence Assembly (Extended Abstract)
The Eighth Symposium on Combinatorial Pattern Matching, 1997
Cited by 9 (2 self)
Ting Chen, Steven S. Skiena, Department of Computer Science, State University of New York, Stony Brook, NY 11794-4400 ({tichen, skiena}@cs.sunysb.edu). January 27, 1997.
A new approach to fragment assembly in DNA sequencing
In Proc. 5th Annual International Conference on Computational Molecular Biology (RECOMB ’01), 2001
Cited by 7 (0 self)
For the last twenty years, fragment assembly in DNA sequencing has followed the “overlap-layout-consensus” paradigm used in all currently available assembly tools. Although this approach proved useful for assembling clones, it faces difficulties in genomic shotgun assembly: the existing algorithms make assembly errors and are often unable to resolve repeats, even in prokaryotic genomes. Biologists are well aware of these errors and are forced to carry out additional experiments to verify the assembled contigs. We abandon the classical “overlap-layout-consensus” approach in favor of a new Eulerian Superpath approach that, for the first time, resolves the problem of repeats in fragment assembly. Our main result is the reduction of fragment assembly to a variation of the classical Eulerian path problem. This reduction opens new possibilities for repeat resolution and allows one to generate error-free solutions to large-scale fragment assembly problems. The major improvement of EULER over other algorithms is that it resolves all repeats except long perfect repeats, which are theoretically impossible to resolve without additional experiments.
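EULER's Eulerian Superpath method is more involved than anything shown here. As background, the classical reduction it builds on can be sketched as follows: every k-mer in the reads becomes an edge of a de Bruijn graph from its (k-1)-prefix to its (k-1)-suffix, and an Eulerian path through that graph spells out a reconstruction. This is a minimal sketch assuming error-free reads and a genome in which each k-mer occurs once; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph: each distinct k-mer is an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg = Counter({u: len(vs) for u, vs in graph.items()})
    in_deg = Counter(v for vs in graph.values() for v in vs)
    nodes = set(out_deg) | set(in_deg)
    # A path starts at the node with one more outgoing than incoming edge;
    # if none exists, the graph is balanced and any node with edges will do.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node in reverse order
    return path[::-1]


def spell(path):
    """Concatenate overlapping (k-1)-mers into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ACGTT", "GTTGC", "TGCA"]
print(spell(eulerian_path(de_bruijn_edges(reads, 3))))
```

Storing k-mers in a set discards their multiplicity, which is exactly why repeats are hard: a repeated k-mer collapses into one edge, and the simple Eulerian path can no longer tell its copies apart.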
Computability of models for sequence assembly
In WABI, 2007
Cited by 5 (1 self)
Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Meanwhile, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give what is, to our knowledge, the first optimal polynomial-time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show how to use it to find the shortest double-stranded DNA sequence that contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
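The bidirected Chinese Postman algorithm is well beyond a short example. The sketch below illustrates only the representation issue that motivates modeling double-strandedness: a k-long word and its reverse complement come from the same locus on opposite strands. A common convention, assumed here and not necessarily the one used in the paper, is to store each word as its "canonical" form, the lexicographically smaller of the word and its reverse complement. Function names are hypothetical.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Strand-independent representative of a k-mer."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(read: str, k: int) -> set[str]:
    """All canonical k-mers of a read; identical for a read and its reverse complement."""
    return {canonical(read[i:i + k]) for i in range(len(read) - k + 1)}


print(reverse_complement("ACGTT"))
print(canonical_kmers("ACGT", 3))
```

Canonical forms make k-mer counting strand-independent, but they lose the orientation in which a word was read. A bidirected graph keeps that orientation on the edges, which is what lets a path through it spell a consistent double-stranded sequence.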
A comparison of DNA fragment assembly algorithms
Proc. of the Int’l Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences, 2004
Cited by 4 (1 self)
As more research centers embark on sequencing new genomes, the problem of DNA fragment assembly for shotgun sequencing is growing in importance and complexity. Accurate and fast assembly is a crucial part of any sequencing project, and many algorithms have been developed to tackle it. Since the DNA fragment assembly problem is NP-hard, exact solutions are very difficult to obtain. In this work, we present four heuristic algorithms, which we designed, implemented and tested. We compare the algorithms and the data structures of the four heuristics and present the results of our experiments. We also compare our results with the assemblies produced by the well-known packages PHRAP and CAP3.
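The abstract does not describe the four heuristics, so none of them is reproduced here. To illustrate the kind of heuristic involved, here is a sketch of the textbook greedy approach: repeatedly merge the pair of fragments with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. It assumes error-free fragments taken from a single strand; `min_len` and the function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedy suffix-prefix merging; returns the resulting contigs."""
    # Remove duplicates and fragments contained in others: they add no sequence.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return frags  # no overlaps left: each fragment is a separate contig
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]


print(greedy_assemble(["ACGTT", "GTTGC", "TGCA"]))
```

Greedy merging is fast but commits early: in a repetitive genome, the longest overlap is not always the correct one, which is one reason heuristic assemblers are compared against one another and against packages such as PHRAP and CAP3.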
Shotgun Protein Sequencing: Assembly of Peptide Tandem Mass Spectra from Mixtures of Modified Proteins
Cited by 3 (2 self)
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing unknown proteins in the 1980s, low-throughput Edman degradation followed by cloning remains the main method for sequencing unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed, the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra ...
A preprocessor for shotgun assembly of large genomes
Journal of Computational Biology, 2004
Cited by 2 (1 self)
The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments selected at random from a genome. The sequence of bases at each end of a fragment is determined, albeit imprecisely, resulting in a sequence of letters called a “read”. Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) correct most of the sequencing errors, (2) change quality values accordingly, and (3) produce a list of “overlaps”, i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the “UMD Overlapper”, can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera’s Drosophila reads. When we replaced Celera’s overlap procedures in the front end of their assembler with ours, the assembler was able to produce a significantly improved genome.
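The abstract describes quality values as estimates of per-letter error probability. A widely used convention for this, assumed here because the abstract does not name a scale, is the Phred scale, $Q = -10 \log_{10} p$, so Q = 20 means a 1% chance of error and Q = 30 means 0.1%. The sketch converts between the two, sums error probabilities into an expected error count for a read, and applies one simple, hypothetical trimming rule (cut after the last base with quality at least `min_q`). It is not the UMD Overlapper's error-correction procedure.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability p for Phred quality Q, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality Q for error probability p."""
    return -10 * math.log10(p)


def expected_errors(qualities: list[int]) -> float:
    """Expected number of sequencing errors in a read (sum of per-base probabilities)."""
    return sum(phred_to_error_prob(q) for q in qualities)


def trim_low_quality_tail(read: str, qualities: list[int], min_q: int = 20) -> str:
    """Hypothetical rule: cut the read after the last base with quality >= min_q."""
    end = len(read)
    while end > 0 and qualities[end - 1] < min_q:
        end -= 1
    return read[:end]


print(expected_errors([10, 20, 30]))
print(trim_low_quality_tail("ACGTA", [30, 30, 30, 10, 5]))
```

Because quality tends to fall toward the end of a read, the expected error count grows quickly in the tail, which matches the abstract's observation that reads are cut off where sequencing errors become endemic.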
How Good is Genome-Level Fragment Assembly? (Extended Abstract)
1997
Cited by 1 (1 self)
Ting Chen, Steven S. Skiena, Department of Computer Science, State University of New York, Stony Brook, NY 11794-4400 ({tichen, skiena}@cs.sunysb.edu). October 17, 1997.
1 Introduction
In late summer 1997, groups at Brookhaven National Laboratory (BNL) and The Institute for Genomic Research (TIGR) independently completed sequencing the genome of Borrelia burgdorferi, the bacterium that causes Lyme disease. As part of the Brookhaven team, led by Dr. William Studier, we have developed a new fragment assembler, STROLL, which is capable of assembling megabase genome sequencing projects. Why did we develop yet another fragment assembler? When we began this project (January 1996), the Brookhaven group did not have access to an adequate assembler for data produced by their primer walking strategy [6]. Indeed, fragment assemblers have historically not proved very portable across different sequencing projects. Each large sequencing team developed its own sequencing strategy...
Genome Sequence Assembly Using Trace Signals and Additional Sequence Information
Cited by 1 (0 self)
Motivation: This article presents a method for assembling shotgun sequences which primarily uses high confidence regions whilst taking advantage of additional available information such as low confidence regions, quality values or repetitive region tags. Conflict situations are resolved with routines for analysing trace signals.
Sequence Comparison 101
DNA assembly problem formulation; Lander-Waterman sampling analysis; Overlap/Layout/Consensus paradigm
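The entry above only names the Lander-Waterman sampling analysis. As a reminder of what it computes, here is a minimal sketch of its standard results, stated from general knowledge rather than from this source: with N reads of length L from a genome of length G, coverage is c = NL/G; the expected fraction of the genome not covered by any read is e^(-c); and if overlaps shorter than T bases cannot be detected (θ = T/L, σ = 1 - θ), the expected number of contigs ("islands") is N·e^(-cσ). The function and parameter names are hypothetical.

```python
import math


def lander_waterman(genome_len: float, read_len: float, n_reads: int,
                    min_overlap: float = 0.0) -> dict:
    """Standard Lander-Waterman estimates for random shotgun sampling."""
    c = n_reads * read_len / genome_len   # coverage (redundancy) c = NL/G
    sigma = 1 - min_overlap / read_len    # sigma = 1 - theta, with theta = T/L
    return {
        "coverage": c,
        "expected_contigs": n_reads * math.exp(-c * sigma),  # N * e^(-c*sigma)
        "fraction_uncovered": math.exp(-c),                  # P(base not sampled)
    }


# 20,000 reads of 500 bases from a 1 Mb genome gives 10x coverage.
print(lander_waterman(1_000_000, 500, 20_000))
```

At 10x coverage with no overlap threshold, the model predicts fewer than one expected contig break across the whole genome; raising `min_overlap` shrinks σ and increases the predicted number of contigs, which is why detectable overlap length matters in practice.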

