A novel method for multiple alignment of sequences with repeated and shuffled elements
, 2004
Fragment assembly with short reads
- Bioinformatics
, 2004
Cited by 17 (4 self)
Motivation: Current DNA sequencing technology produces reads of about 500-750 base pairs (bp) with typical coverage under 10X. New sequencing technologies are emerging that produce shorter reads (length 80-200 bp) but allow one to generate significantly higher coverage (30X and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology, but were not designed for assembly of short reads. Results: We analyze the limitations of assembling reads generated by these new technologies, and present a routine for basecalling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts.
The Directed Chinese Postman Problem
- IEEE Trans. On Magnetics
, 1976
Cited by 10 (2 self)
This paper reviews the wide range of applications of the problem and presents complete, executable code to solve it for the case of directed multigraphs. A variation called the "open Chinese Postman Problem" is also introduced and solved. Although optimisations are possible, no substantially better algorithms are likely.
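As a rough illustration of the problem this abstract describes (not the paper's own code), the sketch below computes the cost of an optimal closed postman tour on a small directed multigraph. It assumes the graph is strongly connected and that vertices are numbered 0..n-1. The function name `directed_cpp_cost` and the brute-force pairing of unbalanced vertices are illustrative choices; a practical solver would replace the brute-force step with a minimum-cost flow or assignment routine.

```python
from itertools import permutations

def directed_cpp_cost(n, arcs):
    """Cost of a cheapest closed tour that traverses every arc at least once.

    arcs is a list of (u, v, cost) triples; parallel arcs are allowed.
    Assumption: the graph is strongly connected and small, because the
    pairing step below tries every permutation.
    """
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    delta = [0] * n  # out-degree minus in-degree of each vertex
    total = 0
    for u, v, cost in arcs:
        dist[u][v] = min(dist[u][v], cost)
        delta[u] += 1
        delta[v] -= 1
        total += cost

    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    # A vertex with more arcs in than out needs extra paths leaving it;
    # each such path must end at a vertex with more arcs out than in.
    starts = [v for v in range(n) for _ in range(-delta[v])]
    ends = [v for v in range(n) for _ in range(delta[v])]

    # Brute-force the cheapest pairing of path starts with path ends.
    extra = min(
        sum(dist[s][t] for s, t in zip(starts, p))
        for p in permutations(ends)
    )
    return total + extra

# A 3-cycle plus one extra arc 0 -> 2: the arc 2 -> 0 must be repeated once.
example_arcs = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (0, 2, 5)]
print(directed_cpp_cost(3, example_arcs))
```

When every vertex is already balanced, no extra paths are needed and the result is simply the sum of the arc costs, which corresponds to an Eulerian circuit.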
An eulerian path approach to global multiple alignment for DNA sequences
- J. Comput. Biol
, 2003
Cited by 9 (1 self)
With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing that transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to a Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm with almost linear computational speed with respect to the total size (number of letters) of sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and as low as 70% pairwise identity) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project is used to demonstrate the performance. Key words: multiple sequence alignment, de Bruijn graph, Eulerian path.
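The Eulerian method for fragment assembly that this abstract cites as its motivation can be sketched in a few lines. The code below is only an illustration of that general idea, not the paper's alignment algorithm: it assumes error-free reads, a single strand, and a k-mer set that admits an Eulerian path. The helper names (`de_bruijn_edges`, `eulerian_path`, `spell`) and the toy reads are made up for the example.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Collect the distinct k-mers of the reads.

    Each k-mer is one edge of the de Bruijn graph, running from its
    (k-1)-letter prefix to its (k-1)-letter suffix.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return sorted(kmers)

def eulerian_path(kmers):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((node for node, b in balance.items() if b == 1), kmers[0][:-1])

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())         # dead end: emit the node
    path.reverse()
    return path

def spell(path):
    """Turn a walk over (k-1)-mers back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]  # toy overlapping reads
print(spell(eulerian_path(de_bruijn_edges(reads, 4))))
```

Building the graph and walking it are both linear in the number of k-mers, which is the property the abstract relies on when it reports almost linear running time.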
Ab Initio Whole Genome Shotgun Assembly With Mated Short Reads
Cited by 6 (0 self)
Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown if it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy-counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. coli and predict copy-counts with extremely high accuracy, while assembling long contigs.
De novo fragment assembly with short mate-paired reads: Does the read length matter?
- Genome Research 19
, 2009
Cited by 5 (0 self)
Computability of models for sequence assembly
- In WABI
, 2007
Cited by 5 (1 self)
Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
Correcting errors in shotgun sequences
- Nucleic Acids Res
, 2003
Cited by 5 (0 self)
Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
Design patterns for efficient graph algorithms in mapreduce
- In MLG ’10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs
, 2010
Cited by 5 (2 self)
Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.
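For readers unfamiliar with the message-passing formulation of PageRank that this abstract refers to, here is a minimal single-machine sketch. It shows only the baseline pattern (each vertex sends its rank share along outgoing edges, then incoming messages are summed per vertex) and does not implement the paper's three design patterns. The function name `pagerank`, the damping factor of 0.85, and the adjacency-dict input (every neighbour must also appear as a key) are assumptions made for this example.

```python
from collections import defaultdict

def pagerank(graph, damping=0.85, iterations=20):
    """PageRank written as repeated rounds of message passing.

    graph maps each vertex to a list of its out-neighbours.
    """
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # "Map" phase: every vertex sends an equal share of its rank
        # to each out-neighbour.
        messages = defaultdict(float)
        dangling = 0.0
        for node, neighbours in graph.items():
            if neighbours:
                share = rank[node] / len(neighbours)
                for target in neighbours:
                    messages[target] += share
            else:
                dangling += rank[node]  # no out-edges: spread evenly later

        # "Reduce" phase: each vertex combines its summed messages with
        # the random-jump term and its share of the dangling mass.
        rank = {
            node: (1 - damping) / n + damping * (messages[node] + dangling / n)
            for node in graph
        }
    return rank

toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for node, score in sorted(pagerank(toy_graph).items()):
    print(node, round(score, 3))
```

In a real MapReduce job the two phases would be separate map and reduce functions, and the graph structure would have to be shuffled or joined on every iteration, which is the overhead the abstract says its design patterns target.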
A comparison of DNA fragment assembly algorithms
- Proc. of the Int’l Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences
, 2004
Cited by 4 (1 self)
As more research centers embark on sequencing new genomes, the problem of DNA fragment assembly for shotgun sequencing is growing in importance and complexity. Accurate and fast assembly is a crucial part of any sequencing project and many algorithms have been developed to tackle it. Since the DNA fragment assembly problem is NP-hard, exact solutions are very difficult to obtain. In this work, we present four heuristic algorithms, which we designed, implemented and tested. We compare the algorithms and the data structures of the four heuristics and present results of our experiments. We also compare our results with the assemblies produced by the well-known packages PHRAP and CAP3.

