A new algorithm for DNA sequence assembly
Journal of Computational Biology, 1995
Cited by 32 (2 self)
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
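The two inputs this abstract contrasts can be illustrated with a small sketch. It is not the paper's algorithm; the function names `longest_overlap` and `k_spectrum` are hypothetical and chosen only for illustration. The first computes a pairwise suffix-prefix overlap between two fragments (the shotgun view), the second the set of all subwords of length k (the SBH view).

```python
def longest_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    Exact matching only; real reads contain errors, which this ignores.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, max(min_len, 1) - 1, -1):
        if a[len(a) - n:] == b[:n]:
            return n
    return 0


def k_spectrum(sequence: str, k: int) -> set[str]:
    """Set of all subwords (k-mers) of length `k` occurring in `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
```

For example, `longest_overlap("ACGTAC", "TACGG")` finds the shared `TAC`, while `k_spectrum("ACGTAC", 3)` yields the four distinct 3-mers of the sequence.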
AMASS: A Structured Pattern Matching Approach to Shotgun Sequence Assembly
J. Comput. Biol, 1999
Cited by 13 (2 self)
In this paper, we propose an efficient, reliable shotgun sequence assembly algorithm based on a fingerprinting scheme that is robust to both noise and repetitive sequences in the data. Our algorithm uses exact matches of short patterns randomly selected from fragment data to identify fragment overlaps, construct an overlap map, and finally deliver a consensus sequence. We show how statistical clues made explicit in our approach can easily be exploited to correctly assemble results even in the presence of extensive repetitive sequences. Our approach is exceptionally fast in practice: e.g., we have successfully assembled a whole Mycoplasma genitalium genome (approximately 580 kbps) in roughly 8 minutes of 64MB 200MHz Pentium Pro CPU time from real shotgun data, where most existing algorithms can be expected to run for several hours to a day on the same data. Moreover, experiments with shotgun data synthetically prepared from real DNA sequences from a wide range of organisms (including h...
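The idea of using exact matches of randomly selected short patterns to flag likely overlaps can be sketched roughly as follows. This is a simplified illustration, not AMASS itself: the function `candidate_overlaps`, its parameters, and the scoring by shared-pattern count are all assumptions made for the example.

```python
import random
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(fragments, pattern_len=4, samples_per_fragment=3, seed=0):
    """Count, for each fragment pair, how many sampled patterns both contain.

    Hypothetical sketch: patterns are sampled uniformly from the fragments,
    and a pair with a nonzero count is only a candidate overlap.
    """
    rng = random.Random(seed)

    # Sample short patterns at random positions in the fragment data.
    patterns = set()
    for frag in fragments:
        if len(frag) < pattern_len:
            continue
        for _ in range(samples_per_fragment):
            start = rng.randrange(len(frag) - pattern_len + 1)
            patterns.add(frag[start:start + pattern_len])

    # Record which fragments contain each pattern as an exact match.
    hits = defaultdict(set)
    for idx, frag in enumerate(fragments):
        for pat in patterns:
            if pat in frag:
                hits[pat].add(idx)

    # Every pattern shared by two fragments adds one unit of support.
    support = defaultdict(int)
    for idxs in hits.values():
        for i, j in combinations(sorted(idxs), 2):
            support[(i, j)] += 1
    return dict(support)
```

Pairs with high support would then be examined more closely; how the paper turns such evidence into an overlap map and handles repeats is not shown here.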
Improving the Quality of Automatic DNA Sequence Assembly using Fluorescent Trace-Data Classifications
Proceedings, Fourth International Conference on Intelligent Systems for Molecular Biology, 1996
Cited by 3 (1 self)
Virtually all large-scale sequencing projects use automatic sequence-assembly programs to aid in the determination of DNA sequences. The computer-generated assemblies require substantial hand-editing to transform them into submissions for GenBank. As the size of sequencing projects increases, it becomes essential to improve the quality of the automated assemblies so that this time-consuming hand-editing may be reduced. Current ABI sequencing technology uses base calls made from fluorescently-labeled DNA fragments run on gels. We present a new representation for the fluorescent trace data associated with individual base calls. This representation can be used before, during, and after fragment assembly to improve the quality of assemblies. We demonstrate one such use -- end-trimming of sub-optimal data -- that results in a significant improvement in the quality of subsequent assemblies. Introduction A fundamental goal of the Human Genome Project is to determine the seque...
Algorithms for Sequence Alignment
2002
Cited by 2 (0 self)
Sequence alignment is an important tool for describing relationships between sequences. Many sequence alignment algorithms exist, differing in efficiency, and in their models of the sequences and of the relationship between sequences. The focus of this thesis is on algorithms for the optimal alignment of two or three sequences of biological data, particularly DNA sequences. The algorithms are discussed with particular emphasis on space and time complexity. A divide-and-conquer method is presented for use with a number of different alignment algorithms. This method may be used to reduce the space complexity of an alignment algorithm with little or no effect on the time complexity. The advantages of this divide-and-conquer method include its simplicity and the ease with which it can be applied to many different alignment algorithms. These advantages are demonstrated by using the divide-and-conquer method in conjunction with several known alignment algorithms. An efficient alignment algorithm is presented for the important problem of optimally aligning three sequences using a linear function for costing gaps in the alignment. For sequences
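Divide-and-conquer alignment schemes of this kind (Hirschberg's method is the best-known example; whether the thesis uses that exact scheme is not stated here) rest on the fact that the optimal alignment cost can be computed while keeping only one row of the dynamic-programming table. The sketch below shows that building block for two sequences with a linear gap cost. The function name and the unit costs are assumptions for illustration.

```python
def alignment_cost(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Optimal global alignment cost of `a` and `b` with a linear gap cost.

    Only the previous row of the table is kept, so memory is O(len(b))
    while the running time stays O(len(a) * len(b)).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            delete = prev[j] + gap        # a[i-1] aligned to a gap
            insert = curr[j - 1] + gap    # b[j-1] aligned to a gap
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[-1]
```

A divide-and-conquer method would run such a pass forward on one half of `a` and backward on the other half to find where the optimal alignment crosses the middle row, then recurse on the two subproblems.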
Computational Methods for Fast and Accurate DNA Fragment Assembly
1999
Cited by 1 (0 self)
As advances in technology result in the production of increasing amounts of DNA sequencing data in decreasing amounts of time, it is imperative that computational methods are developed that allow data analysis to keep pace. In this dissertation, I present methods that improve the speed and accuracy of DNA fragment assembly. One critical characteristic of automatic methods for fragment assembly is that they must be accurate. Currently, to ensure accurate sequences, the data that underlies questionable base calls must be examined by human editors so that the correct base call can be determined. This manual process is both error-prone and time-consuming. Automatic methods that yield high accuracy and few questionable calls can reduce errors and lessen the need for manual inspections. In my work, I developed a method, Trace-Evidence, that automatically produces highly accurate consensus sequences, even with few aligned sequences. Most assembly programs analyze only base calls when determining a consensus sequence. The key to the high accuracy is that I incorporate morphological information about the
Genetic Algorithms, Operators, and DNA Fragment Assembly
Machine Learning, 1995
We study different genetic algorithm operators for one permutation problem associated with the Human Genome Project: the assembly of DNA sequence fragments from a parent clone whose sequence is unknown into a consensus sequence corresponding to the parent sequence. The sorted-order representation, which does not require specialized operators, is compared with a more traditional permutation representation, which does require specialized operators. The two representations and their associated operators are compared on problems ranging from 2K to 34K base pairs (KB). Edge-recombination crossover used in conjunction with several specialized operators is found to perform best in these experiments; these operators solved a 10KB sequence, consisting of 177 fragments, with no manual intervention. Natural building blocks in the problem are exploited at progressively higher levels through "macro-operators." This significantly improves performance. Keywords: genetic algorithms, DNA fragment as...
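One common reading of a sorted-order representation is that each individual stores one numeric key per fragment, and the fragment ordering is obtained by sorting on those keys. Under that assumption (the abstract does not spell out the encoding), any ordinary crossover produces a valid permutation, which is why no specialized operators are needed. The function names below are hypothetical.

```python
def decode(keys):
    """Turn a list of per-fragment keys into a fragment ordering.

    Fragment i is placed according to the rank of keys[i]; ties are
    broken by fragment index because Python's sort is stable.
    """
    return sorted(range(len(keys)), key=lambda i: keys[i])


def one_point_crossover(parent_a, parent_b, point):
    """Standard one-point crossover on key lists of equal length."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return parent_a[:point] + parent_b[point:]
```

For instance, crossing `[0.7, 0.1, 0.4]` with `[0.2, 0.9, 0.3]` at point 1 gives `[0.7, 0.9, 0.3]`, which still decodes to a complete ordering of the three fragments. Applying the same crossover directly to two permutations could duplicate or drop fragments, which is the problem that operators such as edge recombination address.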
DNA Sequence Assembly and Genetic Algorithms
Proceedings, Third International Conference on Intelligent Systems for Molecular Biology, 1995
Applying genetic algorithms to DNA sequence assembly is not a straightforward process. Significantly improved results in terms of performance, quality of results, and the scaling of applicability have been realized through non-standard and even counter-intuitive parameter settings. Specifically, the solution time for a 10kb data set was reduced by an order of magnitude, and a 20kb data set that was previously unsolved by the genetic algorithm was solved in a time that represents only a linear increase from the 10kb data set.

