Combinatorial algorithms for DNA sequence assembly
- Algorithmica
, 1993
Cited by 33 (3 self)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover it uses a limited form ...
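The shortest common superstring formulation mentioned in this abstract can be illustrated with the classic greedy merge heuristic. This is a minimal sketch, not the paper's four phase method: it assumes error-free fragments on a single strand, and the names `overlap` and `greedy_superstring` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Assumption: fragments are exact substrings of the target (no
    sequencing errors, no reverse complements), unlike the real problem.
    """
    unique = set(fragments)
    # Drop fragments that are wholly contained in another fragment.
    frags = sorted(f for f in unique
                   if not any(f != g and f in g for g in unique))
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags[0] if frags else ""


if __name__ == "__main__":
    print(greedy_superstring(["ATTAGAC", "GACCTG", "CTGCA"]))
```

The greedy result is a superstring of every input but is not guaranteed to be the shortest one, which is consistent with the abstract's remark that the underlying problem is NP-hard.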
A new algorithm for DNA sequence assembly
- Journal of Computational Biology
, 1995
Cited by 32 (2 self)
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
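The SBH idea of recovering a sequence from the set of all its length-k subwords can be sketched with the textbook Eulerian-path formulation. This is an illustration under stated assumptions, not the algorithm proposed in the paper: the spectrum is assumed error-free, every k-mer is assumed to occur exactly once in the target, and the function names are hypothetical.

```python
from collections import defaultdict


def spectrum(seq: str, k: int) -> set:
    """All distinct length-k subwords of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers: set) -> str:
    """Rebuild a sequence by walking an Eulerian path over (k-1)-mers.

    Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
    Assumes such a path exists; repeats can make it non-unique.
    """
    edges = defaultdict(list)
    in_deg = defaultdict(int)
    for kmer in sorted(kmers):
        edges[kmer[:-1]].append(kmer[1:])
        in_deg[kmer[1:]] += 1
    nodes = set(edges) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in sorted(nodes)
                  if len(edges.get(u, [])) - in_deg.get(u, 0) == 1),
                 min(nodes))
    # Hierholzer's algorithm with an explicit stack.
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if edges.get(u):
            stack.append(edges[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(reconstruct(spectrum("ATGGCGT", 3)))
```

When the target contains repeated (k-1)-mers, several Eulerian paths may exist, so the spectrum alone no longer determines the sequence.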
Toward Simplifying and Accurately Formulating Fragment Assembly
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 1995
Cited by 30 (1 self)
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the 2-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a non-cyclic subgraph with certain properties and the objectives of being shortest or maximally-likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is ...
AMASS: A Structured Pattern Matching Approach to Shotgun Sequence Assembly
- J. Comput. Biol
, 1999
Cited by 13 (2 self)
In this paper, we propose an efficient, reliable shotgun sequence assembly algorithm based on a fingerprinting scheme that is robust to both noise and repetitive sequences in the data. Our algorithm uses exact matches of short patterns randomly selected from fragment data to identify fragment overlaps, construct an overlap map, and finally deliver a consensus sequence. We show how statistical clues made explicit in our approach can easily be exploited to correctly assemble results even in the presence of extensive repetitive sequences. Our approach is exceptionally fast in practice: e.g., we have successfully assembled a whole Mycoplasma genitalium genome (approximately 580 kbps) in roughly 8 minutes of 64MB 200MHz Pentium Pro CPU time from real shotgun data, where most existing algorithms can be expected to run for several hours to a day on the same data. Moreover, experiments with shotgun data synthetically prepared from real DNA sequences from a wide range of organisms (including h...
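The use of exact matches of short patterns to flag likely fragment overlaps can be sketched with a simple k-mer index. This is only an illustration of the general idea: AMASS samples patterns randomly, whereas this sketch indexes every k-mer so that the output is deterministic, and `candidate_overlaps` and its parameter `k` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(fragments, k=8):
    """Count shared exact k-mers for every pair of fragments.

    Returns {(i, j): count} for i < j; pairs sharing no k-mer are absent.
    A high count suggests, but does not prove, a true overlap.
    """
    index = defaultdict(set)  # k-mer -> ids of fragments containing it
    for fid, frag in enumerate(fragments):
        for i in range(len(frag) - k + 1):
            index[frag[i:i + k]].add(fid)
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


if __name__ == "__main__":
    reads = ["ACGTACGGTA", "ACGGTATTCC", "GGGGGGGGGG"]
    print(candidate_overlaps(reads, k=4))
```

Only pairs that share at least one pattern need a full alignment afterwards, which is where most of the time saving of such fingerprinting schemes would come from.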
ESTAnnotator: a tool for high throughput EST annotation
- Nucleic Acids Res
, 2003
Cited by 5 (1 self)
In high throughput sequence analysis, it is often necessary to combine the results of contemporary bioinformatics tools, because no individual tool alone computes all the requested information. ESTAnnotator is a tool for the high throughput annotation of expressed sequence tags (ESTs) by automatically running a collection of bioinformatics applications. In the first step, a quality check is performed and repeats, vector parts and low quality sequences are masked. Then successive steps of database searching and EST clustering are performed. Already known transcripts present within mRNA and genomic DNA reference databases are identified. Subsequently, tools for the clustering of anonymous ESTs, and for further database searches at the protein level, are applied. Finally, the outputs of each individual tool are gathered and the relevant results presented in a descriptive summary. ESTAnnotator was already successfully applied for the systematic identification and characterisation of novel human genes involved in cartilage/bone formation, growth, differentiation and homeostasis. ESTAnnotator is available at
Algorithms for Sequence Alignment
, 2002
Cited by 2 (0 self)
Sequence alignment is an important tool for describing relationships between sequences. Many sequence alignment algorithms exist, differing in efficiency, and in their models of the sequences and of the relationship between sequences. The focus of this thesis is on algorithms for the optimal alignment of two or three sequences of biological data, particularly DNA sequences. The algorithms are discussed with particular emphasis on space and time complexity. A divide-and-conquer method is presented for use with a number of different alignment algorithms. This method may be used to reduce the space complexity of an alignment algorithm with little or no effect to the time complexity. The advantages of this divide-and-conquer method include its simplicity and the ease with which it can be applied to many different alignment algorithms. These advantages are demonstrated by using the divide-and-conquer method in conjunction with several known alignment algorithms. An efficient alignment algorithm is presented for the important problem of optimally aligning three sequences using a linear function for costing gaps in the alignment. For sequences
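A divide-and-conquer reduction of alignment space is commonly illustrated with Hirschberg's technique, sketched here for unit-cost edit distance between two sequences. Whether the thesis uses exactly this scheme is an assumption; the function names and the unit costs are illustrative only.

```python
def last_row(a: str, b: str) -> list:
    """Final row of the edit-distance table, keeping only two rows.

    Entry j is the cost of aligning all of a with b[:j].
    Uses O(len(b)) space instead of O(len(a) * len(b)).
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev


def split_point(a: str, b: str):
    """Find where an optimal alignment crosses the middle row of a.

    Returns (mid, j, cost): a[:mid] aligns with b[:j] and a[mid:] with
    b[j:]. Recursing on the two halves yields the full alignment in
    linear space.
    """
    mid = len(a) // 2
    fwd = last_row(a[:mid], b)
    rev = last_row(a[mid:][::-1], b[::-1])
    costs = [fwd[j] + rev[len(b) - j] for j in range(len(b) + 1)]
    j = min(range(len(costs)), key=costs.__getitem__)
    return mid, j, costs[j]


if __name__ == "__main__":
    print(split_point("ACGT", "AGT"))
```

Each recursion level does roughly the same total work as one pass over the full table, which is why the time complexity changes by at most a constant factor.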
A preprocessor for shotgun assembly of large genomes
- Journal of Computational Biology
, 2004
Cited by 2 (1 self)
The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a “read”. Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) corrects most of the sequencing errors, (2) changes quality values accordingly, and (3) produces a list of “overlaps”, i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the “UMD Overlapper”, can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera’s Drosophila reads. When we replaced Celera’s overlap procedures in the front end of their assembler, it was able to produce a significantly improved genome.
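The relationship between a per-letter quality value and an error probability can be made concrete with a small sketch. It assumes the widely used Phred convention, Q = -10 log10(p), which the abstract does not state explicitly; the function names are hypothetical.

```python
import math


def error_probability(q: float) -> float:
    """Error probability implied by quality value q (Phred scale assumed)."""
    return 10.0 ** (-q / 10.0)


def quality_value(p: float) -> float:
    """Quality value implied by error probability p (Phred scale assumed)."""
    return -10.0 * math.log10(p)


def expected_errors(qualities) -> float:
    """Expected number of miscalled letters in one read."""
    return sum(error_probability(q) for q in qualities)


if __name__ == "__main__":
    read_quals = [40, 40, 30, 20, 10]
    print(expected_errors(read_quals))
```

Under this convention, a corrected letter would receive a higher quality value, matching the abstract's step of changing quality values after error correction.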
BioMed Central
, 2006
Cited by 2 (0 self)
Mariner mutagenesis of Brucella melitensis reveals genes with previously uncharacterized roles in virulence and survival
EST2Prot: mapping EST sequences to proteins
- BMC Genomics
, 2006
Cited by 1 (0 self)
We describe a system (EST2Prot) that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. The system is part of the Biozon database and is accessible at
How Good is Genome-Level Fragment Assembly? (Extended Abstract)
, 1997
Cited by 1 (1 self)
Ting Chen and Steven S. Skiena, Department of Computer Science, State University of New York, Stony Brook, NY 11794-4400, {tichen|skiena}@cs.sunysb.edu. October 17, 1997. 1 Introduction. In late Summer 1997, groups at Brookhaven National Laboratory (BNL) and the Institute for Genome Research (TIGR) independently completed sequencing the genome of Borrelia burgdorferi, the bacterium which causes Lyme disease. As part of the Brookhaven team, led by Dr. William Studier, we have developed a new fragment assembler, STROLL, which is capable of assembling megabase genome sequencing projects. Why did we develop yet another fragment assembler? At the time of our beginning this project (January 1996), the Brookhaven group did not have access to an adequate assembler for assembling data using their primer walking strategy [6]. Indeed, historically, fragment assemblers did not prove very portable across different sequencing projects. Each large sequencing team developed its own sequencing strategy...

