Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
, 2009
Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59
, 2008
A greedy algorithm for aligning DNA sequences
- J. COMPUT. BIOL
, 2000
Cited by 585 (16 self)
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
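To illustrate why a greedy strategy pays off when two sequences differ by only a few errors, here is a minimal sketch of the classic furthest-reaching-diagonal method for unit-cost edit distance. It is not the algorithm introduced in the paper above, which uses its own scoring scheme; the function name `greedy_edit_distance` is hypothetical. The work grows with the number of differences rather than with the full size of a dynamic programming table.

```python
def greedy_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via greedy extension along diagonals.

    Diagonal k holds cells (i, j) with j - i == k. For each error count d
    we keep, per diagonal, the furthest row i reachable with d edits.
    """
    m, n = len(a), len(b)

    def slide(i: int, k: int) -> int:
        # Greedy step: follow exact matches along diagonal k for free.
        while i < m and i + k < n and a[i] == b[i + k]:
            i += 1
        return i

    reach = {0: slide(0, 0)}  # furthest row per diagonal with d = 0 edits
    d = 0
    # Done once the diagonal through (m, n) reaches the last row.
    while reach.get(n - m, -1) < m:
        d += 1
        nxt = {}
        for k in range(max(-d, -m), min(d, n) + 1):
            best = -1
            if k in reach:          # substitution: stay on diagonal k
                best = max(best, reach[k] + 1)
            if k + 1 in reach:      # skip a character of a
                best = max(best, reach[k + 1] + 1)
            if k - 1 in reach:      # skip a character of b
                best = max(best, reach[k - 1])
            best = min(best, m, n - k)  # stay inside both sequences
            nxt[k] = slide(best, k)
        reach = nxt
    return d


if __name__ == "__main__":
    print(greedy_edit_distance("kitten", "sitting"))
```

Only about 2d + 1 diagonals are touched per round, so near-identical reads finish after very few rounds, which is the intuition behind the speedups the abstract reports.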
Consed: a graphical tool for sequence finishing
- Genome Res
, 1998
Cited by 207 (0 self)
Sequencing of large clones or small genomes is generally done by the shotgun approach. Although complete automation of data processing in shotgun sequencing is clearly desirable and may be feasible in the near future, at present finishing still requires extensive human intervention. This is customarily done by use of an interactive computer program. The program (which is usually called a sequence editor) must, at a minimum, display the aligned sequences of the assembled reads and allow the user to access underlying raw data (e.g., the fluorescence trace profiles from automated sequencers) and other information that may be useful in evaluating the base calls and assembly. It should also facilitate the detection of regions where additional data are needed, help in determining reagents (e.g., sequencing primers and templates) needed to obtain these data, and allow editing to correct errors. A good editor makes the finishing process as efficient and painless as possible. The display should indicate, with appropriate size and color emphases, the most important information about the assembly, with less important information being easily accessible with a minimum of effort, and the user should have the ability to change which information is shown, on the basis of the task at hand. Locations requiring human inspection should be efficiently pinpointed. The user manipulations required to accomplish a given task should be as natural and efficient as possible. The program should allow customization to suit individual preferences, facilitate quick detection and correction of user mistakes, and be easy to learn. It should have a quick response time and allow recovery from hardware and software problems on the user's computer. A number of editors are available commercially or from academic developers.
The pioneering work in both assembly and editing was done by Staden, and his gap4 program We have developed an editor consed that is intended to be used in conjunction with several other sequence data processing programs developed by our group, including the base-calling program phred
Genome Sequence Assembly Using Trace Signals and Additional Sequence Information
Cited by 195 (1 self)
Motivation: This article presents a method for assembling shotgun sequences which primarily uses high confidence regions whilst taking advantage of additional available information such as low confidence regions, quality values or repetitive region tags. Conflict situations are resolved with routines for analysing trace signals.
ARACHNE: a whole-genome shotgun assembler
- Genome Res
, 2002
Cited by 177 (7 self)
We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing ∼10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded ∼98% coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of ∼1 kb in size), with a very small number of other misassemblies. The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory. Shotgun sequencing was introduced by Sanger et al. (1977) and has remained the mainstay of genome sequence assembly for nearly 25 years. The method involves obtaining random
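The N50 figures quoted in the abstract above can be made concrete with a short sketch. It uses the common definition of N50, the length L such that contigs of length at least L hold at least half of the total assembled bases; the abstract does not spell out its exact convention, so that definition is an assumption, and the function name `n50` and the sample lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Assumed definition: walk the lengths from longest to shortest and
    return the first one at which the running sum covers at least half
    of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty assembly")


if __name__ == "__main__":
    # Toy contig lengths, not data from the paper.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because the statistic is weighted by length, a handful of long contigs dominates it, which is why the same assembly reports a much larger N50 for supercontigs than for contigs.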
Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum
- J
, 2001
Cited by 133 (5 self)
The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange
NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res, 34(Web Server issue
, 2006
Improved base calling for the Illumina Genome Analyzer using
, 2009
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
- Nucleic Acids Research
, 2009
Cited by 68 (0 self)
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
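The incompatibility between the variants comes down to two choices: the ASCII offset used to encode each score, and whether the score is a PHRED score or a Solexa score. The sketch below shows both, assuming the usual conventions (Sanger: PHRED scores at offset 33; Solexa: Solexa scores at offset 64; Illumina 1.3+: PHRED scores at offset 64). The helper names are hypothetical and are not taken from any of the Open Bioinformatics Foundation libraries.

```python
import math

SANGER_OFFSET = 33   # Sanger FASTQ: PHRED scores, ASCII offset 33
SOLEXA_OFFSET = 64   # Solexa FASTQ: Solexa scores, ASCII offset 64
                     # (Illumina 1.3+ also uses 64, but with PHRED scores)


def decode_quality(quality: str, offset: int) -> list:
    """Turn a FASTQ quality string into integer scores for a given offset."""
    return [ord(ch) - offset for ch in quality]


def phred_to_error_probability(q_phred: float) -> float:
    """PHRED definition: Q = -10 * log10(p), solved for p."""
    return 10 ** (-q_phred / 10)


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa score, Q = -10 * log10(p / (1 - p)), to PHRED."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


if __name__ == "__main__":
    print(decode_quality("!I", SANGER_OFFSET))
    print(round(solexa_to_phred(0), 2))
```

The two scales agree closely for high-quality bases and diverge only at the low end, where Solexa scores can be negative while PHRED scores cannot.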