Base-calling of automated sequencer traces using phred. II. error probabilities (1998)

by B Ewing, P Green
Venue: Genome Res


Citing documents: results 1 - 10 of 1,653

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

by B Langmead, C Trapnell, M Pop, SL Salzberg, 2009
Cited by 1272 (8 self)

Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59

by David R. Bentley, Shankar Balasubramanian, Harold P. Swerdlow, Geoffrey P. Smith, John Milton, Clive G. Brown, Kevin P. Hall, Dirk J. Evers, Colin L. Barnes, Helen R, Jonathan M. Boutell, Jason Bryant, Richard J. Carter, R. Keira Cheetham, Anthony J. Cox, Darren J. Ellis, Michael R. Flatbush, Niall A. Gormley, Sean J, Leslie J. Irving, Mirian S. Karbelashvili, Scott M. Kirk, Heng Li, Klaus S. Maisinger, Lisa J. Murray, Bojan Obradovic, Tobias Ost, Michael L, Mark R. Pratt, Isabelle M. J. Rasolonjatovo, Mark T. Reed, Roberto Rigatti, Chiara Rodighiero, Mark T. Ross, Andrea Sabot, Subramanian V. Sankar, Svilen S. Tzonev, Eric H. Vermaas, Klaudia Walter, Xiaolin Wu, Lu Zhang, Mohammed D. Alam, Carole Anastasi, Ify C. Aniebo, David M. D. Bailey, Iain R, Kevin F. Benson, Claire Bevis, Phillip J. Black, Asha Boodhun, Joe S. Brennan, A. Bridgham, Rob C. Brown, Andrew A. Brown, Dale H. Buermann, Abass A. Bundu, James C. Burrows, Nigel P. Carter, Nestor Castillo, Maria Chiara, E. Catenazzi, R. Neil Cooley, Natasha R. Crake, Olubunmi O. Dada, Konstantinos D, Belen Dominguez-fern, David J. Earnshaw, Ugonna C. Egbujor, David W. Elmore, Sergey S. Etchin, Mark R. Ewan, Milan Fedurco, Louise J. Fraser, Karin V. Fuentes Fajardo, W. Scott Furey, David George, Kimberley J. Gietzen, Colin P, George S. Golda, Philip A. Granieri, David E. Green, David L. Gustafson, Nancy F. Hansen, Kevin Harnish, Christian D. Haudenschild, Narinder I. Heyer, Matthew M. Hims, Johnny T. Ho, Adrian M. Horgan, Katya Hoschler, Steve Hurwitz, Denis V. Ivanov, Maria Q. Johnson, Terena James, T. A. Huw Jones, Tzvetana H. Kerelska, Alan D. Kersey, Irina Khrebtukova, Alex P. Kindwall, Paula I. Kokko-gonzales, Anil Kumar, Marc A. Laurent, Cynthia T. Lawley, Sarah E. Lee, Xavier Lee, Arnold K. Liao, Jennifer A. Loch, Mitch Lok, Shujun Luo, Radhika M. Mammen, John W. Martin, Patrick G. Mccauley, Paul Mcnitt, Parul Mehta, Keith W. Moon, Joe W. Mullens, Taksina Newington, Zemin Ning, 2008
Cited by 636 (1 self)

A greedy algorithm for aligning DNA sequences

by Zheng Zhang, Scott Schwartz, Lukas Wagner, Webb Miller - J. Comput. Biol., 2000
Cited by 585 (16 self)
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

Citation Context

... to penalize an indel about the same as, or slightly more than, a replacement. This seems consistent with published figures on the rates of actual errors in both single-pass (low accuracy) sequences (Ewing et al., 1998; Hillier et al., 1996) and high quality data. On the other hand, it is widely appreciated that dynamic-programming alignment algorithms can guarantee a theoretically optimal alignment under a wide va...
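The greedy idea behind this kind of aligner can be illustrated with a small sketch. It assumes unit costs (a mismatch, an insertion, or a deletion each counts as one difference, in line with penalizing an indel about the same as a replacement) and, for d = 0, 1, 2, ... differences, records the furthest point reachable on each diagonal, sliding along exact matches for free. This is the classic Ukkonen/Landau-Vishkin scheme, not the Zhang et al. algorithm itself, which supports more general scoring and pruning; the name `greedy_edit_distance` is hypothetical.

```python
def greedy_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance by greedy diagonal extension (hypothetical helper).

    Diagonal k holds the cells (i, i + k); furthest[k] is the furthest row i
    reachable on that diagonal with at most d differences.
    """
    n, m = len(a), len(b)

    def slide(i: int, k: int) -> int:
        # Follow diagonal k at no cost while the bases match.
        while i < n and i + k < m and a[i] == b[i + k]:
            i += 1
        return i

    furthest = {0: slide(0, 0)}
    d = 0
    while furthest.get(m - n, -1) < n:  # stop once cell (n, m) is reached
        d += 1
        nxt = {}
        for k in range(max(-d, -n), min(d, m) + 1):
            candidates = []
            if k in furthest:        # mismatch: advance in both sequences
                candidates.append(furthest[k] + 1)
            if k - 1 in furthest:    # insertion: advance in b only
                candidates.append(furthest[k - 1])
            if k + 1 in furthest:    # deletion: advance in a only
                candidates.append(furthest[k + 1] + 1)
            if candidates:
                # Clamp to the edge of the (n x m) grid, then slide along matches.
                i = min(max(candidates), n, m - k)
                nxt[k] = slide(i, k)
        furthest = nxt
    return d


print(greedy_edit_distance("ACGTACGT", "ACGTTCGT"))  # one substitution
```

Because each diagonal is extended through exact matches before the next difference is paid for, the work grows roughly with sequence length times the number of differences rather than with the product of the two lengths, which is why greedy methods are fast on reads that differ only by sequencing errors.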

Consed: a graphical tool for sequence finishing

by David Gordon, Chris Abajian, Phil Green - Genome Res, 1998
Cited by 207 (0 self)
Sequencing of large clones or small genomes is generally done by the shotgun approach. Although complete automation of data processing in shotgun sequencing is clearly desirable and may be feasible in the near future, at present finishing still requires extensive human intervention. This is customarily done by use of an interactive computer program. The program (which is usually called a sequence editor) must, at a minimum, display the aligned sequences of the assembled reads and allow the user to access underlying raw data (e.g., the fluorescence trace profiles from automated sequencers) and other information that may be useful in evaluating the base calls and assembly. It should also facilitate the detection of regions where additional data are needed, help in determining reagents (e.g., sequencing primers and templates) needed to obtain these data, and allow editing to correct errors. A good editor makes the finishing process as efficient and painless as possible. The display should indicate, with appropriate size and color emphases, the most important information about the assembly, with less important information being easily accessible with a minimum of effort, and the user should have the ability to change which information is shown, on the basis of the task at hand. Locations requiring human inspection should be efficiently pinpointed. The user manipulations required to accomplish a given task should be as natural and efficient as possible. The program should allow customization to suit individual preferences, facilitate quick detection and correction of user mistakes, and be easy to learn. It should have a quick response time and allow recovery from hardware and software problems on the user's computer. A number of editors are available commercially or from academic developers.
The pioneering work in both assembly and editing was done by Staden, and his gap4 program remains among the best. We have developed an editor consed that is intended to be used in conjunction with several other sequence data processing programs developed by our group, including the base-calling program phred ...

Citation Context

... and software problems on the user's computer. A number of editors are available commercially or from academic developers. The pioneering work in both assembly and editing was done by Staden, and his gap4 program (Dear and Staden 1991; Bonfield et al. 1995) remains among the best. Commercially available programs include Sequencher, DNAStar Seqman (Swindell and Plasterer 1997), and ABI AutoAssemble. We have developed an editor consed that is intended to be used in conjunction with several other sequence data processing programs developed by our group, including the base-calling program phred (Ewing et al. 1998), the assembler phrap (P. Green, in prep.), and the high-level assembly viewer phrapview. A key feature of these programs is their emphasis on objective criteria to measure the accuracy of sequences and assemblies. In particular, phred uses trace parameters to produce error probabilities associated with each called read base, and phrap uses these together with the read alignments to attach an error probability to each base of the inferred underlying sequence (consensus sequence) of the clone. These error probabilities (or log-transformed versions ...

Genome Sequence Assembly Using Trace Signals and Additional Sequence Information

by B. Chevreux, T. Wetter, S. Suhai
Cited by 195 (1 self)
Motivation: This article presents a method for assembling shotgun sequences which primarily uses high confidence regions whilst taking advantage of additional available information such as low confidence regions, quality values or repetitive region tags. Conflict situations are resolved with routines for analysing trace signals.

ARACHNE: a whole-genome shotgun assembler

by Serafim Batzoglou, David B. Jaffe, Ken Stanley, Jonathan Butler, Sante Gnerre, Evan Mauceli, Bonnie Berger, Jill P. Mesirov, Eric S - Genome Res, 2002
Cited by 177 (7 self)
We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing ∼10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded ∼98 % coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of ∼1 kb in size), with a very small number of other misassemblies. The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory. Shotgun sequencing was introduced by Sanger et al. (1977) and has remained the mainstay of genome sequence assembly for nearly 25 years. The method involves obtaining random
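The N50 figures quoted in this abstract summarize how contig (or supercontig) lengths are distributed. A minimal sketch, assuming the common definition: N50 is the largest length L such that contigs of length at least L together cover at least half of the total assembled length. The ARACHNE paper may state its own variant, and `n50` is a hypothetical helper name.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths (hypothetical helper)."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one non-empty contig")
    covered = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable when total > 0")


print(n50([100, 200, 300, 400]))  # 400 + 300 = 700 >= 500, so 300
```

The same function applied to supercontig (scaffold) lengths gives the N50 supercontig length.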

Citation Context

...ibe each clone as belonging to one of several libraries characterized by these parameters. Each base in each read has an associated quality score, such as that produced by the PHRED computer program (Ewing et al. 1998). A quality score of q corresponds to a probability of $10^{-q/10}$ that the base is incorrect; a quality score of 40 thus corresponds to 99.99% accuracy. As an initial step, ARACHNE trims reads to elimin...
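The relationship quoted in this context, $P = 10^{-q/10}$ or equivalently $q = -10\log_{10} P$, can be checked with a few lines of code. This is a sketch of the conversion only, not of how phred estimates P from trace parameters; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality for an error probability p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    # Accuracy is 1 - P(error), printed as a percentage.
    print(q, f"{1 - phred_to_error_prob(q):.4%}")
```

For q = 40 the printed accuracy is 99.9900%, matching the 99.99% stated above; each additional 10 quality units divides the error probability by ten.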

Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum

by Jörk Nölling, Gary Breton, Marina V. Omelchenko, Kira S. Makarova, Qiandong Zeng, Rene Gibson, Hong Mei Lee, Joann Dubois, Dayong Qiu, Joseph Hitti, GTC Sequencing Center Production and Bioinformatics Teams, Yuri I. Wolf, Roman L. Tatusov, Fabrice Sabathe, Lynn Doucette-stamm, Philippe Soucaille, Michael J. Daly, George N. Bennett, Eugene V. Koonin, Douglas R. Smith - J, 2001
Cited by 133 (5 self)
The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange

Citation Context

...hods and computational tools. Clones from a plasmid library made with randomly sheared 2.0- to 2.5-kb inserts were sequenced from both ends. The sequences were preprocessed and base called with Phred (15), and low-quality reads were removed (multiplex or short-run dye terminator reads with fewer than 100 Phred Q-30 bases [error rate of $10^{-3}$], and long-run dye terminator reads with fewer than 175 Q-3...
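The read filter described in this context can be sketched as follows. It assumes that "Q-30 bases" means bases with a Phred quality of at least 30 (error probability at most $10^{-3}$) and uses the 100-base cutoff given for multiplex or short-run reads; the snippet's cutoff for long-run reads is truncated, so the threshold is left as a parameter. Function names and the toy reads are hypothetical.

```python
def count_q30(quals: list[int], threshold: int = 30) -> int:
    """Number of bases whose Phred quality is at least `threshold`."""
    return sum(1 for q in quals if q >= threshold)


def keep_read(quals: list[int], min_q30_bases: int = 100) -> bool:
    """Hypothetical filter: drop reads with fewer than `min_q30_bases` Q30 bases."""
    return count_q30(quals) >= min_q30_bases


# Toy reads given as per-base Phred scores (illustrative data only).
reads = {
    "read_a": [35] * 120 + [12] * 30,  # 120 bases at Q35: kept
    "read_b": [35] * 80 + [20] * 200,  # only 80 bases at Q30 or better: removed
}
kept = [name for name, quals in reads.items() if keep_read(quals)]
print(kept)  # ['read_a']
```

Counting high-quality bases, rather than averaging quality, keeps long reads that have a solid core even if their ends are poor.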

NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res, 34 (Web Server issue)

by T. Z. DeSantis, P. Hugenholtz, K. Keller, E. L. Brodie, N. Larsen, Y. M. Piceno, R. Phan, G. L. Andersen, 2006
Cited by 111 (9 self)

Citation Context

...environmental or medical sample. The DNA is serially sampled by cloning and sequencing. The raw sequencing reads can be trimmed of low quality terminal fragments using the ‘Trim’ tool following phred (18) chromatogram scoring. The NAST tool is then used to create the MSA and maintain the 7682-character format. Once aligned, the entire batch can be classified by Greengenes using taxonomic nomenclature ...
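The context does not say how the ‘Trim’ tool decides where to cut. One common approach for phred-scored reads is a Mott-style running sum: each base scores `cutoff - P(error)`, and the read is trimmed to the highest-scoring contiguous window (a maximum-subarray search). The sketch below assumes that approach and a cutoff of 0.05; it is not the Greengenes implementation, and `mott_trim` is a hypothetical name.

```python
def mott_trim(quals: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Return (start, end), end exclusive, of the best-scoring window."""
    best_score, best = 0.0, (0, 0)
    running, start = 0.0, 0
    for i, q in enumerate(quals):
        # Positive when the base's error probability is below the cutoff.
        running += cutoff - 10 ** (-q / 10)
        if running <= 0:
            # The window so far is not worth keeping; restart after this base.
            running, start = 0.0, i + 1
        elif running > best_score:
            best_score, best = running, (start, i + 1)
    return best


quals = [5, 5, 30, 30, 30, 5, 5]
start, end = mott_trim(quals)
print(start, end)  # 2 5
```

A read with no base better than the cutoff returns `(0, 0)`, meaning the whole read is trimmed away; a lower cutoff trims more aggressively.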

Improved base calling for the Illumina Genome Analyzer using

by Martin Kircher, Udo Stenzel, Janet Kelso, 2009
Cited by 80 (6 self)

Citation Context

...lete run using a custom C++ interface to the SVM multiclass package. For each cluster in the intensity files an entry in a FastQ file is created, containing the sequence and PHRED-like quality scores [9] in the Sanger encoding (with a quality score offset of 33). Other base callers In addition to the Illumina standard base caller Bustard v1.9.5, we used AltaCyclic v0.1.1 and Rolexa v1.1.6 (with R v2....
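In the Sanger FASTQ encoding mentioned here, each quality score q is stored as the single ASCII character with code q + 33, so `!` stands for Q0 and `I` for Q40. A minimal round-trip sketch with hypothetical helper names:

```python
SANGER_OFFSET = 33  # ASCII '!' encodes Phred quality 0


def decode_sanger(qual_line: str) -> list[int]:
    """Turn a Sanger FASTQ quality line into Phred scores."""
    return [ord(ch) - SANGER_OFFSET for ch in qual_line]


def encode_sanger(quals: list[int]) -> str:
    """Turn Phred scores (0-93) into a Sanger FASTQ quality line."""
    if any(not 0 <= q <= 93 for q in quals):
        raise ValueError("Sanger FASTQ scores must lie in 0..93")
    return "".join(chr(q + SANGER_OFFSET) for q in quals)


print(decode_sanger("!+5?I"))  # [0, 10, 20, 30, 40]
```

Limiting scores to 0..93 keeps every encoded character in the printable ASCII range `!` to `~`.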

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

by Peter J. A. Cock, Christopher J. Fields, Naohisa Goto, Michael L. Heuer, Peter M. Rice - Nucleic Acids Research, 2009
Cited by 68 (0 self)
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
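One source of incompatibility between the variants is that early Solexa/Illumina files used a different score, $q_{sol} = -10\log_{10}\frac{P}{1-P}$, instead of the Phred score $q = -10\log_{10} P$. Solving both for P gives an exact conversion, $q = 10\log_{10}\left(10^{q_{sol}/10} + 1\right)$, sketched below with hypothetical function names. The ASCII offset each file variant uses is a separate concern and is not handled here.

```python
import math


def solexa_to_phred(q_sol: float) -> float:
    """Phred score for the same error probability as Solexa score q_sol."""
    return 10 * math.log10(10 ** (q_sol / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Inverse conversion; undefined at q_phred = 0, where P = 1."""
    if q_phred <= 0:
        raise ValueError("q_phred must be positive")
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


for q_sol in (-5, 0, 10, 40):
    print(q_sol, round(solexa_to_phred(q_sol), 2))
```

The two scales differ noticeably only at low quality (a Solexa score of 0 corresponds to a Phred score of about 3.01) and converge as quality rises.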

Citation Context

...ommunity consensus of the FASTQ format specification. PHRED SCORES AND THE QUAL FORMAT The PHRED software reads DNA sequencing trace files, calls bases and assigns a quality value to each base called (9,10). This introduced the PHRED quality score of a ...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University