Results 1 - 10 of 38
Correction of sequencing errors in a mixed set of reads
- Bioinformatics
, 2010
Cited by 28 (3 self)
Motivation: High throughput sequencing technologies produce large sets of short reads that may contain errors. These sequencing errors make de novo assembly challenging. Error correction aims to reduce the error rate prior to assembly. Many de novo sequencing projects use reads from several sequencing technologies to get the benefits of all used technologies and to alleviate their shortcomings. However, combining such a mixed set of reads is problematic as many tools are specific to one sequencing platform. The SOLiD sequencing platform is especially problematic in this regard because of the two base color coding of the reads. Therefore new tools for working with mixed read sets are needed. Results: We present an error correction tool for correcting substitutions, insertions, and deletions in a mixed set of reads produced by various sequencing platforms. We first develop a method for correcting reads from any sequencing technology producing base space reads such as the SOLEXA/Illumina and Roche/454 Life Sciences sequencing platforms. We then further refine the algorithm to correct the color space reads from the Applied Biosystems SOLiD sequencing platform together with normal base space reads. Our new tool is based on the SHREC program that is aimed at correcting SOLEXA/Illumina reads. Our experiments show that we can detect errors with 99% sensitivity and over 98% specificity if the combined sequencing coverage of the sets is at least 12. We also show that the error rate of the reads is greatly reduced. Availability: The JAVA source code is freely available at
HiTEC: accurate error correction in high-throughput sequencing data
, 2010
Cited by 20 (0 self)
Motivation: High-throughput sequencing technologies produce very large amounts of data and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data. Results: We present HiTEC, an algorithm which provides a highly accurate, robust, and fully automated method to correct reads produced by high-throughput sequencing methods. Our approach provides significantly higher accuracy than previous methods. It is time and space efficient and works very well for all read lengths, genome sizes, and coverage levels. Availability: The source code of HiTEC is freely available at www.csd.uwo.ca/~ilie/HiTEC/
ECHO: a reference-free short-read error correction algorithm
- Genome Res
, 2011
Cited by 19 (1 self)
Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this paper, a novel error correction algorithm, called ECHO, is introduced for correcting base-call errors in short reads, without the need for a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters whose optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve the accuracy of previous error correction methods by several-fold to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with non-uniform coverage. Also, it is shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low to moderate sequence coverage depth. Software: ECHO is publicly available at
LoRDEC: accurate and efficient long read error correction
, 2015
"... accurate and efficient long read error correction ..."
Three-stage quality control strategies for DNA re-sequencing data
- Brief Bioinform 15:879
, 2014
"... re-sequencing data ..."
Probabilistic error correction for RNA sequencing
- Nucleic Acids Research
Cited by 6 (0 self)
Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website:
Prospects and limitations of full-text index structures in genome analysis
, 2012
Cited by 6 (0 self)
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
Accessed
, 2012
Cited by 3 (0 self)
Background. Accumulating evidence supports leukocyte telomere length (LTL) as a biological marker of cellular aging. Poor sleep is a risk factor for age-related disease; however, the extent to which sleep accounts for variation in LTL is unknown. Methods. The present study examined associations of self-reported sleep duration, onset latency, and subjective quality with LTL in a community-dwelling sample of 245 healthy women in midlife (aged 49-66 years). Results. While sleep duration and onset latency were unrelated to LTL, women reporting poorer sleep quality displayed shorter LTL (r = 0.14, P = 0.03), independent of age, BMI, race, and income (b = 55.48, SE = 27.43, P = 0.04). When analyses were restricted to participants for whom sleep patterns were chronic, poorer sleep quality predicted shorter LTL independent of covariates and perceived psychological stress. Conclusions. This study provides the first evidence that poor sleep quality explains significant variation in LTL, a marker of cellular aging.
Review of genome sequence short read error correction algorithms.
- American Journal of Bioinformatics Research
, 2013
Cited by 1 (0 self)
Next-generation high throughput sequencing technologies have opened up a wide range of new genome research opportunities. High throughput sequencing technologies produce a massive amount of short read data in a single run. The large datasets produced by short read sequencing technologies are highly error-prone compared to traditional Sanger sequencing approaches. These errors are critical and removing them is challenging. Therefore, there is a pressing demand for statistical tools in bioinformatics to analyze such large amounts of data. In this paper, we present a review of genome sequence short read error correction tools and algorithms and of the parameters used to measure them. We further present comprehensive detail of the datasets and the results obtained with the defined parameters. The review presents the current state of the art and future directions in the area as well.
Open Access
Cited by 1 (0 self)
Primary immune thrombocytopenia (ITP) has traditionally been thought of as an antibody-mediated autoimmune disease involving platelet destruction by macrophages in the reticuloendothelial system. More recently it has become obvious that ITP is a more complex disorder in which T cell mediated immunity plays important roles in platelet destruction. Antiplatelet autoantibody production is under the control of platelet-specific helper T-cells, and loss of tolerance to self antigen by T cells is the critical step of the immune dysregulation in ITP. Dendritic cells (DCs) from ITP patients showed enhanced capacity in stimulating autologous T-cell proliferation in the presence of autologous/allogeneic platelets [1], and ITP patients' T cells had elevated IL-2 secretion ability compared with controls [2,3], suggesting increased antiplatelet T-cell reactivity in