Evaluating the fidelity of de novo short read metagenomic assembly using simulated data. PLoS One, 2011

by Miguel Pignatelli, Andrés Moya

Results 1 - 10 of 20

Conservation of Gene Cassettes among Diverse Viruses of the Human Gut

by Samuel Minot, Gary D. Wu, James D. Lewis, Frederic D. Bushman, 2012
Cited by 3 (0 self)
Viruses are a crucial component of the human microbiome, but large population sizes, high sequence diversity, and high frequencies of novel genes have hindered genomic analysis by high-throughput sequencing. Here we investigate approaches to metagenomic assembly to probe genome structure in a sample of 5.6 Gb of gut viral DNA sequence from six individuals. Tests showed that a new pipeline based on de Bruijn graph assembly yielded longer contigs that were able to recruit more reads than the equivalent non-optimized, single-pass approach. To characterize gene content, the database of viral RefSeq proteins was compared to the assembled viral contigs, generating a bipartite graph with functional cassettes linking together viral contigs, which revealed a high degree of connectivity between diverse genomes involving multiple genes of the same functional class. In a second step, open reading frames were grouped by their co-occurrence on contigs in a database-independent manner, revealing conserved cassettes of co-oriented ORFs. These methods reveal that free-living bacteriophages, while usually dissimilar at the nucleotide level, often have significant similarity at the level of encoded amino acid motifs, gene order, and gene orientation. These findings thus connect contemporary metagenomic analysis with

De novo metagenomic assembly of microbial communities from the lower convective layer of the Red Sea Atlantis II brine environment

by Osama Said Ali, 2013

Citation Context

...limited knowledge of microbial communities, simulated Data studies played an important role in the assessment of the quality and reliability of the resulted datasets from De novo metagenomic assembly [59], [60]. As a result of these efforts, it was suggested that high sequencing will raise the coverage of the studied metagenome and accordingly can overcome the functional classification limitations due...

Recycler: an algorithm for detecting plasmids from de novo assembly graphs

by Roye Rozov, Aya Brown Kav, David Bogumil, Eran Halperin, Itzhak Mizrahi, Ron Shamir
Plasmids have important roles in antibiotic resistance and in affecting production of metabolites used in industrial and agricultural applications. However, their extraction through deep sequencing remains challenging, in spite of rapid drops in cost and throughput increases for sequencing. Here, we attempt to ameliorate this situation by introducing a new plasmid-specific assembly algorithm, leveraging assembly graphs provided by a conventional de novo assembler and alignments of paired-end reads to assembled graph nodes. We introduce the first tool for this task, called Recycler, and demonstrate its merits in comparison with extant approaches. We show that Recycler greatly increases the number of true plasmids recovered while remaining highly accurate. On simulated plasmidomes, Recycler recovered 5-14% more true plasmids compared to extant methods, and overall precision of about 90%. We validate these results on real data, by comparison against available reference sequences and quantifying annotation of predicted ORFs. All 12 of Recycler's outputs on isolate samples matched known plasmids or phages, and had alignments having at least 97% identity over at least 99% of the reported reference sequence lengths. For the two E. coli strains examined, most known plasmid sequences were recovered, while in both cases additional plasmids only known to be present in different hosts were found. Recycler also generated plasmids in high agreement with known annotation on real plasmidome data. Moreover, 6 of 8 plasmids previously validated by PCR were completely recovered. Recycler is available at http://github.com/Shamir-Lab/Recycler

Citation Context

...reference sequences were selected from the NCBI plasmids database and from plasmid sequences reported in [6], filtered to include 2760 sequences with a length range of 1 to 20 kbp with a mean of 6337 bp. Five datasets were created, composed of 100 bp mates (read pair ends), with insert sizes ∼ ...

A new strategy for better genome assembly from very short reads

by Yan Ji, Yixiang Shi

Citation Context

...ion to assist gene annotations. So far metagenome assemblies are still challenging, and most available de novo assemblers for reads of NGS techniques have a limited capability to assemble metagenomes [27]. The quality of de novo metagenome assembly is affected not only by repeats of the same or different genomes but also heterogenous DNA fragments of different coverages. The comparative assembly strat...

MetaGeniE: Characterizing Human Clinical Samples Using Deep Metagenomic Sequencing

by Arun Rawat, David M. Engelthaler, Elizabeth M. Driebe, Paul Keim, Jeffrey T. Foster
With the decreasing cost of next-generation sequencing, deep sequencing of clinical samples provides unique opportunities to understand host-associated microbial communities. Among the primary challenges of clinical metagenomic sequencing is the rapid filtering of human reads to survey for pathogens with high specificity and sensitivity. Metagenomes are inherently variable due to different microbes in the samples and their relative abundance, the size and architecture of genomes, and factors such as target DNA amounts in tissue samples (i.e. human DNA versus pathogen DNA concentration). This variation in metagenomes typically manifests in sequencing datasets as low pathogen abundance, a high number of host reads, and the presence of close relatives and complex microbial communities. In addition to these challenges posed by the composition of metagenomes, high numbers of reads generated from high-throughput deep sequencing pose immense computational challenges. Accurate identification of pathogens is confounded by individual reads mapping to multiple different reference genomes due to gene similarity in different taxa present in the community or close relatives in the reference database. Available global and local sequence aligners also vary in sensitivity, specificity, and speed of detection. The efficiency of detection of pathogens in clinical samples is largely dependent on the desired taxonomic resolution of the organisms. We have developed an efficient strategy that identifies “all against all” relationships between sequencing reads and reference genomes. Our approach allows for scaling to large reference

Citation Context

...of closely related organisms in queried reference databases or the community being analyzed. Studies have shown metagenomic sequences share similar regions for even the simplest microbial communities [17,45,46]. Assigning each read to all mapped genomes might be an effective strategy as metagenome community analysis is unbiased and researchers may have no a priori knowledge about the community composition [...

Edited by:

by Saskia L. Smits, Rogier Bodewes, Aritz Ruiz-Gonzalez, Wolfgang Baumgärtner, Marion P. Koopmans, Albert D. M. E. Osterhaus, Anita C. Schürch, Richard J. Hall, Karen Dawn Weynberg, Patrick Jon Biggs, 2014
Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered, but rather several distinct contigs derived from a single entity are, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-

Citation Context

...to chimera formation during PCR. Chimerism can not only prohibit successful assembly but can also lead to misclassification of the taxonomic content of the metagenome sample (Mavromatis et al., 2007; Pignatelli and Moya, 2011; Mende et al., 2012). Taxonomic “misclassification” of reads in the analysis described here, however, was rather due to the large number of taxonomic units without a homolog in the sequence databases...

InteMAP: Integrated metagenomic

by Binbin Lai, Fumeng Wang, Xiaoqi Wang, Liping Duan, Huaiqiu Zhu
Background: Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis but also raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and extremely short reads generated by sequencers such as Illumina. To date, how to generate a high-quality draft assembly for metagenomic sequencing projects has not been fully addressed. Results: We conducted a comprehensive assessment on state-of-the-art de novo assemblers and revealed that the performance of each assembler depends critically on the sequencing depth. To address this problem, we developed a pipeline named InteMAP to integrate three assemblers, ABySS, IDBA-UD and CABOG, which were found to complement each other in assembling metagenomic sequences. Making a decision of which assembling approaches to use according to the sequencing coverage estimation algorithm for each short read, the pipeline presents an automatic platform suitable to assemble real metagenomic NGS data with uneven coverage distribution of sequencing depth. By comparing the performance of InteMAP with current assemblers on both synthetic and real NGS metagenomic data, we demonstrated that InteMAP achieves better performance with a longer total contig length and higher contiguity, and contains more genes than others. Conclusions: We developed a de novo pipeline, named InteMAP, that integrates existing tools for metagenomics

Citation Context

...ngth was set 100 bp, with the average and the standard deviation of paired-end insert size as 300 bp and 20 bp. To model the specific pattern of sequencing error of Illumina technology, we used NGSfy [41] to generate sequencing errors in reads, which uses a fourth-degree polynomial model to describe the frequency of errors in Illumina reads. We used the default settings for NGSfy, and the average erro...
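
The simulation setup described in this excerpt can be illustrated with a minimal sketch. The read length (100 bp) and the insert-size mean and standard deviation (300 bp, 20 bp) come from the excerpt; everything else is an assumption made for illustration. In particular, the polynomial coefficients are hypothetical placeholders, not the values used by NGSfy, and the function names do not correspond to any NGSfy interface.

```python
import random

READ_LEN = 100      # read length stated in the excerpt
INSERT_MEAN = 300   # mean paired-end insert size stated in the excerpt
INSERT_SD = 20      # insert-size standard deviation stated in the excerpt

# Hypothetical coefficients c0..c4 of a fourth-degree polynomial.
# These are illustrative placeholders, NOT the values used by NGSfy.
HYPOTHETICAL_COEFFS = (0.002, 1e-5, 1e-7, 1e-9, 1e-10)


def error_rate(pos, coeffs=HYPOTHETICAL_COEFFS):
    """Per-base error probability at a 0-based read position, clamped to [0, 1]."""
    rate = sum(c * pos ** k for k, c in enumerate(coeffs))
    return min(max(rate, 0.0), 1.0)


def add_errors(read, rng):
    """Substitute each base with a different one with position-dependent probability."""
    out = []
    for pos, base in enumerate(read):
        if rng.random() < error_rate(pos):
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def simulate_pair(genome, rng):
    """Draw one fragment with a normally distributed length and read both ends."""
    insert = max(READ_LEN, round(rng.gauss(INSERT_MEAN, INSERT_SD)))
    insert = min(insert, len(genome))
    start = rng.randrange(len(genome) - insert + 1)
    fragment = genome[start:start + insert]
    forward = fragment[:READ_LEN]
    reverse = reverse_complement(fragment)[:READ_LEN]
    return add_errors(forward, rng), add_errors(reverse, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    fwd, rev = simulate_pair(genome, rng)
    print(fwd)
    print(rev)
```

With these placeholder coefficients the error probability rises toward the 3' end of the read, which is the qualitative behaviour a position-dependent model is meant to capture; a real study would fit the coefficients to observed error frequencies.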

Metagenomic Analysis of Upwelling-Affected Brazilian Coastal Seawater Reveals Sequence Domains of Type I PKS and Modular NRPS

by Rafael R. C. Cuadrat, Juliano C. Cury, Alberto M. R. Dávila, Patrick C. Y. Woo, 2015
Marine environments harbor a wide range of microorganisms from the three domains of life. These microorganisms have great potential to enable discovery of new enzymes and bioactive compounds for industrial use. However, only ~1% of microorganisms from the environment can currently be identified through cultured isolates, limiting the discovery of new compounds. To overcome this limitation, a metagenomics approach has been widely adopted for biodiversity studies on samples from marine environments. In this study, we screened metagenomes in order to estimate the potential for new natural compound synthesis mediated by diversity in the Polyketide Synthase (PKS) and Nonribosomal Peptide Synthetase (NRPS) genes. The samples were collected from the Praia dos Anjos (Angel’s Beach) surface water, Arraial do Cabo (Rio de Janeiro state, Brazil), an environment affected by upwelling. In order to evaluate the potential for screening natural products in Arraial do Cabo samples, we used KS (keto-synthase) and C (condensation) domains (from PKS and NRPS, respectively) to build Hidden Markov Models (HMMs). From both samples, a total of 84 KS and 46 C novel domain sequences were obtained, showing the potential of this environment for the discovery of new genes of biotechnological interest. These domains were classified by phylogenetic analysis and this was the first study conducted to screen

Citation Context

... so many algorithms were proposed to address it [30–33]. The main problems of the assembly are the low coverage and the possible formation of chimeras (especially in environments with high diversity) [34,35]. Using the CAP3 with the very stringent default parameters, we tried to minimize the problem of chimeras, but the low coverage only can be outlined with ... (Int. J. Mol. Sci. 2015, 16, 28285–28295)

A better sequence-read simulator program for metagenomics

by Stephen Johnson, Brett Trost, Jeffrey R Long, Vanessa Pittet, Anthony Kusalik, 2014
Background: There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. Results: We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. Conclusions: BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.

Citation Context

... from processes, respectively. simMC, and simHC datasets from Pignatelli and Moya [12] to power functions. • Homology-based abundance profile generation (slow): This method derives abundance values by first determining the similarity of WGS shotgun reads in a user-supplied sample of re... (Johnson et al. BMC Bioinformatics 2014, 15(Suppl 9):S14, http://www.biomedcentral.com/1471-2105/15/S9/S14)

Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

by Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Janet Jansson, James M. Tiedje, C. Titus Brown
Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.

Citation Context

... approach for metagenomic sequence analysis, it is complicated by the variable coverage of sequencing reads from mixed populations in the environment and their associated sequencing errors and biases [7, 8]. Several metagenome-specific assemblers have been developed to deal with variable coverage communities, including Meta-IDBA [9], MetaVelvet [10], and SOAPdenovo [11]. These assemblers rely on analysi...
