
Data released on December 23, 2016

Supporting data for "From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data"

Leys, N; Monsieurs, P; Mysara, M; Njima, M; Raes, J (2016): Supporting data for "From reads to operational taxonomic units: an ensemble processing pipeline for MiSeq amplicon sequencing data" GigaScience Database. http://dx.doi.org/10.5524/100265

The development of high-throughput sequencing technologies has provided microbial ecologists with an efficient approach to assess bacterial diversity at unprecedented depth, particularly with the recent advances in the Illumina MiSeq sequencing platform. However, analysing such high-throughput data poses important computational challenges, requiring specialized bioinformatics solutions at different stages of the processing pipeline, such as assembly of paired-end reads, chimera removal, correction of sequencing errors and clustering of those sequences into Operational Taxonomic Units (OTUs). Individual algorithms addressing each of these challenges have been combined into various bioinformatics pipelines, such as mothur, QIIME, LotuS and USEARCH. Using a set of well-described bacterial mock communities, state-of-the-art pipelines for Illumina MiSeq amplicon sequencing data are benchmarked on the number of sequences retained, computational cost, error rate and quality of the OTUs. In addition, a new pipeline called OCToPUS is introduced, which makes an optimal combination of different algorithms. Large variability is observed between the different pipelines with respect to the monitored performance parameters; in general, the number of retained reads is found to be inversely proportional to the quality of the reads. By contrast, OCToPUS achieves the lowest error rate, the minimum number of spurious OTUs, and the closest correspondence to the known community composition, while retaining the highest number of reads when compared to other pipelines.
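The abstract lists clustering of sequences into OTUs as one stage of every pipeline benchmarked here. As a rough illustration of what that stage does, the sketch below shows abundance-sorted greedy clustering: unique sequences are visited from most to least abundant, and each one either joins the first existing centroid within a fixed identity threshold or founds a new OTU. This is a generic textbook sketch, not the OCToPUS implementation (see the GitHub repository for the actual pipeline). The 97% threshold, the Levenshtein-based identity measure and all function names are illustrative assumptions.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude pairwise identity (assumption): 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_otu_clustering(reads, threshold=0.97):
    """Hypothetical helper: abundance-sorted greedy OTU clustering.

    Unique sequences are processed by decreasing abundance (ties broken
    alphabetically), so the most abundant sequences become OTU centroids.
    Returns a dict mapping each centroid sequence to its total read count.
    """
    counts = Counter(reads)
    otus = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # join the first sufficiently similar OTU
                break
        else:
            otus[seq] = n  # no centroid close enough: start a new OTU
    return otus


base = "ACGT" * 25                 # 100 bp toy amplicon
variant = "T" + base[1:]           # one substitution, 99% identity to base
distinct = "G" * 100               # unrelated sequence
reads = [base] * 10 + [variant] * 2 + [distinct] * 5
print(greedy_otu_clustering(reads))
```

With these toy reads, the single-substitution variant is absorbed into the OTU of the more abundant `base` sequence, while the unrelated sequence forms its own OTU, giving two OTUs with 12 and 5 reads. Real pipelines use proper alignment and run chimera removal and error correction before this step, since a single uncorrected error pattern can otherwise seed a spurious OTU.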


Related manuscripts:

doi:10.1093/gigascience/giw017

Additional information:

https://github.com/M-Mysara/OCToPUS

Accessions (data included in GigaDB):

BioProject: PRJEB4688
SRA: SRP066114

Keywords:

16S rRNA metagenomics amplicon sequencing chimera denoising OTU clustering operational taxonomic units microbiome 

Workflow, Metagenomic


Samples:

Sample ID | Taxonomic ID | Organism | Description | Relevant electronic resources | Alternative accession (BioProject)
130403(V34) | 1235509 | synthetic metagenome | MOCK1 - contains the following organis... | Kozich JJ, Westcott ... | N/A
130403(V4) | 1235509 | synthetic metagenome | MOCK1 - contains the following organis... | Kozich JJ, Westcott ... | N/A
130417(V34) | 1235509 | synthetic metagenome | MOCK1 - contains the following organis... | Kozich JJ, Westcott ... | N/A
130417(V4) | 1235509 | synthetic metagenome | MOCK1 - contains the following organis... | Kozich JJ, Westcott ... | N/A
130422(V34) | 1235509 | synthetic metagenome | MOCK1 - contains the following organis... | Kozich JJ, Westcott ... | N/A
130422(V4) | 1235509 | synthetic metagenome | MOCK1 - contains the following organis... | Kozich JJ, Westcott ... | N/A
M1(V34) | 1235509 | synthetic metagenome | MOCK3 - contains the following organis... | Mysara M, Leys N, Ra... | PRJNA302007
M2(V34) | 1235509 | synthetic metagenome | MOCK3 - contains the following organis... | Mysara M, Leys N, Ra... | PRJNA302007
M3(V34) | 1235509 | synthetic metagenome | MOCK3 - contains the following organis... | Mysara M, Leys N, Ra... | PRJNA302007
V4.I.05 | 1235509 | synthetic metagenome | MOCK2 - contains the following organis... | Nelson MC, Morrison ... | PRJEB4688

Showing 10 of 13 samples.

Files: (FTP site)

File type | File format | Size | Release date
GitHub archive | archive | 0 KB | 2016-12-15
Mixed archive | zip | 62.18 KB | 2016-12-22
Mixed archive | archive | 231.3 KB | 2016-12-15
Mixed archive | archive | 539.82 KB | 2016-12-15
Mixed archive | archive | 61.57 MB | 2016-12-15
Readme | TEXT | 0 KB | 2016-12-15
other | EXCEL | 0 KB | 2016-12-15

7 files in total.
