
Data released on May 09, 2018

Supporting data for "Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments"

Almeida, A; Mitchell, A, L; Tarkowska, A; Finn, R, D (2018): Supporting data for "Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments" GigaScience Database. http://dx.doi.org/10.5524/100448

Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. QIIME and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using simulated synthetic datasets composed of some of the most abundant genera found in the human gut, ocean and soil environments. We evaluated their accuracy when paired with different reference databases and with different variable sub-regions of the 16S rRNA gene.
We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently below 2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher, respectively, than those of MAPseq. Using the SILVA database generally yielded higher recall than using Greengenes, while assignment results for different 16S rRNA variable sub-regions varied by up to 40% between samples analysed with the same pipeline.
Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision and/or computational performance required.
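The abstract compares tools by recall, precision, F-score and the distance between observed and simulated profiles, but this page does not define these measures. As a minimal sketch, assuming the common set-based definitions at a single taxonomic rank (e.g. genus) and substituting Bray-Curtis dissimilarity for the unspecified distance estimate, they could be computed as below. The function names and example genera are hypothetical, and the study's own scoring may differ.

```python
from collections.abc import Mapping


def classification_metrics(expected: set[str], observed: set[str]) -> dict[str, float]:
    """Set-based precision, recall and F-score for taxa at one rank (e.g. genus).

    Assumed definitions (illustrative, not taken from the study):
      precision = correctly recovered taxa / all taxa reported by the classifier
      recall    = correctly recovered taxa / all taxa in the simulated sample
      F-score   = harmonic mean of precision and recall
    """
    true_pos = len(expected & observed)
    precision = true_pos / len(observed) if observed else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


def bray_curtis(expected: Mapping[str, float], observed: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical).

    Used here only as an illustrative distance; the study's metric may differ.
    """
    taxa = set(expected) | set(observed)
    # Sum of the smaller abundance of each taxon present in either profile.
    shared = sum(min(expected.get(t, 0.0), observed.get(t, 0.0)) for t in taxa)
    total = sum(expected.values()) + sum(observed.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Hypothetical genus-level example: two of four simulated genera recovered,
# plus one false positive.
simulated = {"Bacteroides", "Prevotella", "Faecalibacterium", "Roseburia"}
predicted = {"Bacteroides", "Prevotella", "Escherichia"}
print(classification_metrics(simulated, predicted))
print(bray_curtis({"Bacteroides": 0.5, "Prevotella": 0.5},
                  {"Bacteroides": 0.7, "Escherichia": 0.3}))
```

Because precision only penalises taxa that are reported wrongly, a classifier that reports fewer taxa can score high precision with lower recall, which illustrates how one tool can lead on precision while another leads on recall.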


Keywords:

16s rrna gene human gastrointestinal tract ocean microbiome soil taxonomy 

Software, Metagenomic


Samples:

Sample ID       Taxonomic ID  Scientific Name       Species  Max. species per genus
ocean_A100      408172        marine metagenome     100      20
ocean_B100      408172        marine metagenome     100      20
ocean_A500      408172        marine metagenome     500      50
ocean_B500      408172        marine metagenome     500      50
soil_A100       410658        soil metagenome       100      20
soil_B100       410658        soil metagenome       100      20
soil_A500       410658        soil metagenome       500      50
soil_B500       410658        soil metagenome       500      50
human-gut_A100  408170        human gut metagenome  100      20
human-gut_B100  408170        human gut metagenome  100      20

Description (all samples): in-silico synthetic community sample generated from the listed number of species, selected at random from the 80 most abundant genera across publicly deposited samples from the corresponding biome (ocean, soil or human gut), with at most the listed number of species per genus.
Relevant electronic resources (all samples): ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/taxon_benchmarking/

Displaying samples 1-10 of 12.
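Each sample description states that the community was built by drawing a fixed number of species at random from the 80 most abundant genera in a biome, with a cap on species per genus. The sketch below shows one simple way to implement such capped random sampling; it is not the authors' code, and the input layout (`species_by_genus`), the function name and the toy data are hypothetical.

```python
import random
from typing import Optional


def sample_species(species_by_genus: dict[str, list[str]], n_species: int,
                   max_per_genus: int, seed: Optional[int] = None) -> list[str]:
    """Randomly pick n_species, keeping at most max_per_genus from any genus.

    species_by_genus is assumed to already hold only the candidate genera
    (e.g. the 80 most abundant in a biome) mapped to their species names.
    """
    if n_species <= 0:
        return []
    rng = random.Random(seed)
    # Flatten to (genus, species) pairs and shuffle so the visiting order is random.
    pool = [(genus, sp) for genus, names in species_by_genus.items() for sp in names]
    rng.shuffle(pool)
    per_genus: dict[str, int] = {}
    chosen: list[str] = []
    for genus, sp in pool:
        # Skip species whose genus has already reached the cap.
        if per_genus.get(genus, 0) >= max_per_genus:
            continue
        chosen.append(sp)
        per_genus[genus] = per_genus.get(genus, 0) + 1
        if len(chosen) == n_species:
            return chosen
    raise ValueError("too few species to satisfy the per-genus cap")


# Hypothetical toy input: 5 genera with 10 species each.
toy = {f"G{g}": [f"G{g}_sp{i}" for i in range(10)] for g in range(5)}
picked = sample_species(toy, n_species=20, max_per_genus=4, seed=42)
print(len(picked))  # 20
```

Shuffling once and then skipping species from genera that have hit the cap is easy to reproduce with a fixed seed, but it is not guaranteed to choose uniformly among all valid species sets.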

Files (FTP site):

Sample ID       Data Type          File Format  Size      Release Date
human-gut_A100  Amplicon sequence  FASTQ        23.66 MB  2018-05-02
human-gut_A100  Amplicon sequence  FASTQ        1.21 MB   2018-05-02
ocean_A100      Amplicon sequence  FASTQ        1.46 MB   2018-05-02
soil_A100       Amplicon sequence  FASTQ        1.43 MB   2018-05-02
human-gut_A100  Amplicon sequence  FASTQ        28.62 MB  2018-05-02
ocean_A100      Amplicon sequence  FASTQ        28.91 MB  2018-05-02
human-gut_A100  Amplicon sequence  FASTQ        23.9 MB   2018-05-02
ocean_A100      Amplicon sequence  FASTQ        24.15 MB  2018-05-02
soil_A100       Amplicon sequence  FASTQ        23.66 MB  2018-05-02
human-gut_A100  Amplicon sequence  FASTQ        28.33 MB  2018-05-02

Displaying files 1-10 of 198.
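The files listed above are amplicon reads in FASTQ format. Assuming standard four-line FASTQ records (optionally gzip-compressed; the file names are not shown on this page), a minimal Python reader that reports the read count and mean read length could look like this. The helper names and the `sample.fastq` path are hypothetical.

```python
import gzip
import io
from typing import IO, Iterable, Iterator


def read_fastq(handle: IO[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield header[1:].rstrip("\r\n"), seq, qual


def open_fastq(path: str) -> IO[str]:
    """Open a plain or gzip-compressed (.gz) FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def summarise(records: Iterable[tuple[str, str, str]]) -> tuple[int, float]:
    """Return (read count, mean read length)."""
    n_reads = total_len = 0
    for _, seq, _ in records:
        n_reads += 1
        total_len += len(seq)
    return n_reads, (total_len / n_reads if n_reads else 0.0)


# In-memory example; for a downloaded file (hypothetical name) use:
#   with open_fastq("sample.fastq") as fh: print(summarise(read_fastq(fh)))
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n")
print(summarise(read_fastq(example)))  # (2, 3.5)
```

Reading the stream record by record keeps memory use flat, which matters little for files of a few tens of MB but scales to larger runs.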
