Data released on August 11, 2017

Benchmark data sets, software results and reference data for the first CAMI challenge.

Sczyrba, A; Hofman, P; Belmann, P; Koslicki, D; Janssen, S; Dröge, J; Gregor, I; Majda, S; Fiedler, J; Dahms, E; Bremges, A; Fritz, A; Garrido-Oter, R; Jørgensen, T, S; Shapiro, N; Blood, P, D; Gurevich, A; Bai, Y; Turaev, D; DeMaere, M, Z; Chikhi, R; Nagarajan, N; Quince, C; Meyer, F; Balvočiūtė, M; Hansen, L, H; Sørensen, S, J; H. Chia, B, K; Denis, B; Froula, J, L; Wang, Z; Egan, R; Kang, D, D; Cook, J, J; Deltel, C; Beckstette, M; Lemaitre, C; Peterlongo, P; Rizk, G; Lavenier, D; Wu, Y; Singer, S, W; Jain, C; Strous, M; Klingenberg, H; Meinicke, P; Barton, M; Lingner, T; Lin, H; Liao, Y; Z. Silva, G, G; Cuevas, D, A; Edwards, R, A; Saha, S; Piro, V, C; Renard, B, Y; Pop, M; Klenk, H; Göker, M; Kyrpides, N, C; Woyke, T; Vorholt, J, A; Schulze-Lefert, P; Rubin, E, M; Darling, A, E; Rattei, T; McHardy, A, C (2017): Benchmark data sets, software results and reference data for the first CAMI challenge. GigaScience Database. http://dx.doi.org/10.5524/100344

In just over a decade, metagenomics has developed into a powerful and productive method in microbiology and microbial ecology. The ability to retrieve and organize bits and pieces of genomic DNA from any natural context has opened a window into the vast universe of uncultivated microbes. Tremendous progress has been made in computational approaches to interpret this sequence data but none can completely recover the complex information encoded in metagenomes. A number of challenges stand in the way. Simplifying assumptions are needed and lead to strong limitations and potential inaccuracies in practice. Critically, methodological improvements are difficult to gauge due to the lack of a general standard for comparison. Developers also face a substantial burden to individually evaluate existing approaches, which consumes time and computational resources, and may introduce unintended biases.

The Critical Assessment of Metagenome Interpretation (CAMI) is a community-led initiative that tackles these problems by aiming for an independent, comprehensive and bias-free evaluation of methods. In the first CAMI challenge, which ran from March to July 2015, it provided three simulated benchmark metagenome datasets of different organismal complexities and sizes. These were generated from ~700 newly sequenced genomes and ~600 circular elements (plasmids, viruses, other circular elements) that were not included in public databases during the challenge. These datasets are now available here, together with gold standards for assembly, genome and taxonomic binning, and taxonomic profiling; the underlying genome sequences; NCBI and ARB reference sequence snapshots from before the challenge; and the reference NCBI taxonomy used. In addition, three test (toy) data sets are provided that were simulated from public genomes before the challenge. For the most realistic evaluation of reference-based methods (usually taxonomic binners and profilers) on the challenge data sets, the provided reference sequences or other sequence collections from before the challenge should be used as references, because all underlying genomes have since been deposited at NCBI or EBI.

Additional information:

https://data.cami-challenge.org/participate

Keywords:

Metagenomic, Software

Samples:

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
CAMI_high | 1235509 | | | synthetic metagenome | Description: a 75 Gb time series dataset with five samples from a high complexity community with correlated log normal abundance distributions (596 genomes and 478 circular elements; not included in the reference sequence collections also provided in this archive)
CAMI_low | 1235509 | | | synthetic metagenome | Description: a 15 Gb single sample dataset from a low complexity community with log normal abundance distribution (40 genomes and 20 circular elements; not included in the reference sequence collections also provided in this archive)
CAMI_medium | 1235509 | | | synthetic metagenome | Description: a 40 Gb differential log normal abundance dataset with two samples of a medium complexity community (132 genomes and 100 circular elements; not included in the reference sequence collections also provided in this archive) and long and short insert sizes
CAMI_TOY_high | 1235509 | | | synthetic metagenome | Description: a toy data set simulated from public genomes. Can be used for testing tools (gold standards provided). THIS IS NOT A CHALLENGE DATA SET. Five HiSeq (small insert size) time series samples of 15 Gbp each from 450 genomes (included in the reference sequence collections also provided in this archive). Insert size mean: 180 bp, Insert size stddev: 18 bp, Read length: 2x100 bp.
CAMI_TOY_low | 1235509 | | | synthetic metagenome | Description: a toy data set simulated from public genomes. Can be used for testing tools (gold standards provided). THIS IS NOT A CHALLENGE DATA SET. Genomes: 30 (included in the reference sequence collections also provided in this archive), Total Size: 15 Gbp, Read length: 2x100 bp, Insert size mean: 180 bp, Insert size stddev: 10%.
CAMI_TOY_medium | 1235509 | | | synthetic metagenome | Description: a toy data set simulated from public genomes. Can be used for testing tools (gold standards provided). THIS IS NOT A CHALLENGE DATA SET. Two HiSeq (small insert size) differential abundance 15 Gbp samples from 225 genomes (included in the reference sequence collections also provided in this archive). From the same two differential abundance community profiles, two HiSeq (5 kb insert size) 0.75 Gbp samples.
Displaying 1-6 of 6 Sample(s).
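
The sample descriptions refer to log normal abundance distributions. As a rough illustration of what that means, the following minimal Python sketch draws one relative abundance per genome from a log-normal distribution and normalizes the values to sum to 1. It is not the CAMI simulation pipeline: the function name `lognormal_abundances` and the values of `mu`, `sigma` and `seed` are hypothetical choices made only for this example.

```python
import random


def lognormal_abundances(n_genomes, mu=1.0, sigma=2.0, seed=0):
    """Draw one relative abundance per genome from a log-normal distribution.

    mu, sigma and seed are hypothetical illustration values,
    not the parameters used to build the CAMI data sets.
    """
    rng = random.Random(seed)
    # Each raw value is exp(N(mu, sigma)), so it is always positive.
    raw = [rng.lognormvariate(mu, sigma) for _ in range(n_genomes)]
    total = sum(raw)
    # Normalize so the abundances form a community profile summing to 1.
    return [value / total for value in raw]


# 40 genomes, the genome count given for the low complexity data set.
abundances = lognormal_abundances(40)
most_abundant = max(abundances)
```

With a large `sigma`, a few genomes dominate the profile while most are rare, which is the kind of skew that makes recovering low-abundance genomes difficult.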

Files:

File Name | Sample ID | File Type | File Format | Size | Release Date
 | CAMI_TOY_medium | Mixed archive | TAR | 31.26 GB | 2017-08-11
 | CAMI_TOY_low | Mixed archive | TAR | 14.35 GB | 2017-08-11
 | CAMI_TOY_high | Mixed archive | TAR | 75 GB | 2017-08-11
 | | Mixed archive | TAR | 90.26 MB | 2017-08-11
 | CAMI_high | Mixed archive | TAR | 49.67 GB | 2017-08-11
 | CAMI_low | Mixed archive | TAR | 10.48 GB | 2017-08-11
 | CAMI_medium | Mixed archive | TAR | 26.46 GB | 2017-08-11
 | | Mixed archive | TAR | 7.63 GB | 2017-08-11
 | | Mixed archive | TAR | 147.18 GB | 2017-08-11
 | | Mixed archive | TAR | -0 KB | 2017-08-11
Displaying 1-10 of 12 File(s).
