
Data released on June 30, 2014

Supporting data for the paper: "An integrated catalog of reference genes in the human gut microbiome".

Al-Aama, JY; Arumugam, M; Cai, X; Chen, B; Chen, W; Edris, S; Feng, Q; Hansen, T; Jia, H; Juncker, AS; Kultima, JR; Levenez, F; Li, J; Liang, S; Manichanh, C; Nielsen, HB; Nielsen, T; Prifti, E; Sunagawa, S; Wang, J; Wang, J; Xiao, L; Xu, X; Yang, H; Zhang, D; Zhang, W; Zhang, Z; Zhao, H; Zhong, H; Brunak, S; Guarner, F; Kristiansen, K; Pedersen, O; Doré, J; Ehrlich, S; MetaHIT consortium; Bork, P; Wang, J (2014): Supporting data for the paper: "An integrated catalog of reference genes in the human gut microbiome". GigaScience Database.

Here we sequenced 249 fecal samples from European adults, bringing the total in the Metagenome of the Human Intestinal Tract (MetaHIT) project to 760 samples. All 6.4 TB of whole-genome shotgun sequencing data from 1267 fecal samples in MetaHIT, the Human Microbiome Project (HMP) and our diabetes study on Chinese adults were processed with the MOCAT pipeline. The resulting gene catalogs were merged using CD-HIT and complemented with genes from 511 sequenced human gut-related prokaryotic genomes that were present in our gut metagenomes. The final high-quality integrated reference catalog of the human gut microbiome contains 9,879,896 non-redundant genes. The genes were phylogenetically annotated according to 3449 bacterial and archaeal genomes and draft genomes from NCBI, and functionally annotated using orthologous groups from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) databases. In addition, 11 samples from the Chinese cohort were re-extracted using the MetaHIT DNA extraction protocol and shotgun-sequenced for comparison with the original data, which were generated with a slightly different DNA extraction protocol.
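To make the idea of merging catalogs into a "non-redundant" gene set concrete, here is a minimal toy sketch of longest-first greedy clustering, the general strategy that tools such as CD-HIT follow. It is not the CD-HIT implementation and does not reproduce the settings used for this dataset: the `identity` function, the 0.95 threshold, the gene IDs and the sequences are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity: matched characters divided by the shorter length.

    Assumption: a stand-in for a real alignment-based identity score.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_nonredundant(genes: dict[str, str],
                        min_identity: float = 0.95) -> dict[str, list[str]]:
    """Return {representative_id: [member_ids]} via longest-first clustering."""
    clusters: dict[str, list[str]] = {}
    # Process the longest sequences first; break ties by ID for determinism.
    ordered = sorted(genes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for gene_id, seq in ordered:
        for rep_id in clusters:
            # Join the first representative that is similar enough.
            if identity(genes[rep_id], seq) >= min_identity:
                clusters[rep_id].append(gene_id)
                break
        else:
            # No similar representative found: this gene starts a new cluster.
            clusters[gene_id] = [gene_id]
    return clusters


# Hypothetical genes pooled from two catalogs and one reference genome.
genes = {
    "catalog_A|g1": "ATGGCTAAAGGTGAACTGTTCACCGGTTAA",
    "catalog_B|g7": "ATGGCTAAAGGTGAACTGTTCACCGG",  # truncated copy of g1
    "genome_X|g3": "ATGCCCTTTGGGCCCAAATTTGGGCCCTGA",
}
clusters = greedy_nonredundant(genes)
for rep, members in clusters.items():
    print(rep, members)
```

In this toy input the truncated copy collapses into the cluster of the longer gene, so three input genes yield two representatives. A real catalog of millions of genes needs the indexing and alignment heuristics of a dedicated tool rather than this all-against-representatives loop.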


Related datasets:

doi:10.5524/100064 Cites doi:10.5524/200012
doi:10.5524/100064 IsCitedBy doi:10.5524/100317

Accessions (data not in GigaDB):

ENA: ERP000108
ENA: ERP002061
ENA: ERP003612
ENA: ERP004605
ENA: SRP002163
ENA: SRP011011
ENA: SRP008047



Samples:

| Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes |
|---|---|---|---|---|---|
| 158256496-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158337416-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158337416-stool2 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158458797-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158479027-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158499257-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158499257-stool2 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158742018-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158802708-stool1 | 408170 | Human gut metagenome | | human gut metagenome | |
| 158802708-stool2 | 408170 | Human gut metagenome | | human gut metagenome | |
Displaying 1-10 of 1278 Sample(s).

Files:

| File Name | Sample ID | File Type | File Format | Size | Release Date |
|---|---|---|---|---|---|
| | | Other | UNKNOWN | 1.58 GB | 2013-09-15 |
| | | Other | UNKNOWN | 874.78 MB | 2013-09-15 |
| | 158256496-stool1 | Sequence assembly | UNKNOWN | 45.03 MB | 2013-09-15 |
| | 158256496-stool1 | Annotation | FASTA | 43.74 MB | 2013-09-15 |
| | 158337416-stool1 | Sequence assembly | UNKNOWN | 56.91 MB | 2013-09-15 |
| | 158337416-stool1 | Annotation | FASTA | 55.04 MB | 2013-09-15 |
| | 158337416-stool2 | Sequence assembly | UNKNOWN | 45.69 MB | 2013-09-15 |
| | 158337416-stool2 | Annotation | FASTA | 44.3 MB | 2013-09-15 |
| | 158458797-stool1 | Sequence assembly | UNKNOWN | 30.32 MB | 2013-09-15 |
| | 158458797-stool1 | Annotation | FASTA | 29.1 MB | 2013-09-15 |
Displaying 1-10 of 2554 File(s).


