GigaDB Dataset - DOI 10.5524/100357 - Multigenomic Entropy Based Score (MEBS): The molecular reconstruction of the sulfur cycle

Multigenomic Entropy Based Score (MEBS): The molecular reconstruction of the sulfur cycle

Dataset type: Genomic, Software
Data released on October 05, 2017

De Anda V; Zapata-Penasco I; Poot-Hernandez AC; Eguiarte LE; Contreras-Moreira B; Souza V (2017): Multigenomic Entropy Based Score (MEBS): The molecular reconstruction of the sulfur cycle. GigaScience Database. http://dx.doi.org/10.5524/100357

DOI: 10.5524/100357

The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet inferring metabolic capabilities from such datasets remains challenging.
We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare and infer complex metabolic pathways in large ‘omic’ datasets, including entire biogeochemical cycles. MEBS is open source and available at https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H’), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences, such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2,107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, Receiver Operating Characteristic plots and the Area Under the Curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC=0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents or deep-sea sediment.
Conclusions: Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.
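
The entropy-based scoring described above can be illustrated with a minimal Python sketch. It is not the MEBS implementation: it assumes the textbook relative entropy (Kullback-Leibler divergence) over domain presence/absence, and it assumes the sample score is the sum of the entropies of the domains found in a sample. The function names, the clamping constant, the placeholder domain ID and the toy frequencies are all hypothetical; the exact formula used by the authors is in the linked publication and repository.

```python
import math


def relative_entropy(p_present, q_present, eps=1e-9):
    """Textbook relative entropy (KL divergence, in bits) of one domain.

    p_present: fraction of sulfur genomes that contain the domain.
    q_present: fraction of background genomes that contain the domain.
    Assumption: the two outcomes are presence and absence of the domain.
    """
    total = 0.0
    for p, q in ((p_present, q_present), (1.0 - p_present, 1.0 - q_present)):
        # Clamp frequencies away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1.0 - eps)
        q = min(max(q, eps), 1.0 - eps)
        total += p * math.log2(p / q)
    return total


def sample_score(domains_in_sample, entropy_by_domain):
    """Sum the entropies of the scored domains observed in one sample.

    Assumption: each domain counts once, and unscored domains contribute 0.
    """
    return sum(entropy_by_domain.get(d, 0.0) for d in set(domains_in_sample))


if __name__ == "__main__":
    # Toy frequencies, invented for illustration only.
    entropies = {"PF04358": relative_entropy(0.9, 0.1)}
    # "PF99999" is a placeholder ID standing in for an unscored domain.
    print(sample_score(["PF04358", "PF99999"], entropies))
```

In this sketch, a domain that is equally frequent in sulfur and background genomes scores zero, so only domains enriched in (or depleted from) the sulfur genomes move the sample score.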

Additional details

Read the peer-reviewed publication(s):

(PubMed: 29069412)

Additional information:

https://github.com/eead-csic-compbio/metagenome_Pfam_score

File Name	Data Type	File Format	Size	Release Date
metagenome_Pfam_score-master.zip	GitHub archive	archive	76.31 MB	2017-09-26
readme.txt	Readme	TEXT	4.12 KB	2017-09-26
S11.Gen_completenness.csv	Tabular data	CSV	2.34 MB	2017-10-05
S12.Met_completeness.csv	Tabular data	CSV	121.68 KB	2017-10-05
S13.Gen_completeness_annotation.csv	Tabular data	CSV	57.92 KB	2017-10-05
S1A.Sulfur_cycle_known_representatives.csv	Tabular data	CSV	307.38 KB	2017-10-05
S1B.Suli_including161genomes.csv	Tabular data	CSV	7.65 KB	2017-10-05
S2.Sucy_database.csv	Tabular data	CSV	257.46 KB	2017-10-05
S3.Pfam_mappingKEGG.csv	Tabular data	CSV	262.5 KB	2017-10-05
S6.GenDataset_SS_values.csv	Tabular data	CSV	358.91 KB	2017-10-05

Displaying 1-10 of 13 File(s).

Funding body	Awardee	Award ID	Comments
Consejo Nacional de Ciencia y Tecnología	V De Anda	356832	CONACYT
Secretaría de Educación Pública	V Souza	238245	Ciencia Basica Conacyt Project
Secretaría de Educación Pública	LE Eguiarte	238245	Ciencia Basica Conacyt Project
WWF International	V Souza

Date	Action
October 5, 2017	Dataset published
November 13, 2017	Manuscript link added: 10.1093/gigascience/gix096
November 29, 2018	File S1B.Suli_including161genomes.csv updated
November 9, 2022	Manuscript link updated: 10.1093/gigascience/gix096
