DSpace Repository

MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle.

dc.contributor.author Zapata-Peñasco, Icoquih
dc.contributor.author Poot-Hernández, Augusto César
dc.contributor.author Eguiarte, Luis E.
dc.contributor.author Contreras-Moreira, Bruno
dc.contributor.author Souza, Valeria
dc.coverage.spatial US
dc.creator De Anda, Valerie
dc.date.accessioned 2021-11-17T03:32:21Z
dc.date.available 2021-11-17T03:32:21Z
dc.date.issued 2017-10-23
dc.identifier.citation De Anda, V., Zapata-Penasco, I., Poot-Hernandez, A. C., Eguiarte, L. E., Contreras-Moreira, B., & Souza, V. (2017). MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle. Gigascience, 6(11). doi:10.1093/gigascience/gix096
dc.identifier.uri http://www.ru.iimas.unam.mx/handle/IIMAS_UNAM/ART33
dc.description.abstract The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H'), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment.
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.
dc.format application/pdf
dc.language.iso eng
dc.publisher OXFORD UNIV PRESS
dc.rights openAccess
dc.rights.uri http://creativecommons.org/licenses/by/4.0
dc.source Gigascience (2047-217X), Vol. 6(11), 1-17 (2017).
dc.subject metabolic machinery
dc.subject metagenomics
dc.subject omic-datasets
dc.subject Pfam domains
dc.subject relative entropy
dc.subject sulfur cycle
dc.subject multigenomic entropy-based score
dc.subject.classification Engineering and Technology
dc.title MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle.
dc.type article
dc.type publishedVersion
dcterms.creator De Anda, Valerie::orcid::0000-0001-9775-0737
dcterms.creator Zapata-Peñasco, Icoquih::orcid::0000-0003-1580-1321
dcterms.creator Poot-Hernández, Augusto César::orcid::0000-0003-4565-7594
dcterms.creator Eguiarte, Luis::orcid::0000-0002-5906-9737
dcterms.creator Contreras-Moreira, Bruno::orcid::0000-0002-5462-907X
dcterms.creator Souza, Valeria::orcid::0000-0002-2992-4229
dc.audience researchers
dc.audience students
dc.audience teachers
dc.identifier.doi http://dx.doi.org/10.1093/gigascience/gix096
dc.relation.ispartofjournal https://academic.oup.com/gigascience/issue/6/11
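
The abstract describes scoring a sample by summing per-domain relative entropies (H') that measure how enriched each Pfam domain is among sulfur genomes. The following is a minimal sketch of that idea, not the MEBS implementation. It assumes a single-term form H' = P * log2(P / Q), where P is the fraction of target (sulfur) genomes carrying a domain and Q is the fraction of all background genomes carrying it, and it assumes the background set includes the target genomes. The function names, the toy genomes, and the identifiers `PFAM_X` and `PFAM_Y` are hypothetical; only PF04358 (DsrC) comes from the record above.

```python
import math


def relative_entropy(p_obs: float, q_exp: float) -> float:
    """Single-term relative entropy in bits: P * log2(P / Q).

    Assumed form for illustration; the published method may differ in detail.
    """
    if p_obs == 0.0:
        return 0.0  # a domain never observed in the target set contributes nothing
    if q_exp <= 0.0:
        raise ValueError("background frequency must be > 0 when the domain is observed")
    return p_obs * math.log2(p_obs / q_exp)


def domain_entropies(target_genomes, background_genomes):
    """Return {domain: H'} for every domain seen in the target genomes.

    Each genome is a set of Pfam identifiers. The background list is
    assumed to contain the target genomes as well.
    """
    domains = set().union(*target_genomes)
    entropies = {}
    for domain in domains:
        p = sum(domain in g for g in target_genomes) / len(target_genomes)
        q = sum(domain in g for g in background_genomes) / len(background_genomes)
        entropies[domain] = relative_entropy(p, q)
    return entropies


def sample_score(sample_domains, entropies):
    """Sum the entropies of the scored domains that occur in the sample."""
    return sum(entropies[d] for d in sample_domains if d in entropies)


# Toy data (hypothetical): two target genomes and two extra background genomes.
target = [{"PF04358", "PFAM_X"}, {"PF04358"}]
background = target + [{"PFAM_X"}, {"PFAM_Y"}]

entropies = domain_entropies(target, background)
score = sample_score({"PF04358", "PFAM_Y"}, entropies)
```

In this toy setup PF04358 occurs in every target genome but only half of the background, so it receives a positive entropy, while `PFAM_X` is equally frequent in both sets and contributes zero. A sample is then scored only by the informative domains it contains.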


Except where otherwise noted, this item's license is described as openAccess