Supporting data for "Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics and epigenetics data"

Dataset type: Genomic, Software
Data released on December 18, 2017

Nguyen QH; Tellam RL; Naval-Sanchez M; Porto-Neto LR; Barendse W; Reverter A; Hayes B; Kijas J; Dalrymple BP (2017): Supporting data for "Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics and epigenetics data" GigaScience Database. http://dx.doi.org/10.5524/100390

DOI: 10.5524/100390

Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits, and identifying potential genome editing targets.

Additional details

Read the peer-reviewed publication(s):

Nguyen, Q. H., Tellam, R. L., Naval-Sanchez, M., Porto-Neto, L. R., Barendse, W., Reverter, A., … Dalrymple, B. P. (2018). Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data. GigaScience, 7(3). doi:10.1093/gigascience/gix136

Additional information:

https://bitbucket.csiro.au/users/ngu121/repos/hprs/browse

File Name  Sample ID  Data Type   File Format  Size      Release Date
                      annotation  BED          36.75 MB  2017-12-12
                      annotation  BED          30.49 MB  2017-12-12
                      annotation  BED          26.75 MB  2017-12-12
                      annotation  BED          30.93 MB  2017-12-12
                      annotation  BED          23.54 MB  2017-12-12
                      annotation  BED          10.98 MB  2017-12-12
                      annotation  BED          26.4 MB   2017-12-12
                      annotation  BED          40.15 MB  2017-12-12
                      annotation  BED          21.84 MB  2017-12-12
                      annotation  BED          6.1 MB    2017-12-12
Displaying 1-10 of 16 File(s).
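
The files listed above are BED-format annotations, so a short sketch of how such a file might be read may be useful. The example below assumes only the standard BED convention of three leading columns (chromosome, 0-based start, exclusive end). The helper names (`BedInterval`, `read_bed`, `total_length_by_chrom`) and the sample records are hypothetical and are not taken from the dataset. It sums raw interval lengths per chromosome and does not merge overlapping intervals.

```python
import io
from typing import Dict, Iterable, Iterator, NamedTuple, TextIO


class BedInterval(NamedTuple):
    """One BED record, reduced to its three mandatory columns."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        # Half-open coordinates make the length a simple difference.
        return self.end - self.start


def read_bed(handle: TextIO) -> Iterator[BedInterval]:
    """Yield intervals from a BED stream, skipping header and comment lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]))


def total_length_by_chrom(intervals: Iterable[BedInterval]) -> Dict[str, int]:
    """Sum interval lengths per chromosome (overlaps are counted twice)."""
    totals: Dict[str, int] = {}
    for interval in intervals:
        totals[interval.chrom] = totals.get(interval.chrom, 0) + interval.length
    return totals


# Invented sample records for illustration only; not data from this dataset.
SAMPLE_BED = (
    "track name=example\n"
    "chr1\t100\t250\tregion_a\n"
    "chr1\t400\t450\tregion_b\n"
    "chr2\t0\t1000\tregion_c\n"
)

totals = total_length_by_chrom(read_bed(io.StringIO(SAMPLE_BED)))
print(totals)
```

To apply the same functions to a downloaded file, replace the `io.StringIO` object with an opened file handle, for example `open(path)` where `path` points to a local copy of one of the BED files.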
Funding body Awardee Award ID Comments
Commonwealth Scientific and Industrial Research Organisation OCE PostDoc James Kijas (project leader), Marina Naval-Sanchez (PostDoc)
Commonwealth Scientific and Industrial Research Organisation OCE PostDoc Brian P Dalrymple (project leader), Quan Nguyen (PostDoc)
Date Action
December 18, 2017 Dataset published
March 30, 2018 Manuscript link added: 10.1093/gigascience/gix136