Supporting data for "The case for using Mapped Exonic Non-Duplicate (MEND) read counts in RNA-Seq experiments: examples from pediatric cancer datasets"

Dataset type: Software, Transcriptomic
Data released on January 26, 2021

Beale HC; Roger JM; Cattle MA; McKay LT; Thompson DKA; Learned K; Lyle AG; Kephart ET; Currie R; Lam DL; Sanders L; Pfeil J; Vivian J; Bjork I; Salama SR; Haussler D; Vaske OM (2021): Supporting data for "The case for using Mapped Exonic Non-Duplicate (MEND) read counts in RNA-Seq experiments: examples from pediatric cancer datasets" GigaScience Database. http://dx.doi.org/10.5524/100859

DOI: 10.5524/100859

The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that Mapped, Exonic, Non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets utilized for gene expression analysis. In bulk RNA-Seq datasets from 2179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (med. 3%; IQR 3%); duplicate reads constitute 3-100% of mapped reads (med. 27%; IQR 30%); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (med. 25%; IQR 21%). MEND reads constitute 0-79% of total reads (med. 50%; IQR 31%). Since not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements, and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped or exonic reads. We provide a Docker image containing 1) the existing required tools (RSeQC, sambamba and samblaster) and 2) a custom script. We recommend that all RNA-Seq gene expression experiments, sensitivity studies and depth recommendations use MEND units for sequencing depth.
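
The four percentages quoted in the abstract each use a different denominator (all reads, mapped reads, mapped non-duplicate reads, and total reads again). The following Python sketch illustrates that arithmetic for a single sample. It is not the custom script shipped in the Docker image: the `ReadCounts` class, the `mend_fractions` function and the field names are hypothetical, and the counts are assumed to have been obtained elsewhere and to be nested (MEND reads within mapped non-duplicate reads, within mapped reads, within total reads).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReadCounts:
    """Hypothetical per-sample read counts; each count is a subset of the one above it."""
    total: int          # all sequenced reads
    mapped: int         # reads aligned to the reference
    mapped_nondup: int  # mapped reads that are not duplicates
    mend: int           # mapped, exonic, non-duplicate reads


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of dividing by zero when a category is empty.
    return numerator / denominator if denominator else 0.0


def mend_fractions(counts: ReadCounts) -> Dict[str, float]:
    """Compute the four fractions described in the abstract, each with its own denominator."""
    if not 0 <= counts.mend <= counts.mapped_nondup <= counts.mapped <= counts.total:
        raise ValueError("counts must be nested: mend <= mapped_nondup <= mapped <= total")
    if counts.total == 0:
        raise ValueError("total read count must be positive")
    return {
        # unmapped reads as a fraction of all reads
        "unmapped_of_total": _ratio(counts.total - counts.mapped, counts.total),
        # duplicate reads as a fraction of mapped reads
        "duplicate_of_mapped": _ratio(counts.mapped - counts.mapped_nondup, counts.mapped),
        # non-exonic reads as a fraction of mapped, non-duplicate reads
        "nonexonic_of_mapped_nondup": _ratio(
            counts.mapped_nondup - counts.mend, counts.mapped_nondup
        ),
        # MEND reads as a fraction of total reads
        "mend_of_total": _ratio(counts.mend, counts.total),
    }


if __name__ == "__main__":
    # Made-up counts chosen only to keep the arithmetic easy to follow.
    example = ReadCounts(total=100, mapped=80, mapped_nondup=40, mend=30)
    for name, value in mend_fractions(example).items():
        print(f"{name}: {value:.0%}")
```

With the made-up counts above, 20 of 100 reads are unmapped, 40 of 80 mapped reads are duplicates, 10 of 40 mapped non-duplicate reads are non-exonic, and 30 of 100 total reads are MEND reads. Because the denominators differ, the first three fractions do not sum with the MEND fraction to 100%.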

Additional details

Read the peer-reviewed publication(s):

(PubMed: 33712853)

Additional information:

https://www.stjude.cloud/

https://cavatica.squarespace.com/

GitHub links:

https://github.com/UCSC-Treehouse/mend_qc





File Name Sample ID Data Type File Format Size Release Date
GitHub archive zip 4.28 MB 2021-01-14
readme TEXT 3.2 KB 2021-01-14
Tabular Data CSV 481.21 KB 2021-01-14
Tabular Data CSV 0.31 KB 2021-01-14
Tabular Data CSV 1.96 KB 2021-01-14
Funding body Awardee Award ID Comments
Unravel Pediatric Cancer D Haussler
Team G Childhood Cancer Foundation D Haussler
Colligan Presidential Chair in Pediatric Genomics D Haussler
The Schmidt Futures Foundation D Haussler This donation, in support of New Architecture of the Human Brain, is made possible by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program.
American Association for Cancer Research D Haussler 19-20-01-VASK NextGen Grant for Transformative Cancer Research
Howard Hughes Medical Institute D Haussler 090100
Alex's Lemonade Stand Foundation for Childhood Cancer (US) D Haussler Crazy 8 Pilot Project
St. Baldrick's Foundation D Haussler 427053 Emily Beazley Kures for Kids Fund
California Initiative to Advance Precision Medicine D Haussler OPR014109 Per the award document, the following disclaimer applies: "This project was funded by the State. The contents may not necessarily reflect the official views or policies of the State of California."
Live for Others Foundation D Haussler EDA2050826 Orange County Community Foundation
Date Action
January 26, 2021 Dataset publish
January 26, 2021 File readme_100859.txt updated
January 26, 2021 readme_100859.txt: additional file attribute added
January 26, 2021 File readme_100859.txt updated
February 3, 2021 Author added : Cattle, Matthew
February 3, 2021 Author added : McKay, Liam
February 8, 2021 Manuscript Link added : 10.1093/gigascience/giab011
November 29, 2021 Manuscript Link updated : 10.1093/gigascience/giab011