Supporting software for "VaDiR: an integrated approach to Variant Detection in RNA"

Dataset type: Software
Data released on November 01, 2017

Neums L; Suenaga S; Beyerlein P; Anders S; Koestler D; Mariani A; Chien J (2017): Supporting software for "VaDiR: an integrated approach to Variant Detection in RNA". GigaScience Database. http://dx.doi.org/10.5524/100360

DOI: 10.5524/100360

Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. Whole genome sequencing can discover variations in both coding and non-coding sequences, but its cost currently limits its general use in research. Whole exome sequencing is used to characterize sequence variations in coding regions, but the cost of capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning functional significance to mutations that are observed in non-coding regions or in genes that are not expressed in cancer tissue.
We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called "VaDiR: Variant Detection in RNA" that integrates three variant callers, namely: SNPiR, RVBoost and MuTect2. The combination of all three methods, which we called Tier1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels.
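The tiering described above can be illustrated with plain set operations. The following is only a minimal sketch, not the VaDiR implementation: it assumes each caller's output has already been reduced to a set of (chromosome, position, ref, alt) records, it reads "those called by MuTect2 and SNPiR" as variants reported by both of those callers, and the names Variant, tier1, tier1_plus_mutect2_snpir and precision_recall are hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this sketch."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tier1(snpir: Set[Variant], rvboost: Set[Variant],
          mutect2: Set[Variant]) -> Set[Variant]:
    """Variants reported by all three callers."""
    return snpir & rvboost & mutect2


def tier1_plus_mutect2_snpir(snpir: Set[Variant], rvboost: Set[Variant],
                             mutect2: Set[Variant]) -> Set[Variant]:
    """Tier1 variants combined with variants reported by both MuTect2 and
    SNPiR (assumed reading of the text, not confirmed by the source)."""
    return tier1(snpir, rvboost, mutect2) | (mutect2 & snpir)


def precision_recall(called: Set[Variant],
                     truth: Set[Variant]) -> Tuple[float, float]:
    """Precision and recall of a call set against DNA-validated variants."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Under this reading, every Tier1 variant is also called by both MuTect2 and SNPiR, so the combined set can only grow relative to Tier1. That is consistent with the trade-off reported above: the stricter set favours precision and the broader set favours recall.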
Our method, VaDiR, makes it possible to uncover mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
For testing purposes we utilised data kindly provided by Dr. Andrea Mariani of Mayo Clinic, Rochester, Minnesota. Due to ethical constraints these data cannot be shared publicly. Researchers who would like to request access should contact Dr. Andrea Mariani (mariani.andrea@mayo.edu) with a short description of why they require access and how they would use the data.

Additional details

Read the peer-reviewed publication(s):

(PubMed: 29267927)





File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | Text | TEXT | 0.32 KB | 2017-09-26
 | | Readme | TEXT | 0.17 KB | 2017-09-26
 | | Mixed archive | TAR | 12.83 GB | 2017-09-26
Funding body | Awardee | Award ID | Comments
National Cancer Institute | | P30-CA168524 |
Department of Defense | J Chien | W81XWH-10-1-0386 | Ovarian Cancer Research Program
Date | Action
November 1, 2017 | Dataset publish
January 9, 2018 | Manuscript Link added: 10.1093/gigascience/gix122
November 9, 2022 | Manuscript Link updated: 10.1093/gigascience/gix122