Supporting software for "VaDiR: an integrated approach to Variant Detection in RNA"
Dataset type: Software
Data released on November 01, 2017
Neums L; Suenaga S; Beyerlein P; Anders S; Koestler D; Mariani A; Chien J (2017): Supporting software for "VaDiR: an integrated approach to Variant Detection in RNA". GigaScience Database. http://dx.doi.org/10.5524/100360
Advances in next-generation DNA sequencing technologies now enable detailed characterization of sequence variations in cancer genomes. Whole genome sequencing can discover variations in both coding and non-coding sequences, but its cost currently limits its general use in research. Whole exome sequencing characterizes sequence variations in coding regions, but the cost of capture reagents and biases in capture rate limit its full use in research. An additional limitation is the uncertainty in assigning functional significance to mutations that are observed in non-coding regions or in genes that are not expressed in cancer tissue.
We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called "VaDiR: Variant Detection in RNA" that integrates three variant callers: SNPiR, RVBoost and MuTect2. Variants identified by the combination of all three methods, which we called Tier1 variants, had the highest precision, with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that integrating Tier1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels.
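The integration of caller outputs described above can be illustrated with set operations. The following is a minimal sketch, not the VaDiR implementation: it assumes each caller's output has already been reduced to a set of (chromosome, position, reference, alternate) keys, it reads "combination of all three methods" as the variants reported by all three callers, and it reads "those called by MuTect2 and SNPiR" as the variants reported by both of those callers. The names `Variant`, `tier1`, `tier1_plus_mutect2_snpir` and `precision_recall` are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical variant key; real caller output carries many more fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tier1(snpir: Set[Variant], rvboost: Set[Variant],
          mutect2: Set[Variant]) -> Set[Variant]:
    """Variants reported by all three callers (assumed meaning of Tier1)."""
    return snpir & rvboost & mutect2


def tier1_plus_mutect2_snpir(snpir: Set[Variant], rvboost: Set[Variant],
                             mutect2: Set[Variant]) -> Set[Variant]:
    """Tier1 variants together with variants reported by both MuTect2 and SNPiR.

    Assumption: "called by MuTect2 and SNPiR" means called by both.
    Under that reading, Tier1 is a subset of the second term, so the
    union is equal to mutect2 & snpir.
    """
    return tier1(snpir, rvboost, mutect2) | (mutect2 & snpir)


def precision_recall(called: Set[Variant],
                     truth: Set[Variant]) -> Tuple[float, float]:
    """Precision and recall of a call set against a DNA-validated truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Under these assumptions, the stricter Tier1 set trades recall for precision, while adding variants shared by MuTect2 and SNPiR recovers more of the DNA-validated mutations. Comparing `precision_recall` for the two sets on paired RNA/DNA data would show that trade-off for a given sample.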
Our method, VaDiR, offers a way to uncover mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
For testing purposes we utilised data kindly provided by Dr. Andrea Mariani of Mayo Clinic, Rochester, Minnesota. Due to ethical constraints these data cannot be shared publicly. Researchers who would like to request access should contact Dr. Andrea Mariani (mariani.andrea@mayo.edu) with a short description of why they require access and how they would use the data.