Supporting data for "Sequana Coverage: Detection and Characterization of Genomic Variations using Running Median and Mixture Models."

Dataset type: Software, Genomic
Data released on August 14, 2018

Desvillechabrol D; Bouchier C; Kennedy S; Cokelaer T (2018): Supporting data for "Sequana Coverage: Detection and Characterization of Genomic Variations using Running Median and Mixture Models." GigaScience Database. http://dx.doi.org/10.5524/100493

DOI: 10.5524/100493

In addition to mapping quality information, genome coverage contains valuable biological information such as the presence of repetitive regions, deleted genes, or copy number variations. It is essential to take into account atypical regions, trends (e.g., origin of replication), and known or unknown biases that influence coverage. It is also important that reported events come with robust statistics (e.g., a z-score) and a precise location.
We provide a standalone application, sequana_coverage, that reports genomic regions of interest (ROIs) that are significantly over- or under-represented in high-throughput sequencing (HTS) data. Each event is reported with its significance and with characteristics such as the length of the region. The algorithm first detrends the data using an efficient running median algorithm. It then estimates the distribution of the normalized genome coverage with a Gaussian mixture model. Finally, a z-score statistic is assigned to each base position and used to separate the central distribution from the ROIs (i.e., under- and over-covered regions). A double-threshold mechanism is used to cluster the genomic ROIs. HTML reports provide a summary with interactive visual representations of the genomic ROIs, along with standard plots and metrics. Genomic variations such as single nucleotide variants (SNVs) or copy number variations (CNVs) can be effectively identified at the same time.
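The steps described above (running-median detrending, per-base z-scores, double-threshold clustering) can be illustrated with a minimal sketch. This is not the sequana_coverage implementation: all function names, the window size, and the threshold values are hypothetical, the running median is a naive one rather than the efficient algorithm mentioned above, and the median and median absolute deviation (MAD) are used as a simple stand-in for the central component that the Gaussian mixture model would estimate.

```python
from statistics import median


def running_median(values, window):
    """Centred running median; the window is truncated at both ends."""
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def zscores(coverage, window):
    """Detrend by the running median, then scale to robust z-scores."""
    trend = running_median(coverage, window)
    # Normalised coverage is close to 1 in typical regions.
    # Simplification: positions where the trend is 0 are set to 0.0.
    norm = [c / t if t > 0 else 0.0 for c, t in zip(coverage, trend)]
    # ASSUMPTION: median and MAD replace the mean and standard deviation
    # of the central mixture-model component described in the text.
    center = median(norm)
    sigma = max(1.4826 * median(abs(x - center) for x in norm), 1e-9)
    return [(x - center) / sigma for x in norm]


def cluster_rois(z, high=4.0, low=2.0):
    """Double threshold: a run of positions beyond `low` is kept only if
    it contains at least one position beyond `high` (same sign).
    Returns half-open intervals (start, end, most extreme z)."""
    rois = []
    for sign in (1, -1):
        start, peak = None, 0.0
        # A trailing 0.0 acts as a sentinel that closes the last run.
        for i, value in enumerate(list(z) + [0.0]):
            s = sign * value
            if s > low:
                if start is None:
                    start, peak = i, s
                peak = max(peak, s)
            elif start is not None:
                if peak > high:
                    rois.append((start, i, sign * peak))
                start = None
    return sorted(rois)


# Synthetic coverage: gentle linear trend plus small deterministic noise,
# with one deleted region and one duplicated region (hypothetical data).
coverage = [100 + 0.005 * i + ((i * 37) % 11) - 5 for i in range(2000)]
for i in range(500, 550):
    coverage[i] = 0.0            # deletion
for i in range(1200, 1250):
    coverage[i] *= 2             # duplication

z = zscores(coverage, window=401)
rois = cluster_rois(z)
for start, end, peak in rois:
    print(start, end, round(peak, 1))
```

The window must be well over twice the length of the longest event of interest; otherwise the running median follows the event itself and the normalized coverage no longer departs from 1 inside it.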

Additional details

Read the peer-reviewed publication(s):

Desvillechabrol, D., Bouchier, C., Kennedy, S., & Cokelaer, T. (2018). Sequana Coverage: Detection and Characterization of Genomic Variations using Running Median and Mixture Models. GigaScience. doi:10.1093/gigascience/giy110

Additional information:

https://sequana.readthedocs.io/en/master/

https://github.com/sequana/sequana

https://www.synapse.org/#!Synapse:syn10638358/wiki/465309





File Name  Sample ID  Data Type  File Format  Size      Release Date
-          -          Readme     TEXT         1.98 KB   2018-08-09
-          -          file_type  archive      13.99 MB  2018-08-09

Funding body                 Awardee  Award ID          Comments
France Génomique consortium  -        ANR10-INBS-09-08  -

Date             Action
August 14, 2018  Dataset published
August 28, 2018  Manuscript link added: 10.1093/gigascience/giy110