
Data released on May 01, 2018

Supporting data for "Error Correcting Optical Mapping Data"

Mukherjee, K; Washimkar, D; Muggli, M; Salmela, L; Boucher, C (2018): Supporting data for "Error Correcting Optical Mapping Data". GigaScience Database.

Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome. Recently it has been used for scaffolding contigs and assembly validation for large-scale sequencing projects, including the maize, goat, and amborella genomes. However, a major impediment to the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data, and thus optical mapping data, are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data is integral and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the E. coli K-12 reference genome. Of the deletion errors corrected, 98.26% are true errors; similarly, of the insertion errors corrected, 82.19% are true errors. cOMet also scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Lastly, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data: error-corrected Rmap data results in an assembly that is more contiguous and covers a larger fraction of the genome.


Related datasets:

doi:10.5524/100434 Cites doi:10.5524/100082
doi:10.5524/100434 Cites doi:10.5524/100084

Additional information:


optical mapping error correction 

Software, Imaging, Genomic


  • Funding body - Academy of Finland; Award ID - 284598; Comment - CoECGR; Awardee - L Salmela
  • Funding body - National Science Foundation; Award ID - 1618814; Comment - Div Of Information & Intelligent Systems (IIS); Awardee - C Boucher
  • Funding body - Academy of Finland; Award ID - 308030; Comment - Academy Research Fellow LT; Awardee - L Salmela
  • Funding body - Academy of Finland; Award ID - 314170; Comment - Academy Research Fellows: initial funding for research costs LT; Awardee - L Salmela

Files:



Data Type | File Format | Size | Release Date
GitHub archive | archive | 12.06 MB | 2018-03-28
Readme | TEXT | 2.03 KB | 2018-03-28


