Data released on December 17, 2015
Next generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, identifying genome structural variations with NGS approaches, whose read lengths range from 35 to 300 bases, remains a challenge. Single molecule optical mapping technologies allow the analysis of DNA molecules up to 2 Mb long. They are well suited to identifying large-scale genome structural variations and, when combined with short-read NGS data, to de novo genome assembly. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116.
High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells in agarose plugs, followed by DNA extraction under mild conditions. We digested the genomic DNA with KpnI and used the Argus optical mapping system to analyze 310,000 (GM12878) and 296,000 (HCT116) DNA molecules, each at least 150 kb long and containing at least 10 restriction fragments. We aligned the maps to the human reference genome with OPTIMA, a new glocal alignment method, and obtained 6.8x (GM12878) and 5.7x (HCT116) genome coverage, 2.9x and 1.7x more, respectively, than the coverage obtained with previously available software.
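To make the molecule selection and coverage figures above more concrete, the following is a minimal Python sketch built on stated assumptions. It performs an in-silico KpnI digest (KpnI recognizes the palindromic site GGTACC and cuts the top strand after GGTAC), applies the 150 kb / 10-fragment selection to a molecule's fragment sizes, and computes fold coverage as the total aligned molecule length divided by the genome size. The function names and this simple coverage definition are illustrative assumptions. The sketch is not the Argus software or OPTIMA, and it performs no alignment.

```python
import re
from typing import List, Sequence

# KpnI recognition site (palindromic, so scanning one strand is enough).
KPNI_SITE = "GGTACC"
# KpnI cuts GGTAC^C, i.e. 5 bases into the site on the top strand.
CUT_OFFSET = 5


def kpni_fragments(seq: str) -> List[int]:
    """In-silico KpnI digest: fragment lengths (bp) in order along seq.

    Hypothetical helper for illustration; real reference maps are
    generated by the mapping software's own tools.
    """
    seq = seq.upper()
    # GGTACC cannot overlap itself, so non-overlapping finditer is sufficient.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(KPNI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def passes_filter(fragments: Sequence[int],
                  min_length_bp: int = 150_000,
                  min_fragments: int = 10) -> bool:
    """Assumed selection rule: molecule >= 150 kb AND >= 10 restriction fragments."""
    return sum(fragments) >= min_length_bp and len(fragments) >= min_fragments


def fold_coverage(aligned_lengths_bp: Sequence[int], genome_size_bp: int) -> float:
    """Assumed coverage definition: total aligned length / genome size."""
    return sum(aligned_lengths_bp) / genome_size_bp


if __name__ == "__main__":
    # Toy sequence with one KpnI site: cut falls after "AAGGTAC".
    print(kpni_fragments("AAGGTACCTT"))
    # A 200 kb molecule split into 10 fragments of 20 kb passes the filter.
    print(passes_filter([20_000] * 10))
```

Under this definition, doubling the total aligned length doubles the reported coverage. Real pipelines may instead count only uniquely aligned portions of each molecule.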