Data released on December 17, 2015
Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome mapping technologies (e.g. optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kbp–2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging due to the lack of efficient and sensitive map alignment algorithms for robustly aligning error-prone maps to sequences.
We introduce OPTIMA, a novel seed-and-extend glocal alignment method (with a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests.
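To illustrate the general idea of seeding on continuous-valued data, the following is a minimal sketch and not the OPTIMA implementation. It assumes that a map is a list of fragment sizes in kbp, that a seed is a short run of consecutive fragments, and that sizing error can be approximated by a fixed relative tolerance; the names `build_seed_index`, `find_seed_hits`, `seed_len`, and `rel_tol` are hypothetical. Missing or false cut sites, which a real aligner must handle, are deliberately ignored here.

```python
from bisect import bisect_left, bisect_right


def build_seed_index(ref_fragments, seed_len=2):
    """Index every run of `seed_len` consecutive reference fragments.

    Seeds are sorted by the size of their first fragment so that a
    tolerance window can be located with binary search.
    """
    seeds = [
        (tuple(ref_fragments[i:i + seed_len]), i)
        for i in range(len(ref_fragments) - seed_len + 1)
    ]
    seeds.sort(key=lambda seed: seed[0][0])
    first_sizes = [sizes[0] for sizes, _ in seeds]
    return first_sizes, seeds


def find_seed_hits(index, query_seed, rel_tol=0.1):
    """Return reference positions whose seed matches `query_seed`.

    A reference fragment r matches a query fragment q when
    |r - q| <= rel_tol * q (an assumed, simplified error model).
    """
    first_sizes, seeds = index
    # Binary search for candidates whose first fragment is within tolerance.
    lo = bisect_left(first_sizes, query_seed[0] * (1 - rel_tol))
    hi = bisect_right(first_sizes, query_seed[0] * (1 + rel_tol))
    hits = []
    for sizes, pos in seeds[lo:hi]:
        # Check the remaining fragments of the seed against the same tolerance.
        if all(abs(r - q) <= rel_tol * q for r, q in zip(sizes, query_seed)):
            hits.append(pos)
    return sorted(hits)


# Toy reference map (fragment sizes in kbp) and a noisy two-fragment query.
reference = [10.0, 25.0, 7.5, 40.0, 24.0, 8.0, 12.0]
index = build_seed_index(reference, seed_len=2)
candidate_positions = find_seed_hits(index, (24.6, 7.8), rel_tol=0.1)
```

In a seed-and-extend scheme, each candidate position would then be extended into a full alignment and scored; that extension step and the significance model are outside the scope of this sketch.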