Supporting data for "NanoSim: nanopore sequence read simulator based on statistical characterization"

Dataset type: Software
Data released on February 27, 2017

Yang C; Chu J; Warren RL; Birol I (2017): Supporting data for "NanoSim: nanopore sequence read simulator based on statistical characterization" GigaScience Database.


The MinION sequencing instrument from Oxford Nanopore Technologies (ONT) produces long read lengths from single-molecule sequencing, a valuable feature for detailed genome characterization. To realize the potential of this platform, a number of groups are developing bioinformatics tools tuned for the unique characteristics of its data. These development efforts would benefit from simulator software whose output could be used to benchmark analysis tools.
Here, we introduce NanoSim, a fast and scalable read simulator that captures the technology-specific features of ONT data and can be adjusted as nanopore sequencing technology improves. The first step of NanoSim is read characterization, which provides a comprehensive alignment-based analysis and generates a set of read profiles that serve as the input to the next step, the simulation stage. The simulation stage uses the model built in the previous step to produce in silico reads for a given reference genome. NanoSim is written in Python and R.
In this work, we model the base-calling errors of ONT reads to inform the simulation of sequences with similar characteristics. We showcase the performance of NanoSim on publicly available datasets generated using the R7 and R7.3 chemistries and different sequencing kits, and compare the resulting synthetic reads to those of other long-sequence simulators and to experimental ONT reads. We expect NanoSim to have an enabling role in the field and to benefit the development of scalable NGS technologies for long nanopore reads, including genome assembly, mutation detection, and even metagenomic analysis software.
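
The two-stage workflow described above (characterize real reads from alignments, then simulate reads from the resulting profile) can be illustrated with a deliberately simplified sketch. This is not NanoSim's code or its statistical model: the function names `characterize` and `simulate`, the profile layout, and the use of flat per-base error rates with an empirical length list are all assumptions made for illustration only.

```python
import random


def characterize(aligned_pairs):
    """Stage 1 (toy): build a profile from gapped (reference, read) alignments.

    Each pair holds two equal-length strings in which '-' marks a gap.
    The profile layout is hypothetical, not NanoSim's file format.
    """
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    lengths = []
    for ref_aln, read_aln in aligned_pairs:
        if len(ref_aln) != len(read_aln):
            raise ValueError("aligned strings must have equal length")
        for r, q in zip(ref_aln, read_aln):
            if r == "-":
                counts["insertion"] += 1   # base present only in the read
            elif q == "-":
                counts["deletion"] += 1    # base present only in the reference
            elif r == q:
                counts["match"] += 1
            else:
                counts["mismatch"] += 1
        # Observed read length, ignoring gap characters.
        lengths.append(sum(1 for q in read_aln if q != "-"))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned columns to characterize")
    rates = {k: counts[k] / total for k in ("mismatch", "insertion", "deletion")}
    return {"rates": rates, "lengths": lengths}


def simulate(reference, profile, n_reads, seed=0):
    """Stage 2 (toy): draw reads from `reference` using the profiled rates."""
    rng = random.Random(seed)
    rates = profile["rates"]
    reads = []
    for _ in range(n_reads):
        # Sample a reference span length from the empirical length list.
        span = min(rng.choice(profile["lengths"]), len(reference))
        start = rng.randrange(len(reference) - span + 1)
        read = []
        for base in reference[start:start + span]:
            u = rng.random()
            if u < rates["deletion"]:
                continue                   # drop this reference base
            if u < rates["deletion"] + rates["mismatch"]:
                read.append(rng.choice([b for b in "ACGT" if b != base]))
            else:
                read.append(base)
            if rng.random() < rates["insertion"]:
                read.append(rng.choice("ACGT"))  # extra base after this one
        reads.append("".join(read))
    return reads


if __name__ == "__main__":
    profile = characterize([("ACGT-ACGT", "ACCTTAC-T")])
    for read in simulate("ACGT" * 50, profile, n_reads=3, seed=1):
        print(read)
```

Keeping the profile as a plain dictionary mirrors the separation the abstract describes: the characterization output is the only input the simulation stage needs besides the reference, so the model can be rebuilt when the sequencing chemistry changes without touching the simulation code.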

Additional details

Read the peer-reviewed publication(s):

Yang, C., Chu, J., Warren, R. L., & Birol, I. (2017). NanoSim: nanopore sequence read simulator based on statistical characterization. GigaScience, 6(4). doi:10.1093/gigascience/gix010

Additional information:

Data Type          File Format  Size       Release Date
Sequence assembly  FASTA        4.5 MB     2017-02-13
Genome sequence    FASTA        4.35 MB    2017-02-13
Genome sequence    FASTA        154.16 MB  2017-02-13
Genome sequence    FASTA        129.79 MB  2017-02-13
Other              archive      73.51 KB   2017-02-13
Other              archive      87.65 KB   2017-02-13
Genome sequence    FASTA        1.42 GB    2017-02-13
Other              archive      166.83 KB  2017-02-13
Genome sequence    FASTA        240 MB     2017-02-13
Other              archive      121.37 KB  2017-02-13
Displaying 1-10 of 19 File(s).

Funding body                   Award ID
National Institutes of Health  R01HG007182

Date               Action
March 6, 2017      Manuscript link added: 10.1093/gigascience/gix010
February 27, 2017  Dataset publish
March 2, 2017      File ecoli_UCSC_assembly.fa updated
March 2, 2017      File ecoli_nanosim_assembly.fa updated
March 2, 2017      File ecoli_readsim_assembly.fa updated