Supporting data for "Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning"

Dataset type: Software
Data released on April 10, 2018

Teng H; Cao MD; Hall MB; Duarte T; Wang S; Coin LJM (2018): Supporting data for "Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning". GigaScience Database. http://dx.doi.org/10.5524/100425

DOI: 10.5524/100425

Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling: it translates the raw signal directly into DNA sequence without the error-prone segmentation step. We show that, although trained on only a small set of 4000 reads, our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of over 2000 bases per second using desktop computer graphics processing units (GPUs), making it competitive with other deep-learning basecalling algorithms.
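To make "end-to-end basecalling" more concrete: a network of this kind emits a score vector for every time step of the raw signal, and a decoder collapses those per-step predictions into bases, so the signal never has to be segmented into events first. The minimal Python sketch below illustrates one common way to perform that collapse, greedy (best-path) decoding in the style of connectionist temporal classification (CTC). It is an illustration under stated assumptions, not code from the Chiron repository: the alphabet order, the blank symbol at index 4, and the names `greedy_ctc_decode` and `argmax` are all hypothetical.

```python
from typing import List, Optional, Sequence

# Assumed (hypothetical) output layout: indices 0-3 are the bases A, C, G, T,
# and index 4 is the CTC "blank" symbol meaning "no new base at this step".
ALPHABET = "ACGT"
BLANK = 4


def argmax(row: Sequence[float]) -> int:
    """Return the index of the highest score in one time step's vector."""
    return max(range(len(row)), key=lambda i: row[i])


def greedy_ctc_decode(scores: Sequence[Sequence[float]]) -> str:
    """Collapse per-time-step scores into a base sequence (best-path decoding).

    1. Take the most likely label at every time step.
    2. Merge consecutive repeats of the same label into one.
    3. Drop blanks.
    A blank between two identical labels therefore yields a real repeated base.
    """
    bases: List[str] = []
    previous: Optional[int] = None
    for row in scores:
        label = argmax(row)
        # Emit a base only when the label changes and is not the blank symbol.
        if label != previous and label != BLANK:
            bases.append(ALPHABET[label])
        previous = label
    return "".join(bases)


if __name__ == "__main__":
    # Toy per-step labels (0=A, 1=C, 2=G, 3=T, 4=blank), one-hot encoded.
    path = [0, 0, 4, 0, 1, 1, 4, 2, 3]
    toy_scores = [[1.0 if i == lab else 0.0 for i in range(5)] for lab in path]
    print(greedy_ctc_decode(toy_scores))  # prints AACGT
```

In the toy example the blank between the two runs of label 0 lets the decoder emit a genuine repeated base (AA), while consecutive identical labels with no blank between them collapse into a single base. Real basecallers often use beam search instead of this greedy path, trading speed for accuracy.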

Additional details

Read the peer-reviewed publication(s):

Teng, H., Cao, M. D., Hall, M. B., Duarte, T., Wang, S., & Coin, L. J. M. (2018). Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. GigaScience, 7(5). doi:10.1093/gigascience/giy037

Additional information:

https://github.com/haotianteng/chiron

https://pypi.python.org/pypi/chiron

https://github.com/nanopore-wgs-consortium/NA12878

Accessions (data included in GigaDB):

BioProject: PRJNA386696
SRA: SRP136964

Samples (2):

NA12878 (Homo sapiens; common name: Human; GenBank name: human; taxonomic ID: 9606)
  Age: not provided
  Source material identifiers: Coriell:NA12878
  ...

MT20823 (Mycobacterium tuberculosis; taxonomic ID: 1773)
  Description: Oxford Nanopore MinION and Illumina se...
  Collected by: CPHL
  Collection date: 30-Jun-2013
  ...

Files (7):

Data type       File format  Size       Release date
Mixed archive   GZIP         519.48 MB  2018-03-19
GitHub archive  archive      81.55 MB   2018-03-08
Mixed archive   TAR          457.81 MB  2018-03-08
GitHub archive  archive      2.58 MB    2018-03-08
Mixed archive   GZIP         24.71 GB   2018-03-19
Readme          TEXT         2.04 KB    2018-03-08
Mixed archive   TAR          447.71 MB  2018-03-08

Funding:

National Health and Medical Research Council, awardee LJM Coin, award ID GNT1130084
Australian Research Council, awardee LJM Coin, award ID DP170102626
MB Hall: Westpac Future Leaders Scholarship

History:

Date            Action
April 10, 2018  Dataset published
July 4, 2018    Manuscript link added: 10.1093/gigascience/giy037