An updated reference human genome dataset of the BGISEQ-500 sequencer

Dataset type: Genomic
Data released on March 22, 2017

Huang J; Liang X; Xuan Y; Geng C; Li Y; Qu S; Lu H; Mei X; Chen H; Yu T; Sun N; Jiang H; Liu X; Yang Z; Mu F; Gao S (2017): An updated reference human genome dataset of the BGISEQ-500 sequencer. GigaScience Database. http://dx.doi.org/10.5524/100274

DOI: 10.5524/100274

The BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoballs (DNB) and combinational probe-anchor synthesis (cPAS) developed from Complete Genomics™ sequencing technology, it generates short reads at a large scale, which can help meet the growing demand for sequencing. Here, we present the first human whole genome sequencing dataset from the BGISEQ-500. The dataset was generated by sequencing the widely used Genome in a Bottle Consortium cell line, HG001 (NA12878). We previously released the paired-end 50 bp (PE50) sequences (DOI: 10.5524/100252); here we present the PE100 reads from the same sample, together with the assembled genome. We also include examples of the raw images from the sequencer for reference. Finally, we carried out variant calling on the dataset and compared the results to a similar amount of publicly available HiSeq2500 data and to the previously identified high-confidence variants for this genome.
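The comparison of call sets described above can be approximated with a short script. This is a minimal sketch, not the authors' pipeline: it assumes plain or gzip-compressed VCF files with standard tab-separated columns, and it matches variants naively on (chromosome, position, REF, ALT) without normalizing indel representations. The function names `variant_keys`, `read_vcf`, and `compare` are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, REF, ALT).
VariantKey = Tuple[str, int, str, str]


def variant_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from VCF text lines, skipping headers."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
            keys.add((chrom, int(pos), ref, alt))
    return keys


def read_vcf(path: str) -> Set[VariantKey]:
    """Read variant keys from a VCF file, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return variant_keys(handle)


def compare(a: Set[VariantKey], b: Set[VariantKey]) -> Dict[str, int]:
    """Count variants shared by both call sets and unique to each."""
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Example usage, assuming the VCF files named in this record's history
# have been downloaded to the working directory:
# print(compare(read_vcf("BGISEQ-500_PE100.vcf.gz"), read_vcf("Hiseq_PE148.vcf.gz")))
```

Exact-key matching tends to understate concordance for indels, so a dedicated comparison tool with variant normalization would be the safer choice for any reported figures.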

Additional details

Read the peer-reviewed publication(s):

(PubMed: 28379488)

Related datasets:

doi:10.5524/100274 IsNewVersionOf doi:10.5524/100252
doi:10.5524/100274 IsPreviousVersionOf doi:10.5524/100449 (10.5524/100449 is a more recent version of this dataset)


There is a new version of this dataset available at: DOI: 10.5524/100449

Accessions (data generated as part of this study):

BioProject: PRJEB15427





File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | Sequence variants | VCF | 204.79 MB | 2017-03-21
 | | Sequence variants | VCF | 191.77 MB | 2017-03-21
 | NA12878 | Genome sequence | FASTQ | 20.78 GB | 2017-01-23
 | NA12878 | Genome sequence | FASTQ | 23.47 GB | 2017-01-23
 | NA12878 | Genome sequence | FASTQ | 22.19 GB | 2017-01-23
 | NA12878 | Genome sequence | FASTQ | 24.2 GB | 2017-01-23
 | | Sequence variants | VCF | 199.98 MB | 2017-03-21
 | | MD5sum | TEXT | 0.25 KB | 2017-01-23
 | | Readme | TEXT | 0.67 KB | 2017-01-23
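Because the FASTQ files are each over 20 GB, it is worth checking downloads against the MD5sum file in the list. The sketch below is one way to do that; it assumes the manifest follows the usual `md5sum` layout of `<checksum>  <filename>` per line, which this record does not confirm. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> Dict[str, str]:
    """Map file name -> expected checksum (assumed '<checksum>  <filename>' lines)."""
    expected: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # md5sum marks binary mode with a leading '*' on the file name.
            expected[parts[-1].lstrip("*")] = parts[0].lower()
    return expected


def verify(manifest_path, data_dir=".") -> Dict[str, bool]:
    """Return True per listed file if it exists and its checksum matches."""
    expected = parse_manifest(Path(manifest_path).read_text())
    results: Dict[str, bool] = {}
    for name, checksum in expected.items():
        target = Path(data_dir) / name
        results[name] = target.exists() and md5_of(target) == checksum
    return results


# Example usage, assuming md5.txt and the data files sit in the current directory:
# for name, ok in verify("md5.txt").items():
#     print(("OK      " if ok else "FAILED  ") + name)
```

Reading in chunks keeps memory use flat regardless of file size, which matters for files of this scale.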
Date Action
March 22, 2017 Dataset published
April 4, 2017 Manuscript Link added : 10.1093/gigascience/gix024
April 18, 2017 File CL100011513_L01_read_2.fq.gz updated
April 18, 2017 File md5.txt updated
April 18, 2017 File CL100011513_L01_read_2.fq.gz updated
April 18, 2017 File CL100008589_L02_read_1.fq.gz updated
April 18, 2017 File readme.txt updated
April 18, 2017 File CL100011513_L01_read_1.fq.gz updated
April 18, 2017 File CL100008589_L02_read_2.fq.gz updated
April 18, 2017 File CL100008589_L02_read_2.fq.gz updated
April 18, 2017 Link updated : BioProject:PRJEB15427
November 29, 2018 File CL100008589_L02_read_1.fq.gz updated
November 29, 2018 File BGISEQ-500_PE50.vcf.gz updated
November 29, 2018 File BGISEQ-500_PE100.vcf.gz updated
November 29, 2018 File Hiseq_PE148.vcf.gz updated
November 9, 2022 Manuscript Link updated : 10.1093/gigascience/gix024