Data released on December 17, 2014

Supporting material for: De novo assembly of a haplotype-resolved human genome.

Cao, H; Chen, D; Feng, Q; Gao, P; He, G; Huang, H; Huang, S; Huang, W; Huang, Z; Li, B; Li, J; Li, Y; Liu, B; Liu, S; Liu, X; Luo, R; Sun, J; Sun, P; Sun, Y; Tellier, LC; Tong, X; Wang, Y; Wu, H; Xie, Y; Xu, X; Yang, F; Yang, H; Zhang, X; Zheng, H; Bolund, L; Kristiansen, K; Krogh, A; Goodman, L; Drmanac, R; Drmanac, SA; Luo, Q; Li, S; Wang, J; Yang, H; Li, Y; Wong, GK; Wang, J (2014): Supporting material for: De novo assembly of a haplotype-resolved human genome. GigaScience Database.

Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual using a unique de novo assembly pipeline. Our pipeline uses fosmid pooling and whole genome shotgun strategies, based on next generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual’s genetic variation.
We generated ~614,850 fosmid clones ranging from 20 kb to 80 kb, with a mean of 36 kb. Approximately 30 fosmid clones were combined per pool, and one or two DNA libraries per pool were sequenced on the HiSeq 2000. In total, 1,712 Gb of raw sequence data were generated across all pooled fosmid libraries. Please see the linked paper for assembly pipeline details. We then analysed the newly generated haplotype-resolved diploid genome (HDG) for SNPs, indels, inversions and translocations, identifying 3,580,000 SNPs, 762,000 short indels (<50 bp), 30,000 long indels, 111 inversions and 168 translocations.
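
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how a generic N50 can be computed from sequence lengths in a FASTA file. It is not the authors' pipeline: the function names `fasta_lengths` and `n50` are illustrative, and the "haplotype N50" reported here may be defined over phased haplotype blocks rather than plain assembly sequences, so see the linked paper for the exact definition.

```python
def fasta_lengths(lines):
    """Yield the length of each record in an iterable of FASTA lines.

    Illustrative helper (not from the dataset): header lines start
    with '>', and all following lines up to the next header are
    treated as sequence.
    """
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the generic N50: the largest length L such that sequences
    of length >= L together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")
```

Usage, assuming a locally downloaded assembly at a hypothetical path `assembly.fa`: `with open("assembly.fa") as fh: print(n50(fasta_lengths(fh)))`. The file is streamed line by line, so only the record lengths are held in memory.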

Read the peer-reviewed publication(s):

Cao, H., Wu, H., Luo, R., Huang, S., Sun, Y., Tong, X., … Wang, J. (2015). De novo assembly of a haplotype-resolved human genome. Nature Biotechnology, 33(6), 617–622. doi:10.1038/nbt.3200

Related datasets:

doi:10.5524/100096 IsSupplementTo doi:10.5524/100038
doi:10.5524/100096 IsSupplementTo doi:10.5524/100097
doi:10.5524/100096 IsCitedBy doi:10.5524/100318


Samples:

Sample ID      Taxonomic ID   Common Name   Genbank Name   Scientific Name   Sample Attributes
YH_cell_line   9606           Human         human          Homo sapiens

Files: (FTP site)

Sample ID      Data Type           File Format   Size      Release Date
YH_cell_line   Sequence assembly   FASTA         1.44 GB   2014-12-17
-              MD5sum              TEXT          0.05 KB   2014-12-17
-              Readme              TEXT          0.28 KB   2014-07-24
YH_cell_line   Sequence assembly   FASTA         6.64 GB   2014-12-17
-              MD5sum              TEXT          0.05 KB   2014-12-17
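
The file listing includes MD5sum files alongside the FASTA assemblies, which suggests the downloads can be checked for corruption. A minimal sketch of such a check follows, assuming each MD5sum file holds a hexadecimal digest in the usual `md5sum` style (digest first, then the file name); that layout and the helper names `md5_of_file` and `matches_md5` are assumptions, not taken from the dataset.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so that
    multi-gigabyte FASTA files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 digest with an expected hex string.

    Assumption: expected_hex is the digest field copied from the
    MD5sum file; surrounding whitespace and letter case are ignored.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

Usage, with a hypothetical local file name and a digest read from the corresponding MD5sum file: `matches_md5("assembly.fa.gz", expected_digest)` returns `True` when the download is intact.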