Supporting material for: De novo assembly of a haplotype-resolved human genome.

Dataset type: Genomic
Data released on December 17, 2014

Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual, produced with a unique de novo assembly pipeline. The pipeline combines fosmid pooling and whole-genome shotgun strategies, both based on next-generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly to date. Our analysis also identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual's genetic variation.
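The haplotype N50 quoted above summarizes how contiguous the phased sequence is. As a minimal sketch, assuming the conventional N50 definition applied to a list of haplotype block lengths (the paper's exact procedure may differ), it can be computed as follows. The function name `n50` and the example lengths are hypothetical and are not taken from this dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)  # longest blocks first
    if not ordered:
        raise ValueError("n50() requires at least one length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # first block at which half the total is reached
            return length


# Hypothetical haplotype block lengths, for illustration only.
blocks = [500_000, 400_000, 300_000, 200_000, 100_000]
print(n50(blocks))  # prints 400000
```

Here the total is 1,500 kb, so half is 750 kb; the two longest blocks (500 kb + 400 kb) first exceed that, giving an N50 of 400 kb.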
We generated approximately 614,850 fosmid clones with insert sizes ranging from 20 kb to 80 kb (mean 36 kb). Approximately 30 fosmid clones were combined into each pool, and one or two DNA libraries per pool were sequenced on the Illumina HiSeq 2000. In total, 1,712 Gb of raw sequence data was generated across all pooled fosmid libraries. Please see the linked paper for details of the assembly pipeline. We then analysed the newly generated haplotype-resolved diploid genome (HDG) for SNPs, INDELs, inversions and translocations, identifying 3,580,000 SNPs, 762,000 short INDELs (<50 bp), 30,000 long INDELs, 111 inversions and 168 translocations.
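The figures above allow a rough estimate of the scale of the pooling design. The sketch below is back-of-envelope arithmetic, not numbers reported by the authors: it assumes a haploid human genome of about 3.1 Gb (an assumption not stated on this page), treats the 36 kb mean insert and the pool size of 30 as exact, and ignores vector sequence and overlap between clones. The function name `pooling_estimates` is hypothetical.

```python
"""Back-of-envelope estimates for the fosmid pooling design (illustrative only)."""

# Values quoted in the dataset description.
CLONES = 614_850           # approximate number of fosmid clones
MEAN_INSERT_BP = 36_000    # mean fosmid insert size (36 kb)
CLONES_PER_POOL = 30       # approximate number of clones per pool
RAW_DATA_BP = 1_712e9      # total raw sequence across all pools (1,712 Gb)

# ASSUMPTION: approximate haploid human genome size; not given in the text.
HAPLOID_GENOME_BP = 3.1e9


def pooling_estimates(clones, mean_insert_bp, clones_per_pool, raw_bp, genome_bp):
    """Return rough pool count, cloned DNA, clone coverage and raw depth."""
    pools = clones / clones_per_pool
    cloned_bp = clones * mean_insert_bp  # total DNA captured in all clones
    return {
        "pools": pools,
        "cloned_gb": cloned_bp / 1e9,
        # Clone coverage of one haploid genome equivalent (both haplotypes combined).
        "clone_coverage": cloned_bp / genome_bp,
        # Raw sequence divided by cloned DNA: average sequencing depth over clones.
        "raw_depth_over_clones": raw_bp / cloned_bp,
    }


if __name__ == "__main__":
    est = pooling_estimates(CLONES, MEAN_INSERT_BP, CLONES_PER_POOL,
                            RAW_DATA_BP, HAPLOID_GENOME_BP)
    for key, value in est.items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the design works out to roughly 20,500 pools and about 22 Gb of cloned DNA (about 7-fold of a 3.1 Gb haploid genome), with the raw data corresponding to roughly 77-fold sequencing depth over the cloned DNA.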

Additional details

Read the peer-reviewed publication(s):

Cao, H., Wu, H., Luo, R., Huang, S., Sun, Y., Tong, X., … Wang, J. (2015). De novo assembly of a haplotype-resolved human genome. Nature Biotechnology, 33(6), 617–622. doi:10.1038/nbt.3200

Related datasets:

doi:10.5524/100096 IsSupplementTo doi:10.5524/100038
doi:10.5524/100096 IsSupplementTo doi:10.5524/100097
doi:10.5524/100096 IsCitedBy doi:10.5524/100318

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
YH_cell_line | 9606 | Human | human | Homo sapiens |

File Name | Sample ID | Data Type | File Format | Size | Release Date
 | YH_cell_line | Sequence assembly | FASTA | 1.44 GB | 2014-12-17
 | | MD5sum | TEXT | 0.05 KB | 2014-12-17
 | | Readme | TEXT | 0.28 KB | 2014-07-24
 | YH_cell_line | Sequence assembly | FASTA | 6.64 GB | 2014-12-17
 | | MD5sum | TEXT | 0.05 KB | 2014-12-17
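Each FASTA file is listed alongside a small MD5sum text file, which can be used to check that a download is complete. The following is a minimal Python sketch, assuming the MD5sum files follow the GNU `md5sum` output format (a 32-character hex digest, whitespace, then a file name); this page does not state the format, and the function names and paths are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_line(line):
    """Split an md5sum-style line into (expected_digest, file_name).

    ASSUMPTION: "<digest>  <name>" (text mode) or "<digest> *<name>" (binary mode).
    """
    checksum, name = line.strip().split(maxsplit=1)
    return checksum.lower(), name.lstrip("*")


def verify(md5_file, data_dir="."):
    """Return {file_name: True/False} for every entry listed in md5_file."""
    results = {}
    for line in Path(md5_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = parse_md5sum_line(line)
        actual = md5_of_file(Path(data_dir) / name)
        results[name] = actual == expected
    return results
```

For example, `verify("downloaded.md5", "downloads")` (hypothetical paths) returns a dictionary mapping each listed file to whether its checksum matches. On systems with GNU coreutils, `md5sum -c` run on the checksum file performs the same check from the shell.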
Date | Action
May 29, 2015 | Manuscript link added: 10.1038/nbt.3200
July 5, 2017 | Relationship updated: DOI 100318