Data released on December 17, 2014
Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual using a unique de novo assembly pipeline. Our pipeline uses fosmid pooling and whole genome shotgun strategies, based on next generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual’s genetic variation.
We generated ~614,850 fosmid clones ranging from 20 kb-80 kb with a mean of 36kb, approximately 30 fosmid clones were pooled and each pool had one or two DNA libraries sequenced using Hiseq 2000. In total, 1,712 Gb of raw sequence data was generated for all the pooled fosmid libraries. Please see the linked paper for assembly pipeline details. We then analysed the newly generated haploid-resolved diploid genome (HDG) for SNPs, INDELs, inversions and translocations, of which we identified 3,580,000 SNPs, 762,000 short INDELs (<50bp) and 30,000 long INDELs, 111 inversions and 168 translocations.