Data released on July 12, 2017

Supporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes"

Lan, T; Lin, H; Asker Melchior Tellier, L, C; Zhu, W; Yang, M; Liu, X; Wang, J; Wang, J; Yang, H; Xu, X; Guo, X (2017): Supporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes" GigaScience Database. http://dx.doi.org/10.5524/100302

Next-generation sequencing provides high-resolution insight into human genetic information. However, previous studies have focused primarily on low-coverage data because of the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call accurately from low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, so important causal variants are sometimes missed. For Han Chinese populations, the majority of variants have been discovered from low-coverage data from the 1000 Genomes Project. However, high-coverage whole-genome sequencing data are limited for any population, and a large number of low-frequency, population-specific variants remain uncharacterized.

We have performed whole-genome sequencing at high depth (~80X) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 North Han Chinese and 45 South Han Chinese samples. Of these 90 individuals, 83 have not been sequenced by the 1000 Genomes Project. We have identified 12,568,804 single nucleotide polymorphisms (SNPs), 2,074,734 short InDels and 26,142 structural variants from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7,007,685 novel variants with low frequency (defined as minor allele frequency < 5%), including 5,816,839 SNPs, 1,172,919 InDels, and 17,927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome.

Compared to the 1000 Genomes Project, this Han Chinese deep sequencing data enhances the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects.
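
The figures in the description can be sanity-checked with a short Python sketch. It only re-adds the three novel-variant counts quoted above and works out what "minor allele frequency < 5%" means in allele counts for a cohort of this size. The helper names are hypothetical, and the assumption of biallelic, diploid (autosomal) sites with 180 alleles across 90 individuals is ours; the description does not state how the authors computed frequencies.

```python
from fractions import Fraction

# Counts quoted in the dataset description.
novel_snps = 5_816_839
novel_indels = 1_172_919
novel_svs = 17_927
reported_novel_total = 7_007_685

# The three categories should add up to the reported total.
novel_total = novel_snps + novel_indels + novel_svs

# Assumption (not stated in the description): biallelic, diploid sites,
# so 90 individuals contribute 180 alleles.
N_SAMPLES = 90
N_ALLELES = 2 * N_SAMPLES
LOW_FREQ_THRESHOLD = Fraction(5, 100)  # "minor allele frequency < 5%"


def minor_allele_frequency(alt_count, total_alleles=N_ALLELES):
    """Frequency of the rarer allele at a biallelic site (exact arithmetic)."""
    freq = Fraction(alt_count, total_alleles)
    return min(freq, 1 - freq)


def is_low_frequency(alt_count, total_alleles=N_ALLELES):
    """True when the minor allele frequency is strictly below the threshold."""
    return minor_allele_frequency(alt_count, total_alleles) < LOW_FREQ_THRESHOLD


# Largest minor-allele count that still qualifies as low frequency.
max_low_freq_count = max(
    count for count in range(N_ALLELES // 2 + 1) if is_low_frequency(count)
)

print(novel_total == reported_novel_total)
print(max_low_freq_count)
```

Exact fractions are used instead of floats so that a count sitting exactly on the 5% boundary is classified without rounding surprises.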

Additional information:

https://github.com/HaoxiangLin/WGS_of_Han_Chinese_genomes

Accessions (data included in GigaDB):

BioProject: PRJEB11005

Genomic

Funding:

  • Funding body - Shenzhen Municipal Government of China
  • Location - China
  • Award ID - CXB201108250094A

Samples:

Sample ID  Taxonomic ID  Common Name  Genbank Name  Scientific Name  Sample Attributes
SRS000111  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18526; ...
SRS000112  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18532; ...
SRS000113  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18537; ...
SRS000114  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18542; ...
SRS000115  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18545; ...
SRS000116  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18547; ...
SRS000117  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18550; ...
SRS000118  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18552; ...
SRS000119  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18555; ...
SRS000120  9606          Human        human         Homo sapiens     Miscellaneous parameter:Coriell panel:MGP00017; Miscellaneous parameter:Coriell plate:HAPMAPPT02; Source material identifiers:Coriell:GM18558; ...
Displaying 1-10 of 90 Sample(s).

Files:


File Name  Sample ID  File Type          File Format  Size       Release Date
                      Sequence assembly  FASTA        841.14 MB  2017-04-17
                      Sequence assembly  FASTA        844.32 MB  2017-04-17
                      Sequence assembly  FASTA        839.64 MB  2017-04-17
                      Sequence assembly  FASTA        848.45 MB  2017-04-17
                      Sequence assembly  FASTA        844.73 MB  2017-04-17
                      Sequence assembly  FASTA        837.79 MB  2017-04-17
                      Sequence assembly  FASTA        843.08 MB  2017-04-17
                      Sequence assembly  FASTA        845.63 MB  2017-04-17
                      Sequence assembly  FASTA        840.25 MB  2017-04-17
                      Sequence assembly  FASTA        843.35 MB  2017-04-17
Displaying 31-40 of 103 File(s).
