Supporting data for "The Whole Genome Sequences and Experimentally Phased Haplotypes of over 100 Personal Genomes"

Dataset type: Genomic
Data released on October 10, 2016

Mao Q; Ciotlos S; Zhang RY; Ball MP; Chin R; Carnevali P; Barua N; Nguyen S; Agarwal MR; Clegg T; Connelly A; Vandewege W; Zaranek AW; Estep PW; Church GM; Drmanac R; Peters BA (2016): Supporting data for "The Whole Genome Sequences and Experimentally Phased Haplotypes of over 100 Personal Genomes" GigaScience Database. http://dx.doi.org/10.5524/100242

DOI: 10.5524/100242

Since the completion of the Human Genome Project in 2003, it has been estimated that over 200,000 individual whole human genomes have been sequenced. That is a stunning accomplishment in such a short period of time. However, most of these genomes were sequenced without experimental haplotype data and therefore lack an important aspect of genome biology. In addition, much of the genomic data generated is not publicly available and lacks phenotypic information.
As part of the Personal Genome Project (PGP), blood samples from 184 participants were collected and processed with Complete Genomics’ Long Fragment Read (LFR) technology. Here we report the results of experimental whole-genome haplotyping and sequencing of 114 of these samples to an average read coverage depth of 100X. This is approximately threefold higher than the read coverage used for most whole human genome assemblies and was chosen to ensure the highest-quality results. Currently, only these 114 genomes are freely available on GigaDB, and they are accompanied by the rich phenotypic data provided in the associated manuscript. We expect all 184 participants’ genomes to be made freely available in the near future, once the individuals concerned have reviewed their own genomes prior to release, in accordance with the PGP agreement.
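The coverage comparison above follows from the usual definition of mean depth: total sequenced bases divided by genome size. The sketch below works through that arithmetic. The haploid genome size (about 3.1 Gb) and the "typical" baseline depths of 30X and 40X are assumptions for illustration, not figures from this dataset, and `bases_needed` and `fold_increase` are hypothetical helper names.

```python
# Sketch of read-coverage arithmetic: mean depth C = total sequenced bases / genome size.
# ASSUMPTIONS (not from the dataset page): haploid human genome ~3.1e9 bp,
# and typical whole-genome sequencing depths of 30X-40X for comparison.

GENOME_SIZE_BP = 3.1e9  # assumed approximate haploid human genome length


def bases_needed(depth: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Total sequenced bases needed to reach a given mean coverage depth."""
    return depth * genome_size


def fold_increase(depth: float, baseline: float) -> float:
    """How many times deeper `depth` is than a `baseline` depth."""
    return depth / baseline


if __name__ == "__main__":
    print(f"100X needs ~{bases_needed(100) / 1e9:.0f} Gb of sequence")
    for baseline in (30, 40):
        print(f"100X vs {baseline}X: {fold_increase(100, baseline):.1f}-fold")
```

Run as a script, this prints roughly 310 Gb for 100X, and 3.3-fold and 2.5-fold increases over 30X and 40X. An "approximately 3 fold" increase is consistent with a baseline of around 30X to 35X.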
Twenty genomes were sequenced at least twice, using independently prepared LFR barcoded libraries, for reproducibility analyses. In addition, 7 genomes were sequenced using Complete Genomics’ standard non-barcoded library process.
These genomes represent a unique source of haplotype and phenotype data for the scientific community. We also report 2.6 million high-quality rare variants not previously identified in dbSNP or in the Phase 3 1000 Genomes Project (1KG) data. These data should help expand our understanding of human genome evolution and function.
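"Not previously identified in dbSNP or 1KG" is essentially a set difference: a call counts as novel only if no reference catalogue contains it. The minimal sketch below illustrates only that novelty step. It is not the authors' pipeline, which also applied quality and rarity criteria not described here. The `Variant` key (chrom, pos, ref, alt), the `novel_variants` function, and the example records are all hypothetical.

```python
# Minimal sketch of a "novel variant" filter (hypothetical, not the authors' method):
# keep calls whose (chrom, pos, ref, alt) key is absent from every reference catalogue.
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def novel_variants(calls: Iterable[Variant], *catalogues: set[Variant]) -> list[Variant]:
    """Return calls not present in any of the given catalogues, preserving order."""
    known: set[Variant] = set().union(*catalogues)  # empty set if no catalogues given
    return [v for v in calls if v not in known]


# Hypothetical example records, not data from this dataset.
calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 500, "G", "A"),
]
dbsnp_like = {Variant("chr1", 1000, "A", "G")}
kg_like = {Variant("chr2", 500, "G", "A")}

print(novel_variants(calls, dbsnp_like, kg_like))
```

In practice, exact-key matching like this only works if every call set uses the same reference build and a normalized variant representation (for example, left-aligned indels). Otherwise, the same variant can appear "novel" simply because it is written differently.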


PLEASE NOTE: The files associated with this dataset are hosted in cold storage. Please contact us by email with the details of the dataset and any particular files you would like to download, and we will be happy to make them available to you.

Additional details

Read the peer-reviewed publication(s):


Additional information:

http://personalgenomes.org/

https://my.pgp-hms.org/

Accessions (data referenced by this study):

dbGaP: phs000905.v1.p1





Sample ID  Taxonomic ID  Common Name  GenBank Name  Scientific Name  Sample Attributes
hu01F73B   9606          Human        human         Homo sapiens     Age: 21-29; Sex: female [PATO:0000383]; Ethnicity: White; ...
hu02C8E3   9606          Human        human         Homo sapiens     Age: 50-59; Sex: female [PATO:0000383]; Ethnicity: White; ...
hu03E3D2   9606          Human        human         Homo sapiens     Age: 30-39; Sex: male [PATO:0000384]; Ethnicity: White; ...
hu0486D6   9606          Human        human         Homo sapiens     Age: 50-59; Sex: male [PATO:0000384]; Ethnicity: White; ...
hu04D9CD   9606          Human        human         Homo sapiens     Age: 21-29; Sex: male [PATO:0000384]; Ethnicity: White; ...
hu066C78   9606          Human        human         Homo sapiens     Age: 30-39; Sex: female [PATO:0000383]; Ethnicity: White; ...
hu085B6D   9606          Human        human         Homo sapiens     Age: 30-39; Sex: female [PATO:0000383]; Ethnicity: White; ...
hu093F8B   9606          Human        human         Homo sapiens     Age: 50-59; Sex: female [PATO:0000383]; Ethnicity: White; ...
hu0BCA45   9606          Human        human         Homo sapiens     Age: 50-59; Sex: male [PATO:0000384]; Ethnicity: White; ...
hu0E7AAF   9606          Human        human         Homo sapiens     Age: 21-29; Sex: female [PATO:0000383]; Ethnicity: White; ...
(Showing samples 1-10 of 183.)




Sample ID  Data Type      File Format  Size      Release Date
hu646527   Mixed archive  UNKNOWN      39.73 GB  2016-09-02
huEAE6C8   Mixed archive  UNKNOWN      39.55 GB  2016-09-02
hu66D0AA   Mixed archive  UNKNOWN      37.71 GB  2016-09-02
hu02C8E3   Mixed archive  UNKNOWN      37.93 GB  2016-09-02
hu6EDC7E   Mixed archive  UNKNOWN      38.81 GB  2016-09-02
hu904B18   Mixed archive  UNKNOWN      37.21 GB  2016-09-02
hu4C20BC   Mixed archive  UNKNOWN      37.85 GB  2016-09-02
hu83BC6A   Mixed archive  UNKNOWN      37.01 GB  2016-09-02
hu87C6A9   Mixed archive  UNKNOWN      38.14 GB  2016-09-02
hu0E7AAF   Mixed archive  UNKNOWN      38.65 GB  2016-09-02
(Showing files 1-10 of 116.)
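Because the archives are in cold storage and must be requested by email, it can help to estimate how much data a request involves. The sketch below totals the 10 archive sizes listed in the file table. The whole-dataset figure is only a rough extrapolation that assumes the unlisted archives are similar in size. The page also does not say whether "GB" means 10^9 or 2^30 bytes, so the values are used as given. All variable names are hypothetical.

```python
# Hypothetical helper: total and mean size of the 10 archives listed in the table.
# Sizes are copied from the table; "GB" is taken at face value (GB vs GiB is not stated).
listed_sizes_gb = {
    "hu646527": 39.73, "huEAE6C8": 39.55, "hu66D0AA": 37.71, "hu02C8E3": 37.93,
    "hu6EDC7E": 38.81, "hu904B18": 37.21, "hu4C20BC": 37.85, "hu83BC6A": 37.01,
    "hu87C6A9": 38.14, "hu0E7AAF": 38.65,
}

TOTAL_FILES = 116  # from "files 1-10 of 116"

total_gb = sum(listed_sizes_gb.values())
mean_gb = total_gb / len(listed_sizes_gb)
# Rough estimate only: ASSUMES the unlisted archives resemble the listed ones.
estimated_all_tb = mean_gb * TOTAL_FILES / 1000

print(f"{len(listed_sizes_gb)} listed archives: {total_gb:.2f} GB total, {mean_gb:.2f} GB mean")
print(f"Rough estimate for all {TOTAL_FILES} files: ~{estimated_all_tb:.1f} TB")
```

This prints 382.59 GB in total and a mean of 38.26 GB for the listed archives. The extrapolated estimate for all 116 files is roughly 4.4 TB.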
Date              Action
October 10, 2016  Dataset published
April 8, 2024     Description updated to include: "PLEASE NOTE: The files associated with this dataset are hosted in cold storage, please contact us via email with the details of the dataset and any particular files you would like to download, and we will be happy to make those available to you."