Data released on September 13, 2017

High quality chimpanzee reference genome (Pan_tro_3.0) from hybrid assembly approach

Armero, A, S; Armstrong, J; Fiddes, I; Garrido, J, G; Gordon, D; Hillier, L, W; Ho, D; Huddleston, J; Kuderna, L, F; Laayouni, H; Perez, R, G; Povolotskaya, I; Tomlinson, C; Tran, A; Alioto, T; Ribeca, P; Green, R, E; Paten, B; Betranpetit, J; Navarro, A; Herrero, J; Eichler, E, E; Sharp, A, J; Feuk, L; Warren, W, C; Marques-Bonet, T (2017): High quality chimpanzee reference genome (Pan_tro_3.0) from hybrid assembly approach. GigaScience Database. http://dx.doi.org/10.5524/100327

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly. The current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4) is highly fragmented, with more than 183,000 contigs, over 159,000 gaps, and a genome-wide contig N50 of 51 Kbp.
In this work we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. We show substantial improvements over the Pan_tro_2.1.4 version by several metrics: increased contiguity by >750% and 300% on contigs and scaffolds, respectively; closure of 77% of the gaps in the Pan_tro_2.1.4 assembly, spanning >850 Kbp of novel coding sequence based on RNA-Seq data. We furthermore report over 2,700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements.
We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource to study human origins. We furthermore produced extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
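
The contig N50 quoted in the description is a length-weighted median: the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembled bases. The following is a minimal sketch of that calculation, assuming the contig lengths are already available as a plain list of integers in base pairs. The function name `n50` and the toy lengths are illustrative only and are not taken from the dataset or its scripts.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Hypothetical helper for illustration; not part of the dataset's scripts.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence where half of all bases are covered.
        if running * 2 >= total:
            return length


# Toy example with made-up contig lengths (not real assembly data).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because the statistic is weighted by length, a few very long contigs raise N50 far more than many short ones, which is why it is commonly reported alongside contig and gap counts when comparing assembly versions.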

Related manuscripts:

doi:10.1093/gigascience/gix098

Accessions (data included in GigaDB):

WGS: AACZ04000000
BioProject: PRJEB18078

Keywords:

Chimpanzee reference genome Assembly Genomics 

Genomic

Funding:

  • Funding body - Swedish Foundation for Strategic Research
  • Award ID - F06-0045
  • Awardee - L Feuk
  • Funding body - National Institutes of Health
  • Award ID - DA033660
  • Awardee - AJ Sharp
  • Funding body - National Institutes of Health
  • Award ID - HG006696
  • Awardee - AJ Sharp
  • Funding body - National Institutes of Health
  • Award ID - HD073731
  • Awardee - AJ Sharp
  • Funding body - National Institutes of Health
  • Award ID - MH097018
  • Awardee - AJ Sharp
  • Funding body - Ministerio de Economía y Competitividad
  • Award ID - MINECO BFU2014-55090-P
  • Awardee - T Marques-Bonet
  • Funding body - National Institutes of Health
  • Award ID - HG002385
  • Awardee - EE Eichler
  • Funding body - National Institutes of Health
  • Award ID - HG007990
  • Awardee - B Paten
  • Funding body - National Institutes of Health
  • Award ID - HG007234
  • Awardee - B Paten
  • Funding body - March of Dimes Foundation
  • Award ID - 6-FY13-92
  • Awardee - AJ Sharp
  • Funding body - Ministerio de Economía y Competitividad
  • Award ID - BFU2015-6215-ERC
  • Awardee - T Marques-Bonet
  • Funding body - Ministerio de Economía y Competitividad
  • Award ID - BFU2015-7116-ERC
  • Awardee - T Marques-Bonet
  • Funding body - Ministerio de Economía y Competitividad
  • Award ID - BFU2014-55090-P
  • Comment - FPI fellowship
  • Awardee - LFK Kuderna

Samples:

Sample ID: SAMEA4557838
Taxonomic ID: 9598
Common Name: chimpanzee
Scientific Name: Pan troglodytes
Sample Attributes:
Description:DNA extracted from Chimpanzee cell lin...
Specimen voucher:Coriell:S006007
Sex:male
...

Files:

File Type          File Format  Size       Release Date
Readme             TEXT         2.37 KB    2017-08-03
Sequence variants  TSV          85.75 KB   2017-08-24
Other              BED          124.95 KB  2017-08-24
Script             zip          200.92 KB  2017-08-24
Sequence variants  TSV          8.17 MB    2017-08-24
annotation         UNKNOWN      25.03 MB   2017-08-03
annotation         UNKNOWN      181.84 MB  2017-08-24
Genome assembly    FASTA        896.82 MB  2017-08-24
Genome sequence    FASTA        932.4 MB   2017-08-24
