High quality chimpanzee reference genome (Pan_tro_3.0) from hybrid assembly approach
Dataset type: Genomic
Data released on September 13, 2017
Armero AS; Armstrong J; Fiddes I; Garrido JG; Gordon D; Hillier LW; Ho D; Huddleston J; Kuderna LFK; Laayouni H; Perez RG; Povolotskaya I; Tomlinson C; Tran A; Alioto T; Ribeca P; Green RE; Paten B; Bertranpetit J; Navarro A; Herrero J; Eichler EE; Sharp AJ; Feuk L; Warren WC; Marques-Bonet T (2017): High quality chimpanzee reference genome (Pan_tro_3.0) from hybrid assembly approach. GigaScience Database. http://dx.doi.org/10.5524/100327
The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly. The current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4) is highly fragmented, with more than 183,000 contigs, over 159,000 gaps, and a genome-wide contig N50 of 51 Kbp.
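For readers unfamiliar with the contig N50 metric quoted above, the following is a minimal sketch of how it is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not part of this dataset or of the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths for illustration only (not from Pan_tro_2.1.4 or Pan_tro_3.0).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A fragmented assembly, made up of many short contigs, yields a low N50, which is why the 51 Kbp figure is cited as evidence of fragmentation.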
In this work we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in the number of bases represented and organized in large scaffolds. We show substantial improvements over the Pan_tro_2.1.4 version by several metrics: contiguity increased by >750% for contigs and 300% for scaffolds; and 77% of the gaps in the Pan_tro_2.1.4 assembly were closed, with the closed gaps spanning >850 Kbp of novel coding sequence based on RNA-Seq data. We furthermore report over 2,700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements.
We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource to study human origins. We furthermore produced extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
Read the peer-reviewed publication(s):
Kuderna, L. F. K., Tomlinson, C., Hillier, L. W., Tran, A., Fiddes, I. T., Armstrong, J., … Marques-Bonet, T. (2017). A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0). GigaScience, 6(11). doi:10.1093/gigascience/gix098
Accessions (data included in GigaDB):
|Sample ID||Taxonomic ID||Common Name||Genbank Name||Scientific Name||Sample Attributes|
|SAMEA4557838||9598||chimpanzee||Pan troglodytes||Description: DNA extracted from chimpanzee cell line used for multiple sequencing experiments (PacBio RS II sequencing, HiSeq 2500 sequencing, Chicago library sequencing (Dovetail Genomics), TruSeq SLR / Moleculo sequencing); Common name: western chimpanzee; Cell type: Finite cell line|