Supporting data for "De novo genome assembly of the Indian Blue Peacock (Pavo cristatus), from Oxford Nanopore and Illumina sequencing"

Dataset type: Genomic
Data released on March 20, 2019

Dhar R; Seethy A; Pethusamy K; Singh S; Rohil V; Purkayastha K; Mukherjee I; Goswami S; Singh R; Raj A; Srivastava T; Acharya S; Rajashekhar B; Karmakar S (2019): Supporting data for "De novo genome assembly of the Indian Blue Peacock (Pavo cristatus), from Oxford Nanopore and Illumina sequencing" GigaScience Database. http://dx.doi.org/10.5524/100559

DOI: 10.5524/100559

Pavo cristatus, the Indian peafowl, is found in natural habitats across South Asia. The male, the blue peacock, is known for its elegance, majestic looks and beauty. The species has been described in Indian culture since prehistoric times and has been adopted as the national bird of India. In this study, we present the first draft genome sequence of the peacock, generated using Illumina and Oxford Nanopore Technologies (ONT) sequencing.
ONT sequencing yielded approximately 2.3-fold coverage, whereas Illumina sequencing generated 150 bp paired-end data at 284.6-fold coverage from five libraries. From these data we generated a de novo assembly of the peacock genome totalling 0.915 gigabases (Gb), with a scaffold N50 of 0.23 megabases (Mb). We also predict that the peacock genome contains 23,153 protein-coding genes and 75.3 Mb (7.33%) of repetitive sequences.
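As a rough illustration of the fold-coverage figures above, coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below follows that convention. The function names are hypothetical, and using the 0.915 Gb assembly size as the denominator is an assumption, since this record does not state which genome size the reported coverage was computed against.

```python
# Assumption: the 0.915 Gb assembly size from the abstract is used as the
# genome size. The authors may have used a different estimate.
ASSEMBLY_SIZE_BP = 915_000_000


def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Return sequencing depth as total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


def bases_for_coverage(coverage: float, genome_size_bp: float) -> float:
    """Return the number of sequenced bases implied by a given fold coverage."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return coverage * genome_size_bp


# Sequenced bases implied by the coverage values quoted in the abstract,
# under the assumed genome size.
ont_bases = bases_for_coverage(2.3, ASSEMBLY_SIZE_BP)
illumina_bases = bases_for_coverage(284.6, ASSEMBLY_SIZE_BP)

print(f"ONT: about {ont_bases / 1e9:.2f} Gb of sequence")
print(f"Illumina: about {illumina_bases / 1e9:.2f} Gb of sequence")
```

Under this assumption the Illumina data would amount to more than a hundred times the ONT data, which is consistent with the ONT reads acting as a low-depth supplement for contiguity rather than as the main source of base-level accuracy.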
We report a high-quality genome assembly of the peacock, built as a hybrid assembly from Illumina and ONT sequencing data. Long reads from ONT proved useful in addressing challenges of de novo assembly, particularly in regions containing repetitive sequences that span more than the short-read length and therefore cannot be resolved by short-read assembly alone. The initial contig assembly based on Illumina reads alone produced a total of 685,241 contigs with an N50 of 1,639 bases, whereas adding 2.3x coverage from ONT increased the N50 ninefold to 14,749 bases. Further scaffolding of the assembled contigs using both Illumina and ONT reads resulted in a final assembly of 15,025 super scaffolds with an N50 of about 0.23 Mb. The completeness of the assembly is supported by the finding that 95% of the proteins predicted by homology matched those deposited in public repositories. In concordance with other phylogenetic studies, the avian phylogeny based on conserved genes placed P. cristatus closest to Gallus gallus, followed by Meleagris gallopavo and Anas platyrhynchos. Compared with the recently published peacock genome assembly, the current hybrid assembly has greater sequencing depth, fewer non-ATGC bases and fewer scaffolds, as shown by a nearly 9.1-fold improvement in N50.
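The N50 values quoted above follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `n50` and the toy length list are illustrative assumptions; only the two contig N50 values (1,639 and 14,749 bases) come from this record.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # N50 is the length of the sequence at which the cumulative sum
        # first covers at least half of the total assembly length.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example with made-up lengths (not data from this study).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))

# Fold change between the two contig N50 values reported in the abstract.
illumina_only_n50 = 1639
hybrid_n50 = 14749
fold_change = hybrid_n50 / illumina_only_n50
print(f"{fold_change:.1f}-fold")
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why the abstract pairs it with the protein-homology check.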

Additional details

Read the peer-reviewed publication(s):

(PubMed: 31077316)

Additional information:

https://biit.cs.ut.ee/supplementary/peacock/

Accessions (data generated as part of this study):

BioProject: PRJNA413288





Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
SO_6221_FPL_3_5KB | 9049 | blue peafowl | Indian peafowl | Pavo cristatus | Description: DNA from PBMNC cells, isolated from an...; Alternative accession-BioSample: SAMN07739101; Alternative names: Indian peafowl; ...
SO_6221_FPL_5_7KB | 9049 | blue peafowl | Indian peafowl | Pavo cristatus | Description: DNA from PBMNC cells, isolated from an...; Alternative accession-BioSample: SAMN07739102; Alternative names: Indian peafowl; ...
SO_6221_FPL_7_10KB | 9049 | blue peafowl | Indian peafowl | Pavo cristatus | Description: DNA from PBMNC cells, isolated from an...; Alternative accession-BioSample: SAMN07739103; Alternative names: Indian peafowl; ...
SO_6221_NP | 9049 | blue peafowl | Indian peafowl | Pavo cristatus | Description: DNA from PBMNC cells, isolated from an...; Alternative accession-BioSample: SAMN07739107; Alternative names: Indian peafowl; ...
SO_6221_SKPea2016_LI | 9049 | blue peafowl | Indian peafowl | Pavo cristatus | Description: DNA from PBMNC cells, isolated from an...; Alternative accession-BioSample: SAMN07739104; Alternative names: Indian peafowl; ...
SO_6221_SKPea2016_SI | 9049 | blue peafowl | Indian peafowl | Pavo cristatus | Description: DNA from PBMNC cells, isolated from an...; Alternative accession-BioSample: SAMN07739105; Alternative names: Indian peafowl; ...
Displaying 1-6 of 6 Sample(s).




File Name | Sample ID | Data Type | File Format | Size | Release Date
 |  | Alignments | FASTA-aln | 1.27 MB | 2019-02-20
 |  | Sequence assembly | archive | 295.98 MB | 2019-02-13
 |  | Annotation | FASTA | 12.18 MB | 2019-02-13
 |  | Annotation | UNKNOWN | 38.21 MB | 2019-02-13
 |  | Phylogenetic tree | UNKNOWN | 2.15 KB | 2019-02-13
 |  | Readme | TEXT | 4.88 KB | 2019-03-20
 |  | Sequence assembly | archive | 362.33 KB | 2019-02-13
 |  | Sequence assembly | archive | 356.34 KB | 2019-02-13
 |  | Sequence assembly | archive | 388.04 KB | 2019-02-13
 |  | Sequence assembly | archive | 373.34 KB | 2019-02-13
Displaying 1-10 of 16 File(s).
Date Action
March 20, 2019 Dataset publish
March 20, 2019 readme_100559.txt: file attribute updated
March 20, 2019 File readme_100559.txt updated
March 20, 2019 File readme_100559.txt updated
March 25, 2019 Manuscript Link added : 10.1093/gigascience/giz038
October 14, 2022 Manuscript Link updated : 10.1093/gigascience/giz038
November 14, 2022 File format and data type for File all_final_concatenated_gblock.fasta updated