De novo high-coverage sequencing and annotated assemblies of the budgerigar genome

Dataset type: Genome-Mapping, Genomic, Software, Transcriptomic
Data released on July 22, 2013

Ganapathy G; Howard JT; Koren S; Phillippy A; Zhou S; Schwartz D; Schatz M; Aboukhalil R; Ward JM; Li J; Li B; Fedrigo O; Bukovnik L; Wang T; Wray G; Rasolonjatovo I; Winer R; Knight JR; Warren W; Zhang G; Jarvis ED (2013): De novo high-coverage sequencing and annotated assemblies of the budgerigar genome. GigaScience Database.


Background: Parrots are considered one of the most behaviorally advanced vertebrate groups. They have an advanced capacity for vocal learning: parrots can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, very little is known about the genetics of these traits. To understand their molecular and genetic basis, we need whole-genome sequencing and a robust assembly of the coding and noncoding regions of a parrot genome, including regulatory regions and repetitive elements.
Findings: Here we present a genomic resource for the budgerigar, an Australian parakeet (Melopsittacus undulatus) that is the most widely used parrot species for studying vocal learning. Specifically, we present the genomic reads, four high-quality annotated assemblies, and optical maps. These sequence reads were used in part for the Assemblathon 2 competition (see dataset doi:10.5524/100060). The sequence data presented here include over 300X raw read coverage from multiple sequencing technologies (454 Titanium, 454 Flexplus, Illumina and Pacific Biosciences) and chromosome optical maps from a single male animal. The reads and optical maps were used to create hybrid assemblies representing some of the largest genome scaffolds to date for a bird genome using next-generation sequencing technology. Annotation of these assemblies was generated using brain transcriptome sequence assemblies.
Conclusions: Along several quality metric dimensions, these assemblies are comparable to or better than the chicken and zebra finch genome assemblies that were built from traditional Sanger sequencing reads. These assemblies are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in the finch genome, and promoter regions of genes differentially regulated in vocal learning brain regions.
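Scaffold size is one of the quality metrics commonly used when comparing assemblies such as those described above. The record does not say which metrics were computed or how, so the following is only a minimal sketch of one widely used measure, N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases). The function name and the example lengths are hypothetical and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical scaffold lengths (in bases), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

In practice the lengths would come from the scaffold records of an assembly FASTA file; a larger N50 indicates that more of the assembly sits in long scaffolds, though it says nothing about correctness on its own.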

Additional details

Read the peer-reviewed publication(s):

Koren, S., Schatz, M. C., Walenz, B. P., Martin, J., Howard, J. T., Ganapathy, G., … Phillippy, A. M. (2012). Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotechnology, 30(7), 693–700. doi:10.1038/nbt.2280 (PubMed: 22750884)
Zhang, G., Li, B., Li, C., Gilbert, M. T. P., Jarvis, E. D., & Wang, J. (2014). Comparative genomic data of the Avian Phylogenomics Project. GigaScience, 3(1). doi:10.1186/2047-217x-3-26
Zhang, G., Li, C., Li, Q., Li, B., Larkin, D. M., Lee, C., … Meredith, R. W. (2014). Comparative genomics reveals insights into avian genome evolution and adaptation. Science, 346(6215), 1311–1320. doi:10.1126/science.1251385
Jarvis, E. D., Mirarab, S., Aberer, A. J., Li, B., Houde, P., Li, C., … Howard, J. T. (2014). Whole-genome analyses resolve early branches in the tree of life of modern birds. Science, 346(6215), 1320–1331. doi:10.1126/science.1253451
Ganapathy, G., Howard, J. T., Ward, J. M., Li, J., Li, B., Li, Y., … Jarvis, E. D. (2014). High-coverage sequencing and annotated assemblies of the budgerigar genome. GigaScience, 3(1). doi:10.1186/2047-217x-3-11

Accessions (data generated as part of this study):

ENA: ERP002324
ENA: ERS222880

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
SAMEA1713774 | 13146 | Melopsittacus undulatus | budgerigar | Melopsittacus undulatus | Alternative accession-BioSample: SAMEA1713774; Cell type: blood; Broad-scale environmental context: anthropogenic te...
SAMEA836052 | 13146 | Melopsittacus undulatus | budgerigar | Melopsittacus undulatus | Broad-scale environmental context: anthropogenic te...; Alternative accession-BioSample: SAMEA836052; Body site: Brain [UBERON:0000955]
Displaying 1-2 of 2 Sample(s).

File Name | Sample ID | Data Type | File Format | Size | Release Date
 | SAMPLE:SAMEA1713774 | Other | PDF | 7.78 KB | 2013-02-17
 | SAMPLE:SAMEA1713774 | Other | PDF | 5.9 KB | 2013-02-17
 | SAMPLE:SAMEA1713774 | Other | PDF | 13.98 KB | 2013-02-17
 | SAMPLE:SAMEA1713774 | Other | PDF | 4.61 KB | 2013-02-17
 | SAMPLE:SAMEA836052 | Transcriptome sequence | SFF | 2.04 GB | 2013-06-26
 | SAMPLE:SAMEA1713774 | Other | PDF | 8.04 KB | 2013-02-17
 | SAMPLE:SAMEA1713774 | Other | PDF | 5.33 KB | 2013-02-17
 |  | Tabular data | EXCEL | 22.41 KB | 2014-03-26
 | SAMPLE:SAMEA1713774 | Sequence assembly | FASTA | 1.06 GB | 2013-02-26
 | SAMPLE:SAMEA1713774 | Annotation | GFF | 10.61 MB | 2013-02-26
Displaying 1-10 of 133 File(s).
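The annotation files in this dataset are distributed in GFF format, some of them gzip-compressed. As a first sanity check after downloading one, it can help to count how many records of each feature type it contains. The sketch below assumes only the standard GFF layout (nine tab-separated columns, feature type in the third column, comment lines starting with "#"); the function names are hypothetical, and the exact feature types present in these files are not stated in this record.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count GFF records per feature type (third column)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A well-formed GFF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


def count_feature_types_in_file(path):
    """Open a plain or gzip-compressed GFF file and count its feature types."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)
```

For example, calling count_feature_types_in_file on a local copy of one of the .gff.gz files would return a Counter keyed by feature type, which can then be compared against the gene counts reported in the associated publications.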
Date Action
October 15, 2015 File umd.mega.fa.gff updated
October 15, 2015 File Melopsittacus_undulatus.gene.cds updated
October 29, 2015 File Melopsittacus_undulatus.fa.gz updated
October 29, 2015 File Melopsittacus_undulatus.gene.cds updated
October 29, 2015 File Melopsittacus_undulatus.gene.gff updated
October 29, 2015 File Melopsittacus_undulatus.gff.gz updated
November 5, 2015 File Melopsittacus_undulatus.cds.gz updated
November 5, 2015 File Melopsittacus_undulatus.fa.gz updated
November 5, 2015 File Melopsittacus_undulatus.gff.gz updated
November 5, 2015 File Melopsittacus_undulatus.pep.gz updated
November 13, 2017 External Link updated :
December 6, 2018 Sample updated : SAMPLE:SAMN1985454
December 10, 2018 Sample updated : SAMPLE:SAMEA836052. The original accession given was actually the NCBI UID
December 10, 2018 Sample updated : SAMPLE:SAMEA1713774. The original accession given was actually the NCBI UID
May 1, 2020 File size of ap_v6_sli_asm.scf_SwaI_map100211_5_8.xml updated
May 1, 2020 File size of ParrotOM100211_bird_9C_scaffoldsC_SwaI_5_8.xml updated
May 1, 2020 File size of BGIMUN1.120628.gene.withUTR.pep updated