These data represent the first genome assembly for a critically endangered parrot (Amazona vittata) endemic to the United States, and also the first genome of a species from the diverse and ecologically important genus Amazona, native to South America and the Caribbean. One sample was taken from a non-reproductive female at the Rio Abajo Breeding Facility in Puerto Rico (IACUC#201109.1) and sequenced on the Illumina HiSeq platform with both fragment and paired-end sequencing approaches, yielding a total of 42,479,499,706 bases. We estimated a total coverage depth of 26.89X of the parrot’s genome: 17.08X coverage from the short fragment reads and 9.8X coverage from the mate pairs. Sequencing began with the construction of two genomic libraries: a short fragment library (~300 bp inserts) for sequencing the majority of the genome, and a long fragment library (~2.5 kb inserts) to generate scaffolds used to order and assemble the contigs derived from the short fragment library. The Illumina paired-end and mate-pair reads were assembled together with Ray (http://denovoassembler.sourceforge.net), with the k-mer length defined iteratively. Given a predicted genome size of 1.58 Gb and a total scaffold length of 1,184,594,388 bp, the assembly covers around 76% of the genome, a value that may be slightly overestimated because some scaffolds may overlap but could not be merged. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which were further combined into 148,255 scaffolds (N50 = 19,470 bp, longest = 206,462 bp). The database contains all of the contigs and scaffolds, the corresponding assembly parameters, and the annotations for known repeats and coding sequences. The assembled scaffolds allow basic genomic annotation and comparative analyses with other available avian whole-genome sequences.
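The summary statistics in the description can be reproduced with simple arithmetic. The sketch below is illustrative only and is not part of the dataset or of the authors' pipeline: the function names are hypothetical, the genome size is taken as the rounded 1.58 Gb quoted above, and the list of lengths passed to the N50 helper is toy data. With the rounded genome size, the scaffold-to-genome ratio comes out close to 75%, slightly below the "around 76%" quoted in the description; the difference may come from rounding of the genome size estimate.

```python
# Figures quoted in the dataset description.
TOTAL_SEQUENCED_BASES = 42_479_499_706
ESTIMATED_GENOME_SIZE_BP = 1_580_000_000  # 1.58 Gb, rounded as in the text
TOTAL_SCAFFOLD_LENGTH_BP = 1_184_594_388


def coverage_depth(sequenced_bases, genome_size_bp):
    """Average number of times each genome position is expected to be read."""
    return sequenced_bases / genome_size_bp


def assembled_fraction(scaffold_length_bp, genome_size_bp):
    """Share of the estimated genome represented by the scaffolds."""
    return scaffold_length_bp / genome_size_bp


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


depth = coverage_depth(TOTAL_SEQUENCED_BASES, ESTIMATED_GENOME_SIZE_BP)
fraction = assembled_fraction(TOTAL_SCAFFOLD_LENGTH_BP, ESTIMATED_GENOME_SIZE_BP)

print(f"Coverage depth: {depth:.2f}X")
print(f"Assembled fraction: {fraction:.1%}")

# Toy contig lengths (hypothetical, not taken from the assembly).
print("Toy N50:", n50([2, 3, 4, 5, 6]))
```

To recompute the contig or scaffold N50 for the real assembly, the same `n50` helper could be given the sequence lengths read from the assembly FASTA files.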
readme
readme.txt
Genome assembly
Assembly-2011
Additional data
DATABASE.DATA.Repeats_to_scaffolds.txt
DATABASE.ANNOTATION.repeats_to_scaffolds.xls
History
September 11, 2012: Data released.
September 14, 2012: Additional data added.
In accordance with our terms of use, please cite this dataset as:
Oleksyk, TK; Guiblet, W; Pombert, JF; Valentin, R; Martinez-Cruzado, JC (2012): Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience. http://dx.doi.org/10.5524/
Related manuscript available at:
doi:10.1186/2047-217X-1-14
Accession codes associated with this data:
NCBI BioProject PRJNA171587
EBI Project PRJEB225
NCBI GenBank AOCU00000000
