Data released on November 15, 2016
Ginkgo biloba is one of the world’s most ancient plants, a living fossil whose gross morphology has remained essentially unchanged for more than 200 million years. Representing one of the four extant gymnosperm lineages and having no living relatives, it possesses a suite of fascinating characteristics, including a large genome, outstanding resistance/tolerance to abiotic and biotic stresses, and dioecious reproduction, making it an ideal model species for biological studies. However, the lack of a high-quality genome sequence has impeded our understanding of its biology and evolution.
Here, we assembled and annotated the first complete genome sequence of ginkgo, which will aid related research. Genomic DNA was extracted from the endosperm tissue of ginkgo seeds, which develops directly from the female gametophyte without fertilization and thus contains a haploid genome that has not undergone genetic recombination. Multiple seeds at visually different developmental stages were collected from five separate large ginkgo trees at one of the ginkgo refuge populations located on Tianmu Mountain, Zhejiang Province, China. All paired-end libraries and one mate-pair library (2 kb) were constructed using DNA extracted from a single seed. Sequencing on the HiSeq 2000/4000 platform yielded 1253.09 Gb of clean data. The resulting 10.61 Gb genome assembly contained 41,840 annotated genes, with N50 values of 48.2 kb for contigs and 1.36 Mb for scaffolds. Repetitive sequences account for 76.58% of the assembled sequence, and long terminal repeat retrotransposons (LTR-RTs) are particularly prevalent.
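For readers unfamiliar with the assembly statistics quoted above, the following minimal Python sketch illustrates how an N50 value is commonly defined and how a rough sequencing depth can be estimated from the reported totals. The function names n50 and approx_depth are hypothetical and are not taken from the authors' pipeline. Dividing the clean data by the assembled size is only an approximation, since the true genome size may differ from the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def approx_depth(clean_data_gb, assembly_size_gb):
    """Rough sequencing depth: clean bases divided by assembled bases.

    Assumption: the assembly size stands in for the genome size.
    """
    return clean_data_gb / assembly_size_gb


# Toy example with six made-up sequence lengths (not real ginkgo data).
toy_n50 = n50([2, 3, 4, 5, 6, 10])

# Figures reported in the text: 1253.09 Gb clean data, 10.61 Gb assembly.
depth = approx_depth(1253.09, 10.61)
print(f"toy N50 = {toy_n50}, approximate depth = {depth:.1f}x")
```

Under these assumptions, the reported figures correspond to roughly 118-fold coverage of the assembled sequence.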