Supporting data for "The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies"

Dataset type: Genomic
Data released on November 27, 2019

Wang W; Das A; Kainer D; Schalamun M; Morales-Suarez A; Schwessinger B; Lanfear R (2019): Supporting data for "The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies" GigaScience Database. http://dx.doi.org/10.5524/100679

DOI: 10.5524/100679

Eucalyptus pauciflora (the snow gum) is a long-lived tree with high economic and ecological importance. Currently, little genomic information for Eucalyptus pauciflora is available. Here, we sequentially assemble the genome of Eucalyptus pauciflora with different methods, and combine multiple existing and novel approaches to help select the best genome assembly. We generated high-coverage long-read (Nanopore, 174x) and short-read (Illumina, 228x) data from a single Eucalyptus pauciflora individual and compared assemblies from five assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) with different minimum read lengths (1 kb and 35 kb). A key component of our approach is to keep a randomly selected collection of ~10% of both the long and short reads separate from the assemblies, for use as a validation set when assessing assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in eight ways: contig N50, BUSCO scores, LAI (Long terminal repeat Assembly Index) scores, assembly ploidy, base-level error rate, CGAL (Computing Genome Assembly Likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ~0.006 errors per base. We report a draft genome of Eucalyptus pauciflora, which will be a valuable resource for further genomic studies of eucalypts. The approaches for assessing and comparing genomes should help in assessing and choosing among many potential genome assemblies from a single dataset.
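Two of the ideas in the description above, holding out roughly 10% of reads as a validation set and summarising contiguity with contig N50, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (see the Github links below for that); the function names, the fixed random seed, and the toy inputs are assumptions made for illustration only. For paired-end short reads, one would presumably split on the pair identifier so that both mates land in the same set.

```python
import random


def split_validation(read_ids, fraction=0.10, seed=0):
    """Randomly hold out about `fraction` of the reads as a validation set.

    Hypothetical helper: returns (assembly_ids, validation_ids).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(read_ids)
    rng.shuffle(ids)
    n_val = round(len(ids) * fraction)
    return ids[n_val:], ids[:n_val]


def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths.

    N50 is the length of the contig at which the cumulative length,
    counted from the longest contig downwards, first reaches half of
    the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Toy usage with made-up read identifiers and contig lengths.
assembly_ids, validation_ids = split_validation(
    [f"read_{i}" for i in range(100)]
)
toy_n50 = contig_n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Keeping the held-out reads out of every assembly means they can later be mapped back to each candidate assembly as independent evidence, which is the role the validation set plays in the comparison described above.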

Additional details

Read the peer-reviewed publication(s):

(PubMed: 31895413)

Github links:

https://github.com/asdcid/Eucalyptus-pauciflora-genome-assembly

https://github.com/asdcid/Genome_Assembly_Assessment

Accessions (data generated as part of this study):

BioProject: PRJNA450887





Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
SAMN09197233 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197233; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197234 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197234; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197235 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197235; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197236 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197236; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197237 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197237; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197238 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197238; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197239 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197239; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197240 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197240; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197241 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197241; Alternative accession-BioProject: PRJNA450887; ...
SAMN09197242 | 87676 | | | Eucalyptus pauciflora | Description: Genomic DNA extracted from leaf sample...; Alternative accession-BioSample: SAMN09197242; Alternative accession-BioProject: PRJNA450887; ...
Displaying 1-10 of 16 Sample(s).




File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | text | TAR | 529.77 KB | 2019-11-22
 | | text | TAR | 1.29 MB | 2019-11-22
 | | other | TAR | 265.62 KB | 2019-11-22
 | | other | TAR | 217.75 KB | 2019-11-22
 | | Sequence assembly | FASTA | 603.29 MB | 2019-11-22
 | | Sequence variants | VCF | 1.14 MB | 2019-11-22
 | | Sequence assembly | FASTA | 567.97 MB | 2019-11-22
 | | Sequence variants | VCF | 1.34 MB | 2019-11-22
 | | Repeat sequence | GFF | 75.39 MB | 2019-11-22
 | | Repeat sequence | GFF | 66.22 MB | 2019-11-22
Displaying 1-10 of 36 File(s).
Funding body | Awardee | Award ID | Comments
Australian Research Council | Robert Lanfear | FT140100843 | Future Fellowship
Australian Research Council | Benjamin Schwessinger | FT180100024 | Future Fellowship
Date | Action
November 27, 2019 | Dataset publish
December 30, 2019 | Manuscript link added: 10.1093/gigascience/giz160
October 14, 2022 | Manuscript link updated: 10.1093/gigascience/giz160