Supporting data for "An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing"

Dataset type: Genomic
Data released on December 28, 2016

Zimin AV; Stevens KA; Crepeau MW; Puiu D; Wegrzyn JL; Yorke JA; Langley CH; Neale DB; Salzberg SL (2016): Supporting data for "An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing." GigaScience Database.


The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8,206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time (SMRT) sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25,361 bp, more than three times that of the original assembly, and an N50 scaffold size of 107,821 bp, 61% larger than that of the previous assembly.
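
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed. It assumes the standard definition measured against the total assembled length; the publication may use a variant based on estimated genome size. The function name `n50` and the toy contig lengths are illustrative and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not from the assembly).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 8

# Ratio of the two contig N50 values quoted in the abstract.
print(round(25361 / 8206, 2))  # 3.09
```

The ratio of about 3.09 is consistent with the statement that the new contig N50 is more than three times the original.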

Additional details

Read the peer-reviewed publication(s):

doi:10.1093/gigascience/giw016

Accessions (data generated as part of this study):

BioProject: PRJNA174450
SRA: SRP034079

Accessions (data referenced by this study):

ENA: GCA_000404065.2


Sample ID: SAMN05256552
Taxonomic ID: 3352
Common Name / Genbank Name: loblolly pine
Scientific Name: Pinus taeda
Sample Attributes: Description: DNA extracted from pine needles of lob...; Geographic location (country and/or sea,region): US...

File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | Sequence assembly | FASTA | 20.61 GB | 2016-12-15
 | | MD5sum | FASTA | 0.05 KB | 2016-12-15
 | | other | UNKNOWN | 55.47 MB | 2016-12-15
 | | TXT | UNKNOWN | 24 MB | 2016-12-15
 | | Readme | TEXT | 1.51 KB | 2016-12-15
 | | other | TEXT | 0.67 KB | 2016-12-15

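
Because the file list includes an MD5sum entry alongside the 20.61 GB assembly, a downloaded copy can be checked against the published checksum. The sketch below shows one way to do this with Python's standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the actual file names and checksum value must be taken from the dataset itself, since they are not listed here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that very large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Return True if the file's MD5 digest equals the expected value.

    The expected value is compared case-insensitively and with
    surrounding whitespace ignored.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

A call such as `matches_checksum(local_path, published_md5)` would then return `True` only when the download is intact, where both arguments are placeholders supplied by the user.
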
Funding body | Awardee | Award ID | Comments
U.S. Department of Agriculture | D.B. Neale | 2011-67009-30030 |
National Institutes of Health | S.L. Salzberg | R01-HG006677 |
National Institutes of Health | S.L. Salzberg | R01-GM083873 |

Date | Action
December 28, 2016 | Dataset publish
January 9, 2017 | Manuscript link added: 10.1093/gigascience/giw016