Supporting data for "Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries."
Dataset type: Genomic, Software
Data released on May 08, 2018
The accurate sequencing and assembly of very large, often polyploid, genomes remain a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15 Gb hexaploid bread wheat genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors is important for optimising future sequencing and assembly approaches and for comparative genomics.
Here we use a Fosill 38 kb jumping library to assess the medium- and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent BAC-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a three-fold increase in N50 values.
Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient and cost-effective methods.
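The N50 statistic cited in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length). The function name `n50` and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly length is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy lengths (in kb), NOT values from the study.
before = [100, 80, 60, 40, 20]
after = [180, 100, 20]  # same total length after joining some sequences

print(n50(before), n50(after))
```

Joining sequences into longer scaffolds leaves the total length unchanged but raises N50, which is why the statistic is often used to summarise the effect of scaffolding.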
Read the peer-reviewed publication(s):
Lu, F.-H., McKenzie, N., Kettleborough, G., Heavens, D., Clark, M. D., & Bevan, M. W. (2018). Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries. GigaScience, 7(5). doi:10.1093/gigascience/giy053
Accessions (data generated as part of this study):
|Sample ID||Taxonomic ID||Common Name||Genbank Name||Scientific Name||Sample Attributes|
|Chinese Spring 42||4565||Canadian hard winter wheat||bread wheat||Triticum aestivum||Description: A single-seed-descent line of Triticum aestivum Chinese Spring (CS42) was used for high molecular weight DNA extraction. Long mate-paired 40 kb fosmid jumping libraries were prepared from size-fractionated Chinese Spring 42 DNA ligated into the pFosill 4 cloning vector. Paired-end read lengths were 250 bp, oriented inwards. The fosmid clone insert size distribution was 37,500 +/- 5,000 bp. The Fosill libraries are useful for assessing the longer-range fidelity of wheat genome assemblies. Genotype: Chinese Spring 42. Alternative names: bread wheat landrace|
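The library properties listed in the sample attributes (inward-facing pairs, insert size 37,500 +/- 5,000 bp) suggest how a mapped mate pair might be checked against an assembly. The following is a rough Python sketch under stated assumptions, not the authors' pipeline: the `Read` type, the `classify_pair` function, the category labels, and the use of the stated spread as the default tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

EXPECTED_INSERT = 37_500  # bp, from the sample description
SPREAD = 5_000            # bp, from the sample description


@dataclass
class Read:
    scaffold: str   # name of the sequence the read mapped to
    start: int      # leftmost mapped coordinate
    end: int        # rightmost mapped coordinate
    forward: bool   # True if mapped to the forward strand


def classify_pair(a: Read, b: Read, tolerance: int = SPREAD) -> str:
    """Label a mapped mate pair (labels are hypothetical, for illustration)."""
    if a.scaffold != b.scaffold:
        return "different_scaffold"
    # Order the two reads by position along the scaffold.
    left, right = (a, b) if a.start <= b.start else (b, a)
    # Inward orientation: left read on the forward strand, right on the reverse.
    if not (left.forward and not right.forward):
        return "wrong_orientation"
    # Outer distance spanned by the pair on the assembly.
    span = right.end - left.start
    if abs(span - EXPECTED_INSERT) > tolerance:
        return "wrong_distance"
    return "consistent"


pair = (Read("s1", 1_000, 1_250, True), Read("s1", 38_000, 38_250, False))
print(classify_pair(*pair))
```

In practice the tolerance would be chosen from the observed insert size distribution, and a single inconsistent pair is weak evidence; regions where many pairs disagree with the assembly are the more informative signal.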