
Data released on May 08, 2018

Supporting data for "Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries."

Lu, F; McKenzie, N; Kettleborough, G; Heavens, D; Clark, MD; Bevan, MW (2018): Supporting data for "Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries." GigaScience Database. http://dx.doi.org/10.5524/100446

The accurate sequencing and assembly of very large, often polyploid, genomes remain a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15 Gb hexaploid bread wheat genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors is important for optimising future sequencing and assembly approaches and for comparative genomics.
Here we use a Fosill 38 Kb jumping library to assess medium- and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent BAC-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a three-fold increase in N50 values.
Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient and cost-effective methods.
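The abstract reports scaffolding gains as a change in N50. As a minimal sketch of how that statistic is usually computed (the length at which the cumulative sum of lengths, sorted longest first, reaches half the assembly size), the function below may be useful. The function name `n50` and the example lengths are hypothetical and are not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (arbitrary units), before and after
# joining neighbouring scaffolds; the total length is unchanged.
before = [80, 70, 50, 40, 30, 20]
after = [150, 90, 50]

print(n50(before))  # 70
print(n50(after))   # 150
```

Joining scaffolds leaves the total length the same but concentrates it in fewer, longer sequences, which is why scaffolding raises N50.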


Additional information:

https://github.com/lufuhao/NGSimple

https://github.com/lufuhao/ReadCleaner4Scaffolding

Accessions (data included in GigaDB):

BioProject: PRJEB23322
ENA: OEIT01000000

Keywords:

wheat genome assembly methods fosills long-range genome assembly illumina pacbio 

Software, Genomic


Funding:

  • Funding body - Biotechnology and Biological Sciences Research Council (BBSRC)
      Award ID - BB/J00328X/1
      Comment - strategic LOLA award
      Awardee - MWB
  • Funding body - Biotechnology and Biological Sciences Research Council (BBSRC)
      Award ID - BB/J003743/1
      Comment - strategic LOLA
      Awardee - MDC
  • Funding body - European Commission
      Comment - FP7 Triticeae Genome Project
      Awardee - MWB
  • Funding body - Biotechnology and Biological Sciences Research Council (BBSRC)
      Award ID - BB/P013511/1
      Comment - Institute Strategic Programme Grant
      Awardee - MWB
  • Funding body - Biotechnology and Biological Sciences Research Council (BBSRC)
      Award ID - BB/J004669/1
  • Funding body - Biotechnology and Biological Sciences Research Council (BBSRC)
      Award ID - BB/CSP17270/1
      Comment - Core Strategic Programme Grant
  • Funding body - Biotechnology and Biological Sciences Research Council (BBSRC)
      Award ID - BB/J010375/1
      Comment - National Capability in Genomics

Samples:

Sample ID: Chinese Spring 42
Taxonomic ID: 4565
Common Name: Canadian hard winter wheat
Genbank Name: bread wheat
Scientific Name: Triticum aestivum
Sample Attributes:
  Description: A single-seed-descent line of Triticum...
  Genotype: Chinese Spring 42
  Alternative accession-BioProject: PRJEB23322
  ...
Displaying 1-1 of 1 Sample(s).

Files:

Data Type        File Format   Size        Release Date
External link    UNKNOWN       0 KB        2018-05-02
External link    UNKNOWN       3.6 GB      2018-05-02
External link    UNKNOWN       0 KB        2018-05-02
External link    UNKNOWN       0 KB        2018-05-02
Tabular Data     UNKNOWN       111.56 KB   2018-05-02
Tabular Data     UNKNOWN       84.47 KB    2018-05-02
GitHub archive   archive       44.98 MB    2018-05-02
GitHub archive   archive       38.96 KB    2018-05-02
Readme           TEXT          4.09 KB     2018-05-02
External link    FASTA         226 MB      2018-05-02
Displaying 1-10 of 13 File(s).
