Supporting data for "Sequencing smart: De novo sequencing and assembly approaches for a non-model mammal."

Dataset type: Genomic
Data released on April 07, 2020

Etherington GJ; Heavens D; Baker D; Lister A; McNelly R; Garcia G; Clavijo B; Macaulay I; Haerty W; Di Palma F (2020): Supporting data for "Sequencing smart: De novo sequencing and assembly approaches for a non-model mammal." GigaScience Database. http://dx.doi.org/10.5524/100731

DOI: 10.5524/100731

Whilst much sequencing effort has focused on key mammalian model organisms such as mouse and human, little is known about the relationship between genome sequencing techniques and genome assembly quality for non-model mammals. This is especially relevant to non-model mammals, where the samples to be sequenced are often degraded and of low quality. A key aspect when planning a genome project is the choice of sequencing data to generate. This decision is driven by several factors, including the biological questions being asked, the quality of DNA available, and the availability of funds. Cutting-edge sequencing technologies now make it possible to achieve highly contiguous, chromosome-level genome assemblies, but this relies on good-quality, high-molecular-weight DNA. The funds to generate and combine these data are often only available within large consortia and sequencing initiatives, and are beyond the reach of many independent research groups. For many researchers, value for money is a key factor when considering the generation of genomic sequencing data. Here we use data generated with a range of genomic technologies from a roadkill European polecat (Mustela putorius) to assess various assembly techniques on this low-quality sample. We evaluated different approaches for de novo assemblies and discuss their value in relation to biological analyses.
Generally, assemblies containing more data types achieved better scores in our ranking system. However, when accounting for misassemblies, this was not always the case for Bionano and low-coverage 10x Genomics (for scaffolding only). We also find that the extra cost associated with combining multiple data types is not necessarily associated with better genome assemblies.
The high degree of variability between the de novo assembly methods (assessed from the seven key metrics) highlights the importance of carefully devising the sequencing strategy so that the desired analysis can be carried out. Adding more data to genome assemblies does not always result in better assemblies, so it is important to understand the nuances of genomic data integration explained here in order to obtain value for money when sequencing genomes.

Additional details

Read the peer-reviewed publication(s):

(PubMed: 32396200)

Accessions (data generated as part of this study):

BioProject: PRJEB34131
ENA: ERZ1036476





Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
Ferret#1420 | 9669 | black ferret | domestic ferret | Mustela putorius furo | Description: DNA extracted from Ferret, used for Bi...; Alternative accession-BioSample: not applicable; Geographic location (country and/or sea,region): US...; ...
VWT_693 | 9668 | | European polecat | Mustela putorius | Description: DNA extracted from a road kill specime...; Alternative accession-BioSample: SAMEA5818864; Geographic location (country and/or sea,region): Wa...; ...
Displaying 1-2 of 2 Sample(s).




File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | Other | TSV | 241.52 KB | 2020-04-02
 | | Annotation | GFF | 179.15 MB | 2020-04-02
 | | Other | TSV | 212.78 KB | 2020-04-02
 | | Annotation | GFF | 180.94 MB | 2020-04-02
 | | Other | TSV | 280.76 KB | 2020-04-02
 | | Annotation | GFF | 182.27 MB | 2020-04-02
 | | Annotation | GFF | 70.47 MB | 2020-04-02
 | | Other | TSV | 218.25 KB | 2020-04-02
 | | Annotation | GFF | 166.46 MB | 2020-04-02
 | | Other | TSV | 283.62 KB | 2020-04-02
Displaying 1-10 of 31 File(s).
Funding body | Awardee | Award ID | Comments
Biotechnology and Biological Sciences Research Council | R Davey | BBS/E/T/000PR9817 |
Biotechnology and Biological Sciences Research Council | N Hall | BB/CCG1720/1 |


Date Action
April 21, 2020 File description of Figure_S1_B.mx updated
April 7, 2020 Dataset publish
April 21, 2020 File description of Tabular_data_for_fig3.csv updated
April 21, 2020 File description of Figure_S1_A.mx updated
April 21, 2020 File description of Figure_S1_D.mx updated
April 21, 2020 File description of Figure_S1_C.mx updated
April 21, 2020 File description of Figure_S1_E.mx updated
April 21, 2020 File description of Tabular_data_for_fig6.csv updated
April 21, 2020 File description of Tabular_data_for_fig5.csv updated
April 21, 2020 File description of Tabular_data_for_fig2.csv updated
April 21, 2020 File description of Tabular_data_for_fig7_figS2.csv updated
April 24, 2020 Manuscript Link added : 10.1093/gigascience/giaa045
May 5, 2020 readme_100731.txt: additional file attribute added
May 5, 2020 File readme_100731.txt updated
May 22, 2020 External Link updated : doi.org/10.17504/protocols.io.bd3ri8m6
May 22, 2020 External Link updated : https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.bd3ri8m6
October 7, 2022 Manuscript Link updated : 10.1093/gigascience/giaa045