Supporting data for "Improving draft genome contiguity with reference-derived in silico mate-pair libraries"

Dataset type: Genomic, Software
Data released on March 09, 2018

Grau JH; Hackl T; Koepfli K; Hofreiter M (2018): Supporting data for "Improving draft genome contiguity with reference-derived in silico mate-pair libraries" GigaScience Database.


Contiguous genome assemblies are a highly valued biological resource because they make more completely annotated genes and genomic elements usable than fragmented draft genomes do. Nonetheless, contiguity is difficult to obtain when only low-coverage data and/or only distantly related reference genome assemblies are available. To improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes based solely on ~30x coverage of shotgun sequencing data.
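
This record does not describe the pipeline's internals, so the following is only a rough illustration of what constructing a mate-pair library in silico can look like: read pairs are cut from a reference-derived sequence so that their outer ends lie a fixed distance apart. The function names, the parameters (`read_len`, `insert_size`, `step`), the outward-facing read orientation, and the rule of skipping reads that contain N are all assumptions made for this sketch, not details taken from Cross-Species Scaffolding.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def in_silico_mate_pairs(seq, read_len=4, insert_size=12, step=2):
    """Yield (read1, read2) pairs whose outer ends span insert_size bases.

    Hypothetical helper: slides a window of insert_size along seq and
    takes read_len bases from each end of the window.
    """
    for start in range(0, len(seq) - insert_size + 1, step):
        fragment = seq[start:start + insert_size]
        left, right = fragment[:read_len], fragment[-read_len:]
        # Assumption: skip pairs that touch unresolved (N) bases.
        if "N" in left.upper() or "N" in right.upper():
            continue
        # Assumption: outward-facing (reverse-forward) orientation,
        # as is conventional for mate-pair libraries.
        yield reverse_complement(left), right


# Toy reference-derived sequence with a short unresolved gap (NN).
reference = "ACGTTGCAAGGCTTAACCGGTTNNACGT"
pairs = list(in_silico_mate_pairs(reference))
for read1, read2 in pairs:
    print(read1, read2)
```

With real data the read length and insert size would be far larger (tens to hundreds of bases per read, kilobases per insert), and the pairs would be written to FASTQ for the assembler's scaffolding step; the tiny values here only keep the example readable.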

Additional details

Read the peer-reviewed publication(s):

Grau, J. H., Hackl, T., Koepfli, K.-P., & Hofreiter, M. (2018). Improving draft genome contiguity with reference-derived in silico mate-pair libraries. GigaScience, 7(5). doi:10.1093/gigascience/giy029

Accessions (data generated as part of this study):

BioProject: PRJNA74997

Accessions (data referenced by this study):

BioSample: SAMN00857914
BioProject: PRJNA170813
GENBANK: GCF_000001405
GENBANK: GCF_000165445
GENBANK: GCA_000241425
GENBANK: GCA_001693075.2
GENBANK: GCA_001693035.2
GENBANK: GCA_001923025.1
GENBANK: GCA_001870725.1
GENBANK: AEWM00000000.1
GitHub: jstjohn/SeqPrep
GitHub: mahajrod/KrATER

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
CLIB324 | 929629 | Saccharomyces cerevisiae CLIB324 | | Saccharomyces cerevisiae CLIB324 | Description: Genomic DNA isolated CLIB324 is a Viet...; Alternative names: baker's yeast; Sex: not applicable
SAMN00690380 | 31869 | | aye-aye | Daubentonia madagascariensis | Description: Genomic DNA isolated from male Daubent...; Alternative names: Dm6514m, Goblin, aye-aye
SAMN00857914 | 9598 | | chimpanzee | Pan troglodytes | Description: Genomic DNA isolated from chimpanzee; Alternative names: Clint, chimpanzee
SAMN01090682 | 6204 | pig tapeworm | pork tapeworm | Taenia solium | Description: Genomic DNA isolated from Taenia soliu...; Alternative names: TsUNAM cysticerci, pork tapeworm; Sex: not applicable

Data Type | File Format | Size | Release Date
Sequence assembly | TAR | 1.67 GB | 2018-02-17
other | TAR | 39.69 MB | 2018-02-17
GitHub archive | archive | 6.42 KB | 2018-02-17
other | TAR | 8.26 GB | 2018-02-17
Readme | TEXT | 1.87 KB | 2018-02-17

Funding body | Awardee | Award ID | Comments
European Research Council | M Hofreiter | 310763 |

Date | Action
March 9, 2018 | Dataset publish
July 4, 2018 | Manuscript Link added: 10.1093/gigascience/giy029