Data released on August 04, 2015

Software and supporting material for "LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads".

Behsaz, B; Birol, I; Jones, S J; Lagman, A; Vandervalk, B P; Warren, R L; Yang, C (2015): Software and supporting material for "LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads". GigaScience Database.

Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their currently higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that makes use of the sequence properties of nanopore sequence data and other error-containing sequence data to scaffold high-quality genome assemblies, without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased more than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads, and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose contiguity and correctness are on par with or better than those of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes.

Read the peer-reviewed publication(s):

Warren, R. L., Yang, C., Vandervalk, B. P., Behsaz, B., Lagman, A., Jones, S. J. M., & Birol, I. (2015). LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience, 4(1). doi:10.1186/s13742-015-0076-3

Additional information:

Software, Genomic

Samples:

Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
--- | --- | --- | --- | --- | ---
A. thaliana | 3702 | mouse-ear cress | thale cress | Arabidopsis thaliana | Relevant electronic resources:
E.coli K-12 | 83333 | | | Escherichia coli K-12 | Alternative accession-SRA Sample: ERP007108; Relevant electronic resources:
S. cerevisiae S288c | 559292 | | | Saccharomyces cerevisiae S288c | Alternative accession-SRA Sample: ERS096538
S. cerevisiae W303 | 580240 | | | Saccharomyces cerevisiae W303 | Relevant electronic resources:
S. Typhi H58 | 90370 | | | Salmonella enterica subsp. enterica serovar Typhi | Alternative accession-SRA Sample: ERS577930; Relevant electronic resources:
white spruce (PG29) | 3330 | | white spruce | Picea glauca | Alternative accession-SRA Sample: SRS357050
white spruce (WS77111) | 3330 | | white spruce | Picea glauca | Alternative accession-SRA Sample: SRS597041
Displaying 1-7 of 7 Sample(s).

Files: (FTP site)

File Name | Sample ID | Data Type | File Format | Size | Release Date
--- | --- | --- | --- | --- | ---
 | | Text | UNKNOWN | 2.34 KB | 2015-08-17
 | | Tabular data | CSV | 9.93 KB | 2015-08-17
 | | Text | UNKNOWN | 392.66 KB | 2015-08-17
 | | Text | UNKNOWN | 0.45 KB | 2015-08-17
 | | Genome sequence | FASTA | 4.94 MB | 2015-08-17
 | | Text | UNKNOWN | 2.34 KB | 2015-08-17
 | | Tabular data | CSV | 5.4 KB | 2015-08-17
 | | Text | UNKNOWN | 1.54 MB | 2015-08-17
 | | Text | UNKNOWN | 16.49 KB | 2015-08-17
 | | Genome sequence | FASTA | 110.88 MB | 2015-08-17
Displaying 1-10 of 102 File(s).



