Software and supporting data for Colib'read on Galaxy.

Dataset type: Software
Data released on January 20, 2016

NGS technologies have brought a deluge of raw data to the life sciences. Classical analyses of such data often begin with an assembly step, which requires large amounts of computing resources and may remove or modify parts of the biological information contained in the data. Our approach instead focuses directly on biological questions by working on raw, unassembled NGS data through a suite of six command-line tools. Dedicated to “whole genome assembly-free” treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Because the tools rely on de Bruijn graphs and Bloom filters, such analyses can be performed in a few hours using small amounts of memory. Applications on real data show that these tools achieve good accuracy compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and tool shed repositories. With the Colib’read Galaxy tools suite, a broad range of life scientists can analyze raw NGS data. More importantly, our approach retains as much biological information from the data as possible while keeping a very low memory footprint.
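
To illustrate why combining a de Bruijn graph with a Bloom filter can keep memory use low, here is a minimal Python sketch. It is not the Colib’read implementation: the class `BloomFilter`, the helpers `kmers` and `successors`, the toy reads, and all parameter values are hypothetical and chosen only for illustration. It also ignores reverse complements and does nothing to remove false positives. The idea shown is that the k-mers of the reads are stored in a fixed-size bit array, and graph edges are never stored explicitly: the neighbours of a k-mer are found by testing its four possible one-base extensions for membership.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(sequence, k):
    """Yield every substring of length k in the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def successors(bloom, kmer):
    """Yield the k-mers that follow `kmer` in the implicit de Bruijn graph."""
    for base in "ACGT":
        candidate = kmer[1:] + base
        if candidate in bloom:  # may rarely be a false positive
            yield candidate


# Toy input (hypothetical reads, not taken from the dataset).
K = 4
reads = ["ACGTAC", "CGTACG"]

bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
for read in reads:
    for km in kmers(read, K):
        bf.add(km)

next_kmers = list(successors(bf, "ACGT"))
```

In this sketch, memory use is set by `num_bits` and does not grow with the number of distinct k-mers. The cost is a small false-positive rate, which a real tool would have to control.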

Additional details

Read the peer-reviewed publication(s):

Le Bras, Y., Collin, O., Monjeaud, C., Lacroix, V., Rivals, É., Lemaitre, C., … Peterlongo, P. (2016). Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads. GigaScience, 5(1). doi:10.1186/s13742-015-0105-2

Additional information:

Accessions (data not in GigaDB):

ENA: ERP000546
SRA: SRR567755

| Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes |
| brain | 9606 | Human | human | Homo sapiens | Description: Used as KisSplice example data, downloaded from SRA, part of project ERP000546. Alternative accession-SRA File: ERR030882 ERR030890 |
| E.coli | 562 | E. coli | | Escherichia coli | Description: Used as LoRDEC example data, downloaded from PacBio and Illumina websites. |
| F1 | 410658 | Soil Metagenome | | soil metagenome | |
| F2a | 410658 | Soil Metagenome | | soil metagenome | |
| F2b | 410658 | Soil Metagenome | | soil metagenome | |
| F3 | 410658 | Soil Metagenome | | soil metagenome | |
| F4 | 410658 | Soil Metagenome | | soil metagenome | |
| F5 | 410658 | Soil Metagenome | | soil metagenome | |
| F6 | 410658 | Soil Metagenome | | soil metagenome | |
| J1 | 410658 | Soil Metagenome | | soil metagenome | |
Displaying 1-10 of 20 Sample(s).

| File Name | Sample ID | Data Type | File Format | Size | Release Date |
| | | Other | zip | 1.97 GB | 2016-01-20 |
| | | Other | EXCEL | 9.01 KB | 2016-01-20 |
| | F1 | Genome sequence | FASTA | 418.67 MB | 2016-01-04 |
| | F2a | Genome sequence | FASTA | 536.05 MB | 2016-01-04 |
| | F2b | Genome sequence | FASTA | 376.53 MB | 2016-01-04 |
| | F3 | Genome sequence | FASTA | 356.96 MB | 2016-01-04 |
| | F4 | Genome sequence | FASTA | 448.54 MB | 2016-01-04 |
| | F5 | Genome sequence | FASTA | 303.33 MB | 2016-01-04 |
| | F6 | Genome sequence | FASTA | 353.33 MB | 2016-01-04 |
| | J1a | Genome sequence | FASTA | 502.99 MB | 2016-01-04 |
Displaying 1-10 of 20 File(s).
Funding body Awardee Award ID Comments
Agence Nationale de la Recherche ANR-12-BS02-0008
European Research Council [247073]10 Gustavo Sacomoto
Academy of Finland 267591 Leena Salmela
Date Action
January 20, 2016 Dataset publish
February 12, 2016 Manuscript Link added : 10.1186/s13742-015-0105-2