Software and supporting data for Colib'read on Galaxy.

Dataset type: Software
Data released on January 20, 2016

With NGS technologies, the life sciences face a deluge of raw data. Classical analysis of such data often begins with an assembly step, which requires large amounts of computing resources and may remove or modify parts of the biological information contained in the data. Our approach instead focuses directly on biological questions by working on raw, unassembled NGS data through a suite of six command-line tools. Dedicated to “whole genome assembly-free” treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Because the tools are based on de Bruijn graphs and Bloom filters, such analyses can be performed in a few hours using small amounts of memory. Applications to real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and tool shed repositories. With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach retains as much of the biological information in the data as possible while keeping a very low memory footprint.
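
The idea of querying raw reads through their k-mers with a Bloom filter can be illustrated with a minimal Python sketch. This is not the Colib’read implementation. The class and function names (BloomFilter, kmers, shared_kmer_fraction), the filter size, the number of hash functions and the toy reads are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting a single hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def shared_kmer_fraction(read, bloom, k):
    """Fraction of the read's k-mers that the filter reports as present."""
    ks = list(kmers(read, k))
    if not ks:
        return 0.0
    return sum(km in bloom for km in ks) / len(ks)


# Index the k-mers of a toy read set, then query one of its reads.
K = 5
bloom = BloomFilter()
for read in ["ACGTACGTGA", "TTGACCA"]:
    for km in kmers(read, K):
        bloom.add(km)

print(shared_kmer_fraction("ACGTACGTGA", bloom, K))  # 1.0: every k-mer was inserted
```

The filter stores only bits, not the k-mers themselves, which is why memory use stays small. The price is a small false-positive rate that depends on the filter size and the number of hash functions.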

Additional details

Read the peer-reviewed publication(s):

Le Bras, Y., Collin, O., Monjeaud, C., Lacroix, V., Rivals, É., Lemaitre, C., … Peterlongo, P. (2016). Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads. GigaScience, 5(1). doi:10.1186/s13742-015-0105-2

Additional information:

https://colibread.inria.fr/

https://github.com/genouest/tools-colibread

http://files.pacb.com/datasets/primary-analysis/e-coli-k12/1.3.0/e-coli-k12-mg1655-raw-reads-1.3.0.tgz

ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/MG1655/MiSeq_Ecoli_MG1655_110721_PF.bam

https://github.com/PacificBiosciences/DevNet/wiki/Saccharomyces-cerevisiae-W303-Assembly-Contigs

Accessions (data not in GigaDB):

ENA: ERP000546
SRA: SRR567755





Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
J1a | 410658 | Soil Metagenome | | soil metagenome | Relevant electronic resources: ftp://ftp-adn.ec-lyon.fr/Metasoil-datasets/METASOIL-J1a-10_Rothamsted_2010_July_0-21cm_Direct_MPBIO1O1.fna
J1b | 410658 | Soil Metagenome | | soil metagenome | Relevant electronic resources: ftp://ftp-adn.ec-lyon.fr/Metasoil-datasets/METASOIL-J1b-10_Rothamsted_2010_July_0-21cm_Direct_MPBIO1O1.fna
J1rhizo | 410658 | Soil Metagenome | | soil metagenome | Relevant electronic resources: ftp://ftp-adn.ec-lyon.fr/Metasoil-datasets/METASOIL-J1rhizo-10_Rothamsted_Rhizosphere_2010_July_0-21cm_Direct_MPBIO1O1.fna
J4 | 410658 | Soil Metagenome | | soil metagenome | Relevant electronic resources: ftp://ftp-adn.ec-lyon.fr/Metasoil-datasets/METASOIL-J4_Rothamsted_2009_July_0-10cm_Indirect_DNA_Tissue.fna
J7 | 410658 | Soil Metagenome | | soil metagenome | Relevant electronic resources: ftp://ftp-adn.ec-lyon.fr/Metasoil-datasets/METASOIL-J7_Rothamsted_2009_July_0-21cm_Direct_MoBIO.fna
liver | 9606 | Human | human | Homo sapiens | Description: Used as KisSplice example data downloaded from SRA, part of project ERP000546
    Alternative accession-SRA File: ERR030887 ERR030895
Puerto_Rican_Forest_Soil | 410658 | Soil Metagenome | | soil metagenome | Relevant electronic resources: http://metagenomics....
    Geographic location (latitude and longitude): 18.3 ...
    Geographic location (country and/or sea,region): Pu...
    ...
Sargasso_Sea | 408172 | marine metagenome | | marine metagenome | Relevant electronic resources: http://metagenomics....
    Sample contact: J.Craig Venter dlgosdata@venterins...
    Alternative accession-BioProject: PRJNA13694
    ...
yeast_S288c | 559292 | | | Saccharomyces cerevisiae S288c | Description: yeast read dataset downloaded from SRA (accession SRA054922) was used for DiscoSNP and Mapsembler2 examples
    Alternative accession-SRA Sample: SRS346774
yeast_W303 | -1 | None assigned | None assigned | None assigned | Description: Used as LoRDEC example data, downloaded from PacBio websites and SRA (read accession SRR567755)
    Relevant electronic resources: https://github.com/PacificBiosciences/DevNet/wiki/Saccharomyces-cerevisiae-W303-Assembly-Contigs
    Alternative accession-SRA File: SRR567755
Displaying 11-20 of 20 Sample(s).




File Name | Sample ID | Data Type | File Format | Size | Release Date
 | | Other | zip | 1.97 GB | 2016-01-20
 | | Other | EXCEL | 9.01 KB | 2016-01-20
 | F1 | Genome sequence | FASTA | 418.67 MB | 2016-01-04
 | F2a | Genome sequence | FASTA | 536.05 MB | 2016-01-04
 | F2b | Genome sequence | FASTA | 376.53 MB | 2016-01-04
 | F3 | Genome sequence | FASTA | 356.96 MB | 2016-01-04
 | F4 | Genome sequence | FASTA | 448.54 MB | 2016-01-04
 | F5 | Genome sequence | FASTA | 303.33 MB | 2016-01-04
 | F6 | Genome sequence | FASTA | 353.33 MB | 2016-01-04
 | J1a | Genome sequence | FASTA | 502.99 MB | 2016-01-04
Displaying 1-10 of 20 File(s).

Funding body | Awardee | Award ID | Comments
Agence Nationale de la Recherche | | ANR-12-BS02-0008 |
European Research Council | | [247073]10 | Gustavo Sacomoto
Academy of Finland | | 267591 | Leena Salmela

Date | Action
January 20, 2016 | Dataset publish
February 12, 2016 | Manuscript Link added: 10.1186/s13742-015-0105-2