Software and exemplar data for Rcorrector.

Dataset type: Software, Transcriptomic
Data released on October 20, 2015

Song L; Florea L (2015): Software and exemplar data for Rcorrector. GigaScience Database. http://dx.doi.org/10.5524/100171

DOI: 10.5524/100171

Next generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, due to the variation in gene expression levels and alternative splicing.
We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which employ a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read.
The software as published is available directly from this dataset; for the most up-to-date version, see the project's GitHub repository: https://github.com/mourisl/Rcorrector/
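The contrast between a global and a local k-mer threshold described above can be illustrated with a minimal Python sketch. This is not Rcorrector's implementation: it counts k-mers with a plain Counter instead of building a De Bruijn graph, and the local rule (a k-mer is trusted if its count is at least a fixed fraction of the highest count among nearby k-mers in the same read) is a hypothetical stand-in chosen only for illustration. The function names, the `fraction` parameter, and the toy reads are all assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the input reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trusted_global(read, counts, k, threshold):
    """WGS-style rule: one count cutoff applied to every k-mer."""
    return [counts[read[i:i + k]] >= threshold
            for i in range(len(read) - k + 1)]


def trusted_local(read, counts, k, fraction=0.5):
    """Hypothetical local rule (not Rcorrector's formula): compare each
    k-mer's count with the highest count among k-mers within k positions
    of it in the same read."""
    per_pos = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    flags = []
    for i, c in enumerate(per_pos):
        neighbours = per_pos[max(0, i - k):i + k + 1]
        flags.append(c >= fraction * max(neighbours))
    return flags


# Toy data (invented): a highly expressed transcript, a lowly expressed
# transcript, and one copy of the first with a single substitution.
high = "ACGTACGGTC"
low = "TTGCAATGGA"
err = "ACGTTCGGTC"
reads = [high] * 20 + [low] * 2 + [err]
counts = count_kmers(reads, k=4)

print(trusted_global(low, counts, 4, threshold=3))  # low-coverage read, global cutoff
print(trusted_local(low, counts, 4))                # same read, local rule
print(trusted_local(err, counts, 4))                # read carrying the substitution
```

In this toy setting the global cutoff marks every k-mer of the lowly expressed read as untrusted, while the local rule keeps them and still flags the four k-mers that overlap the substitution.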

Additional details

Read the peer-reviewed publication(s):

doi:10.1186/s13742-015-0089-y

Additional information:

https://github.com/mourisl/Rcorrector/





Sample ID: Geuvadis
Taxonomic ID: 9606
Common Name: Human
Genbank Name: human
Scientific Name: Homo sapiens
Description: a lymphoblastoid cell line sequenced as part of the GEUVADIS population variation project, used in Rcorrector assessment
Alternative names: NA20508
Alternative accession-SRA File: ERR188021

Sample ID: Lung
Taxonomic ID: 9606
Common Name: Human
Genbank Name: human
Scientific Name: Homo sapiens
Description: a lung cancer cell line (HCC827/R2) used in Rcorrector assessment
Alternative names: HCC827/R2
Alternative accession-SRA File: SRR1062943

Sample ID: Peach
Taxonomic ID: 3760
Genbank Name: peach
Scientific Name: Prunus persica
Description: Plant RNA-seq data used in Rcorrector assessment
Alternative accession-SRA File: SRR531865

Sample ID: Simulated
Taxonomic ID: 9606
Common Name: Human
Genbank Name: human
Scientific Name: Homo sapiens
Description: 100 million x 100 bp long paired-end reads were generated with FluxSimulator starting from the human GENCODE v.17 gene annotations. Errors were subsequently introduced with Mason.

Sample ID: Single-cell
Taxonomic ID: 511145
Common Name: E.coli
Scientific Name: E.coli
Description: E. coli K-12, strain MG1655, single-cell sequencing based on the MDA (multiple displacement amplification) method; contains 29,124,078 100 bp reads http://bix.ucsd.edu/projects/singlecell/nbt_data.html
Relevant electronic resources: http://bix.ucsd.edu/projects/singlecell/nbt_data.html




File Name  Sample ID    Data Type               File Format  Size       Release Date
-          Single-cell  Text                    TEXT         0 KB       2015-09-04
-          -            Software                zip          978.12 KB  2015-09-04
-          -            Readme                  TEXT         2.67 KB    2015-09-04
-          Simulated    Transcriptome sequence  FASTQ        0 KB       2015-09-04
-          Simulated    Transcriptome sequence  FASTQ        0 KB       2015-09-04

Funding body                 Award ID     Awardee
National Science Foundation  ABI-1159078  Liliana Florea
National Science Foundation  ABI-1356078  Liliana Florea

Date Action
October 20, 2015 Dataset publish
October 21, 2015 File readme.txt updated
October 21, 2015 File Rcorrector-master.zip updated
December 4, 2015 Manuscript Link added : 10.1186/s13742-015-0089-y
May 5, 2020 readme.txt: additional file attribute added
May 5, 2020 File readme_100171.txt updated
May 5, 2020 File simulate_pair_100M_read1.fq.gz updated
May 5, 2020 File E.coli_single_cell_lane1.txt updated
May 5, 2020 File simulate_pair_100M_read2.fq.gz updated