Data released on October 20, 2015

Software and exemplar data for Rcorrector.

Florea, L; Song, L (2015): Software and exemplar data for Rcorrector. GigaScience Database. http://dx.doi.org/10.5524/100171

Next generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, due to the variation in gene expression levels and alternative splicing.
We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which employ a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read.
The software as published is available directly from this page; for the most up-to-date version, please see the project GitHub repository: https://github.com/mourisl/Rcorrector/
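
As a rough illustration of the contrast drawn in the description, the sketch below compares a single global k-mer count threshold with a threshold derived from neighbouring positions in the same read. It is not Rcorrector's algorithm: it uses a plain counting table rather than a De Bruijn graph, and the function names, the `fraction` and `window` parameters, and the toy reads are all assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_counts_along(read, counts, k):
    """Return the count of the k-mer starting at each position of the read."""
    return [counts[read[i:i + k]] for i in range(len(read) - k + 1)]


def global_trusted(read, counts, k, threshold):
    """WGS-style rule: one fixed count threshold for every k-mer."""
    return [c >= threshold for c in kmer_counts_along(read, counts, k)]


def local_trusted(read, counts, k, fraction=0.5, window=2):
    """Hypothetical local rule: a k-mer is trusted if its count is at least
    `fraction` of the highest count seen within `window` positions of it."""
    along = kmer_counts_along(read, counts, k)
    flags = []
    for i, c in enumerate(along):
        lo, hi = max(0, i - window), min(len(along), i + window + 1)
        flags.append(c >= fraction * max(along[lo:hi]))
    return flags


# Toy data: a highly expressed transcript, a lowly expressed one,
# and one read from the first transcript carrying a substitution error.
K = 4
high = "ACGTACGGTCA"
low = "TTGACCGATAG"
erroneous = "ACGTAAGGTCA"  # C -> A at the sixth base
reads = [high] * 20 + [low] * 2 + [erroneous]
counts = count_kmers(reads, K)

# A global threshold of 3 rejects every k-mer of the lowly expressed read.
low_global = global_trusted(low, counts, K, threshold=3)
# The local rule keeps them, because all neighbouring counts are equally low.
low_local = local_trusted(low, counts, K)
# In the erroneous read, only the k-mers spanning the error are flagged.
err_local = local_trusted(erroneous, counts, K)
```

With these toy reads, the global rule marks all eight k-mers of the lowly expressed read as untrusted, while the local rule trusts all of them and flags only the four k-mers that overlap the substitution in the erroneous read. This is the kind of behaviour that motivates position-specific thresholds when expression levels vary widely.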

Related manuscripts:

doi:10.1186/s13742-015-0089-y

Additional information:

https://github.com/mourisl/Rcorrector/

Keywords:

Next generation sequencing, RNA-seq, Error correction, k-mers, Rcorrector

Software, Transcriptomic

Funding:

  • Funding body - National Science Foundation
  • Award ID - ABI-1159078
  • Comment - Liliana Florea
  • Funding body - National Science Foundation
  • Award ID - ABI-1356078
  • Comment - Liliana Florea

Samples:

  • Sample ID - Geuvadis
  • Taxonomic ID - 9606
  • Common name - Human
  • Genbank name - human
  • Scientific name - Homo sapiens
  • Description - a lymphoblastoid cell line sequenced as part of the GEUVADIS population variation project, used in Rcorrector assessment
  • Alternative names - NA20508
  • Alternative accession (SRA file) - ERR188021

  • Sample ID - Lung
  • Taxonomic ID - 9606
  • Common name - Human
  • Genbank name - human
  • Scientific name - Homo sapiens
  • Description - a lung cancer cell line (HCC827/R2) used in Rcorrector assessment
  • Alternative names - HCC827/R2
  • Alternative accession (SRA file) - SRR1062943

  • Sample ID - Peach
  • Taxonomic ID - 3760
  • Genbank name - peach
  • Scientific name - Prunus persica
  • Description - Plant RNA-seq data used in Rcorrector assessment
  • Alternative accession (SRA file) - SRR531865

  • Sample ID - Simulated
  • Taxonomic ID - 9606
  • Common name - Human
  • Genbank name - human
  • Scientific name - Homo sapiens
  • Description - 100 million 100 bp paired-end reads were generated with FluxSimulator starting from the human GENCODE v.17 gene annotations. Errors were subsequently introduced with Mason.

  • Sample ID - Single-cell
  • Taxonomic ID - 511145
  • Organism - E. coli
  • Description - E. coli K-12, strain MG1655, single-cell sequencing based on the MDA (multiple displacement amplification) method; contains 29,124,078 100 bp reads
  • Relevant electronic resources - http://bix.ucsd.edu/projects/singlecell/nbt_data.html

Files:

Sample ID    File Type               File Format  Size       Release Date
Single-cell  Text                    TEXT         -0 KB      2015-09-04
             Software                zip          978.12 KB  2015-09-04
             Readme                  TEXT         -0 KB      2015-09-04
Simulated    Transcriptome sequence  FASTQ        -0 KB      2015-09-04
Simulated    Transcriptome sequence  FASTQ        -0 KB      2015-09-04
