Software and exemplar data for Rcorrector.
Dataset type: Software, Transcriptomic
Data released on October 20, 2015
Song L; Florea L (2015): Software and exemplar data for Rcorrector. GigaScience Database. http://dx.doi.org/10.5524/100171
Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads because of the variation in gene expression levels and alternative splicing: under a single k-mer count cutoff, genuine k-mers from weakly expressed genes or minor isoforms can be mistaken for errors.
We developed a k-mer-based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a de Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which apply a single global threshold to decide which k-mers are trusted, Rcorrector computes a local threshold at every position in a read.
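The difference between a global and a per-position threshold can be illustrated with a small Python sketch. This is not Rcorrector's code or its threshold formula: the sibling-based rule (a k-mer is compared against the strongest k-mer sharing its (k-1)-prefix), the `ratio` of 0.1, the toy k = 5, the `Counter`-based graph and all function names are hypothetical choices made for illustration only.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer in the reads.

    Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix in a
    de Bruijn graph, so the counter implicitly stores the graph together
    with edge multiplicities (a real tool would use a compact structure).
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def global_trusted(kmer, counts, min_count):
    """WGS-style rule: one count cutoff shared by every k-mer."""
    return counts[kmer] >= min_count


def local_threshold(kmer, counts, ratio):
    """Hypothetical local rule: ratio * the highest count among the sibling
    k-mers that share this k-mer's (k-1)-prefix (its alternative out-edges)."""
    prefix = kmer[:-1]
    return ratio * max(counts[prefix + b] for b in BASES)


def correct_read(read, counts, k, ratio=0.1):
    """Greedy left-to-right pass: when a k-mer falls below its local
    threshold, replace its last base with the strongest sibling's base.
    Errors within the first k-1 bases are not handled in this sketch."""
    seq = list(read)
    for i in range(len(seq) - k + 1):
        kmer = "".join(seq[i:i + k])
        if counts[kmer] < local_threshold(kmer, counts, ratio):
            prefix = kmer[:-1]
            seq[i + k - 1] = max(BASES, key=lambda b: counts[prefix + b])
    return "".join(seq)


# Toy data (hypothetical): a highly expressed transcript sequenced 50 times,
# one copy carrying an A->G substitution at index 7, and a transcript
# seen only once.
k = 5
high = "ACGTTGCATGCA"
error_read = "ACGTTGCGTGCA"
low = "TTTAAACCCGGG"
counts = count_kmers([high] * 50 + [error_read, low], k)

# A global cutoff of 2 rejects the error k-mer but also the genuine
# low-expression k-mer; a cutoff of 1 would accept both.
print(global_trusted("TTGCG", counts, 2), global_trusted("AACCC", counts, 2))  # False False
# The local rule fixes the error and leaves the rare transcript intact.
print(correct_read(error_read, counts, k) == high)  # True
print(correct_read(low, counts, k) == low)  # True
```

Both the error k-mer and the rare transcript's k-mers occur exactly once, so no single global cutoff separates them. The local rule does, because it judges each k-mer against its alternatives at the same graph position (count 1 against a sibling seen 50 times) rather than against the whole dataset.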
The software as published is available directly from this dataset; for the most up-to-date version, see the project's GitHub repository: https://github.com/mourisl/Rcorrector/
Additional details
Additional information:
Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes |
---|---|---|---|---|---|
Geuvadis | 9606 | Human | human | Homo sapiens | Description: a lymphoblastoid cell line sequenced as part of the GEUVADIS population variation project, used in the Rcorrector assessment; Alternative names: NA20508; Alternative accession (SRA): ERR188021 |
Lung | 9606 | Human | human | Homo sapiens | Description: a lung cancer cell line (HCC827/R2) used in the Rcorrector assessment; Alternative names: HCC827/R2; Alternative accession (SRA): SRR1062943 |
Peach | 3760 | Peach | peach | Prunus persica | Description: plant RNA-seq data used in the Rcorrector assessment; Alternative accession (SRA): SRR531865 |
Simulated | 9606 | Human | human | Homo sapiens | Description: 100 million 100 bp paired-end reads generated with FluxSimulator from the human GENCODE v.17 gene annotations; errors were subsequently introduced with Mason. |
Single-cell | 511145 | E. coli | E. coli | Escherichia coli str. K-12 substr. MG1655 | Description: E. coli K-12, strain MG1655, single-cell sequencing based on the MDA (multiple displacement amplification) method; contains 29,124,078 100 bp reads; Relevant electronic resources: http://bix.ucsd.edu/projects/singlecell/nbt_data.html |