Supporting data for "Leveraging multiple transcriptome assembly methods for improved gene structure annotation"

Dataset type: Software, Transcriptomic
Data released on July 04, 2018

Venturini L; Caim S; Kaithakottil GG; Mapleson DL; Swarbreck D (2018): Supporting data for "Leveraging multiple transcriptome assembly methods for improved gene structure annotation" GigaScience Database. http://dx.doi.org/10.5524/100464

DOI: 10.5524/100464

The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.
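
As a rough illustration of what "select the best transcript models according to user-specified metrics" could look like, here is a minimal Python sketch that scores toy candidates with a weighted sum and keeps the top-scoring model per locus. It is not Mikado's implementation: the record layout, the function name pick_best, the metric names, the weights and the transcript IDs are all hypothetical, and redundancy removal and chimera splitting are not modelled.

```python
# Toy sketch only: a weighted-sum score per transcript, best model kept per locus.
# All names and values below are hypothetical and do not reflect Mikado's API.

def pick_best(transcripts, weights):
    """Return {locus: transcript_id} for the highest-scoring model at each locus."""
    best = {}
    for tid, locus, metrics in transcripts:
        # Metrics without a user-specified weight contribute nothing to the score.
        score = sum(weights.get(name, 0.0) * value for name, value in metrics.items())
        if locus not in best or score > best[locus][1]:
            best[locus] = (tid, score)
    return {locus: tid for locus, (tid, _score) in best.items()}


# Hypothetical candidates from three assemblers, with metrics normalised to 0..1.
candidates = [
    ("stringtie.1", "locusA", {"cdna_length": 0.9, "exon_num": 0.5}),
    ("cufflinks.1", "locusA", {"cdna_length": 0.6, "exon_num": 1.0}),
    ("trinity.1", "locusB", {"cdna_length": 0.4, "exon_num": 0.2}),
]

# Hypothetical user-specified weights.
weights = {"cdna_length": 1.0, "exon_num": 0.5}

selected = pick_best(candidates, weights)
print(selected)
```

With these toy weights, stringtie.1 scores 1.15 against 1.10 for cufflinks.1, so it is kept for locusA, and trinity.1 is kept for locusB as the only candidate there. Changing the weights changes which model wins, which is the sense in which the selection is driven by user-specified metrics.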

Additional details

Additional information:

https://github.com/lucventurini/mikado

https://github.com/lucventurini/mikado-analysis

Accessions (data included in GigaDB):

BioProject: PRJEB22606

Accessions (data not in GigaDB):

BioProject: PRJEB4208
BioProject: PRJEB7093





Samples (displaying 1-10 of 25):

Sample ID: SAMEA1969505
Taxonomic ID: 7227
Genbank Name: fruit fly
Scientific Name: Drosophila melanogaster
Description: RNA-seq transcript assembly evaluation...
Alternative names: fruit fly
Alternative accession-BioSample: ERS317061

Sample ID: SAMEA2144177
Taxonomic ID: 6239
Common Name: roundworm
Scientific Name: Caenorhabditis elegans
Description: RNA-seq transcript assembly evaluation...
Alternative names: roundworm
Alternative accession-BioSample: ERS317066

Sample ID: SAMEA2145518
Taxonomic ID: 7227
Genbank Name: fruit fly
Scientific Name: Drosophila melanogaster
Description: RNA-seq transcript assembly evaluation...
Alternative names: fruit fly
Alternative accession-BioSample: ERS317058

Sample ID: SAMEA2149758
Taxonomic ID: 6239
Common Name: roundworm
Scientific Name: Caenorhabditis elegans
Description: RNA-seq transcript sequences of Strata...
Alternative names: human
Alternative accession-BioSample: SAM20465

Sample ID: SAMEA2152217
Taxonomic ID: 7227
Genbank Name: fruit fly
Scientific Name: Drosophila melanogaster
Description: RNA-seq transcript assembly evaluation...
Alternative names: human
Alternative accession-BioSample: ERS317065

Sample ID: SAMEA2152280
Taxonomic ID: 9606
Common Name: Human
Genbank Name: human
Scientific Name: Homo sapiens
Description: RNA-seq transcript assembly evaluation...
Alternative names: roundworm
Alternative accession-BioSample: ERS317067

Sample ID: SAMEA2152327
Taxonomic ID: 7227
Genbank Name: fruit fly
Scientific Name: Drosophila melanogaster
Description: RNA-seq transcript assembly evaluation...
Alternative names: fruit fly
Alternative accession-BioSample: ERS317060

Sample ID: SAMEA2157650
Taxonomic ID: 9606
Common Name: Human
Genbank Name: human
Scientific Name: Homo sapiens
Description: RNA-seq transcript assembly evaluation...
Alternative names: human
Alternative accession-BioSample: ERS317057

Sample ID: SAMEA2159595
Taxonomic ID: 9606
Common Name: Human
Genbank Name: human
Scientific Name: Homo sapiens
Description: RNA-seq transcript assembly evaluation...
Alternative names: human
Alternative accession-BioSample: ERS317062

Sample ID: SAMEA2161067
Taxonomic ID: 6239
Common Name: roundworm
Scientific Name: Caenorhabditis elegans
Description: RNA-seq transcript assembly evaluation...
Alternative names: roundworm
Alternative accession-BioSample: ERS317064




Files (displaying 11-18 of 18):

Data Type               File Format  Size      Release Date
Transcriptome sequence  archive      93.9 MB   2018-06-26
Transcriptome sequence  archive      65.02 MB  2018-06-26
GitHub archive          archive      17.22 MB  2018-06-26
GitHub archive          archive      30.27 MB  2018-06-26
Expression data         archive      104.1 MB  2018-06-26
Expression data         archive      137.9 MB  2018-06-26
Expression data         archive      73.74 MB  2018-06-26
Readme                  TEXT         5.42 KB   2018-06-26

Funding body: Biotechnology and Biological Sciences Research Council
Awardee: Federica Di Palma
Award ID: BB/CSP1720/1
Comments: Core Strategic Programme Grant

Funding body: Biotechnology and Biological Sciences Research Council
Awardee: Neil Hall
Award ID: BB/CCG1720/1
Comments: Capability in Genomics and Single Cell

Funding body: Biotechnology and Biological Sciences Research Council
Awardee: Ksenia Krasileva
Award ID: BB/J003743/1
Comments: Strategic LOLA Award

Date: July 4, 2018
Action: Dataset publish