Supporting data for "Leveraging multiple transcriptome assembly methods for improved gene structure annotation"

Dataset type: Software, Transcriptomic
Data released on July 04, 2018

Venturini L; Caim S; Kaithakottil GG; Mapleson DL; Swarbreck D (2018): Supporting data for "Leveraging multiple transcriptome assembly methods for improved gene structure annotation" GigaScience Database. http://dx.doi.org/10.5524/100464

DOI: 10.5524/100464

The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.
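The idea of selecting the best transcript model per locus according to user-specified metrics can be illustrated with a toy sketch. This is not Mikado's implementation or API: the function name `pick_best_models`, the record keys (`id`, `locus`, `metrics`) and the metric names are all hypothetical, and the scoring is assumed to be a simple weighted sum.

```python
def pick_best_models(transcripts, weights):
    """Return the id of the highest-scoring transcript for each locus.

    transcripts: list of dicts with hypothetical keys 'id', 'locus' and
                 'metrics' (a dict mapping metric name to a number).
    weights:     user-specified metric weights, e.g. {'cds_length': 1.0}.
    """
    best = {}  # locus -> (score, transcript id)
    for tx in transcripts:
        # Weighted sum over the metrics the user cares about;
        # metrics missing from a transcript count as 0.
        score = sum(w * tx["metrics"].get(name, 0.0) for name, w in weights.items())
        current = best.get(tx["locus"])
        if current is None or score > current[0]:
            best[tx["locus"]] = (score, tx["id"])
    # Keeping one model per locus also discards redundant alternatives.
    return {locus: tx_id for locus, (_, tx_id) in best.items()}


# Illustrative input: models for the same loci from two assemblers (made-up values).
models = [
    {"id": "asm1.1", "locus": "locusA", "metrics": {"cds_length": 300}},
    {"id": "asm2.1", "locus": "locusA", "metrics": {"cds_length": 900}},
    {"id": "asm1.2", "locus": "locusB", "metrics": {"cds_length": 150}},
]
print(pick_best_models(models, {"cds_length": 1.0}))
```

Changing the weights changes which model wins, which is the sense in which the selection is driven by user-specified metrics in this sketch.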

Additional details

Additional information:

https://github.com/lucventurini/mikado

https://github.com/lucventurini/mikado-analysis

Accessions (data included in GigaDB):

BioProject: PRJEB22606

Accessions (data not in GigaDB):

BioProject: PRJEB4208
BioProject: PRJEB7093





Sample ID | Taxonomic ID | Common Name | Genbank Name | Scientific Name | Sample Attributes
SAMEA2725016 | 3702 | mouse-ear cress | thale cress | Arabidopsis thaliana | Description: Strand-specific RNA-Seq dataset for A....; Alternative names: thale cress; Alternative accession-BioSample: ERS533777; ...
SAMEA2725017 | 3702 | mouse-ear cress | thale cress | Arabidopsis thaliana | Description: Strand-specific RNA-Seq dataset for A....; Alternative names: thale cress; Alternative accession-BioSample: ERS533779; ...
SAMEA2725018 | 3702 | mouse-ear cress | thale cress | Arabidopsis thaliana | Description: Strand-specific RNA-Seq dataset for A....; Alternative names: thale cress; Alternative accession-BioSample: ERS533778; ...
SAMEA2725019 | 3702 | mouse-ear cress | thale cress | Arabidopsis thaliana | Description: Strand-specific RNA-Seq dataset for A....; Alternative names: thale cress; Alternative accession-BioSample: ERS533776; ...
SAM20465 | 9606 | Human | human | Homo sapiens | Description: RNA-seq transcript sequences from 2700...; Alternative names: human; Sample source: 10x Genomics; ...
Displaying 21-25 of 25 Sample(s).




File Name | Sample ID | Data Type | File Format | Size | Release Date
 |  | Expression data | archive | 93.84 MB | 2018-06-26
 |  | Expression data | archive | 70.03 MB | 2018-06-26
 |  | Expression data | archive | 182.45 MB | 2018-06-26
 |  | Expression data | archive | 111.4 MB | 2018-06-26
 |  | Expression data | archive | 168.37 MB | 2018-06-26
 |  | Alignments | archive | 129.39 MB | 2018-06-26
 |  | Expression data | archive | 608.81 MB | 2018-06-26
 |  | Transcriptome sequence | archive | 86.68 MB | 2018-06-26
 |  | Transcriptome sequence | archive | 185.12 MB | 2018-06-26
 |  | Alignments | archive | 53.72 MB | 2018-06-26
Displaying 1-10 of 18 File(s).
Funding body | Awardee | Award ID | Comments
Biotechnology and Biological Sciences Research Council | Federica Di Palma | BB/CSP1720/1 | Core Strategic Programme Grant
Biotechnology and Biological Sciences Research Council | Neil Hall | BB/CCG1720/1 | Capability in Genomics and Single Cell
Biotechnology and Biological Sciences Research Council | Ksenia Krasileva | BB/J003743/1 | Strategic LOLA Award

Date | Action
July 4, 2018 | Dataset publish