Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts"

Dataset type: Transcriptomic
Data released on August 16, 2017

Cheng B; Furtado A; Henry R (2017): Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts" GigaScience Database. http://dx.doi.org/10.5524/100340

DOI: 10.5524/100340

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio isoform sequencing (Iso-Seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis.
An isoform-level tetraploid coffee bean reference transcriptome with 95,995 distinct transcripts (average 3,236 bp) was obtained. A total of 88,715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34,719 high-quality annotations. Further BLASTn searches against NCBI non-redundant nucleotide sequences, C. canephora coding sequences with UTRs, C. arabica ESTs and Rfam left 1,213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5' UTRs, facilitating the identification of upstream ORFs (uORFs). The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated.
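The summary figures above (transcript count, average length, number of sequences longer than 10 kb, percentage annotated) can be recomputed from a FASTA file. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and it assumes that a file such as coffee_LRS_isoforms.fa.gz (named in the dataset history) holds the isoform sequences, which this record does not state explicitly.

```python
import gzip
import sys


def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = []
    current = None  # length of the record being read; None before the first header
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths, long_cutoff=10_000):
    """Count sequences, average their length, and count those above long_cutoff (bp)."""
    n = len(lengths)
    return {
        "count": n,
        "mean_length": sum(lengths) / n if n else 0.0,
        "longer_than_cutoff": sum(1 for x in lengths if x > long_cutoff),
    }


def percent(part, total):
    """Percentage of total represented by part."""
    return 100.0 * part / total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python fasta_stats.py coffee_LRS_isoforms.fa.gz
    print(summarize(read_fasta_lengths(sys.argv[1])))
    # Annotated fraction, using the two counts quoted in the abstract.
    print(round(percent(88_715, 95_995), 2))
```

The 10,000 bp default mirrors the ">10 kb" threshold used in the abstract; whether the result matches the 577 long sequences named in the file list depends on which file is supplied.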

Additional details

Read the peer-reviewed publication(s):

Cheng, B., Furtado, A., & Henry, R. J. (2017). Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience, 6(11). doi:10.1093/gigascience/gix086

Additional information:

https://github.com/chengbing0404/BLAST5_result_handle

Accessions (data included in GigaDB):

BioProject: PRJEB19262





Sample ID: Coffea arabica var K7
Taxonomic ID: 13443
Common Name: arabica coffee
Genbank Name: coffee
Scientific Name: Coffea arabica
Sample Attributes:
Description: A long read transcriptome of developin...
Infra specific name: variety: var K7
Analyte type: RNA
...




File Name  Sample ID  Data Type               File Format  Size       Release Date
-          -          Coding Sequence         FASTA        2.3 MB     2017-08-09
-          -          Coding Sequence         FASTA        34.69 KB   2017-08-09
-          -          transcriptome sequence  FASTA        7.35 MB    2017-08-09
-          -          Text                    TEXT         18.88 MB   2017-08-09
-          -          GitHub archive          archive      1.93 KB    2017-08-09
-          -          transcriptome sequence  FASTA        93.37 MB   2017-08-09
-          -          transcriptome sequence  FASTA        95.29 MB   2017-08-09
-          -          Text                    TEXT         1.62 KB    2017-08-09
-          -          Text                    TEXT         405.62 KB  2017-08-09
-          -          Readme                  TEXT         3.78 KB    2017-08-09
Funding body                 Awardee     Award ID     Comments
Australian Research Council  R Henry     LP130100376  Understanding Coffee Quality
Chinese Scholarship Council  Bing Cheng  -            Study abroad

Date Action
August 16, 2017 Dataset publish
August 17, 2017 File 1213_novel_genes.fa updated
August 17, 2017 File 25_BLAST_putative_genes_encoding_caffeine_biosynthesis_pathway.fa updated
August 17, 2017 File 577_long_sequences_more_than_10kb.fasta updated
August 17, 2017 File BLAST2GO_annotation.txt updated
August 17, 2017 File coffee_LRS_isoforms.fa.gz updated
August 17, 2017 File Iso-seq_Raw_reads.fa.gz updated
August 17, 2017 File KEGG_results_of_coffee-LRS_isoforms_sequences.txt updated
August 17, 2017 File KEGG_results_of_577_long_sequences_more_than_10kb.txt updated
August 17, 2017 File Long_non-coding_RNAs.fasta removed
August 17, 2017 File BLAST2GO_results.b2g removed
August 17, 2017 File InterProScan_output.txt removed
August 17, 2017 File BLAST_OUTPUT.gz removed
August 17, 2017 File BLAST2GO_annotation.txt updated
October 17, 2017 Manuscript Link added : 10.1093/gigascience/gix086