Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts"
Dataset type: Transcriptomic
Data released on August 16, 2017
Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-Seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis.
An isoform-level tetraploid coffee bean reference transcriptome with 95,995 distinct transcripts (average 3,236 bp) was obtained. A total of 88,715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34,719 high-quality annotations. Further BLASTn searches against NCBI non-redundant nucleotide sequences, C. canephora coding sequences with UTR, C. arabica ESTs and Rfam left 1,213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5' UTRs, facilitating the identification of upstream ORFs (uORFs). The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated.
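The annotation figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the published analysis: the counts are copied from the summary, while the variable names and the `percent` helper are illustrative assumptions.

```python
# Counts copied from the dataset summary; names are illustrative only.
TOTAL_TRANSCRIPTS = 95_995   # distinct transcripts in the reference
BLASTX_ANNOTATED = 88_715    # sequences with a BLASTx annotation
NO_HIT = 1_213               # sequences without hits after the BLASTn searches


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


annotated_pct = percent(BLASTX_ANNOTATED, TOTAL_TRANSCRIPTS)
unannotated_by_blastx = TOTAL_TRANSCRIPTS - BLASTX_ANNOTATED
no_hit_pct = percent(NO_HIT, TOTAL_TRANSCRIPTS)

print(f"BLASTx-annotated: {annotated_pct}%")             # matches the reported 92.42%
print(f"Not annotated by BLASTx: {unannotated_by_blastx}")
print(f"No hits in any search: {no_hit_pct}%")
```

Under these assumptions, the reported 92.42% corresponds to 88,715 of 95,995 transcripts, which leaves 7,280 sequences without a BLASTx annotation before the additional BLASTn searches.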
Read the peer-reviewed publication(s):
Cheng, B., Furtado, A., & Henry, R. J. (2017). Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. GigaScience, 6(11), 1–13. doi:10.1093/gigascience/gix086
Accessions (data included in GigaDB):
|Sample ID||Taxonomic ID||Common Name||Genbank Name||Scientific Name||Sample Attributes|
|Coffea arabica var K7||13443||arabica coffee||coffee||Coffea arabica|| Description:A long read transcriptome of developing coffee bean
Infra specific name:variety:var K7
Geographic location (latitude and longitude):-28.662736, 153.463001
Geographic location (country and/or sea,region):330 Federal Road, Federal NSW, Australia
Collection date:11/08/2015 10am to 2pm
Sample collection device or method:scalpels and blades (cut), liquid nitrogen (freeze), dry ice (transport), -80°C (storage)
Sample material processing:Pulverised with Tissue lyser; total RNA was extracted, a cDNA library was prepared, and the generated sequences were processed with the RS IsoSeq (version 2.3) pipeline and SOPs
Amount or size of sample collected:450 coffee cherries (900 beans)
Sample storage duration:17
Sample storage location:306 Carmody Road, St Lucia, The University of Queensland, Brisbane, Queensland, Australia
Sample storage temperature:-80°C
Plant body site:seed [PO:0009010]
Pooling details:All RNAs were pooled in equal amounts before library preparation.
Environment (biome):plantation [ENVO:00000117]
Environment (feature):area of cropland [ENVO:01000892]
Tissue:perisperm [PO:0020058], seed coat [PO:0030103], endosperm [PO:0009089], embryo [PO:0009009]
Sample source:coffee farm/Green Cauldron coffee
Sample contact:Bing Cheng, firstname.lastname@example.org
Collected by:Bing Cheng, Agnelo Furtado, Robert Henry, Poss Reading, Marta Brozynska, Adam Healey, Tiparat Tikapunya and Hayba Badro