Data released on August 16, 2017

Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts"

Cheng, B; Furtado, A; Henry, R (2017): Supporting data for "Long-read sequencing of the coffee bean transcriptome reveals the diversity of full length transcripts" GigaScience Database. http://dx.doi.org/10.5524/100340

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform Sequencing (Iso-Seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis.
An isoform-level tetraploid coffee bean reference transcriptome with 95,995 distinct transcripts (average 3,236 bp) was obtained. A total of 88,715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34,719 high-quality annotations. Further BLASTn searches against NCBI non-redundant nucleotide sequences, C. canephora coding sequences with UTR, C. arabica ESTs and Rfam left 1,213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially 5' UTRs, facilitating the identification of upstream ORFs (uORFs). The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated.
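
The summary figures quoted above (transcript count, average length, the share of sequences longer than 10 kb, and the annotated fraction) are the kind of statistics that can be recomputed from a FASTA file. The following is a minimal sketch in plain Python, not the authors' pipeline; the function names `fasta_lengths` and `summarize` and the toy records are hypothetical and introduced only for illustration.

```python
from statistics import mean


def fasta_lengths(lines):
    """Yield (record_id, sequence_length) for each record in FASTA-formatted lines."""
    record_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, length
            header = line[1:].split()
            record_id = header[0] if header else ""
            length = 0
        elif record_id is not None:
            # Sequence lines may be wrapped, so lengths are accumulated.
            length += len(line)
    if record_id is not None:
        yield record_id, length


def summarize(lines):
    """Return transcript count, mean length and number of sequences over 10 kb."""
    lengths = [n for _, n in fasta_lengths(lines)]
    return {
        "transcripts": len(lengths),
        "mean_length_bp": mean(lengths) if lengths else 0,
        "longer_than_10kb": sum(n > 10_000 for n in lengths),
    }


# Toy input (hypothetical records, not taken from the dataset).
example = [">t1 demo", "ACGT", "ACG", ">t2", "AC"]
print(summarize(example))

# Annotated fraction, using the counts reported in the abstract.
print(f"{88_715 / 95_995:.2%}")
```

Because `summarize` accepts any iterable of lines, an open file handle for one of the transcriptome FASTA files in this dataset could be passed in directly once downloaded.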

Related manuscripts:

doi:10.1093/gigascience/gix086

Additional information:

https://github.com/chengbing0404/BLAST5_result_handle

Accessions (data included in GigaDB):

BioProject: PRJEB19262

Keywords:

coffee, transcriptome, full-length cDNA, long sequences, isoform, polyploid, UTR

Transcriptomic

Funding:

  • Funding body - Australian Research Council
  • Award ID - LP130100376
  • Comment - Understanding Coffee Quality
  • Awardee - R Henry
  • Funding body - Chinese Scholarship Council
  • Location - China
  • Comment - Study abroad
  • Awardee - Bing Cheng

Samples:

  • Sample ID - Coffea arabica var K7
  • Taxonomic ID - 13443
  • Common Name - arabica coffee
  • Genbank Name - coffee
  • Scientific Name - Coffea arabica
  • Sample Attributes - Description: A long read transcriptome of developin...; Infra_specific_name: variety: var K7; Analyte type: RNA; ...

Files:

File Type               File Format  Size       Release Date
Coding Sequence         FASTA        2.3 MB     2017-08-09
Coding Sequence         FASTA        34.69 KB   2017-08-09
transcriptome sequence  FASTA        7.35 MB    2017-08-09
Text                    TEXT         18.88 MB   2017-08-09
GitHub archive          archive      1.93 KB    2017-08-09
transcriptome sequence  FASTA        93.37 MB   2017-08-09
transcriptome sequence  FASTA        95.29 MB   2017-08-09
Text                    TEXT         1.62 KB    2017-08-09
Text                    TEXT         405.62 KB  2017-08-09
Readme                  TEXT         3.78 KB    2017-08-09
