Supporting data for "Assembly of the 373K gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world’s leading biomass crop"

Dataset type: Genomic, Transcriptomic
Data released on October 16, 2019

Souza GM; Van Sluys MA; Lembke CG; Lee H; Rodrigues Alves Margarido G; Hotta CT; Gaiarsa JW; Diniz AL; Oliveira MM; Ferreira SS; Nishiyama-Jr MY; ten-Caten F; Ragagnin GT; Morais Andrade Pd; de Souza RF; Nicastro GG; Pandya R; Kim C; Guo H; Durham AM; Carneiro MS; Zhang J; Zhang X; Zhang Q; Ming R; Schatz MC; Davidson B; Paterson AH; Heckerman D (2019): Supporting data for "Assembly of the 373K gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world’s leading biomass crop" GigaScience Database. http://dx.doi.org/10.5524/100655

DOI: 10.5524/100655

Sugarcane cultivars are polyploid interspecific hybrids with giant genomes, typically carrying 10-13 sets of chromosomes from two Saccharum species. The ploidy, hybridity and size of the genome, estimated to exceed 10 Gb, pose a great challenge for sequencing.
Here we present a gene space assembly of SP80-3280, including 373,869 putative genes and their potential regulatory regions. The alignment of single-copy genes from diploid grasses to the putative genes indicates that we could resolve 2-6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), two gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis that are not found in the ancestor S. spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison to sorghum reveals predominantly biallelic variations in sugarcane, consistent with the formation of two ‘subgenomes’ after their divergence ca. 3.8-4.6 million years ago (MYA), and reveals SNVs that may underlie their differences.
This assembly represents a large step towards a whole genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improving biomass and food production.

Additional details

Read the peer-reviewed publication(s):

(PubMed: 31782791)

Github links:

https://github.com/sp80-3280-genome

Additional information:

http://sucest-fun.org/cgi-bin/cane_regnet/gbrowse2/gbrowse/microsoft_genome_moleculo_scga7/

Accessions (data generated as part of this study):

BioProject: PRJNA431722
GEO: GSE124990





Sample ID: SP803280_300413
Taxonomic ID: 193079
Scientific Name: Saccharum hybrid cultivar SP80-3280
Sample Attributes:
Description: DNA extracted from the leaves of Sacch...
Plant body site: leaf [PO:PO:0025034]
Alternative accession-BioProject: PRJNA431722
...
Displaying 1-1 of 1 Sample(s).




Data Type          File Format  Size        Release Date
GitHub archive     archive      7.38 KB     2019-09-26
GitHub archive     archive      3.51 MB     2019-09-26
GitHub archive     archive      2.79 KB     2019-09-26
Other              TEXT         66.33 KB    2019-09-26
Alignments         UNKNOWN      513.59 KB   2019-09-26
Phylogenetic tree  UNKNOWN      533.33 KB   2019-09-26
Phylogenetic tree  UNKNOWN      4.54 KB     2019-09-26
Tabular data       TSV          12.96 MB    2019-09-26
Tabular data       TSV          18.58 MB    2019-09-26
Readme             TEXT         6.62 KB     2019-10-16
Displaying 1-10 of 36 File(s).
Funding body Awardee Award ID Comments
São Paulo Research Foundation G M Souza grant #2012/51062-3 BIOEN
São Paulo Research Foundation G M Souza grant #2008/52146-0 BIOEN
São Paulo Research Foundation G M Souza grant #2014/50921-8 BIOEN
National Science Foundation M Schatz DBI-1350041 ADVANCES IN BIO INFORMATICS
National Science Foundation A Paterson IOS/0115903 Plant Genome Research Project
National Institutes of Health M Schatz R01-HG006677 NHGRI
São Paulo Research Foundation M A Van Sluys grant #2008/52074-0 BIOEN
São Paulo Research Foundation R M Cesar Junior grant #2011/50761-2
Brazilian National Council for Scientific and Technological Development M A Van Sluys 308197/2010-0
São Paulo Research Foundation G R A Margarido grant #2015/22993-7
Brazilian National Council for Scientific and Technological Development J Weissmann Gaiarsa 159094/2014-3
São Paulo Research Foundation J Weissmann Gaiarsa grant #2015/15346-5
São Paulo Research Foundation J Weissmann Gaiarsa grant #2013/18322-4
São Paulo Research Foundation A L Diniz grant #2017/02270-6
São Paulo Research Foundation S S Ferreira grant #2013/23048-9
CAPES M M Oliveira DS-1454337
São Paulo Research Foundation S S Ferreira grant #2016/06917-1
São Paulo Research Foundation M Y Nishiyama-Jr grant #2013/07467-1
São Paulo Research Foundation F ten Caten grant #2017/02842-0
Brazilian National Council for Scientific and Technological Development A M Durham 309566/2015-0
National Institutes of Health M Schatz R01-HG006677
Date Action
October 16, 2019 Dataset publish
November 14, 2019 Manuscript Link added : 10.1093/gigascience/giz129
October 14, 2022 Manuscript Link updated : 10.1093/gigascience/giz129
November 15, 2022 Data type for File RNASeqTranscriptomeAssembly.fa.gz.fai updated
November 15, 2022 Data type for File RNASeqTranscriptomeAssembly.fa.gz.gzi updated
November 15, 2022 Data type for File RNASeqTranscriptomeAssembly.fa.gz updated
November 15, 2022 Data type for File proteins_SP803280_X_Spont_Cluters_MCA_top_hit_query_coverage_90.txt updated
November 15, 2022 Data type for File proteins_SP803280_X_R570_Cluters_MCA_top_hit_query_coverage_90.txt updated
November 15, 2022 Data type for File rna-seq_gene_expression_levels.tar.gz updated
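
The history above lists RNASeqTranscriptomeAssembly.fa.gz together with .fai and .gzi companions. Those extensions normally denote the index files that samtools faidx writes for a bgzip-compressed FASTA, although this record does not state how they were produced. Assuming the .fai file follows the standard five-column, tab-separated layout (name, length, offset, bases per line, bytes per line), the following minimal Python sketch shows one way to summarise the assembly from the index alone, without decompressing the sequences. The function names and the three example index lines are hypothetical and are not taken from the dataset.

```python
import io
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str        # sequence identifier
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base (uncompressed coordinates)
    line_bases: int  # bases per FASTA line
    line_width: int  # bytes per FASTA line, including the newline


def parse_fai(handle):
    """Read the first five tab-separated columns of each .fai line."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, length, offset, line_bases, line_width = line.split("\t")[:5]
        records.append(
            FaiRecord(name, int(length), int(offset), int(line_bases), int(line_width))
        )
    return records


def summarize(records):
    """Return sequence count, total bases and N50 of the indexed sequences."""
    lengths = sorted((r.length for r in records), reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first length at which half the bases are covered
            n50 = length
            break
    return {"sequences": len(lengths), "total_bases": total, "n50": n50}


# Made-up index lines for illustration only; not values from the dataset.
example = "tx1\t1200\t5\t60\t61\ntx2\t800\t1230\t60\t61\ntx3\t300\t2050\t60\t61\n"
summary = summarize(parse_fai(io.StringIO(example)))
print(summary)
```

To apply this to the real file, the io.StringIO object would be replaced by an open handle on a downloaded copy of RNASeqTranscriptomeAssembly.fa.gz.fai.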