Data released on September 22, 2017
Ginseng, which contains ginsenosides characterized as bioactive compounds, has been regarded as an important traditional medicine for several millennia. However, the genetic background of ginseng remains poorly understood, partly because of the plant's large and complex genome.
We report the entire genome sequence of Panax ginseng using next-generation sequencing. The 3.5 Gb nucleotide sequence contained more than 60% repeats and encoded 42,006 predicted genes. Twenty-two transcriptome datasets and mass spectrometry images of ginseng roots were used to precisely quantify the functional genes. Thirty-one genes were identified as being involved in the mevalonic acid pathway. Eight of these genes were annotated as 3-hydroxy-3-methylglutaryl-CoA reductases, which displayed diverse structures and expression characteristics. A total of 225 UDP-glycosyltransferases (UGTs) were identified, and these UGTs constituted one of the largest gene families in ginseng. Tandem repeats contributed to the duplication and divergence of UGTs. Molecular modeling of UGTs in the 71, 74, and 94 families revealed a regiospecific conserved motif located at the N-terminus. Molecular docking predicted that this motif captured ginsenoside precursors.
The panorama of the ginseng genome represents a valuable resource for understanding and improving the breeding, cultivation, and synthetic biology of this key herb.