Supporting data for "Accurate Prediction of Personalized Olfactory Perception from Large-Scale Chemoinformatic Features"

Dataset type: Software
Data released on December 20, 2017

Li H; Panwar B; Omenn GS; Guan Y (2017): Supporting data for "Accurate Prediction of Personalized Olfactory Perception from Large-Scale Chemoinformatic Features" GigaScience Database. http://dx.doi.org/10.5524/100384

DOI: 10.5524/100384

The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to predict the odor of a molecule precisely from its large-scale chemoinformatic features. A major challenge is that perceived qualities vary greatly among individuals because of different genetic and cultural backgrounds. Moreover, combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate olfactory prediction. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules. In this study, we describe our winning algorithm for predicting individual and population perceptual responses to various odorants in the DREAM Olfaction Prediction Challenge.
We find that a random forest model consisting of multiple decision trees is well suited to this prediction problem, given the large feature space and the high variability of perceptual ratings among individuals. Integrating both population and individual perceptions into our model effectively reduces the influence of noise and outliers. By analyzing the importance of each chemical feature, we find that a small set of low- and non-degenerative features is sufficient for accurate prediction.
Our random forest model successfully predicts personalized odor attributes of structurally diverse molecules. This model, together with the top discriminative features, has the potential to extend our understanding of olfactory perception mechanisms and to provide an alternative approach to rational odorant design.
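The abstract names two ingredients: a random forest (an ensemble of decision trees, each fit to a bootstrap sample using a random subset of features) and training targets that combine individual and population ratings. The pure-Python sketch that follows illustrates both on toy data; it is not the authors' code, which is in the GitHub repository listed under Additional information. Assumptions: the trees are depth-1 regression stumps rather than full trees, and individual and population ratings are combined by a simple weighted average with a hypothetical `weight` parameter. The names `blend_targets`, `fit_stump`, `fit_forest` and `predict` are illustrative only.

```python
import random
from statistics import mean


# HYPOTHETICAL helper: blend one subject's ratings with the population ratings.
# The weighted-average rule and default weight are assumptions, not from the paper.
def blend_targets(individual, population, weight=0.5):
    return [weight * a + (1 - weight) * b for a, b in zip(individual, population)]


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree: the (feature, threshold) split that
    minimises the summed squared error around the two leaf means."""
    best = None
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every sampled feature is constant: predict the mean
        m = mean(y)
        return (feature_ids[0], 0.0, m, m)
    return best[1:]  # (feature, threshold, left mean, right mean)


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Random forest of stumps: each tree sees a bootstrap sample of the
    molecules and a random subset of the features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(p), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Average the leaf values of all trees for one molecule."""
    return mean(ml if row[j] <= t else mr for j, t, ml, mr in forest)


# Toy data: feature 0 separates low- from high-rated molecules; feature 1 is noise.
X = [[float(i), float((i * 7) % 5)] for i in range(10)]
population = [0.0] * 5 + [10.0] * 5
individual = [0.0, 1.0, 0.0, 1.0, 0.0, 10.0, 11.0, 10.0, 11.0, 10.0]
y = blend_targets(individual, population, weight=0.5)
forest = fit_forest(X, y)
print(predict(forest, [0.0, 0.0]), predict(forest, [9.0, 0.0]))
```

Averaging many trees fit to different bootstrap samples and feature subsets is what makes the ensemble less sensitive to noisy individual ratings than a single tree. A real analysis would use deeper trees, the full chemoinformatic feature matrix and a library implementation such as scikit-learn's `RandomForestRegressor` instead of these toy functions.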

Additional details

Read the peer-reviewed publication(s):

Li, H., Panwar, B., Omenn, G. S., & Guan, Y. (2017). Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features. GigaScience, 7(2). doi:10.1093/gigascience/gix127

Additional information:

https://github.com/Hongyang449/olfaction_prediction_manuscript

https://www.synapse.org/#!Synapse:syn3098005

Samples (displaying 1-10 of 476). All samples shown are non-biological samples.

Sample ID | Description | Sample source | Relevant electronic resources
176-2 | acetic acid | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
177-2 | acetaldehyde | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
180-2 | acetone | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
196-2 | Adipic acid | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
239-2 | beta-Alanine | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
240-2 | benzaldehyde | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
241-2 | Benzene | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
243-2 | Benzoic acid | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
244-2 | benzyl alcohol | Sigma-Aldrich | pubchem.ncbi.nlm.nih...
126-2 | 4-Hydroxybenzaldehyde | Sigma-Aldrich | pubchem.ncbi.nlm.nih...

Files (8 of 8)

Data Type | File Format | Size | Release Date
Tabular Data | CSV | 1.55 MB | 2017-12-01
Tabular Data | CSV | 554.4 KB | 2017-12-01
Text | TEXT | 403.77 KB | 2017-12-01
mixed archive | archive | 92.54 MB | 2017-12-01
mixed archive | archive | 7.61 MB | 2017-12-01
Readme | TEXT | 2.97 KB | 2017-12-01
mixed archive | archive | 15.77 MB | 2017-12-01
Tabular Data | CSV | 1.84 KB | 2017-12-01

Funding body | Awardee | Award ID
National Science Foundation | Y Guan | 1452656
Alzheimer’s Association | Y Guan | BAND-15-367116

Date | Action
December 20, 2017 | Dataset published
January 9, 2018 | Manuscript link added: 10.1093/gigascience/gix127