Abstract
Understanding the relationship between molecular structure and odor perception is essential for applications in fragrance formulation, food product development, and pharmaceutical design. Traditional approaches often rely on sensory evaluations or expert-engineered molecular features, which are labor-intensive and lack scalability. In this study, we developed a multitask learning model capable of simultaneously predicting multiple odor categories from chemical structures, aiming to capture shared representations across related odors. Using a graph neural network-based architecture (kMoL) trained on experimental data spanning 14 odor categories, the proposed model outperformed conventional single-task models and Random Forests in both accuracy and stability. Label co-occurrence analysis revealed that compounds frequently exhibit multiple odor characteristics, and a higher degree of label overlap was associated with improved performance in the multitask setting. Chemical structure visualization using UMAP and t-SNE showed no pronounced clustering by odor type, suggesting a balanced prediction difficulty across categories. To enhance interpretability, we applied Integrated Gradients to identify atom-level contributions, which aligned with known substructures relevant to olfactory receptor interactions, including hydrogen-bond donors and aromatic rings. Notably, for sweet-smelling compounds such as maltol, our model highlighted regions that correspond to interaction sites identified in receptor–ligand docking studies. These findings demonstrate that the multitask model not only delivers strong predictive performance but also captures chemically and biologically relevant features. This approach supports a mechanistic understanding of structure–odor relationships and provides a scalable, interpretable framework for rational olfactory design.
Keywords: Odor prediction, Graph neural network, Olfactory receptor, Explainable AI, Multitask learning
Graphical abstract

Highlights
• Developed an interpretable multitask GNN model for odor prediction.
• Achieved superior accuracy over single-task and Random Forest models.
• Model identifies shared molecular features across multiple odor labels.
• Integrated Gradients revealed key substructures driving odor predictions.
1. Introduction
In the fragrance and food industries, there is a continuous demand for the development of novel compounds with unique scents and flavors to accommodate diverse consumer preferences and evolving market trends (Hassoun et al., 2022; Rodrigues et al., 2024). Fragrance plays a central role in determining product appeal and perceived quality, making it one of the most influential sensory attributes affecting brand value (Girona-Ruíz et al., 2021). In this context, a thorough understanding of the relationship between odor and molecular structure is essential for the efficient and rational design of compounds with targeted olfactory properties. Previous studies have reported correlations between specific functional groups or substructures and particular odor characteristics (Dunkel et al., 2014; Genva et al., 2019; Rossiter, 1996), and several molecular design strategies for novel fragrance compounds have been developed based on these findings. Advances in computational molecular design are expected to accelerate the development of new fragrances and broaden the scope of olfactory innovation.
Beyond the fields of perfumery and food science, odor also plays a significant role in pharmaceutical formulation, particularly in influencing product quality and patient acceptability. Some pharmaceutical compounds emit unpleasant odors, which can lead to medication aversion and reduced compliance (Saada et al., 2024; Schiffman, 2018). This concern is particularly relevant in pediatric and geriatric populations, where heightened sensitivity to odor can serve as a barrier to medication compliance (Ranmal et al., 2025). A predictive understanding of structure-odor relationships could facilitate the early identification of undesirable scent profiles and support the design of less odorous drug candidates (Ranmal et al., 2025; Schiffman, 2018).
Accordingly, elucidating the relationship between odor and chemical structure holds significant value not only for enhancing fragrance and food product development but also for improving patient adherence in clinical settings. Traditionally, molecular odor identification has relied on human sensory evaluation. However, large-scale sensory testing is time-consuming and labor-intensive, necessitating more efficient alternatives. In recent years, methods such as electronic noses and manually engineered feature extraction techniques have been explored to predict olfactory characteristics (Wu et al., 2019). These approaches, however, rely heavily on expert knowledge and hand-crafted descriptors, which limits their scalability and real-time applicability.
To address these limitations, computational models that quantitatively capture structure-odor relationships have received increasing attention (Bo et al., 2022). While many previous studies have focused on odor prediction, few have investigated the underlying chemical interactions with olfactory receptors. For example, a Graphormer model integrating Bidirectional Encoder Representations from Transformers and graph-based methods was recently proposed for predicting odor from molecular structure (Ranjan et al., 2024). Similarly, models combining graph neural networks (GNNs) with contrastive self-supervised learning have demonstrated high accuracy in predicting odor descriptors and show potential for applications in food, perfumery, and environmental science (Jain et al., 2024). Traditional machine learning approaches have also been applied. For instance, a Random Forest classifier adapted for multi-label classification using molecular fingerprints was applied to predict odor perception, demonstrating the utility of ensemble-based models in quantitative structure–odor relationship studies (Saini and Ramanathan, 2022). An additional study developed an XGBoost-based model to predict seven primary odors from a dataset of 1278 odorant molecules, in which a variant incorporating principal component analysis (XGBoost-PCA) outperformed conventional approaches (Tyagi et al., 2024).
Some studies have also addressed the relationship between molecular structures and receptor interactions. For instance, a Transformer-based model trained on a dataset of 100,000 molecules revealed associations between specific atomic substructures and odor qualities (Zheng et al., 2022). Another study using convolutional neural networks (CNNs) and graph convolutional networks (GCNs) demonstrated accurate predictions of aroma–molecule and molecule–receptor interactions, based on a dataset comprising 5955 molecules, 160 scent types, and 106 olfactory receptors (Achebouche et al., 2022). Odor perception is believed to arise from the binding of odorant molecules to olfactory receptors, which subsequently triggers neural signaling (Choi et al., 2023; Menini et al., 2004). Similar receptor-level interpretability approaches have also been demonstrated in taste research. For example, GNN-based models combined with Integrated Gradients (IG) and molecular docking successfully identified molecular features associated with sweetness and bitterness by linking ligands to their corresponding receptors (Iwata, 2025). These strategies highlight the potential of extending receptor-level analysis methods to olfaction, thereby improving the interpretability of odor prediction models. Therefore, analyzing odorant-receptor interactions is essential for developing accurate odor prediction models. Since a single odorant molecule can bind to multiple receptors and elicit different odor perceptions, modeling this complexity requires approaches that extend beyond single-molecule or single-receptor analyses.
In this study, we constructed a multitask learning model capable of simultaneously predicting multiple odor attributes using experimental data from 14 odorant categories. This approach not only enables concurrent prediction across several odor types but also facilitates knowledge transfer among related odor classes, thereby effectively augmenting the training data for each individual label. The model architecture was implemented using kMoL (Cozac et al., 2025), a GNN-based framework that learns directly from molecular structures. Compared to single-task models and Random Forests (Breiman, 2001), our multitask model demonstrated consistently high and stable predictive performance across diverse odor categories. To interpret the model's predictions, we employed IG (Sundararajan et al., 2017), enabling atomic-level visualization of the structural contributions to the output. This interpretability analysis revealed features that align with known relationships between specific odors and corresponding functional groups or molecular substructures.
2. Materials and methods
2.1. Dataset
To construct the AI model, we used the odor dataset compiled by Kou et al. (2023) (Table 1). The authors curated this dataset to provide comprehensive information on known flavor-related molecules by collecting data from academic databases, including Scopus, PubMed, Web of Science, and Google Scholar. The dataset contains molecular names, Chemical Abstracts Service (CAS) registry numbers, SMILES representations of molecular structures, and descriptive information on flavor properties. In total, data from 25 flavor-related databases were integrated.
Table 1.
Number of positive (odor) and negative (off-odor) compounds for each of the 14 odor categories used in this study.
| Odor category | Number of odor compounds (positive) | Number of off-odor compounds (negative) |
|---|---|---|
| Fruity | 1385 | 775 |
| Sweet | 1185 | 775 |
| Green | 1149 | 775 |
| Floral | 1005 | 775 |
| Woody | 670 | 775 |
| Herbal | 590 | 775 |
| Waxy | 433 | 775 |
| Fresh | 430 | 775 |
| Musty | 411 | 775 |
| Earthy | 400 | 775 |
| Citrus | 377 | 775 |
| Spices | 370 | 775 |
| Rose | 341 | 775 |
| Nutty | 320 | 775 |
To improve reusability and reliability, Kou et al. removed redundant entries and excluded molecules with vague descriptors (e.g., “sweet-like” or “not sweet”). After curation, the final dataset included 8982 molecules with known taste attributes and 5046 with defined odor attributes. The full dataset is publicly accessible through the accompanying GitHub repository: https://github.com/ecological-systems-design/flavor-chemical-design.
In this study, we selected the 14 odor categories that each comprised more than 300 compounds (Table 1). For each category, the model was trained to classify whether a compound exhibits that odor (positive class) or is an off-flavor compound (negative class). The dataset was randomly split into training, validation, and test subsets in a 70:20:10 ratio across the entire set of compounds, without stratification by odor category. Stratified sampling was not applied because each odor label was treated as a binary (positive vs. negative) target and every category contained a sufficient number of samples in both classes (at least 300 positive and 775 negative compounds). At this sample size, a random split was expected to give each subset a reasonably representative distribution of compounds for every category.
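The random, non-stratified 70:20:10 split described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' code: the function name `random_split`, the seed value, the hypothetical compound IDs, and the rounding rule (the test subset absorbs any remainder) are all assumptions.

```python
import random

def random_split(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and cut them into train/validation/test lists (no stratification)."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("expected three ratios that sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the test subset takes whatever remains
    return train, val, test

# Hypothetical compound IDs standing in for the curated molecules.
train, val, test = random_split([f"cmpd_{i}" for i in range(1000)])
print(len(train), len(val), len(test))  # 700 200 100
```

Because the split is not stratified, a category could by chance be somewhat under-represented in the 10% test subset; with at least 300 positives per category (about 30 expected in the test subset), this risk is modest.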
2.2. Construction of the proposed method
We developed a GNN-based multi-label classification model capable of simultaneously predicting the presence or absence of 14 distinct odor categories from molecular structure (Fig. 1). The model architecture integrates a GNN module, which processes molecular graph inputs, and a multilayer perceptron (MLP) module, which leverages molecular fingerprint features. This hybrid design combines topological information from molecular graphs with expert-defined chemical descriptors, enabling a more comprehensive feature representation. The effectiveness of this architecture was also demonstrated in our previous work on related taste prediction tasks, in which GNN-based models predicted sweetness and bitterness from molecular structures with high accuracy and interpretability (Iwata, 2025). As a baseline, we implemented a Random Forest classifier using molecular fingerprints, a widely used and interpretable model in cheminformatics, allowing comparison with conventional approaches.
Fig. 1.
Overview of the proposed model architecture for odor prediction.
The model takes a molecular structure as input and processes it through two parallel pathways: one using a graph neural network to learn directly from the molecular graph, and the other computing ECFP fingerprints that are input into a multilayer perceptron (MLP). The outputs from both pathways are concatenated and further processed by downstream MLP layers to perform multitask classification across multiple odor categories.
The GNN component was implemented using kMoL version 1.1.9, an open-source cheminformatics library (https://github.com/elix-tech/kmol) (Cozac et al., 2025). Molecular fingerprints for the MLP input were generated using RDKit, an open-source cheminformatics toolkit. A sigmoid activation function was applied to the model's final output layer to convert each output into an independent probability score between 0 and 1 for the corresponding odor label.
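Because each of the 14 outputs passes through its own sigmoid, the probabilities are independent and need not sum to 1, which is what allows one molecule to be predicted as, for example, both Fruity and Sweet. The sketch below illustrates this together with the binary cross-entropy loss described in the training details, using only the standard library. The function names and the logit-based loss formulation are illustrative assumptions, not kMoL's API; in a PyTorch model the same loss is typically provided by `torch.nn.BCEWithLogitsLoss`.

```python
import math

ODOR_LABELS = ["Fruity", "Sweet", "Green", "Floral", "Woody", "Herbal", "Waxy",
               "Fresh", "Musty", "Earthy", "Citrus", "Spices", "Rose", "Nutty"]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def label_probabilities(logits, labels=ODOR_LABELS):
    """Map one raw output (logit) per label to an independent probability."""
    if len(logits) != len(labels):
        raise ValueError("need one logit per odor label")
    return {label: sigmoid(z) for label, z in zip(labels, logits)}

def bce_with_logits(z, y):
    """Binary cross-entropy for one label, computed stably from the logit z."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def multi_label_bce(logits, targets):
    """Mean binary cross-entropy over all labels of one molecule."""
    losses = [bce_with_logits(z, y) for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)
```

Computing the loss from logits rather than from already-squashed probabilities avoids taking the logarithm of values that have rounded to exactly 0 or 1.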
To optimize the model's hyperparameters, we used Optuna, a hyperparameter optimization framework (Akiba et al., 2019), and conducted 100 optimization trials with its Tree-structured Parzen Estimator (TPE) sampler, a Bayesian optimization algorithm, to explore the parameter space efficiently. The optimization objective was the mean area under the receiver operating characteristic curve (ROC-AUC) across 5-fold cross-validation. Tuned parameters included hidden layer size, dropout rate, number of GCN layers, learning rate, and weight decay. The optimal values were as follows: 128 hidden features, a dropout rate of 0.2, two GCN layers, a learning rate of 0.00359, and a weight decay coefficient of 0.0033.
For reproducibility, we provide additional details of the model architecture and training procedure. The GNN module consisted of two graph convolutional layers with 128 hidden units each, followed by a Rectified Linear Unit (ReLU) activation and a dropout layer (dropout rate = 0.2). Node features included atom type, degree, and formal charge, whereas edge features represented bond type. Molecular-level features, represented as 17-bit RDKit descriptors, were processed through a shallow network comprising a linear layer, a dropout layer, batch normalization, and a ReLU activation. The outputs from the GNN module and the processed molecular features were concatenated and passed through a final output block consisting of two linear layers, a ReLU activation, and another dropout layer. Binary cross-entropy loss was used for multi-label classification, and the model was optimized using the Adam optimizer with a learning rate of 0.00359 and weight decay of 0.0033. Models were trained for 100 epochs with a batch size of 32, and all experiments were conducted using a fixed random seed to ensure reproducibility.
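To make the graph pathway concrete, the sketch below implements a graph-convolution step in plain Python for a three-atom toy graph (the heavy atoms of ethanol, C–C–O, with one-hot atom-type features). It assumes the symmetrically normalized propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), mean pooling as the molecule-level readout, and a hypothetical descriptor vector for the concatenation step. The exact convolution variant and readout used in kMoL are not specified here, so these choices, like all function names, are assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each atom keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU

def mean_pool(h):
    """Average atom embeddings into a single molecule-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]

# Toy molecule: heavy atoms of ethanol (C0-C1-O2); features are [is_carbon, is_oxygen].
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights so the propagation is easy to read

h1 = gcn_layer(adj, h0, w)  # first layer: one-bond neighborhood
h2 = gcn_layer(adj, h1, w)  # second layer: two-bond neighborhood
graph_vec = mean_pool(h2)
descriptors = [0.3, 1.2]  # hypothetical molecule-level descriptor values
combined = graph_vec + descriptors  # concatenation fed to the output MLP
```

After one layer, the terminal carbon (row 0) still has a zero oxygen feature; after two layers it picks up a contribution from the oxygen two bonds away. This illustrates what the tuned choice of two GCN layers implies: each atom's embedding summarizes its neighborhood up to two bonds away.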
2.3. Previous machine learning models
To evaluate the performance of the proposed GNN-based multitask model, we constructed three baseline models: a Random Forest model (Breiman, 2001) and a GNN-based single-task model, both of which made independent predictions for each odor category, and an MLP multitask model. For the Random Forest model, molecules were represented using Extended-Connectivity Fingerprints (ECFP) computed with RDKit (Rogers and Hahn, 2010). Hyperparameters for all models were optimized using Optuna, with 100 optimization trials conducted for each task. For the Random Forest model, four hyperparameters were tuned: n_estimators (10–200), max_depth (2–32), min_samples_split (2–16), and min_samples_leaf (1–16). The best hyperparameter values for each odor category are summarized in Supplementary Table S1.
The MLP multitask model consisted of an MLP module for learning ECFP features, designed to predict multiple odor categories simultaneously. Unlike the GNN multitask model, it does not incorporate graph-based features and relies solely on fingerprint representations for prediction. To optimize the model's hyperparameters, we conducted 100 optimization trials. Tuned parameters included hidden features (128–1024, step 128), dropout rate (0.0–0.4, step 0.1), learning rate (0.00001–0.01, step 0.00001), and weight decay (0.0001–0.005, step 0.0001). The optimal values were hidden features of 1024, a dropout rate of 0.0, a learning rate of 0.00052, and a weight decay coefficient of 0.001.
The GNN single-task model used the same architecture as the multitask model, consisting of a GNN module for learning molecular graph structures and an MLP module for processing molecular fingerprints. However, unlike the multitask model, the single-task model was trained independently for each odor label. As for the Random Forest model, hyperparameters were optimized using Optuna, with 100 optimization trials conducted for each task. For the GNN single-task models, five hyperparameters were tuned: hidden features (128–1024, step 128), dropout rate (0.0–0.4, step 0.1), number of GCN layers (2–4, step 1), learning rate (0.00001–0.01, step 0.00001), and weight decay (0.0001–0.005, step 0.0001). The best hyperparameter values for each odor category are summarized in Supplementary Table S2.
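A quick calculation shows why a sampler such as TPE is used rather than an exhaustive grid. The sketch below counts the discrete values in each search space listed above. For the Random Forest ranges, an integer step of 1 is assumed because no step size is given; the dictionary keys are illustrative names, not Optuna parameter names used by the authors.

```python
from fractions import Fraction
from math import prod

def grid_size(low, high, step):
    """Number of values in low, low + step, ..., high (exact arithmetic via Fraction)."""
    low, high, step = Fraction(low), Fraction(high), Fraction(step)
    n_steps = (high - low) / step
    if n_steps.denominator != 1:
        raise ValueError("range is not a whole number of steps")
    return int(n_steps) + 1

# Decimal values are passed as strings so Fraction parses them exactly.
random_forest = {"n_estimators": (10, 200, 1), "max_depth": (2, 32, 1),
                 "min_samples_split": (2, 16, 1), "min_samples_leaf": (1, 16, 1)}
mlp_multitask = {"hidden_features": (128, 1024, 128), "dropout": ("0.0", "0.4", "0.1"),
                 "learning_rate": ("0.00001", "0.01", "0.00001"),
                 "weight_decay": ("0.0001", "0.005", "0.0001")}
gnn_single_task = dict(mlp_multitask, gcn_layers=(2, 4, 1))

def space_size(space):
    """Total number of grid combinations in a search space."""
    return prod(grid_size(*bounds) for bounds in space.values())

for name, space in [("Random Forest", random_forest), ("MLP multitask", mlp_multitask),
                    ("GNN single-task", gnn_single_task)]:
    size = space_size(space)
    print(f"{name}: {size:,} combinations; 100 trials cover {100 / size:.5%}")
```

Under these assumptions the spaces contain 1,421,040 (Random Forest), 2,000,000 (MLP multitask), and 6,000,000 (GNN single-task) combinations, so 100 trials evaluate well under 0.01% of each. TPE concentrates those trials in regions that scored well in earlier trials instead of sampling the grid uniformly.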
2.4. Performance evaluation of models
To evaluate classification performance, we counted true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN). The decision threshold for each task was selected with Youden's index (Youden, 1950), J = sensitivity + specificity − 1, by choosing the point on the ROC curve that maximizes J. Because this criterion weights sensitivity and specificity equally, it is a robust and widely used choice for threshold selection in binary classification. Using this threshold, we calculated precision and recall, followed by the F1-score.
Model performance was evaluated using precision, recall, and F1-score based on the following standard formulas (Fawcett, 2006):
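$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$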
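Youden-based threshold selection followed by the precision, recall, and F1 formulas can be sketched as follows. Candidate thresholds are taken from the observed scores, and ties in J are broken by keeping the lowest threshold; both are assumptions, since the procedure above does not specify them. The labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def youden_threshold(y_true, y_score):
    """Return the candidate threshold maximizing J = sensitivity + specificity - 1."""
    if 1 not in y_true or 0 not in y_true:
        raise ValueError("both classes are required")
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1; each falls back to 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical model outputs
t, j = youden_threshold(y_true, y_score)
tp, fp, tn, fn = confusion_counts(y_true, y_score, t)
print(t, precision_recall_f1(tp, fp, fn))
```

In this toy example, thresholds 0.4 and 0.8 give the same J; which one is kept changes precision and recall noticeably, so the tie-breaking rule is worth fixing explicitly in any real implementation.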
2.5. Model interpretability using Integrated Gradients
To interpret the model's predictions, we applied the IG method (Sundararajan et al., 2017). For each compound, a baseline input of an all-zero feature vector was defined, and attribution scores were computed by integrating the gradients along a linear interpolation path from the baseline to the actual molecular representation. The resulting attributions were mapped onto atomic features, enabling visualization of atom-level contributions to odor predictions. This approach was chosen because IG satisfies axioms such as sensitivity and implementation invariance, providing a theoretically principled framework for directly attributing model outputs to input features in graph neural networks, which is critical for understanding the molecular determinants of odor perception.
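The path integral that IG approximates, IG_i(x) = (x_i − x'_i) ∫₀¹ ∂F(x' + α(x − x'))/∂x_i dα, can be illustrated with a toy differentiable model. In the sketch below, a single logistic unit with an analytic gradient stands in for the GNN, the baseline x' is the all-zero vector described above, and the integral is approximated with a midpoint Riemann sum. The model, weights, inputs, and step count are illustrative assumptions. For real networks, attributions are normally computed with automatic differentiation, for example with the `IntegratedGradients` class of the Captum library for PyTorch; the implementation used for the analyses here is not specified.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy stand-in for the network: probability = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, w, b, baseline=None, steps=200):
    """IG_i = (x_i - x'_i) * mean of dF/dx_i over midpoints of the straight path."""
    if baseline is None:
        baseline = [0.0] * len(x)  # all-zero baseline
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each sub-interval
        point = [xb + alpha * (xi - xb) for xi, xb in zip(x, baseline)]
        totals = [t + g for t, g in zip(totals, model_grad(point, w, b))]
    return [(xi - xb) * t / steps for xi, xb, t in zip(x, baseline, totals)]

x = [1.0, 2.0, 0.0]  # hypothetical input features
w = [1.5, -0.25, 3.0]
b = -0.2
attributions = integrated_gradients(x, w, b)
# Completeness: attributions sum to F(x) - F(baseline), up to discretization error.
gap = model(x, w, b) - model([0.0] * 3, w, b)
```

The third feature has a large weight but receives zero attribution because its value equals the baseline, which shows how strongly IG results depend on the baseline choice. For a molecular graph, the same computation would run over every entry of the node-feature matrix, and summing each atom's feature attributions is one common way (assumed here) to obtain a single score per atom.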
3. Results and discussion
3.1. Comparative evaluation of odor prediction models
To evaluate the performance of the proposed multitask learning model, we conducted a comparative analysis against conventional methods using 5-fold cross-validation. The baseline models included a Random Forest model and an MLP multitask model trained on ECFP features, as well as a single-task GNN model that learns molecular structures represented as graphs.
For each of the 14 odor categories, we calculated the ROC-AUC to quantify classification performance (Supplementary Table S3 and Fig. 2A). All models demonstrated strong discriminative power, with most ROC-AUC values exceeding 0.9. Notably, the multitask model, which learns across multiple odor categories simultaneously, achieved the highest ROC-AUC scores in 9 of the 14 categories, suggesting that sharing information across odor categories improves generalization (Supplementary Fig. S1A).
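Per-category ROC-AUC and the 5-fold cross-validation loop (whose mean ROC-AUC also served as the hyperparameter-optimization objective) can be sketched with the standard library. ROC-AUC is computed here through its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The `fit_and_score` callback is a hypothetical placeholder for training a model on one fold and scoring its validation split, and the round-robin fold assignment is an assumption. In Optuna, the returned mean would be the value returned by the objective function passed to `study.optimize`.

```python
import random
from statistics import mean

def roc_auc(y_true, y_score):
    """ROC-AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; shuffled indices are dealt round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def cv_objective(fit_and_score, n, k=5, seed=0):
    """Mean validation score over k folds; fit_and_score(train_idx, val_idx) -> float."""
    return mean(fit_and_score(tr, va) for tr, va in kfold_indices(n, k, seed))
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for a sketch; library implementations compute the same quantity from sorted ranks.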
Fig. 2.
Bar plot comparison of prediction performance across 14 odor categories using different machine learning models.
The figure shows the ROC-AUC and F1-score for each odor category, predicted using four models: Random Forest (green), MLP multitask (light blue), GNN single-task (blue), and GNN multitask (red). Performance metrics are based on 5-fold cross-validation, with bars representing mean values and error bars indicating standard error.
We further determined optimal decision thresholds for each odor category using Youden's Index, derived from the ROC curves. Based on these thresholds, we calculated precision and recall values, and subsequently computed the F1-score (Supplementary Table S3 and Fig. 2B). The multitask model again outperformed the baseline models, achieving the highest F1-scores in 13 of the 14 odor categories. These findings underscore the effectiveness of multitask learning in predicting odor-related molecular properties compared to conventional single-task approaches (Supplementary Fig. S1B).
3.2. Comparative analysis of single-task and multitask learning models
We first investigated the extent to which odorant molecules were associated with multiple odor labels (Supplementary Table S4). The analysis revealed that many compounds were annotated with more than one odor label, indicating that a single molecule often exhibits multiple odor characteristics. This observation highlights the potential advantage of multitask learning, which can leverage shared information across related odor categories through synergistic learning effects.
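Label co-occurrence and the per-category share ratio (the fraction of a category's compounds that also carry at least one other odor label) can be computed with a few lines of standard-library Python. The compound IDs and label sets below are hypothetical toy data, not entries from the dataset.

```python
from collections import Counter
from itertools import combinations

def share_ratio(labels_by_compound, label):
    """Fraction of compounds with `label` that also carry at least one other label."""
    members = [labels for labels in labels_by_compound.values() if label in labels]
    if not members:
        raise ValueError(f"no compounds labeled {label!r}")
    return sum(1 for labels in members if len(labels) > 1) / len(members)

def cooccurrence(labels_by_compound):
    """Count how often each unordered pair of labels annotates the same compound."""
    counts = Counter()
    for labels in labels_by_compound.values():
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts

toy = {
    "cmpd_1": {"Fruity", "Sweet"},
    "cmpd_2": {"Fruity"},
    "cmpd_3": {"Rose", "Floral", "Sweet"},
    "cmpd_4": {"Rose"},
}
print(share_ratio(toy, "Fruity"), share_ratio(toy, "Sweet"))  # 0.5 1.0
print(cooccurrence(toy)[("Floral", "Rose")])  # 1
```

Sorting each label set before forming pairs ensures that ("Floral", "Rose") and ("Rose", "Floral") are counted as the same pair.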
To quantify the performance gains achieved through multitask learning, we plotted, for each odor category, the proportion of molecules shared with other odor labels against the difference in ROC-AUC and F1-score between the multitask and single-task models (Fig. 3). As the proportion of shared compounds increased, the performance advantage of the multitask model also tended to increase. Specifically, for odor categories such as Fresh, Musty, Earthy, and Rose, where over 80% of associated compounds overlapped with other labels, the multitask model improved the F1-score by more than 0.02. Paired t-tests showed that this improvement was statistically significant for Fresh (p = 0.03), whereas the other categories (Musty, p = 0.07; Earthy, p = 0.15; Rose, p = 0.17) showed consistent but nonsignificant trends. Even so, an F1-score increase of approximately 0.02 may be practically meaningful, because it corresponds to fewer misclassifications when large numbers of compounds are screened. These results suggest that shared annotations across odor categories contribute to the improved performance of the multitask learning model.
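The paired t statistic behind these comparisons can be sketched as below. The sketch assumes that the pairing is over cross-validation folds (five paired F1-scores per category), which is not stated explicitly, and the fold-level scores are hypothetical. The p-value follows from a t distribution with n − 1 degrees of freedom; `scipy.stats.ttest_rel` returns both the statistic and the p-value.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two matched lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical per-fold F1-scores for one odor category.
multitask = [0.82, 0.80, 0.85, 0.81, 0.83]
single_task = [0.80, 0.79, 0.82, 0.80, 0.80]
t, df = paired_t(multitask, single_task)
print(f"t = {t:.2f}, df = {df}")
```

With only five pairs (4 degrees of freedom), a mean gain of about 0.02 reaches significance only when the fold-to-fold differences are consistent, which is in line with only some categories showing significant improvements.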
Fig. 3.
Performance improvement of the multitask model over the single-task model across varying degrees of compound overlap among odor labels.
(A) Differences in ROC-AUC (multitask − single-task) and (B) differences in F1-score (multitask − single-task) are plotted on the y-axis. The x-axis represents the compound share ratio, defined as the proportion of compounds in each odor category that are also annotated with at least one other odor label.
To determine whether these multitask learning benefits stem solely from label overlap or are also influenced by molecular structure similarities, we visualized the distribution of odorant molecules in chemical feature space. For this, we computed 2048-dimensional ECFP vectors for each molecule and projected them into two-dimensional space using Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2018) and t-Distributed Stochastic Neighbor Embedding (t-SNE) (Maaten and Hinton, 2008) (Fig. 4; Supplementary Figs. S2 and S3). The visualizations revealed that molecules were broadly and evenly distributed across the chemical structure space, without extreme clustering specific to any odor category. This distribution reflects the multi-label nature of the dataset, where individual compounds are frequently associated with multiple odors.
Fig. 4.
Visualization of odor-related molecules in chemical structure space using UMAP and t-SNE.
Molecules were first encoded into 2048-dimensional ECFP descriptors and subsequently projected into two-dimensional space using UMAP (A) and t-SNE (B). The panels show all odor-related and off-flavor molecules, with each of the 15 categories (14 primary odors and off-flavor) represented by a distinct color. UMAP1 and UMAP2 denote the first and second dimensions of the UMAP embedding, and t-SNE1 and t-SNE2 denote the first and second dimensions of the t-SNE embedding.
Importantly, no single odor category exhibited either abnormally high or low predictive performance, and the multitask model maintained consistently strong accuracy across all categories. These findings support the robustness and generalizability of the multitask learning framework for odor prediction.
3.3. Interpretability of odorant prediction models via visualization techniques
To investigate the interpretability of the odor prediction AI model, we conducted visualization analyses using IG. This method identifies specific substructures within a molecule that contribute most significantly to the model's predictions. The resulting visualizations consistently aligned with established structure–odor relationships reported in the literature, supporting the validity and mechanistic relevance of the model's predictive behavior.
In the following sections, we discuss the structural features and their sensory significance for four representative odor categories, Fruity, Floral, Woody, and Rose, based on the visualization results and comparison with published data (Table 2).
Table 2.
Visualization results of representative odorant compounds highlighting key substructures associated with odor prediction.
| Compound name | Chemical structure | Odor category | Prediction score | Substructure | Visualizationᵃ | Reference |
|---|---|---|---|---|---|---|
| Ethyl 2-methylbutanoate | ![]() | Fruity | 0.9335 | Esters | ![]() | Holt et al. (2019) |
| Isoamyl acetate | ![]() | Fruity | 0.9330 | Esters | ![]() | Holt et al. (2019) |
| Benzyl acetate | ![]() | Floral | 0.9298 | Esters | ![]() | (Azuma et al., 2005; Zhang et al., 2019) |
| Phenethyl acetate | ![]() | Floral | 0.9296 | Esters | ![]() | Zhang et al. (2019) |
| Guaiacol | ![]() | Woody | 0.7414 | Methoxyphenol | ![]() | Wang and Chambers (2018) |
| Syringol | ![]() | Woody | 0.7970 | Methoxyphenol | ![]() | Wang and Chambers (2018) |
| 2-Phenylethanol | ![]() | Rose | 0.8484 | Phenylethanol | ![]() | Etschmann et al. (2002) |
| Phenethyl acetate | ![]() | Rose | 0.8547 | Phenylethanol | ![]() | Etschmann et al. (2002) |
| Phenethyl isobutyrate | ![]() | Rose | 0.9030 | Phenylethanol | ![]() | (Etschmann et al., 2002; McGinty et al., 2012) |
| Phenethyl propionate | ![]() | Rose | 0.9257 | Phenylethanol | ![]() | (Etschmann et al., 2002; McGinty et al., 2012) |

ᵃ The circled regions indicate the functional groups responsible for the corresponding odor characteristics.
3.3.1. Fruity
Fruity aromas are important sensory attributes in alcoholic beverages such as beer and wine and are primarily attributed to esters. Holt et al. (2019) showed that esters, formed by the condensation of short-chain fatty acids and alcohols during yeast fermentation, are responsible for characteristic fruity notes such as banana (isoamyl acetate), apple (ethyl hexanoate), and pineapple (ethyl butyrate).
In this study, the model yielded prediction scores exceeding 0.93 for ethyl 2-methylbutanoate and isoamyl acetate, both representative fruity odorants. IG visualizations revealed strong contributions across the entire molecular structure, with particular emphasis on the ester linkage (–COO–), suggesting that this functional group plays a critical role in the model's recognition of fruity odors.
3.3.2. Floral
Esters are also central contributors to floral aromas. Zhang et al. (2019) analyzed the volatile profiles of Prunus mume (Japanese apricot) cultivars with different petal colors and identified aromatic esters such as methyl benzoate and methyl salicylate as key components of the floral scent. These compounds are associated with sweet, fragrant odors, and their levels correlated with both scent intensity and flower color.
In this study, benzyl acetate and phenethyl acetate received prediction scores greater than 0.92. IG visualizations highlighted the ester bond in both compounds, indicating that the model treats ester structures as important features in floral odor prediction.
3.3.3. Woody
Woody aromas are closely associated with phenolic and methoxyphenol compounds. Wang and Chambers (2018) systematically evaluated phenolic compounds contributing to smoky and woody aromas in food and identified guaiacol (2-methoxyphenol) and 4-methylguaiacol (2-methoxy-4-methylphenol) as key odorants linked to woody, smoky, and burnt characteristics.
In our model, guaiacol and syringol received prediction scores of 0.74 and 0.80, respectively. IG visualizations clearly highlighted the methoxyphenol moiety shared by both compounds, indicating that the model focuses on this structure when recognizing woody odors.
3.3.4. Rose
Phenethyl propionate is an aromatic ester widely used in cosmetic and food products for its sweet, floral, rose-like scent. McGinty et al. (2012) reported that this compound has a “floral, rose-like” aroma and occurs in trace amounts in natural rose oil.
In this study, four rose-scented compounds (2-phenylethanol, phenethyl acetate, phenethyl isobutyrate, and phenethyl propionate) achieved high prediction scores (>0.8). IG visualizations consistently highlighted the phenylethanol-derived (2-phenylethyl) moiety shared by all four compounds, suggesting that this structural unit is a critical feature for predicting rose-like aromas.
3.4. Interpretation of feature visualization results based on Receptor–Ligand interaction mechanisms
In this study, we performed IG-based feature attribution visualizations for nine representative odorant molecules in the sweet odor category, specifically those with caramel-like olfactory characteristics. The primary objective was to determine whether the structural features emphasized by the model align with key interaction sites between ligands and the human olfactory receptor OR2J2, as reported in a recent study by Zeng et al. (2023). In that study, molecular docking and molecular dynamics (MD) simulations revealed that compounds such as maltol and methylglyoxal form hydrogen bonds with the Tyr260 residue of OR2J2. Additionally, π–π stacking interactions involving aromatic residues (e.g., tyrosine and phenylalanine) were suggested to contribute to ligand binding stability.
When we applied IG to the model developed in the present study, the regions with high attribution scores corresponded well to the interaction sites previously identified in the literature (Table 3). For maltol (3-hydroxy-2-methyl-4H-pyran-4-one), the model assigned high attribution scores to the pyranone ring bearing the hydroxyl and methyl groups, consistent with the regions reported to form hydrogen bonds with Tyr260 and to participate in π–π stacking. For methylglyoxal, the carbonyl groups were distinctly highlighted, consistent with their proposed role as hydrogen-bond acceptors in the receptor–ligand complex. Similarly, other caramel-like odorants such as furaneol and sotolon exhibited high attribution scores around polar functional groups and ring structures, features likely involved in molecular interactions with OR2J2.
Table 3.
Comparison of model-derived visualization results with known ligand–receptor interaction sites in odorant compounds.
| Compound name | Chemical structure | Prediction score | Visualizationᵃ |
|---|---|---|---|
| Maltol | [image] | 0.9020 | [image] |
| Ethylvanillin | [image] | 0.8570 | [image] |
| 2,3-Hexanedione | [image] | 0.8627 | [image] |
| Homofuraneol | [image] | 0.9598 | [image] |
| Sotolone | [image] | 0.9206 | [image] |
| Abhexone | [image] | 0.9483 | [image] |
| 3,4-Hexanedione | [image] | 0.9033 | [image] |
| Furaneol | [image] | 0.9372 | [image] |
| Butanedione | [image] | 0.4346 | [image] |

ᵃ The arrow-marked substructures indicate regions identified in previous studies as structurally important for ligand–receptor interactions.
These findings suggest that the model is not merely capturing statistical correlations but is recognizing structurally and biologically meaningful features, specifically those that plausibly reflect physical interactions with olfactory receptors. The visualization analysis therefore supports the utility of explainable AI approaches for elucidating structure–odor relationships in flavor and fragrance chemistry.
Despite these encouraging results, several limitations should be acknowledged. First, the model was trained and evaluated on a curated dataset covering 14 common odor categories, which may limit its generalizability to less common or newly characterized odors. Second, although the IG-based interpretations align with known structure–odor relationships, the predicted molecular interactions and model-derived features have not yet been experimentally validated; biochemical or receptor-binding assays will be required to confirm the biological relevance of these computational insights. Finally, the IG method itself has inherent limitations: its attributions depend on the choice of baseline input and on the path from that baseline to the molecule, and because it assigns a single additive score to each input feature, it may obscure complex nonlinear dependencies in odor perception. Although atom-level contributions can be visualized, higher-order interactions between functional groups and contextual effects spanning multiple substructures cannot be fully represented. Even so, IG provides valuable insight into the key structural features that drive the model's predictions and complements other analytical approaches.
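For readers less familiar with IG (Sundararajan et al., 2017), the attribution for input feature i is (x_i − x′_i) multiplied by the average gradient of the model output F along the straight line from a baseline x′ to the input x, so that the attributions sum to F(x) − F(x′). The sketch below approximates that path integral with a midpoint Riemann sum on a hypothetical toy function with a hand-coded gradient; it is not kMoL's implementation, which operates on atom features of a graph neural network. The toy function contains a product term x0·x1 to illustrate the limitation noted above: IG splits that joint contribution equally between x0 and x1, and the resulting per-feature scores carry no trace that the effect arises only from their combination.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_f: Callable[[Vector], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> Vector:
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    n = len(x)
    grad_sum = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            grad_sum[i] += g[i]
    # Average gradient along the path, scaled by the input-baseline difference.
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, grad_sum)]


def f(x: Sequence[float]) -> float:
    """Hypothetical toy model with an interaction term x0*x1 and a nonlinear term x2**2."""
    return x[0] * x[1] + x[2] ** 2


def grad_f(x: Sequence[float]) -> Vector:
    """Analytic gradient of the toy model."""
    return [x[1], x[0], 2.0 * x[2]]


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
ig = integrated_gradients(grad_f, x, baseline)
print([round(v, 6) for v in ig])                 # [3.0, 3.0, 1.0]
print(round(sum(ig), 6), f(x) - f(baseline))     # 7.0 7.0
```

The second line checks the completeness property (attributions sum to the output difference). The first line shows the interaction term x0·x1 = 6 being reported as 3 for each factor, which is the kind of higher-order effect that per-atom IG maps cannot make explicit.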
4. Conclusion
In this study, we conducted a comprehensive analysis of the relationship between molecular structure and odor properties by comparing single-task and multitask learning models for odorant prediction. Our results showed that many odorant molecules exhibit multiple odor attributes and that the predictive accuracy of the multitask learning model improves with increasing degrees of attribute overlap. Furthermore, distributional analyses of the chemical structure space using UMAP and t-SNE showed that odorant molecules are structurally diverse and widely dispersed without forming distinct clusters by odor category, suggesting that prediction difficulty is relatively balanced across categories rather than any category being inherently unpredictable.
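The two quantities behind the label-overlap observation, the share of compounds carrying more than one odor label and the pairwise co-occurrence counts between labels, can be computed as in the sketch below. The function name `label_overlap_stats` and the toy annotations are hypothetical and do not reproduce the study's 14-category dataset.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def label_overlap_stats(labels: Dict[str, Set[str]]) -> Tuple[float, Counter]:
    """Return (share of compounds with >1 label, pairwise label co-occurrence counts)."""
    multi_share = sum(1 for s in labels.values() if len(s) > 1) / len(labels)
    pairs: Counter = Counter()
    for s in labels.values():
        # Sort so that each unordered label pair is counted under a single key.
        for a, b in combinations(sorted(s), 2):
            pairs[(a, b)] += 1
    return multi_share, pairs


# Hypothetical toy annotations (not the study's data).
toy = {
    "cmpd_A": {"sweet", "fruity"},
    "cmpd_B": {"sweet", "green"},
    "cmpd_C": {"floral"},
    "cmpd_D": {"sweet", "fruity", "floral"},
}
multi_share, pairs = label_overlap_stats(toy)
print(multi_share)           # 0.75
print(pairs.most_common(1))  # [(('fruity', 'sweet'), 2)]
```

Per-compound overlap scores derived from such counts could then be binned and compared with per-compound prediction accuracy to examine the relationship between label overlap and multitask performance.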
From the perspective of model interpretability, feature attribution analysis using IG revealed that the structural elements highlighted for each odor category were highly consistent with functional groups and substructures reported in the literature. Notably, for the sweet category, the model assigned high attribution scores to molecular regions that correspond to interaction sites identified in receptor–ligand docking studies involving the olfactory receptor OR2J2. This finding suggests that the model is capable of learning biologically meaningful structural features beyond statistical correlations.
Overall, these results underscore the utility of multitask learning for odor prediction and highlight the value of explainable AI approaches in advancing mechanistic understanding of structure–odor relationships. Future research integrating a broader range of odor categories and experimental data on receptor–ligand interactions may further improve the accuracy and reliability of odor prediction models.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work the author used ChatGPT (OpenAI) in order to improve the readability of the manuscript and to translate text from Japanese to English. After using this tool, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article.
Funding
This work was supported by a Grant-in-Aid for Transformative Research Areas (A) “Latent Chemical Space” [grant number JP24H01771] from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by the Mishima Kaiun Memorial Foundation.
Declaration of competing interest
None declared.
Handling Editor: Dr. Maria Corradini
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.crfs.2025.101219.
Appendix A. Supplementary data
Data availability
Data is provided at the online public link https://github.com/ecological-systems-design/flavor-chemical-design.
References
- Achebouche R., Tromelin A., Audouze K., Taboureau O. Application of artificial intelligence to decode the relationships between smell, olfactory receptors and small molecules. Sci. Rep. 2022;12(1). doi: 10.1038/s41598-022-23176-y.
- Akiba T., Sano S., Yanase T., Ohta T., Koyama M. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019:2623–2631.
- Azuma H., Toyota M., Asakawa Y. Floral scent chemistry and stamen movement of Chimonanthus praecox (L.) Link (Calycanthaceae). Acta Phytotaxonomica Geobot. 2005;56(2):197–201.
- Bo W., Yu Y., He R., Qin D., Zheng X., Wang Y., Ding B., Liang G. Insight into the structure-odor relationship of molecules: a computational study based on deep learning. Foods. 2022;11(14). doi: 10.3390/foods11142033.
- Breiman L. Random forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
- Choi C., Bae J., Kim S., Lee S., Kang H., Kim J., Bang I., Kim K., Huh W.K., Seok C., Park H., Im W., Choi H.J. Understanding the molecular mechanisms of odorant binding and activation of the human OR52 family. Nat. Commun. 2023;14(1):8105. doi: 10.1038/s41467-023-43983-9.
- Cozac R., Hasic H., Choong J.J., Richard V., Beheshti L., Froehlich C., Koyama T., Matsumoto S., Kojima R., Iwata H., Hasegawa A., Otsuka T., Okuno Y. kMoL: an open-source machine and federated learning library for drug discovery. J. Cheminf. 2025;17(1):22. doi: 10.1186/s13321-025-00967-9.
- Dunkel A., Steinhaus M., Kotthoff M., Nowak B., Krautwurst D., Schieberle P., Hofmann T. Nature's chemical signatures in human olfaction: a foodborne perspective for future biotechnology. Angew. Chem. Int. Ed. 2014;53(28):7124–7143. doi: 10.1002/anie.201309508.
- Etschmann M.M., Bluemke W., Sell D., Schrader J. Biotechnological production of 2-phenylethanol. Appl. Microbiol. Biotechnol. 2002;59(1):1–8. doi: 10.1007/s00253-002-0992-x.
- Fawcett T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006;27(8):861–874. doi: 10.1016/j.patrec.2005.10.010.
- Genva M., Kenne Kemene T., Deleu M., Lins L., Fauconnier M.L. Is it possible to predict the odor of a molecule on the basis of its structure? Int. J. Mol. Sci. 2019;20(12). doi: 10.3390/ijms20123018.
- Girona-Ruíz D., Cano-Lamadrid M., Carbonell-Barrachina Á.A., López-Lluch D., Esther S. Aromachology related to foods, scientific lines of evidence: a review. Appl. Sci. 2021;11(13). doi: 10.3390/app11136095.
- Hassoun A., Cropotova J., Trif M., Rusu A.V., Bobis O., Nayik G.A., Jagdale Y.D., Saeed F., Afzaal M., Mostashari P., Khaneghah A.M., Regenstein J.M. Consumer acceptance of new food trends resulting from the fourth industrial revolution technologies: a narrative review of literature and future perspectives. Front. Nutr. 2022;9. doi: 10.3389/fnut.2022.972154.
- Holt S., Miks M.H., de Carvalho B.T., Foulquie-Moreno M.R., Thevelein J.M. The molecular biology of fruity and floral aromas in beer and other alcoholic beverages. FEMS Microbiol. Rev. 2019;43(3):193–222. doi: 10.1093/femsre/fuy041.
- Iwata H. AI-driven prediction of bitterness and sweetness and analysis of receptor interactions. Curr. Res. Food Sci. 2025;10. doi: 10.1016/j.crfs.2025.101090.
- Jain N., Dua D., Raju U., Shanthi C., Beniwal R. Deep learning for digital olfaction: graph-based self-supervised learning. In: International Conference on Power Engineering and Intelligent Systems (PEIS). Springer; 2024:185–196.
- Kou X., Shi P., Gao C., Ma P., Xing H., Ke Q., Zhang D. Data-driven elucidation of flavor chemistry. J. Agric. Food Chem. 2023;71(18):6789–6802. doi: 10.1021/acs.jafc.3c00909.
- Maaten L.v.d., Hinton G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008;9(Nov):2579–2605.
- McGinty D., Vitale D., Letizia C.S., Api A.M. Fragrance material review on phenethyl propionate. Food Chem. Toxicol. 2012;50(Suppl. 2):S430–S434. doi: 10.1016/j.fct.2012.02.068.
- McInnes L., Healy J., Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018.
- Menini A., Lagostena L., Boccaccio A. Olfaction: from odorant molecules to the olfactory cortex. Physiology. 2004;19(3):101–104. doi: 10.1152/nips.1507.2003.
- Ranjan S., Kumar N., Singh S.K. Deciphering smells from SMILES notation of the chemical compounds: a deep learning approach. In: 2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE; 2024:538–543.
- Ranmal S.R., Walsh J., Tuleu C. Poor-tasting pediatric medicines: part 1. A scoping review of their impact on patient acceptability, medication adherence, and treatment outcomes. Front. Drug Deliv. 2025;5. doi: 10.3389/fddev.2025.1553286.
- Rodrigues B.C.L., Santana V.V., Murins S., Nogueira I.B.R. Molecule generation and optimization for efficient fragrance creation. Ind. Eng. Chem. Res. 2024;63(33):14480–14494. doi: 10.1021/acs.iecr.4c00650.
- Rogers D., Hahn M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010;50(5):742–754. doi: 10.1021/ci100050t.
- Rossiter K.J. Structure–odor relationships. Chem. Rev. 1996;96(8):3201–3240. doi: 10.1021/cr950068a.
- Saada M., Ürgüp N.S., Bakirtas F., Çirban G. Effects of the tablets' organoleptic properties on patients' compliance. ACTA Pharm. Sci. 2024;62(3). doi: 10.23893/1307-2080.Aps6244.
- Saini K., Ramanathan V. Predicting odor from molecular structure: a multi-label classification approach. Sci. Rep. 2022;12(1). doi: 10.1038/s41598-022-18086-y.
- Schiffman S.S. Influence of medications on taste and smell. World J. Otorhinolaryngol. Head Neck Surg. 2018;4(1):84–91. doi: 10.1016/j.wjorl.2018.02.005.
- Sundararajan M., Taly A., Yan Q. Axiomatic attribution for deep networks. In: ICML. PMLR; 2017:3319–3328.
- Tyagi P., Sharma A., Semwal R., Tiwary U.S., Varadwaj P.K. XGBoost odor prediction model: finding the structure-odor relationship of odorant molecules using the extreme gradient boosting algorithm. J. Biomol. Struct. Dyn. 2024;42(20):10727–10738. doi: 10.1080/07391102.2023.2258415.
- Wang H., Chambers E. Sensory characteristics of various concentrations of phenolic compounds potentially associated with smoked aroma in foods. Molecules. 2018;23(4). doi: 10.3390/molecules23040780.
- Wu D., Luo D., Wong K.-Y., Hung K. POP-CNN: predicting odor pleasantness with convolutional neural network. IEEE Sens. J. 2019;19(23):11337–11345.
- Youden W.J. Index for rating diagnostic tests. Cancer. 1950;3(1):32–35. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
- Zeng S., Zhang L., Li P., Pu D., Fu Y., Zheng R., Xi H., Qiao K., Wang D., Sun B., Sun S., Zhang Y. Molecular mechanisms of caramel-like odorant-olfactory receptor interactions based on a computational chemistry approach. Food Res. Int. 2023;171. doi: 10.1016/j.foodres.2023.113063.
- Zhang T., Bao F., Yang Y., Hu L., Ding A., Ding A., Wang J., Cheng T., Zhang Q. A comparative analysis of floral scent compounds in intraspecific cultivars of Prunus mume with different corolla colours. Molecules. 2019;25(1). doi: 10.3390/molecules25010145.
- Zheng X., Tomiura Y., Hayashi K. Investigation of the structure-odor relationship using a transformer model. J. Cheminf. 2022;14(1):88. doi: 10.1186/s13321-022-00671-y.