ACS Omega. 2025 Aug 22;10(35):39933–39945. doi: 10.1021/acsomega.5c04324

A Structure-Based Approach for Predicting Odor Similarity of Molecules via Docking Simulations with Human Olfactory Receptors

Hirotada Kaneshiro, Masakazu Sato, Airi Tanaka, Shuya Nakata, Yoshiko Aihara, Hirotaka Kitoh-Nishioka, Yoshiharu Mori, Shigenori Tanaka
PMCID: PMC12423883  PMID: 40949280

Abstract

The mechanisms underlying human odor recognition remain largely unclear, making it challenging to predict the scent of a novel molecule based solely on its molecular structure. Unlike taste, which is classified into a limited number of categories, odor perception is highly complex and lacks universally defined labels, rendering absolute odor classification inherently ambiguous. To address this issue, we propose a relative evaluation framework for odor prediction, focusing on odor similarity rather than absolute descriptors. In this study, we constructed three-dimensional structures of approximately 400 human olfactory receptors (hORs) using AlphaFold2 and performed molecular docking simulations with odorant compounds. Each odorant was represented as a 409-dimensional docking score vector, and odor similarity was inferred by comparing these vectors statistically. To evaluate the effectiveness of this approach, we used odorant molecules from the ATLAS database and tested whether molecules with similar docking profiles correspond to similar olfactory perceptions. Our results demonstrate that the proposed docking-based method enables the relative prediction of odor similarity between molecules, even for compounds not included in the reference database. This method offers a promising alternative to traditional QSAR-based approaches relying solely on molecular structural similarity, and provides a structure-based, receptor-level framework for computational olfaction.



1. Introduction

Humans are capable of detecting a remarkably large number of odors, estimated to be at least tens of thousands, and possibly up to trillions depending on the criteria used for discrimination. Although many animals possess a highly developed sense of smell, the molecular mechanisms underlying olfaction remain only partially understood. In general, odorant molecules are detected by olfactory receptors (ORs) expressed in the olfactory epithelium of the nasal mucosa, and the resulting signals are transmitted to the brain for interpretation. Humans possess approximately 400 types of functional ORs, while other mammals such as mice, rats and elephants have more than 1000 types. Experimental studies, primarily in mice, have attempted to identify the interactions between odorants and ORs using methods such as luciferase reporter assays and calcium imaging. However, the vast diversity of detectable odors cannot be explained by a simple one-to-one relationship between odorants and ORs. Instead, it is now widely accepted that odor perception arises from complex many-to-many interactions, where each receptor may recognize multiple odorants and each odorant may activate multiple receptors. This complexity limits the explanatory power of in vivo and in vitro experiments alone in fully decoding the olfactory system.

In parallel with biological and experimental approaches, computational methods for analyzing and predicting odor perception have been developed. These are often based on machine learning techniques trained on human perception test data. Typically, such studies involve collecting odor descriptors and intensity ratings from human subjects exposed to numerous odorant molecules. Using these data, models are constructed to associate molecular features with perceived odors, following the framework of quantitative structure–activity relationship (QSAR) or quantitative structure–odor relationship (QSOR). While such models can predict general trends, they often struggle to differentiate between molecules with similar structures but different odors, or vice versa. An alternative strategy involves converting odor information into digital form using sensor-based measurements combined with machine learning. In this approach, odors are expressed as combinations of predefined “base odors”, enabling an intuitive numerical representation. However, the selection and calibration of base odors are inherently subjective, and there is no established standard set that can be applied to unknown odorant molecules. In addition, the lack of universally accepted descriptors for odors and the cultural variability in odor perception further complicate the task.

In this study, to overcome these difficulties in odor description and prediction, we propose a novel structure-based approach using large-scale docking simulations between odorant molecules and human ORs. Advances in protein structure prediction by artificial intelligence, particularly AlphaFold2, have enabled accurate modeling of membrane proteins, including ORs, which were previously difficult to determine experimentally. Leveraging these structural predictions, we dock odorant molecules in silico to all (409) available human OR (hOR) models and compute binding affinity scores. These docking score vectors serve as molecular “odor representations,” where similarity between molecules can be assessed statistically over all the hORs.

Because odors lack a clear and absolute reference scale (unlike the five basic tastes), we adopt a relative evaluation strategy in this study: the odor of an unknown molecule is predicted by identifying the most similar known odorant in a database on the basis of docking score similarity. To test this method, we use the ATLAS database, which contains perception evaluations of over 100 odorant molecules, each characterized by more than 100 odor descriptors rated by human subjects. Our approach calculates docking score vectors (features) for all ATLAS molecules and compares them using correlation coefficients or Euclidean distances to find the closest matches, thereby enabling the prediction of the odor of novel molecules not included in the database.

Section 2 describes the computational methods employed, including receptor structure modeling using AlphaFold2, ligand docking for 136 odorants across 409 ORs, and similarity evaluations using the Tanimoto coefficients for molecular structure fingerprints and the correlation coefficients for odor descriptor vectors as well as docking score vectors. In Section 3, we assess whether molecular structure or docking score features provide better predictors of odor similarity using clustering, heat maps, and ROC curve analyses. Finally, the concluding section summarizes our findings and discusses the implications and limitations of docking-based odor prediction.

2. Materials and Methods

2.1. Odor Molecules

2.1.1. ATLAS Database

We collected the odorant data from the ATLAS database. ATLAS is a database of about 140 odor molecules rated with 146 odor descriptors (e.g., sweet, sour, rotten). The database was compiled with the aim of developing information on different types of odors, from pleasant to unpleasant, and each molecule was scored by 120–140 subjects after smelling the odorant dissolved in low-odor dipropylene glycol. In this study we employed 136 odor molecules in ATLAS to construct the prediction protocol. The three-dimensional (3D) structures of the odor molecules were downloaded from PubChem. The partial charges of the downloaded molecules were calculated using Open Babel with the Gasteiger–Marsili method.

2.1.2. Odor Score and Similarity

The subjects rated how strongly they perceived each odor on a 5-point scale. The score (0–1) for each of the 146 odor descriptors in ATLAS was calculated by dividing the total score by five times the number of subjects who evaluated the odor. After conversion to a percentage scale (0–100), these scores were treated as a 146-dimensional vector for each of the 136 odor molecules. Since the magnitude of the score varies from descriptor to descriptor, it can be standardized using the mean and variance of each descriptor (see Section 2.4.1 below).
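The score definition above can be illustrated with a minimal Python sketch. The function name `descriptor_score`, the example ratings, and the assumption that a rating of 0 is allowed are ours for illustration and are not taken from ATLAS itself.

```python
def descriptor_score(ratings, max_rating=5):
    """Return the 0-1 score: sum of ratings / (number of subjects * max_rating)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / (len(ratings) * max_rating)


# Hypothetical ratings given by four subjects for one descriptor of one molecule.
ratings = [5, 3, 0, 2]
score = descriptor_score(ratings)   # (5 + 3 + 0 + 2) / (4 * 5)
percent = 100 * score               # same score on the percentage scale
print(score, percent)
```

Repeating this for all 146 descriptors of a molecule would give its 146-dimensional odor vector.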

Comparing these feature vectors yields a quantitative measure of how close or distant the odors of two molecules are. Here, similarity was examined and compared in three ways: the cosine similarity of the vectors, the correlation coefficient in a scatter plot of the odor scores, and the Euclidean distance between the multidimensional vectors. In addition, based on the distances between all odor molecule pairs, clustering by Ward’s method, as well as PCA (Principal Component Analysis) and t-SNE (t-distributed Stochastic Neighbor Embedding) analyses, was performed. We note that there is another approach that evaluates the odor similarity directly instead of comparing odor descriptor scores as described above. However, that method is not employed in the present study, because we rely on the ATLAS database for the odor prediction.

With these comparisons, it is possible to rank the other odor molecules in order of their similarity to a given odor molecule. Here, the top 10% of molecules in the ATLAS database were defined as “molecules with similar odors.” The “top 10%” criterion is somewhat arbitrary but was tentatively selected to carry out a feasibility study. Some case studies in which other criteria were chosen are discussed in Section 3.4.

2.1.3. Molecular Structure Similarity

The structural similarity of a pair of odor molecules was expressed by the Tanimoto coefficient of their structures given in SMILES notation. The Tanimoto coefficient was calculated using the Morgan fingerprint with radius 2, which evaluates how similar the two molecular structures are within two bonds of each central atom in the molecule. Specifically, the Python library RDKit was used to generate the molecular fingerprints using AllChem.GetMorganFingerprint, and the Tanimoto coefficient T was calculated using TanimotoSimilarity. The formula is $T(\mathrm{fp}_1, \mathrm{fp}_2) = c/(a + b - c)$, where fp1 and fp2 are the fingerprints (bits 0 or 1) of the two molecules, a is the number of bits that are 1 in fp1, b is the number of bits that are 1 in fp2, and c is the number of bits that are 1 in both fp1 and fp2.
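As a small illustration of the Tanimoto coefficient on binary fingerprints, the following plain-Python sketch represents each fingerprint as the set of its on-bit indices, so that T = c/(a + b − c) with a and b the numbers of on bits and c the number of shared on bits. It does not use RDKit; the bit indices are made up, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(on_bits_1, on_bits_2):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a = len(on_bits_1)                # bits set in the first fingerprint
    b = len(on_bits_2)                # bits set in the second fingerprint
    c = len(on_bits_1 & on_bits_2)    # bits set in both
    union = a + b - c
    return c / union if union else 0.0


# Hypothetical on-bit indices of two small fingerprints.
fp1 = {1, 4, 7, 9}
fp2 = {4, 9, 12}
print(tanimoto(fp1, fp2))  # c = 2, a = 4, b = 3
```

Because the denominator counts bits set in either fingerprint, two molecules that share a scaffold but differ in several local environments can still score well below unity.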

The Tanimoto coefficient between two odor molecules was obtained in this way, and, for a given reference molecule, the other molecules were ranked in descending order of the Tanimoto coefficient. An example of the calculation of the Tanimoto coefficient is shown in SI-1 in Supporting Information. As shown there, the Tanimoto coefficient is not very high (not near unity) even if a pair of molecules appears to be structurally similar. We have also made comparative studies in which the Morgan fingerprint with radius 4 was employed (see SI-4 in Supporting Information).

2.2. Olfactory Receptors

2.2.1. Structure Preparations

In the present study, the three-dimensional (3D) structures of human olfactory receptors (hORs) were generated using AlphaFold2, a state-of-the-art, AI-based protein structure prediction tool. To facilitate the structure prediction, we utilized ColabFold, a user-friendly implementation of AlphaFold2 that runs on Google Colaboratory. Here, due to the limitations in the cloud-based environment of ColabFold, such as intermittent runtime interruptions and suboptimal GPU performance, we implemented the LocalColabFold on our workstation equipped with an NVIDIA RTX3090 GPU, enabling more stable and efficient computations.

AlphaFold2 predicts the 3D structure of a protein based on its amino acid sequence. In addition, the algorithm allows for the incorporation of structural templates to guide the folding process, particularly useful when no experimentally determined structure is available for that protein. Given that most hORs lack experimentally resolved structures, this template-guided folding capability was employed to enhance the reliability of the predicted conformations.

A total of 409 human olfactory receptor (hOR) sequences, as registered in UniProt (retrieved on May 13, 2024), were used as input for AlphaFold2. For the structural templates, we selected three experimentally determined cryo-electron microscopy (cryo-EM) structures registered in the Protein Data Bank (PDB):

  • OR51E2 (PDB ID: 8F76)

  • OR52c in the active (ligand-bound or holo) state (PDB ID: 8HTI)

  • OR52c in the inactive (apo) state (PDB ID: 8W77)

These are the three PDB structures currently available as templates for hORs. Both OR51E2 and OR52c (c means “consensus”) belong to the Class I olfactory receptors, which typically respond to water-soluble odorant molecules. Our strategy is to select an appropriate template from these PDB structures and to use it with AlphaFold2 to generate the structures of the 409 hORs for docking.

To identify the most suitable template for AlphaFold2 in the present study, we compared the receptor structures with a focus on ligand-binding pocket size, which critically influences docking outcomes. Specifically, smaller pockets may physically exclude large odorant molecules during docking simulations, leading to unrealistic binding predictions. Therefore, we analyzed and compared the pocket volumes of each OR structure.

The binding pocket of OR51E2 was estimated to be 71.05 Å³, whereas the OR52c-holo (ligand-bound) structure featured a considerably larger pocket of 433.11 Å³, as identified using FPocketWeb (see Figure 1). This 6-fold difference in pocket volume strongly favored the OR52c-based templates. In addition, the druggability scores also estimated by FPocketWeb were 0.006 for OR51E2 and 0.899 for OR52c-holo, respectively, again favoring the OR52c-based templates. While the estimated pocket volume of OR52c-apo is larger than that of OR52c-holo, structural comparison revealed significant conformational changes between them, particularly in transmembrane helices TM5 and TM6, induced by the presence of the ligand (OCA; octanoic acid). These displacements, highlighted by orange arrows in Figure 2, are known to modulate ligand accessibility and receptor activation.

Figure 1. Molecular structures of (a) OR51E2 (PDB ID: 8F76; left) and (b) OR52c-holo (PDB ID: 8HTI; right) with the evaluation of the ligand-binding pocket (represented by α spheres) size by FPocketWeb.

Figure 2. Molecular structures of OR52c superimposed with PDB entries: 8HTI (holo form, green) and 8W77 (apo form, cyan) depicted by PyMOL (https://pymol.org). The presence or absence of the ligand (OCA) causes a significant movement in the TM5 and TM6 regions (orange arrows).

Based on these observations, we selected OR52c-holo as the preferred structural template for constructing the 3D models of all 409 hORs with AlphaFold2 used in subsequent docking simulations, while the OR52c-apo template was employed for comparison.

2.2.2. Addition of Hydrogen Atoms

Because the receptor structures generated by AlphaFold2 do not contain hydrogen atoms, hydrogens were added afterward. pdb2pqr was used for this purpose; it also determines the protonation states, with the pKa value of each residue calculated using the PropKa program.

2.2.3. Validation of Receptor Structures

All hORs belong to the GPCR (G protein-coupled receptor) family and have similar 3D structures. Therefore, to confirm whether the 3D structures created by AlphaFold2 are valid, and to align the output 3D structures whose orientations are random, structural alignment was performed. TM-align was used with its default parameters to align all hOR structures to the OR52c-holo structure. More specifically, residues 8–310 of chain R from PDB entry 8HTI were used as the reference. In addition to performing the structural alignment, TM-align evaluates the similarity of a structure to the reference structure with an index called the TM-score, which measures the topological similarity of protein structures. The TM-score takes a value between 0 and 1, and the closer it is to 1, the more similar the structures are. For the structures prepared as described above, the TM-score was in the range of 0.7–1.0, indicating that all hOR structures have valid folded structures.

2.3. Docking Simulation

2.3.1. Software

In this study, we performed docking simulations between receptors and ligands using the deep-learning-based docking program GNINA. GNINA enhances the scoring function used in AutoDock Vina through deep learning and is expected to yield more accurate binding poses and scores. As a result of docking, GNINA predicts the CNN affinity, which is the binding affinity estimated by a convolutional neural network, and the CNN pose score, which estimates the probability that the ligand adopts a given binding pose. We scored each receptor–ligand pair with the VS score, which is a combination of these two quantities, because it has been suggested that performance can be improved by using the VS score. Since the score obtained in this manner varies depending on the size of the ligand molecule, a standardization can be performed for each molecule when comparing the similarity of docking scores, as with the odor similarity.

GNINA accepts multiple options when performing docking. In the present work, the docking search region was defined by the center coordinate (x,y,z) = (8,–2,8) (in units of Å, taken from the ligand position in the OR52c-holo structure) and a grid box size of (30,30,30), and docking was performed with a seed value of 0.
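For orientation, the docking box described above could be passed to GNINA roughly as in the following sketch, which only assembles the command line and does not run it. The flags `-r`, `-l`, `-o`, `--center_x/y/z`, `--size_x/y/z`, and `--seed` are standard GNINA options; the file names are hypothetical, and any further options the authors may have used are not stated in the text and are therefore omitted.

```python
def gnina_command(receptor, ligand, out_path,
                  center=(8.0, -2.0, 8.0), size=(30.0, 30.0, 30.0), seed=0):
    """Build the argument list for one GNINA docking run (nothing is executed)."""
    cmd = ["gnina", "-r", receptor, "-l", ligand, "-o", out_path]
    for axis, c, s in zip("xyz", center, size):
        # Box center and edge length along each axis, in angstroms.
        cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    cmd += ["--seed", str(seed)]
    return cmd


# Hypothetical file names for one receptor model and one odorant.
cmd = gnina_command("OR_model.pdb", "odorant.sdf", "docked.sdf")
print(" ".join(cmd))
```

Looping such a call over 409 receptor models and 136 odorants would produce the full score matrix; fixing the seed keeps individual runs reproducible.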

2.3.2. Docking Score Similarity

The VS scores obtained from the docking simulation with GNINA were used to calculate the similarity between each pair of odor molecules. Since 409 receptor scores were obtained for one molecule, each molecule was represented by a 409-dimensional vector, and the cosine similarity or the correlation coefficient of the vectors between two molecules was considered to be the pairwise docking score similarity. Molecules with strong correlations in scores are expected to have similar odors in the present scheme. In addition, the similarity of docking scores can be measured with the use of the Euclidean distance between the two vectors as well for comparison.
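The three similarity measures mentioned here can be written down in a few lines of plain Python. This is a minimal sketch with short made-up vectors standing in for the 409-dimensional docking score vectors; it assumes neither vector is constant or all zeros, since the correlation and cosine would otherwise be undefined.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [x - mean_u for x in u]
    dv = [y - mean_v for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return num / den


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical docking score vectors of two molecules (4 receptors only).
mol_a = [1.0, 2.0, 3.0, 4.0]
mol_b = [2.0, 4.0, 6.0, 8.0]
print(pearson(mol_a, mol_b), cosine(mol_a, mol_b), euclidean(mol_a, mol_b))
```

Note that correlation and cosine similarity are insensitive to an overall scaling of one vector, whereas the Euclidean distance is not, which is one reason standardization matters when the distance is used.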

2.4. Comparisons of Similarities

2.4.1. Data Processing

In the ATLAS database, each molecule has 146 odor descriptors. Each of these descriptors has a score of 0–100 points for its strength, but the average score varies from descriptor to descriptor. Therefore, the scores can be standardized for each odor descriptor to reduce the impact of this variation. When the variable (the odor score in this case) $x_i$ has a mean $\mu$ and a standard deviation $\sigma$, the standardized value $y_i$ is defined as $y_i = (x_i - \mu)/\sigma$.

When calculating the similarities between molecules, the cosine similarity and the correlation coefficient of the vectors were both considered as evaluation criteria (see above). Since the two measures gave no significant difference in the results, in this study only the correlation coefficient in a scatter plot is calculated for simplicity and used as the similarity of two vectors, in addition to the Euclidean distance (see above).

Regarding the VS score obtained by docking simulation with GNINA, it is noted that the docking pose has more interaction points as the ligand molecule becomes larger, and the bias in the score due to the size of the ligand molecule cannot be ignored. Therefore, we can reduce the impact by standardizing the scores for each odorant molecule with different molecular size.
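The two standardization directions discussed in this section (per odor descriptor or receptor, and per odorant molecule) can be sketched as follows for a score matrix whose rows are molecules. The use of the population standard deviation is an assumption made here, since the text does not specify the estimator, and the tiny matrix is made up.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers: y_i = (x_i - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(x - mu) / sigma for x in values]


def standardize_rows(matrix):
    """Standardize each row, e.g. the scores of each molecule."""
    return [zscore(row) for row in matrix]


def standardize_columns(matrix):
    """Standardize each column, e.g. the scores of each receptor or descriptor."""
    columns = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical scores: 2 molecules (rows) x 2 receptors (columns).
scores = [[1.0, 10.0],
          [3.0, 30.0]]
print(standardize_columns(scores))
print(standardize_rows(scores))
```

Row-wise standardization removes an overall offset and scale per molecule, which is how a ligand-size bias in docking scores would be reduced; column-wise standardization puts receptors or descriptors with different typical scores on a common footing.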

2.4.2. ROC Curve

The ROC (Receiver Operating Characteristic) curve is obtained by plotting the true positive rate (TPR) against the false positive rate (FPR) for each cutoff point that distinguishes between positive and negative results. It is used to evaluate the performance of a decision method (classification in this case) and to assess the probability of obtaining the correct answer (see SI-4 in Supporting Information for more details). In the present scheme, we defined the set of odor molecules that fall in the top 10% of the odor similarity scores as the correct answers (positives), and used the docking score similarity and the structural similarity as the ranking scores to obtain the ROC curves. Since the “top 10%” criterion for the correct answer, which was employed to make the odor prediction based on the ATLAS database feasible, is somewhat arbitrary, the impact of this threshold value on the ROC curve is investigated in Section 3.4. To assess the quality of the ROC curve, the AUC (Area Under the Curve) score, defined as the area under the ROC curve, is used. The AUC value becomes unity if the classification of true/false is perfect, and 0.5 when the judgment is made randomly.
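The evaluation scheme can be illustrated with a minimal sketch: positives are the entries in the top fraction by odor similarity, and the AUC is computed for a second score (docking or structural similarity) used as the ranking. The AUC is evaluated here through its rank interpretation, i.e., the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The rounding rule for the top fraction (floor, at least one item) and all numbers are assumptions for illustration.

```python
def top_fraction_labels(values, fraction=0.10):
    """Mark the top `fraction` of items (largest values) as positives."""
    k = max(1, int(fraction * len(values)))  # floor rounding is an assumption
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = set(order[:k])
    return [i in top for i in range(len(values))]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("both positive and negative examples are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical similarities of ten molecules to one reference molecule.
odor_sim = [0.9, 0.8, 0.1, 0.2, 0.3, 0.0, 0.4, 0.5, 0.15, 0.05]
docking_sim = [0.7, 0.2, 0.1, 0.3, 0.0, -0.1, 0.25, 0.05, 0.15, -0.2]

labels = top_fraction_labels(odor_sim, fraction=0.2)  # two positives
print(roc_auc(labels, docking_sim))
```

The pairwise form is quadratic in the number of items, which is harmless for 135 molecules per reference; for the full set of 9180 pairs a sort-based implementation would be the more usual choice.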

3. Results and Discussion

3.1. Correlation between Docking Score and Odor Similarities

First, we demonstrate some example results in which the similarity of docking scores between two molecules well represents the similarity of odors.

Figure 3(a) shows a scatter plot of the 409 docking scores of Auralva and Indolene, both registered in ATLAS. The correlation coefficient (R) of docking scores is 0.76, while the correlation coefficient of odor scores between the two molecules is 0.65. This example illustrates that two molecules with similar odors share a similar variation with respect to the docking scores for the 409 olfactory receptors.

Figure 3. Correlations (left) of ligand docking scores between two compounds (right) with similar odors registered in ATLAS database. (a) Auralva (upper on the right) versus Indolene (lower on the right). (b) Pentanoic Acid (upper on the right) versus Iso-Valeric Acid (lower on the right).

Next, Figure 3(b) shows a scatter plot of the 409 docking scores of Pentanoic Acid and Iso-Valeric Acid, both registered in ATLAS. The correlation coefficient of docking scores is 0.60, while the correlation coefficient of odor scores between the two molecules is 0.95. Again, this example illustrates that two molecules with similar odors share a similar variation with respect to the docking scores for the 409 olfactory receptors.

Figure 4 shows the scatter plot between the docking similarities and the odor similarities over all 9180 molecular pairs in the ATLAS database. Overall, only a weak correlation (R = 0.22) is observed, which leaves room for improvement in future work, where the technical ingredients of the present approach, such as the accuracy of the docking simulations and the processing of the odor database, will be further refined and optimized. For example, the quality of ligand docking structures and scores can be improved by performing all-atom molecular dynamics (MD) simulations following the machine-learning-based, semiempirical docking simulations with GNINA. In addition, the correlation coefficient in the scatter plot of Figure 4 improves to R = 0.25 if we limit the data to pairs with docking score similarity of more than 0.2 and odor similarity of more than 0.4, in order to find molecular pairs with high odor similarity among the pairs with high docking score similarity.

Figure 4. Scatter plot between odor similarity and docking score similarity for all the molecular pairs in the ATLAS database.

3.2. Heat Map and Clustering with Odor Descriptor and Ligand Docking Scores

It is informative to inspect heat maps of the odor descriptor scores and docking scores, whose variations underlie the correlation between the two measures. Figure 5 shows the heat map for ATLAS odor descriptor scores normalized per odor molecule, where the abscissa and ordinate refer to the odor descriptors and the molecular species, respectively. By normalizing the descriptor scores between 0 and 1 for each odor molecule, one can clearly identify the characteristics of each molecule. One can also observe in the heat map that certain descriptors have high scores for many molecules. For example, Fragrant, Aromatic, Sweet, and Sickening, highlighted in red, are perceived more strongly than other descriptors.

Figure 5. Heat map for ATLAS odor descriptor scores normalized per odor molecule.

We next constructed the heat map for docking scores standardized per receptor (Figure 6), which corresponds to Figure 7 below. As seen in the figure, particular odor molecules have high scores (highlighted in red) for many receptors, which seems to be associated with the molecular size. Examples include Adoxal, Amyl Cinnamic Aldehyde Diethyl Acetal, Auralva, Indolene, and Maritima.

Figure 6. Heat map for docking scores standardized per receptor.

Figure 7. Heat map drawn using the normalized values of the docking scores for 136 molecules registered in ATLAS. The dendrogram on the left was obtained by clustering with the Ward method. Black lines are drawn between the clusters in the heat map to make them easier to distinguish.

Next, we clustered the odor molecules registered in ATLAS based on odor descriptors, docking scores and structural similarities, and examined how the 136 molecules are distributed in each representation space. See SI-2 in Supporting Information for more details.

To see the correspondence between the heat map patterns and the clustering, a heat map was generated using the normalized values of the docking scores (right), together with the dendrogram obtained from the clustering (left), as shown in Figure 7. Ward’s method was used for clustering, and the molecules were divided into five clusters (Clusters 0–4). Here, the vertical axis of the heat map represents odor molecules, and the horizontal axis represents the receptor species. Black lines are drawn between the clusters in the heat map to make them easier to distinguish. When we checked whether the odors of the molecules in each cluster were similar, we found a number of molecular pairs with high odor similarity values within the clusters, demonstrating the possibility of odor prediction using docking score similarity. Examples (Clusters 0–4 from the top of Figure 7) include the following: in Cluster 0, between Amyl Cinnamic Aldehyde Diethyl Acetal and Indolene, the odor correlation coefficient is 0.78; in Cluster 1, between Hydratropic Aldehyde Dimethyl Acetal and Zingerone, the odor correlation coefficient is 0.71; in Cluster 2, between Thienopyrimidine and Methyl Thiobutyrate, the odor correlation coefficient is 0.83; in Cluster 3, between Coumarin and Hexanol (1-Hexanol), the odor correlation coefficient is 0.73; in Cluster 4, between Hexanoic Acid and Pentenoic Acid (4-Pentenoic Acid), the odor correlation coefficient is 0.89.

Furthermore, we used the normalized mutual information (NMI) to quantify the degree of correspondence between the clustering by docking similarity or structural similarity and the clustering by odor similarity. The closer the NMI is to 1, the better the two clustering results agree (NMI = 1 for a perfect match); conversely, the closer it is to 0, the less they agree. We note that the calculated NMI values are usually not very high even if two clustering results are relatively similar. When the Euclidean distance between score vectors is used, the NMI score between docking similarity and odor similarity is 0.12, and the NMI score between structural similarity and odor similarity is 0.07. These results show that, in the clustering results, the docking similarity is capable of classifying odors more appropriately than the structural similarity. We also created a confusion matrix for NMI; see SI-3 of Supporting Information for this and for more details concerning NMI.
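For readers unfamiliar with NMI, the following plain-Python sketch computes it from two lists of cluster labels. Several normalizations are in use; this sketch divides the mutual information by the arithmetic mean of the two label entropies, which is an assumption here because the normalization adopted in the study is described only in the Supporting Information. The cluster labels are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information between two labelings of the same items."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum((n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
               for (i, j), n_ij in joint.items())


def nmi(labels_a, labels_b):
    """Normalized mutual information (arithmetic-mean normalization, assumed)."""
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings put everything in a single cluster
    return mutual_information(labels_a, labels_b) / ((h_a + h_b) / 2)


# Hypothetical cluster assignments of four molecules under two criteria.
by_odor = [0, 0, 1, 1]
by_docking = [1, 1, 0, 0]      # same partition with renamed clusters
by_structure = [0, 1, 0, 1]    # partition unrelated to the odor clustering
print(nmi(by_odor, by_docking), nmi(by_odor, by_structure))
```

NMI depends only on how the items are grouped, not on the cluster names, which is why it suits comparing clusterings obtained from different feature spaces.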

3.3. Examples for the Relevance of Docking Scores: Anisole and dl-Camphor

Anisole has a distinctive odor that is often described as sweet, aromatic, and similar to the smell of anise or licorice. It has a light and pleasant fragrance with hints of vanilla-like or spicy undertones. This odor profile makes it a recognizable compound in perfumery and flavoring, where it is sometimes used to impart a warm, slightly floral or sweet aroma.

Anisole is registered in the ATLAS database, and we employ this odor molecule as a notable case in the present analysis. The odor similarity between Anisole (Figure 8(c)) and another ATLAS-registered molecule, 3-Hexanol (Figure 8(d)), is calculated to be 0.698, which means that these two molecules have similar odors (see Figure 9 below). However, the structural similarity (Tanimoto coefficient) between them is found to be 0.025, implying highly dissimilar molecular structures (ranked 101st of 135 molecules). On the other hand, as seen in Figure 8(a), which shows the scatter plot of docking scores between the two molecules, the docking score similarity is 0.282, which ranks 13th in the ATLAS database. Thus, this instance illustrates that the present docking-based method can account for two molecules with dissimilar structures and similar odors. In contrast, the structural similarity between Anisole and Hydratropic aldehyde dimethyl acetal (Figure 8(e)) is relatively high (0.404), but their odor similarity is low (0.030). Figure 8(b) shows the scatter plot of docking scores between these two molecules, indicating a low docking score similarity (−0.045). This case thus provides an example in which the present docking-based method can account for two molecules with similar structures and dissimilar odors. Here, to assess the relative rankings of each similarity, we show in Figure 9 the distributions of the similarities of odor, molecular structure and docking score for all the 9180 molecular pairs.

Figure 8. (a) Scatter plot of docking scores for Anisole and 3-Hexanol, indicating a good correlation. This is an example of high odor similarity but low structural similarity, for which the odor cannot be predicted from structural similarity. (b) Scatter plot of docking scores for Anisole and Hydratropic aldehyde dimethyl acetal, indicating a poor correlation. This is an example of low odor similarity but high structural similarity, which likewise cannot be predicted from structural similarity. (c–e) Molecular structures of odorants used for the comparisons above.

Figure 9. Distributions of the similarities of odor (left), molecular structure (middle) and docking score (right) for all the 9180 molecular pairs.

Focusing on Anisole, we can rank the other 135 molecules in the ATLAS database according to their odor similarity to Anisole. Table 1 shows the molecules in the top 10% for odor similarity to Anisole, along with their docking score and molecular structure similarity values. Figure 10 then shows the ROC curves obtained when the molecules are ranked according to the docking score and molecular structure similarities. In this case of Anisole, we find that the docking score similarity performs significantly better than the molecular structure similarity in identifying molecules with similar odors.

Table 1. Molecules with Top 10% Odor Similarity to Anisole in ATLAS Database^a

^a Their docking and structural similarities are also shown with their ranks (in parentheses).

Figure 10. ROC curves concerning the odor similarity to Anisole. The results obtained with the rankings by docking score similarity and molecular structure similarity are shown by blue and red curves, respectively.

We show one more example to assess the relevance of the docking score approach. dl-Camphor is a naturally occurring molecule that has a refreshing mint-like odor and an insect repellent effect. The ATLAS database contains racemic dl-Camphor. When we searched the ATLAS database for the molecule with the highest odor similarity to dl-Camphor, we found Eucalyptol. The odor similarity between these two molecules was calculated to be 0.80, indicating that they have very similar odors. The docking similarity between these two molecules was 0.12 (ranked 12th of 135 molecules), and the structural similarity was 0.37 (ranked second), indicating that both similarities can detect the odor similarity within the top 10% threshold. We next focused on Methyl salicylate. The odor similarity between this molecule and dl-Camphor was 0.34 (ranked 13th), indicating that they have similar odors within the top 10% threshold. The docking score similarity between these two molecules was 0.25 (ranked sixth), whereas the structural similarity was 0.05 (ranked 73rd of 135 molecules), indicating that this is an example in which the docking similarity is superior to the structural similarity for detecting the odor similarity.

3.4. ROC Curve

Next, we studied the ROC curves obtained from all 136 odor molecules in the ATLAS database. Here the similarities of all pairwise combinations of molecules are compared, and the combinations ranked in the top 10% are regarded as “similar”. The odor and docking score similarities were calculated from scores that were either nonstandardized or standardized with respect to variation across odor molecules or across receptor species. We calculated the docking scores for the apo and holo structures and also used the differences between them. In addition, the top 10% similarities were defined both over the whole set of pairs (9180 in total) and separately for each odor molecule (135 pairs for each of the 136 molecules). We thus examined 18 (= 3 × 3 × 2) calculation patterns.
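The bookkeeping in this paragraph can be illustrated with a short sketch. It assumes that similarity is the Pearson correlation between score vectors (the figure captions speak of "correlations") and that standardization means z-scoring. The function names and the labels for the 18 patterns are hypothetical, and the 3 × 3 × 2 breakdown is one reading of the text.

```python
from itertools import combinations, product
from statistics import mean, pstdev


def standardize_rows(matrix):
    """Z-score each row, i.e. each molecule across all receptors."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


def standardize_columns(matrix):
    """Z-score each column, i.e. each receptor across all molecules."""
    transposed = [list(col) for col in zip(*matrix)]
    return [list(row) for row in zip(*standardize_rows(transposed))]


def pearson(u, v):
    """Pearson correlation between two equally long score vectors."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    norm_u = sum((a - mu_u) ** 2 for a in u) ** 0.5
    norm_v = sum((b - mu_v) ** 2 for b in v) ** 0.5
    return cov / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def pairwise_similarity(vectors):
    """Similarity for every unordered pair of molecules, keyed by index pair."""
    return {(i, j): pearson(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}


# 136 molecules give 136 * 135 / 2 unordered pairs.
n_pairs = sum(1 for _ in combinations(range(136), 2))

# One reading of the 3 x 3 x 2 calculation patterns (labels are illustrative).
patterns = list(product(
    ("nonstandardized", "per-molecule", "per-receptor"),
    ("apo", "holo", "holo minus apo"),
    ("top 10% of all pairs", "top 10% per molecule"),
))
```

Under Pearson correlation, z-scoring each molecule's own vector leaves the pairwise similarity unchanged, so in this sketch only the per-receptor standardization alters the ranking; a different similarity measure would behave differently.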

As seen in Figure 11, the AUC (area under the curve) of the ROC curve obtained with the receptor-standardized docking scores for the holo receptor structures is 0.641, which exceeds the value of 0.602 obtained with the molecular structure similarity. With holo structures, all the estimated AUC values are similar, whereas the results with the apo structures are worse, suggesting that the ligand-induced deformation of the receptor structure (in particular, around the binding pocket) is crucial for prediction accuracy. Some typical ROC curves are compiled in SI-4 of the Supporting Information, which also shows the ROC curves obtained from the molecular structure similarity using the Morgan fingerprint with radius 4.

Figure 11. ROC curves for the prediction of odor similarity, where the docking scores standardized regarding the variations of holo receptor species are used, and the top 10% molecules with higher correlations are regarded as similar. Blue and red curves refer to the results based on the docking score and molecular structure similarities, respectively.

There may be concerns about the validity of the “top 10%” criterion for selecting similar odors. Figure 12 therefore compares the prediction performance of the molecular structure and docking score similarities when top 5% and top 2% criteria are employed instead. With the top 2% threshold, the AUC values for the docking score and molecular structure similarities become 0.735 and 0.696, respectively, because a stricter threshold focuses on the structurally more similar molecular pairs in the ATLAS database. The similarity threshold (presently 10%) should be optimized for the odor database used for prediction.

Figure 12. ROC curves for the prediction of odor similarity, where the top 5% (left) and top 2% (right) molecules with higher correlations are regarded as similar. Blue and red curves refer to the results based on the docking score and molecular structure similarities, respectively.
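The threshold comparison can be sketched as a simple sweep over cutoffs. The code below is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the cutoff is rounded down, and the AUC is computed in its rank-sum (Mann-Whitney) form with average ranks for ties.

```python
def auc_from_ranks(labels, scores):
    """Rank-sum (Mann-Whitney) form of the AUC, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for flag in labels if flag)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum = sum(r for r, flag in zip(ranks, labels) if flag)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_by_threshold(odor_sim, predictor_sim, fractions=(0.10, 0.05, 0.02)):
    """AUC of `predictor_sim` when the top `fraction` of pairs by odor
    similarity is labeled 'similar', for each fraction in turn."""
    n = len(odor_sim)
    order = sorted(range(n), key=lambda i: odor_sim[i], reverse=True)
    result = {}
    for fraction in fractions:
        k = max(1, int(n * fraction))  # assumption: the cutoff is rounded down
        top = set(order[:k])
        labels = [i in top for i in range(n)]
        result[fraction] = auc_from_ranks(labels, predictor_sim)
    return result
```

Calling `auc_by_threshold` once with the docking score similarities and once with the structural similarities of the same pairs yields the two values compared at each threshold.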

3.5. Prediction of Odors for Odorants Outside the Database

A main goal of the present approach is to provide a computational tool for predicting the odor of a novel molecule whose molecular structure is known. To test the validity of the docking-based method, we “predicted” the odor of Muscone (Figure 13), which is not registered in the ATLAS database. Table 2 lists the ATLAS odor molecules in the top 10% of docking score similarity to Muscone. Notably, Cashmeran, Musk Tonalid, Santalol, and Musk Galaxolide appear in this top 10% ranking, whereas three of these four molecules fall outside the top 10% when judged by structural similarity. The main odor of these four molecules is described as “Musk” in the ATLAS database, which supports the usefulness of the docking-based approach for odor prediction.

Figure 13. Molecular structure of Muscone.

Table 2. Predicted Molecules with Odor Similarity to Muscone along with Their Docking Similarity (Top 10%) and Structural Similarity Values, and Their Ranks (in Parentheses).
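The out-of-database query can be sketched as a nearest-neighbor lookup in docking score space. This is a minimal sketch that assumes Pearson correlation as the docking similarity; `rank_references` and the reference names are hypothetical placeholders, and the vectors are toy values rather than docking scores from the study.

```python
def pearson(u, v):
    """Pearson correlation between two equally long score vectors."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    norm_u = sum((a - mu_u) ** 2 for a in u) ** 0.5
    norm_v = sum((b - mu_v) ** 2 for b in v) ** 0.5
    return cov / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def rank_references(query_vec, reference_vecs, fraction=0.10):
    """Return the top `fraction` of reference molecules, most similar first,
    as (name, similarity) pairs."""
    scored = sorted(
        ((pearson(query_vec, vec), name) for name, vec in reference_vecs.items()),
        reverse=True,
    )
    k = max(1, int(len(scored) * fraction))  # assumption: cutoff rounded down
    return [(name, sim) for sim, name in scored[:k]]


# Toy 4-receptor vectors; a real query would carry one score per receptor.
references = {
    "ref_A": [2.0, 4.0, 6.0, 8.0],
    "ref_B": [4.0, 3.0, 2.0, 1.0],
    "ref_C": [1.0, 3.0, 2.0, 4.0],
}
query = [1.0, 2.0, 3.0, 4.0]
ranking = rank_references(query, references, fraction=1.0)
```

The odor descriptors recorded for the returned reference molecules then serve as the predicted odor of the query molecule.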

3.6. Future Works for Improvement of Prediction Capability

This research was largely a feasibility study whose primary objective was to provide a proof of concept, and its predictive capability is still immature. We nevertheless expect the predictive accuracy to improve as the individual component technologies are refined.

For example, regarding the modeling of ORs, if experimental structures become available not only for Class I but also for Class II receptors, structure prediction methods such as AlphaFold, used with appropriate templates, should yield more accurate structures for all 409 ORs. Note that approximately 85% of hORs belong to Class II, to which many volatile odorants are thought to bind. Furthermore, for the docking poses and scores we relied on docking simulations against static receptor structures using GNINA; more reliable predictions could be obtained with more accurate force fields and with time-consuming MD calculations that capture dynamic conformational changes of ORs.

In addition, when the odor similarity of molecules is calculated from an odor database such as ATLAS, accuracy is degraded by overlap (redundancy) among odor descriptors. Reducing the dimensionality of the descriptor data, for example with PCA, should help to overcome this issue. Some form of information compression may also be effective for the docking scores for ORs (currently 409 dimensions). As mentioned above, optimizing specific protocol settings, such as the 10% threshold for odor similarity, is another task for future work. The optimal settings depend on the type and attributes of the database; ATLAS served as a test case in this study, and the protocol should be optimized for the database to be combined with the method and for the intended use.
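As an illustration of the dimension reduction suggested here, the following sketch extracts the leading principal component of a set of feature vectors (odor descriptor profiles or docking score vectors) by power iteration. It is a didactic, standard-library-only sketch with hypothetical function names; in practice an established PCA implementation would be the natural choice.

```python
import random


def center(matrix):
    """Subtract the column means so that every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_principal_component(matrix, n_iter=200, seed=0):
    """Leading PCA axis of row-wise samples via power iteration on X^T X.

    Returns the unit-length axis and the projection of each sample onto it.
    """
    x = center(matrix)
    dim = len(x[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(c * c for c in v) ** 0.5
    v = [c / norm for c in v]
    for _ in range(n_iter):
        proj = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(row[j] * p for row, p in zip(x, proj)) for j in range(dim)]  # X^T (X v)
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:  # no variance left along any direction
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores
```

Further components can be obtained by subtracting each sample's projection onto the axis found so far and repeating the procedure (deflation).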

The amount of computation required in this research is modest: the structure preparation with AlphaFold2 and the docking calculations with GNINA took approximately 6 days and 3 days, respectively, on our workstation equipped with an NVIDIA RTX3090 GPU. If abundant computational resources become available in future studies, we can expect to improve the prediction accuracy by adopting the more sophisticated protocols mentioned above. Furthermore, combining the present method with well-developed odor prediction methods (such as conventional QSAR-type approaches) should improve the overall accuracy of odor prediction.

Finally, in this study we adopted a strategy of relative evaluation, predicting odor by searching a given database for the molecule with the closest odor features. Other odor prediction methods are also possible, such as determining whether a certain odor element is present for a molecule with a given structure. In that case, the docking scores could, for example, be used as the input to a deep neural network whose output is the presence or absence of a certain odor element. Developing such strategies is also a future challenge.
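The input/output layout of such a classifier can be sketched with a deliberately simple stand-in. The code below trains a logistic regression (a single-layer model, not the deep neural network envisaged in the text) by gradient descent to map a score vector to the presence or absence of one odor element. The function names, hyperparameters, and the two-dimensional toy data are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """True if the odor element is predicted to be present."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


# Toy 2-dimensional "score vectors"; label 1 means the odor element is present.
X = [[1.0, 0.0], [0.9, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.9, -0.2], [-0.8, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

A real study would replace the toy vectors with full docking score vectors, hold out test molecules, and likely use a deeper model, as the text suggests.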

4. Conclusions

In this study, we have proposed a novel in silico framework for predicting the scent of odorant molecules based on docking simulations with human olfactory receptors (hORs). Our prediction strategy is grounded in a relative evaluation approach, in which we forego absolute odor assessment and instead identify molecules with similarly perceived odors by referencing databases such as ATLAS. Specifically, we computed docking scores for each molecule against a panel of 409 hORs, representing each odorant as a 409-dimensional score vector. This vector-based feature representation statistically compensates for inaccuracies in individual docking poses and affinities.

The central hypothesis underlying this approach is that molecules with similar docking score vectors are likely to exhibit similar olfactory perceptions. We tested this hypothesis using the ATLAS database and found that our docking-based scheme in some ways outperformed conventional QSAR-based methods that rely solely on molecular structural similarity, thus providing a first-step proof of concept.

It is noteworthy that using the active-state conformation of hORs as templates for AlphaFold2 predictions was essential for achieving high performance. Moreover, we note that the performance of conventional QSAR models for odor prediction could potentially be improved by incorporating a broader set of molecular features beyond compound structure alone. Therefore, we anticipate that a hybrid strategy, combining the conventional QSAR approach with the proposed docking-based method, could enhance predictive accuracy by expanding the QSAR descriptor space.

The reliability and accuracy of the proposed method ultimately depend on how well the docking poses and binding affinities of ligand–receptor interactions can be estimated. In this regard, we expect that continued improvements in docking simulation methodologies, as well as the accumulation of structural information on olfactory receptors, will further enhance the performance of odor prediction in the future.

Supplementary Material

ao5c04324_si_001.pdf (1.2MB, pdf)

Acknowledgments

We thank Science & Technology Systems, Inc. for help with the data curation on odorant molecules. We would like to acknowledge the MEXT Quantum Leap Flagship Program, Japan (Grant No. JPMXS0120330644) for financial support.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c04324.

  • Tanimoto coefficient for the similarity of molecular structure; clustering of odor molecules by odor descriptors and docking scores; confusion matrix for normalized mutual information (NMI) representing the correspondence of clustering; ROC curves for the prediction of odor similarity (PDF)

The authors declare no competing financial interest.

References

  1. Bushdid C., Magnasco M. O., Vosshall L. B., Keller A. Humans can discriminate more than one trillion olfactory stimuli. Science. 2014;343:1370–1372. doi: 10.1126/science.1249168.
  2. Sell C. S. On the Unpredictability of Odor. Angew. Chem., Int. Ed. 2006;45:6254–6261. doi: 10.1002/anie.200600782.
  3. Buck L., Axel R. A novel multigene family may encode odorant receptors: A molecular basis for odor recognition. Cell. 1991;65:175–187. doi: 10.1016/0092-8674(91)90418-X.
  4. Soelter J., Schumacher J., Spors H., Schmuker M. Computational exploration of molecular receptive fields in the olfactory bulb reveals a glomerulus-centric chemical map. Sci. Rep. 2020;10:77. doi: 10.1038/s41598-019-56863-4.
  5. Niimura Y., Matsui A., Touhara K. Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Res. 2014;24:1485–1496. doi: 10.1101/gr.169532.113.
  6. Plank D. M., Sussman M. A. Intracellular Ca2+ measurements in live cells by rapid line scan confocal microscopy: simplified calibration methodology. Methods Cell Sci. 2004;25:123–133. doi: 10.1007/s11022-004-2043-8.
  7. Keller A., Gerkin R. C., Guan Y., et al. Predicting Human Olfactory Perception from Chemical Features of Odor Molecules. Science. 2017;355:820–826. doi: 10.1126/science.aal2014.
  8. Sanchez-Lengeling B., Wei J. N., Lee B. K., Gerkin R. C., Aspuru-Guzik A., Wiltschko A. B. Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules. 2019, arXiv:1910.10685v2. arXiv.org e-Print archive. doi: 10.48550/arXiv.1910.10685.
  9. Kowalewski J., Ray A. Predicting Human Olfactory Perception from Activities of Odorant Receptors. iScience. 2020;23:101361. doi: 10.1016/j.isci.2020.101361.
  10. Kowalewski J., Huynh B., Ray A. A System-Wide Understanding of the Human Olfactory Percept Chemical Space. Chem. Senses. 2021;46:bjab007. doi: 10.1093/chemse/bjab007.
  11. Gupta R., Mittal A., Agrawal V., et al. OdoriFy: A Conglomerate of Artificial Intelligence-Driven Prediction Engines for Olfactory Decoding. J. Biol. Chem. 2021;297:100956. doi: 10.1016/j.jbc.2021.100956.
  12. Gerkin R. C. Parsing Sage and Rosemary in Time: The Machine Learning Race to Crack Olfactory Perception. Chem. Senses. 2021;46:bjab020. doi: 10.1093/chemse/bjab020.
  13. Lee B. K., Mayhew E. J., et al. A Principal Odor Map Unifies Diverse Tasks in Olfactory Perception. Science. 2023;381:999–1006. doi: 10.1126/science.ade4401.
  14. Lötsch J., Kringel D., Hummel T. Machine Learning in Human Olfactory Research. Chem. Senses. 2019;44:11–22. doi: 10.1093/chemse/bjy067.
  15. Sharma A., Kumar R., Ranjta S., Varadwaj P. K. SMILES to Smell: Decoding the Structure-Odor Relationship of Chemical Compounds Using the Deep Neural Network Approach. J. Chem. Inf. Model. 2021;61:676–688. doi: 10.1021/acs.jcim.0c01288.
  16. Saini K., Ramanathan V. Predicting odor from molecular structure: a multi label classification approach. Sci. Rep. 2022;12:13863. doi: 10.1038/s41598-022-18086-y.
  17. Shirasu M., Yoshikawa K., Takai Y., Nakashima A., Takeuchi H., Sakano H., Touhara K. Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron. 2014;81:165–178. doi: 10.1016/j.neuron.2013.10.021.
  18. Xu H., Kitai K., Minami K., Nakatsu M., Yoshikawa G., Tsuda K., Shiba K., Tamura R. Determination of quasi primary odors by endpoint detection. Sci. Rep. 2021;11:12070. doi: 10.1038/s41598-021-91210-6.
  19. Jumper J., Evans R., Pritzel A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
  20. Mirdita M., Schütze K., Moriwaki Y., et al. ColabFold: making protein folding accessible to all. Nat. Methods. 2022;19:679–682. doi: 10.1038/s41592-022-01488-1.
  21. Varadi M., Anyango S., Deshpande M., et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein sequence space with high accuracy models. Nucleic Acids Res. 2022;50:D439–D444. doi: 10.1093/nar/gkab1061.
  22. Dravnieks A. Atlas of Odor Character Profiles; ASTM: Philadelphia, 1985.
  23. https://pubchem.ncbi.nlm.nih.gov/ (accessed on June 30, 2025).
  24. O’Boyle N. M., Banck M., James C. A., Morley C., Vandermeersch T., Hutchison G. R. Open Babel: An open chemical toolbox. J. Cheminf. 2011;3:33. doi: 10.1186/1758-2946-3-33.
  25. Gasteiger J., Marsili M. Iterative partial equalization of orbital electronegativity: a rapid access to atomic charges. Tetrahedron. 1980;36:3219–3228. doi: 10.1016/0040-4020(80)80168-2.
  26. van der Maaten L., Hinton G. E. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
  27. Snitz K., Yablonka A., Weiss T., Frumin I., Khan R. M., Sobel N. Predicting Odor Perceptual Similarity from Odor Structure. PLoS Comput. Biol. 2013;9:e1003184. doi: 10.1371/journal.pcbi.1003184.
  28. Ravia A., Snitz K., Honigstein D., Finkel M., Zirler R., Perl O., Secundo L., Laudamiel C., Harel D., Sobel N. A measure of smell enables the creation of olfactory metamers. Nature. 2020;588:118–123. doi: 10.1038/s41586-020-2891-7.
  29. Weininger D. SMILES, A Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988;28:31–36. doi: 10.1021/ci00057a005.
  30. Morgan H. L. The generation of a unique machine description for chemical structures: a technique developed at chemical abstracts service. J. Chem. Doc. 1965;5:107–113. doi: 10.1021/c160017a018.
  31. http://www.rdkit.org (accessed on June 30, 2025).
  32. https://github.com/YoshitakaMo/localcolabfold (accessed on June 30, 2025).
  33. Billesbølle C. B., de March C. A., van der Velden W. J. C., et al. Structural Basis of Odorant Recognition by a Human Odorant Receptor. Nature. 2023;615:742–749. doi: 10.1038/s41586-023-05798-y.
  34. Choi C., Bae J., Kim S., et al. Understanding the molecular mechanisms of odorant binding and activation of the human OR52 family. Nat. Commun. 2023;14:8105. doi: 10.1038/s41467-023-43983-9.
  35. Le Guilloux V., Schmidtke P., Tuffery P. Fpocket: An open source platform for ligand pocket detection. BMC Bioinf. 2009;10:168. doi: 10.1186/1471-2105-10-168.
  36. Kochnev Y., Durrant J. D. FPocketWeb: Protein pocket hunting in a web browser. J. Cheminf. 2022;14:58. doi: 10.1186/s13321-022-00637-0.
  37. Dolinsky T. J., Czodrowski P., Li H., Nielsen J. E., Jensen J. H., Klebe G., Baker N. A. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 2007;35:W522–W525. doi: 10.1093/nar/gkm276.
  38. Søndergaard C. R., Olsson M. H. M., Rostkowski M., Jensen J. H. Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values. J. Chem. Theory Comput. 2011;7:2284–2295. doi: 10.1021/ct200133y.
  39. Zhang Y., Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33:2302–2309. doi: 10.1093/nar/gki524.
  40. McNutt A. T., Francoeur P., Aggarwal R., Masuda T., Meli R., Ragoza M., Sunseri J., Koes D. R. GNINA 1.0: Molecular Docking with Deep Learning. J. Cheminf. 2021;13:43. doi: 10.1186/s13321-021-00522-2.
  41. Trott O., Olson A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010;31:455–461. doi: 10.1002/jcc.21334.
  42. Sunseri J., Koes D. R. Virtual Screening with GNINA 1.0. Molecules. 2021;26:7369. doi: 10.3390/molecules26237369.
  43. Aier I., Dubey N., Varadwaj P. K. Structural dynamics of olfactory receptors: implications for odorant binding and activation mechanisms. J. Biomol. Struct. Dyn. 2025;Apr 17:1–12. doi: 10.1080/07391102.2025.2492235.
  44. Berwal B., Saha P., Kumar R. A Fully In Silico Protocol to Understand Olfactory Receptor-Odorant Interactions. ACS Omega. 2025;10:24030–24049. doi: 10.1021/acsomega.4c08181.
  45. Feher M., Schmidt J. M. Property Distributions: Differences between Drugs, Natural Products, and Molecules from Combinatorial Chemistry. J. Chem. Inf. Comput. Sci. 2003;43:218–227. doi: 10.1021/ci0200467.
  46. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html (accessed on June 30, 2025).
  47. Vögele M., Zhang B. W., Kaindle J., Wang L. Is the Functional Response of a Receptor Determined by the Thermodynamics of Ligand Binding? J. Chem. Theory Comput. 2023;19:8414–8422. doi: 10.1021/acs.jctc.3c00899.
  48. Ahmed L., Zhang Y., Block E., et al. Molecular mechanism of activation of human musk receptors OR5AN1 and OR1A1 by (R)-muscone and diverse other musk-smelling compounds. Proc. Natl. Acad. Sci. U.S.A. 2018;115:E3950–E3958. doi: 10.1073/pnas.1713026115.


Articles from ACS Omega are provided here courtesy of American Chemical Society
