Food Chemistry: X. 2025 Jun 10;29:102645. doi: 10.1016/j.fochx.2025.102645

Predicting the low-level and extremely low-threshold compounds in Baijiu: uniform manifold approximation and projection

Yintao Jia a,b,c, Yue Qiu a,b,c, Qi Deng a,b,c, Ying Han d, Baoguo Sun a,b,c, Rong Liu d, Pan Zhen d, Wenxian Li a,b,c, Wei Dong a,b,c, Xiaotao Sun a,b,c, Xiao Yang a,b,c, Fan Cui d
PMCID: PMC12192529  PMID: 40567578

Abstract

In the flavor analysis of Baijiu, identifying compounds that exhibit high flavor dilution factors but give minimal or undetectable response signals in analytical measurements remains a significant challenge. To elucidate the correlation between the structure of odorants and their odors in Baijiu, a “compound-aroma” database consisting of 646 compounds and their known associated odors (70 aroma descriptors) was compiled. Each compound was encoded as a 1024-bit molecular fingerprint and analyzed by K-means clustering and uniform manifold approximation and projection (UMAP). Moreover, by calculating the presence or absence of odor-related molecular substructures, associations between odor notes and chemical groups of the aroma compounds in Baijiu were revealed, and an indicative model diagram was constructed. Finally, a strong roasted odor in Baijiu, which appeared at retention indices of 1324–1334 without a distinct response signal, was successfully identified as 2,5-dimethylpyrazine or 2,6-dimethylpyrazine using this model.

Keywords: Baijiu, UMAP, Structure-odor correlations, Molecular fingerprints, Compounds prediction

Highlights

  • “Aroma-compound” database containing 646 compounds and 70 descriptors was compiled.

  • “Aroma-structure” model for Baijiu was constructed by UMAP and K-means.

  • Strong roasted odor of Baijiu, without distinct signal, was identified as 2,5- or 2,6-dimethylpyrazine.

1. Introduction

Baijiu, one of the most representative fermented foods in China, is a distilled alcoholic beverage composed mainly of ethanol and water (together accounting for over 98 %), with the remainder being a complex mixture of aroma-active compounds, including esters, aldehydes, acids, and other volatiles (Wang, Liu, et al., 2024). Its unique solid-state fermentation and distillation processes are fundamental in determining its organoleptic characteristics (Wu et al., 2024). Owing to its intricate flavor attributes, more than 12 types of Baijiu have been established, and over 2700 compounds have been detected (Wang, Tang, et al., 2024). Meanwhile, Molecular Sensory Science has been systematically developed for the analysis of key aroma compounds in Baijiu; it includes isolation of volatile compounds, qualitative analysis, quantitative analysis, odor activity value (OAV) calculation, and aroma recombination and omission experiments (Dong et al., 2024). With this approach, 386 aroma-active compounds have been identified and characterized (Wang, Tang, et al., 2024). However, accurately identifying compounds that are present at low concentrations but have extremely low odor thresholds remains difficult. In particular, in gas chromatography-olfactometry (GC-O) or gas chromatography-olfactometry-mass spectrometry (GC-O-MS) analyses, when an odor region is detectable only by olfaction and lacks a corresponding chromatographic or mass spectrometric signal, researchers have had to rely on empirical inference to hypothesize its chemical identity. For example, Duan et al. identified six key sulfur compounds in sauce-aroma Baijiu, starting from an empirical “guess” based on long-term experimental experience (Duan et al., 2024). Sha et al. identified sulfur-containing compounds in roasted sesame-flavored Baijiu that exhibited high flavor dilution factors but did not show prominent peaks in MS.
These compounds were initially “guessed” through data comparison and then validated using a pulsed flame photometric detector (Sha et al., 2017). Despite such empirical guesses in odorant recognition, how aroma profiles relate to molecular configurations remains elusive. Therefore, developing a model based on the correlation between aroma descriptors and molecular structures, which would allow the chemical composition of unknown flavor regions to be predicted scientifically rather than inferred empirically, has long been a significant challenge in Baijiu flavor science.

In recent years, machine learning has been used effectively to analyze and predict food flavor (Zeng et al., 2023). Flavor prediction relies on models built from the molecular structures of compounds with known flavor profiles, which establish a correlation between the aroma and the structure of the compounds. Dutta et al. predicted the taste of 3468 compounds and reported that umami molecules are rich in nitrogen-containing functional groups, whereas bitter and sweet molecules have similar structures and show significant correlations with functional groups such as aldehydes, ketones, and carboxylic acids (Dutta et al., 2023). The same study also developed a classification model with favorable classification performance. The predictive performance of such a model depends on a dataset containing a large number of valid molecules. However, the large volume of high-dimensional data generated during model construction complicates the analysis. Dimensionality reduction, an unsupervised learning technique, offers a useful way to overcome this problem: it embeds high-dimensional data in a meaningful low-dimensional feature space while retaining important information and facilitating visualization. It can also be used to discover latent structure in large datasets, which aligns closely with our research objectives. Existing dimensionality-reduction algorithms fall into two main categories. The first comprises linear techniques, which focus on preserving the distance structure of the high-dimensional space, including principal component analysis (PCA) (Ehiro, 2023), multidimensional scaling (MDS) (Herrera-Rocha et al., 2024), and non-negative matrix factorization (Zeng et al., 2024).
The second category comprises non-linear techniques, which can learn the local or global structure of non-linear manifolds; well-known algorithms include t-distributed stochastic neighbor embedding (t-SNE) (Marx, 2024), locally linear embedding (Roweis & Saul, 2000), and self-organizing maps (Wen et al., 2024). Uniform manifold approximation and projection (UMAP), a more recent algorithm, is claimed to preserve as much of the local structure as, and more of the global structure than, other currently available techniques (Becht et al., 2019). For instance, Rugard et al. established effective correlations between the aroma profiles and chemical structures of compounds in food matrices using UMAP combined with K-means and agglomerative hierarchical clustering (AHC), showing that “woody” and “spicy” odors were linked to allylic and bicyclic structures (Rugard et al., 2021). However, to the best of our knowledge, UMAP has not yet been applied in Baijiu flavor research to predict the correlation between aroma and structure.

In the present study, we propose a novel approach to investigate the correlation between molecular structure and aroma profile in Baijiu, consisting of the following steps: (i) compilation of aroma compounds and their corresponding descriptors from the publicly accessible aroma database “Flavornet and Human Odor Space” and from the literature indexed in “Web of Science” and “China National Knowledge Infrastructure (CNKI)”; (ii) calculation and encoding of molecular structures using KNIME software; (iii) application of UMAP in conjunction with K-means clustering to analyze the correlation between aroma, molecular structure, and compounds in Baijiu, followed by construction of a predictive model; and (iv) use of the resulting predictive model to identify compounds in actual Baijiu samples that are detectable only by olfaction and lack any corresponding chromatographic or mass spectrometric signal.

2. Materials and methods

2.1. Data collection

Two odor-compound datasets were compiled in this study: “aroma compounds from Flavornet and Human Odor Space” (Flavornet Home Page) and “aroma compounds with OAV higher than 1 in Baijiu, collected from CNKI and Web of Science”. Because matrix effects and interactions in food may alter odors, the odor profiles of compounds with OAV > 1 in Baijiu were used to refine and supplement the descriptors collected from the website, aiming to establish a more accurate relationship between “odor” and “structure”. In addition, some of the collected odor descriptors were vague or inconsistent, so they were screened manually. Specifically, descriptors that indicate degree or serve only as modifiers (e.g., “pleasant,” “warm”) were excluded, and descriptors with identical meanings were standardized for consistency; for example, “apple-like” was simplified to “apple” and “grassy” was changed to “grass.” Ultimately, only aroma descriptors occurring more than five times were retained (Tromelin et al., 2018). Unlike previous research on odor-structure correlations, this study revised the aroma descriptions of compounds based on the aromas of substances with OAV greater than 1 in Baijiu. The detailed database construction process is illustrated in Fig. 1.
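The manual screening steps described above (dropping modifier terms, merging synonyms, and keeping only descriptors that occur more than five times) can be sketched in Python as follows. This is a minimal illustration, not the curation procedure actually used: the synonym map and modifier list contain only the examples named in the text, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical lists: only the examples quoted in the text are included;
# the real screening was done manually on the full descriptor set.
SYNONYMS = {"apple-like": "apple", "grassy": "grass"}
MODIFIERS = {"pleasant", "warm"}  # degree/modifier terms to exclude


def clean_descriptors(raw):
    """Lower-case terms, drop modifiers, map synonyms, and remove duplicates."""
    cleaned = []
    for term in raw:
        term = term.strip().lower()
        if not term or term in MODIFIERS:
            continue
        term = SYNONYMS.get(term, term)
        if term not in cleaned:  # keep the first occurrence, preserve order
            cleaned.append(term)
    return cleaned


def filter_rare_descriptors(compounds, threshold=5):
    """Keep only descriptors occurring more than `threshold` times overall.

    `compounds` maps a compound name to its list of raw descriptors.
    """
    cleaned = {name: clean_descriptors(terms) for name, terms in compounds.items()}
    counts = Counter(term for terms in cleaned.values() for term in terms)
    kept = {term for term, n in counts.items() if n > threshold}
    return {name: [t for t in terms if t in kept] for name, terms in cleaned.items()}
```

Compounds whose descriptors are all filtered out end up with an empty list; whether such compounds are then dropped from the database is a separate decision that this sketch leaves open.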

Fig. 1. Construction process of the “aroma-compound” database.

2.2. Computation and encoding of molecular structures by KNIME

Molecular structure calculations and encoding were performed using KNIME software (v5.1.1), an open-source workflow platform that supports a wide range of functionalities, offers a variety of plugins, and is backed by an active cheminformatics community (Beisken et al., 2013). The Simplified Molecular-Input Line-Entry System (SMILES) strings of the compounds were used as inputs to the workflow, and various open-source plugins were employed to calculate the chemical features of the molecules (Sharma et al., 2021). Specifically, the molecular structures of the compounds were computed using the “RDKit from molecule” plugin, followed by identification and analysis of functional groups. The “RDKit Fingerprint” plugin was used to encode the compounds as extended-connectivity fingerprints (ECFP). ECFP represents the presence or absence of substructures within a specified radius around each atom of a small molecule and encodes them as binary data, where 1 indicates presence and 0 indicates absence (Capecchi et al., 2020). The following parameters were configured: radius = 2, yielding extended-connectivity fingerprints with a diameter of 4 (ECFP4), and number of bits = 1024, settings whose suitability has been demonstrated in prior studies (Probst & Reymond, 2020; Rugard et al., 2021).
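For readers working outside KNIME, a comparable encoding can be sketched with the RDKit Python API (RDKit is the toolkit underlying the KNIME RDKit plugins). This is an illustrative sketch, not the workflow used in this study: it uses RDKit's Morgan fingerprint generator with radius 2 and 1024 bits, which corresponds to ECFP4, but other options (for example chirality or atom-invariant settings) may differ from the KNIME node configuration, so individual bit positions need not match. The helper names are hypothetical.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Morgan radius 2 corresponds to ECFP4 (diameter 4); 1024 bits as in the text.
MORGAN_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=1024)


def ecfp4_bits(smiles):
    """Return a 0/1 NumPy vector of length 1024 for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = MORGAN_GEN.GetFingerprint(mol)      # RDKit ExplicitBitVect
    bits = np.zeros(fp.GetNumBits(), dtype=np.uint8)
    bits[list(fp.GetOnBits())] = 1           # 1 = substructure present, 0 = absent
    return bits


def fingerprint_matrix(smiles_list):
    """Stack per-compound fingerprints into an (n_compounds, 1024) matrix."""
    return np.vstack([ecfp4_bits(s) for s in smiles_list])
```

For example, passing the Table 1 SMILES of ethyl acetate (`CCOC(=O)C`) and ethyl hexanoate (`CCCCCC(=O)OCC`) to `fingerprint_matrix` gives a 2 × 1024 binary matrix, one row per compound, which is the kind of input used for the dimensionality reduction in Section 2.3.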

2.3. Dimension reduction from the 1024-bit fingerprints

To facilitate visualization and obtain low-dimensional representations, four dimensionality reduction techniques, namely PCA, MDS, t-SNE, and UMAP, were applied to the 1024-bit molecular fingerprints. PCA, MDS, and t-SNE were carried out in R (v4.3.1), whereas UMAP was implemented in a Jupyter Notebook environment using the umap-learn library in Python. PCA was performed using the PCA function of the FactoMineR package, and two principal components were extracted so that the data could be visualized in a two-dimensional space. MDS was implemented using the stats package. In contrast to PCA, MDS focuses on preserving the similarities or distances between the original data points and constructs a low-dimensional representation from them. To this end, the dist function was first used to calculate the Euclidean distances between data points, yielding a distance matrix. The cmdscale function was then applied to this distance matrix to map the high-dimensional data into a lower-dimensional space while preserving the relative positions of the data points as far as possible. t-SNE was implemented using the Rtsne package. The algorithm minimizes the Kullback-Leibler (KL) divergence between the distributions of pairwise similarities in the original space and in the low-dimensional space, so that similar data points are placed close together. To give the algorithm sufficient time to converge to a stable low-dimensional representation, the maximum number of iterations was set to 1000. The theta and perplexity parameters were set to 0.5 and 216, respectively; theta controls the speed/accuracy trade-off of the Barnes-Hut approximation and perplexity sets the effective neighborhood size, and together they influence the shape of the final clusters.
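The dist() followed by cmdscale() step is classical (Torgerson) MDS. A minimal NumPy sketch of the same computation, written here for illustration rather than taken from the R implementation, double-centers the squared Euclidean distance matrix and uses its leading eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. The function name is hypothetical.

```python
import numpy as np


def classical_mds(X, k=2):
    """Classical (Torgerson) MDS on Euclidean distances, analogous to cmdscale(dist(X), k)."""
    X = np.asarray(X, dtype=float)           # cast: 0/1 fingerprints may be integer typed
    sq = (X ** 2).sum(axis=1)
    # Squared Euclidean distances via the Gram matrix (avoids an n x n x p array).
    D2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)   # guard against tiny negative eigenvalues
    return eigvecs[:, top] * np.sqrt(lam)
```

With Euclidean distances on the same unscaled data, classical MDS gives the same configuration as PCA scores up to sign; FactoMineR's PCA standardizes variables by default, so the PCA and MDS maps can still differ.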
The quality of a UMAP embedding, including the balance it strikes between local and global structure, depends strongly on its hyperparameters. The NumPy, Pandas, and UMAP libraries were imported to set up the analysis framework for UMAP dimensionality reduction, the dataset was loaded, and the UMAP model was initialized. Specifically, the number of neighbors (n_neighbors) was set to 15, meaning that each data point considers its 15 nearest neighbors when local neighborhoods are constructed. The minimum distance (min_dist) was set to 0.1; this is the minimum distance allowed between points in the low-dimensional space, which prevents points from becoming overcrowded and losing separability, so that different data points remain distinguishable after dimensionality reduction. Moreover, because molecular fingerprint data are binary, metric = “jaccard” was chosen as the distance measure between points, as it is well suited to measuring similarity between binary vectors.
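To make the metric choice concrete, the sketch below computes the Jaccard distance (1 minus the intersection over the union of the "on" bits) between fingerprints, together with the 15-nearest-neighbor lists from which a neighborhood-graph method such as UMAP starts. It is illustrative only: umap-learn computes neighbors internally (and may use an approximate search), so in practice the settings above are simply passed as `umap.UMAP(n_neighbors=15, min_dist=0.1, metric="jaccard")`. The function names are hypothetical, and this sketch excludes each point from its own neighbor list, a convention that libraries may handle differently.

```python
import numpy as np


def jaccard_distance_matrix(fps):
    """Pairwise Jaccard distance 1 - |a AND b| / |a OR b| for 0/1 fingerprint rows."""
    A = (np.asarray(fps) > 0).astype(np.int64)
    inter = A @ A.T                                     # |a AND b|
    counts = A.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter   # |a OR b|
    with np.errstate(divide="ignore", invalid="ignore"):
        # Two all-zero fingerprints are treated as identical (distance 0).
        sim = np.where(union > 0, inter / union, 1.0)
    return 1.0 - sim


def knn_indices(dist, n_neighbors=15):
    """Indices of each point's n_neighbors nearest other points (exact search)."""
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)                         # exclude the point itself
    return np.argsort(d, axis=1)[:, :n_neighbors]
```

Unlike Euclidean distance, the Jaccard distance ignores bits that are 0 in both fingerprints, so two small molecules are not judged similar merely because both lack most substructures.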

2.4. Clustering and visualization

To analyze the dimensionality-reduced data more comprehensively, K-means clustering was used to group structurally similar compounds (Rugard et al., 2021). This method can reveal underlying patterns that are not immediately apparent in the raw, high-dimensional feature space, and the clustering results were visualized to highlight the groupings within the data. Clustering was implemented in R using the K-means algorithm, with the “cluster” and “factoextra” packages used for clustering and visualization, respectively. In this approach, a distance matrix is produced by calculating the Euclidean distances between observations. Before K-means clustering, the optimal number of clusters was determined using the Kelley penalty function, which balances the number of clusters against model fit; the number of clusters corresponding to the minimum value of the function was selected (Li et al., 2023).
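The clustering itself was run in R; as an illustration of what K-means does with the two-dimensional embedding, here is a minimal Lloyd's-algorithm sketch in NumPy with k-means++ seeding. It rests on stated assumptions: R's kmeans() defaults to the Hartigan-Wong algorithm rather than Lloyd's, and the penalty-function selection of k described above is not reproduced, so k is passed in directly (for example four, as in the example in Section 2.5). Function names are hypothetical.

```python
import numpy as np


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # k-means++ seeding: choose each new center with probability proportional
    # to its squared distance from the nearest center chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _assign(X, centers), centers
```

Applied to the 2-D UMAP coordinates, `kmeans(embedding, k=4)` would return one cluster label per compound, which is the input needed for the aroma and structure counts in Section 2.5.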

2.5. Establishment of the “aroma-structure” model

To elucidate the correlation between the aroma and structure of the compounds, the main components of each cluster were analyzed after clustering. The analysis involved three steps: (1) assessment of the aroma distribution patterns within the clusters, (2) exploration of the structural features of the clusters, and (3) establishment of the correlation between aroma and structure by statistical analysis.

The analysis of clustered aroma components focused on the representative aromas of each cluster, examined from two perspectives. First, the distribution frequency of a specific aroma across clusters was calculated relative to its total occurrence in the entire database (A%). Second, for each cluster, the percentage of a particular aroma was calculated relative to the total number of elements (aroma compounds) in that cluster (B%). The following equations were used (Rugard et al., 2021):

$$A\% = \frac{C}{D} \tag{1}$$
$$B\% = \frac{E}{F} \tag{2}$$

where C is the number of occurrences of the aroma within a cluster, D is the total number of occurrences of the aroma in the dataset, E is the number of occurrences of the aroma within the cluster (i.e., the same count as C), and F is the total number of elements (compounds) in the cluster.

For example, combining UMAP dimensionality reduction with K-means clustering partitioned the dataset into four distinct clusters. Cluster C1 contained a total of 256 odor compounds, and the aroma “fruit” occurred 108 times within C1 and 166 times across the entire dataset. Thus C = E = 108, D = 166, and F = 256, and Eqs. (1) and (2) give:

$$A\% = \frac{108}{166} = 65.06\% \tag{3}$$
$$B\% = \frac{108}{256} = 42.19\% \tag{4}$$

That is, 65.06 % of the occurrences of the fruity aroma were distributed in C1, and compounds with a fruity aroma accounted for 42.19 % of the compounds in C1. The distribution of structures within the different clusters was calculated in the same manner.
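The A% and B% bookkeeping of Eqs. (1) and (2) can be written as a short function. This sketch assumes each compound is a record of (cluster label, set of descriptors), so each descriptor is counted at most once per compound; the function name is hypothetical. With counts like those of cluster C1 in the example (108 of the 166 "fruit" occurrences, 256 compounds in C1), it returns approximately 65.06 for A% and 42.19 for B%.

```python
def aroma_cluster_percentages(records, aroma, cluster):
    """Return (A%, B%) for one aroma descriptor and one cluster.

    records: iterable of (cluster_label, set_of_descriptors), one per compound.
    A% = C / D: share of all occurrences of `aroma` that fall in `cluster`.
    B% = E / F: share of the compounds in `cluster` that carry `aroma`.
    """
    records = list(records)
    c = sum(1 for lab, desc in records if lab == cluster and aroma in desc)  # C (= E)
    d = sum(1 for _, desc in records if aroma in desc)                       # D
    f = sum(1 for lab, _ in records if lab == cluster)                       # F
    if d == 0 or f == 0:
        raise ValueError("aroma or cluster not present in records")
    return 100 * c / d, 100 * c / f
```

Computing both percentages from the same records keeps C and E identical by construction, which is why the two equations share a numerator.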

2.6. Application to actual Baijiu samples

2.6.1. Sampling and sample preparation

Light-aroma-type Baijiu (LAB), provided by XingHuaCun Fenjiu Group Co., Ltd. (Shanxi, China), was used in this study. Baijiu samples were diluted with ultrapure water to 15 % ethanol (v/v) and saturated with NaCl. The diluted sample was then transferred to a separatory funnel and extracted three times with CH2Cl2. The organic phases were combined and dried overnight over anhydrous Na2SO4. The solution was concentrated to 500 μL under a gentle stream of nitrogen and subsequently analyzed by GC-O-MS.

2.6.2. GC-O-MS analysis

The GC-O-MS instrument parameters were based on a previous report, with modifications (Dong et al., 2018). GC-O-MS analyses were performed on an Agilent 7890B gas chromatograph equipped with an Agilent 5977A mass selective detector (MSD) and a sniffing port (ODP3, Gerstel, Germany). Samples were analyzed on a DB-FFAP column (60 m × 0.25 mm i.d., 0.25 μm film thickness; J&W Scientific, USA). The olfactory port temperature was set to 250 °C, the ion source temperature to 230 °C, and the transfer line temperature to 250 °C. Mass spectra were acquired in electron ionization (EI) mode at 70 eV. The injection volume was 1 μL, and helium was used as the carrier gas. Aroma compounds were identified in full-scan mode over a mass range of 35 to 400 amu. The oven temperature program was as follows: from 40 °C to 50 °C at 10 °C/min (held for 6 min), then to 80 °C at 3 °C/min (held for 6 min), and finally to 235 °C at 5 °C/min (held for 10 min).
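As a quick check on the oven program, the total run time follows from the ramps and holds. This sketch assumes no initial hold at 40 °C (none is stated) and that each hold follows the ramp it is listed with; under that reading the run lasts 1 + 6 + 10 + 6 + 31 + 10 = 64 min. The function and constant names are hypothetical.

```python
def oven_program_minutes(initial_temp, steps):
    """Total run time (min) of a linear GC oven temperature program.

    steps: list of (target_temp_C, rate_C_per_min, hold_min), applied in order.
    """
    total, temp = 0.0, initial_temp
    for target, rate, hold in steps:
        total += abs(target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total


# Program as described: 40 -> 50 C at 10 C/min (hold 6 min), -> 80 C at 3 C/min
# (hold 6 min), -> 235 C at 5 C/min (hold 10 min); assumes no hold at 40 C.
DB_FFAP_PROGRAM = [(50, 10, 6), (80, 3, 6), (235, 5, 10)]
```

Calling `oven_program_minutes(40, DB_FFAP_PROGRAM)` gives 64.0 under these assumptions.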

Olfactometric analysis was performed by three trained panelists (two females and one male) from the Beijing Key Laboratory of Flavor Chemistry at Beijing Technology and Business University. During GC analysis, each panelist independently evaluated the same sample at the sniffing port and recorded the aroma attributes, retention times, and intensities of the odors in the sample extract. To ensure accuracy and minimize the risk of overlooking or misidentifying individual odor-active compounds, only odorants detected by at least two assessors were recorded. Each panelist sniffed each extract in triplicate.
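The "at least two of three assessors" rule amounts to a simple vote count. In this hypothetical sketch, each panelist's detections are given as odor-region keys (for example retention-index windows matched across panelists, which is an assumption); how the triplicate sniffing runs were merged is not modeled, and the function name is illustrative.

```python
from collections import Counter


def consensus_odorants(panel_detections, min_panelists=2):
    """Keep odor regions reported by at least `min_panelists` different panelists.

    panel_detections: dict mapping a panelist ID to an iterable of odor-region keys.
    """
    votes = Counter()
    for detections in panel_detections.values():
        votes.update(set(detections))  # each panelist votes at most once per region
    return sorted(key for key, n in votes.items() if n >= min_panelists)
```

Converting each panelist's detections to a set before counting prevents repeated reports by one assessor from being mistaken for agreement between assessors.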

3. Results and discussion

3.1. Data sets

The construction of the “odor-compound” database is crucial for the subsequent analysis of the “aroma-structure” correlation. In this study, a total of 646 aroma compounds and 70 aroma descriptors were compiled, comprising 530 compounds from the open-access aroma database and 149 compounds with OAV > 1 in Baijiu (the two sources partly overlap). Table 1 lists the compounds with OAV greater than 1 in Baijiu, along with their descriptors. The number of compounds greatly exceeds the number of aroma descriptors, indicating that the same aroma can be produced by different compounds; conversely, a single compound may exhibit multiple aroma descriptors. Fig. S1 A shows that most compounds have 1–3 descriptors. Notably, four compounds, “butyric acid,” “2-undecanone,” “2-methyl-1-propanol,” and “guaiacol,” have the largest number of descriptors, up to nine. In Fig. S1 B, the horizontal axis represents the frequency of specific descriptors, and the vertical axis indicates the number of compounds associated with a given number of these descriptors. As shown in Table S1, descriptor frequencies range from 5 to 166, and “fruit,” “flower,” and “sweet” are the three most frequent descriptors. These are also important aroma attributes in the flavor profile of Baijiu, especially in light-aroma-type Baijiu, whose main flavor characteristics are floral and fruity notes (Li et al., 2023; Sun et al., 2022).

Table 1. Significant aroma compounds in Baijiu with OAV > 1.

No. CAS Aroma compounds Descriptor OAV SMILES
1 109-60-4 propyl acetate fruit 2 CCCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
2 25415-67-2 ethyl 4-methylvalerate fruit 59–872 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)CCC(C)C
3 105-79-3 isobutyl hexanoate fruit 1–2 CCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC(C)C
4 687-47-8 ethyl L (-)-lactate fruit 6.2 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C(C)O
5 3842-03-3 Isovaleraldehyde diethyl acetal, (1,1-diethoxy-3-methylbutane) fruit 2 CCOC(CC(C)C)OCC
6 51115-64-1 2-methylbutyl butyrate fruit, flower 0.7–43 CCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC(C)CC
7 106-27-4 isoamyl butyrate fruit, flower 1–120 CCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCCC(C)C
8 142-92-7 hexyl acetate fruit, flower 1–4 CCCCCCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
9 6290-37-5 2-phenylethyl hexanoate fruit, flower 2–19 CCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCCC1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C1
10 23726-91-2 damascone apple, flower, fruit 4–257.1 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C1 Created by potrace 1.16, written by Peter Selinger 2001-2019 C(CCCC1(C)C)C
11 103-82-2 phenylacetic acid flower, honey, fruit 3 C1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C(C Created by potrace 1.16, written by Peter Selinger 2001-2019 C1)CC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)O
12 111-87-5 1-octanol fruit, flower, grass 0–2.2 CCCCCCCCO
13 106-30-9 ethyl heptanoate fruit, pineapple, flower 1–43.8 CCCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
14 78-70-6 linalool flower, fruit, lavender, acid 13.4 CC( Created by potrace 1.16, written by Peter Selinger 2001-2019 CCCC(C)(C Created by potrace 1.16, written by Peter Selinger 2001-2019 C)O)C
15 122-97-4 3-phenyl-1-propanol anise, cinnamon, fruit, flower 4.1 C1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C(C Created by potrace 1.16, written by Peter Selinger 2001-2019 C1)CCCO
16 23696-85-7 β-damascenone fruit, flower, sweet, honey 164–829 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C1 Created by potrace 1.16, written by Peter Selinger 2001-2019 C(C Created by potrace 1.16, written by Peter Selinger 2001-2019 CCC1(C)C)C
17 123-92-2 isoamyl acetate banana, fruit, flower, sweet 7–1000 CC(C)CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
18 101-97-3 ethyl phenylacetate sweet, fruit, flower, rose, honey 1–4.1 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)CC1  Created by potrace 1.16, written by Peter Selinger 2001-2019  CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C1
19 105-54-4 ethyl butyrate fruit, apple, pineapple, flower, sweet 7–1000 CCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
20 111-27-3 1-hexanol resin, flower, green, fruit, grass, nut 1–10 CCCCCCO
21 103-45-7 phenethyl acetate honey, rose, tobacco, fruit, flower, sweet 1–8.6 CC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCCC1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C1
22 141-78-6 ethyl acetate pineapple, fruit, apple, flower, sweet, faint scent 2–99.3 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
23 123-66-0 ethyl hexanoate apple, fruit, banana, flower, sweet, alcohol, cellar 10–197.4 CCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
24 123-51-3 3-methyl-1-butanol burnt, malt, whiskey, fruit, flower, sweet, nail polish, mildew, empyreumatique, bitter 1–11 CC(C)CCO
25 106-33-2 ethyl laurate leaf, fruit, flower, sweet, faint scent, acid, walnut 1–8 CCCCCCCCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
26 100-52-7 benzaldehyde almond, caramel, fruit, flower, nut, cherry 0–5 C1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C(C Created by potrace 1.16, written by Peter Selinger 2001-2019 C1)C Created by potrace 1.16, written by Peter Selinger 2001-2019 O
27 110-38-3 ethyl caprate grape, fruit, pineapple, pine, wax, flower, sweet 1–6.3 CCCCCCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
28 2021-28-5 ethyl 3-phenylpropionate flower, fruit, pineapple, rose, sweet, honey, alcohol 1–108.8 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)CCC1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C1
29 71-23-8 1-propanol fruit, flower, yeast, alcohol, grass, green, solvent, bitter 2–14.2 CCCO
30 106-32-1 ethyl caprylate fat, fruit, pear, litchi, sesame, almond, oil, balsamic, flower, alcohol 4–1000 CCCCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
31 66840-71-9 DMST fruit, sweet 7285.2 CC1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 C(C Created by potrace 1.16, written by Peter Selinger 2001-2019 C1)NS( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)N(C)C
32 97-62-1 ethyl isobutyrate rubber, fruit, sweet, yeast, pungent 1–65.7 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C(C)C
33 539-82-2 ethyl valerate fruit, apple, strawberry, sweet, yeast, mold culture 0.1–1000 CCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
34 124-07-2 octanoic acid sweet, cheese, sweat, fruit, oil, balsamic, acid, rancid 1–1.2 CCCCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)O
35 108-64-5 ethyl isovalerate fruit, apple, pineapple, sweet, banana, alcohol, water smell 10.9–1000 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)CC(C)C
36 112-12-9 2-undecanone fresh, green, orange, fruit, sweet, oil, balsamic, cream, citrus 2.9 CCCCCCCCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
37 78-83-1 2-methyl-1-propanol bitter, solvent, wine, fruit, sweet, alcohol, pine, malt, mildew, rubber 1–17.6 CC(C)CO
38 539-88-8 ethyl levulinate fruit, apple 2–22 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)CCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
39 6378-65-0 hexyl hexanoate apple, peach, fruit 3–34 CCCCCCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)CCCCC
40 110-19-0 isobutyl acetate apple, banana, fruit, rum 1.5–2.3 CC(C)COC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
41 7452-79-1 ethyl 2-methylbutyrate apple, fruit, pineapple, sweet 5–1000 CCC(C)C( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
42 513-85-9 2,3-butanediol fruit, onion 4–6 CC(C(C)O)O
43 111-35-3 3-ethoxy-1-propanol fruit, alcohol 20.6 CCOCCCO
44 543-49-7 2-heptanol mushroom, fruit 4 CCCCCC(C)O
45 7789-92-6 1,1,3-triethoxypropane fruit, earth, green, vegetable, mushroom 0–2.6 CCOCCC(OCC)OCC
46 107-87-9 2-pentanone ether, fruit 2 CCCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
47 75-07-0 acetaldehyde ether, malt, pungent, fruit, grass, cream, aldehyde, bran 9.5–84 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 O
48 97-64-3 ethyl lactate fruit, grass 1–10 CCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C(C)O
49 105-57-7 acetal cream, fruit, grass, vegetable 24–1000 CCOC(C)OCC
50 590-86-3 3-methylbutanal fruit, grass, green, malt, cocoa 2.3–10,000 CC(C)CC Created by potrace 1.16, written by Peter Selinger 2001-2019 O
51 3777-69-3 2-pentylfuran butter, green, bean, fruit, grass, cream, earth, vegetable 3 CCCCCC1 Created by potrace 1.16, written by Peter Selinger 2001-2019 CC Created by potrace 1.16, written by Peter Selinger 2001-2019 CO1
52 51755-83-0 3-mercapto-1-hexanol sulfur, fruit 1443–10,058 CCCC(CCO)S
53 78-92-2 2-butanol wine, fruit, alcohol, nut, yeast, malt, solvent 1.9–1000 CCC(C)O
54 628-63-7 amyl acetate fruit, banana 1.2 CCCCCOC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)C
55 105-37-3 ethyl propionate fruit, banana, nail polish 1–34.1 CCC( Created by potrace 1.16, written by Peter Selinger 2001-2019 O)OCC
56 71-36-3 1-butanol fruit, medicine, banana, alcohol, oil, balsamic, pungent, solvent 1–10 CCCCO
57 628-97-7 palmitic acid ethyl ester fruit, cream 2–53 CCCCCCCCCCCCCCCC(=O)OCC
58 103-36-6 ethyl cinnamate cinnamon, honey, fruit 20.3–28.0 CCOC(=O)C=CC1=CC=CC=C1
59 103-52-6 phenethyl butyrate flower, rose 314 CCCC(=O)OCCC1=CC=CC=C1
60 689-67-8 6,10-dimethyl-5,9-undecadien-2-one flower, sweet 2 CC(=CCCC(=CCCC(=O)C)C)C
61 111-13-7 2-octanone flower, soap 1–10 CCCCCCC(=O)C
62 821-55-6 2-nonanone flower, cream 1–12.3 CCCCCCCC(=O)C
63 23726-93-4 beta-damascenone apple, rose, honey, flower 93 CC=CC(=O)C1=C(C=CCC1(C)C)C
64 121-33-5 vanillin flower, sweet, vanilla 0–10 COC1=C(C=CC(=C1)C=O)O
65 79-77-6 β-ionone seaweed, flower, raspberry, violet 1–3.5 CC1=C(C(CCC1)(C)C)C=CC(=O)C
66 143-08-8 1-nonanol fat, green, flower, oil, grass, citrus 3.8–10 CCCCCCCCCO
67 104-61-0 gamma-nonanolactone coconut, peach, flower, sweet, cream, nut 0–27.5 CCCCCC1CCC(=O)O1
68 122-78-1 phenylacetaldehyde sweet, hawthorne, honey, flower, rose, sweet 1–19 C1=CC=C(C=C1)CC=O
69 60-12-8 phenethyl alcohol honey, lilac, spice, flower, rose, sweet, Chinese rose 1–2.7 C1=CC=C(C=C1)CCO
70 96-48-0 gamma butyrolactone sweet, caramel 1–18 C1CC(=O)OC1
71 4466-24-4 2-butylfuran empyreumatique, sweet 464–1167 CCCCC1=CC=CO1
72 13529-27-6 2-furaldehyde diethyl acetal sweet, pine, spice 7 CCOC(C1=CC=CO1)OCC
73 513-86-0 acetoin butter, cream, sweet, acid 1–195 CC(C(=O)C)O
74 620-02-0 5-methyl furfural almond, caramel, caramel, sweet 2.2 CC1=CC=C(O1)C=O
75 98-00-0 furfuryl alcohol burnt, sweet, caramel, empyreumatique, alcohol 1–44 C1=COC(=C1)CO
76 109-52-4 valeric acid sweet, cream, acid, rancid, sweat, cellar, earth 1–49.9 CCCCC(=O)O
77 2785-89-9 4-ethyl-2-methoxyphenol sweet, spice, clove, medicine, smoke, pungent 1–10.8 CCC1=CC(=C(C=C1)O)OC
78 79-31-2 isobutyric acid sweet, butter, cheese, rancid, acid, sweat, pungent 0–10 CC(C)C(=O)O
79 98-01-1 furfural sweet, almond, bread, pine, nut, almond, burnt odor, potato, caramel, bran, roast, aging aroma 1–11.2 C1=COC(=C1)C=O
80 2463-53-8 2-nonenal paper, green 255 CCCCCCC=CC=O
81 2198-61-0 isoamyl hexanoate grass, green 1–71 CCCCCC(=O)OCCC(C)C
82 557-48-2 (E, Z)-2,6-nonadienal cucumber, green, wax 10–66 CCC=CCCC=CC=O
83 66-25-1 hexanal pine, grass, green 0–133 CCCCCC=O
84 78-84-2 isobutyraldehyde green, pungent, grass, malt 1.2 CC(C)C=O
85 18829-56-6 trans-nonenal cucumber, fat, green, oil, balsamic 3–22 CCCCCCC=CC=O
86 124-13-0 octanal fat, green, lemon, soap, honey, orange 130 CCCCCCCC=O
87 124-19-6 1-nonanal fat, oil, balsamic, grass, soap, wax 0–354 CCCCCCCCC=O
88 111-71-7 heptaldehyde citrus, cocoa, fat, medicine, rancid, grass 66 CCCCCCC=O
89 107-92-6 butyric acid cheese, fat, rancid, sweat, oil, balsamic, acid, cellar, earth, pungent 1–56.0 CCCC(=O)O
90 97-53-0 eugenol spice, smoke 21 COC1=C(C=CC(=C1)CC=C)O
91 28664-35-9 4,5-dimethyl-3-hydroxy-2,5-dihydrofuran-2-one cotton candy, maple, spice, caramel, herb 5 CC1C(=C(C(=O)O1)O)C
92 90-05-1 guaiacol oil, balsamic, pine, spice, clove, medicine, grain, smoke, pungent 1–10 COC1=CC=CC=C1O
93 76-49-3 bornyl acetate pine, herb 230 CC(=O)OC1CC2CCC1(C2(C)C)C
94 3913-81-3 3-heptylacrolein oil, balsamic 34 CCCCCCCC=CC=O
95 2548-87-0 (E)-2-octenal oil, balsamic 15–11,515 CCCCCC=CC=O
96 107-03-9 1-propanethiol boiled egg, garlic 25–26 CCCS
97 111-14-8 heptanoic acid oil, balsamic, sweat 1–2 CCCCCCC(=O)O
98 25152-84-5 trans, trans-2,4-decadien-1-al oil, balsamic, chicken, cucumber 3391 CCCCCC=CC=CC=O
99 64-19-7 acetic acid acid, oil, balsamic, rancid, pungent 1–7.5 CC(=O)O
100 112-31-2 decyl aldehyde orange, soap, tallow, oil, balsamic 3.6–10 CCCCCCCCCC=O
101 503-74-2 isovaleric acid acid, rancid, sweat, oil, balsamic, cream, pungent, sauce 1–10 CC(C)CC(=O)O
102 123-07-9 4-ethylphenol smoke, animal, pungent 1–4 CCC1=CC=C(C=C1)O
103 3658-80-8 dimethyl trisulfide cabbage, fish, sulfur, pungent, onion, ether, pickles, gas odor, cabbage 1–138.7 CSSSC
104 1124-11-4 tetramethylpyrazine nut 2 CC1=C(N=C(C(=N1)C)C)C
105 16630-66-3 methyl(methylthio)acetate nut, potato 15–23 COC(=O)CSC
106 1534-08-3 S-methyl thioacetate nut, potato 9 CC(=O)SC
107 13925-03-6 2-ethyl-6-methylpyrazine roast, nut, potato 23–1923 CCC1=NC(=CN=C1)C
108 108-50-9 2,6-dimethylpyrazine roast, beef, nut 2–15 CC1=CN=CC(=N1)C
109 124-06-1 ethyl myristate ether, yeast, coconut, sauce 0.1–9.7 CCCCCCCCCCCCCC(=O)OCC
110 14667-55-1 trimethyl-pyrazine roast, nut, peanuts, peppers, coffee 2–225 CC1=CN=C(C(=N1)C)C
111 100-53-8 benzyl mercaptan roast 487–12,538 C1=CC=C(C=C1)CS
112 13678-68-7 furfuryl thioacetate roast, sulfur 3–22 CC(=O)SCC1=CC=CO1
113 5405-41-4 ethyl 3-hydroxybutyrate marshmallow, alcohol, roast, grass, solvent 6.6 CCOC(=O)CC(C)O
114 98-02-2 2-furylmethanethiol roast, sesame 135 C1=COC(=C1)CS
115 136954-20-6 3-mercaptohexyl acetate sulfur 327–377 CCCC(CCOC(=O)C)S
116 74-93-1 methyl mercaptan gasoline, garlic, sulfur, cabbage, rubber 273 CS
117 624-92-0 dimethyl disulfide rancid, cabbage, onion, aging aroma, sulfur 4–5 CSSC
118 106-36-5 propyl propionate pineapple, solvent 2 CCCOC(=O)CC
119 16423-19-1 (+/−)-geosmin earth 10 CC1CCCC2(C1(CCCC2)O)C
120 19700-21-1 geosmin beet, earth 6–64 CC1CCCC2(C1(CCCC2)O)C
121 3391-86-4 1-octen-3-ol mushroom, grass, earth, grain 1–1000 CCCCCC(C=C)O
122 96-76-4 2,4-di-tert-butylphenol smoke, lemon 2–6 CC(C)(C)C1=CC(=C(C=C1)O)C(C)(C)C
123 108-95-2 phenol phenolic resin, smoke 2 C1=CC=C(C=C1)O
124 7786-61-0 4-hydroxy-3-methoxystyrene curry, clove, smoke 3.6 COC1=C(C=CC(=C1)C=C)O
125 93-51-6 2-methoxy-4-methylphenol pine, smoke, phenol, medicine, sauce 1–9.3 CC1=CC(=C(C=C1)O)OC
126 106-44-5 p-cresol medicine, phenol, smoke, medicine, animal, cellar, earth, stable 1–1.9 CC1=CC=C(C=C1)O
127 28588-74-1 2-methyl-3-furanthiol medicine, empyreumatique 33–135 CC1=C(C=CO1)S
128 79-09-4 propionic acid acid, rancid, sweat 1–2.5 CCC(=O)O
129 646-07-1 4-methylvaleric acid acid, sweat, rancid 4.9 CC(C)CCC(=O)O
130 142-62-1 hexanoic acid cream, acid, rancid, sweat, animal 1–6 CCCCCC(=O)O
131 57-55-6 1,2-propane diol alcohol 10–100 CC(CO)O
132 10348-47-7 ethyl 2-hydroxy-4-methylvalerate fresh 1–16 CCOC(=O)C(CC(C)C)O
133 124-76-5 isoborneol camphor 3 CC1(C2CCC1(C(C2)O)C)C
134 83-34-1 3-methylindole camphor, fecal, cellar 8 CC1=CNC2=CC=CC=C12
135 75-18-3 dimethyl sulfide cabbage, onion 4–14 CSC
136 489-41-8 (−)-globulol pine 2 CC1CCC2C1C3C(C3(C)C)CCC2(C)O
137 75-08-1 ethanethiol onion, rubber 94 CCS
138 30899-19-5 3-methylbutanol nail polish, malt, rancid, mildew 1–3.7 CCCCCO
139 13184-86-6 4-[ethoxymethyl]-2-methoxyphenol cocoa, vanilla 5 CCOCC1=CC(=C(C=C1)O)OC
140 3268-49-3 3-(methylthio)propionaldehyde potato 10 CSCCC=O
141 431-03-8 2,3-butanedione butter 4–80 CC(=O)C(=O)C
142 96-17-3 2-methylbutyraldehyde mildew 1–512 CCC(C)C=O
143 505-10-2 3-methylthiopropanol salty 1–35.8 CSCCCO
144 2639-63-6 hexyl butyrate fruit 1–38 CCCCCCOC(=O)CCC
145 138-86-3 limonene fruit 1–6.4 CC1=CCC(CC1)C(=C)C
146 91-20-3 naphthalene green, pungent, empyreumatique 1.0 C1=CC=C2C=CC=CC2=C1
147 2437-95-8 α-pinene turpentine 61 CC1=CCC2CC1C2(C)C
148 123-35-3 β-myrcene pine, grass 13 CC(=CCCC(=C)C=C)C
149 515-13-9 β-elemene fennel 6 CC(=C)C1CCC(C(C1)C(=C)C)(C)C=C

The compounds listed in the table represent significant aroma compounds in Baijiu with OAV greater than 1, collected from various literature sources. The asterisk (*) denotes unique terms found exclusively in the “Flavornet Home Page” aroma database.
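The OAV criterion behind the table is the ratio of a compound's concentration to its odor threshold, measured in the same matrix and units. A minimal sketch of that screening step follows; the function names and the example values are hypothetical, and the sketch does not reproduce how the literature OAV ranges in the table were compiled.

```python
def odor_activity_value(concentration: float, threshold: float) -> float:
    """OAV = concentration / odor threshold (both in the same units, e.g. µg/L)."""
    if threshold <= 0:
        raise ValueError("odor threshold must be positive")
    return concentration / threshold


def aroma_active(compounds: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names whose OAV exceeds 1.

    compounds maps a (hypothetical) compound name to (concentration, threshold).
    """
    return sorted(name for name, (c, t) in compounds.items()
                  if odor_activity_value(c, t) > 1)


# Hypothetical values for illustration only, not data from this study.
example = {"compound_A": (50.0, 10.0), "compound_B": (2.0, 8.0)}
print(aroma_active(example))  # ['compound_A']
```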

3.2. Dimension reduction, clustering and visualization of the data

The chemical structures of all 646 compounds were represented by 1024-bit molecular fingerprints encoded with KNIME. To facilitate visualization, K-means clustering was combined with four dimensionality reduction techniques (PCA, MDS, t-SNE, and UMAP), each of which projects the data onto a two-dimensional coordinate system. The Kelley penalty function was used to determine the optimal number of clusters (Kelley et al., 1996). As shown in Fig. S2, the Kelley penalty score for each of the four dimensionality reduction methods first declines and then rises as the number of clusters increases, and the number of clusters at the minimum is taken as the optimum. The optimal number of clusters was five for t-SNE, PCA, and MDS, and four for UMAP. Fig. 2 displays the clustering results obtained with each method.
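The paper does not spell out the penalty formula. The sketch below follows the Kelley penalty as it is commonly described: for each candidate clustering, the average within-cluster spread (mean pairwise distance, singleton clusters excluded) is rescaled to the range [1, N − 1] across all candidates, and the number of clusters is added; the candidate with the lowest penalty is preferred. The function names, the Euclidean distance on the 2-D embedding, and the idea of feeding it K-means labelings are assumptions, not the authors' KNIME workflow.

```python
import numpy as np


def cluster_spread(points: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between the members of one cluster."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # count each pair once
    return float(dists[upper].mean())


def average_spread(X: np.ndarray, labels: np.ndarray) -> float:
    """Average spread over clusters with at least two members."""
    spreads = [cluster_spread(X[labels == c])
               for c in np.unique(labels)
               if np.count_nonzero(labels == c) > 1]
    return float(np.mean(spreads)) if spreads else 0.0


def kelley_penalties(X: np.ndarray, labelings: list) -> dict:
    """Return {number_of_clusters: penalty} for candidate labelings of X.

    Each labeling's average spread is rescaled to [1, N - 1]
    (N = number of points) and the number of clusters is added.
    """
    n = len(X)
    avsp = {len(np.unique(lab)): average_spread(X, np.asarray(lab))
            for lab in labelings}
    lo, hi = min(avsp.values()), max(avsp.values())
    penalties = {}
    for k, a in avsp.items():
        scaled = 1.0 if hi == lo else (n - 2) * (a - lo) / (hi - lo) + 1.0
        penalties[k] = scaled + k
    return penalties
```

In a workflow like the one described, the candidate labelings might come from, for example, scikit-learn's `KMeans(n_clusters=k).fit_predict(embedding)` for a range of k, and the optimum is `min(penalties, key=penalties.get)`.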

Fig. 2.

K-means clustering of the two-dimensional embeddings of the compounds obtained with different dimensionality reduction methods. A, B, C, and D show the results of combining PCA, MDS, t-SNE, and UMAP, respectively, with K-means clustering.

Based on the colour distribution within each cluster, the embeddings generated by PCA, MDS, and t-SNE showed neighboring clusters that were closely interconnected. In contrast, UMAP produced denser, more compact clusters separated by larger blank spaces (Fig. 2D), indicating a superior ability to separate the data. This observation is consistent with other studies (Kobak & Linderman, 2021; Probst & Reymond, 2020; Rabasovic et al., 2023). It can be attributed to UMAP's algorithm and adjustable parameters: points that are already close together in the high-dimensional space are drawn even closer in the two-dimensional embedding, leaving more space between distinct groups.
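UMAP builds its neighbor graph from pairwise distances in the original fingerprint space. The paper does not state which metric was used; for binary fingerprints such as the 1024-bit codes here, a common choice is the Jaccard (Tanimoto) distance. A minimal NumPy sketch, computing all pairwise distances through a matrix product so that a 646 × 1024 input stays small in memory:

```python
import numpy as np


def tanimoto_distance_matrix(fps) -> np.ndarray:
    """Pairwise Jaccard/Tanimoto distance for an (n_molecules, n_bits) 0/1 array.

    distance = 1 - |A & B| / |A | B|; two all-zero fingerprints are treated
    as identical (distance 0).
    """
    f = np.asarray(fps, dtype=np.int64)
    inter = f @ f.T                      # bits set in both fingerprints
    counts = f.sum(axis=1)               # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - inter
    sim = np.divide(inter, union, out=np.ones(inter.shape), where=union > 0)
    return 1.0 - sim
```

The umap-learn package accepts such a matrix with `metric="precomputed"`, or computes the same distance itself with `metric="jaccard"`; whether the KNIME workflow used either option is not stated in the source.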

The key objective of this step is to group compounds with similar structures and aromas into the same cluster through effective dimensionality reduction and clustering. To evaluate the performance of each approach, we therefore calculated the aroma distribution within the clusters generated by each combination of dimensionality reduction and clustering. A dimensionality reduction technique is considered to have good classification performance for an aroma if it groups more than 50 % of the compounds carrying that aroma into the same cluster (Rugard et al., 2021). Accordingly, for each technique we counted the aromas that met this criterion, with a higher count indicating better classification performance. As shown in Fig. S3, the number of such aromas was 34 for both PCA and MDS, 37 for t-SNE, and 49 for UMAP, a clear advantage for UMAP over the other three techniques. Table S1 presents the aroma distribution within the clusters generated by combining UMAP with K-means clustering. Subsequent analyses were therefore based on the UMAP results.
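The >50 % criterion can be stated compactly: for each aroma, find the cluster holding the most compounds with that aroma and check whether it holds more than half of them. A minimal sketch, assuming hypothetical input records with one set of aroma descriptors and one cluster label per compound:

```python
from collections import Counter, defaultdict


def well_grouped_aromas(records, threshold=0.5):
    """Return the aromas for which more than `threshold` of the compounds
    carrying that aroma fall into a single cluster.

    records: iterable of (aroma_descriptors, cluster_label), one per compound.
    """
    counts = defaultdict(Counter)  # aroma -> Counter(cluster -> n compounds)
    for aromas, cluster in records:
        for aroma in set(aromas):
            counts[aroma][cluster] += 1
    return sorted(
        aroma for aroma, per_cluster in counts.items()
        if max(per_cluster.values()) / sum(per_cluster.values()) > threshold
    )
```

The length of the returned list is the per-method score compared in Fig. S3 (34, 34, 37, and 49 in this study). The comparison is strict, so an aroma split exactly 50/50 does not count.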

3.3. Analysis of the cluster constituents: Structure-odor correlations

3.3.1. Aroma distribution of compounds

A comprehensive analysis was conducted on the clustering results of the two-dimensional UMAP embedding. As shown in Table S1, the individual clusters have well-defined aroma distributions. In terms of overall composition, “fruit” is the predominant aroma in cluster C1, accounting for 42.19 % of all elements in C1 and far exceeding the other aroma profiles. In cluster C2, three aroma profiles stand out, “flower,” “sweet,” and “roast,” which account for 21.64 %, 22.81 %, and 18.13 % of the total molecular count in C2, respectively. In cluster C3, “fruit,” “flower,” “sweet,” “spice,” and “herb” are all present, but none is clearly dominant. In cluster C4, “green” and “fat” are the most representative aromas. A more detailed analysis of the distribution of aroma profiles across clusters revealed that cluster C1 has the greatest diversity of aroma types. Most fruit-related aromas are concentrated in C1, including 65.06 % of the “fruit” aroma, 90.91 % of “apple,” 90.91 % of “pineapple,” 100 % of “banana,” and 71.43 % of “lemon.” A substantial proportion of vegetable odors, including “onion,” “cabbage,” and “mushroom,” is also present. C1 also contains 56 % of the “sulfur” odor, which is consistent with the well-known fact that onion and cabbage are typical sources of sulfur compounds (Sun et al., 2022). Furthermore, most of the “alcohol” (78.57 %), “solvent” (88.89 %), “ether” (88.89 %), “wine” (85.71 %), and “yeast” (100 %) aromas are found in C1, together with a large share of the “acid,” “sweat,” and “rancid” aromas, which are generally associated with acidic compounds (Dong et al., 2024).
In cluster C2, over 70 % of the “roast,” “smoke,” “medicine,” “phenol,” and “clove” aromas are concentrated, along with 68.75 % of the “honey” aroma. The proportion analysis in Table S1 revealed that cluster C3 contains a relatively limited range of aroma types, with nearly half of all aromas having no representatives in this cluster. Notably, C3 contains several refreshing aromas, specifically 73.68 % of the “mint” aroma and 69.23 % of the “camphor” aroma, as well as a substantial proportion of the “peach” and “turpentine” aromas. As noted above, “green” and “fat” are the predominant aroma types in cluster C4, accounting for 41.33 % and 24.00 %, respectively, of the aroma compounds in C4. C4 also contains 88.89 % of the “cucumber” aroma and 62.5 % of the “leaf” aroma, indicating that green, plant-like odors are the main aroma type in this cluster, and 46.15 % of all compounds with a “fat” odor fall in C4. These results show that although most aromas are largely concentrated in a single cluster, several aromas are scattered across clusters without a clear pattern. Because the same aroma can be produced by structurally different compounds, and the clustering groups compounds by structure, such aromas are expected to be split across several clusters.
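The percentages in this subsection use two different denominators: the share of all compounds with a given aroma that fall in a cluster (e.g. 65.06 % of the “fruit” aroma lies in C1) and the share of a cluster's elements taken by that aroma (e.g. “fruit” makes up 42.19 % of C1). A minimal sketch computing both from hypothetical per-compound records; counting each aroma annotation as one element of its cluster is an assumption about how the cluster totals were formed.

```python
from collections import Counter


def aroma_cluster_shares(records):
    """Compute two normalizations of the aroma-by-cluster counts.

    records: iterable of (aroma_descriptors, cluster_label), one per compound.
    Returns (share_of_aroma, share_of_cluster), both keyed by (aroma, cluster):
      share_of_aroma   = n(aroma in cluster) / n(compounds with that aroma)
      share_of_cluster = n(aroma in cluster) / n(aroma annotations in cluster)
    """
    pair, per_aroma, per_cluster = Counter(), Counter(), Counter()
    for aromas, cluster in records:
        for aroma in set(aromas):
            pair[(aroma, cluster)] += 1
            per_aroma[aroma] += 1
            per_cluster[cluster] += 1
    share_of_aroma = {k: n / per_aroma[k[0]] for k, n in pair.items()}
    share_of_cluster = {k: n / per_cluster[k[1]] for k, n in pair.items()}
    return share_of_aroma, share_of_cluster
```

For example, with 108 of the 166 “fruit” compounds in C1, the first quantity for (“fruit”, C1) is 108/166 ≈ 0.6506.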

3.3.2. Calculation and statistics of compound structure

Substructure analysis of the 646 molecules in KNIME identified 26 distinct functional groups. To increase the reliability of the results, the 12 functional groups present in at least 5 % of the molecules in at least one of the four clusters were selected for further examination (Rugard et al., 2021). The distribution of these functional groups across the clusters is illustrated in Fig. 3, and most of them show clear distribution patterns. Cluster C1 contained over 50 % of the ester, alcoholic hydroxyl, and carboxyl groups, and compounds carrying these groups together accounted for 73.73 % of the compounds in C1. The vast majority of sulfur-containing, nitrogen-containing, oxygen-containing, and nitrogen- and sulfur-containing heterocycles were distributed in C2, together with some of the ether bonds and phenolic hydroxyl groups. Cyclic compounds dominated C2, accounting for 87.73 % of its compounds. In C3, no functional group showed a distinctly representative distribution compared with the other clusters. Aldehydes make up the majority of C4, which contains 67.07 % of all aldehyde groups. These results show that compounds sharing the same class of functional group tend to be assigned to, and are well represented within, the same cluster, supporting the effectiveness of the dimensionality reduction and clustering techniques employed.
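The functional-group selection rule (a group is kept if it is present in at least 5 % of the molecules of at least one cluster) is a simple filter over per-cluster counts. A sketch, assuming the counts have already been produced by a substructure search (KNIME in the paper); the dictionary layout and any group names used with it are hypothetical.

```python
def select_groups(group_counts, cluster_sizes, min_fraction=0.05):
    """Keep functional groups present in at least `min_fraction` of the
    molecules of at least one cluster.

    group_counts:  {group: {cluster: number of molecules carrying the group}}
    cluster_sizes: {cluster: number of molecules in the cluster}
    """
    return sorted(
        group
        for group, per_cluster in group_counts.items()
        if any(per_cluster.get(cluster, 0) / size >= min_fraction
               for cluster, size in cluster_sizes.items())
    )
```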

Fig. 3.

Distribution of selected functional groups in the clusters: red denoting cluster C1, black representing cluster C2, blue for cluster C3, and yellow indicating cluster C4. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.3.3. Construction of an “aroma-structure” predictive model

Although the distribution and proportion of each aroma and functional group within the clusters are clear, they do not directly link each aroma to its corresponding functional groups, so further analysis was required. We therefore focused on the cluster in which each aroma is mainly distributed and calculated the proportion of compounds with that aroma that carry each functional group. C1, which has the richest distribution of aroma compounds, contains 108 of the 166 “fruit” aroma molecules in the database. As shown in Tables S2–S5, more than one functional group can contribute to the “fruit” aroma. Nevertheless, 68.52 % of the “fruit” aroma compounds in C1 carry ester groups, indicating a strong correlation between the “fruit” aroma and esters. Similarly, all “apple” and “pineapple” aromas in C1 derive exclusively from compounds with ester groups, and 88.89 % of the “banana” aroma is also attributed to esters. These findings underscore the crucial contribution of ester groups to the pleasant fruity aroma of Baijiu, consistent with earlier reports (Fan & Qian, 2005; Jin et al., 2017; Song et al., 2020; Song et al., 2021). The “flower” aroma, one of the most common in Baijiu, also shows a clear distribution pattern, being concentrated mainly in clusters C1 and C2. In C1, 54.35 % of the “flower” aroma is generated by ester groups and 41.30 % by alcoholic hydroxyl groups. C2 contains 37 “flower” aroma molecules, of which 15 carry ether bonds, 11 carry ester groups, and 9 carry phenolic hydroxyl groups.
Consequently, the “flower” aroma is associated with ester groups, hydroxyl groups, and ether bonds, and the “rose” aroma is likewise strongly associated with ester and alcoholic hydroxyl groups. “Sweet,” another prevalent and pleasant aroma in Baijiu, is notably present in cluster C1: of the 34 sweet-smelling compounds in C1, 61.76 % carry ester groups. Certain alcoholic hydroxyl groups, aldehyde groups, ether bonds, and oxygen-containing heterocycles are also implicated in the “sweet” aroma, indicating that several structural features contribute to it. The creamy “cream” and “butter” notes are largely attributable to ketone groups. Of the 11 “alcohol” aroma compounds in C1, 8 carry hydroxyl groups, indicating a strong association between hydroxyl groups and the “alcohol” aroma. Acidic compounds in Baijiu are known to impart undesirable odors once they reach certain concentrations, as reported in prior studies. As shown in Table S2, carboxyl groups are closely tied to the “acid,” “sweat,” and “rancid” aromas: 71.43 % of the “acid,” 83.33 % of the “sweat,” and 83.33 % of the “rancid” odors in cluster C1 come from compounds with carboxyl groups. Table S6 shows that cluster C2 is dominated by cyclic compounds, and when correlated with the aroma distribution, C2 is rich in roasted and nutty aromas such as “roast,” “nut,” and “almond.” A detailed analysis indicates that 83.78 % of the “roast” aroma is located in C2, with over half of these compounds bearing nitrogen-containing heterocyclic rings. Furthermore, 84.62 % of the “nut” aroma in cluster C1 is attributed to nitrogen-containing heterocycles.
Consequently, roasted and nutty aromas are primarily associated with nitrogen-containing heterocycles. In addition, 85.71 % of the “potato” aroma is produced by nitrogen-containing heterocycles. Notably, these compounds carry not only a “potato” but also a “roast” descriptor, suggesting that nitrogen-containing heterocycles most likely contribute a roasted-potato aroma rather than a purely “potato” scent. Nitrogen-containing compounds occur in Baijiu at low concentrations but have low odor thresholds, and therefore contribute significantly to its flavor. Zhu et al. performed GC-O analysis on Maotai Baijiu and found that pyrazines exhibit “roasted and nutty” aromas and constitute a crucial component of its flavor profile, which aligns with our findings (Zhu et al., 2020). We also observed that the “clove” aroma is closely related to ether bonds and phenolic hydroxyl groups. For the “smoke” and “medicine” aromas, phenolic hydroxyl groups are the primary contributors, with some contribution from ether bonds. The “phenol” aroma is attributed solely to phenolic hydroxyl groups. In cluster C2, 6 of the 11 “honey” aroma compounds carry ester groups; as a subtype of the “sweet” aroma, “honey” shows a similar pattern. Furthermore, 76.47 % of the “spice” aroma in cluster C2 is generated by ether bonds. Meanwhile, the “earth” and “caramel” aromas display certain patterns in clusters C2 and C3. The “earth” aroma is largely associated with ether bonds, nitrogen-containing heterocycles, and alcoholic hydroxyl groups.
In contrast, the “caramel” aroma is primarily associated with oxygen-containing heterocycles, alcoholic hydroxyl groups, ketone groups, and ether bonds. Earthy flavor is considered a common off-odor in Baijiu and is more pronounced in strong-aroma Baijiu. Dong et al. were the first to identify 3-methylindole as the key compound responsible for the “mud odor” in strong-aroma Baijiu (Dong et al., 2018). Because indoles are an important class of nitrogen-containing heterocycles, this is consistent with the association between the “earth” aroma and nitrogen-containing heterocycles observed in this study. Cluster C3 lacks a prominent representative aroma, yet 66.67 % of its “camphor” aroma is associated with alcoholic hydroxyl groups. The “peach” aroma is concentrated mainly in cluster C3 and is strongly associated with ester groups, mirroring the fruity aromas in cluster C1. Cluster C4 has distinct functional group characteristics despite its relatively small share of the aroma distribution. Aldehydes are its main functional groups, and its aroma profile is dominated by green, vegetal, and fatty odors. A detailed analysis reveals that 37.35 % of the “green” aroma is concentrated in this cluster, comprising 31 aroma compounds, of which 27 carry aldehyde groups. Likewise, 87.50 % of the “cucumber” and “grass” aromas within this cluster are associated with aldehyde groups. These findings collectively indicate a strong association between green, plant-like aromas and aldehyde groups. Sun et al. used molecular sensory science techniques to demonstrate that acetaldehyde (grass) and 3-methylbutanal (grass and malt) are important aroma-active compounds in fresh Xiaoqu Baijiu (Sun et al., 2022). Fatty odors such as “fat,” “oil,” and “balsamic” are also strongly associated with aldehyde groups.
Based on the aforementioned studies, a “structure-odor” network correlation model was established, with the results presented in Fig. 4.

Fig. 4.


Aroma-structure prediction model. Thicker lines indicate a stronger correlation between aroma and structure, while thinner lines represent a weaker correlation.
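One way to turn such shares into a weighted aroma–structure network like Fig. 4 is sketched below. The cluster-specific shares are the values quoted in the text; the 0.6 cut-off, the linear mapping from share to line width, and the `build_edges` helper are assumptions for illustration, not the authors' plotting procedure.

```python
# Cluster-specific shares quoted in the text: (aroma, structure) -> fraction.
shares = {
    ("potato", "nitrogen heterocycle"): 0.8571,
    ("spice", "ether bond"): 0.7647,
    ("camphor", "alcoholic hydroxyl"): 0.6667,
    ("green", "aldehyde"): 27 / 31,
    ("honey", "ester"): 6 / 11,
}


def build_edges(shares, min_share=0.6, max_width=5.0):
    """Keep aroma-structure pairs at or above `min_share`; width scales linearly with share."""
    edges = [
        (aroma, group, max_width * share)
        for (aroma, group), share in shares.items()
        if share >= min_share
    ]
    # Strongest associations (thickest lines) first.
    return sorted(edges, key=lambda e: e[2], reverse=True)


edges = build_edges(shares)
for aroma, group, width in edges:
    print(f"{aroma:8s} -> {group:22s} width={width:.2f}")
```

With this cut-off the “honey”–ester pair (6 of 11, about 55 %) is dropped; lowering `min_share` would keep it as a thin line.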

3.4. Practical application to real Baijiu

3.4.1. Discovery of an unknown compound with a strong roasted aroma

GC-O-MS results indicated that most aroma regions corresponded to peaks in the chromatogram and could be accurately identified. However, in certain regions an aroma could be perceived but the responsible compounds could not be determined. Notably, a strong “roasted aroma” was perceived at retention indices of 1324–1334, indicating a significant contribution to the aroma profile of the sample. Yet the chromatogram in Fig. 5 shows almost no peaks in this aroma region, suggesting that the responsible compounds were present at concentrations too low to be detected instrumentally. To identify the compounds responsible for this prominent “roasted aroma” in the Baijiu sample, the unknown substance in this range was investigated using the “aroma-structure” model.

Fig. 5.


Total ion current (TIC) chromatogram of the light-aroma Baijiu sample. Panel A shows the full chromatogram, and Panel B shows a partial chromatogram. The red-marked region in Panel B corresponds to the intense “roasted aroma” detected between retention indices 1324 and 1334. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
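The observation that the region showed “almost no peaks” can be made operational with a signal-to-noise check. The sketch below assumes the common convention that a peak is detectable only when S/N ≥ 3, estimates noise as the standard deviation of a peak-free baseline window, and uses a synthetic trace on a retention-index axis; the `window_snr` helper and all values are hypothetical.

```python
import statistics


def _select(ri, signal, lo, hi):
    """Signal values whose retention index lies in [lo, hi]."""
    return [s for x, s in zip(ri, signal) if lo <= x <= hi]


def window_snr(ri, signal, window, baseline):
    """S/N of the tallest point in `window` relative to a peak-free `baseline` window.

    Noise is the population standard deviation of the baseline (an assumption;
    peak-to-peak noise is another common convention).
    """
    base = _select(ri, signal, *baseline)
    win = _select(ri, signal, *window)
    noise = statistics.pstdev(base)
    return (max(win) - statistics.mean(base)) / noise


# Synthetic trace: baseline of 100 with +/-1 ripple and a tiny bump at RI 1329.
ri = list(range(1300, 1361))
signal = [100 + (1 if i % 2 else -1) for i in range(len(ri))]
signal[ri.index(1329)] = 102

snr = window_snr(ri, signal, window=(1324, 1334), baseline=(1300, 1319))
print(f"S/N = {snr:.1f}; detectable: {snr >= 3}")
```

A sniffable region whose S/N stays below the chosen limit is exactly the case the model is meant to handle: the nose responds but the detector does not.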

3.4.2. Identification of the compound with a strong roasted aroma in Baijiu

The preceding analysis established the correlation between aroma and structure, and Fig. 4 reveals a significant correlation between the “roasted aroma” and “nitrogen heterocycles.” Therefore, the “aroma-compound” database can be used to prioritize compounds that both have a “roasted aroma” and contain a nitrogen heterocycle, narrowing the search scope (Fig. 6 illustrates the complete identification process). In addition, the retention index range of the unknown substances can be calculated from the start and end times of the aroma region, while the retention indices of the selected candidate compounds can be obtained from the NIST website (CAS Number Search, nist.gov). Because a given compound has a similar retention index on the same chromatographic column, the retention index can serve as a screening criterion: only known compounds with retention indices close to that of the unknown are retained, which substantially reduces the number of compounds that must be analyzed. The calculated retention index of the aroma region in the Baijiu sample was 1324–1334. Under the same chromatographic conditions, the deviation of a measured retention index from the reference value should generally be within 10–25 i.u. (Zhu et al., 2020). After this screening, three candidate compounds for the “roasted aroma” of the Baijiu were identified: 2-acetyl-1-pyrroline, 2,5-dimethylpyrazine, and 2,6-dimethylpyrazine.
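For temperature-programmed GC, the retention index of an analyte is commonly computed by linear interpolation between the bracketing n-alkanes (the van den Dool and Kratz form): RI = 100 [n + (t_x - t_n)/(t_{n+1} - t_n)], where t_n and t_{n+1} are the retention times of the alkanes with n and n + 1 carbon atoms. The text does not state which formula was used, so the sketch below is an assumption; the `linear_ri` helper and the alkane retention times are hypothetical, chosen so that the aroma region maps to 1324–1334.

```python
import bisect


def linear_ri(t_x, alkanes):
    """Linear (temperature-programmed) retention index.

    `alkanes` is a list of (carbon_number, retention_time) pairs sorted by
    retention time, with consecutive carbon numbers.
    """
    times = [t for _, t in alkanes]
    i = bisect.bisect_right(times, t_x) - 1
    if i < 0 or i >= len(alkanes) - 1:
        raise ValueError("retention time lies outside the alkane ladder")
    (n, t_n), (_, t_next) = alkanes[i], alkanes[i + 1]
    return 100 * (n + (t_x - t_n) / (t_next - t_n))


# Hypothetical alkane ladder (C12-C15), retention times in minutes.
ladder = [(12, 15.0), (13, 18.0), (14, 21.0), (15, 24.0)]

# Hypothetical onset and end times of the sniffed aroma region.
start_ri = linear_ri(18.72, ladder)
end_ri = linear_ri(19.02, ladder)
print(f"aroma region: RI {start_ri:.0f}-{end_ri:.0f}")
```

Converting the onset and end of the olfactory signal separately gives an RI interval rather than a single value, which is what the subsequent screening compares against.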

Fig. 6.


Flowchart for the qualitative analysis of unknown compounds responsible for the “roasted aroma” in Baijiu samples.
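The two filters in this workflow (an aroma and structure filter on the database, then a retention-index window widened by the accepted deviation) can be sketched as follows. The candidate names, descriptors, and literature retention indices are hypothetical placeholders rather than entries from the authors' database, and the 10 i.u. tolerance is taken from the lower end of the 10–25 i.u. range cited in Section 3.4.2.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    odors: set
    has_n_heterocycle: bool
    lit_ri: float  # literature RI on a comparable column (hypothetical values)


def screen(db, descriptor, ri_range, tol):
    """Keep candidates with `descriptor`, an N-heterocycle, and an RI inside the widened window."""
    lo, hi = ri_range[0] - tol, ri_range[1] + tol
    return [
        c.name
        for c in db
        if descriptor in c.odors and c.has_n_heterocycle and lo <= c.lit_ri <= hi
    ]


db = [
    Candidate("cand_A", {"roast", "nut"}, True, 1328),
    Candidate("cand_B", {"roast"}, True, 1410),   # RI too far away
    Candidate("cand_C", {"roast"}, False, 1330),  # no N-heterocycle
    Candidate("cand_D", {"green"}, True, 1330),   # wrong aroma
    Candidate("cand_E", {"roast"}, True, 1316),   # inside the widened window
    Candidate("cand_F", {"roast"}, True, 1345),   # just outside at tol = 10
]

hits = screen(db, "roast", ri_range=(1324, 1334), tol=10)
print(hits)
```

Running the cheap aroma and structure filter first keeps the retention-index comparison, which requires looking up literature values, limited to a short list.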

Note that various aromas can be classified as “roasted aromas,” such as “roasted bread” and “roasted nuts.” To identify the unknown compounds responsible for the “roasted aroma” in the Baijiu samples, standard solutions of the three candidate compounds were analyzed by GC-O-MS under the same conditions. 2-Acetyl-1-pyrroline emitted a distinct popcorn-like aroma which, although categorized as a “roasted aroma,” did not match the odor perceived in the Baijiu samples, whereas the other two compounds had aromas closer to those detected in the samples. 2-Acetyl-1-pyrroline is a key aroma compound in rice with an exceptionally low odor threshold of 0.02–0.04 ng/L, which makes it highly detectable (Cai et al., 2024; Huang et al., 2024). Nevertheless, the sniffing experiment clearly excluded it, because its aroma did not correspond to the “roasted aroma” found in the Baijiu samples. Consequently, the unknown substances producing the “roasted” scent can only be 2,5-dimethylpyrazine or 2,6-dimethylpyrazine, which are positional isomers. The mass spectra of the two compounds were extremely similar, with almost identical fragment ions. The elution of the two compounds was further examined (Fig. S4): their calculated retention indices were 1330 for 2,5-dimethylpyrazine and 1336 for 2,6-dimethylpyrazine, both within the normal fluctuation range. Therefore, the current analytical method cannot distinguish between the two, and a more selective approach is required to differentiate them.
Therefore, after a rigorous screening process, we identified the compounds responsible for the “roasted aroma” at retention indices of 1324–1334 as 2,5-dimethylpyrazine and/or 2,6-dimethylpyrazine. These two compounds have rarely been reported in light-aroma Baijiu. Future studies could use aroma recombination and omission experiments to confirm whether they contribute significantly to the “roasted aroma” of light-aroma Baijiu (LAB).
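The conclusion that retention indices cannot separate the two isomers can be checked numerically: each measured value must lie within the tolerance of the aroma region, and if their mutual separation is also smaller than that tolerance, the retention index alone cannot favor one over the other. The measured values below are the ones reported in this section; the 10 i.u. tolerance and the `deviation` helper are assumptions for illustration.

```python
def deviation(ri, lo, hi):
    """Distance (i.u.) from `ri` to the interval [lo, hi]; 0 if inside."""
    return max(lo - ri, 0, ri - hi)


region = (1324, 1334)
tol = 10  # lower end of the 10-25 i.u. range cited in the text (assumed here)
measured = {"2,5-dimethylpyrazine": 1330, "2,6-dimethylpyrazine": 1336}

devs = {name: deviation(ri, *region) for name, ri in measured.items()}
compatible = [name for name, d in devs.items() if d <= tol]
gap = abs(measured["2,5-dimethylpyrazine"] - measured["2,6-dimethylpyrazine"])

print(devs)
print(f"compatible: {compatible}; separation {gap} i.u. < tolerance: {gap < tol}")
```

Both isomers pass and sit only 6 i.u. apart, so an orthogonal criterion (for example a different stationary phase) would be needed to tell them apart.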

The “roasted aroma” in Baijiu derives mainly from Maillard reaction products formed during brewing and is most prominent in sauce-flavored Baijiu. Typical thermal processes in the production of sauce-flavored Baijiu include high-temperature fermentation with Qu (65–70 °C), grain stacking, and multiple cycles of grain fermentation (42–45 °C) and distillation (100–105 °C). These distinctive brewing techniques are responsible for the characteristic baked and roasted aromas of sauce-flavored Baijiu, which are crucial criteria in its sensory evaluation and quality assessment (Sha et al., 2017). In contrast, strong-aroma Baijiu is produced with medium-temperature Qu preparation (55–60 °C), followed by fermentation at 32–35 °C and distillation at 95–102 °C (Dong et al., 2019). Light-aroma Baijiu is produced with low-temperature Qu preparation (typically not exceeding 60 °C), low-temperature fermentation, and a shorter fermentation period. 2,5-Dimethylpyrazine and 2,6-dimethylpyrazine have been widely identified in both sauce-aroma and strong-aroma Baijiu and have been confirmed as important aroma compounds in sauce-aroma Baijiu. Although LAB is fermented at lower temperatures, a study by Van-Diep Le et al. on volatile compounds at various stages of Fen-Daqu production showed that the temperature can reach 60 °C during the high-temperature stage of Qu making, promoting the Maillard reaction; six pyrazines were identified, including 2,3-dimethylpyrazine, 2-ethyl-6-methylpyrazine, trimethylpyrazine, and tetramethylpyrazine (Van-Diep et al., 2012). These compounds are transferred into the spirit during the subsequent fermentation and brewing processes, contributing to the final flavor profile.
From the perspective of compound formation mechanisms, these findings therefore support the plausibility that 2,5-dimethylpyrazine and 2,6-dimethylpyrazine are present in the light-aroma Baijiu samples used in this study, and they lend further support to the effectiveness of the model. One possible reason they were not detected instrumentally is that their concentrations were below the detection limits of conventional detectors. Nevertheless, their significant contribution to flavor highlights the need for further research.

4. Conclusions

In summary, we developed an “aroma-compound” database for Baijiu comprising 646 compounds and 70 aroma descriptors. The compounds were encoded as 1024-bit molecular fingerprints and analyzed using UMAP dimensionality reduction and K-means clustering to establish correlations between aroma and chemical structure. This approach is particularly valuable for low-concentration, low-threshold compounds in Baijiu, which may produce strong aromas but lack distinct mass spectrometric or chromatographic signals, making accurate identification difficult. Furthermore, by applying the model to a real Baijiu sample, we identified the unknown compounds responsible for the intense “roasted aroma” as 2,5-dimethylpyrazine or 2,6-dimethylpyrazine, thereby validating the effectiveness of the constructed “aroma-structure” model. In future studies, more advanced machine learning algorithms will be employed to process larger datasets and offer deeper insight into the “aroma-structure” relationship.
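The clustering step summarized above can be illustrated with a minimal K-means (Lloyd's algorithm) on two-dimensional embedding coordinates. This is a sketch under stated assumptions: the embedding is taken as already computed (for instance with the umap-learn package, whose `umap.UMAP` class accepts `n_components=2` and a `metric="jaccard"` option suited to binary fingerprints), the toy coordinates are hypothetical, k is fixed at 3 instead of being chosen with the Kelley penalty function used in the study, and a deterministic farthest-point initialization replaces the usual randomized k-means++ seeding.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means on a list of coordinate tuples; returns (labels, centers)."""
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [tuple(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(far))
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical 2-D embedding coordinates forming three well-separated groups.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (0.0, 5.0), (0.2, 5.1), (-0.1, 4.9),
]
labels, centers = kmeans(points, k=3)
print(labels)
```

Clustering the low-dimensional embedding rather than the raw 1024-bit vectors keeps the distance computations cheap; the trade-off is that cluster boundaries then inherit any distortions introduced by the projection.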

CRediT authorship contribution statement

Yintao Jia: Writing – original draft, Methodology, Conceptualization. Yue Qiu: Writing – review & editing, Visualization. Qi Deng: Visualization, Formal analysis, Data curation. Ying Han: Supervision, Conceptualization. Baoguo Sun: Supervision. Rong Liu: Resources. Pan Zhen: Resources. Wenxian Li: Writing – review & editing. Wei Dong: Writing – review & editing, Visualization, Supervision. Xiaotao Sun: Writing – review & editing, Visualization, Supervision. Xiao Yang: Visualization. Fan Cui: Resources.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Ethical approval

All procedures for sensory evaluation were carried out in accordance with relevant laws and institutional guidelines and were approved by the Scientific Research Academic Committee of Beijing Technology and Business University.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by the National Key Research and Development Program of China (2022YFD2101205), National Natural Science Foundation of China (32102122), and National Engineering Research Center of Solid-State Brewing of Luzhou Laojiao Distillery Co., Ltd.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.fochx.2025.102645.

Contributor Information

Wei Dong, Email: 20200812@btbu.edu.cn.

Xiaotao Sun, Email: sxt_btbu66@163.com.

Appendix A. Supplementary data

Supplementary material

The distribution of aroma generated by UMAP and K-means across different clusters (Table S1); relationship of “aroma-structure” (Tables S2–S5); proportion of functional groups in different clusters (Table S6); relationship between odor descriptors and compounds (Fig. S1); statistics of the number of aroma descriptors with A values greater than 50 % (Fig. S2); results of the Kelley penalty function (Fig. S3); chromatogram and mass spectra of the Baijiu sample and the two compounds (Fig. S4).

mmc1.docx (1.5MB, docx)

Data availability

Data will be made available on request.

References

  1. Becht E., McInnes L., Healy J., Dutertre C.A., Kwok I.W.H., Ng L.G.…Newell E.W. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology. 2019;37(1):38. doi: 10.1038/nbt.4314.
  2. Beisken S., Meinl T., Wiswedel B., De Figueiredo L.F., Berthold M., Steinbeck C. KNIME-CDK: Workflow-driven cheminformatics. BMC Bioinformatics. 2013;14 doi: 10.1186/1471-2105-14-257.
  3. Cai Y., Pan X., Zhang D., Yuan L., Lao F., Wu J. The kinetic study of 2-acetyl-1-pyrroline accumulation in the model system: An insight into enhancing rice flavor through the Maillard reaction. Food Research International. 2024;191 doi: 10.1016/j.foodres.2024.114591.
  4. Capecchi A., Probst D., Reymond J.L. One molecular fingerprint to rule them all: Drugs, biomolecules, and the metabolome. Journal of Cheminformatics. 2020;12(1) doi: 10.1186/s13321-020-00445-4.
  5. Dong W., Dai X., Jia Y., Ye S., Shen C., Liu M., Lin F., Sun X., Xiong Y., Deng B. Association between Baijiu chemistry and taste change: Constituents, sensory properties, and analytical approaches. Food Chemistry. 2024;437 doi: 10.1016/j.foodchem.2023.137826.
  6. Dong W., Guo R., Liu M., Shen C., Sun X., Zhao M., Sun J., Li H., Zheng F., Huang M., Wu J. Characterization of key odorants causing the roasted and mud-like aromas in strong-aroma types of base Baijiu. Food Research International. 2019;125 doi: 10.1016/j.foodres.2019.108546.
  7. Dong W., Shi K., Liu M., Shen C., Li A., Sun X., Zhao M., Sun J., Li H., Zheng F., Huang M. Characterization of 3-methylindole as a source of a “mud”-like off-odor in strong-aroma types of base Baijiu. Journal of Agricultural and Food Chemistry. 2018;66(48):12765–12772. doi: 10.1021/acs.jafc.8b04734.
  8. Duan J., Cheng W., Lv S., Deng W., Hu X., Li H., Sun J., Zheng F., Sun B. Characterization of key aroma compounds in soy sauce flavor Baijiu by molecular sensory science combined with aroma active compounds reverse verification method. Food Chemistry. 2024;443 doi: 10.1016/j.foodchem.2024.138487.
  9. Dutta P., Jain D., Gupta R., Rai B. Classification of tastants: A deep learning based approach. Molecular Informatics. 2023;42(12) doi: 10.1002/minf.202300146.
  10. Ehiro T. Feature importance-based interpretation of UMAP-visualized polymer space. Molecular Informatics. 2023;42(8–9) doi: 10.1002/minf.202300061.
  11. Fan W.L., Qian M.C. Headspace solid phase microextraction and gas chromatography-olfactometry dilution analysis of young and aged Chinese “Yanghe Daqu” liquors. Journal of Agricultural and Food Chemistry. 2005;53(20):7931–7938. doi: 10.1021/jf051011k.
  12. Herrera-Rocha F., Fernandez-Nino M., Duitama J., Cala M.P., Chica M.J., Wessjohann L.A.…Barrios A.F.G. FlavorMiner: A machine learning platform for extracting molecular flavor profiles from structural data. Journal of Cheminformatics. 2024;16(1) doi: 10.1186/s13321-024-00935-9.
  13. Huang Y., Huang L., Cheng M., Li C., Zhou X., Ullah A., Sarfraz S., Khatab A., Xie G. Progresses in biosynthesis pathway, regulation mechanism and potential application of 2-acetyl-1-pyrroline in fragrant rice. Plant Physiology and Biochemistry. 2024;215 doi: 10.1016/j.plaphy.2024.109047.
  14. Jin G., Zhu Y., Xu Y. Mystery behind Chinese liquor fermentation. Trends in Food Science & Technology. 2017;63:18–28. doi: 10.1016/j.tifs.2017.02.016.
  15. Kelley L.A., Gardner S.P., Sutcliffe M.J. An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally related subfamilies. Protein Engineering. 1996;9(11):1063–1065. doi: 10.1093/protein/9.11.1063.
  16. Kobak D., Linderman G.C. Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nature Biotechnology. 2021;39(2) doi: 10.1038/s41587-020-00809-z.
  17. Li H., Zhang X., Gao X., Shi X., Chen S., Xu Y., Tang K. Comparison of the aroma-active compounds and sensory characteristics of different grades of light-flavor Baijiu. Foods. 2023;12(6) doi: 10.3390/foods12061238.
  18. Marx V. Seeing data as t-SNE and UMAP do. Nature Methods. 2024;21(6):930–933. doi: 10.1038/s41592-024-02301-x.
  19. Probst D., Reymond J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. Journal of Cheminformatics. 2020;12(1) doi: 10.1186/s13321-020-0416-x.
  20. Rabasovic M.S., Pavlovic D.M., Sevic D. Analysis of laser ablation spectral data using dimensionality reduction techniques: PCA, t-SNE and UMAP. Contributions of the Astronomical Observatory Skalnate Pleso. 2023;53(3):51–57. doi: 10.31577/caosp.2023.53.3.51.
  21. Roweis S.T., Saul L.K. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323–2326. doi: 10.1126/science.290.5500.2323.
  22. Rugard M., Jaylet T., Taboureau O., Tromelin A., Audouze K. Smell compounds classification using UMAP to increase knowledge of odors and molecular structures linkages. PLoS ONE. 2021;16(5) doi: 10.1371/journal.pone.0252486.
  23. Sha S., Chen S., Qian M., Wang C., Xu Y. Characterization of the typical potent odorants in Chinese roasted sesame-like flavor type liquor by headspace solid phase microextraction-aroma extract dilution analysis, with special emphasis on sulfur-containing odorants. Journal of Agricultural and Food Chemistry. 2017;65(1):123–131. doi: 10.1021/acs.jafc.6b04242.
  24. Sharma A., Kumar R., Ranjta S., Varadwaj P.K. SMILES to smell: Decoding the structure-odor relationship of chemical compounds using the deep neural network approach. Journal of Chemical Information and Modeling. 2021;61(2):676–688. doi: 10.1021/acs.jcim.0c01288.
  25. Song X., Jing S., Zhu L., Ma C., Song T., Wu J., Zhao Q., Zheng F., Zhao M., Chen F. Untargeted and targeted metabolomics strategy for the classification of strong aroma-type Baijiu (liquor) according to geographical origin using comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry. Food Chemistry. 2020;314 doi: 10.1016/j.foodchem.2019.126098.
  26. Song X., Zhu L., Geng X., Li Q., Zheng F., Zhao Q., Ji J., Sun J., Li H., Wu J., Zhao M., Sun B. Analysis, occurrence, and potential sensory significance of tropical fruit aroma thiols, 3-mercaptohexanol and 4-methyl-4-mercapto-2-pentanone, in Chinese Baijiu. Food Chemistry. 2021;363 doi: 10.1016/j.foodchem.2021.130232.
  27. Sun X., Qian Q., Xiong Y., Xie Q., Yue X., Liu J., Wei S., Yang Q. Characterization of the key aroma compounds in aged Chinese Xiaoqu Baijiu by means of the sensomics approach. Food Chemistry. 2022;384 doi: 10.1016/j.foodchem.2022.132452.
  28. Tromelin A., Chabanet C., Audouze K., Koensgen F., Guichard E. Multivariate statistical analysis of a large odorants database aimed at revealing similarities and links between odorants and odors. Flavour and Fragrance Journal. 2018;33(1):106–126. doi: 10.1002/ffj.3430.
  29. Van-Diep L., Zheng X.-W., Chen J.-Y., Han B.-Z. Characterization of volatile compounds in Fen-Daqu – a traditional Chinese liquor fermentation starter. Journal of the Institute of Brewing. 2012;118(1):107–113. doi: 10.1002/jib.8.
  30. Wang G., Liu F., Pan F., Li H., Zheng F., Ye X., Sun B., Cheng H. Study on the interaction between polyol glycerol and flavor compounds of Baijiu: A new perspective of influencing factors of Baijiu flavor. Journal of Agricultural and Food Chemistry. 2024;72(48):26832–26845. doi: 10.1021/acs.jafc.4c05935.
  31. Wang L., Tang P., Zhang P., Lu J., Chen Y., Xiao D., Guo X. Unraveling the aroma profiling of Baijiu: Sensory characteristics of aroma compounds, analytical approaches, key odor-active compounds in different Baijiu, and their synthesis mechanisms. Trends in Food Science & Technology. 2024;146:104376. doi: 10.1016/j.tifs.2024.104376.
  32. Wen H., Nan S., Zhang J., Lei Z., Shen W. Chemical space deconstruction-based dynamic model ensemble architecture for molecular property prediction. Chemical Engineering Science. 2024;295 doi: 10.1016/j.ces.2024.120118.
  33. Wu M., Fan Y., Zhang J., Chen H., Wang S., Shen C., Fu H., She Y. A novel organic acids-targeted colorimetric sensor array for the rapid discrimination of origins of Baijiu with three main aroma types. Food Chemistry. 2024;447 doi: 10.1016/j.foodchem.2024.138968.
  34. Zeng S., Duan X., Bai J., Tao W., Hu K., Tang Y. Soft multiprototype clustering algorithm via two-layer semi-NMF. IEEE Transactions on Fuzzy Systems. 2024;32(4):1615–1629. doi: 10.1109/tfuzz.2023.3329108.
  35. Zeng X., Cao R., Xi Y., Li X., Yu M., Zhao J., Cheng J., Li J. Food flavor analysis 4.0: A cross-domain application of machine learning. Trends in Food Science & Technology. 2023;138:116–125. doi: 10.1016/j.tifs.2023.06.011.
  36. Zhu J., Niu Y., Xiao Z. Characterization of important sulfur and nitrogen compounds in Lang Baijiu by application of gas chromatography-olfactometry, flame photometric detection, nitrogen phosphorus detector and odor activity value. Food Research International. 2020;131 doi: 10.1016/j.foodres.2020.109001.


Articles from Food Chemistry: X are provided here courtesy of Elsevier
