Abstract
The rapid proliferation of chemical substances presents significant challenges in assessing their safety-critical physicochemical properties. This study presents an integrated approach using Graph Neural Networks (GNNs) to predict three crucial properties for chemical safety assessment: Heat of Combustion (HoC), Vapor Pressure (VP), and Flashpoint. Leveraging comprehensive datasets of 4780, 3573, and 14,696 compounds respectively, we developed a unified prediction model that outperforms existing approaches. Our model achieves mean absolute errors of 126 kJ/mol (R² = 0.993) for HoC, 0.617 log units (R² = 0.898) for VP, and 14.42 °C (R² = 0.839) for Flashpoint, representing notable improvements over conventional methods. Through detailed analysis, we identified and addressed a specific challenge in predicting HoC for cyclic compounds by implementing a hybrid approach combining DFT calculations and Random Forest modeling. This specialized treatment expanded our cyclic compound dataset from 12 to 55 compounds and achieved an R² of 0.918 for these traditionally challenging structures. The model was integrated into a real-time prediction system using Flask, allowing users to input chemical structures through SMILES notation or direct drawing. The system includes features for comparing predictions with experimental data and benchmarking against common industrial chemicals (acetone, n-hexane, and n-decane), enhancing its practical utility in emergency response scenarios. Our approach provides a robust, unified solution for predicting multiple safety-critical properties simultaneously, addressing a crucial need in chemical safety assessment and emergency response planning.
Scientific contribution
Overall, this study provides an integrated framework that deploys three GNN-based prediction models within a common architecture and a real-time prediction system. For cyclic compounds, which exhibit systematic prediction challenges under the GNN framework, we incorporate a targeted alternative modeling strategy to improve predictive reliability, thereby enhancing the practical applicability of machine-learning approaches to chemical safety assessment.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13321-026-01151-3.
Keywords: Graph neural networks (GNN), Chemical safety prediction, Density functional theory (DFT), Real-time prediction system, Data augmentation
Introduction
The 21st century has witnessed an unprecedented explosion in the diversity and complexity of chemical substances, driven by advances in synthetic chemistry, materials science, and industrial innovation. According to the Chemical Abstracts Service (CAS) Registry, the number of registered chemical substances surpassed 180 million in 2021, with an average of 15,000 new substances added daily (CAS, 2021). This exponential growth, while a testament to human ingenuity, presents formidable challenges to safety regulators, industrial hygienists, and emergency responders worldwide. The sheer volume and variety of these substances necessitate a paradigm shift in how we approach chemical safety and risk assessment. Traditionally, the characterization of physicochemical properties of chemical substances has relied heavily on direct experimental methods. These approaches, while accurate, are increasingly challenging given the vast number of new chemicals entering the market and the time- and resource-intensive nature of laboratory testing. For example, the United States Environmental Protection Agency (US EPA) estimates [1] that the studies required for pesticide registration can cost up to $4.9 million, and, as reported by Devito et al. [2], only a small portion (15%) of commercial chemicals undergo traditional toxicity testing; relying solely on experimental methods to assess the safety of the millions of substances in use today is therefore unrealistic. Safety Data Sheets (SDS) are designed to bridge this gap by providing safety information, including physicochemical properties relevant to accident prevention and response. However, the completeness of SDS data has been a subject of concern in the scientific community. Nicol et al. [3] revealed that chemicals not listed on the SDS were found in 30 to 100% of the products analyzed.
Additionally, the revised 2020 EU regulation [4] highlights that international demands for chemical safety management have continued to grow, emphasizing the need for greater transparency and accuracy in chemical reporting. This underscores the ongoing importance of providing clear and accurate information on chemical substances.
In the context of chemical accident prevention and response, three physicochemical properties stand out as particularly crucial: Heat of Combustion (HoC), Vapor Pressure (VP), and Flashpoint. These properties play crucial roles in determining the behavior of chemicals during accidental releases, fires, or explosions: Heat of Combustion (HoC) quantifies the energy released when a substance undergoes complete combustion. It is a critical parameter in assessing the potential severity of fires involving chemical substances. High HoC values indicate substances that can release large amounts of energy, potentially leading to more intense fires or explosions. Vapor Pressure (VP) is a measure of a liquid's tendency to evaporate. It is crucial in determining the rate at which a chemical will form a vapor cloud following a spill. Substances with high VP pose greater risks of forming explosive atmospheres or causing inhalation hazards. Flashpoint is the lowest temperature at which a substance can form an ignitable mixture with air. It is a key factor in determining the fire hazard of a substance under ambient conditions. Low flashpoint substances are more likely to ignite at room temperature, posing significant fire risks.
Accurate knowledge of these properties is indispensable for risk assessment, safe handling procedures, and effective emergency response strategies. For instance, in a chemical spill scenario, knowing the VP and flashpoint of the substance can inform decisions about evacuation zones, personal protective equipment requirements, and fire suppression strategies. The challenge of characterizing these properties becomes even more acute when dealing with unknown or newly synthesized chemicals, where experimental data is often entirely absent. In such scenarios, the ability to rapidly and accurately predict these critical properties could be the difference between a contained incident and a catastrophic event. The 2013 West Fertilizer Company fire and explosion serve as a prominent example, resulting in 15 fatalities and over 260 injuries, according to a report by the U.S. Chemical Safety Board [5]. This explosion occurred when a large amount of stored chemicals was ignited by a fire, highlighting the critical importance of having accurate hazard information for chemical substances.
While artificial intelligence (AI) models may not achieve perfect accuracy, they can provide invaluable guidance to first responders and safety professionals in the critical early stages of an incident. The potential of AI in this domain has been recognized by major safety organizations, with the U.S. Occupational Safety and Health Administration (OSHA) highlighting machine learning as a key technology for future chemical safety management.
Before delving into AI-driven methods, it is instructive to revisit the classical and physics-based approaches that have historically supported chemical property prediction. Beyond direct measurements, classical structure–property approaches such as Benson’s group additivity (and related group-contribution methods, e.g., Joback–Reid) have long been used to estimate pure-component properties. In parallel, first-principles workflows—e.g., quantum-chemistry and COSMO-RS—provide physics-grounded approximations for vapor pressure and thermochemical quantities. For HoC, an established alternative is to derive it from standard enthalpies of formation (ΔH_f) via Hess’s law; accordingly, recent ML models for ΔH_f offer an indirect pathway to HoC estimation [6], situating our work within experimental, physics-based, and data-driven methodologies.
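As a concrete illustration of the Hess's-law route mentioned above, the heat of combustion can be assembled from standard enthalpies of formation. The worked example below uses methane with commonly tabulated values (298.15 K, 1 atm; ΔH°f of O₂ is zero by convention); it is illustrative and not data from this study:

```latex
% CH4(g) + 2 O2(g) -> CO2(g) + 2 H2O(l)
\Delta H^{\circ}_{\mathrm{comb}}
  = \left[\Delta H^{\circ}_{f}(\mathrm{CO_2}) + 2\,\Delta H^{\circ}_{f}(\mathrm{H_2O})\right]
    - \Delta H^{\circ}_{f}(\mathrm{CH_4})
  = \left[-393.5 + 2(-285.8)\right] - (-74.8)
  \approx -890.3~\mathrm{kJ\,mol^{-1}}
```

Any ML model that predicts ΔH°f therefore yields HoC through the same bookkeeping, which is why ΔH°f predictors provide the indirect pathway cited in [6].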
Recently, AI-based predictive modeling of physicochemical and combustion-related properties has rapidly expanded across the fields of fuel chemistry and chemical safety. Üstün et al. [7] provided a comprehensive review of machine learning applications for predicting ignition delay, flame speed, and flashpoint, emphasizing the increasing role of graph-based deep learning in combustion property modeling. Lin et al. [8] further explored interpretable machine learning approaches for hazardous property prediction, including explosivity and flashpoint, while Jeong et al. [9] proposed a Bayesian kernel framework for flashpoint prediction. These studies collectively highlight the growing potential of AI in modeling safety-critical physicochemical properties. Building on these recent advances, our research focuses on developing a unified GNN framework for predicting three key properties (HoC, VP, and Flashpoint) within a single integrated system [38–44].
We begin by reviewing previous studies that have established models for predicting HoC. Jameel et al. [10] developed a model to predict the enthalpy of combustion for 204 compounds using data collected from the literature. The authors utilized an Artificial Neural Network (ANN) with 14 input nodes, 26 hidden nodes, and 1 output node to predict the enthalpy of combustion of various oxygenated fuels, and reported the corresponding R² value for the test set. Cao et al. [11] employed data from a total of 1496 compounds. In this study, both a multilinear regression (MLR) model and an ANN model were fitted to the data, and the results were compared. The ANN model specifically adopted a multilayer perceptron structure, incorporating 49 atom-type electrotopological state indices in the input layer. Ultimately, the R² values for the test set were 0.990 for the MLR model and 0.992 for the ANN model. In addition, studies that have developed models for predicting flashpoint were examined. Sun et al. [12] utilized a dataset of 10,575 entries and applied both Graph Convolutional Neural Network (GCNN) and Message Passing Neural Network (MPNN) models. When the boiling point was not used as a molecular feature, both methods yielded an R² value of 0.86, with mean absolute error (MAE) values of 18.2 K and 17.3 K, respectively. Xu et al. [13] proposed a model using a Convolutional Neural Network (CNN) that treated molecular structures as images and applied this approach to flashpoint prediction. The dataset comprised 4,098 entries, and for the test set the model achieved an MAE of 20.2 and an R² value of 0.79. Mirshahvalad et al. [14] sought to develop an ANN model to predict flashpoint using several properties of compounds, such as the number of hydrogen and carbon atoms, critical temperature, and normal boiling point, as input variables. The model was fitted to 393 compounds, resulting in a high R² value of 0.998.
We also include recent machine-learning studies on vapor-pressure prediction [15].
Furthermore, models for predicting vapor pressure have evolved through combinations with various methodologies. Lin et al. [16] used a Directed Message Passing Neural Network (D-MPNN) to predict vapor pressure, incorporating the effects of temperature directly into the input data, which enhanced the accuracy of the predictions. They utilized a dataset of 19,081 data points, and the model's performance was measured by an average absolute relative deviation (AARD) of 0.617. Santana et al. [17] proposed a machine learning-based VP prediction model called PUFFIN. The model uses a GNN to learn molecular structural relationships and transfers this information to a feed-forward neural network (FFNN) for VP prediction. The FFNN integrates domain-specific knowledge, such as the boiling point, and physical laws to enhance its predictions. Tested on 1,851 data points, the model achieved an MSE of 0.1609, demonstrating its effectiveness in accurately predicting VP.
We have assembled the most comprehensive dataset to date for these properties, spanning a wide range of chemical classes and structures. Our dataset includes over 20,000 unique compounds, representing a significant expansion over previous studies, which typically included fewer than 5,000 compounds. Leveraging recent advances in machine learning, particularly in the domain of Graph Neural Networks (GNNs), we have succeeded in developing models that significantly outperform previous predictive approaches in both accuracy and generalizability. GNNs, which can directly operate on molecular graphs, offer a natural way to capture the structural information of chemicals, potentially leading to more accurate and interpretable models. The culmination of our efforts is an integrated prediction system capable of estimating HoC, VP, and Flashpoint simultaneously from molecular structure input. This system represents a significant advancement in the field of chemical safety informatics, offering real-time predictions that can inform critical decision-making in emergency scenarios. By bridging the gap between the vast chemical space and our ability to characterize it, our work addresses a fundamental challenge in chemical safety and accident prevention.
Methods
Message passing framework in graph neural networks
Chemical compounds with molecular structures are difficult to handle with Convolutional Neural Networks (CNNs) because CNNs rely on regular grid-like spatial information. It is more appropriate to represent the molecular structure of a chemical compound as a graph, using nodes (V) and edges (E). This is denoted as G = (V, E), and we use Graph Neural Networks (GNNs) to process this graph representation. Among GNNs, we chose the Message Passing Neural Network (MPNN), which is one of the fundamental frameworks of GNNs.
MPNN is especially suitable for chemistry-related tasks because it explicitly incorporates chemical bond information alongside atomic features. MPNN was first introduced by Gilmer et al. [18] as a general framework for message passing in molecular graphs. In this architecture, messages are passed between atoms using both node and edge features, enabling the model to capture bond-specific interactions such as the difference between single, double, and aromatic bonds. Its effectiveness has been widely validated, including in the MoleculeNet benchmark by Wu et al. [19], where MPNN served as a baseline model and demonstrated strong performance across tasks such as solubility prediction (ESOL), quantum chemistry (QM9), and toxicity classification (Tox21). Therefore, MPNN was selected as the core model in this study.
The MPNN is broadly composed of two phases: the message passing phase and the readout phase. In the message passing phase, each node's features are decomposed and converted into vector form through an embedding process. Then, the information from its neighbors is aggregated and used to update the target node. In other words, this phase consists of two parts: the message function, which integrates information, and the update function, which updates the hidden state.
The message for target node v at step t + 1 is

m_v^(t+1) = Σ_{w ∈ N(v)} M_t(h_v^t, h_w^t, e_vw)

Here, v is the target node, N(v) represents the neighboring nodes of node v, h_v^t denotes the hidden state of node v, e_vw represents the edge information between nodes v and w, and M_t is the message function. This can be understood as aggregating the current state of the target node, the current state of the neighborhood, and the information of the edges to generate a new message.
U_t denotes the update function:

h_v^(t+1) = U_t(h_v^t, m_v^(t+1))

The update function uses the message calculated in the previous step to update the next hidden state of the target node. After this process, one layer is considered complete, and depending on the model, this process is repeated multiple times.
In the readout phase, the hidden states obtained after the final message-passing step are combined by a readout function R, ŷ = R({h_v^T | v ∈ G}), to predict the value we are interested in.
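The two phases above can be sketched numerically on a toy graph. This is an illustrative simplification only: the paper's model uses learned, vector-valued message and update functions in TensorFlow, whereas here node states are scalars, the message function sums neighbor states scaled by an edge weight, and the update averages the old state with the message.

```python
def mp_layer(h, edges):
    """One message-passing layer.

    h: dict node -> scalar hidden state
    edges: dict (u, v) -> edge weight (undirected)
    """
    msg = {v: 0.0 for v in h}
    for (u, v), w in edges.items():
        # Message phase: neighbors exchange states scaled by the edge feature
        msg[v] += w * h[u]
        msg[u] += w * h[v]
    # Update phase: blend the old state with the aggregated message
    return {v: 0.5 * (h[v] + msg[v]) for v in h}

# Path graph 0-1-2; only node 0 starts with information.
h = {0: 1.0, 1: 0.0, 2: 0.0}
edges = {(0, 1): 1.0, (1, 2): 1.0}
for _ in range(2):  # two layers -> information reaches 2-hop neighbors
    h = mp_layer(h, edges)

# Readout: sum-pool the final hidden states into a graph-level prediction.
readout = sum(h.values())
```

After one layer node 2 still has state 0; only after the second layer does node 0's information reach it, which is the "k layers = k-hop neighborhood" behavior discussed below.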
We also considered a model that includes a global state. The global state reflects the overall information of the graph, aiming to preserve the overall structure of the compound. If the model's performance improves when fitting the data, we will choose the model with the global state included.
Density functional theory (DFT) calculation
Gas-phase geometries and harmonic frequencies were computed with the B3LYP [20, 21], M06-2X [22], and PBEPBE [23] functionals in combination with the 6-311++G(d,p) [24] and cc-pVTZ [25–28] basis sets, as implemented in Gaussian 16 [29]; molecular structures were visualized with GaussView 6 [30]. Experimental heats of combustion were obtained from the NIST Chemistry WebBook [36]. To approximate condensed-phase contributions, we estimated the standard molar enthalpies of vaporization and sublimation from molecular-surface descriptors computed with Multiwfn [31, 32], following the correlations of Politzer and co-workers [33–35]:

ΔH°vap = a(A_S)^(1/2) + b(νσ²_tot)^(1/2) + c
ΔH°sub = a′(A_S)² + b′(νσ²_tot)^(1/2) + c′
Here, ΔH°vap and ΔH°sub denote the standard molar enthalpies of vaporization and sublimation at 298.15 K and 1 atm (kJ mol−1); A_S is the molecular surface area (Å²) evaluated on the electron-density isosurface (Multiwfn default), V is the enclosed molecular volume (Å³) on the same isosurface, and νσ²_tot is the product of the balance parameter ν and the total variance σ²_tot of the electrostatic potential sampled over that surface (units as defined in the original regressions [32–35]). The empirical coefficients a, b, c and a′, b′, c′ were taken from Politzer et al.'s fits built on 30 liquefaction and 66 sublimation enthalpies [35]; the numerical values used in this work are listed in the supplementary materials (Tables S9–S11). Unless otherwise specified, all thermochemical quantities refer to the standard state (298.15 K, 1 atm). Among the methods examined, M06-2X/6-311++G(d,p) yielded the best agreement with experimental heats of combustion and was therefore adopted in subsequent analyses.
Bootstrap confidence interval estimation
Because conventional percentile bootstrap intervals can be biased when the sampling distribution of a statistic is skewed or asymmetric, the bias-corrected (BC) bootstrap method [37] was adopted. Using this approach, we estimated confidence intervals for the MAE and the coefficient of determination (R²) on the test set to quantify the uncertainty of model performance.
In each bootstrap iteration, the test samples were resampled with replacement (B replicates), and the corresponding performance metrics were recalculated.
Let θ̂ denote the performance metric (e.g., MAE or R²) computed from the original test data, and let θ̂*_1, …, θ̂*_B denote the statistics computed from the B bootstrap resamples. The BC method adjusts the percentile positions of the bootstrap distribution according to the bias-correction factor z₀, defined as:

z₀ = Φ⁻¹(#{θ̂*_b < θ̂} / B)
where Φ⁻¹ is the inverse standard normal cumulative distribution function, and #{θ̂*_b < θ̂} represents the number of bootstrap estimates smaller than θ̂.
The adjusted lower and upper quantile positions of the two-sided confidence interval are obtained as:

α₁ = Φ(2z₀ + z_(α/2)),  α₂ = Φ(2z₀ + z_(1−α/2))

where z_q = Φ⁻¹(q).
yielding the final interval [θ̂*_(α₁), θ̂*_(α₂)], where θ̂*_(q) denotes the empirical q-quantile of the bootstrap distribution.
Because R² is a ratio-type statistic that can exhibit asymmetry or bias in its sampling distribution, the bias-corrected bootstrap interval provides more accurate and stable confidence limits than the simple percentile bootstrap.
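The BC procedure described in this section can be sketched with the standard library alone. This is a minimal illustration, not the paper's implementation: the inverse normal CDF is approximated by bisection, and the percentile lookup uses simple index truncation rather than interpolation.

```python
import math
import random
import statistics

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse standard normal CDF by bisection (adequate for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bc_bootstrap_ci(abs_errors, B=1000, alpha=0.05, seed=42):
    """Bias-corrected (BC) bootstrap CI for the MAE of a list of absolute errors."""
    rng = random.Random(seed)
    theta = statistics.mean(abs_errors)  # point estimate on the original data
    n = len(abs_errors)
    boots = sorted(
        statistics.mean(rng.choices(abs_errors, k=n)) for _ in range(B)
    )
    # Bias-correction factor: z0 = Phi^-1(#{theta*_b < theta} / B)
    z0 = phi_inv(sum(b < theta for b in boots) / B)
    # Adjusted percentile positions: alpha_1 = Phi(2 z0 + z_{alpha/2}), etc.
    a1 = phi(2.0 * z0 + phi_inv(alpha / 2.0))
    a2 = phi(2.0 * z0 + phi_inv(1.0 - alpha / 2.0))
    lo = boots[min(B - 1, max(0, int(a1 * (B - 1))))]
    hi = boots[min(B - 1, max(0, int(a2 * (B - 1))))]
    return lo, theta, hi

# Example on a small set of hypothetical absolute errors.
errors = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.12, 0.18, 0.22, 0.08]
lo, mae, hi = bc_bootstrap_ci(errors, B=500)
```

When the bootstrap distribution is symmetric about θ̂, z₀ ≈ 0 and the interval reduces to the ordinary percentile bootstrap; the correction only matters for skewed statistics such as R².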
Data
The datasets for each target property were curated from publicly available sources as follows. For Heat of Combustion (HoC), experimentally measured values were primarily obtained from the NIST Chemistry WebBook [36], a widely used reference for thermochemical data. Vapor Pressure (VP) values were collected from published datasets in the literature and standardized by applying a logarithmic transformation (logVP). Flashpoint data were compiled from multiple sources, including safety data sheets (SDS), open-access benchmark studies such as Sun et al. [12] and Xu et al. [13], and publicly shared chemical property databases. These datasets were then processed to remove duplicates, ensure format consistency, and support model training.
For every record we tracked label provenance (source ∈ {exp, dft}) and harmonized units to the standard state (298.15 K, 1 atm; kJ mol−1 for HoC). Records labeled as estimates or lacking a clear experimental method were excluded, and duplicates were removed by InChIKey. Each entry retains CAS, canonical SMILES, and the original citation to ensure traceability.
To increase structural coverage for cyclic compounds in HoC, we augmented the training pool with DFT-generated labels (see Density Functional Theory (DFT) calculation), expanding cyclic coverage from 12 to 55 molecules. Computational labels were used only as auxiliary training signals; the held-out test set remains purely experimental. All results and tables are reported stratified by label source (experimental vs DFT) to avoid overstating performance.
To communicate when predictions are reliable, we compute ECFP4-based AD indicators (radius 2, 2048 bits): the maximum Tanimoto similarity to the training set and a k-NN density score (mean similarity to the 5 nearest training neighbors). We flag molecules as out-of-domain (OOD) if the maximum Tanimoto similarity is below 0.35 or if the density percentile falls below the 20th percentile of the training distribution. For the DFT-augmented cyclic set (n = 55), when training fingerprints were unavailable, we used an intra-set density proxy for AD: 46/55 (83.6%) were in-domain and 9/55 (16.4%) were flagged OOD (density < 20th percentile). Additional dataset distributions are provided in the supplementary materials (Figure S2). When prediction intervals are reported, they are presented together with in-domain/out-of-domain (AD) flags in the Results.
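The similarity-based part of this AD check can be sketched as follows. Fingerprints are represented here as sets of "on" bit indices (in practice they would be 2048-bit ECFP4 vectors from RDKit); the 0.35 cutoff mirrors the text, while the density-percentile flag additionally requires the training distribution and is omitted from this sketch.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def ad_check(query_fp, train_fps, sim_cut=0.35, k=5):
    """Return AD indicators for one query molecule."""
    sims = sorted((tanimoto(query_fp, t) for t in train_fps), reverse=True)
    s_max = sims[0]                              # max similarity to training set
    density = sum(sims[:k]) / min(k, len(sims))  # mean sim to k nearest neighbors
    return {"max_sim": s_max, "knn_density": density, "in_domain": s_max >= sim_cut}

# Tiny illustrative fingerprints (bit-index sets), not real ECFP4 data.
train = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11, 12}]
near = ad_check({1, 2, 3, 4}, train)  # identical to a training molecule
far = ad_check({100, 101}, train)     # shares no bits with the training set
```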
Data curation & coverage
SMILES were canonicalized and sanitized (RDKit); salts/solvents were removed by keeping the largest organic fragment; and obvious valence/charge errors were rejected. All labels were harmonized to the standard state (298.15 K, 1 atm; HoC in kJ mol⁻1). Records tagged as “estimate”, missing an experimental method, or lacking an unambiguous identifier were excluded. Duplicates were removed by InChIKey, keeping the most complete citation/metadata per entry.
To document chemical coverage, we derived standard RDKit descriptors (molecular weight, TPSA, Crippen logP, rotatable bonds, HBD/HBA, empirical formula). Ring statistics were computed from RDKit's SSSR (smallest set of smallest rings): the maximum ring size and the ring-atom fraction, i.e., the number of ring atoms divided by the number of heavy atoms.
ECFP4 fingerprints (radius = 2, 1024 bits) were computed and summarized with PCA to visualize chemical-space coverage. Redundancy was assessed by the distribution of random-pair Tanimoto similarities (~50 k pairs; self-pairs excluded); very-high similarity (≥ 0.95) was used to flag potential near-duplicates during curation. Aggregate atom counts derived from formulas summarize represented element types. Corresponding visualizations appear in Fig. 6, with extended distributions in the supplementary materials (Figure S2).
Fig. 6.

This result was obtained by fitting the model excluding the red box from Fig. 4. Most observations lie along the line, confirming a good fit
DFT-generated HoC values were used only as auxiliary training signals to expand cyclic coverage. Computational labels can deviate from experiment owing to the choice of functional/basis set, gas-phase reference with empirical condensed-phase corrections (ΔH°vap, ΔH°sub), conformer sampling, and anharmonic effects. To avoid bias leakage, no DFT-labeled point appears in the held-out test set; results are reported on experimental labels, and reliability indicators are calibrated on experimental validation data.
We distinguish label noise from DFT and predictive uncertainty of the ML models. Model reliability is communicated via applicability-domain (AD) indicators from ECFP4: (i) the maximum Tanimoto similarity to the training set and (ii) a k-NN density score (mean similarity to the 5 nearest training neighbors). Molecules are flagged out-of-domain when the maximum Tanimoto similarity falls below 0.35 or the density percentile falls below the 20th percentile of the training distribution. When prediction intervals are reported, they are calibrated only on experimental validation data (e.g., split-conformal) and summarized alongside ID/OOD-stratified metrics.
Routing & inference
We compute the ring-atom fraction with RDKit. At inference, molecules that satisfy a ring-dominance criterion, defined by a threshold on the ring-atom fraction chosen on the validation split, are routed to the cyclic RF; all others are predicted by the base model. The threshold was selected to improve validation MAE without over-routing, and borderline cases default to the base model. When an explicit "cyclo" tag is available, that tag also triggers routing to the RF. All reported metrics are on experimental labels only, and applicability-domain (AD) flags accompany predictions.
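The routing rule above can be sketched as a small dispatch function. The threshold and borderline margin are illustrative placeholders, not the validation-selected values from this study, and the model names are likewise hypothetical.

```python
def route(ring_atom_fraction, cyclo_tag=False, threshold=0.5, margin=0.05):
    """Decide which model predicts this molecule (sketch; values illustrative)."""
    if cyclo_tag:                                  # explicit "cyclo" tag wins
        return "cyclic_rf"
    if ring_atom_fraction >= threshold + margin:   # clearly ring-dominated
        return "cyclic_rf"
    return "base_model"                            # borderline/acyclic default
```

Keeping borderline cases on the base model matches the "without over-routing" criterion: only molecules clearly past the threshold are handed to the cyclic Random Forest.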
Implementation and reproducibility details
To ensure full reproducibility, all experiments were conducted with fixed random seeds and on clearly specified CPU-based environments. A seed value of 42 was used for the GNN-based experiments and 41 for the Random Forest and DFT-augmented models, applied consistently across data partitioning, cross-validation, and model initialization.
The GNN models were trained on a Windows 11 workstation equipped with an Intel Core i9-13900K CPU (3.0 GHz, 32 GB RAM) using Python 3.7.12 and TensorFlow 2.11.0. After the optimal hyperparameters were determined, each final model was trained for 100 epochs, requiring approximately 8–12 min depending on the target property. The Random Forest and DFT-augmented models were trained on a CPU-only system with an Intel Core Ultra 7 258V CPU (32 GB RAM), completing within about 20 s and achieving an inference throughput of roughly 360 molecules per second.
Results
We used three datasets containing 4780, 3573, and 14,696 observations to build predictive models for HoC, VP, and Flashpoint, respectively. The target variables were HoC (values divided by 1000), LogVP (defined as the natural logarithm of vapor pressure, ln(VP)), and Flashpoint. The distribution of each target variable is shown in Fig. 1.
Fig. 1.
The left side shows the distribution of the target variable, and the right side shows the distributions of the training, validation, and test data. Since each subset shows a distribution similar to that of the entire dataset, it can be confirmed that the data were randomly and properly split
The overall flow of the GNN model is shown in Fig. 2. The SMILES data is converted into graph form using RDKit in Python, and this data is then used to fit the GNN model. If the model with a global state performs better, the global state is added during the training process. Before fitting the GNN model, hyperparameter tuning is necessary to optimize the model. We tuned four hyperparameters: the number of message-passing layers, the dimension of the hidden layers, the batch size, and the learning rate. These hyperparameters were tuned through a grid search using a fixed validation set, and the combination with the lowest MAE was selected.
Fig. 2.
The overall process of the GNN model is visualized. a shows the process of converting SMILES data into graph components, with the dimension of the converted data determined by the batch size. b summarizes the entire process of the GNN model. Using the components obtained in (a), the final output is obtained through message-passing layers. The number of iterations for the message-passing layers is set by the previously tuned hyperparameters
Message-passing layers are a central part of the GNN model, a process in which each node gathers information from neighboring nodes and updates its characteristics. That is, the number of message-passing layers determines how many levels of neighborhood each node collects information from. For instance, if the model uses 4 message-passing layers, each node can get information from neighboring nodes that are 4 steps away. The larger the number of message-passing layers, the more information each node can use. However, there are drawbacks such as overfitting and increased computational costs. Therefore, it is important to choose the appropriate number of message-passing layers.
The dimension of hidden layers determines the length of the characteristic vector that each node has. The higher the dimension, the more diverse and complex characteristics each node can express. If the dimension is too high, the model risks overfitting. Conversely, if the dimension is too low, it is difficult for the model to learn adequately.
Batch size refers to the number of data samples input into the model in a single training step. If the batch size is small, learning proceeds more precisely, but the model can become unstable due to noisy updates. In contrast, if the batch size is large, learning proceeds stably, but the updates are less detailed because the model learns only the average pattern. For example, when the total dataset contains 1,000 samples and the batch size is 20, 50 update steps are performed in one epoch.
The learning rate controls the speed at which the model's weight is updated. Using an appropriate learning rate allows the model to converge quickly and reliably.
We used MAE as the criterion to determine the appropriate hyperparameters. Additionally, between the basic GNN model and the GNN model with the global state added, we selected the model with better performance.
During the hyperparameter tuning stage, the epochs were set to 20 for efficiency, and the results are in the supplementary materials (Tables S1–S3). All three models, with HoC, VP, and Flashpoint as response variables, showed better performance when the global state was added. Based on the optimal hyperparameters identified during the tuning stage, we fitted the GNN model with the global state and set the epochs to 100. After the total dataset was separated into training and test data, the training data was further divided into training and validation data. The model trains on the training data, and we check its generalizability using the validation data. Typically, 80% of the entire dataset is designated as training data, and within this training data, 20% is further split off as validation data. The remaining 20% of the entire dataset is allocated as test data.
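The grid search over the four hyperparameters can be sketched as follows. The grid values and the `validation_mae` scorer are illustrative stand-ins for the expensive train-then-evaluate step of the real GNN; only the selection logic (exhaustive enumeration, lowest validation MAE wins) reflects the procedure described in the text.

```python
import itertools

def grid_search(grid, validation_mae):
    """Exhaustively score every hyperparameter combination; keep the lowest MAE."""
    best_params, best_mae = None, float("inf")
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        mae = validation_mae(params)  # train on train split, score on fixed val split
        if mae < best_mae:
            best_params, best_mae = params, mae
    return best_params, best_mae

# Illustrative grid (not the paper's actual search space).
grid = {
    "num_mp_layers": [2, 3, 4],
    "hidden_dim": [32, 64, 128],
    "batch_size": [16, 32],
    "learning_rate": [1e-3, 5e-4],
}

# Dummy scorer with a known optimum, standing in for real training runs.
def dummy_mae(p):
    return abs(p["num_mp_layers"] - 3) + abs(p["hidden_dim"] - 64) / 64 + p["learning_rate"]

best_params, best_mae = grid_search(grid, dummy_mae)
```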
Figure 3 shows the change in the model's training loss and validation loss as epochs increased. All three models show a sharp decrease in losses during the initial epochs, followed by stabilization. In particular, the validation loss represents the model's prediction error on data not used during training, which can be interpreted as an indication that our model's generalization performance is ensured.
Fig. 3.
During the model fitting process, the flow of loss is shown over the epochs
The performance results of the final model are shown in Table 1. For each target variable, the results do not vary significantly across the training, validation, and test sets. This indicates that the model has good generalization performance and is not overfitted. To quantify uncertainty, 95% BC bootstrap confidence intervals were calculated for the test set based on 1000 bootstrap resamples (B = 1000). The resulting intervals are relatively narrow, indicating that both the MAE and R² estimates are stable and robust to sampling variation.
Table 1.
Model performance for each target variable (HoC, VP, and Flashpoint)
| Dataset | Metric | Training | Validation | Test |
|---|---|---|---|---|
| HoC | MAE | 0.102 | 0.153 | 0.126 [0.010, 0.167] |
| HoC | R² | 0.994 | 0.989 | 0.993 [0.985, 0.997] |
| VP | MAE | 0.401 | 0.643 | 0.617 [0.549, 0.704] |
| VP | R² | 0.963 | 0.886 | 0.898 [0.860, 0.926] |
| Flashpoint | MAE | 10.31 | 14.50 | 14.42 [13.17, 15.77] |
| Flashpoint | R² | 0.921 | 0.846 | 0.839 [0.790, 0.879] |
MAE represents the mean absolute error, indicating the average difference between predicted and true values. R² represents the coefficient of determination, indicating the proportion of variance in the observed data explained by the model. Values in brackets denote 95% BC bootstrap confidence intervals for the test set
Figure 4 visualizes the agreement between the true values and the values predicted by the fitted model. The closer the points lie to the diagonal y = x line, the better the model's fit. Focusing on the red points, which represent predictions on the test data, we can see that HoC, VP, and Flashpoint are all predicted well; the HoC points in particular lie almost exactly on the line. Among the three properties, Flashpoint prediction showed relatively lower accuracy. This may be partly because Flashpoint measurements are more sensitive to experimental conditions such as pressure or contamination, which increases the noise level in the dataset.
Fig. 4.

The left panels show the similarity between predicted and true values. The right panels show boxplots of the absolute differences between predicted and true values for the test data. In the boxplots, the lines extending beyond the boxes indicate the range of the data, and the points above these lines are identified as outliers
In Fig. 4b, a linear cluster of points in the upper-right region corresponds to compounds that were consistently overpredicted in vapor pressure. These compounds have lower molecular weights, a higher fraction of carbons, and fewer hydrogen-bond acceptors, with a lower occurrence of halogen or polar functional groups. Such small and nonpolar molecules are structurally simple, providing limited variation for the model to learn from. As a result, the model tends to assign similar predictions, producing the observed linear overprediction. For more detailed analyses, please refer to the supplementary materials, Section A.2.
To quantify prediction error on the test data, we took the absolute difference between predicted and true values; boxplots of these absolute differences are displayed on the right side of each graph. In statistics, the third quartile minus the first quartile is called the interquartile range (IQR), and an observation exceeding the third quartile plus 1.5 times the IQR is considered an outlier. The numbers of observations detected as outliers in the test data are 31 for HoC, 14 for VP, and 121 for Flashpoint.
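The boxplot rule used to count these outliers is straightforward to reproduce; a minimal sketch over a toy error array:

```python
import numpy as np

def iqr_outliers(abs_errors):
    """Flag absolute errors above Q3 + 1.5 * IQR, the boxplot rule
    used to count outliers in the test data."""
    abs_errors = np.asarray(abs_errors)
    q1, q3 = np.percentile(abs_errors, [25, 75])
    upper = q3 + 1.5 * (q3 - q1)
    return abs_errors > upper

errs = np.array([0.1, 0.2, 0.15, 0.12, 0.18, 5.0])
print(iqr_outliers(errs))  # only the 5.0 error is flagged
```

Applying this function to the per-compound absolute errors of each model yields the outlier counts quoted above.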
As shown in Table 1, our integrated system achieves a test-set MAE of 0.126 kJ/mol for HoC, 0.617 log units for VP, and 14.42 °C for Flashpoint. These results demonstrate better performance than the models proposed in previous studies, as summarized in Table 2. With an R2 of 0.99 for HoC, our model shows comparable or improved accuracy relative to previous studies. For VP, Lin et al. [6] evaluated model performance using AARD; applying the same metric to our model over the overall dataset yields an AARD of 0.5636, compared with their reported 0.617, indicating superior VP prediction. For Flashpoint prediction, our MAE improved by 20.8% and 16.6% over the GCNN and MPNN results of Sun et al. [3], and by 28.6% over Xu et al. [4]. Although Table 2 summarizes results from different datasets, it is intended to provide overall context rather than direct one-to-one comparisons.
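For reference, AARD is assumed here to take its usual form, the mean absolute relative deviation from the true values; a minimal sketch:

```python
import numpy as np

def aard(y_true, y_pred):
    """Average absolute relative deviation, the metric used by Lin et al.
    for vapor-pressure models (assumed form: mean of |pred - true| / |true|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

print(aard([2.0, 4.0], [1.0, 5.0]))  # 0.375
```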
Table 2.
The performance of our model on the test set was compared with that of existing models from previous studies for each dataset
More importantly, our system demonstrates robust performance across a diverse range of chemical structures, including those outside the training set, suggesting strong generalization capabilities.
As summarized in Table 3, the HoC model exhibited only a slight increase in MAE (ΔMAE = 0.086), indicating robust generalization across the training chemical space. In contrast, the VP model showed a larger performance gap (ΔMAE = 0.493), reflecting its more structurally diverse and volatile compound set. The Flashpoint model displayed the largest absolute MAE difference (ΔMAE = 21.80), mainly because Flashpoint values span a much wider temperature range than the other properties, rather than reflecting any model instability. Overall, these results demonstrate that all models retain reliable performance within their applicability domain while showing predictable trends for out-of-domain samples.
Table 3.
Distance-based applicability domain (AD) analysis results
| Property | Threshold | Inside-AD MAE | Outside-AD MAE | ΔMAE |
|---|---|---|---|---|
| HoC | 0.450 | 0.120 | 0.206 | 0.086 |
| VP | 0.355 | 0.591 | 1.084 | 0.493 |
| Flashpoint | 0.429 | 13.16 | 34.96 | 21.80 |
Thresholds correspond to the 5th percentile of the test–train maximum Tanimoto similarity distribution for each property. Compounds with maximum similarity below the threshold were considered outside the domain. MAE values were evaluated separately for the inside- and outside-domain subsets
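The distance-based AD check can be sketched as follows. The toy on-bit sets below stand in for the Morgan fingerprints used in the paper (which would come from RDKit); the logic — maximum Tanimoto similarity to the training set, thresholded at its 5th percentile — matches the description above:

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def ad_flags(train_fps, test_fps, pct=5):
    """For each test molecule, take the maximum Tanimoto similarity to the
    training set; the AD threshold is the pct-th percentile of that
    distribution, and compounds below it are flagged outside the domain."""
    max_sims = np.array([max(tanimoto(t, tr) for tr in train_fps)
                         for t in test_fps])
    threshold = np.percentile(max_sims, pct)
    return threshold, max_sims >= threshold

# Toy on-bit sets standing in for Morgan fingerprints.
train_fps = [{1, 2, 3}, {4, 5}]
test_fps = [{1, 2, 3}, {1, 2}, {9}]
threshold, inside = ad_flags(train_fps, test_fps)
```

In the real pipeline the fingerprints are Morgan fingerprints (radius = 2), and the inside/outside flags partition the test set for the per-domain MAE values in Table 3.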
Using t-SNE projections of RDKit-based molecular fingerprints, we visualized the chemical space coverage for the HoC, VP, and Flashpoint models (Fig. 5). The HoC model shows that most test molecules, including those labeled as out-of-domain (red), remain close to dense regions of the training set, consistent with its small ΔMAE in Table 3. In contrast, the VP dataset exhibits a more scattered distribution of out-of-domain points across sparsely populated areas, reflecting its higher ΔMAE and greater chemical diversity. The Flashpoint data show broad but overlapping train–test coverage, indicating that most predictions were made within a well-sampled chemical space. Together, these visualizations support that model reliability strongly correlates with the degree of chemical space overlap between training and test sets.
Fig. 5.
t-SNE visualization of the chemical space coverage for the HoC, VP, and Flashpoint datasets. Each point represents a molecule based on RDKit Morgan fingerprints (radius = 2). Training data are shown in gray, in-domain test compounds (similarity above threshold) in blue, and out-of-domain compounds (below threshold) in red. Most test molecules fall within dense training regions, confirming adequate chemical space coverage
While our integrated model demonstrated strong performance across all three physicochemical properties, we observed that flashpoint prediction exhibited slightly lower accuracy (R2 = 0.839) compared to heat of combustion (R2 = 0.993) and vapor pressure (R2 = 0.898). This difference can be attributed to two interrelated factors. First, flashpoint is inherently more sensitive to experimental conditions than the other two properties. Its measured value can vary significantly depending on the ambient temperature and pressure, the physical state and purity of the substance, and the specific experimental method used (e.g., open-cup vs. closed-cup apparatus). As highlighted in previous literature, inter-laboratory variability for flashpoint can exceed ± 10 °C even for well-characterized substances [1, 2], introducing unavoidable label noise even in curated datasets. Second, flashpoint is an emergent property that reflects not only volatility, but also phase behavior and combustion kinetics. It depends on the ability of a compound to generate sufficient vapor pressure under ambient conditions to form an ignitable mixture with air. This complexity makes flashpoint inherently more difficult to model from structure alone.
Many previous studies have combined different methodologies to enhance predictive power. While this improves accuracy, it also requires extensive input data and prior information, raising concerns about model parsimony. Moreover, while three separate GNN models were trained individually for HoC, VP, and Flashpoint, these models share a common architecture and are deployed together within a unified prediction framework, offering a streamlined platform for integrated chemical safety assessment. By leveraging a sufficient dataset, our model demonstrates strong performance across all predictions, achieving reliable results with the most parsimonious approach for simultaneous predictions.
In this paper, we present the development, validation, and application of our integrated prediction system. We demonstrate its superior performance compared to existing models and discuss its potential impact on enhancing chemical safety protocols and emergency response strategies. Furthermore, we explore the broader implications of this technology for regulatory frameworks, industrial safety practices, and the future of chemical risk assessment in an era of rapid innovation and expanding chemical diversity. We also address the limitations and ethical considerations of relying on AI-driven predictions for safety–critical applications. While our system represents a significant step forward, we emphasize the importance of using these predictions as a complement to, rather than a replacement for, experimental data and expert judgment. As the chemical landscape continues to evolve at an unprecedented pace, tools like our integrated prediction system will become increasingly vital. By providing rapid, reliable estimates of critical physicochemical properties, we aim to contribute significantly to the global effort to ensure safer handling, transportation, and use of chemical substances, ultimately reducing the risk and impact of chemical accidents in industrial and research settings alike.
While the quantitative comparison of predictive performance with prior studies provides useful benchmarks, it should be noted that the datasets used in those works differ in chemical coverage and modeling scope. For instance, Xu et al. [9] used a flashpoint dataset mainly composed of hydrocarbons and small organic compounds, whereas our dataset includes a wider range of structural classes—including heterocycles, halogenated compounds, and DFT-augmented cyclic molecules. Thus, performance metrics are not directly comparable but rather illustrate the robustness of our integrated model across a structurally diverse chemical domain.
GNNs for HoC excluding cyclic compounds
As an additional analysis beyond our integrated ML model, we examined the 12 compounds within the red box in the HoC panel (a) of Fig. 4 and found that all were cycloalkanes, as shown in Figure S2 of the supplementary materials. We then re-fitted the GNN model on the HoC data excluding these 12 cyclic compounds, while employing a Random Forest regression model specifically for the cyclic compounds. This approach aimed to enhance predictive accuracy, and the results are presented below. After removing the chemicals within the red box, we used the remaining data points to fit a GNN model with an added global state. The results of this model are presented in Fig. 6 and Table 4. As shown in Fig. 6, which compares the true values with those predicted by the model, the predictions align well with the actual values, indicating the model's accuracy. A comparison between Table 4 and Table 1 reveals a slight increase in MAE; however, given the change in dataset, performance remains consistent and stable. Additionally, the coefficient of determination (R2), representing the explanatory power of the model, is comparable to or improved over that of the previous HoC prediction model, supporting its significance.
Table 4.
The performance of the model fitted without the compounds in the red box of Fig. 4, reported as MAE and R2
| Metric | Training | Validation | Test |
|---|---|---|---|
| MAE | 0.174 | 0.172 | 0.174 |
| R2 | 0.991 | 0.994 | 0.993 |
Dataset characterization & chemical-space coverage
Distributions of HoC, the ring-atom fraction, ring size, and physicochemical descriptors are summarized in Fig. 7 (with extended histograms in the supplementary materials, Figure S2). We computed ECFP4 fingerprints (radius = 2, 1024 bits) for each molecule and projected them with PCA to visualize chemical-space coverage (Fig. 7c). Dataset redundancy was assessed from a histogram of random-pair Tanimoto similarities (~50 k pairs; self-pairs excluded), where very high similarity (≥ 0.95) is rare (Fig. 7d). Aggregate atom counts derived from formulas summarize the element types represented (Fig. 7a). These diagnostics provide the context for the cyclic failure analysis and the specialized RF described below.
Fig. 7.
Dataset characterization and chemical-space coverage. a Aggregate atom counts (element types) derived from formulas. b Distribution of the ring-atom fraction, quantifying cyclic versus linear/branched motifs. c PCA map of ECFP4 fingerprints (radius = 2, 1024 bits), indicating broad chemical-space coverage without extreme clustering. d Histogram of pairwise Tanimoto similarities for random pairs (~50 k), showing limited near-duplicate content (very high similarity ≥ 0.95 is rare)
Cyclic subset failure mode
Cyclic compounds show a flattened prediction pattern with values clustered near a local mean. We attribute this to three factors: (i) under-representation of ring-dominant structures (see Fig. 7a), which encourages mean-shrinkage under MSE training; (ii) representation bias—message passing trained mostly on linear/branched scaffolds under-differentiates subtle ring-strain/topology cues; and (iii) edge-of-domain conditions in fingerprint space, where some cyclic queries lie at low density/low similarity. To mitigate this without retraining the base GNN, we introduce a small, descriptor-based Random Forest specialization for ring-dominant molecules.
The HoC random forest model for cyclic compounds
We chose Random Forest over GNN for the cyclic compound subset due to the small sample size. GNN models have complex architectures and a large number of parameters, which can easily lead to overfitting when the available data is limited. In addition, if the molecular graphs are sparse or have irregular structures, training can become unstable. On the other hand, Random Forest is a non-parametric, ensemble-based method that is more stable and less sensitive to small data sizes. It tends to have lower variance and better generalization in such cases, which helps reduce overfitting. For these reasons, we applied Random Forest to the cyclic compounds instead of using a GNN model.
To build a more robust model for this subset, we expanded the data using DFT calculations. Using DFT as described in the Method section, the original dataset of 12 cyclic compounds was expanded to 55 data points. This expansion was achieved by generating data for structurally similar cyclic compounds through quantum calculations. Subsequently, features such as MolWt, TPSA, HBA, HBD, Kappa, Bertz_ct, chi1, Balaban, and Os were computed for each compound. MolWt represents the molecular weight of a compound, calculated as the sum of its atomic weights. TPSA measures the polar surface area based on molecular topology, reflecting the molecule's polarity. HBA and HBD represent the counts of hydrogen bond acceptors and donors, respectively, which influence molecular interactions. Kappa indices quantify molecular shape and flexibility based on the connectivity of atoms in a molecule. Bertz_ct refers to the Bertz Complexity Index, which measures the structural complexity of a molecule using graph-theoretic properties. chi1, the first-order molecular connectivity index, is a topological descriptor derived from atomic connectivity, while the Balaban J index quantifies molecular structure based on the shortest paths in the molecular graph. Lastly, Os indicates the total count of oxygen atoms in a molecule. Together, these descriptors provide a comprehensive perspective on the molecule's structure and properties.
Based on these features, a Random Forest model was designed to predict the HoC. The dataset was divided into training and test sets, and the process of hyperparameter tuning and selecting the optimal model through cross-validation was conducted similarly to the previously described method. The commonly used hyperparameters in random forest models are as follows. First, the number of trees was set to values in [50,100,200], balancing performance and computational cost. Additionally, the maximum depth of each tree balances the model's ability to capture complex patterns while avoiding overfitting. We set it to [None,5,10], where None allowed unlimited depth. The minimum number of samples required to split a node was tested with values of 2 and 5, while the minimum number of samples in a leaf node was set to 1 and 2. Using GridSearchCV from Python's sklearn library, the optimal combination of these hyperparameters was found to be 50 trees, a maximum depth of 10, a minimum split size of 2, and a minimum leaf size of 1.
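The grid search described above can be sketched with scikit-learn. The synthetic regression data below merely stands in for the 55-compound descriptor table (9 descriptors per molecule); the parameter grid mirrors the search space given in the text:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the 55 x 9 cyclic-compound descriptor matrix.
X, y = make_regression(n_samples=55, n_features=9, noise=0.1, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Cross-validated search over all 36 hyperparameter combinations.
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_)
```

On the real descriptor table this procedure selected 50 trees, a maximum depth of 10, a minimum split size of 2, and a minimum leaf size of 1.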
At inference time, we route ring-dominant molecules, defined by a threshold on the ring-atom fraction, to the cyclic RF; all others use the base model. The threshold was chosen on the validation split to improve MAE without over-routing. Evaluation is based on experimental test labels, and AD flags accompany all predictions (cf. Fig. 7).
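The routing rule itself is a one-line dispatch; the 0.5 threshold below is illustrative only (the actual value is tuned on the validation split, as noted above):

```python
def route_model(ring_atom_fraction, threshold=0.5):
    """Inference-time routing sketch: ring-dominant molecules go to the
    specialized cyclic Random Forest, all others to the base GNN.
    The default threshold is a placeholder, not the tuned value."""
    return "cyclic_rf" if ring_atom_fraction >= threshold else "base_gnn"

print(route_model(0.85), route_model(0.10))  # cyclic_rf base_gnn
```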
The results of the Random Forest model are summarized in Fig. 8 and Table 5. Figure 8a shows a parity plot where, as in the previously discussed graphs, the closer the data points are to the diagonal y = x line, the better the model's performance. While most points for the training set lie close to the line, some deviations are observed in the test set, resulting in the difference in MAE between the training and test sets shown in Table 5. This discrepancy could be attributed to the small size of the test set, where extreme deviations between predicted and true values have a disproportionate impact on the overall results. As indicated in Table 5, the model nevertheless demonstrates high explanatory power (R2 = 0.918 on the test set), ensuring reliable predictive performance for HoC.
Fig. 8.

a This plot visualizes the comparison between the predicted values and true values of the Random Forest model for cyclic compounds. b This plot displays the top five most important features derived from the Random Forest model in (a), highlighting the key factors influencing the prediction of HoC for cyclic compounds
Table 5.
Results of the Random Forest model on the training and test sets of cyclic compounds
| Metric | Training | Test |
|---|---|---|
| MAE | 0.369 | 1.149 |
| R2 | 0.976 | 0.918 |
In the feature importance plot of Fig. 8b, HBA emerged as the most critical feature for model prediction. This is likely because HBA significantly influences molecular interactions and stability. During combustion, the presence of hydrogen bond acceptors affects the molecule’s ability to form hydrogen bonds, which can, in turn, impact the way energy is released. Thus, HBA may serve as a key determinant of the total energy released during combustion, making it a vital factor in predicting the heat of combustion for these compounds.
Additionally, given that the dataset primarily consists of cyclic compounds, the structural characteristics of these compounds could enhance the importance of HBA relative to other features. This structural dependency likely makes HBA an even more influential feature for accurately modeling the heat of combustion in this specific dataset.
Real-time prediction system for unknown chemical substances
Building upon the three prediction models discussed earlier for Heat of Combustion (HoC), Vapor Pressure (VP), and Flashpoint, we have developed a real-time prediction system for potential chemical accident substances. This system leverages Flask to create an interactive web application, allowing users to input chemical structures and receive instant predictions for these three critical properties. The prediction system incorporates several key features to enhance its utility and accessibility. Users can input chemical structures either as SMILES notation or by drawing the molecular structure directly in the interface. Upon input, the system instantly generates predictions for HoC, VP, and Flashpoint using our pre-trained Graph Neural Network (GNN) models (Fig. 9). To enhance the system's utility, we have included a feature that allows users to input experimental data, which enables the system to check whether the queried substance was part of the original training dataset.
Fig. 9.
Conceptual diagram for real-time prediction system on HoC, VP, Flashpoint
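A minimal Flask sketch of such a service is shown below. The `/predict` route and the `predict_properties` stub are illustrative assumptions, not the deployed application's actual endpoints; in the real system the stub would be replaced by calls to the pre-trained GNN models:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_properties(smiles):
    # Stub standing in for the pre-trained GNN models;
    # returns fixed dummy values for illustration only.
    return {"HoC": -1.23, "VP": 0.45, "Flashpoint": 60.0}

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    smiles = payload.get("smiles", "")
    if not smiles:
        return jsonify({"error": "missing SMILES"}), 400
    return jsonify({"smiles": smiles,
                    "predictions": predict_properties(smiles)})
```

A client would POST `{"smiles": "CCO"}` and receive the three predicted properties as JSON, which the web front end then renders alongside any available experimental values.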
If the substance is found in the training data, the system provides a side-by-side comparison of the experimental data and the model's predictions, helping users assess the model's accuracy for known substances. Furthermore, to provide context and improve the system's practical applicability in chemical accident scenarios, we have incorporated a comparison feature with key reference chemicals: acetone, n-hexane, and n-decane. Users can easily compare their queried substance's properties against these common industrial chemicals, offering valuable insights into relative risks and behaviors (https://terror-n.run.goorm.site/).
This real-time prediction system offers several practical benefits in both research and emergency response scenarios. In the event of a chemical accident involving an unknown substance, the system can quickly provide estimates of critical properties, aiding in immediate response planning. For substances with incomplete experimental data, our system can fill in crucial gaps, especially for the three key properties: Vapor Pressure, Heat of Combustion, and Flashpoint. The ability to benchmark against common industrial chemicals allows responders to quickly gauge the relative risks associated with an unknown substance. Moreover, the system's capability to compare predictions with experimental data (when available) allows for ongoing assessment and refinement of our prediction models. This feature not only aids in model validation but also contributes to the continuous improvement of the system's accuracy and reliability.
The development of this real-time prediction system represents a significant step forward in our ability to rapidly assess and respond to potential chemical accidents involving unknown substances. By leveraging our GNN models and providing an intuitive, interactive interface, we have created a tool that can enhance safety measures and response strategies in the chemical industry. The system's ability to predict missing data for crucial properties such as Vapor Pressure, Heat of Combustion, and Flashpoint addresses a critical need in emergency response scenarios, where quick and accurate information can be lifesaving. The dataset covers a broad range of chemical classes, including aliphatic and aromatic hydrocarbons, alcohols, acids, esters, amines, and various cyclic systems. However, it remains incomplete: compound types such as organometallics and larger biomolecules (e.g., peptides) are underrepresented, which may reduce predictive reliability for such substances. Future work could address this limitation by expanding the training data to better represent these chemical classes and enlarge the model's applicability domain.
Conclusions
We have developed an integrated Graph Neural Network approach for predicting three crucial chemical safety properties: Heat of Combustion (HoC), Vapor Pressure (VP), and Flashpoint. Our model achieves strong performance, with MAE values of 0.126 kJ/mol (R2 = 0.993) for HoC, 0.617 log units (R2 = 0.898) for VP, and 14.42 °C (R2 = 0.839) for Flashpoint, representing significant improvements over existing prediction methods. The incorporation of global state features within the GNN architecture proved particularly effective, enhancing model performance across all three properties while maintaining computational efficiency.
A significant contribution of our work lies in the systematic identification and resolution of prediction challenges for cyclic compounds. By leveraging Density Functional Theory calculations with the M06-2X method and the 6-311++G(d,p) basis set, we expanded our cyclic compound dataset from 12 to 55 compounds. This theoretical data augmentation, combined with a specialized Random Forest model incorporating molecular descriptors, successfully addressed the cyclic compound prediction challenge, achieving an R2 of 0.918. The identification of hydrogen bond acceptor (HBA) count as the primary predictor offers valuable insights into structure–property relationships for cyclic systems.
The real-time prediction platform we developed demonstrates the practical applicability of our approach. Our system uniquely combines SMILES notation input with direct structural drawing capabilities and provides benchmarking against common industrial reference compounds (acetone, n-hexane, and n-decane). This integration of theoretical advancement with practical utility addresses a critical need in emergency response scenarios, where rapid and reliable property prediction is essential.
While acknowledging certain limitations, such as the computational intensity of DFT calculations, varying prediction accuracy for novel structures, and the relatively lower performance for Flashpoint prediction, our approach represents a significant advancement in the field. These limitations provide direction for future research, including extending our methodology to other challenging chemical classes and exploring more efficient ways to generate theoretical data for model enhancement.
Looking forward, our successful approach to handling cyclic compounds through quantum chemistry calculations and specialized modeling provides a template for addressing other challenging chemical classes. Future research should focus on extending this methodology to other outlier groups, improving prediction accuracy for complex molecular structures, and exploring the integration of our system with existing chemical management protocols. Additionally, further investigation of structure–property relationships revealed through our models could guide the development of safer chemicals.
This work represents a significant step forward in chemical safety informatics, offering not only improved prediction accuracy but also a practical framework for real-world applications. The combination of high predictive performance, theoretical rigor in handling challenging compounds, and practical accessibility through a real-time platform positions this work as a valuable contribution to both industrial safety practices and regulatory frameworks.
Supplementary Information
Acknowledgements
This research was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (Ministry of Science and ICT) (RS-2025-00519428 and RS-2023-00225125). This work was also supported by the Korea Health Industry Development Institute (KHIDI) (RS-2023-KH136924) and the National Institute of Chemical Safety (NICS) of the Ministry of Environment (MOE) of the Republic of Korea.
Author contributions
Seul Lee and Jooyeon Lee contributed equally to this work as co-first authors, performing the majority of the research, data analysis, and manuscript writing. Seul Lee and Jooyeon Lee led the development and optimization of the Graph Neural Network models for predicting Heat of Combustion, Vapor Pressure, and Flashpoint properties. Unghwi Yoon was responsible for the Density Functional Theory calculations and the specialized Random Forest modeling approach for cyclic compounds. Jahyun Koo, Young Wook Yoon, Yoonjae Cho, and Seung-Ryul Hwang contributed to experimental design, model validation, and provided critical discussion throughout the research process. Keunhong Jeong, as the corresponding author, conceptualized the project, supervised the research activities, provided guidance on the methodology, and contributed significantly to manuscript preparation and revision. All authors reviewed and approved the final version of the manuscript.
Data availability
The source code and the database generated for our analysis are available via the following github link: [https://github.com/doas1min/Chemical-Safety-Prediction-I]. Real time prediction web is available via the following link: [https://terror-n.run.goorm.site/].
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Seul Lee and Jooyeon Lee have contributed equally to this work.
References
- 1.United States Environmental Protection Agency (2024) Studies & Cost Estimates. https://www.epa.gov/system/files/documents/2024-05/studies-cost-estimates-2024.pdf
- 2.Devito M, Farrell P, Hagiwara S, Harrill A, Krewski D, Paoli G, Thomas R (2024) Value of information case study: human health and economic trade-offs associated with the timeliness, uncertainty, and costs of the draft EPA transcriptomic assessment product (ETAP). U.S. Environmental Protection Agency: Washington, DC. 10.23645/epacomptox.26093572 [PubMed]
- 3.Nicol AM, Hurrell AC, Wahyuni D, McDowall W, Chu W (2008) Accuracy, comprehensibility, and use of material safety data sheets: a review. Am J Ind Med 51(11):861–876. 10.1002/ajim.20613 [DOI] [PubMed] [Google Scholar]
- 4.Commission Regulation (EU) 2020/878 of 18 June 2020 amending Annex II to Regulation (EC) No 1907/2006 of the European Parliament and of the Council concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH)
- 5.U.S. Chemical Safety Board. (2016) West Fertilizer Explosion and Fire. https://www.csb.gov/west-fertilizer-explosion-and-fire-/. Accessed 3 Feb 2025
- 6.Yalamanchi KK, van Oudenhoven VCO, Tutino F, Monge-Palacios M, Alshehri A, Gao X, Sarathy SM (2019) Predicting standard enthalpy of formation of hydrocarbons using machine learning. J Phys Chem A 123(38):8305–8313. 10.1021/acs.jpca.9b04771 [DOI] [PubMed] [Google Scholar]
- 7.Üstün CE, Karaoğlan ME, Yıldız AA, Yılmaz A (2025) Machine learning applications for predicting fuel ignition and flame properties: current status and future perspectives. Energy Fuels 39(28):13281–13314. 10.1021/acs.energyfuels.4c04517
- 8.Lin K, Liao B, Chen X, Chen C, Lei W, Zhou X (2025) Interpretable machine learning for predicting key hazardous properties of chemicals. Waste Manag 207:115111. 10.1016/j.wasman.2025.115111
- 9.Jeong K, Nam JH, Lee S, Koo J, Lee J, Yu D, Jo S, Kim J (2024) Prediction of flash point of materials using Bayesian kernel machine regression based on Gaussian processes with LASSO-like spike-and-slab hyperprior. J Chemom 38(5):e3586. 10.1002/cem.3586
- 10.Abdul Jameel A, Al-Muslem A, Ahmad N, Alquaity A, Zahid U, Ahmed U (2022) Predicting enthalpy of combustion using machine learning. Processes 10(11):2384. 10.3390/pr10112384
- 11.Cao HY, Jiang JC, Pan Y, Wang R, Cui Y (2009) Prediction of the net heat of combustion of organic compounds based on atom-type electrotopological state indices. J Loss Prev Process Ind 22(2):222–227
- 12.Sun X, Krakauer NJ, Politowicz A, Chen W-T, Li Q, Li Z, Shao X, Sunaryo A, Shen M, Wang J, Morgan D (2020) Assessing graph-based deep learning models for predicting flash point. Mol Inf 39:1900101
- 13.Xu Y, Huang X, Li C, Wei Z, Wang M (2022) Predicting structure-dependent properties directly from the three-dimensional molecular images. AIChE J 68:8
- 14.Mirshahvalad HR, Ghasemiasl R, Raufi N, Dirin M (2020) A neural networks model for accurate prediction of the flash point of chemical compounds. Iran J Chem Chem Eng 39(4):297–304
- 15.Chen Z, Vom Lehn F, Pitsch H, Cai L (2025) Design of novel high-performance fuels with artificial intelligence: case study for spark-ignition engine applications. Appl Energ Combust Sci 23:100341. 10.1016/j.jaecs.2025.100341
- 16.Lin YH, Liang HH, Lin ST, Li YP (2024) Advancing vapor pressure prediction: a machine learning approach with directed message passing neural networks. J Taiwan Inst Chem Eng. 10.1016/j.jtice.2024.105926
- 17.Santana VV, Rebello CM, Queiroz LP, Ribeiro AM, Shardt N, Nogueira IBR (2024) PUFFIN: a path-unifying feed-forward interfaced network for vapor pressure prediction. Chem Eng Sci 286:119623
- 18.Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. Int Conf Mach Learn 70:1263–1272
- 19.Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2017) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513–530
- 20.Becke AD (1988) Density-functional exchange-energy approximation with correct asymptotic behavior. Phys Rev A 38(6):3098–3100. 10.1103/physreva.38.3098
- 21.Lee C, Yang W, Parr RG (1988) Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys Rev B Condens Matter 37(2):785–789. 10.1103/physrevb.37.785
- 22.Zhao Y, Truhlar DG (2008) The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor Chem Acc 120(1–3):215–241. 10.1007/s00214-007-0310-x
- 23.Perdew JP, Burke K, Ernzerhof M (1996) Generalized gradient approximation made simple. Phys Rev Lett 77(18):3865–3868. 10.1103/PhysRevLett.77.3865
- 24.Frisch MJ, Pople JA, Binkley JS (1984) Self-consistent molecular orbital methods 25. Supplementary functions for Gaussian basis sets. J Chem Phys 80(7):3265–3269. 10.1063/1.447079
- 25.Kendall RA, Dunning TH, Harrison RJ (1992) Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions. J Chem Phys 96(9):6796–6806. 10.1063/1.462569
- 26.Woon DE, Dunning TH (1993) Gaussian basis sets for use in correlated molecular calculations. III. The atoms aluminum through argon. J Chem Phys 98(2):1358–1371. 10.1063/1.464303
- 27.Peterson KA, Woon DE, Dunning TH (1994) Benchmark calculations with correlated molecular wave functions. IV. The classical barrier height of the H+H2→H2+H reaction. J Chem Phys 100(10):7410–7415. 10.1063/1.466884
- 28.Wilson AK, Van Mourik T, Dunning TH (1996) Gaussian basis sets for use in correlated molecular calculations. VI. Sextuple zeta correlation consistent basis sets for boron through neon. J Mol Struct (THEOCHEM) 388. http://www.emsl.pnl.gov:2080/forms/basisform.html
- 29.Frisch MJ et al (2016) Gaussian 16. Gaussian Inc, Wallingford CT
- 30.Dennington R et al (2016) GaussView 6. Semichem Inc, Shawnee Mission
- 31.Lu T, Chen F (2012) Multiwfn: a multifunctional wavefunction analyzer. J Comput Chem 33(5):580–592. 10.1002/jcc.22885
- 32.Lu T (2024) A comprehensive electron wavefunction analysis toolbox for chemists, Multiwfn. J Chem Phys. 10.1063/5.0216272
- 33.Murray JS, Politzer P (1994) Quantitative treatments of solute/solvent interactions. Elsevier, Amsterdam
- 34.Politzer P, Murray JS, Edward Grice ME, Desalvo M, Miller E (1997) Calculation of heats of sublimation and solid phase heats of formation. Mol Phys 91(5):923–928. 10.1080/002689797171030
- 35.Politzer P, Ma Y, Lane P, Concha MC (2005) Computational prediction of standard gas, liquid, and solid-phase heats of formation and heats of vaporization and sublimation. Int J Quantum Chem 105:341–347. 10.1002/qua.20709
- 36.Mallard WG, Linstrom PJ. NIST chemistry webbook, NIST standard reference database No. 69. http://webbook.nist.gov
- 37.Efron B (1981) Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68(3):589–599. 10.2307/2335441
- 38.Liu J, Gong S, Li H, Liu G (2022) Molecular graph-based deep learning method for predicting multiple physical properties of alternative fuel components. Fuel 313:122712. 10.1016/j.fuel.2021.122712
- 39.Saldana DA, Starck L, Mougin P, Rousseau B, Pidol L, Jeuland N, Creton B (2011) Flash point and cetane number predictions for fuel compounds using quantitative structure property relationship (QSPR) methods. Energy Fuels 25:3900–3908. 10.1021/ef200795j
- 40.Tarjomannejad A (2015) Prediction of the liquid vapor pressure using the artificial neural network-group contribution method. Iran J Chem Chem Eng 34(4):97–111
- 41.Hyttinen N, Li L, Hallquist M, Wu C (2024) Machine learning model to predict saturation vapor pressures of atmospheric aerosol constituents. ACS ES&T Air 1(9):1156–1163
- 42.Liang C, Gallagher DA (1998) QSPR prediction of vapor pressure from solely theoretically-derived descriptors. J Chem Inf Comput Sci 38(2):321–324
- 43.Reiser P, Neubert M, Eberhard A et al (2022) Graph neural networks for materials science and chemistry. Commun Mater 3:93
- 44.Jiang D, Wu Z, Hsieh CY et al (2021) Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminform 13:12
Data Availability Statement
The source code and the database generated for our analysis are available via the following GitHub link: [https://github.com/doas1min/Chemical-Safety-Prediction-I]. The real-time prediction web application is available via the following link: [https://terror-n.run.goorm.site/].