Abstract
Photodynamic therapy (PDT) is a noninvasive clinical treatment for cancers that uses photosensitizers and light. While most research has focused on organic photosensitizers such as porphyrins, there is emerging interest in transition metal complexes (TMCs). Because photosensitizer synthesis and the subsequent performance testing are time- and resource-consuming, presynthetic screening of photosensitizer properties is critical. In this work, a hybrid mechanistic and data-driven model is proposed for the quantitative structure–property relationship (QSPR) of photosensitizers: important excited-state quantum chemistry descriptors (e.g., excitation energy) are first calculated with density functional theory (DFT), and these descriptors, together with other molecular descriptors, are used to build single and hybrid machine learning (ML) models that predict the singlet oxygen quantum yield of hexacoordinate TMC photosensitizers (Ru-, Ir-, and Re-complexes). Among the single-ML models, the support vector regression and kernel ridge regression models provide good predictions on the test set (R² > 0.9) and the external test set (R² > 0.7), while the delta-learning model and the Mixture-of-Experts model further improve the generalization ability (R² up to 0.87 on the external test set) and show strong universality. SHAP analysis further confirms that the choice of mechanistic descriptors in the QSPR model is reasonable. To our knowledge, this constitutes the first integrated DFT-ML framework specifically designed for the unique challenges of small data sets in TMC photosensitizer research.


1. Introduction
Cancer is one of the major threats to human health. The number of global cancer cases is expected to grow to 28.4 million in 2040, a 47% increase over 2020. At present, the main therapies for cancer include surgery, chemotherapy, radiotherapy, gene therapy, photodynamic therapy (PDT), and photothermal therapy (PTT). Among these, PDT is considered effective for superficial cancerous tissues because of its low toxicity, lack of drug resistance, and mild adverse reactions.
In PDT, a light source activates nontoxic or minimally toxic photosensitizers to produce cytotoxic reactive oxygen species (ROS), thereby inducing apoptosis and necrosis of cells at the tumor site. As shown in Figure 1, under irradiation with light of an appropriate wavelength, the photosensitizer (PS) is excited to a singlet state and then quickly converted to a triplet state through intersystem crossing (ISC). The triplet-state PS then reacts photodynamically with the substrate to produce ROS. This photodynamic process is divided into two types: type I and type II. In a type I photochemical reaction, the triplet-state PS reacts with nearby substrates through electron transfer to form radical cations or radical anions, which further react with oxygen-containing substrates (such as water and oxygen) to produce ROS (such as superoxide anions and hydroxyl radicals). In a type II photochemical reaction, the triplet-state PS directly transfers energy to oxygen to form highly reactive singlet oxygen (¹O₂), as shown in Figure 2. The PS is therefore the core element of PDT, and its photophysical and chemical properties determine the therapeutic effect. There is now emerging interest in extending the use to transition metal complexes (TMCs), which can display intense absorptions in the visible region; many also possess high two-photon absorption cross sections, enabling two-photon excitation with NIR light. Transition metal complexes have therefore become promising PS candidates.
Figure 1. Jablonski energy level diagram for PDT.

Figure 2. Schematic illustration of the process of the Ru complex in type II PDT.
The most studied transition metal in PDT has been Ru, whose complexes usually have higher water solubility than porphyrins or phthalocyanines as well as a high ΦΔ, which is essential for a type II PDT PS. For example, a Ru(II) complex named “TLD1433” entered clinical trials in early 2017. A series of water-soluble Ru(II) phthalocyanines with large and stable conjugated π systems has been developed that enables efficient energy and electron transfer processes; a new generation of cyclometalated Ru(II) polypyridyl complexes has been designed and synthesized with absorption maxima around 560 nm and absorption extending up to 700 nm. Ir-complexes are also widely applied as photosensitizers. Organic-modified mesoporous silica nanoparticles containing iridium complexes have been synthesized, which exhibit photophysical properties such as high photoreaction yield and high singlet oxygen quantum yield. Two novel cyclometalated Ir-complexes have also been developed, which have strong emission peaks, long excited-state lifetimes, and high singlet oxygen quantum yields. Studies on rhenium complex photosensitizers are currently very limited, but these complexes also have development potential; for example, a tricarbonyl Re-complex with endoplasmic reticulum-targeting activity has been designed and synthesized, which has strong absorption and a high singlet oxygen quantum yield.
However, photosensitizer synthesis and the subsequent determination of photosensitizing properties are time- and resource-consuming processes. Therefore, preliminary, presynthetic screening of sensitizers for their ability to generate ¹O₂ would be of great value. Recently, mathematical modeling and machine learning (ML) methods have gained popularity and proved to be powerful tools in various areas; ML uses algorithms to learn from data, detect patterns, and make fast and accurate predictions. ML has already been used to predict the properties of organic photosensitizers, including properties related to type I and type II PDT. A quantitative structure–property relationship (QSPR) model has been established for a data set containing 32 porphyrins and metalloporphyrins. A new machine learning method has been developed to efficiently and accurately predict the emission energy and photoluminescent quantum yield. Fifteen single models and three different hybrid models have been proposed to evaluate a data set of 3,066 organic materials and predict photophysical properties (absorption wavelength, emission wavelength, and quantum yield). However, these models are not suitable for TMC photosensitizers because traditional structure descriptors such as SMILES can hardly capture all the information on such PSs, and the lack of data makes it difficult to use deep learning models such as graph neural networks. To the best of our knowledge, no research has reported a property prediction method for TMC photosensitizers for PDT. Thus, it is necessary to develop machine learning models that work with a small data set to predict the properties of TMC photosensitizers, such as the triplet state lifetime τT, the triplet quantum yield ΦT, and the singlet oxygen quantum yield ΦΔ. DFT can elucidate intrinsic mechanisms that cannot be observed by experimental techniques and is widely used in theoretical chemistry. Combining DFT with ML could therefore be an excellent approach for property prediction of TMC photosensitizers.
In this work, we introduce a systematic DFT-ML framework explicitly developed for the small-data regime prevalent in TMC photosensitizer development, which provides a tailored solution for accelerating the discovery of TMC photosensitizers. We study various single- and hybrid-ML models to predict the singlet oxygen quantum yield ΦΔ of TMCs, which is an evaluation index of type II photosensitizers for PDT; additionally, ΦΔ is more important and easier to collect from the literature than the triplet state lifetime and the triplet quantum yield. To characterize the structure and charge transition of TMC photosensitizers under light irradiation during the PDT process, density functional theory (DFT) is used to calculate excited-state properties as quantum chemistry descriptors, which enables low-dimensional ML models suited to the small TMC data sets available in the literature. These models, based on quantum chemistry descriptors and other descriptors, are trained and tested on TMC data sets including Ru-complexes, Ir-complexes, and Re-complexes, because these make up the majority of reported TMCs, are all hexacoordinate, and have similar structural characteristics. The results are compared with the performance of two hybrid-ML models, the delta-learning model (DLM) and the Mixture-of-Experts model (MoE), trained on a data set for a specific metal, to test whether the generalized metal model trained on three hexacoordinate TMCs can replace the specialized metal model trained on TMCs of a given metal center. The subsequent SHAP analysis shows the contribution of each descriptor, such as the excitation energies of the S1 and T1 states, to the predicted ΦΔ, which provides strong interpretability of the proposed model. Based on the modeling results, the proposed DFT-MoE model is found to be the most accurate, which could provide theoretical support for experimental synthesis and screening.
The article is structured as follows. In Section 2, we introduce the calculation methods for the four types of descriptors and the details of the six single-ML models and two hybrid-ML models. In Section 3, we analyze and compare the training results of these models and test the performance of the best models on separate Ru-complex and Ir-complex data sets. In Section 4, we make concluding remarks.
2. Method
In this section, we first present the TMC photosensitizers used in the data set of the machine learning model training process; then, we introduce the calculation method of descriptors; finally, we give the details of the single and hybrid-ML models proposed in this work.
2.1. Data Set Construction and Preprocessing
Our data set consists of 136 data points for TMC photosensitizers collected from different references. From these references we collect the structures, the solvents and irradiation wavelengths used in the ΦΔ measurements, and the corresponding ΦΔ values (data with ΦΔ less than 0.01 and data whose ΦΔ differs excessively solely because of the irradiation wavelength were removed during preprocessing). The property distribution of the data set is shown in Figure 3. Additionally, an external test set of 11 further data points from the literature is used to test the generalization ability of the proposed models. Detailed information on the data set and the external test set is given in Tables S1 and S2.
Figure 3. Distribution of the data set by (A) solvent, (B) metal center, (C) singlet oxygen quantum yield, and (D) irradiation wavelength.
2.2. Descriptor Acquisition
Four kinds of descriptors are used for the TMC photosensitizers in this work. Quantum chemistry descriptors are the most important; they reflect the electron transfer information of the S1 and T1 excited states and their differences, calculated by time-dependent density functional theory (TD-DFT). Molecule structure descriptors capture the effect of molecular size, charge, and structure on the photodynamic property. Metal-centered descriptors capture the effect of the metal center and distinguish the different kinds of TMCs. External condition descriptors describe the effect of external conditions on the PDT process. The influence of the solvent is described both implicitly and explicitly. The implicit method incorporates the solvent effect into the quantum chemistry descriptors through the CPCM and SMD solvent models used in the DFT calculations. The explicit method directly uses the static dielectric constant and the dielectric constant at infinite frequency of the solvent as external condition descriptors.
2.2.1. Quantum Chemistry Descriptors
In the TD-DFT calculation, the excited state wave function is described by a linear combination of single excited configuration functions. Each configuration function has a coefficient w as excited configuration or w′ as deexcited configuration. First, hole distribution ρhole and electron distribution ρele are defined as follows:
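$$\rho^{\mathrm{hole}}(\mathbf{r}) = \rho^{\mathrm{hole}}_{\mathrm{(loc)}}(\mathbf{r}) + \rho^{\mathrm{hole}}_{\mathrm{(cross)}}(\mathbf{r}) \tag{1}$$

$$\rho^{\mathrm{hole}}_{\mathrm{(loc)}}(\mathbf{r}) = \sum_{i \to a} (w_i^a)^2 \varphi_i(\mathbf{r})\varphi_i(\mathbf{r}) - \sum_{i \leftarrow a} (w_i^{\prime a})^2 \varphi_i(\mathbf{r})\varphi_i(\mathbf{r}) \tag{2}$$

$$\rho^{\mathrm{hole}}_{\mathrm{(cross)}}(\mathbf{r}) = \sum_{i \to a}\sum_{j \neq i \to a} w_i^a w_j^a \varphi_i(\mathbf{r})\varphi_j(\mathbf{r}) - \sum_{i \leftarrow a}\sum_{j \neq i \leftarrow a} w_i^{\prime a} w_j^{\prime a} \varphi_i(\mathbf{r})\varphi_j(\mathbf{r}) \tag{3}$$

$$\rho^{\mathrm{ele}}(\mathbf{r}) = \rho^{\mathrm{ele}}_{\mathrm{(loc)}}(\mathbf{r}) + \rho^{\mathrm{ele}}_{\mathrm{(cross)}}(\mathbf{r}) \tag{4}$$

$$\rho^{\mathrm{ele}}_{\mathrm{(loc)}}(\mathbf{r}) = \sum_{i \to a} (w_i^a)^2 \varphi_a(\mathbf{r})\varphi_a(\mathbf{r}) - \sum_{i \leftarrow a} (w_i^{\prime a})^2 \varphi_a(\mathbf{r})\varphi_a(\mathbf{r}) \tag{5}$$

$$\rho^{\mathrm{ele}}_{\mathrm{(cross)}}(\mathbf{r}) = \sum_{i \to a}\sum_{i \to b \neq a} w_i^a w_i^b \varphi_a(\mathbf{r})\varphi_b(\mathbf{r}) - \sum_{i \leftarrow a}\sum_{i \leftarrow b \neq a} w_i^{\prime a} w_i^{\prime b} \varphi_a(\mathbf{r})\varphi_b(\mathbf{r}) \tag{6}$$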
In eqs 1–6, r is the coordinate vector; φ is the orbital wave function; i or j denotes an occupied orbital, and a or b denotes an empty orbital. $\sum_{i \to a}$ means summation over each excited configuration and $\sum_{i \leftarrow a}$ means summation over each deexcited configuration, while $w_i^a$ is the coefficient of the excited configuration from occupied orbital i to empty orbital a and $w_i^{\prime a}$ is the coefficient of the deexcited configuration from empty orbital a to occupied orbital i. The hole distribution and electron distribution are each divided into two parts: a local term and a cross term. The local term is generally dominant and reflects the contribution of the configuration function itself, while the cross term reflects the effect of the coupling between configuration functions on the hole and electron distributions. We then adopt the following excitation descriptors:
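$$S_r = \int \sqrt{\rho^{\mathrm{hole}}(\mathbf{r})\,\rho^{\mathrm{ele}}(\mathbf{r})}\, d\mathbf{r} \tag{7}$$

$$D = \sqrt{(X_{\mathrm{ele}} - X_{\mathrm{hole}})^2 + (Y_{\mathrm{ele}} - Y_{\mathrm{hole}})^2 + (Z_{\mathrm{ele}} - Z_{\mathrm{hole}})^2} \tag{8}$$

$$\Delta\sigma = |\boldsymbol{\sigma}_{\mathrm{ele}}| - |\boldsymbol{\sigma}_{\mathrm{hole}}| \tag{9}$$

$$\sigma_{\mathrm{hole},x} = \sqrt{\int (x - X_{\mathrm{hole}})^2 \rho^{\mathrm{hole}}(\mathbf{r})\, d\mathbf{r}}, \qquad \sigma_{\mathrm{ele},x} = \sqrt{\int (x - X_{\mathrm{ele}})^2 \rho^{\mathrm{ele}}(\mathbf{r})\, d\mathbf{r}} \tag{10}$$

$$H_{\mathrm{CT}} = |\mathbf{H} \cdot \mathbf{u}_{\mathrm{CT}}| \tag{11}$$

$$H_{\lambda} = \frac{\sigma_{\mathrm{ele},\lambda} + \sigma_{\mathrm{hole},\lambda}}{2}, \qquad \lambda \in \{x, y, z\} \tag{12}$$

$$H = \frac{|\boldsymbol{\sigma}_{\mathrm{ele}}| + |\boldsymbol{\sigma}_{\mathrm{hole}}|}{2} \tag{13}$$

$$t = D - H_{\mathrm{CT}} \tag{14}$$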
In these descriptors, the Sr index describes the degree of overlap between the electron and hole distributions. The D index describes the distance between the mass centers of the electron and hole distributions, where $X_{\mathrm{ele/hole}}$ is the X coordinate of the mass center of the electron/hole, and $Y_{\mathrm{ele/hole}}$ and $Z_{\mathrm{ele/hole}}$ are the other two coordinates. The Δσ index describes the difference in the overall spatial distribution width of the electron and hole. $\sigma_{\mathrm{ele}}$ and $\sigma_{\mathrm{hole}}$ are the distribution breadths (degrees of dispersion) of the electron and hole; their x, y, and z components are the root-mean-square deviations of the electron and hole distributions from the X, Y, and Z coordinates of the respective mass center, given by the corresponding equation above. $H_{\mathrm{CT}}$ describes the average spread of the electron and hole in the charge-transfer (CT) direction, where H is the vector whose components are the average spreads of the electron and hole in the x, y, and z directions, also given by the corresponding equation above, and $\mathbf{u}_{\mathrm{CT}}$ is a unit vector in the CT direction. The H index describes the overall average width of the distributions of the electron and hole. The t index describes the degree of separation between the electron and hole. The hole delocalization index (HDI) and electron delocalization index (EDI) describe the uniformity of the distributions of the hole and electron.
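$$\mathrm{HDI} = 100 \times \sqrt{\int \left[\rho^{\mathrm{hole}}(\mathbf{r})\right]^2 d\mathbf{r}} \tag{15}$$

$$\mathrm{EDI} = 100 \times \sqrt{\int \left[\rho^{\mathrm{ele}}(\mathbf{r})\right]^2 d\mathbf{r}} \tag{16}$$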
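As a rough numerical illustration of the hole–electron descriptors described above, the following sketch evaluates Sr, D, Δσ, H, and t for two one-dimensional Gaussian "hole" and "electron" densities on a grid. It is a toy example and not the Multiwfn workflow used in this work: the 1D grid, the Gaussian densities, and all function names are assumptions made for illustration, and in one dimension H_CT coincides with H.

```python
import math

def gaussian_density(xs, center, width):
    """Unnormalized Gaussian sampled on the grid xs (toy density, assumption)."""
    return [math.exp(-((x - center) ** 2) / (2.0 * width ** 2)) for x in xs]

def normalize(rho, dx):
    """Scale a density so that it integrates to 1 (one hole or one electron)."""
    total = sum(rho) * dx
    return [v / total for v in rho]

def centroid(xs, rho, dx):
    """Mass center of a normalized density."""
    return sum(x * v for x, v in zip(xs, rho)) * dx

def spread(xs, rho, dx):
    """Root-mean-square deviation of the density from its own mass center."""
    c = centroid(xs, rho, dx)
    return math.sqrt(sum((x - c) ** 2 * v for x, v in zip(xs, rho)) * dx)

def hole_electron_descriptors(xs, rho_hole, rho_ele):
    """Toy 1D versions of the Sr, D, delta-sigma, H, and t indices."""
    dx = xs[1] - xs[0]
    s_r = sum(math.sqrt(h * e) for h, e in zip(rho_hole, rho_ele)) * dx
    d = abs(centroid(xs, rho_ele, dx) - centroid(xs, rho_hole, dx))
    sigma_hole = spread(xs, rho_hole, dx)
    sigma_ele = spread(xs, rho_ele, dx)
    h = 0.5 * (sigma_ele + sigma_hole)  # equals H_CT in one dimension
    return {"Sr": s_r, "D": d, "dsigma": sigma_ele - sigma_hole, "H": h, "t": d - h}

# Hole centered at -1 and electron centered at +1, both with unit width.
xs = [-20.0 + 0.01 * i for i in range(4001)]
dx = xs[1] - xs[0]
hole = normalize(gaussian_density(xs, -1.0, 1.0), dx)
ele = normalize(gaussian_density(xs, 1.0, 1.0), dx)
desc = hole_electron_descriptors(xs, hole, ele)
print({name: round(value, 4) for name, value in desc.items()})
```

For these two unit-width Gaussians separated by two length units, the sketch gives D ≈ 2, H ≈ 1, t ≈ 1, Δσ ≈ 0, and Sr ≈ exp(−0.5) ≈ 0.61, which shows how a larger centroid separation lowers the overlap and raises the separation index.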
Additionally, the vertical ionization energy (VIE) and vertical electron affinity (VEA) are also candidate quantum chemistry descriptors (QCD); they are critically important for characterizing electron-transfer processes, which are fundamental to type I PDT mechanisms. They are not calculated as QCD in our model because they are not directly correlated with the singlet oxygen quantum yield of typical type II systems.
First, DFT geometry optimizations were carried out using ORCA 5.0.4 with the PBE0-D3 method and the def2-SVP basis set, and the solvation effect was considered using the CPCM solvent model. Then, TD-DFT was used to optimize the S1 and T1 state geometries and to perform the excited-state calculations with the PBE0-D3 method, the def2-TZVP basis set, and the SMD solvent model, which yields the excited-state wave function information for the subsequent descriptor calculations. Finally, the S1–T1 spin–orbit coupling (SOC) was calculated at the optimized S1 state structure at the same level of theory as the excited-state calculation. All of the quantum chemistry descriptors of the S1 (S-) and T1 (T-) states shown in Table 1 were calculated with Multiwfn 3.8 dev. The property differences (D-) between the S1 state and T1 state were also calculated as QCD. The PBE0 hybrid functional was selected because it has demonstrated accuracy similar to that of the B3LYP functional, which is one of the most widely used functionals. Moreover, the PBE0 functional is particularly well-suited for transition metal complexes, as it often provides better results in geometry optimization and energy calculation.
Table 1. Meanings of Quantum Chemistry Descriptors.
| descriptors | meaning |
|---|---|
| S/T/D-sr | degree of overlap between electron and hole distributions of S1 state/of T1 state/of their difference |
| S/T/D-d | distance between the mass centers of the hole and electron distributions of the S1 state/of the T1 state/of their difference |
| S/T/D-sigma | difference in the overall spatial distribution width of the electron and hole of the S1 state/of the T1 state/of their difference |
| S/T/D-hct | average spread of the electron and hole in the CT direction of the S1 state/of the T1 state/of their difference |
| S/T/D-h | overall average width of the distribution of the electron and hole of the S1 state/of the T1 state/of their difference |
| S/T/D-t | degree of separation between the electron and hole of the S1 state/of the T1 state/of their difference |
| S/T/D-hdi | uniformity of the distribution of the hole of the S1 state/of the T1 state/of their difference |
| S/T/D-edi | uniformity of the distribution of electrons of the S1 state/of the T1 state/of their difference |
| S/T/D-ee | excitation energy of the S1 state/of the T1 state/difference between S1 and T1 state |
| S/T/D-mlct | metal to ligand charge transfer proportion of the S1 state/of the T1 state/of their difference |
| S/T/D-tedm | transition electric dipole moment of the S1 state/of the T1 state/of their difference |
| S/T/D-tmdm | transition magnetic dipole moment of the S1 state/of T1 state/of their difference |
| soc | spin–orbit coupling (SOC) matrix element between S1 and T1 states |
| S1 | absorption wavelength of the S1 state calculated by DFT |
| fosc1 | oscillator strengths of the S1 state via transition electric dipole moments |
| fosc2 | oscillator strengths of the S1 state via transition velocity dipole moments |
2.2.2. Molecule Structure Descriptors
Molecule structure descriptors describe the effect of molecular size, charge, and structure on the photodynamic property. They include the charge of the entire complex cation, the total charge of all ligands, the charge of the atoms connected to the metal center, the number of atoms, the relative molecular mass, and the numbers of important functional groups (if a functional group contains several smaller functional groups, only the largest functional group is counted). Take two complexes from Table S1 as examples. For complex 1, the entire complex cation has a charge of +2, so nc = 2; all three ligands are electrically neutral, so lc = 0; among the six atoms connected to ruthenium, the oxygen carries one unit of negative charge, so cc = −1; and the Ru in the complex has an oxidation state of +2, so mc = 2. By the same reasoning, for complex 17, nc = 2, lc = 0, cc = 0, and mc = 2. The detailed meanings are shown in Table 2.
Table 2. Meanings of Molecular Structure Descriptors.
| descriptors | meaning |
|---|---|
| nc | net charge of the entire complex cation |
| lc | charge of all ligands |
| cc | charge of connected atoms with a metal center |
| an | number of atoms |
| mw | relative molecular mass |
| n-X | number of halogens |
| n-COO | number of ester groups |
| n-CO | number of carbonyl groups |
| n-CHO | number of aldehyde groups |
| n-CONH | number of peptide bonds |
| n-OH | number of hydroxyl groups |
| n-NH2 | number of amino groups |
| n-S | number of sulfur atoms |
| n-O | number of oxygen atoms |
| n-CN | number of cyano groups |
| n-bodipy | number of dipyrromethene boron difluorides |
| n-py | number of pyridines |
| n-ph | number of phenyl groups |
| n-pyra | number of pyrazines |
| n-pyrr | number of pyrroles |
| n-r6 | number of six-membered rings |
| n-r5 | number of five-membered rings |
| n-db | number of double bonds |
| n-tb | number of triple bonds |
2.2.3. Metal-Centered Descriptors
Metal-centered descriptors distinguish the effect of different transition metal centers on TMC properties by describing the number of metal centers (cn), the charge of the metal center (mc), the period in which the metal element is located (cp), and the outer electron configuration of the metal atom (cs, cd, and cf for the s-, d-, and f-electron counts). The outer electron configuration is based solely on the intrinsic properties of the free metal atom (prior to coordination). For example, Ru is in period 5, so cp = 5; Ru has the configuration [Kr]4d⁷5s¹, so cs = 1, cd = 7, and cf = 0. The detailed meanings are shown in Table 3.
Table 3. Meanings of Metal-Centered Descriptors.
| descriptors | meaning |
|---|---|
| cn | number of metal centers |
| cp | period in which the central metal element is located |
| mc | charge of the metal center |
| cf | number of electrons in the outermost f orbital of the central metal |
| cd | number of electrons in the outermost d orbital of the central metal |
| cs | number of electrons in the outermost s orbital of the central metal |
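The metal-centered descriptors can be encoded with a small lookup table, as in the sketch below. The Ru entry follows the worked example in the text; the Ir and Re entries use the standard free-atom ground-state configurations, and counting their filled 4f shell as cf = 14 is an assumption about how the encoding is applied. The class and function names are hypothetical.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class FreeAtom:
    cp: int  # period of the element
    cs: int  # electrons in the outermost s orbital
    cd: int  # electrons in the outermost d orbital
    cf: int  # electrons in the outermost f orbital

# Free-atom ground-state configurations (prior to coordination).
FREE_ATOMS = {
    "Ru": FreeAtom(cp=5, cs=1, cd=7, cf=0),   # [Kr] 4d7 5s1 (example from the text)
    "Ir": FreeAtom(cp=6, cs=2, cd=7, cf=14),  # [Xe] 4f14 5d7 6s2 (cf = 14 is an assumption)
    "Re": FreeAtom(cp=6, cs=2, cd=5, cf=14),  # [Xe] 4f14 5d5 6s2 (cf = 14 is an assumption)
}

def metal_descriptors(symbol, metal_charge, n_centers=1):
    """Return the cn, mc, cp, cs, cd, and cf descriptors for one complex."""
    return {"cn": n_centers, "mc": metal_charge, **asdict(FREE_ATOMS[symbol])}

ru_example = metal_descriptors("Ru", metal_charge=2)
print(ru_example)
```

Keeping the free-atom data in one table means the same encoding is applied to every complex of a given metal, so only mc and cn vary between complexes.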
2.2.4. External Condition Descriptors
External condition descriptors describe the effect of external conditions on the PDT process, including the static dielectric constant of the solvent, the dielectric constant of the solvent at infinite frequency, and the irradiation wavelength. The detailed meanings are shown in Table 4.
Table 4. Meanings of External Condition Descriptors.
| descriptors | meaning |
|---|---|
| eps | static dielectric constant of the solvent |
| epsinf | dielectric constant at the infinite frequency of the solvent |
| wl | irradiation wavelength |
2.3. Machine Learning Models
The data set was randomly divided into a training set (90%, 122 data points) and a test set (10%, 14 data points). Leave-one-out (LOO) cross-validation was performed on the training set to test the stability of the model. The trained models were also evaluated on the external test set to assess their generalization ability. All input descriptors are normalized by eq 17 (soc is normalized after taking its logarithm).
| 17 |
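The preprocessing step can be sketched as follows. The exact form of the normalization is not reproduced here, so the sketch assumes min-max scaling with bounds fitted on the training set and a base-10 logarithm for soc; both choices, the function names, and the example values are assumptions for illustration only.

```python
import math

def min_max_scale(value, lo, hi):
    """Map value linearly so that lo -> 0 and hi -> 1 (assumed normalization)."""
    if hi == lo:
        return 0.0  # constant column: no information, map to zero
    return (value - lo) / (hi - lo)

def normalize_descriptors(train_rows, rows, log_columns=("soc",)):
    """Scale rows using bounds fitted on train_rows; log-transform selected columns first."""
    def transform(name, value):
        return math.log10(value) if name in log_columns else value

    names = list(train_rows[0])
    bounds = {}
    for name in names:
        column = [transform(name, row[name]) for row in train_rows]
        bounds[name] = (min(column), max(column))

    return [
        {name: min_max_scale(transform(name, row[name]), *bounds[name]) for name in names}
        for row in rows
    ]

# Made-up descriptor values for three hypothetical complexes.
train = [
    {"S-ee": 2.0, "soc": 1.0},
    {"S-ee": 3.0, "soc": 100.0},
    {"S-ee": 2.5, "soc": 10.0},
]
scaled = normalize_descriptors(train, train)
print(scaled[2])
```

Fitting the bounds on the training set only, and reusing them for the test and external test sets, avoids leaking information from held-out data into the model inputs.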
Six candidate single-ML models are first applied to the small TMC photosensitizer data set: support vector regression (SVR), kernel ridge regression (KRR), Gaussian process regression (GPR), eXtreme Gradient Boosting regression (XGBoost), random forest regression (RFR), and k-nearest neighbors regression (KNR). These models are first trained on all descriptors to rank descriptor importance through SHAP analysis; the top 30, 35, 40, 45, and 50 descriptors are then tested as model inputs, and the models are retrained to find the best descriptor group. To further improve the prediction accuracy and generalization ability of the single-ML models, two hybrid models are proposed: the delta-learning model (DLM) and the Mixture-of-Experts model (MoE). The process of the delta-learning model is shown in Figure 4. The first model predicts the target value, the next model predicts the error (delta) between the real value and the value predicted by the previous model as a correction term, and so on. Thus, the final predicted value is the value predicted by the first model plus the errors predicted by all subsequent models. MoE uses multiple single-ML models as expert models that predict the target value simultaneously, as shown in Figure 5. The final predicted value of the MoE model is the weighted average of the values predicted by the expert models. All of the model hyperparameters above are optimized with the Optuna library in Python 3.11, and the hyperparameters optimized for these models are listed in Table 5.
Figure 4. Process of the delta-learning model.
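The delta-learning procedure can be sketched in a few lines: each stage is fitted to the residual left by the stages before it, and the final prediction is the sum of all stage predictions. To stay dependency-free, the sketch uses two toy regressors (a mean predictor and a nearest-neighbor predictor) in place of the SVR and KRR models used in this work; the class names and data are hypothetical.

```python
class MeanRegressor:
    """Toy base model: always predicts the mean of its training targets."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

class NearestNeighborRegressor:
    """Toy correction model: returns the target of the closest training point."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in self.X_]
            predictions.append(self.y_[dists.index(min(dists))])
        return predictions

class DeltaLearner:
    """Chain of models in which each stage learns the residual of the previous ones."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, X, y):
        residual = list(y)
        for model in self.stages:
            model.fit(X, residual)
            predicted = model.predict(X)
            # The next stage is trained on what this stage failed to explain.
            residual = [r - p for r, p in zip(residual, predicted)]
        return self

    def predict(self, X):
        total = [0.0] * len(X)
        for model in self.stages:
            total = [t + p for t, p in zip(total, model.predict(X))]
        return total

X = [[0.0], [1.0], [2.0]]
y = [0.1, 0.5, 0.9]
dlm = DeltaLearner([MeanRegressor(), NearestNeighborRegressor()]).fit(X, y)
train_predictions = dlm.predict(X)
new_prediction = dlm.predict([[0.9]])[0]
```

Any pair of regressors exposing `fit` and `predict` could be substituted for the two toy stages, which is how a two-layer SVR-then-KRR arrangement would plug in.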
Figure 5. Process of the Mixture-of-Experts model.
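The weighted-average combination used by the MoE model can be sketched as below. How the weights are chosen is not specified in the text, so the inverse-MSE weighting shown here is only one plausible option and is an assumption; the function names and numbers are hypothetical, and any such weights should be derived from validation data rather than from the data used for final evaluation.

```python
def moe_predict(expert_predictions, weights):
    """Weighted average of several experts' predictions for the same samples."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    n_samples = len(expert_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(normalized, expert_predictions))
        for i in range(n_samples)
    ]

def inverse_mse_weights(expert_predictions, y_true, eps=1e-12):
    """Assumed weighting scheme: experts with lower validation MSE get larger weights."""
    weights = []
    for preds in expert_predictions:
        mse = sum((p - t) ** 2 for p, t in zip(preds, y_true)) / len(y_true)
        weights.append(1.0 / (mse + eps))
    return weights

# Predictions of two hypothetical experts for two samples.
experts = [[0.2, 0.6], [0.4, 0.8]]
equal_mix = moe_predict(experts, [1.0, 1.0])
skewed_mix = moe_predict(experts, [3.0, 1.0])
validated_mix = moe_predict(experts, inverse_mse_weights(experts, [0.2, 0.6]))
```

With equal weights the result is the plain mean of the experts; unequal weights pull the prediction toward the more trusted expert.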
Table 5. Hyperparameters Optimized for the Single-ML Models.
| models | parameters optimized |
|---|---|
| SVR | penalty coefficient, tolerance, kernel type |
| KRR | regularization parameter, hyperparameter of Gaussian kernel |
| GPR | noise variance, kernel length scale, number of optimizers |
| XGBoost | max depth of the tree, learning rate, number of estimators, sample ratio and feature ratio of each tree |
| RFR | number of decision trees, number of features considered per split, max depth of the decision tree |
| KNR | number of neighbors, distance measurement parameters, weight allocation strategies, nearest neighbor search algorithms |
The target variable for all machine learning models developed in this work is the experimentally measured ΦΔ. Because the machine learning models are inherently nonlinear and too complex to express in explicit form, SHAP analysis is subsequently employed to show the contribution of each descriptor to the predicted ΦΔ, which provides strong interpretability of the proposed model.
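The idea behind SHAP can be illustrated with an exact Shapley value calculation on a tiny model. This sketch is not the SHAP analysis used in this work: it enumerates all feature coalitions, which is feasible only for a handful of features, and it represents a "missing" feature by a single baseline value, which is an assumption. The toy model and its coefficients are made up.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline point."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi += weight * (value(members | {i}) - value(members))
        contributions.append(phi)
    return contributions

def toy_model(z):
    """Made-up model with two linear terms and one interaction term."""
    return 0.5 * z[0] - 0.2 * z[1] + z[0] * z[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])
```

The contributions sum to the difference between the prediction at x and the prediction at the baseline, and the interaction term is shared equally between the two features involved.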
3. Results and Discussion
In this section, we present the results of the descriptor filter and the training results of the single-ML models with the best descriptor group. We use SHAP analysis to determine the importance of the descriptors and choose the most important ones. Then, we show the performance of the two hybrid models to find the best model for predicting the ΦΔ of TMC photosensitizers. Finally, we compare the generalized metal model trained on three hexacoordinate TMCs with the specialized metal model trained on a specific TMC.
3.1. Single-ML Models
To verify the importance of the different kinds of descriptors, we first compared models using all descriptors against models in which one kind of descriptor was removed at a time. The key finding is that removing the QCD led to the largest drop in model performance (the R² on the external test set decreased from 0.830 to 0.051 for SVR, from 0.747 to 0.214 for KRR, and from 0.451 to −0.142 for GPR), as shown in Table S3. This demonstrates that the QCD provide unique and critical information that cannot be compensated for by the other descriptors. After the descriptor filter, the best models are the SVR and KRR models, followed by GPR, as shown in Tables 6 and S4. With the best descriptor groups, these models show good regression performance and generalization ability, as shown in Table 7 and Figure 6. The stability of these three models under LOO cross-validation is limited (Q² of 0.62–0.65) because of the small data set, the complexity of the mechanism of the photodynamic therapy process, and the different environmental influences during the ΦΔ measurements. The other three single-ML models do not satisfy the conditions of the QSPR model (Q² ≥ 0.6 in cross-validation and R² ≥ 0.6 on the external test set). The superior performance of the kernel-based models (SVR, KRR, and GPR) over the tree-based models (RFR and XGBoost) can be attributed to the nature of our feature space. Our descriptor set consists primarily of continuous, normalized quantum-chemical and structural properties. Kernel methods are particularly adept at modeling complex, nonlinear relationships in such continuous feature spaces by implicitly mapping them into higher-dimensional spaces where linear relationships may be found. In contrast, tree-based models, which rely on axis-aligned splits, often perform best with high-dimensional, sparse data such as molecular fingerprints. The descriptor importance rankings from SHAP analysis of the six single-ML models with all descriptors are shown in Figures S1–S6.
Table 6. R² (Q²) Results of the SVR Model and KRR Model in the Descriptor Filter.
| model | data set | 30 descriptors | 35 descriptors | 40 descriptors | 45 descriptors | 50 descriptors | all descriptors |
|---|---|---|---|---|---|---|---|
| SVR | training set | 0.970 | 0.957 | 0.992 | 0.992 | 0.990 | 0.990 |
| | test set | 0.948 | 0.947 | 0.940 | 0.927 | 0.940 | 0.935 |
| | external test set | 0.681 | 0.603 | 0.741 | 0.815 | 0.772 | 0.830 |
| | LOO cross-validation | 0.568 | 0.618 | 0.619 | 0.622 | 0.626 | 0.579 |
| KRR | training set | 0.991 | 0.998 | 0.996 | 0.997 | 0.996 | 0.993 |
| | test set | 0.942 | 0.914 | 0.884 | 0.903 | 0.924 | 0.944 |
| | external test set | 0.713 | 0.663 | 0.700 | 0.753 | 0.729 | 0.747 |
| | LOO cross-validation | 0.594 | 0.608 | 0.616 | 0.635 | 0.534 | 0.593 |
Table 7. Performance of the Three Best Models with Filtered Descriptors.
| model | data set | R² (Q²) | MaxAE | MAE | MSE |
|---|---|---|---|---|---|
| SVR (45 filtered descriptors) | training set | 0.992 | 0.091 | 0.024 | 0.001 |
| | test set | 0.927 | 0.130 | 0.049 | 0.004 |
| | external test set | 0.815 | 0.254 | 0.091 | 0.016 |
| | LOO cross-validation | 0.622 | 0.631 | 0.122 | 0.033 |
| KRR (45 filtered descriptors) | training set | 0.997 | 0.055 | 0.011 | 0.0003 |
| | test set | 0.903 | 0.157 | 0.061 | 0.005 |
| | external test set | 0.753 | 0.296 | 0.115 | 0.021 |
| | LOO cross-validation | 0.635 | 0.610 | 0.119 | 0.032 |
| GPR (40 filtered descriptors) | training set | 0.989 | 0.117 | 0.019 | 0.001 |
| | test set | 0.866 | 0.261 | 0.056 | 0.007 |
| | external test set | 0.701 | 0.303 | 0.125 | 0.025 |
| | LOO cross-validation | 0.647 | 0.556 | 0.122 | 0.031 |
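The metrics reported in these tables can be computed as in the following sketch, which also shows a leave-one-out loop and the QSPR acceptance check (Q² ≥ 0.6 and external R² ≥ 0.6). The function names and the toy mean model are hypothetical, and computing Q² as the coefficient of determination of the LOO predictions over the full target vector is an assumption about the convention used.

```python
def regression_metrics(y_true, y_pred):
    """R2, maximum absolute error, mean absolute error, and mean squared error."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MaxAE": max(abs(e) for e in errors),
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": ss_res / n,
    }

def leave_one_out_predictions(make_model, X, y):
    """Refit a fresh model n times, each time predicting the single held-out sample."""
    predictions = []
    for i in range(len(X)):
        model = make_model().fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(model.predict([X[i]])[0])
    return predictions

def passes_qspr_criteria(q2_loo, r2_external, threshold=0.6):
    """Acceptance conditions quoted in the text for a usable QSPR model."""
    return q2_loo >= threshold and r2_external >= threshold

class MeanModel:
    """Toy stand-in for a real regressor, used only to exercise the LOO loop."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

metrics = regression_metrics([0.2, 0.4, 0.6, 0.8], [0.3, 0.4, 0.5, 0.8])
loo = leave_one_out_predictions(MeanModel, [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
q2 = regression_metrics([1.0, 2.0, 3.0], loo)["R2"]
svr_45_ok = passes_qspr_criteria(0.622, 0.815)   # SVR, 45 descriptors
svr_all_ok = passes_qspr_criteria(0.579, 0.830)  # SVR, all descriptors
```

Applied to the tabulated values, SVR with 45 filtered descriptors meets both criteria, whereas SVR with all descriptors fails the Q² condition despite its higher external R².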
Figure 6. Performance of the (A) SVR model, (B) KRR model, and (C) GPR model on the training set, test set, and external test set.
The contributions of the filtered descriptors are ranked by SHAP analysis, as shown in Figures S7–S9. Among the top 15 descriptors of the three models, 12 are the same, indicating that these descriptors describe the factors influencing the photodynamic therapy process well and that the models recognize these factors, as shown in Figure 7A. QCD account for the largest share of these descriptors, which indicates that the models have strong interpretability with respect to the PDT mechanism, as shown in Figure 7B. The excitation energies of the S1 state (S-ee) and T1 state (T-ee) are the two most important descriptors; they influence ΦΔ through the singlet–triplet energy gap ΔE_ST in the ISC process. The notable importance of molecule structure descriptors such as relative molecular mass (mw) and number of atoms (an), which are not directly involved in the photophysical process, can be interpreted as a consequence of the data set’s composition. We posit that the relative molecular mass acts not as a direct causal factor but rather as a proxy variable for molecular complexity, which is a key outcome of successful ligand engineering. To achieve high singlet oxygen quantum yields, sophisticated ligand modifications, such as extending π-conjugation, are typically employed. These modifications inevitably increase the molecular mass and the number of atoms. Consequently, the model identifies a correlation wherein higher-performing molecules tend to possess more complex, and therefore heavier, architectures. This insight underscores that the model learns from the patterns of successful design presented in the literature.
Figure 7. Descriptor modeling results: (A) distribution of the top 15 descriptors of the SVR, KRR, and GPR models by SHAP analysis and (B) the descriptor types to which they belong.
3.2. Hybrid-ML Models
3.2.1. Delta-Learning Model
We use a two-layer delta-learning model in this section. We selected the top-performing model, SVR, as the base model to provide a strong initial estimate. For the critical delta model, which must capture the complex pattern of errors, we chose the second-best performer, KRR. Compared with its constituent SVR and KRR models, the delta-learning model not only fits the training set and test set better but also generalizes better on the external test set, as shown in Table 8 and Figure 8A. This indicates that the delta-learning model can correct the results of the single-ML model by further predicting the residual, thereby improving its generalization ability.
Table 8. Performance of DLM and MoE.
| model | data set | R² (Q²) | MaxAE | MAE | MSE |
|---|---|---|---|---|---|
| DLM | training set | 1.000 | 0.031 | 0.004 | 10⁻⁵ |
| | test set | 0.928 | 0.132 | 0.051 | 0.004 |
| | external test set | 0.820 | 0.256 | 0.091 | 0.015 |
| | LOO cross-validation | 0.625 | 0.628 | 0.120 | 0.032 |
| mixture-of-SVR-KRR-GPR model (MoE1) | training set | 0.990 | 0.067 | 0.025 | 0.001 |
| | test set | 0.922 | 0.132 | 0.055 | 0.004 |
| | external test set | 0.870 | 0.211 | 0.083 | 0.011 |
| | LOO cross-validation | 0.657 | 0.637 | 0.117 | 0.030 |
| mixture-of-SVR-KRR model (MoE2) | training set | 0.992 | 0.092 | 0.024 | 0.001 |
| | test set | 0.927 | 0.142 | 0.051 | 0.004 |
| | external test set | 0.840 | 0.234 | 0.085 | 0.014 |
| | LOO cross-validation | 0.621 | 0.660 | 0.120 | 0.033 |
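For readers who want to recompute the four metrics reported in the table, the sketch below shows one common set of definitions. It assumes that R² is the standard coefficient of determination and that Q² applies the same formula to leave-one-out predictions; the exact conventions are not restated in this section, so these are assumptions. The function name and the example values are placeholders, not data from this work.

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, MaxAE, MAE, and MSE for paired observed/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MaxAE": max(abs(e) for e in errors),
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": ss_res / n,
    }


# Placeholder quantum yields (NOT values from the paper).
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.3, 0.4, 0.5, 0.8]
metrics = regression_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because ΦΔ lies between 0 and 1, MaxAE is a useful complement to R²: it exposes a single badly predicted complex that an averaged metric would hide.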
Figure 8. Performance of (A) DLM, (B) mixture-of-SVR-KRR-GPR model, and (C) mixture-of-SVR-KRR model on the training set, test set, and external test set.
3.2.2. Mixture-of-Experts Model
In this section, we designed two variants to systematically evaluate the impact of expert composition: MoE1 (3 experts), which incorporates all three kernel-based models (SVR, KRR, and GPR), and MoE2 (2 experts), which incorporates only the top two models (SVR and KRR). This comparative design allowed us to test whether the MoE model performs best with a focused set of the best experts or with a broader ensemble. The results show that the MoE models also fit the training set and test set better and generalize better on the external test set, as shown in Table 8 and Figure 8B,C. This indicates that the MoE model balances the errors of the individual single-ML models and enhances the overall prediction ability.
Judging from the performance on the training set and the test set, all three hybrid models show improved fitting. Judging from the performance on the external test set, the best-performing hybrid model is MoE1 (R² = 0.870), followed by MoE2 (R² = 0.840) and the delta-learning model (R² = 0.820). Thus, hybrid models can improve the prediction accuracy and generalization ability of single-ML models for TMC photosensitizing properties.
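In generic form, an MoE prediction is a gated combination of the expert outputs. The expression below is a sketch in notation introduced here for illustration; how the gating weights $g_k$ are obtained is not specified in this section, so their functional form is deliberately left open.

```latex
\hat{y}(\mathbf{x}) = \sum_{k=1}^{K} g_k(\mathbf{x})\, f_k(\mathbf{x}), \qquad
g_k(\mathbf{x}) \ge 0, \qquad \sum_{k=1}^{K} g_k(\mathbf{x}) = 1
```

Here $K = 3$ with $f_k \in \{\mathrm{SVR}, \mathrm{KRR}, \mathrm{GPR}\}$ for MoE1, and $K = 2$ with $f_k \in \{\mathrm{SVR}, \mathrm{KRR}\}$ for MoE2.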
3.3. Comparison with the Specialized Metal Model
In this section, we retrain the two kinds of hybrid models, the delta-learning model (DLM) and the MoE1 model, from scratch on Ru-complex and Ir-complex photosensitizers, respectively (the single-ML models are trained first to obtain the filtered descriptor groups, and the two hybrid models are then optimized and retrained). The results are compared with the performance of the same two models trained in Section 3.2 on the full TMC photosensitizer data set, to test whether the generalized metal model trained on three kinds of hexacoordinate TMCs can replace the specialized metal model trained on TMCs of a given metal center. The Ru-complex data set consists of 77 data points in the internal data set (used for the training set/test set with a 9:1 random split) and 5 data points in the external test set. The Ir-complex data set consists of 51 data points in the internal data set and 5 data points in the external test set.
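A 9:1 random split of the internal data sets can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the seed, the nearest-integer rounding of the test-set size, and the use of the same ratio for the Ir-complex data set are all assumptions, so the resulting set sizes are illustrative only.

```python
import random


def random_split(n_samples, test_fraction=0.1, seed=0):
    """Shuffle sample indices and split them into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)  # assumption: nearest-integer rounding
    return indices[n_test:], indices[:n_test]


ru_train, ru_test = random_split(77)  # Ru-complex internal data set
ir_train, ir_test = random_split(51)  # Ir-complex internal data set (same ratio assumed)
print(len(ru_train), len(ru_test))  # 69 8
print(len(ir_train), len(ir_test))  # 46 5
```

With test sets this small, a single complex shifts the test-set metrics noticeably, which is one reason to read them together with the LOO cross-validation and external test results.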
The generalized metal DLM performed slightly better on the Ru-complex data set compared to the model trained on all metal complexes (with the upper and lower quartiles of the error closer to 0), but slightly worse on the external test set, though the difference was minor, as shown in Figure 9. It also performed marginally better on the Ir-complex data set (with a smaller negative maximum error) and performed basically the same on the external test set. This indicates that the generalized metal DLM has strong universality and can replace the specialized metal DLM trained on Ru-complexes. The detailed results of the DLM comparison are shown in Table S5.
Figure 9. Violin plots of the error distribution on (A) Ru-complexes and (B) Ir-complexes, and comparison of results on the external test set for (C) Ru-complexes and (D) Ir-complexes, for the DLM.
The performance of both DLM and the following MoE model on the external test set of the Ir-complex data set is poorer compared to that of the Ru-complex data set. The possible reasons are as follows: (I) The smaller data set for Ir-complexes might contribute to the observed difference. The model may have learned more robust patterns for Ru-complexes due to their larger representation, leading to more confident and accurate predictions for this class; (II) For many Ru-complexes, the dominant route for singlet oxygen generation involves the intersystem crossing (ISC) between the S1 and T1 state. In contrast, Ir-complexes, with their stronger spin–orbit coupling, may exhibit more complex excited-state dynamics. The generation of singlet oxygen may involve energy transfer from other triplet states (e.g., T2) or proceed through mixed triplet state character. Our current QCD, which focuses on S1 and T1 states, may not fully capture the critical energetics and dynamics associated with these alternative pathways, leading to reduced predictive accuracy for Ir-complexes.
For Ru-complexes, the generalized metal MoE1 model performs slightly worse (with a wider distribution of prediction errors) than the specialized metal MoE1 model, and its performance on the external test set is also slightly lower, as shown in Figure 10. For Ir-complexes, the generalized metal MoE1 model performs slightly better than the specialized metal MoE1 model (with lower maximum positive and negative errors), and their performance on the external test set is basically the same. This indicates that the MoE model also has strong universality and can be used to predict the properties of various TMC photosensitizers. Compared with the DLM, the MoE model achieved better performance on the external test sets of both Ru-complexes and Ir-complexes. This might be because the MoE model, which uses multiple single-layer models, is more likely to learn patterns from small data sets than the DLM, which uses a two-layer model. The detailed results of the MoE1 model comparison are shown in Table S6.
Figure 10. Violin plots of the error distribution on (A) Ru-complexes and (B) Ir-complexes, and comparison of results on the external test set for (C) Ru-complexes and (D) Ir-complexes, for the MoE1 model.
4. Conclusions
Transition metal complexes are promising photosensitizer candidates in PDT because of their high singlet oxygen quantum yield (ΦΔ) and water solubility, but the synthesis of photosensitizers and the experimental determination of ΦΔ are both time-intensive and laborious. Traditional structure descriptors for machine learning models, such as SMILES, can hardly capture all the information of TMCs, and the lack of data makes it difficult to use deep learning models. In this work, we propose a DFT-ML modeling approach to predict the photosensitizing properties of TMCs. The excited-state descriptors are calculated by DFT, and ML models are built using them together with other descriptors to characterize the structure of TMC photosensitizers and their charge transition process under light irradiation. Six single-ML models and two kinds of hybrid-ML models are proposed based on these descriptors, and their performance in predicting ΦΔ is tested.
The best descriptor group is filtered for each single-ML model to optimize it, and the best single models are then used to build the hybrid models. The results show that, among the single-ML models, SVR and KRR provide good predictions on the test set (R² > 0.9) and the external test set (R² > 0.7), while the DLM and MoE models further improve the predictions (R² up to 0.87 on the external test set). The comparison with the same hybrid-ML models trained on a specialized metal complex indicates that the proposed models also have strong universality (ΔR² < 0.1 on the external test set between the generalized metal model and the specialized metal model) and can be used to predict the properties of various TMC photosensitizers. The subsequent SHAP analysis provides strong interpretability of the PDT mechanism. The excitation energies of the S1 state and the T1 state are the two most important descriptors, while the relative molecular mass and the dielectric constant of the solvent at infinite frequency also have a notable impact. These results demonstrate that the excited-state descriptors are effective in predicting PDT process properties, and that the hybrid-ML model with these descriptors can provide accurate predictions of photosensitizing properties based on a small data set of TMCs. Thus, the proposed approach could provide useful theoretical guidance as a screening step prior to organic synthesis and photosensitivity testing.
This approach can filter out a large proportion of seemingly promising but low-performing candidates at the computational stage. By reducing the number of compounds that require synthesis, the proposed model can (I) significantly decrease the consumption of valuable and expensive metal precursors, ligands, and other chemicals, (II) save weeks or months of synthetic and characterization labor, and (III) allow researchers to focus their experimental efforts on the most promising leads, thereby increasing the efficiency and success rate of the discovery pipeline. However, it should be noted that the scarcity of experimentally measured ΦΔ values for TMCs is a fundamental constraint in the field and directly leads to our small data set. In the future, we will work collaboratively to expand the data set by incorporating photosensitizers based on other metals (such as Pd, Pt, and Zn), leading to a more robust and universally applicable model.
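The screening step described above can be sketched as a simple ranking filter on model predictions. This is a hypothetical illustration only: the candidate labels, the predicted ΦΔ values, and the threshold of 0.6 are placeholders chosen for the example, and `shortlist` is not a function from this work.

```python
def shortlist(predictions, threshold):
    """Keep candidates whose predicted quantum yield meets the threshold, best first."""
    kept = [(name, phi) for name, phi in predictions.items() if phi >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical candidate labels and predicted yields (NOT values from the paper).
predicted_phi = {"cand-A": 0.82, "cand-B": 0.35, "cand-C": 0.67, "cand-D": 0.12}
selected = shortlist(predicted_phi, threshold=0.6)
print(selected)  # [('cand-A', 0.82), ('cand-C', 0.67)]
```

In practice the threshold would be set with the model's external-test error in mind, so that candidates near the cutoff are not discarded on the basis of prediction noise.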
Supplementary Material
Acknowledgments
The authors appreciate funding from the National Natural Science Foundation of China (22308044).
Glossary
Abbreviations
- TMC: transition metal complex
- PS: photosensitizer
- DFT: density functional theory
- TD-DFT: time-dependent density functional theory
- ML: machine learning
- QSPR: quantitative structure–property relationship
- PDT: photodynamic therapy
- ROS: reactive oxygen species
- ISC: intersystem crossing
- CT: charge-transfer
- QCD: quantum chemistry descriptor
- MSD: molecule structure descriptor
- MCD: metal-centered descriptor
- ECD: external condition descriptor
- LOO: leave-one-out
- SVR: support vector regression
- KRR: kernel ridge regression
- GPR: Gaussian process regression
- XGBoost: extreme gradient boosting regression
- RFR: random forest regression
- KNR: k-neighbor regression
- DLM: delta-learning model
- MoE: Mixture-of-Experts
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c08727.
Detailed calculated descriptors input used for model training (XLSX)
Detailed information on TMC photosensitizers used to construct the data set and external test set, result of the GPR model, XGBoost model, RFR model, and KNR model in a descriptor filter, result of hybrid model comparison on specialized TMC photosensitizers, result of SHAP analysis, and optimized hyperparameters of the machine learning models (PDF)
The authors declare no competing financial interest.