ACS Omega. 2025 Oct 14;10(42):50643–50651. doi: 10.1021/acsomega.5c08722

Enhanced Prediction of Absorption and Emission Wavelengths of Organic Compounds through Hybrid Graph Neural Network Architectures

Dat P Nguyen, Quyet M Le, Hoa T P Tran, Phuc T Le, Tin V T Nguyen
PMCID: PMC12573182  PMID: 41179196

Abstract

The prediction of absorption and emission spectra is a crucial task in chemistry and materials science, traditionally addressed with computational methods such as density functional theory (DFT). While DFT can deliver high accuracy, its substantial computational cost and long run times hinder its scalability for large molecular systems. In this study, we present an efficient graph neural network (GNN)-based approach for predicting optical properties. Our method combines molecular graph representations with molecular fingerprints, allowing the model to capture detailed structural and electronic features, as well as solvent effects. This simple yet effective GNN framework achieves R² values of up to 0.92 for the maximum absorption wavelength and 0.83 for the maximum emission wavelength, significantly improving prediction accuracy while requiring substantially fewer computational resources. Including explicit solvent fingerprints further improves absorption predictions to R² = 0.96 on the test set, with solvent-specific subsets reaching R² ≈ 0.99 (e.g., methanol). The results highlight the potential of GNNs to transform spectral prediction workflows and pave the way for accelerated molecular design and materials discovery.



Introduction

The prediction of absorption and emission spectra is a fundamental task in chemistry and materials science, with significant implications for the design of new molecules and materials with desired optical properties. Traditional computational methods, including density functional theory (DFT) and time-dependent density functional theory (TD-DFT), have been widely used to simulate spectral features. While these methods can provide accurate results, they are often computationally intensive and time-consuming, especially for large molecular systems. This has led to a growing interest in machine learning (ML) approaches as a means to accelerate spectral predictions while maintaining accuracy. Machine learning has demonstrated remarkable success in various spectroscopy analyses, offering improved efficiency and new capabilities across techniques such as Raman, near-infrared (NIR), and laser-induced breakdown spectroscopy (LIBS). In these areas, ML models have been employed to preprocess and analyze complex spectral data and to automate tasks such as spectral property prediction, denoising, classification, and recognition.

Conventional machine learning techniques have been employed to efficiently predict absorption and emission spectra of organic molecules, offering alternatives to computationally intensive methods like DFT. Mahato et al. developed hybrid ensemble models combining multilayer perceptron regression and extra trees regression to predict the wavelengths of organic dyes using a data set comprising 3066 fluorescent materials represented by Morgan fingerprints. Their model achieved an R² of 0.97 with a mean absolute error (MAE) of 8.9 nm for the absorption wavelength prediction. Similarly, Souza et al. used machine learning models trained on the Deep4Chem data set, which includes over 20,000 data points for 7016 chromophores and 365 solvents, to predict fluorescence emission wavelengths. Molecules were represented using SMILES strings to generate descriptors like Morgan fingerprints and MACCS keys. Their Random Forest model was the top performer, achieving a root-mean-square error (RMSE) of 28.8 nm and an R² of 0.90, indicating strong generalization capabilities. Advanced ML models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and gradient boosting models, have also surpassed traditional computational methods in predictive performance. These studies demonstrate that properly optimized machine learning models, utilizing appropriate molecular descriptors, can accurately predict the photophysical properties of organic molecules, providing valuable tools for material design and spectral analysis.

Graph neural networks (GNNs) have emerged as powerful tools for predicting optical properties by effectively modeling molecular structures and capturing the electronic features influencing spectral behavior. Greenman et al. utilized directed message passing neural networks (D-MPNNs) with learned embeddings and a multifidelity approach incorporating over 28,000 TD-DFT calculations to predict absorption peak wavelengths of dye molecules. Their optimized model achieved impressive accuracy, with mean absolute errors (MAE) of less than 7 nm on random splits, 14 nm when splitting by dye molecules, and 19 nm on scaffold splits, along with an R² of 0.80. Meanwhile, Joung et al. developed a graph convolutional network (GCN)-based model to predict key optical properties such as absorption and emission peak positions. By accounting for chromophore-solvent interactions through concatenated feature vectors and training on an extensive database of 30,094 chromophore-solvent combinations, their model achieved an RMSE of 26.6 nm for absorption peaks and 28.0 nm for emission peaks, with R² values up to 0.93. These studies demonstrate that GNNs, when adeptly applied, can accurately predict optical properties, providing valuable insights for material design and spectral analysis.

Despite these advancements, predicting absorption and emission spectra using ML presents several challenges. The complex physical interactions governing spectral features require models that can accurately capture the nuances of structure–property relationships. Additionally, the interpretability of ML models is crucial, as understanding how predictions are made ensures that models are reliable and do not merely produce coincidental results. Data requirements and the need for appropriate molecular feature representations also pose significant obstacles, particularly when dealing with limited data sets or ensuring model generalization across diverse chemical spaces.

In this study, we explore the use of various GNN architectures that leverage both molecular graphs and molecular fingerprints to predict absorption and emission spectra (Figure 1). We aim to address key research questions in the field, such as improving the accuracy and efficiency of predictions and developing molecular feature representations that capture critical chemical properties. By systematically evaluating the performance of these GNN models against conventional methods, we seek to demonstrate their effectiveness and outline potential pathways for integrating machine learning approaches into spectral prediction and molecular design.

Figure 1. Hybrid GNN structure for the prediction of absorption and emission wavelengths.

Experimental Methods

Data and Code Availability

All of the code and data necessary to reproduce this work are publicly available. The full repository, which contains the processed data sets and the notebooks used for data preprocessing, training and evaluation of GNN models, is hosted at: https://github.com/phatdatnguyen/hybrid-GNN-optical-property-prediction. An archival snapshot corresponding to the version used for the manuscript is available at: 10.5281/zenodo.17112437.

Data Acquisition

The data set for this study was systematically compiled from the database provided by Park and co-workers, which aggregates experimental data on the optical properties of organic compounds from previously published scientific studies. This comprehensive data set includes key parameters extracted from the absorption and emission spectra of various compounds, such as the maximum absorption wavelength (λabs,max), maximum emission wavelength (λemi,max), extinction coefficients (σabs and σemi), quantum yield (ΦQY), and excited-state lifetimes (τflu). Molecular structures were encoded by using the simplified molecular input line entry system (SMILES) format to facilitate efficient processing and feature extraction in subsequent analyses. All numerical parameters were normalized to a unified scale ranging from 0 to 1 to ensure consistency and improve the performance of the models. Unless otherwise noted, reported wavelength metrics are transformed back to nm.
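The scaling to the 0 to 1 range and the conversion of predictions back to nm can be illustrated with a minimal sketch. It assumes plain min-max scaling, which the text does not name explicitly; the function names and the wavelength values are hypothetical and are not taken from the authors' repository.

```python
def fit_min_max(values):
    """Return the (lo, hi) bounds of a column, used as scaling parameters."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return lo, hi


def scale(value, lo, hi):
    """Map a raw value (e.g., a wavelength in nm) onto the interval [0, 1]."""
    return (value - lo) / (hi - lo)


def unscale(scaled, lo, hi):
    """Map a scaled value (e.g., a model output) back to the original unit."""
    return scaled * (hi - lo) + lo


# Hypothetical absorption maxima in nm, for illustration only.
wavelengths_nm = [350.0, 420.0, 505.0, 650.0]
lo, hi = fit_min_max(wavelengths_nm)
scaled = [scale(w, lo, hi) for w in wavelengths_nm]
restored = [unscale(s, lo, hi) for s in scaled]
```

Under this assumption, an error measured on the scaled targets converts to nm by multiplying with (hi - lo), which is why wavelength metrics can be reported in nm after the inverse transform.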

Data Preprocessing

The original data set contained 20,236 entries. After cleaning (removing entries lacking absorption data and deleting irrelevant columns), we obtained a refined data set comprising 6428 unique chromophores with absorption data. This cleaned data set included three key columns: the SMILES representations of the molecules, their corresponding absorption wavelengths in nanometers (nm), and the solvents used for the measurements. A similar preprocessing procedure was applied to the emission data, resulting in an emission data set with 6407 compounds. Additionally, the fluorophores were classified into different classes by computational pattern matching, enabling us to analyze and visualize the distribution of different fluorophore classes within the data set (see Figure 2).

Figure 2. Histograms of absorption and emission wavelengths of organic compounds from the cleaned data set: (a) all chromophores, (b) anthracene, (c) pyrene, (d) coumarin, (e) 1,8-naphthalimide, and (f) BODIPY.

Data Splitting Protocol

Prior to training, the data sets were divided into three subsets (used for training, validation, and testing) in a 60:20:20 ratio. For solvent-agnostic experiments (where absorption and emission measurements were averaged across solvents per chromophore), a random split at the chromophore level was used. For solvent-aware experiments (chromophore–solvent pairs), we applied a grouped random split by the chromophore so that all measurements of a given chromophore (across solvents) were assigned to the same partition, preventing information leakage across splits. Unless otherwise noted, all metrics reported in the main text correspond to the held-out test set from this random split protocol.
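The grouped random split described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: the function name, the record layout, and the chromophore identifiers are hypothetical, and the 60:20:20 ratio is applied to chromophores rather than to individual measurements.

```python
import random


def grouped_split(records, key, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split records into train/val/test so that each group stays in one subset."""
    groups = sorted({key(r) for r in records})
    random.Random(seed).shuffle(groups)  # reproducible shuffle of group labels
    n_train = round(fractions[0] * len(groups))
    n_val = round(fractions[1] * len(groups))
    assignment = {}
    for i, group in enumerate(groups):
        if i < n_train:
            assignment[group] = "train"
        elif i < n_train + n_val:
            assignment[group] = "val"
        else:
            assignment[group] = "test"
    parts = {"train": [], "val": [], "test": []}
    for r in records:
        parts[assignment[key(r)]].append(r)
    return parts


# Hypothetical (chromophore id, solvent, wavelength in nm) records.
records = [(f"chromophore_{i}", solvent, 400.0 + i)
           for i in range(10) for solvent in ("MeOH", "THF")]
parts = grouped_split(records, key=lambda r: r[0])
```

Because whole chromophores are assigned to a partition, the measurement-level proportions only approximate 60:20:20 when chromophores have different numbers of solvent entries.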

GNN Model Training and Evaluation

Our GNN models consist of several components (Figures S1 and S2). Graph convolutional network (GCN) blocks are used to extract meaningful features from the graph data. These blocks employ various types of graph convolutional layers, particularly those incorporating attention mechanisms (e.g., GATConv, GATv2Conv, TransformerConv, and GeneralConv), to effectively capture structural and relational information. Meanwhile, multilayer perceptron (MLP) blocks process the molecular fingerprint data. Together, these components work synergistically to enhance prediction accuracy and overall model performance. The output embeddings from these blocks were concatenated and passed through a predictor block to generate the final output. The models were trained with the Adam optimizer using specific sets of hyperparameters (choice of convolutional layer, hidden dimension, number of predictor layers, dropout).
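The data flow of the hybrid model (graph branch, fingerprint branch, concatenation, predictor) can be sketched without any deep learning library. This is only an illustration of the flow under strong simplifications: mean pooling followed by one dense layer stands in for the attention-based graph convolution blocks, all weights are hypothetical toy values, and none of the names come from the authors' repository.

```python
def linear(x, weights, bias):
    """Dense layer: y_j = sum_i W[j][i] * x_i + b_j."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def mean_pool(node_features):
    """Average per-atom feature vectors into one graph-level vector."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]


def hybrid_forward(node_features, fingerprint, params):
    # Graph branch: simplified stand-in for the graph convolution blocks.
    graph_emb = relu(linear(mean_pool(node_features), *params["graph"]))
    # Fingerprint branch: a single MLP layer over the fingerprint vector.
    fp_emb = relu(linear(fingerprint, *params["fp"]))
    # Concatenate both embeddings and map them to one scalar output.
    return linear(graph_emb + fp_emb, *params["predictor"])[0]


# Hypothetical toy inputs and weights, chosen only to make the flow traceable.
params = {
    "graph": ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),
    "fp": ([[0.5, 0.5, 0.5], [-1.0, 0.0, 0.0]], [0.0, 0.0]),
    "predictor": ([[0.25, 0.25, 0.25, 0.25]], [0.1]),
}
node_features = [[1.0, 0.0], [0.0, 1.0]]  # two atoms, two features each
fingerprint = [1.0, 0.0, 1.0]             # three fingerprint bits
prediction = hybrid_forward(node_features, fingerprint, params)
```

In a real model the output would be a scaled wavelength, and the weights would be learned with an optimizer such as Adam instead of being fixed by hand.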

Results and Discussion

GNN Models for the Prediction of Absorption Wavelengths

First, we investigated the effect of different combinations of graph featurizers and molecular fingerprints on the prediction of the maximum absorption wavelength. We temporarily ignored the effect of solvents by averaging the measurements for each chromophore across different solvents and used the architecture shown in Figure S1. The evaluation on the test set shows that the combination of the MolGraphConvFeaturizer (MGCF) with the Avalon fingerprint (AvalonFP) yields the best average predictive performance for the absorption wavelength (Figure 3) with an R² value of 0.9152. Although the numerical differences among fingerprints are modest, a one-sided paired t-test under matched conditions indicates that AvalonFP achieves significantly higher R² than all other molecular fingerprints evaluated (p < 0.05; Table S5). When the performance of different graph featurizers is examined, it is apparent that they all yield similarly strong performance. The average R² values associated with MolGraphConvFeaturizer, DMPNNFeaturizer (DMPNNF), and PagtnMolGraphFeaturizer (PMGF) are quite comparable, being 0.8711, 0.8605, and 0.8603, respectively, suggesting that while the choice of the graph featurizer is important, varying between these particular options does not drastically affect model performance.

Figure 3. R² values of GNN models for the prediction of absorption wavelength with different combinations of graph featurizers and molecular fingerprints.
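The paired comparison of fingerprints mentioned above rests on the paired t statistic, which can be sketched with the standard library alone. The R² values below are hypothetical and only illustrate the calculation; the actual matched conditions and results are those reported in Table S5.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for H1: mean(scores_a - scores_b) > 0."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical R² values of two fingerprints under the same matched conditions.
r2_fp_a = [0.915, 0.905, 0.900]
r2_fp_b = [0.905, 0.890, 0.895]
t_stat, dof = paired_t_statistic(r2_fp_a, r2_fp_b)
```

The one-sided p-value follows from the upper tail of Student's t distribution with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(a, b, alternative="greater")` computes the statistic and the p-value in one call.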

Regarding the performance of molecular fingerprints, the Avalon fingerprint notably outperforms other types, as indicated by an average R² value of 0.9011. This trend underscores the particular efficacy of the Avalon fingerprint in capturing relevant molecular substructures and features vital for accurate wavelength prediction. The Mordred descriptors (Mordred), along with the Layered, Pattern, and RDKit fingerprints (LayeredFP, PatternFP, and RDKitFP), show slightly lower precision than AvalonFP. Conversely, the Morgan circular fingerprint (MorganFP), atom-pairs fingerprint (APFP), and topological-torsion fingerprint (TTFP), while widely used for their robust molecular representations, exhibit higher errors compared to AvalonFP, indicating less precision in capturing molecular structural data for predicting absorption wavelengths.

It is important to note that GNN models that omit either the molecular graphs or the fingerprints perform worse than any combination of the two, as shown by their highest RMSE and lowest R² values. This observation underscores the critical role that detailed molecular representations play in achieving accurate and reliable predictions, highlighting the necessity of incorporating both molecular graphs and fingerprints into the modeling process.
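For reference, the two metrics used in these comparisons can be written out explicitly. The sketch below uses the standard definitions of RMSE and the coefficient of determination, which are assumed to match the evaluation metrics described in the Supporting Information; the wavelength values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same unit as the targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted absorption maxima in nm.
measured_nm = [400.0, 500.0, 600.0]
predicted_nm = [410.0, 490.0, 600.0]
error_nm = rmse(measured_nm, predicted_nm)
r2 = r2_score(measured_nm, predicted_nm)
```

Because R² is normalized by the variance of the measured values, a subset with a narrow wavelength range can show a low R² even when its RMSE is small, so the two metrics are best read together.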

Using the most effective combination of MGCF and AvalonFP, we further optimized our GNN architecture to improve the prediction of absorption wavelengths (see Table 1). After training the models for 100 epochs, we observed that the inclusion of a predictor block did not significantly affect the model performance for the test set, as indicated by R² values ranging from 0.9088 to 0.9151 (structures 1–4). Among the different convolutional layers tested, the TransformerConv layer, when used with a single predictor linear layer, provided the best predictions among the architectures (structure 6). Additionally, increasing the size of the model by changing the number of neurons did not consistently enhance the performance (structures 8–10). The optimized model utilizing TransformerConv was then trained for 500 epochs to achieve higher precision, resulting in an R² value of 0.9979 for the training set and 0.9220 for the test set (see Figure 4).

Table 1. Optimization of the GNN Model for the Prediction of Absorption Wavelengths

structure   predictor layers (a)   type of graph convolutional layer   hidden neurons (b)   R²
1           0                      GATv2Conv                           128                  0.9152
2           1                      GATv2Conv                           128                  0.9155
3           2                      GATv2Conv                           128                  0.9111
4           3                      GATv2Conv                           128                  0.9088
5           1                      GATConv                             128                  0.9085
6           1                      TransformerConv                     128                  0.9196
7           1                      GeneralConv                         128                  0.9123
8           1                      TransformerConv                     64                   0.9136
9           1                      TransformerConv                     256                  0.9054
10          1                      TransformerConv                     512                  0.9122
11 (c)      1                      TransformerConv                     128                  0.9220

(a) Number of linear layers in the predictor block.
(b) Number of hidden neurons in each convolutional and linear layer.
(c) The model was trained for 500 epochs.

Figure 4. Evaluation of the GNN model with MolGraphConvFeaturizer and Avalon fingerprint for prediction of maximum absorption wavelengths on the (a) train set and (b) test set.

We assessed the relative importance of graph- and fingerprint-derived information by linearly mixing their embeddings before the predictor and retraining our optimized model across mixture ratios of graph and fingerprint embeddings. On the solvent-agnostic absorption task, hybrid models consistently outperformed either input alone, with performance peaking at a 50:50 composition (R² = 0.9196, RMSE = 31.5 nm), indicating that graphs and fingerprints provide complementary signals (Figure S8). Our model achieves performance comparable to that of high-end machine learning approaches such as random forest (RF) and XGBoost (XGB), as well as state-of-the-art molecular property predictors including the directed message passing neural network (DMPNN) and the equivariant vector-scalar interactive graph neural network (ViSNet) (Figures S15–S17).
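One plausible reading of the linear mixing described above is a convex combination of the two embeddings, sketched below. The exact form used in the authors' code is not stated in the text, so the function name, the single `graph_weight` parameter, and the requirement of equal embedding lengths are all assumptions made for illustration.

```python
def mix_embeddings(graph_emb, fp_emb, graph_weight):
    """Return graph_weight * graph_emb + (1 - graph_weight) * fp_emb."""
    if not 0.0 <= graph_weight <= 1.0:
        raise ValueError("graph_weight must lie in [0, 1]")
    if len(graph_emb) != len(fp_emb):
        raise ValueError("embeddings must have the same length")
    return [graph_weight * g + (1.0 - graph_weight) * f
            for g, f in zip(graph_emb, fp_emb)]


# Hypothetical two-dimensional embeddings.
graph_emb = [1.0, 2.0]
fp_emb = [3.0, 4.0]
balanced = mix_embeddings(graph_emb, fp_emb, 0.5)    # 50:50 composition
graph_only = mix_embeddings(graph_emb, fp_emb, 1.0)  # fingerprint branch ignored
fp_only = mix_embeddings(graph_emb, fp_emb, 0.0)     # graph branch ignored
```

Sweeping `graph_weight` from 0 to 1 and retraining at each value would give a curve of test performance against composition, with the two end points corresponding to the single-input models.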

GNN Models for Prediction of Emission Wavelengths

The results indicate that predicting emission wavelengths is generally more challenging than predicting absorption wavelengths (Figure 5). Using AvalonFP, the best-performing fingerprint in our studies, we achieved an average RMSE of 44.79 nm, which is significantly higher than that of the absorption prediction models. The combination of the DMPNNFeaturizer with the Avalon fingerprint emerged as the most effective, yielding an RMSE of 44.02 nm. Further optimization of our GNN architecture resulted in a model that achieves an R² of 0.9986 for the training set, while the R² for the test set remains at 0.8269 (see Figure S14). Notably, the differences between the performance of different graph featurizers are subtle, indicating that the choice of the molecular fingerprint might play a more critical role in enhancing the prediction accuracy.

Figure 5. Evaluation of GNN models with Avalon fingerprint for prediction of maximum absorption and emission wavelengths: (a) RMSE and (b) R².

In alignment with the observations from absorption prediction, the Avalon fingerprint consistently remains the top performer, underscoring its robustness in capturing molecular features pertinent to the optical properties. As with absorption, GNN models that omit either the molecular graph or the fingerprint data show a decline in performance, emphasizing the necessity of integrating both elements for more reliable predictions. Overall, these findings highlight the critical role of both graph representations and molecular fingerprints in advancing the predictive accuracy of emission wavelengths.

Emission wavelengths are more difficult to predict than absorption because they depend on excited-state relaxation and solvent reorganization following photoexcitation. These effects are sensitive to excited-state geometry changes, intramolecular charge transfer, and vibronic coupling, which are only weakly encoded by ground-state molecular graphs and solvent fingerprints. Theoretical methods like TD-DFT show systematically larger errors for emission energies than for absorption, as confirmed by benchmarking studies.

Effect of Solvents on the GNN Model Performance

The analysis of our GNN models across different solvents reveals that the choice of solvent significantly impacts predictive performance for both absorption and emission wavelengths (Tables S12 and S13). Generally, the models achieve higher precision in methanol, chloroform, and dichloromethane, while solvents such as toluene, acetonitrile, and tetrahydrofuran (THF) show poorer performance. For absorption wavelength predictions, the best results are observed in methanol, with an RMSE of 35.72 nm and an R² of 0.9050, whereas the performance declines in acetonitrile, where the R² drops to 0.8236. Emission wavelength predictions are overall less accurate, with the highest performance again seen in methanol (R² = 0.8068) and a notable decrease in THF (R² = 0.7024). These variations suggest that the model's ability to predict absorption wavelengths is influenced by solvent-specific interactions that may not be fully captured by utilizing molecular features alone.

To further evaluate the contribution of solvent effects to the prediction of absorption wavelengths, we incorporated an additional MLP module specifically designed to process solvent fingerprints alongside molecular features of the chromophores. As shown in Figure 6, the inclusion of solvent fingerprints, together with the best combination of MGCF and AvalonFP for the chromophores, led to an improvement in the model performance across nearly all types of molecular fingerprints tested (with the exception of PatternFP). The most pronounced improvement was observed with RDKitFP, which achieved the highest R² of 0.9560. This result demonstrates that explicitly modeling solvent fingerprints with an independent MLP module can enhance absorption wavelength predictions by allowing the model to better capture solvent–chromophore interactions. The GCN models reported by Joung et al., which adopted a similar strategy for simulating chromophore-solvent interactions, also achieved a high predictive accuracy (R² = 0.93).

Figure 6. Evaluation of GNN models for the absorption wavelength prediction with different molecular fingerprints of solvents.

However, RDKit fingerprints did not improve the model performance for all solvents. For instance, only 7 out of the 10 common solvents in the data set showed an increase in R² when solvent features were included (see Figure 7). Entries with methanol (MeOH) showed the greatest accuracy when incorporating the solvent RDKit fingerprint (R² = 0.9884). Notably, dimethylformamide (DMF) exhibits the most significant improvement, with R² increasing from approximately 0.9391 (without the solvent feature) to 0.9751 (with RDKitFP), indicating a substantial gain in the predictive accuracy. Conversely, for solvents such as chloroform, tetrahydrofuran (THF), and cyclohexane, the R² values slightly decrease upon the incorporation of solvent fingerprints, implying that the additional solvent features may introduce noise rather than useful information. We hypothesize that these solvent-dependent outcomes are the result of the magnitude and nature of solvent–chromophore interactions. Protic or strongly polar, hydrogen-bonding solvents (e.g., MeOH, DMF) tend to induce larger solvatochromic shifts; their RDKit fingerprints also contain informative functional groups (e.g., hydroxyl, carbonyl), which the model can leverage. In contrast, nonpolar or structurally simple solvents (e.g., cyclohexane, chloroform, and toluene) generally produce weaker shifts and yield low-entropy fingerprints that add parameters without a commensurate signal, leading to mild overfitting and small performance drops. In summary, solvent features contribute valuable information that can enhance the model performance depending on the solvent, with the most significant benefit observed for DMF. However, this approach is not universally effective across all solvents, highlighting the importance of solvent-specific considerations in modeling.

Figure 7. Performance of RDKitFP for different solvents in the prediction of absorption wavelengths.

Solvent effects are not the sole factors influencing the model performance; prediction accuracy also varies substantially across different classes of chromophores (Figure 8). Coumarin (R² = 0.9732) and BODIPY (R² = 0.9704) compounds exhibit the highest predictive accuracies, likely due to their relatively consistent molecular frameworks and well-characterized optical properties. In contrast, pyrene, naphthalimide, and anthracene derivatives yield significantly lower predictive performance (R² = 0.7791, 0.8345, and 0.8447, respectively), suggesting that the models have greater difficulty generalizing to these more structurally diverse classes. These disparities underscore the need for class-specific modeling strategies to improve the performance for chemically diverse chromophores and emphasize that the predictive accuracy must be interpreted in the context of both structural complexity and representation within the training data set.

Figure 8. Absorption wavelength prediction results of different chromophore classes in the test data set using the hybrid GNN model with RDKitFP for solvents: (a) all chromophores, (b) anthracene, (c) BODIPY, (d) coumarin, (e) 1,8-naphthalimide, and (f) pyrene.

Further analysis of our data revealed that most outlier points correspond to chromophores containing multiple fluorescent groups. For example, a single extreme outlier in the anthracene subset (highlighted in Figure 8) has a measured maximum absorption wavelength of 867 nm, while the model predicts 613.15 nm (absolute error ≈254 nm). Its structure contains anthracene, pyrene, and a BODIPY fragment integrated into an expanded π-system. Within our “anthracene derivative” subset, the training distribution is concentrated below 700 nm; consequently, the model regresses toward the center of that distribution and underestimates this highly red-shifted example. We also note that the measurement is in toluene, one of the solvents for which our solvent-aware models show a lower overall R², potentially compounding the error. We retained these data points to illustrate the limitation of single-label class visualization when molecules contain multiple chromophoric motifs.

Conclusions

In this study, we demonstrated that integrating both molecular fingerprints and graph-based representations within a GNN architecture significantly improves the accuracy of predicting the optical properties of organic chromophores. Specifically, the hybrid model combining the MolGraphConvFeaturizer graph output and the Avalon fingerprint of the chromophores with the RDKit fingerprint of the solvents achieved an R² of 0.9560 for absorption wavelength predictions. Our results are competitive with recently reported machine learning approaches; however, direct cross-study comparisons should be interpreted cautiously given differences in the data set composition, preprocessing, and splitting protocols. While absorption wavelength predictions were notably successful, emission wavelength predictions remained more challenging. The consistently lower R² values for emission suggest that emission processes are governed by more complex, possibly excited-state-specific factors that are not fully captured by ground-state molecular features alone.

A key finding of this work is the critical role of solvent information in enhancing the predictive accuracy. Incorporating solvent fingerprints into the model architecture led to substantial improvements for many solvents, highlighting the importance of accounting for solvent–chromophore interactions. However, the benefit was not consistent across all solvents, indicating that solvent-specific modeling strategies may be necessary for optimal performance. Overall, our results underscore the potential of hybrid GNN models for optical property prediction and emphasize the importance of considering both the molecular structure and environmental context in model development.

Supplementary Material

ao5c08722_si_001.pdf (3.7MB, pdf)

Acknowledgments

We would like to thank Youth Promotion Science and Technology Center of the Ho Chi Minh City Communist Youth Union and the Department of Science and Technology of Ho Chi Minh City for their support of this project. We also thank Mr. Alexandre A. Schoepfer for his valuable comments, which helped improve the manuscript.

Glossary

Abbreviations

DFT: density functional theory
ML: machine learning
GNN: graph neural network
CNN: convolutional neural network
RNN: recurrent neural network
GCN: graph convolutional network
MLP: multilayer perceptron
TD-DFT: time-dependent density functional theory
RF: random forest
XGB: extreme gradient boosting
DMPNN: directed message passing neural network
ViSNet: equivariant vector-scalar interactive graph neural network

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c08722.

  • Data set description and preprocessing details; molecular graph featurizers and molecular fingerprints; hybrid GNN model architectures and training parameters; evaluation metrics; results of GNN models for absorption wavelength prediction; results of GNN models for emission wavelength prediction; comparison with RF, XGB, DMPNN, and ViSNet models; and solvent effect on the GNN model performance (PDF)

D.P.N. planned the research, conducted all model training and evaluation, and prepared the materials for writing the manuscript and Supporting Information. Q.M.L., H.T.P.T., and P.T.L. contributed to data processing and assisted in preparing the materials for the manuscript and Supporting Information. T.V.T.N. supervised the research, participated in drafting and finalizing the manuscript, and proofread the Supporting Information.

The study was supported by the Science and Technology Incubation Program for Youth (STIY), managed by the Youth Promotion Science and Technology Center of the Ho Chi Minh City Communist Youth Union and the Department of Science and Technology of Ho Chi Minh City, under contract number “28/2024/HĐ-KHCNT-VU”.

The authors declare no competing financial interest.

References

  1. Adamo C., Jacquemin D.. The Calculations of Excited-State Properties with Time-Dependent Density Functional Theory. Chem. Soc. Rev. 2013;42(3):845–856. doi: 10.1039/C2CS35394F. [DOI] [PubMed] [Google Scholar]
  2. Ramirez C. M., Greenop M., Ashton L., Rehman I.. Applications of Machine Learning in Spectroscopy. Appl. Spectrosc. Rev. 2021;56:733–763. doi: 10.1080/05704928.2020.1859525. [DOI] [Google Scholar]
  3. Ye Z.-R., Huang I. S., Chan Y.-T., Li Z.-J., Liao C.-C., Tsai H.-R., Hsieh M.-C., Chang C.-C., Tsai M.-K.. Predicting the Emission Wavelength of Organic Molecules Using a Combinatorial QSAR and Machine Learning Approach. RSC Adv. 2020;10(40):23834–23841. doi: 10.1039/D0RA05014H. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Mamede R., Pereira F., Aires-de-Sousa J.. Machine Learning Prediction of UV–Vis Spectra Features of Organic Compounds Related to Photoreactive Potential. Sci. Rep. 2021;11(1):23720. doi: 10.1038/s41598-021-03070-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Xia Y., Wang G., Lv Y., Shao C., Yang Z.. Prediction of Light Absorption Properties of Organic Dyes Using Machine Learning Technology. Chem. Phys. Lett. 2024;836:141030. doi: 10.1016/j.cplett.2023.141030. [DOI] [Google Scholar]
  6. Stanev V., Maehashi R., Ohta Y., Takeuchi I.. Machine Learning Modeling of the Absorption Properties of Azobenzene Molecules. Artif. Intell. Chem. 2023;1(1):100002. doi: 10.1016/j.aichem.2023.100002. [DOI] [Google Scholar]
  7. Qi Y., Hu D.-P., Jiang Y., Wu Z., Zheng M., Chen E., Liang Y., Sadi M., Zhang K., Chen Y.. Recent Progresses in Machine Learning Assisted Raman Spectroscopy. Adv. Opt. Mater. 2023;11(14):2203104. doi: 10.1002/adom.202203104. [DOI] [Google Scholar]
  8. Zhang W., Kasun L., Wang Q.-J., Zheng Y., Lin Z.. A Review of Machine Learning for Near-Infrared Spectroscopy. Sensors. 2022;22(24):9764. doi: 10.3390/s22249764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Yu H. Y., Muthiah B., Li S. C., Yu W. Y., Li Y. P.. Surface Characterization of Cerium Oxide Catalysts Using Deep Learning with Infrared Spectroscopy of CO. Mater. Today Sustain. 2023;24:100534. doi: 10.1016/j.mtsust.2023.100534. [DOI] [Google Scholar]
  10. Zhang D., Zhang H., Zhao Y., Chen Y., Ke C., Xu T., He Y.. A Brief Review of New Data Analysis Methods of Laser-Induced Breakdown Spectroscopy: Machine Learning. Appl. Spectrosc. Rev. 2022;57:89–111. doi: 10.1080/05704928.2020.1843175. [DOI] [Google Scholar]
  11. Ho Manh L., Chen V. C. P., Rosenberger J., Wang S., Yang Y., Schug K. A.. Prediction of Vacuum Ultraviolet/Ultraviolet Gas-Phase Absorption Spectra Using Molecular Feature Representations and Machine Learning. J. Chem. Inf. Model. 2024;64(14):5547–5556. doi: 10.1021/acs.jcim.4c00676. [DOI] [PubMed] [Google Scholar]
  12. Li S.-C., Wu H., Menon A., Spiekermann K. A., Li Y.-P., Green W. H.. When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties? J. Am. Chem. Soc. 2024;146(33):23103–23120. doi: 10.1021/jacs.4c04670. [DOI] [PubMed] [Google Scholar]
  13. Shao J., Liu Y., Yan J., Yan Z.-Y., Wu Y., Ru Z., Liao J.-Y., Miao X., Qian L. Prediction of Maximum Absorption Wavelength Using Deep Neural Networks. J. Chem. Inf. Model. 2022;62(6):1368–1375. doi: 10.1021/acs.jcim.1c01449.
  14. Muthiah B., Li S.-C., Li Y.-P. Developing Machine Learning Models for Accurate Prediction of Radiative Efficiency of Greenhouse Gases. J. Taiwan Inst. Chem. Eng. 2023;151:105123. doi: 10.1016/j.jtice.2023.105123.
  15. Mahato K. D., Kumar Das S. S. G., Azad C., Kumar U. Machine Learning Based Hybrid Ensemble Models for Prediction of Organic Dyes Photophysical Properties: Absorption Wavelengths, Emission Wavelengths, and Quantum Yields. APL Mach. Learn. 2024;2(1):016101. doi: 10.1063/5.0181294.
  16. Souza R. C., Duarte J. C., Goldschmidt R. R., Borges I. Jr. Predicting Fluorescence Emission Wavelengths and Quantum Yields via Machine Learning. J. Chem. Inf. Model. 2025;65(7):3270–3281. doi: 10.1021/acs.jcim.4c02403.
  17. Jung S. G., Jung G., Cole J. M. Automatic Prediction of Peak Optical Absorption Wavelengths in Molecules Using Convolutional Neural Networks. J. Chem. Inf. Model. 2024;64(5):1486–1501. doi: 10.1021/acs.jcim.3c01792.
  18. Urbina F., Batra K., Luebke K. J., White J. D., Matsiev D., Olson L. L., Malerich J. P., Hupcey M. A. Z., Madrid P. B., Ekins S. UV-adVISor: Attention-Based Recurrent Neural Networks to Predict UV–Vis Spectra. Anal. Chem. 2021;93(48):16076–16085. doi: 10.1021/acs.analchem.1c03741.
  19. Ochiai H., Kaneko H. Construction of Machine Learning Models to Predict the Maximum Absorption Wavelength Considering the Solute and Solvent and Inverse Analysis of the Models. ACS Omega. 2025;10(1):665–672. doi: 10.1021/acsomega.4c07490.
  20. Ksenofontov A. A., Lukanov M. M., Bocharov P. S. Can Machine Learning Methods Accurately Predict the Molar Absorption Coefficient of Different Classes of Dyes? Spectrochim. Acta Part A Mol. Spectrosc. 2022;279:121442. doi: 10.1016/j.saa.2022.121442.
  21. Ju C.-W., Bai H., Li B., Liu R. Machine Learning Enables Highly Accurate Predictions of Photophysical Properties of Organic Fluorescent Materials: Emission Wavelengths and Quantum Yields. J. Chem. Inf. Model. 2021;61(3):1053–1065. doi: 10.1021/acs.jcim.0c01203.
  22. Tannir S., Pan Y., Josephs N., Cunningham C., Hendrick N. R., Beckett A., McNeely J., Beeler A., Jeffries-El M., Kolaczyk E. D. Predicting Emission Wavelengths in Benzobisoxazole-Based OLEDs with Gradient Boosted Ensemble Models. J. Phys. Chem. A. 2024;128(30):6116–6123. doi: 10.1021/acs.jpca.4c00077.
  23. Wieder O., Kohlbacher S., Kuenemann M., Garon A., Ducrot P., Seidel T., Langer T. A Compact Review of Molecular Property Prediction with Graph Neural Networks. Drug Discovery Today: Technol. 2020;37:1–12. doi: 10.1016/j.ddtec.2020.11.009.
  24. Reiser P., Neubert M., Eberhard A., Torresi L., Zhou C., Shao C., Metni H., van Hoesel C., Schopmans H., Sommer T., Friederich P. Graph Neural Networks for Materials Science and Chemistry. Commun. Mater. 2022;3(1):93. doi: 10.1038/s43246-022-00315-6.
  25. Shi X., Zhou L., Huang Y., Wu Y., Hong Z. A Review on the Applications of Graph Neural Networks in Materials Science at the Atomic Scale. MGE Adv. 2024;2(2):e50. doi: 10.1002/mgea.50.
  26. Kotobi A., Singh K., Hoche D., Bari S., Meißner R. H., Bande A. Integrating Explainability into Graph Neural Network Models for the Prediction of X-ray Absorption Spectra. J. Am. Chem. Soc. 2023;145(41):22584–22598. doi: 10.1021/jacs.3c07513.
  27. Greenman K. P., Green W. H., Gómez-Bombarelli R. Multi-Fidelity Prediction of Molecular Optical Peaks with Deep Learning. Chem. Sci. 2022;13(4):1152–1162. doi: 10.1039/D1SC05677H.
  28. Joung J. F., Han M., Hwang J., Jeong M., Choi D. H., Park S. Deep Learning Optical Spectroscopy Based on Experimental Database: Potential Applications to Molecular Design. JACS Au. 2021;1(4):427–438. doi: 10.1021/jacsau.1c00035.
  29. Ouyang Y., Zeng Y., Liu X. Explainable Encoder–Prediction–Reconstruction Framework for the Prediction of Metasurface Absorption Spectra. Nanomaterials. 2024;14(18):1497. doi: 10.3390/nano14181497.
  30. Fan J., Qian C., Zhou S. Machine Learning Spectroscopy Using a 2-Stage, Generalized Constituent Contribution Protocol. Research. 2023;6:0115. doi: 10.34133/research.0115.
  31. Joung J. F., Han M., Jeong M., Park S. Experimental Database of Optical Properties of Organic Compounds. Sci. Data. 2020;7(1):295. doi: 10.1038/s41597-020-00634-8.
  32. Wang Y., Wang T., Li S., He X., Li M., Wang Z., Zheng N., Shao B., Liu T.-Y. Enhancing Geometric Representations for Molecules with Equivariant Vector-Scalar Interactive Message Passing. Nat. Commun. 2024;15(1):313. doi: 10.1038/s41467-023-43720-2.
  33. Petrusevich E. F., Bousquet M. H. E., Ośmiałowski B., Jacquemin D., Luis J. M., Zaleśny R. Cost-Effective Simulations of Vibrationally-Resolved Absorption Spectra of Fluorophores with Machine-Learning-Based Inhomogeneous Broadening. J. Chem. Theory Comput. 2023;19(8):2304–2315. doi: 10.1021/acs.jctc.2c01285.
  34. Suellen C., Freitas R. G., Loos P.-F., Jacquemin D. Cross-Comparisons between Experiment, TD-DFT, CC, and ADC for Transition Energies. J. Chem. Theory Comput. 2019;15(8):4581–4590. doi: 10.1021/acs.jctc.9b00446.
  35. Santoro F., Jacquemin D. Going Beyond the Vertical Approximation with Time-Dependent Density Functional Theory. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2016;6(5):460–486. doi: 10.1002/wcms.1260.
  36. Buncel E., Rajagopal S. Solvatochromism and Solvent Polarity Scales. Acc. Chem. Res. 1990;23(7):226–231. doi: 10.1021/ar00175a004.
  37. A preprint version of this manuscript appeared previously on ChemRxiv: Nguyen D. P., Le Q. M., Tran H. T. P., Le P. T., Nguyen T. V. T. Enhanced Prediction of Absorption and Emission Wavelengths of Organic Compounds through Hybrid Graph Neural Network Architectures. ChemRxiv. 2025. doi: 10.26434/chemrxiv-2025-rcwxh. This content is a preprint and has not been peer-reviewed.

Associated Data


Supplementary Materials

ao5c08722_si_001.pdf (3.7MB, pdf)

Data Availability Statement

All of the code and data necessary to reproduce this work are publicly available. The full repository, which contains the processed data sets and the notebooks used for data preprocessing and for training and evaluating the GNN models, is hosted at https://github.com/phatdatnguyen/hybrid-GNN-optical-property-prediction. An archival snapshot corresponding to the version used for the manuscript is available at doi: 10.5281/zenodo.17112437.


Articles from ACS Omega are provided here courtesy of American Chemical Society
