Abstract

Excitation of a singlet fission (SF) material with one photon generates two triplet excitons, which can double the solar cell efficiency. SF molecules are therefore regarded as new-generation organic photovoltaics, but they are hard to identify. Recently, it was demonstrated that molecules of low-to-intermediate diradical character (DRC) are potential SF chromophores. This prompts a low-cost strategy for finding new SF candidates by computational high-throughput workflows. We propose a machine learning aided screening for SF entrants based on their DRC. Our data set comprises 469 784 compounds extracted from the PubChem database, structurally rich but inherently imbalanced regarding DRC values. We developed well-performing classification models that can retrieve potential SF chromophores. The latter (∼4%) were analyzed by K-means clustering to reveal qualitative structure–property relationships and to extract strategies for molecular design. The developed screening procedure and data set can be easily adapted for applications of diradicaloids in photonics and spintronics.
Commercially available solar cells mainly use silicon as the photoactive material, which transforms one photon into two free charge carriers.1 The disadvantage of silicon-based photovoltaics is the detailed-balance limit,2 which dictates that at most 33% of the incoming solar energy can be converted to electricity. In 2010 Michl et al. demonstrated that the detailed-balance limit can be overcome by exploiting singlet fission (SF), a photophysical phenomenon observed in some organic chromophores.3,4 In SF, a chromophore in a singlet excited state interacts with a neighboring chromophore in the ground state, producing two triplet excitons and thus generating four free charge carriers. Such an increase in the number of free charge carriers per photon can double the solar cell efficiency. At the molecular level, the SF propensity of a chromophore is determined by the energy differences between its first triplet (T1), first singlet (S1), and second triplet excited states (T2). SF occurs spontaneously if the molecule satisfies the following thermodynamic feasibility conditions: 2E(T1) – E(S1) ≤ 0 and 2E(T1) – E(T2) ≤ 0, where E is the excitation energy to the respective excited state.3,4 Recently, endothermic SF has also been observed experimentally in cases where 2E(T1) – E(S1) is slightly positive (∼0.2 eV).5 However, most existing molecules possess a relatively high-energy T1, and the feasibility conditions remain unfulfilled even under the slightly endothermic criterion. The hunt for molecules capable of SF is further complicated by the requirements for any photoactive solar cell material, namely, an absorption maximum at about 2 eV and air/moisture stability.3 Moreover, since the thermal losses in SF are proportional to |2E(T1) – E(S1)|, this energy difference should be slightly negative and close to zero.3
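The feasibility conditions above can be written as a short check. The following is a minimal sketch, not part of the original workflow: the function name, the 0.2 eV default tolerance for the endothermic case, and the choice to require the T2 condition in every case are assumptions made for illustration, and the energies in the usage lines are made-up numbers rather than literature values.

```python
def sf_feasibility(e_s1, e_t1, e_t2, endothermic_tol=0.2):
    """Classify a chromophore by the SF energy conditions (all energies in eV).

    endothermic_tol is an assumed tolerance for slightly endothermic SF.
    """
    d_s1 = 2.0 * e_t1 - e_s1  # 2E(T1) - E(S1); thermal losses scale with |d_s1|
    d_t2 = 2.0 * e_t1 - e_t2  # 2E(T1) - E(T2)
    if d_t2 > 0.0:
        # Assumption: the T2 condition is required in all cases.
        return "unfavorable"
    if d_s1 <= 0.0:
        return "exothermic"
    if d_s1 <= endothermic_tol:
        return "endothermic"
    return "unfavorable"


# Illustrative (hypothetical) energies in eV: E(S1), E(T1), E(T2)
case_exo = sf_feasibility(2.0, 0.9, 2.5)
case_endo = sf_feasibility(3.1, 1.6, 3.4)
case_bad = sf_feasibility(3.0, 2.0, 4.5)
```

Because the thermal loss grows with |2E(T1) – E(S1)|, a ranking step could additionally sort the "exothermic" hits by how close this difference is to zero.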
Recently, Nakano and co-workers demonstrated that among organic compounds of low-to-intermediate diradical character (DRC), good candidates can be found that simultaneously satisfy the feasibility conditions and minimize the thermal losses in SF.6,7 Therefore, nowadays, DRC is used as a key quantity in the design of SF based photovoltaic materials. DRC can be defined in the framework of the multiconfigurational self-consistent field theory as twice the weight of the doubly excited configuration in the ground state,8 and it is not a directly experimentally observable quantity.9 DRC is a measure of the open-shell character of the molecules and is related to the energy of T1. DRC varies between 0 for closed-shell systems and 1 for ideal diradicals, while the intermediate values correspond to diradicaloids. A classic example for the impact of DRC on the SF propensity can be found in the acenes family. The DRC of anthracene is low,9 and it undergoes endothermic SF.10 For pentacene, with DRC approaching the intermediate region, the process is exothermic, and this large conjugated hydrocarbon is among the most successful SF materials reported so far.11
In brief, the SF chromophores are rare and simultaneously precious treasures that can boost the development of next-generation solar cell technologies. Therefore, we urgently need efficient, computationally inexpensive screening procedures and molecular design strategies for the discovery of new SF chromophores. The bottleneck in the quantum-chemical calculations of SF chromophores is the estimation of the excited-state energies.12 The standard time-dependent density functional theory13 (TD-DFT) approach is questionable for molecules with low to intermediate DRC because of spin contamination14,15 and triplet instability16,17 problems. The recently developed spin-flip formulation of the method18,19 and the Tamm–Dancoff approximation20 (TDA) can overcome these problems. However, all TD-DFT approaches are sensitive to the choice of functional, which imposes extensive benchmarking. The alternative approach is to estimate the excitation energies by high-level multiconfigurational methods like CASPT2 and RASPT2.21−23 However, the CASPT2/RASPT2 excited-state calculations are resource-consuming, require human-inspected input (active space selection), and are limited to molecules with up to 16/22 π-electrons. Fortunately, the DRC is also associated with the SF propensity, and it can be easily calculated by using a combination of Yamaguchi’s spin-projection Hartree–Fock scheme8 (broken symmetry solution) and natural orbitals.24,25 Therefore, an obvious solution to the high-throughput screening problem is to use the DRC as a qualitative criterion for extraction of potential SF chromophores from existing databases.
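The text does not spell out the working formula for the DRC. A minimal sketch of the commonly quoted spin-projected expression is given below, assuming the DRC y is obtained from the occupation numbers of the highest occupied and lowest unoccupied natural orbitals (HONO and LUNO) as y = 1 − 2T/(1 + T²) with T = (n_HONO − n_LUNO)/2. The function name and the example occupations are hypothetical.

```python
def yamaguchi_drc(n_hono, n_luno):
    """Spin-projected diradical character from natural orbital occupations.

    Assumed form: y = 1 - 2T / (1 + T^2), T = (n_HONO - n_LUNO) / 2.
    """
    t = 0.5 * (n_hono - n_luno)  # orbital overlap between the two frontier orbitals
    return 1.0 - 2.0 * t / (1.0 + t * t)


y_closed_shell = yamaguchi_drc(2.0, 0.0)   # doubly occupied HONO, empty LUNO
y_pure_diradical = yamaguchi_drc(1.0, 1.0)  # one electron in each orbital
y_intermediate = yamaguchi_drc(1.6, 0.4)    # hypothetical diradicaloid occupations
```

The two limits reproduce the range stated earlier: 0 for a closed-shell system and 1 for an ideal diradical, with diradicaloids in between.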
So far, there have been several studies on the high-throughput screening of large databases for potential SF chromophores. The trend started in 2019, when Perkinson et al. employed a high-throughput procedure to screen 4482 anthracene derivatives and selected 88 (2%) as potential SF candidates.26 In the same year, Padula et al. reported a TD-DFT based screening of the Cambridge Structural Database (CSD) for potential SF materials and found “few needles in a haystack”, namely, 262 (0.7%) SF candidates out of 40K compounds.27 As a continuation of this work, in 2021 the same group performed another large-scale computational study which strongly supported the positive relationship between SF propensity and multiple DRC in the selected 262 candidates.28 In this work, DRC was calculated with the resource-demanding CASSCF method. Meanwhile, the group of Corminboeuf demonstrated a new strategy for the design of intermolecular SF materials and screened existing copolymer materials by using high-throughput TDA calculations of donor–acceptor dimers.29,30 Recently, Lopez-Carballeira et al. also applied a TD-DFT based screening protocol for finding new SF materials in the CSD and reported that only 254 molecules (0.87%) match the feasibility conditions, of which only 24 (0.08%) are of practical concern.31
Nowadays, we live in the data science era and new evidence is collected daily for the effectiveness of machine learning (ML) algorithms in the discovery of new advanced materials.32−39 However, scientific papers combining ML and SF are scarce. In 2019, Schröder et al.40 used ML to explore the quantum dynamics in pentacene dimers. Later in 2021, Ma and co-workers41 developed a general transferable multilevel attention neural network for the prediction of properties like the energy of the HOMO and tested it with the 262 SF candidates of Troisi et al.27 The authors reported that the predictive power of their approach is significantly decreased for the SF data set. Again in 2021, Zhu et al. optimized deep neural networks for the prediction of excitation energies in SF anthracene-based candidates.42 The following three investigations are from 2022. Weber and co-workers used quantum chemical calculations and ML approaches to explore design rules for singlet fission in 4 million indigoid derivatives.43 Walsh et al. combined an SF data set and ML to successfully calibrate a high-throughput technique (extended tight binding based simplified TDA approximation) against a higher accuracy one like TD-DFT.44 Marom and co-workers employed machine-learning algorithms to generate computationally efficient models that can predict the many-body perturbation theory thermodynamic driving force for SF in a data set of 101 polycyclic aromatic hydrocarbons (PAHs).45
Here, we present an ML-based screening procedure for the recognition of potential SF materials from general-purpose chemical databases like PubChem46 depending on their DRC. We deployed binary classification ML algorithms, namely, a class-weighted support vector machine (SVM) and a cost-sensitive decision tree (DT), to build models that can successfully select prominent SF candidates despite the imbalanced nature of the data set. Molecules meeting the following criteria are extracted from the PubChem database:46 they contain between 5 and 28 non-hydrogen atoms (B, C, Si, N, O, S, Se, and F), have a molecular mass up to 350 Da, and have a low rotatable bond count. The feature space covers chemometrics descriptors plus quantum-chemical descriptors for excited states obtained with the recently implemented semiempirical CASSCF method.47 The “observable”, DRC, is obtained with Yamaguchi’s spin-projection scheme.8 The 17 759 preselected potential SF candidates with appropriate E(S1) were subjected to cluster analysis and chemical classification for derivation of structure–property relationships.
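The extraction criteria listed above amount to a simple structural prefilter. The sketch below is only an illustration under stated assumptions: the function and argument names are hypothetical, and the text gives no numerical cutoff for a "low" rotatable bond count, so the default of 3 is a placeholder rather than the value used in this work.

```python
ALLOWED_HEAVY = {"B", "C", "Si", "N", "O", "S", "Se", "F"}


def passes_prefilter(elements, mass_da, n_rot_bonds, max_rot_bonds=3):
    """Return True if a molecule meets the size, element, and mass criteria.

    elements: list of element symbols, one per atom (hydrogens included).
    max_rot_bonds: placeholder threshold, not taken from the paper.
    """
    heavy = [el for el in elements if el != "H"]
    return (
        5 <= len(heavy) <= 28            # 5 to 28 non-hydrogen atoms
        and set(heavy) <= ALLOWED_HEAVY  # only the listed heavy elements
        and mass_da <= 350.0             # molecular mass up to 350 Da
        and n_rot_bonds <= max_rot_bonds
    )


# Anthracene, C14H10, about 178.2 Da, no rotatable bonds
anthracene_ok = passes_prefilter(["C"] * 14 + ["H"] * 10, 178.2, 0)
# Chlorobenzene is rejected because Cl is not in the allowed element list
chlorobenzene_ok = passes_prefilter(["C"] * 6 + ["H"] * 5 + ["Cl"], 112.6, 0)
```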
An overview of the structure and properties of the 469 784 compounds included in the data set is given by the histograms of key quantum-chemical and chemometrics descriptors (Figure 1). Most entries contain about 10 carbon atoms. The other most abundant elements in the data set are oxygen and nitrogen: generally, up to 3 oxygens and up to 3 nitrogens per compound. About 210 000 of all molecules have no aromatic bonds. These are either compounds with zero DRC, which are a priori not suitable for SF, or nonaromatic, antiaromatic, polyenic, and quinoid structures, which in principle can possess nonzero DRC and serve as SF materials. The rest of the compounds have between 5 and 35 aromatic bonds and are consequently benzene derivatives and polycyclic systems. The latter is consistent with the most abundant ring counts of 2–4.
Figure 1.
Histograms representing the distribution of DRC and key descriptors in the whole data set of 469 784 compounds (green), as well as in the diradicaloids in DS05 (orange) and in DS13 (blue).
The histogram in Figure 1 shows that the whole data set embraces molecules with DRC between 0.00 and 1.00 but only 318 compounds possess DRC greater than 0.60. To define a binary classification problem, we must divide the whole data set into two classes. Class 0 is reserved for closed-shell molecules without DRC, which are most probably unsuitable for SF. Class 1 includes diradicaloids with nonzero DRC, which are possible SF candidates. To separate the classes, we used two DRC thresholds, namely, ≥0.05 and ≥0.13. These give rise to two differently imbalanced data sets, denoted as DS05 and DS13, respectively, used to train our models. The choice of the thresholds is motivated as follows. In DS05, all molecules with DRC below 0.05 can be regarded as closed-shell systems belonging to Class 0. Accordingly, DS05 comprises 384 835 (81.92%) compounds with zero DRC (Class 0) and 84 949 (18.08%) molecules with DRC ≥ 0.05 (Class 1). In DS13 the threshold is tighter, and it is set with respect to the anthracene DRC = 0.13, obtained with our computational protocol (SI, Table S3). DS13 contains 450 122 (95.82%) compounds with zero DRC (Class 0) and 19 662 (4.18%) molecules with DRC ≥ 0.13 (Class 1). It is obvious that DS05 and DS13 are imbalanced. As pointed out in the introduction, this imbalance is an intrinsic particularity of the problem since chromophores with nonzero DRC and SF propensity are known to be scarce. Therefore, both the formulation of the problem and the data set analysis reveal that one should consider the class disparity in the development of successful binary classification models for finding new SF chromophores from general-purpose databases.
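The two labelings can be illustrated with a few lines of code. This is a minimal sketch with made-up DRC values; the function name is hypothetical, and only the two thresholds (0.05 and 0.13) come from the text.

```python
def label_by_drc(drc_values, threshold):
    """Binary labels: 1 for DRC >= threshold (diradicaloid), else 0."""
    return [1 if drc >= threshold else 0 for drc in drc_values]


# Hypothetical DRC values for six molecules
drc = [0.00, 0.03, 0.05, 0.10, 0.13, 0.45]

labels_ds05 = label_by_drc(drc, 0.05)  # DS05-style labeling
labels_ds13 = label_by_drc(drc, 0.13)  # DS13-style labeling

# Fraction of Class 1 under each threshold: the tighter threshold is more imbalanced
share_ds05 = sum(labels_ds05) / len(drc)
share_ds13 = sum(labels_ds13) / len(drc)
```

The same molecules yield a smaller Class 1 under the 0.13 threshold, which is why DS13 is the more imbalanced of the two data sets.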
The S1 and T1 energies of the compounds in the data set span the wide range of 0–12 eV with maxima in the E(S1) and E(T1) distribution around 3 and 2 eV, respectively. Here, it is important to note that the computational approach that we use to extract qualitative trends is expected to overestimate E(S1) and E(T1) (SI, Table S3).48 More accurate values for E(S1) can be obtained when the configurational and active spaces in the INDO/S CASSCF (2,2) calculations are increased and are selected manually. Nevertheless, even at this relatively low theoretical level, we can find compounds that possess E(S1) in the UV–visible range (1.5–3 eV) and are therefore potential SF chromophores for practical photovoltaic application.
A comparison of the ML models as a function of the data set threshold and the models’ hyperparameters is presented in Table 1 and Figure 2. The performance of the methods is judged based on the value of PAM (polygon area metric),49 which combines six metrics suitable for measuring the performance of models trained with imbalanced data sets (SI, Sections 2.4–2.5). In all cases the SVM outperforms the DT and the model performance is almost insensitive to the input quantum chemical descriptor, E(T1) or E(S1). The latter is expected since the histograms of these two quantities show qualitatively identical patterns. At first sight, the SVM model with 1:1 class weight (unweighted) for DS05 seems to perform relatively well, giving PAM 70.92%, but a careful examination of the metrics reveals that it has lower sensitivity and is therefore unsuitable for finding the rare molecules with nonzero DRC (numerous false negatives). For the more imbalanced DS13 the sensitivity is even lower. The SVM models are improved when the class weights for each data set are optimized by a grid search. Further optimization of the SVM hyperparameters improves the performance (SI, Section 2.4). The best SVM models for DS05 and DS13 have PAM equal to 75.63% and 74.67%, respectively.
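The class-weight and hyperparameter search described above can be sketched with scikit-learn. This is an illustration under several assumptions rather than the authors' training script: the data are synthetic (generated with `make_classification`), the grid is a small subset built from the values that appear in Table 1, balanced accuracy stands in for PAM as the selection score, and `class_weight="balanced"` is used as one possible cost-sensitive setting for the tree.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced stand-in for the descriptor table (about 10% Class 1)
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Class-weighted RBF SVM, tuned by 4-fold cross validation on the training set
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={
        "class_weight": [{0: 1.0, 1: w} for w in (1.0, 1.5, 2.0)],
        "gamma": [0.05, 0.1],
        "C": [1.0, 2.0, 5.0],
    },
    scoring="balanced_accuracy",  # stand-in score; the paper selects by PAM
    cv=4,
)
grid.fit(X_train, y_train)

# Cost-sensitive decision tree with a depth limit (MD in Table 1)
tree = DecisionTreeClassifier(max_depth=10, class_weight="balanced",
                              random_state=0)
tree.fit(X_train, y_train)

# Sensitivity (recall of Class 1) is the metric most affected by imbalance
svm_sensitivity = recall_score(y_test, grid.predict(X_test))
tree_sensitivity = recall_score(y_test, tree.predict(X_test))
```

Stratified splitting keeps the rare class present in both the training and the test set, which matters when Class 1 makes up only a few percent of the data.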
Table 1. Performance of the ML Models [%] (SI, Section 2.5) on the Test Set as a Function of the Data Set Threshold, Hyperparametersᵃ, and Input Quantum Chemical Descriptors (E(T1) or E(S1))ᵇ
| Model | SVM (S1) | SVM (S1) | SVM (S1) | SVM (S1) | SVM (S1) | SVM (S1) | SVM (T1) | SVM (T1) | DT (S1) | DT (S1) | DT (T1) | DT (T1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data set | DS05 | DS13 | DS05 | DS05 | DS13 | DS13 | DS05 | DS13 | DS05 | DS13 | DS05 | DS13 |
| class weights | 1:1 | 1:1 | 1:1.5 | 1:1.5 | 1:2 | 1:2 | 1:1.5 | 1:2 | | | | |
| gamma | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.05 | 0.1 | 0.05 | | | | |
| C | 1.0 | 1.0 | 1.0 | 5.0 | 1.0 | 2.0 | 5.0 | 2.0 | | | | |
| MD | | | | | | | | | 14 | 10 | 14 | 10 |
| Metric [%] | | | | | | | | | | | | |
| PAM | 70.92 | 65.53 | 74.00 | 75.63 | 72.05 | 74.67 | 75.25 | 73.29 | 64.85 | 64.34 | 64.58 | 65.41 |
| CA | 93.70 | 98.23 | 93.68 | 94.14 | 98.30 | 98.36 | 94.09 | 98.28 | 91.83 | 97.91 | 91.82 | 98.01 |
| SE | 77.27 | 68.39 | 83.67 | 84.76 | 78.48 | 82.24 | 84.25 | 80.83 | 73.43 | 69.8 | 72.93 | 70.53 |
| SP | 97.33 | 99.54 | 95.89 | 96.22 | 99.17 | 99.07 | 96.27 | 99.05 | 95.9 | 99.15 | 96.00 | 99.22 |
| AUC | 87.30 | 83.96 | 89.78 | 90.49 | 88.82 | 90.65 | 90.26 | 89.94 | 84.66 | 84.47 | 84.46 | 84.88 |
| JI | 68.95 | 61.89 | 70.57 | 72.38 | 66.00 | 67.85 | 72.07 | 66.4 | 61.95 | 58.44 | 61.75 | 59.9 |
| FM | 70.92 | 65.53 | 74.00 | 83.98 | 72.05 | 80.84 | 83.77 | 79.81 | 76.50 | 73.77 | 76.35 | 74.92 |
ᵃ Class weights; gamma: RBF-kernel width; C: cost parameter; MD: max depth.
ᵇ Metrics: polygon area metric (PAM), classification accuracy (CA), sensitivity (SE), specificity (SP), area under curve (AUC), Jaccard index (JI), and F-measure (FM). The underlined values of the hyperparameters are optimized by 4-fold cross validation over the training set (SI, Section 2.4).
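The metrics in Table 1 can be reproduced from a confusion matrix and an AUC value. The sketch below assumes the usual definitions of the five count-based metrics for the positive class (Class 1) and assumes that PAM is the area of the polygon spanned by the six metrics on the axes of a regular hexagon, taken in the order CA, SE, SP, AUC, JI, FM and normalized by the area of the unit hexagon. The function names and the toy counts are hypothetical; the exact definitions used in this work are given in the SI, Section 2.5.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """CA, SE, SP, JI, and FM (as fractions) for the positive class."""
    return {
        "CA": (tp + tn) / (tp + fp + tn + fn),  # classification accuracy
        "SE": tp / (tp + fn),                   # sensitivity (recall)
        "SP": tn / (tn + fp),                   # specificity
        "JI": tp / (tp + fp + fn),              # Jaccard index
        "FM": 2 * tp / (2 * tp + fp + fn),      # F-measure (F1)
    }


def polygon_area_metric(ca, se, sp, auc, ji, fm):
    """Polygon area over a regular hexagon, normalized to the range [0, 1]."""
    r = [ca, se, sp, auc, ji, fm]  # assumed vertex order
    # Sum of the six triangles between neighbouring axes (60 degrees apart)
    area = 0.5 * math.sin(math.pi / 3) * sum(
        r[i] * r[(i + 1) % 6] for i in range(6))
    hexagon_area = 1.5 * math.sqrt(3)  # all six metrics equal to 1
    return area / hexagon_area


# Toy confusion matrix (hypothetical counts)
toy = confusion_metrics(tp=80, fp=20, tn=880, fn=20)

# Metrics of the best DS05 SVM model from Table 1, as fractions
pam_best_ds05 = polygon_area_metric(0.9414, 0.8476, 0.9622,
                                    0.9049, 0.7238, 0.8398)
```

With these definitions, JI and FM are linked by FM = 2·JI/(1 + JI); for the best DS05 SVM model, JI = 72.38% corresponds to FM = 83.98%, and the six tabulated metrics give a PAM of about 75.6%, in line with the tabulated value.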
Figure 2.
Graphical representation of the performance of the ML models on the test set as a function of the data set threshold (DS13 labels are framed), hyperparameters and input quantum chemical descriptors (either E(T1) or E(S1) as they correlate). The underlined values of the hyperparameters are optimized by 4-fold cross validation over the training set (SI, Section 2.4).
From a molecular design perspective, it is interesting to analyze the structure and properties of the molecules with nonzero DRC. This is done by comparing the descriptor histograms for the whole data set with those for the subsets of molecules with DRC ≥ 0.05 and DRC ≥ 0.13 (Figure 1). Regarding the number of heavy atoms, we observe a slight shift of the maximum to a higher number of heavy atoms for molecules with nonzero DRC. Such behavior is expected since in principle DRC is proportional to the heavy atom content in conjugated systems. The trend is more pronounced when looking into the carbon atom content. For this quantity, we observe a shift of the peak toward higher values when going from the entire data set to DS13. A clear-cut confirmation of the DRC dependence on the conjugation length can be found in the distributions of the fraction of sp2 and sp3 carbons. In the whole data set the sp2 fraction has its maximum at 0 but it peaks beyond 5 for the subsets with nonzero DRC. The situation with the sp3 fraction is reversed.
One of the most important requirements to the SF chromophores is to absorb in the UV–vis region and to possess a relatively low energy E(T1). When comparing the excitation energies histograms (Figure 1), we find a gradual shift of the E(S1) and E(T1) distributions toward lower values with DRC growth. In the whole data set the E(S1) distribution has a maximum at above 4 eV, while in DS05 and DS13 the peak is located below 4 eV. Such a decrease in the excitation energies and their shift toward the UV–vis region agrees with the structure of the compounds with nonzero DRC, which, as discussed above, is characterized with better π-conjugation.
Finally, we performed K-means clustering to gain deeper insight into the structure–property relationships of the most promising SF candidates—those with DRC ≥ 0.13 and E(S1) in the 1.5–3.0 eV region, ideally around 2 eV. Since with the small active and configurational space E(S1) is expected to be overestimated (SI, Table S3), we explore the energy range between 2.0–4.1 eV, where 4.1 eV is the upper limit imposed by E(S1) of anthracene and 2.0 is the value 1.5 eV incremented to account for the error of the computed E(S1) for anthracene. Among all 469 784 compounds, only 17 759 satisfy the imposed criteria, and they represent 90.32% of all molecules with DRC ≥ 0.13.
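The selection window described above can be expressed as a simple filter. The sketch below uses the thresholds and counts quoted in the text; the function name and the three example records are hypothetical.

```python
def is_sf_candidate(drc, e_s1, drc_min=0.13, e_min=2.0, e_max=4.1):
    """DRC at or above the anthracene value and E(S1) inside the shifted window (eV)."""
    return drc >= drc_min and e_min <= e_s1 <= e_max


# Hypothetical records: (name, DRC, computed E(S1) in eV)
records = [
    ("mol_a", 0.20, 3.2),  # passes both criteria
    ("mol_b", 0.20, 5.0),  # E(S1) too high
    ("mol_c", 0.04, 3.0),  # DRC too low
]
selected = [name for name, drc, e_s1 in records if is_sf_candidate(drc, e_s1)]

# Share of the DRC >= 0.13 molecules that also fall in the energy window,
# computed from the counts given in the text
share_percent = 100 * 17759 / 19662
```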
The 17 759 molecules were subjected to K-means clustering with 16 structural chemometrics descriptors, which divides them into two very well distinguishable clusters (Figure 3). The members of Cluster 1 are characterized by a relatively high content of aromatic bonds, 6-membered rings, fused rings, and heavy atoms, among which sp2-carbons dominate. The prevailing features in Cluster 2 are the opposite, and its members have lower mean values for the number of aromatic bonds, 6-membered rings, fused cycles, heavy atoms, and sp2-carbons. The behavior of the other descriptors reveals that Cluster 2 is richer in nitrogen, oxygen, and single bonds. It is worth noting that, although clearly distinguishable with respect to the structural patterns, both clusters share almost identical mean DRC values.
Figure 3.
K-means cluster analysis of 17 759 molecules with DRC ≥ 0.13 and E(S1) in the range 2.0–4.1 eV. Structural descriptors underlying the clustering: 0 corresponds to the mean value of each descriptor for all molecules, while positive and negative values correspond to deviation from the mean value.
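A two-cluster analysis of this kind can be sketched with scikit-learn, which is used here in place of the STATISTICA package employed in this work. The descriptor matrix below is synthetic, and standardizing the descriptors before clustering is an assumption suggested by the caption above, where 0 marks the mean of each descriptor.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 "molecules" x 16 structural descriptors, two groups
group_a = rng.normal(loc=1.0, scale=0.5, size=(100, 16))
group_b = rng.normal(loc=-1.0, scale=0.5, size=(100, 16))
X = np.vstack([group_a, group_b])

# Standardize so that 0 is the data set mean of each descriptor
Z = StandardScaler().fit_transform(X)

# K-means with two clusters; n_init restarts guard against a poor initialization
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z)

# Cluster profiles: mean standardized descriptor values per cluster,
# i.e. the deviation of each cluster from the overall mean
profiles = np.vstack([Z[km.labels_ == k].mean(axis=0) for k in range(2)])
cluster_sizes = np.bincount(km.labels_)
```

Plotting the rows of `profiles` as bars would give a picture analogous to Figure 3, with positive and negative bars marking descriptors that are above or below the mean in each cluster.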
Figure 4 summarizes families of potential SF chromophores belonging to Cluster 1 or 2. Careful examination of the members of the clusters (SI) confirms the derived structure–property relationships. In Cluster 1 one can find the classical examples of SF chromophores (anthracene, tetracene, and pentacene) but also other PAHs like benzotetracene, perylene, benzoperylene, etc. Therefore, Cluster 1 contains mainly PAHs and their doped or functionalized derivatives. Following the descriptor pattern and the structural differences between the clusters (Figure 3), it is obvious that most of the diradicaloids in Cluster 1 are stabilized by the presence of Clar’s sextets.50 Cluster 2 is composed mainly of smaller molecules of quinoid type with 1–3 cycles (rarely 4), a high heteroatom/carbon ratio, and mixed heteroatomic content. Members of Cluster 2 are also polyenes and molecules with nonaromatic and antiaromatic rings.
Figure 4.
Summary of the families of potential SF chromophores belonging to Cluster 1 (left from label 1) or Cluster 2 (right from label 2). The 2D structure images are extracted from PubChem.46
As can be seen, albeit simple, the computational approach confirms and naturally summarizes many of the published molecular design strategies in the SF area.4,27,51−64 The hunted diradicaloids belong to a wide assortment of structural motifs in organic conjugated compounds, and from a DRC perspective the design strategies are broad and worth further investigation. Our approach significantly shortlists the possible SF candidates to 4% but at the same time the good catch of 17 759 compounds shows that the DRC criterion is not as tight as the feasibility conditions. Therefore, for a strict selection of potential photovoltaics candidates, subsequent high-level quantum chemical estimation of the SF thermodynamic requirements is needed. Brief discussion on the limitations and applicability of the model can be found in the SI (Page S21).
In summary, we demonstrate binary classification models to screen general-purpose data sets for potential SF candidates based on their DRC. The advantage of our ML approach is that the training data set is both large and structurally diverse and includes input features that are quickly obtainable even on a desktop. The ML simulations reveal that well-performing models for sieving SF molecules from general-purpose chemical databases should consider the imbalanced character of the data, a specificity that originates from the experimentally known fact that these chromophores are relatively rare troves. As a result of the screening procedure, several thousand potential chromophores were preselected based on their DRC and were subjected to cluster analysis to explore structure–property relationships. The K-means clustering confirms and logically summarizes many of the published molecular design strategies in the singlet fission area, which demonstrates that data-oriented approaches can be applied successfully in the singlet fission domain. The study confirms the suitability of known SF materials, shortlists new potential compounds, and sifts structures for further computational workflows aiming at multiconfigurational estimation of the feasibility conditions or combinatorial generation of intramolecular SF chromophores. Moreover, since the diradicaloids are attractive in many other application areas of organic molecules,86−90 the developed screening procedure is expected to serve as an inspiration in the design of materials beyond the SF domain. The best optimized ML model is implemented in a user-friendly web application,91 where chemists can check the SF potential of newly designed molecules.
Computational Methods
Step-by-step data set preparation, computational protocols, ML model training, metric explanation, and K-means analysis details can be found in the SI, Sections 1–5. The molecules were extracted from the PubChem database46 and contain between 5 and 28 heavy atoms (B, C, Si, N, O, S, Se, and F), have a molecular mass up to 350 Da, and have a low rotatable bond count. After structural refinement, the Cartesian coordinates of the compounds were obtained from the SMILES codes by using OPENBABEL.65 The structures were subjected to geometry optimization and frequency analysis at the PM6 level,66 INDO/S CASSCF (2,2) excited-state calculations,47 chemometrics feature generation, and DRC computations with the spin-projected PUHF/6-31G** method.8 The final data set of 469 784 compounds and their descriptors was used to train class-weighted support-vector machine67−70 and cost-sensitive decision tree71−75 models to sort out potential SF chromophores. The scikit-learn76 (version 1.3.0) and LIBSVM77 libraries were used to perform the ML classification. The models are optimized and compared by using a metric suitable for judging the performance of imbalanced classification, the polygon area metric49 (SI, Section 2.5), which constructs a polygon in a regular hexagon from six widely used metrics. The semiempirical and DRC calculations are done by OPENMOPAC78 and Gaussian 09,79 respectively. The chemometrics descriptors are generated by PaDel,80 the ML metrics are visualized by using MATLAB,81 and the K-means clustering82,83 is performed with STATISTICA.84 All data and ML codes are open-source85 and available on GitHub (SI, Section 6).
Acknowledgments
The authors acknowledge the financial support of the Bulgarian National Science Fund, contract KΠ-06-H39/2/2019; https://ml4sf.chem.uni-sofia.bg/ (accessed on 21/10/2020). Part of the computational facilities used were provided within the Project CoE “National center of mechatronics and clean technologies”, BG05M2OP001-1.001-0008. M.N. was supported by the Swiss National Science Foundation, NCCR Bioinspired Materials.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpclett.3c02365.
The authors declare no competing financial interest.
References
- Shah A.; Torres P.; Tscharner R.; Wyrsch N.; Keppner H. Photovoltaic technology: the case for thin-film solar cells. Science 1999, 285, 692–698. 10.1126/science.285.5428.692. [DOI] [PubMed] [Google Scholar]
- Shockley W.; Queisser H. J. Detailed balance limit of efficiency of p–n junction solar cells. J. Appl. Phys. 1961, 32, 510–519. 10.1063/1.1736034. [DOI] [Google Scholar]
- Smith M. B.; Michl J. Recent advances in singlet fission. Annu. Rev. Phys. Chem. 2013, 64, 361–386. 10.1146/annurev-physchem-040412-110130. [DOI] [PubMed] [Google Scholar]
- Smith M. B.; Michl J. Singlet fission. Chem. Rev. 2010, 110, 6891–6936. 10.1021/cr1002613. [DOI] [PubMed] [Google Scholar]
- Stern H. L.; Cheminal A.; Yost S. R.; Broch K.; Bayliss S. L.; Chen K.; Tabachnyk M.; Thorley K.; Greenham N.; Hodgkiss J. M.; Anthony J.; Head-Gordon M.; Musser A. J.; Rao A.; Friend R. H. Vibronically coherent ultrafast triplet-pair formation and subsequent thermally activated dissociation control efficient endothermic singlet fission. Nat. Chem. 2017, 9, 1205–1212. 10.1038/nchem.2856. [DOI] [PubMed] [Google Scholar]
- Minami T.; Nakano M. Diradical character view of singlet fission. J. Phys. Chem. Lett. 2012, 3, 145–150. 10.1021/jz2015346. [DOI] [PubMed] [Google Scholar]
- Minami T.; Ito S.; Nakano M. Fundamental of diradicalcharacter- based molecular design for singlet fission. J. Phys. Chem. Lett. 2013, 4, 2133–2137. 10.1021/jz400931b. [DOI] [Google Scholar]
- Yamaguchi K. In Self-Consistent field: Theory and applications; Carbo R., Klobukowski M., Eds.; Elsevier: Amsterdam, 1990; p 727. [Google Scholar]
- Kamada K.; Ohta K.; Shimizu A.; Kubo T.; Kishi R.; Takahashi H.; Botek E.; Champagne B.; Nakano M. Singlet diradical character from experiment. J. Phys. Chem. Lett. 2010, 1, 937–940. 10.1021/jz100155s. [DOI] [Google Scholar]
- Singh S.; Jones W. J.; Siebrand W.; Stoicheff B. P.; Schneider W. G. Laser generation of excitons and fluorescence in anthracene crystals. J. Chem. Phys. 1965, 42, 330–342. 10.1063/1.1695695. [DOI] [Google Scholar]
- Nakano M. Open-shell-character-based molecular design principles: Applications to nonlinear optics and singlet fission. Chem. Rec. 2017, 17, 27–62. 10.1002/tcr.201600094. [DOI] [PubMed] [Google Scholar]
- Grotjahn R.; Maier T. M.; Michl J.; Kaupp M. Development of a TDDFT-based protocol with local hybrid functionals for the screening of potential singlet fission chromophores. J. Chem. Theory Comput. 2017, 13, 4984–4996. 10.1021/acs.jctc.7b00699. [DOI] [PubMed] [Google Scholar]
- Runge E.; Gross E. K. U. Density-functional theory for time-dependent systems. Phys. Rev. Lett. 1984, 52, 997–1000. 10.1103/PhysRevLett.52.997. [DOI] [Google Scholar]
- Ipatov A.; Cordova F.; Doriol L. J.; Casida M. E. Excited-state spin-contamination in time-dependent density-functional theory for molecules with open-shell ground states. J. Mol. Struct.: THEOCHEM 2009, 914, 60–73. 10.1016/j.theochem.2009.07.036. [DOI] [Google Scholar]
- Gräfenstein J.; Kraka E.; Filatov M.; Cremer D. Can unrestricted density-functional theory describe open shell singlet biradicals?. Int. J. Mol. Sci. 2002, 3, 360–394. 10.3390/i3040360. [DOI] [Google Scholar]
- Peach M. J. G.; Williamson M. J.; Tozer D. J. Influence of triplet instabilities in TDDFT. J. Chem. Theory Comput. 2011, 7, 3578–3585. 10.1021/ct200651r. [DOI] [PubMed] [Google Scholar]
- Kishi R.; Bonness S.; Yoneda K.; Takahashi H.; Nakano M.; Botek E.; Champagne B.; Kubo T.; Kamada K.; Ohta K.; Tsuneda T. Long-Range Corrected Density Functional Theory Study on Static Second Hyperpolarizabilities of Singlet Diradical Systems. J. Chem. Phys. 2010, 132, 094107. 10.1063/1.3332707. [DOI] [PubMed] [Google Scholar]
- Casanova D.; Krylov A. I. Spin-flip methods in quantum chemistry. Phys. Chem. Chem. Phys. 2020, 22, 4326–4342. 10.1039/C9CP06507E. [DOI] [PubMed] [Google Scholar]
- Shao Y.; Head-Gordon M.; Krylov A. I. The spin-flip approach within time-dependent density functional theory: Theory and applications to diradicals. J. Chem. Phys. 2003, 118, 4807–4818. 10.1063/1.1545679. [DOI] [Google Scholar]
- Hirata S.; Head-Gordon M. Time-dependent density functional theory within the Tamm-Dancoff approximation. Chem. Phys. Lett. 1999, 314, 291–299. 10.1016/S0009-2614(99)01149-5. [DOI] [Google Scholar]
- Malmqvist P. A.; Rendell A.; Roos B. O. The restricted active space self-consistent-field method, implemented with a split graph unitary group approach. J. Phys. Chem. 1990, 94, 5477–5482. 10.1021/j100377a011. [DOI] [Google Scholar]
- Andersson K.; Malmqvist P. A.; Roos B. O.; Sadlej A. J.; Wolinski K. Second-order perturbation theory with a CASSCF reference function. J. Phys. Chem. 1990, 94, 5483–5488. 10.1021/j100377a012. [DOI] [Google Scholar]
- Sauri V.; Serrano-Andrés L.; Shahi A. R. M.; Gagliardi L.; Vancoillie S.; Pierloot K. Multiconfigurational second-order perturbation theory restricted active space (RASPT2) method for electronic excited states: a benchmark study. J. Chem. Theory Comput. 2011, 7, 153–168. 10.1021/ct100478d. [DOI] [PubMed] [Google Scholar]
- Weinhold F.; Landis C. R.. Discovering chemistry with natural bond orbitals; Wiley-VCH: Hoboken, NJ, 2012. [Google Scholar]
- Glendening E. D.; Reed A. E.; Carpenter J. E.; Weinhold F.. NBO, Version 3.1; Gaussian, Inc.: 2003.
- Perkinson C. F.; Tabor D. P.; Einzinger M.; Sheberla D.; Utzat H.; Lin T. A.; Congreve D. N.; Bawendi M. G.; Aspuru-Guzik A.; Baldo M. A. Discovery of blue singlet exciton fission molecules via a high-throughput virtual screening and experimental approach. J. Chem. Phys. 2019, 151, 121102. 10.1063/1.5114789.
- Padula D.; Omar Ö. H.; Nematiaram T.; Troisi A. Singlet fission molecules among known compounds: finding a few needles in a haystack. Energy Environ. Sci. 2019, 12, 2412–2416. 10.1039/C9EE01508F.
- Omar Ö. H.; Padula D.; Troisi A. Elucidating the relationship between multiradical character and predicted singlet fission activity. ChemPhotoChem. 2020, 4, 5223–5229. 10.1002/cptc.202000098.
- Blaskovits J. T.; Fumanal M.; Vela S.; Fabregat R.; Corminboeuf C. Identifying the trade-off between intramolecular singlet fission requirements in donor-acceptor copolymers. Chem. Mater. 2021, 33, 2567–2575. 10.1021/acs.chemmater.1c00057.
- Blaskovits J. T.; Fumanal M.; Vela S.; Corminboeuf C. Designing singlet fission candidates from donor-acceptor copolymers. Chem. Mater. 2020, 32, 6515–6524. 10.1021/acs.chemmater.0c01784.
- Lopez-Carballeira D.; Polcar T. A new protocol for the identification of singlet fission sensitizers through computational screening. J. Comput. Chem. 2021, 42, 2241–2249. 10.1002/jcc.26753.
- Miyake Y.; Saeki A. Machine learning-assisted development of organic solar cell materials: issues, analyses, and outlooks. J. Phys. Chem. Lett. 2021, 12, 12391–1240. 10.1021/acs.jpclett.1c03526.
- Srivastava M.; Howard J. M.; Gong T.; Rebello Sousa Dias M.; Leite M. S. Machine learning roadmap for perovskite photovoltaics. J. Phys. Chem. Lett. 2021, 12, 7866–7877. 10.1021/acs.jpclett.1c01961.
- Laakso J.; Todorović M.; Li J.; Zhang G.-X.; Rinke P. Compositional engineering of perovskites with machine learning. Phys. Rev. Mater. 2022, 6, 113801. 10.1103/PhysRevMaterials.6.113801.
- Muyassiroh D. A. M.; Permatasari F. A.; Iskandar F. Machine learning-driven advanced development of carbon-based luminescent nanomaterials. J. Mater. Chem. C 2022, 10, 17431–17450. 10.1039/D2TC03789K.
- Wang Z.; Sun Z.; Yin H.; Liu X.; Wang J.; Zhao H.; Pang C. H.; Wu T.; Li S.; Yin Z.; Yu X.-F. Data-driven materials innovation and applications. Adv. Mater. 2022, 34, 2104113. 10.1002/adma.202104113.
- Mai J.; Le T. C.; Chen D.; Winkler D. A.; Caruso R. A. Machine learning for electrocatalyst and photocatalyst design and discovery. Chem. Rev. 2022, 122 (16), 13478–13515. 10.1021/acs.chemrev.2c00061.
- Singh V.; Patra S.; Murugan N. A.; Toncu D.-C.; Tiwari A. Recent trends in computational tools and data-driven modeling for advanced materials. Mater. Adv. 2022, 3, 4069–4087. 10.1039/D2MA00067A.
- Häse F.; Roch L. M.; Friederich P.; Aspuru-Guzik A. Designing and understanding light-harvesting devices with machine learning. Nat. Commun. 2020, 11, 4587. 10.1038/s41467-020-17995-8.
- Schröder F. A. Y. N.; Turban D. H. P.; Musser A. J.; Hine N. D. M.; Chin A. W. Tensor network simulation of multi-environmental open quantum dynamics via machine learning and entanglement renormalisation. Nat. Commun. 2019, 10, 1062. 10.1038/s41467-019-09039-7.
- Liu Z.; Lin L.; Jia Q.; Cheng Z.; Jiang Y.; Guo Y.; Ma J. Transferable multilevel attention neural network for accurate prediction of quantum chemistry properties via multitask learning. J. Chem. Inf. Model. 2021, 61, 1066–1082. 10.1021/acs.jcim.0c01224.
- Ye S.; Liang J.; Zhu X. Catalyst deep neural networks (Cat-DNNs) in singlet fission property prediction. Phys. Chem. Chem. Phys. 2021, 23, 20835–20840. 10.1039/D1CP03594K.
- Weber F.; Mori H. Machine-learning assisted design principle search for singlet fission: an example study of cibalackrot. NPJ. Comput. Mater. 2022, 8, 176. 10.1038/s41524-022-00860-1.
- Verma S.; Rivera M.; Scanlon D. O.; Walsh A. Machine learned calibrations to high-throughput molecular excited state calculations. J. Chem. Phys. 2022, 156, 134116. 10.1063/5.0084535.
- Liu X.; Wang X.; Gao S.; Chang V.; Tom R.; Yu M.; Ghiringhelli L. M.; Marom N. Finding predictive models for singlet fission by machine learning. NPJ. Comput. Mater. 2022, 8, 70. 10.1038/s41524-022-00758-y.
- Kim S.; Chen J.; Cheng T.; Gindulyte A.; He J.; He S.; Li Q.; Shoemaker B. A.; Thiessen P. A.; Yu B.; Zaslavsky L.; Zhang J.; Bolton E. E. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380. 10.1093/nar/gkac956. The full record URLs to the 2D structure images of 17 759 compounds extracted from PubChem can be found in the SI.
- Gieseking R. L. M. A new release of MOPAC incorporating the INDO/S semiempirical model with CI excited states. J. Comput. Chem. 2021, 42, 365–378. 10.1002/jcc.26455.
- Yang Y.; Davidson E. R.; Yang W. Nature of ground and electronic excited states of higher acenes. Proc. Natl. Acad. Sci. U.S.A. 2016, 113, E5098–E5107. 10.1073/pnas.1606021113.
- Aydemir O. A new performance evaluation metric for classifiers: polygon area metric. J. Classif. 2021, 38, 16–26. 10.1007/s00357-020-09362-5.
- Poater J.; Solà M.; Bickelhaupt F. M. Hydrogen–Hydrogen Bonding in Planar Biphenyl, Predicted by Atoms-In-Molecules Theory, Does Not Exist. Chem. Eur. J. 2006, 12, 2889–2895. 10.1002/chem.200500850.
- Casanova D. Theoretical modeling of singlet fission. Chem. Rev. 2018, 118, 7164–7207, and the references therein. 10.1021/acs.chemrev.7b00601.
- Ullrich T.; Munz D.; Guldi D. M. Unconventional singlet fission materials. Chem. Soc. Rev. 2021, 50, 3485–3518, and the references therein. 10.1039/D0CS01433H.
- López-Carballeira D.; Casanova D.; Ruipérez F. Theoretical design of conjugated diradicaloids as singlet fission sensitizers: quinones and methylene derivatives. Phys. Chem. Chem. Phys. 2017, 19, 30227–30238. 10.1039/C7CP05120D.
- Bhattacharyya K.; Pratik S. M.; Datta A. Small organic molecules for efficient singlet fission: Role of silicon substitution. J. Phys. Chem. C 2015, 119 (46), 25696–25702. 10.1021/acs.jpcc.5b06960.
- Blaskovits J. T.; Fumanal M.; Vela S.; Cho Y.; Corminboeuf C. Heteroatom oxidation controls singlet-triplet energy splitting in singlet fission building blocks. Chem. Commun. 2022, 58, 1338–1341. 10.1039/D1CC06755A.
- Shen L.; Wang X.; Liu H.; Li X. Tuning the singlet fission relevant energetic levels of quinoidal bithiophene compounds by means of backbone modifications and functional group introduction. Phys. Chem. Chem. Phys. 2018, 20, 5795–5802. 10.1039/C7CP08313K.
- Singh A.; Humeniuk A.; Röhr M. I. S. Energetics and optimal molecular packing for singlet fission in BN-doped perylenes: electronic adiabatic state basis screening. Phys. Chem. Chem. Phys. 2021, 23, 16525–16536. 10.1039/D1CP01762D.
- Sui M.-Y.; Xiao S.; Wang F.; Sun G.-Y. A screening of properties and application based on dimerized fused-ring non-fullerene acceptors: Influence of C=C, C-C, spiro-C linkers. J. Mater. Chem. C 2021, 9, 13162–13171. 10.1039/D1TC03300J.
- Stoycheva J.; Tadjer A.; Garavelli M.; Spassova M.; Nenov A.; Romanova J. Boron-doped polycyclic aromatic hydrocarbons: A molecular set revealing the interplay between topology and singlet fission propensity. J. Phys. Chem. Lett. 2020, 11, 1390–1396. 10.1021/acs.jpclett.9b03406.
- Pradhan E.; Zeng T. Design of the smallest intramolecular singlet fission chromophore with the fastest singlet fission. J. Phys. Chem. Lett. 2022, 13, 11076–11085. 10.1021/acs.jpclett.2c03131.
- James D.; Pradhan E.; Zeng T. Design of singlet fission chromophores by the introduction of N-oxyl fragments. J. Chem. Phys. 2022, 156, 034303. 10.1063/5.0077010.
- Stanger A. Singlet fission and aromaticity. J. Phys. Chem. A 2022, 126, 8049–8057. 10.1021/acs.jpca.2c04146.
- López-Carballeira D.; Zubiria M.; Casanova D.; Ruipérez F. Improvement of the electrochemical and singlet fission properties of anthraquinones by modification of the diradical character. Phys. Chem. Chem. Phys. 2019, 21, 7941–7952. 10.1039/C8CP07358A.
- Ryerson J. L.; Michl J.; Johnson J. C.; Schrauben J. N. Mechanism of singlet fission in thin films of 1,3-diphenylisobenzofuran. J. Am. Chem. Soc. 2014, 136, 7363–7373. 10.1021/ja501337b.
- O'Boyle N. M.; Banck M.; James C. A.; Morley C.; Vandermeersch T.; Hutchison G. R.; et al. Open Babel: An open chemical toolbox. J. Cheminform. 2011, 3, 33–47. 10.1186/1758-2946-3-33.
- Stewart J. J. P. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. J. Mol. Modeling 2007, 13, 1173–1213. 10.1007/s00894-007-0233-4.
- Cortes C.; Vapnik V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. 10.1007/BF00994018.
- Rezvani S.; Wang X. A broad review on class imbalance learning techniques. Appl. Soft Comput. 2023, 143, 110415. 10.1016/j.asoc.2023.110415.
- Ivanciuc O. Applications of support vector machines in chemistry. In Reviews in computational chemistry; Lipkowitz K. B., Cundari T. R., Eds.; Wiley: 2007; Vol. 27, pp 291–400.
- Akbani R.; Kwek S.; Japkowicz N. Applying support vector machines to imbalanced datasets. In ECML 2004, Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, September 20–24, 2004; Boulicaut J. F., Esposito F., Giannotti F., Pedreschi D., Eds.; Springer: 2004.
- Breiman L. Classification and regression trees, 1st ed.; Routledge: New York, 1984.
- Lomax S.; Vadera S. A survey of cost-sensitive decision tree induction algorithms. ACM Comput. Surv. 2013, 45, 16. 10.1145/2431211.2431215.
- Datta S.; Dev V. A.; Eden M. R. Developing non-linear rate constant QSPR using decision trees and multi-gene genetic programming. Comput.-Aided Chem. Eng. 2018, 44, 2473–2478. 10.1016/B978-0-444-64241-7.50407-9.
- Kotsiantis S. B. Decision trees: a recent overview. Artif. Intell. Rev. 2013, 39, 261–283. 10.1007/s10462-011-9272-4.
- Li F.; Zhang Z.; Zhang X.; Du C.; Xu Y.; Tian Y.-C. Cost-sensitive and hybrid-attribute measure multi-decision tree over imbalanced data sets. Inf. Sci. 2018, 422, 242–256. 10.1016/j.ins.2017.09.013.
- Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Chang C.-C.; Lin C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27. 10.1145/1961189.1961199.
- Stewart J. J. P. MOPAC; Stewart Computational Chemistry: Colorado Springs, CO, USA, 2016; http://openmopac.net/.
- Frisch M. J.; Trucks G. W.; Schlegel H. B.; Scuseria G. E.; Robb M. A.; Cheeseman J. R.; Scalmani G.; Barone V.; Mennucci B.; Petersson G. A.; et al. Gaussian 09, Revision E.01; Gaussian, Inc.: Wallingford, CT, 2009.
- Yap C. W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 2011, 32, 1466–1474. 10.1002/jcc.21707.
- Statistics and Machine Learning Toolbox Documentation; The MathWorks Inc.: Natick, MA, 2022.
- Massart D. L.; Kaufman L. The Interpretation of analytical chemical data by the use of cluster analysis; John Wiley and Sons: New York, NY, USA, 1983.
- Vandeginste B.; Massart D.; de Jong S.; Buydens L. Handbook of chemometrics and qualimetrics: Part B; Elsevier: Amsterdam, The Netherlands, 1998.
- STATISTICA (Data Analysis Software System), Version 10; StatSoft, Inc.: 2011.
- Artrith N.; Butler K. T.; Coudert F.-X.; Han S.; Isayev O.; Jain A.; Walsh A. Best practices in machine learning for chemistry. Nat. Chem. 2021, 13, 505–508. 10.1038/s41557-021-00716-z.
- Nakano M.; Champagne B. Theoretical design of open-shell singlet molecular systems for nonlinear optics. J. Phys. Chem. Lett. 2015, 6, 3236–3256. 10.1021/acs.jpclett.5b00956.
- Li H.; Zou X.; Chen H.; Lian W.; Jia H.; Yan X.; Hu X.; Liu X. Diradicaloid strategy for high-efficiency photothermal conversion and high-sensitivity detection of near infrared light. Adv. Opt. Mater. 2023, 11, 2300060. 10.1002/adom.202300060.
- Ni Y.; Wu J. Diradical approach toward organic near infrared dyes. Tetrahedron Lett. 2016, 57, 5426–5434. 10.1016/j.tetlet.2016.10.100.
- Mori S.; Moles Quintero S.; Tabaka N.; Kishi R.; González Núñez R.; Harbuzaru A.; Ponce Ortiz R.; Marín-Beloqui J.; Suzuki S.; Kitamura C.; Gómez-García C. J.; Dai Y.; Negri F.; Nakano M.; Kato S.-i.; Casado J.; et al. Medium diradical character, small hole and electron reorganization energies and ambipolar transistors in difluorenoheteroles. Angew. Chem., Int. Ed. Engl. 2022, 61, e202206680. 10.1002/anie.202206680.
- Hu X.; Wang W.; Wang D.; Zheng Y. The electronic applications of stable diradicaloids: present and future. J. Mater. Chem. C 2018, 6, 11232–11242. 10.1039/C8TC04484H.
- Web application for finding potential singlet fission materials based on their diradical character. Machine Learning For Singlet Fission. https://singletfission.chem.uni-sofia.bg/ (accessed 09/2021).