2022 Sep 8;62(18):4427–4434. doi: 10.1021/acs.jcim.2c00812

QCforever: A Quantum Chemistry Wrapper for Everyone to Use in Black-Box Optimization

Masato Sumita †,‡,*, Kei Terayama †,§, Ryo Tamura †,‡,∥, Koji Tsuda †,∥
PMCID: PMC9518232  PMID: 36074116

Abstract


Obtaining observable physical or molecular properties, such as ionization potential and fluorescence wavelength, from quantum chemical (QC) computation requires multi-step computation directed by a human. Automating this multi-step process and turning it into a black box that anybody can handle is therefore important for effective database construction and for fast, realistic material design within the framework of black-box optimization, where machine learning algorithms are introduced as predictors. Here, we propose a Python library, QCforever, that automates the computation of several molecular properties and of chemical phenomena induced by molecules. The tool requires only a molecule file as input; it automates both the computation of molecular properties (ionization potential, fluorescence, etc.) and the analysis of the output, and returns multiple values for evaluating a molecule. By incorporating the tool into black-box optimization, we can explore molecules that have the desired properties within the limitations of QC computation.

Introduction

In recent years, black-box optimization using machine learning (ML) algorithms as predictors has achieved significant results in chemistry and materials science.1,2 ML itself is not limited to these disciplines and can be applied in many others by changing the evaluating system (evaluator).3–6 Likewise, the evaluator in black-box optimization determines what kind of materials and molecules are obtained. If experiments such as synthesizing materials and measuring their chemical or physical values can be installed as the evaluator, we can obtain the desired materials. Indeed, several examples of black-box optimization with experiments as the evaluator have appeared for inorganic materials, because synthesizing inorganic materials can be more efficient than simulation, depending on the target properties.7–9 However, this is not the case for organic synthesis.

Organic synthesis, including the characterization of the synthesized molecules, is a time-consuming and formidable task.10 Hence, several simulation methods have been developed as preliminary methods that are expected to lower the experimental cost of finding the desired molecules before organic synthesis. Quantum chemical (QC)11,12 computation is one of them. Contrary to this expectation, QC computation has mainly been used as a tool to clarify chemical phenomena13 through QC software packages.14–16 Although QC computation is still developing,17 it has explained many chemical phenomena for which no experimental information is available.18–23 To make black-box optimization efficient by incorporating QC computation in place of chemical experiments, we should develop an automated QC system whose input is a molecule and whose output is its properties.

Although QC computation is a powerful tool for obtaining the electronic structures of molecules or materials, multi-step computation is required to obtain practically meaningful physical or chemical values, because most theories of QC computation are built on orthogonal one-electron states,24 which are not experimentally observable. Hence, to incorporate QC computation into black-box optimization, QC computation itself must be turned into a black box by automating the multi-step calculations and the analysis of the results (usually text files). Several tools exist for constructing inputs for complex computations and parsing output files, such as cclib,25 ASE,26 AutoSolvate (ref 27) for managing solvent systems, and QChASM (ref 28), which mainly targets transition states of catalytic systems. However, these tools are not sufficient for incorporating QC computation into black-box optimization for observable properties, because they focus on managing structures and extracting the total energy of the system and one-electron-state-based values. Furthermore, multi-objective optimization (optimizing multiple molecular properties) is necessary to obtain practical materials through black-box optimization. Hence, the black box of QC computation should be a system that produces multiple physically meaningful properties.

In this paper, we propose QCforever, a black box of QC computation that is ready to be incorporated into black-box optimization. Its input is a standard sdf file, and its output is a set of physically meaningful properties, such as ionization potential, electron affinity, absorption wavelength, and fluorescence wavelength (one-electron-orbital-based properties such as the HOMO/LUMO gap are also available), because evaluating materials by multiple properties is important for their practical use. In addition, QCforever is useful for excluding the arbitrariness that arises from differences in the procedure used to compute physical values with QC computation. Because orbital and geometry optimization depend strongly on the initial guess and the initial geometry, the same computation started from different initial conditions sometimes converges to different results. By excluding this arbitrariness, QCforever is also useful for building a database with standardized computational processes.

Method

Although QC offers several theories, we employed density functional theory (DFT)29 as implemented in Gaussian16 (ref 14) because of its ease of use and versatility. Suitable processes for computing molecular properties are important for computational efficiency and for reproducing chemical phenomena. Excluding arbitrariness from the computational process is also important for building a reliable database.

Because Gaussian16 (ref 14) supports multi-step jobs, multiple steps can be combined into one input file, and the computation can be accelerated by reading the electronic structure (orbitals) of a previous step. Figure 1 shows the computational flow for computing several molecular properties and phenomena in one run. Different structures are saved as separate formatted checkpoint files. Currently supported inputs are an sdf file of one molecule, a format widely used in chemoinformatics, and Gaussian chk and fchk files. When an sdf file is used as input, the number of radical electrons and the charge are counted with RDKit.30 For users who need to specify the spin multiplicity and charge of a molecule, the instance variables self.SpecTotalCharge and self.SpecSpinMulti are provided.

Figure 1. Computational flow of available properties of a molecule in QCforever. An sdf file of one molecule or a Gaussian chk/fchk file is accepted as input. Solid arrows indicate reading atomic and electronic structures from the origin of an arrow. The broken arrow indicates that only the atomic arrangement is obtained from the state at the origin of the arrow. The base for all computations is the ground state. Identical geometries are represented by the same color.

QCforever computes the ground state in the first step. The default values of Gaussian16 (ref 14) are used as thresholds for the convergence criteria of SCF and geometry optimization. The values related to the energies are printed together with the difference between the ideal and computed values of ⟨S²⟩ (ΔS²), which is useful for checking whether the correct state has been computed. For conformation search, QCforever relies on other software.31 Geometry optimization can be performed as an option. At present, the force constant estimation method (the Gaussian default) is employed. If geometry optimization is performed, the maximum bond length is printed for checking the geometry. After the ground state has been computed, several molecular properties based on the orthogonal orbitals are obtained. The HOMO/LUMO gap, the HOMO and LUMO energies relative to some reference, and the atomization energy are important for assessing the stability of a molecule in the ground state and its applicability to various materials. As a reference for comparing the HOMO/LUMO levels, their energies relative to the SOMO/LUMO energies of an oxygen molecule are computed using the following equations.

$$\mathrm{Ox} = E_{\mathrm{O_2}}(\mathrm{LUMO}) - E_{t}(\mathrm{HOMO}) \tag{1}$$
$$\mathrm{Rd} = E_{t}(\mathrm{LUMO}) - E_{\mathrm{O_2}}(\mathrm{SOMO}) \tag{2}$$

where $E_{\mathrm{O_2}}(\mathrm{LUMO})$ and $E_{\mathrm{O_2}}(\mathrm{SOMO})$ are the LUMO and SOMO energies of O2, respectively, and $E_{t}(\mathrm{HOMO})$ and $E_{t}(\mathrm{LUMO})$ are the HOMO and LUMO energies of the target molecule, respectively. Hence, Ox represents the energetic proximity between the HOMO of the target molecule and the LUMO of O2, which governs the oxidation of the target molecule by O2. Likewise, Rd represents the energetic proximity between the LUMO of the target molecule and the SOMO of O2, and therefore indicates the possibility that the target molecule is reduced by O2. These values are also useful for discussing reactions induced by the energetic proximity of the SOMO/LUMO of O2 to the HOMO/LUMO of a target molecule. QCforever stores the SOMO and LUMO energies of O2, computed in advance for each combination of basis set and functional. Even when different computational levels or QC packages are used (e.g., when orbital levels computed under periodic boundary conditions are compared with those computed without boundaries), orbital energy levels can be discussed in terms of their energies relative to the orbitals of O2. This function is useful for comparing orbital levels with the band levels of semiconductors.32–34 Similarly, because QCforever stores the energy of each atom computed with several basis sets and functionals, the atomization energy of the target molecule is also computed.
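Once the reference energies are tabulated, the O2-relative descriptors and the atomization energy reduce to simple energy differences. The following is a minimal sketch rather than QCforever's own code: the function names are hypothetical, the sign convention (acceptor orbital minus donor orbital) is an assumption, and the numerical values are placeholders, not computed O2 or molecular energies.

```python
def oxidation_gap(e_homo_target, e_lumo_o2):
    """Ox: gap between the O2 LUMO (acceptor) and the target HOMO (donor)."""
    return e_lumo_o2 - e_homo_target


def reduction_gap(e_lumo_target, e_somo_o2):
    """Rd: gap between the target LUMO (acceptor) and the O2 SOMO (donor)."""
    return e_lumo_target - e_somo_o2


def atomization_energy(e_molecule, atom_counts, atom_energies):
    """Sum of isolated-atom energies minus the molecular energy (all in Eh)."""
    e_atoms = sum(n * atom_energies[symbol] for symbol, n in atom_counts.items())
    return e_atoms - e_molecule


# Placeholder orbital energies in Eh, for illustration only.
O2_REFERENCE = {"SOMO": -0.30, "LUMO": -0.10}

ox = oxidation_gap(e_homo_target=-0.25, e_lumo_o2=O2_REFERENCE["LUMO"])
rd = reduction_gap(e_lumo_target=-0.05, e_somo_o2=O2_REFERENCE["SOMO"])
print(f"Ox = {ox:.2f} Eh, Rd = {rd:.2f} Eh")
```

In such a scheme the reference table would be keyed by functional and basis set, so that the target molecule and O2 are always compared at the same computational level.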

Normal vibrational modes of a molecule are computed by vibrational analysis, including the infrared (IR) and Raman intensities of each frequency. Based on the normal modes, Gaussian calculates several thermochemical properties such as Gibbs free energy, heat capacity, and entropy. QCforever extracts these values from the log file. Peak positions in the nuclear magnetic resonance (NMR) spectrum of the target molecule, relative to tetramethylsilane (TMS), are also computed using the GIAO method.

QCforever automatically computes the values relevant to photochemical properties and phenomena, as shown in Figure 2, using time-dependent density functional theory (TDDFT). Vertical excitation energies from the ground state to other electronic states, which are observable in ultraviolet–visible (UV–vis) absorption measurements, are computed by single-point calculation. The expected fluorescence is computed by optimizing the geometry in the target excited state with TDDFT, as shown in Figure 2.35 The value used for estimating the probability of thermally activated delayed fluorescence (TADF),36 Delta(S-T), the energy difference between the singlet and triplet excited states in Figure 2, is computed through geometry optimization in the triplet state.
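Two small conversions recur when working with these photochemical values: turning an excitation energy into a wavelength, and expressing the singlet-triplet gap in eV. The sketch below is illustrative only; the function names are hypothetical, and the gap is assumed to be the energy of the S1 minimum minus that of the T1 minimum, both in Eh.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm
EH_TO_EV = 27.211386    # 1 hartree (Eh) in eV


def ev_to_nm(energy_ev):
    """Convert a transition energy in eV to a wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("transition energy must be positive")
    return HC_EV_NM / energy_ev


def singlet_triplet_gap_ev(e_s1_min_eh, e_t1_min_eh):
    """Energy gap between the S1 and T1 minima, converted from Eh to eV."""
    return (e_s1_min_eh - e_t1_min_eh) * EH_TO_EV


# Placeholder energies, for illustration only.
print(f"3.10 eV corresponds to {ev_to_nm(3.10):.1f} nm")
print(f"Delta(S-T) = {singlet_triplet_gap_ev(-100.00, -100.01):.3f} eV")
```

A small positive gap is the quantity of interest for TADF, since it determines how easily the triplet state can be thermally promoted back to the singlet state.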

Figure 2. Schematics of potential energy surfaces of the singlet ground state (S0), singlet excited state (S1), and first triplet state (T1) of a molecule. Blue arrows indicate the optimization process starting from the atomic and electronic structures at the origin of the arrows.

The computation for estimating the vertical and adiabatic ionization potential (IP) and electron affinity (EA)37 is also automated in QCforever through the method called ΔSCF. The vertical IP (VIP) and vertical EA (VEA) are the energy differences between the ground state and the positively or negatively charged state (assuming that the ground state is neutral and singlet) at the same structure, as shown in Figure 3, and are given by the following equations.

$$\mathrm{VIP} = E(D_0^{\mathrm{vertical}}) - E(S_0) \tag{3}$$
$$\mathrm{AIP} = E(D_0^{\mathrm{minimum}}) - E(S_0) \tag{4}$$
$$\mathrm{VEA} = E(S_0) - E(D_0^{\mathrm{vertical}}) \tag{5}$$
$$\mathrm{AEA} = E(S_0) - E(D_0^{\mathrm{minimum}}) \tag{6}$$

where $E(S_0)$ is the ground-state energy of the target molecule as shown in Figure 3 (assuming that ground-state optimization has been performed), and $E(D_0^{\mathrm{vertical}})$ is the total energy of the molecule after an electron has been removed (eqs 3 and 4) or added (eqs 5 and 6) at the ground-state geometry, as shown in Figure 3. The adiabatic IP (AIP) and adiabatic EA (AEA) are calculated using eqs 4 and 6, where $E(D_0^{\mathrm{minimum}})$ is the energy obtained by geometry optimization starting from $D_0^{\mathrm{vertical}}$ (Figure 3).
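The ΔSCF scheme amounts to differences of total energies, using the standard definitions IP = E(cation) − E(neutral) and EA = E(neutral) − E(anion). The following sketch assumes total energies in Eh and results in eV; the function names are hypothetical and the energies are placeholders, not computed values.

```python
EH_TO_EV = 27.211386  # 1 hartree (Eh) in eV


def ionization_potential(e_neutral_eh, e_cation_eh):
    """IP in eV: energy of the cation minus energy of the neutral molecule."""
    return (e_cation_eh - e_neutral_eh) * EH_TO_EV


def electron_affinity(e_neutral_eh, e_anion_eh):
    """EA in eV: energy of the neutral molecule minus energy of the anion."""
    return (e_neutral_eh - e_anion_eh) * EH_TO_EV


# Placeholder total energies in Eh, for illustration only.
e_s0 = -232.00                                    # optimized neutral ground state
e_cation_vert, e_cation_min = -231.68, -231.69    # cation at S0 geometry / relaxed
e_anion_vert, e_anion_min = -231.97, -231.98      # anion at S0 geometry / relaxed

vip = ionization_potential(e_s0, e_cation_vert)
aip = ionization_potential(e_s0, e_cation_min)
vea = electron_affinity(e_s0, e_anion_vert)
aea = electron_affinity(e_s0, e_anion_min)
print(f"VIP={vip:.2f} AIP={aip:.2f} VEA={vea:.2f} AEA={aea:.2f} (eV)")
```

Because geometry relaxation can only lower the energy of the charged state, the adiabatic IP should not exceed the vertical IP, and the adiabatic EA should not fall below the vertical EA; a violation of either relation suggests a failed optimization.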

Figure 3. Schematics of potential energy surfaces of a neutral molecule (S0) and its positively/negatively charged counterpart (D0), assuming that the neutral molecule is in the singlet state. A blue arrow indicates the optimization process starting from the atomic and electronic structure at the origin of the arrow.

The currently available values are summarized in Table 1, together with the keys of the output dictionary, because the computed values are returned as a Python dictionary.

Table 1. Available Values of QCforever and Keys of the Output Dictionary^a

| option name | values obtained | key |
| --- | --- | --- |
| opt | geometry optimization in the ground state is performed | GS_MaxBoldLength (in Å) |
| energy | ground-state energy | energy (in Eh) with ΔS² |
| homolumo | HOMO/LUMO gap | homolumo (in eV) |
| stable2o2 | stability to O2 | stable2o2 (in Eh) |
| deen | atomization energy | deen (in Eh) |
| dipole | dipole moment | dipole |
| cden | Mulliken charge and spin density | cden |
| symm | molecular symmetry | symm |
| nmr | NMR chemical shift of each atom relative to TMS | nmr (ppm to TMS) |
| uv | transition energies to excited states | uv (in nm with oscillator strength) |
| | | state_index |
| freq | vibrational analysis (298.15 K, 1.0 atm) | freq (in cm⁻¹) |
| | | IR_int (IR intensity) |
| | | Raman_int (Raman intensity) |
| | | Ezp (zero point energy) |
| | | Et (thermal energy) |
| | | E_enth (enthalpy) |
| | | E_free (free energy) |
| | | Ei (thermal energy in kcal/mol) |
| | | Cv (heat capacity in cal/(mol K)) |
| | | Si (entropy in cal/(mol K)) |
| vip | vertical ionization potential | vip (in eV) with ΔS² |
| vea | vertical electron affinity | vea (in eV) with ΔS² |
| aip | adiabatic ionization potential | aip (in eV) with ΔS² |
| | | relaxedIP_MaxBondLength (in Å) |
| aea | adiabatic electron affinity | aea (in eV) with ΔS² |
| | | relaxedEA_MaxBondLength (in Å) |
| fluor | fluorescence from a specified state | MinEtarget (in Eh) |
| | | Min_MaxBondLength (in Å) |
| | | fluor (in nm with oscillator strength) |
| tadf | energy difference between the singlet and triplet excited states | T_Min (in Eh) |
| | | T_Min_MaxBondLength (in Å) |
| | | T_Phos (in nm with oscillator strength) |
| | | delta(S-T) (in Eh) |

^a Job state is saved with the “log” key.
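Because each option maps to a known set of dictionary keys, a returned dictionary can be checked for incomplete steps. The sketch below covers only a subset of Table 1; the helper name is hypothetical, the mock result contains made-up values, and whether QCforever omits a key when a step fails is an assumption made here for illustration.

```python
# Subset of the option-to-key mapping in Table 1.
EXPECTED_KEYS = {
    "energy": ["energy"],
    "homolumo": ["homolumo"],
    "dipole": ["dipole"],
    "uv": ["uv", "state_index"],
    "vip": ["vip"],
    "aip": ["aip", "relaxedIP_MaxBondLength"],
    "tadf": ["T_Min", "T_Min_MaxBondLength", "T_Phos", "delta(S-T)"],
}


def missing_keys(requested_options, result):
    """Return the expected keys that are absent from a result dictionary."""
    missing = []
    for option in requested_options:
        for key in EXPECTED_KEYS.get(option, []):
            if key not in result:
                missing.append(key)
    return missing


# Mock output dictionary with made-up values, for illustration only.
mock_result = {
    "homolumo": 5.1,
    "uv": [[310.0], [0.2], [], []],
    "state_index": [[1], []],
    "log": "placeholder job state",
}

print(missing_keys(["homolumo", "uv", "aip"], mock_result))
```

A check of this kind is one way to tabulate per-property success ratios when many molecules are processed in a batch.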

Dependencies

QCforever is written mainly in Python 3 but needs an external quantum chemical computation package. Currently, only Gaussian16 (ref 14) is supported. Although Gaussian16 users may separate the computational scratch folder from the data folder, the current QCforever requires that the data folder be the same as the scratch folder. Because the Gaussian tools formchk and unfchk are used to create fchk and chk files, the path to Gaussian must be set properly before using QCforever. RDKit (ref 30) is required to count the number of radical electrons and the total charge from an sdf file. The other required Python library is NumPy. Bash scripts are used to generate the data for computing the atomization energy, the chemical shift relative to TMS, and the oxygen orbital levels.

Example Usage

Because the main part of QCforever is written as a Python class, an instance must be created first. As arguments, QCforever needs at least the functional and basis set, the number of cores for Gaussian, the options, and the input file name. To compute molecular properties in a solvent, one can specify any of the solvents listed in Gaussian.14 The memory and computational time can be specified by setting the corresponding instance variables. An example code (main.py) for QCforever is shown in List 1.
[List 1: image ci2c00812_0006.jpg]

In the example of List 1, QCforever tries to compute the HOMO/LUMO gap, the ground-state energy, dipole moment, atomization energy, the stability to O2 based on the optimized structure of the target molecule in the ground state, and the fluorescence from the third excited state at the B3LYP/STO-3G level.

This code can be executed with the command shown in List 2.
[List 2: image ci2c00812_0007.jpg]

The result is obtained as shown in List 3, in the form of a Python dictionary with the keys listed in Table 1. The “uv” key contains four lists. The first list gives the excitation energy to each excited state in nm, the second gives the corresponding intensity (oscillator strength), the third gives the length of circular dichroism (CD), and the fourth gives the intensity of the CD spectrum. Because unrestricted DFT calculations are used, spin-allowed and spin-forbidden excited states are mixed. Hence, the indices of the spin-allowed states are given in the first list of the “state_index” key, and those of the spin-forbidden states in the second list. The excitation energies to the spin-allowed states are printed in the “uv” key. Like the “uv” key, the “fluor” key includes the information on CD emission.
[List 3: image ci2c00812_0008.jpg]
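As an illustration of how the “uv” entry might be consumed, the sketch below picks the strongest absorption from a result dictionary. It assumes only the layout described in the text (first list: wavelengths in nm, second list: oscillator strengths, in matching order); the helper name is hypothetical and the numbers are made up, not output of QCforever.

```python
def brightest_absorption(result):
    """Return (wavelength in nm, oscillator strength) of the strongest transition."""
    wavelengths, strengths = result["uv"][0], result["uv"][1]
    if not wavelengths:
        raise ValueError("no excited states found under the 'uv' key")
    # Index of the transition with the largest oscillator strength.
    best = max(range(len(wavelengths)), key=lambda i: strengths[i])
    return wavelengths[best], strengths[best]


# Mock result with made-up values; the third and fourth lists stand in
# for the CD-related entries and are not used here.
mock_result = {
    "uv": [
        [420.5, 350.2, 298.7],
        [0.02, 0.85, 0.10],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
    ],
    "state_index": [[1, 2, 3], []],
}

wavelength, strength = brightest_absorption(mock_result)
print(f"Strongest absorption: {wavelength} nm (f = {strength})")
```

A scalar extracted in this way (for example, the wavelength of the strongest absorption) is the kind of value an evaluator can return to a black-box optimizer.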

Applications

Using QCforever in combination with black-box optimization algorithms for discovering and designing materials, we have already reported several results. By combining a deep learning-based de novo molecule generator (DNMG)38 with QCforever, we have demonstrated that molecules designed in silico for optical absorption/emission can be realized experimentally.39–41 In addition, the DNMG proposed a material that had never received attention as an electret material.42 The DNMG also serves as a molecular identifier when the NMR spectrum computed by QCforever is set as the target property.43 Beyond its combination with the DNMG, QCforever is useful for database screening. We have employed QCforever with boundLess Objective-free eXploration (BLOX) to search for out-of-trend materials in a database.44 Here, we demonstrate database screening as an example of the use of QCforever. Recent developments in materials informatics have increased the importance of experimental45–47 and computational databases48,49 of molecules. Although PubChemQC (ref 48) provides observable molecular properties such as absorption wavelength, computational databases basically provide total energies and properties based on one-electron states.50–52 These may be important features, but they are not practical properties. QCforever may be useful for converting another database into a computational one with practical properties.

From the ZINC database,47 we selected 100 molecules that are available from vendors. For these molecules, we computed the molecular properties at the B3LYP/6-31G* level using QCforever with the following options, which are listed in Table 1.
[Option list: image ci2c00812_0009.jpg]

The success ratios for geometry optimization in the ground state (GS), fluorescence (Fluor), TADF, and AIP computations are 91, 97, 69, and 90%, respectively, as tabulated in Table 2. The low ratio for the TADF computation might be improved by including the solvent effect. The average computational time per molecule is about 9 h on 20 cores. This computation is definitely not light. However, it allows a database of several molecular properties based on electronic structure theory to be built automatically. Because multiple properties are obtained simultaneously, a correlation heat map among the computed molecular properties, as shown in Figure 4, is also easily obtained.

Table 2. Success Ratio (%) for 100 Molecules with QCforever at the B3LYP/6-31G* Level

| GS^a | Fluor^b | TADF^c | AIP^d |
| --- | --- | --- | --- |
| 91 | 97 | 69 | 90 |

^a Ground-state optimization without any negative vibrational mode.
^b Geometry optimization in the first excited state for evaluating fluorescence emission.
^c Computation for evaluating TADF.
^d Geometry optimization of the ionized state to obtain the adiabatic ionization potential.

Figure 4. Clustered correlation heat map among molecular properties of the 100 ZINC molecules computed by QCforever. Abs_it/Fluor_it, oscillator strength of absorption/fluorescence to/from the first excited state. Abs_wl/Fluor_wl, absorption/fluorescence wavelength to/from the first excited state. MW, molecular weight. Dipole, absolute value of the dipole moment. Raman/IR, intensity of the lowest vibrational mode in the Raman/IR spectrum. Freq, the lowest vibrational mode in wavenumbers. VEA/AEA, vertical/adiabatic electron affinity. HOMO/LUMO, energy gap between HOMO and LUMO. Stable2o2, oxidizability by O2. VIP/AIP, vertical/adiabatic ionization potential. Energy, total energy of the ground state. E_free, Gibbs free energy at 298.15 K. Delta(S-T), the gap between the minima of the first excited state and the first triplet state.

This correlation heat map shows the importance of statistical analysis based on a database, even with data for only 100 molecules. The HOMO/LUMO gap shows a strong negative correlation with the absorption wavelength (Abs_wl), VEA, and AEA. Furthermore, the gap has a positive correlation with Stable2o2 (oxidation by O2), VIP, and AIP. Hence, the HOMO/LUMO gap is a molecular property that dominates not only photochemical properties but also electronic properties. On the other hand, Energy and E_free show no difference (meaning that the free-energy contribution is small for molecules of small size), and the other properties [Delta(S-T), Fluor_wl, Freq, IR, Abs_it, Fluor_it] are not correlated with the HOMO/LUMO gap. These results indicate the difficulty of building prediction models for these properties.
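A correlation matrix of this kind can be assembled directly from the per-molecule output dictionaries. The following is a minimal pure-Python sketch using the Pearson coefficient; the column names and numbers are made up for illustration and are not the values behind Figure 4, and no clustering step is included.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(table):
    """Pairwise Pearson correlations for a dict of property name -> values."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}


# Made-up property columns for four hypothetical molecules.
mock_table = {
    "homolumo": [5.0, 4.2, 3.6, 3.1],       # eV
    "abs_wl": [250.0, 300.0, 350.0, 400.0],  # nm
    "vip": [9.0, 8.4, 7.9, 7.6],             # eV
}

matrix = correlation_matrix(mock_table)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In practice, molecules for which a property could not be computed would have to be dropped or handled separately before the coefficients are evaluated.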

Conclusions

In this paper, we presented QCforever, a tool that automates the process of computing several observable molecular properties through QC computation and is ready to be combined with black-box optimization. When QC calculations are used to compute various physical and chemical properties or phenomena, different values may be obtained even for the same molecule because of differences in the computational process. To avoid this, a standard computational process should be provided. In particular, a standardized computational process such as the one in QCforever is important for building a database based on QC calculations. As a demonstration of QCforever, we computed the properties of 100 molecules selected from the ZINC database.47 Although the current QCforever could not exclude several failures, including molecules with negative vibrational modes, the computation succeeded for 90% of the molecules. In the near future, we will extend QCforever to deal with negative vibrational modes and other failures, as AiiDA does.53 In addition, an ML-based model for predicting the functional parameters of DFT will be presented to avoid the self-interaction error54 that limits the accuracy of many DFT methods.

Simulation tools are expected to reduce the difficulty of developing new materials, and QC computation is one of them. In practice, however, QC computation is mainly used as a tool for rationalizing chemical phenomena. The history of QC computation proves that it is a powerful tool for obtaining plausible answers to forward problems, where the input is a molecule. On the other hand, QC computation is also used in experimental chemistry laboratories to find candidate molecules for chemical synthesis. This process corresponds to an inverse problem,55,56 in which we must deal with the diversity of chemical compounds. In practice, the search space is restricted by professional knowledge and preference. By combining QCforever with a black-box optimization algorithm, we can remove this restriction and bias and expand the search space.39–44

Data and Software Availability

Our implementation is available on GitHub at https://github.com/molecule-generator-collection/QCforever. The version of RDKit (ref 30) we used is 2020.09.1.0. Gaussian16 (ref 14) (https://gaussian.com) was used for the QC computations. The list of molecules in the article is given in the Supporting Information.

Acknowledgments

This research was conducted in “Development of a Next-generation Drug Discovery AI through Industry-academia Collaboration (DAIIA)” supported by Japan Agency for Medical Research and Development (AMED) under grant no. JP22nk0101111. This work was also supported by MEXT as a “Program for Promoting Researches on the Supercomputer Fugaku (Application of Molecular Dynamics Simulation to Precision Medicine Using Big Data Integration System for Drug Discovery)”. This research used the computational resources of the supercomputer center of RAIDEN of AIP (RIKEN).

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.2c00812.

  • SMILES list that supports the findings of this study (PDF)

The authors declare no competing financial interest.


References

  1. Terayama K.; Sumita M.; Tamura R.; Tsuda K. Black-Box Optimization for Automated Discovery. Acc. Chem. Res. 2021, 54, 1334. 10.1021/acs.accounts.0c00713.
  2. Pollice R.; dos Passos Gomes G.; Aldeghi M.; Hickman R. J.; Krenn M.; Lavigne C.; Lindner-D’Addario M.; Nigam A.; Ser C. T.; Yao Z.; Aspuru-Guzik A. Data-Driven Strategies for Accelerated Materials Design. Acc. Chem. Res. 2021, 54, 849–860. 10.1021/acs.accounts.0c00785.
  3. Snoek J.; Larochelle H.; Adams R. P. Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 2012.
  4. Silver D.; Huang A.; Maddison C. J.; Guez A.; Sifre L.; van den Driessche G. V. D.; Schrittwieser J.; Antonoglou I.; Panneershelvam V.; Lanctot M.; Dieleman S.; Grewe D.; Nham J.; Kalchbrenner N.; Sutskever I.; Lillicrap T.; Leach M.; Kavukcuoglu K.; Graepel T.; Hassabis D. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489. 10.1038/nature16961.
  5. Wigley P. B.; Everitt P. J.; van den Hengel A. V. D.; Bastian J. W.; Sooriyabandara M. A.; McDonald G. D.; Hardman K. S.; Quinlivan C. D.; Manju P.; Kuhn C. C.; Petersen I. R.; Luiten A. N.; Hope J. J.; Robins N. P.; Hush M. R. Fast Machine-learning Online Optimization of Ultra-cold-atom Experiments. Sci. Rep. 2016, 6, 25890. 10.1038/srep25890.
  6. Degrave J.; Felici F.; Buchli J.; Neunert M.; Tracey B.; Carpanese F.; Ewalds T.; Hafner R.; Abdolmaleki A.; de las Casas D.; Donner C.; Fritz L.; Galperti C.; Huber A.; Keeling J.; Tsimpoukelli M.; Kay J.; Merle A.; Moret J. M.; Noury S.; Pesamosca F.; Pfau D.; Sauter O.; Sommariva C.; Coda S.; Duval B.; Fasoli A.; Kohli P.; Kavukcuoglu K.; Hassabis D.; Riedmiller M. Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning. Nature 2022, 602, 414–419. 10.1038/s41586-021-04301-9.
  7. Ren F.; Ward L.; Williams T.; Laws K. J.; Wolverton C.; Hattrick-Simpers J.; Mehta A. Accelerated Discovery of Metallic Glasses through Iteration of Machine Learning and High-throughput Experiments. Sci. Adv. 2018, 4, eaaq1566. 10.1126/sciadv.aaq1566.
  8. Homma K.; Liu Y.; Sumita M.; Tamura R.; Fushimi N.; Iwata J.; Tsuda K.; Kaneta C. Optimization of a Heterogeneous Ternary Li3PO4–Li3BO3–Li2SO4 Mixture for Li-Ion Conductivity by Machine Learning. J. Phys. Chem. C 2020, 124, 12865–12870. 10.1021/acs.jpcc.9b11654.
  9. Sumita M.; Tamura R.; Homma K.; Kaneta K.; Tsuda K. Li-Ion Conductive Li3PO4-Li3BO3-Li2SO4 Mixture: Prevision through Density Functional Molecular Dynamics and Machine Learning. Bull. Chem. Soc. Jpn. 2019, 92, 1100–1106. 10.1246/bcsj.20190041.
  10. Nicolaou K. C. Organic Synthesis: the Art and Science of Replicating the Molecules of Living Nature and Creating Others like Them in the Laboratory. Proc. Math. Phys. Eng. Sci. 2014, 470, 20130690. 10.1098/rspa.2013.0690.
  11. Szabo A.; Ostlund N. S. Modern Quantum Chemistry; Dover Publications, Inc.: Mineola, New York, 1989.
  12. Lowe J. P. Quantum Chemistry; Academic Press: London, 1993.
  13. Friesner R. A. Ab Initio Quantum Chemistry: Methodology and Applications. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6648–6653. 10.1073/pnas.0408036102.
  14. Frisch M. J.; Trucks G. W.; Schlegel H. B.; Scuseria G. E.; Robb M. A.; Cheeseman J. R.; Scalmani G.; Barone V.; Petersson G. A.; Nakatsuji H.; Li X.; Caricato M.; Marenich A. V.; Bloino J.; Janesko B. G.; Gomperts R.; Mennucci B.; Hratchian H. P.; Ortiz J. V.; Izmaylov A. F.; Sonnenberg J. L.; Williams-Young D.; Ding F.; Lipparini F.; Egidi F.; Goings J.; Peng B.; Petrone A.; Henderson T.; Ranasinghe D.; Zakrzewski V. G.; Gao J.; Rega N.; Zheng G.; Liang W.; Hada M.; Ehara M.; Toyota K.; Fukuda R.; Hasegawa J.; Ishida M.; Nakajima T.; Honda Y.; Kitao O.; Nakai H.; Vreven T.; Throssell K.; Montgomery J. A. Jr.; Peralta J. E.; Ogliaro F.; Bearpark M. J.; Heyd J. J.; Brothers E. N.; Kudin K. N.; Staroverov V. N.; Keith T. A.; Kobayashi R.; Normand J.; Raghavachari K.; Rendell A. P.; Burant J. C.; Iyengar S. S.; Tomasi J.; Cossi M.; Millam J. M.; Klene M.; Adamo C.; Cammi R.; Ochterski J. W.; Martin R. L.; Morokuma K.; Farkas O.; Foresman J. B.; Fox D. J.. Gaussian 16, Revision C.01; Gaussian Inc.: Wallingford CT, 2016.
  15. Barca G. M. J.; Bertoni C.; Carrington L.; Datta D.; De Silva N.; Deustua J. E.; Fedorov D. G.; Gour J. R.; Gunina A. O.; Guidez E.; Harville T.; Irle S.; Ivanic J.; Kowalski K.; Leang S. S.; Li H.; Li W.; Lutz J. J.; Magoulas I.; Mato J.; Mironov V.; Nakata H.; Pham B. Q.; Piecuch P.; Poole D.; Pruitt S. R.; Rendell A. P.; Roskop L. B.; Ruedenberg K.; Sattasathuchana T.; Schmidt M. W.; Shen J.; Slipchenko L.; Sosonkina M.; Sundriyal V.; Tiwari A.; Galvez Vallejo J. L.; Westheimer B.; Włoch M.; Xu P.; Zahariev F.; Gordon M. S. Recent Developments in the General Atomic and Molecular Electronic Structure System. J. Chem. Phys. 2020, 152, 154102. 10.1063/5.0005188.
  16. Aprà E.; Bylaska E. J.; de Jong W. A.; Govind N.; Kowalski K.; Straatsma T. P.; Valiev M.; van Dam H. J. J.; Alexeev Y.; Anchell J.; Anisimov V.; Aquino F. W.; Atta-Fynn R.; Autschbach J.; Bauman N. P.; Becca J. C.; Bernholdt D. E.; Bhaskaran-Nair K.; Bogatko S.; Borowski P.; Boschen J.; Brabec J.; Bruner A.; Cauët E.; Chen Y.; Chuev G. N.; Cramer C. J.; Daily J.; Deegan M. J. O.; Dunning T. H.; Dupuis M.; Dyall K. G.; Fann G. I.; Fischer S. A.; Fonari A.; Früchtl H.; Gagliardi L.; Garza J.; Gawande N.; Ghosh S.; Glaesemann K.; Götz A. W.; Hammond J.; Helms V.; Hermes E. D.; Hirao K.; Hirata S.; Jacquelin M.; Jensen L.; Johnson B. G.; Jónsson H.; Kendall R. A.; Klemm M.; Kobayashi R.; Konkov V.; Krishnamoorthy S.; Krishnan M.; Lin Z.; Lins R. D.; Littlefield R. J.; Logsdail A. J.; Lopata K.; Ma W.; Marenich A. V.; Martin del Campo J.; Mejia-Rodriguez D.; Moore J. E.; Mullin J. M.; Nakajima T.; Nascimento D. R.; Nichols J. A.; Nichols P. J.; Nieplocha J.; Otero-de-la-Roza A.; Palmer B.; Panyala A.; Pirojsirikul T.; Peng B.; Peverati R.; Pittner J.; Pollack L.; Richard R. M.; Sadayappan P.; Schatz G. C.; Shelton W. A.; Silverstein D. W.; Smith D. M. A.; Soares T. A.; Song D.; Swart M.; Taylor H. L.; Thomas G. S.; Tipparaju V.; Truhlar D. G.; Tsemekhman K.; Van Voorhis T.; Vázquez-Mayagoitia A.; Verma P.; Villa O.; Vishnu A.; Vogiatzis K. D.; Wang D.; Weare J. H.; Williamson M. J.; Windus T. L.; Woliński K.; Wong A. T.; Wu Q.; Yang C.; Yu Q.; Zacharias M.; Zhang Z.; Zhao Y.; Harrison R. NWChem: Past, Present, and Future. J. Chem. Phys. 2020, 152, 184102. 10.1063/5.0004997.
  17. Grimme S.; Schreiner P. R. Computational Chemistry: The Fate of Current Methods and Future Challenges. Angew. Chem., Int. Ed. 2018, 57, 4170–4176. 10.1002/anie.201709943.
  18. Sumiya M.; Sumita Y.; Tsuda M.; Sakamoto Y.; Sang T.; Harada L.; Yoshigoe A. Y. High Reactivity of H2O Vapor on GaN Surfaces. Sci. Technol. Adv. Mater. 2022, 23, 189–198. 10.1080/14686996.2022.2052180.
  19. Sumita M.; Tanaka Y.; Ohno T. Possible Polymerization of PS4 at a Li3PS4/FePO4 Interface with Reduction of the FePO4 Phase. J. Phys. Chem. C 2017, 121, 9698–9704. 10.1021/acs.jpcc.7b01009.
  20. Sumita M.; Tanaka Y.; Ikeda M.; Ohno T. Charged and Discharged States of Cathode/Sulfide-Electrolyte Interfaces in All-Solid-State Lithium-Ion Batteries. J. Phys. Chem. C 2016, 120, 13332–13339. 10.1021/acs.jpcc.6b01207.
  21. Sumita M.; Morihashi K. Theoretical Study of Singlet Oxygen Molecule Generation via an Exciplex with Valence-Excited Thiophene. J. Phys. Chem. A 2015, 119, 876–883. 10.1021/jp5123129.
  22. Sumita M.; Ryazantsev N.; Saito K. Acceleration of the Z to E Photoisomerization of Penta-2,4-dieniminium by Hydrogen Out-of-plane Motion: Theoretical Study on a Model System of Retinal Protonated Schiff Base. Phys. Chem. Chem. Phys. 2009, 11, 6406–6414. 10.1039/b900882a.
  23. Sumita M.; Saito K. Ab initio Study on One-way Photoisomerization of the Maleic Acid and Fumaric Acid Anion Radical System as a Model System of Their Esters. J. Phys. Chem. A 2006, 110, 12276–12281. 10.1021/jp064377o.
  24. Sumita M.; Yoshikawa N. Augmented Lagrangian Method for Spin-coupled Wave Function. Int. J. Quantum Chem. 2021, 121, e26746. 10.1002/qua.26746.
  25. O’Boyle N. M.; Tenderholt A. L.; Langner K. M. cclib: A Library for Package-Independent Computational Chemistry Algorithms. J. Comput. Chem. 2008, 29, 839–845. 10.1002/jcc.20823.
  26. Larsen A. H.; Mortensen J. J.; Blomqvist J.; Castelli I. E.; Christensen R.; Dułak M.; Friis J.; Groves M. N.; Hammer B.; Hargus C.; Hermes E. D.; Jennings P. C.; Jensen P. B.; Kermode J.; Kitchin J. R.; Kolsbjerg E. L.; Kubal J.; Kaasbjerg K.; Lysgaard S.; Maronsson J. B.; Maxson T.; Olsen T.; Pastewka L.; Peterson A.; Rostgaard C.; Schiøtz J.; Schütt O.; Strange M.; Thygesen K. S.; Vegge T.; Vilhelmsen L.; Walter M.; Zeng Z.; Jacobsen K. W. The Atomic Simulation Environment—a Python Library for Working with Atoms. J. Phys. Condens. Matter 2017, 29, 273002. 10.1088/1361-648x/aa680e.
  27. Hruska E.; Gale A.; Huang X.; Liu F. AutoSolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules. J. Chem. Phys. 2022, 156, 124801. 10.1063/5.0084833.
  28. Ingman V. M.; Shaefer A. J.; Andreola L. R. QChASM: Quantum Chemistry Automation and Structure Manipulation. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2021, 11, e1510. 10.1002/wcms.1510.
  29. Cohen A. J.; Mori-Sánchez P.; Yang W. Challenges for Density Functional Theory. Chem. Rev. 2012, 112, 289–320. 10.1021/cr200107z.
  30. Landrum G. RDKit: Open-Source Cheminformatics Software, 2016. https://github.com/rdkit/rdkit/releases/tag/Release_2016_09_4.
  31. Terayama K.; Sumita M.; Katouda M.; Tsuda K.; Okuno Y. Efficient Search for Energetically Favorable Molecular Conformations against Metastable States via Gray-Box Optimization. J. Chem. Theory Comput. 2021, 17, 5419–5427. 10.1021/acs.jctc.1c00301.
  32. Hagfeldt A.; Boschloo G.; Sun L.; Kloo L.; Pettersson H. Dye-sensitized Solar Cells. Chem. Rev. 2010, 110, 6595–6663. 10.1021/cr900356p.
  33. Lu M.; Liang M.; Han H.-Y.; Sun Z.; Xue S. Organic Dyes Incorporating Bis-hexapropyltruxeneamin Moiety for Efficient Dye-Sensitized Solar Cells. J. Phys. Chem. C 2011, 115, 274–281. 10.1021/jp107439d.
  34. Kranthiraja K.; Saeki A. Experiment-oriented Machine Learning of Polymer:Non-Fullerene Organic Solar Cells. Adv. Funct. Mater. 2021, 31, 2011168. 10.1002/adfm.202170168.
  35. Atkins P. Atkins’ Physical Chemistry; Oxford University Press, 2017.
  36. Uoyama H.; Goushi K.; Shizu K.; Nomura H.; Adachi C. Highly Efficient Organic Light-emitting Diodes from Delayed Fluorescence. Nature 2012, 492, 234–238. 10.1038/nature11687.
  37. Boldyrev A. I.; Simons J.; Zakrzewski V. G.; von Niessen W. Vertical and Adiabatic Ionization Energies and Electron Affinities of New Silicon-carbon (SinC) and Silicon-oxygen (SinO) (n = 1-3) Molecules. J. Phys. Chem. 1994, 98, 1427. 10.1021/j100056a010.
  38. Yang X.; Zhang J.; Yoshizoe K.; Terayama K.; Tsuda K. ChemTS: an Efficient Python Library for De Novo Molecular Generation. Sci. Technol. Adv. Mater. 2017, 18, 972–976. 10.1080/14686996.2017.1401424.
  39. Sumita M.; Yang X.; Ishihara S.; Tamura R.; Tsuda K. Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies. ACS Cent. Sci. 2018, 4, 1126. 10.1021/acscentsci.8b00213.
  40. Fujita T.; Terayama K.; Sumita M.; Tamura R.; Nakamura Y.; Naito M.; Tsuda K. Understanding the Evolution of a De Novo Molecule Generator via Characteristic Functional Group Monitoring. Sci. Technol. Adv. Mater. 2022, 23, 352–360. 10.1080/14686996.2022.2075240.
  41. Sumita M.; Terayama K.; Suzuki N.; Ishihara S.; Tamura M. K.; Chahal D. T.; Payne K.; Yoshizoe K. De Novo Creation of a Naked Eye–detectable Fluorescent Molecule Based on Quantum Chemical Computation and Machine Learning. Sci. Adv. 2022, 8, eabj3906. 10.1126/sciadv.abj3906.
  42. Zhang Y.; Zhang J.; Suzuki K.; Sumita M.; Terayama K.; Li J.; Mao Z.; Tsuda K.; Suzuki Y. Discovery of Polymer Electret Material via De Novo Molecule Generation and Functional Group Enrichment Analysis. Appl. Phys. Lett. 2021, 118, 223904. 10.1063/5.0051902.
  43. Zhang J.; Terayama K.; Sumita M.; Yoshizoe K.; Ito K.; Kikuchi J. NMR-TS: de novo molecule identification from NMR spectra. Sci. Technol. Adv. Mater. 2020, 21, 552–561. 10.1080/14686996.2020.1793382.
  44. Terayama K.; Sumita M.; Tamura R.; Payne D. T.; Chahal M. K.; Ishihara S.; Tsuda K. Pushing Property Limits in Materials Discovery via Boundless Objective-free Exploration. Chem. Sci. 2020, 11, 5959–5968. 10.1039/d0sc00982b.
  45. Kim S.; Chen J.; Cheng T.; Gindulyte A.; He J.; He S.; Li Q.; Shoemaker B. A.; Thiessen P. A.; Yu B.; Zaslavsky L.; Zhang J.; Bolton E. E. PubChem in 2021: New Data Content and Improved Web Interfaces. Nucleic Acids Res. 2021, 49, D1388–D1395. 10.1093/nar/gkaa971.
  46. Joung J. F.; Han M.; Jeong M.; Park S. Experimental Database of Optical Properties of Organic Compounds. Sci. Data 2020, 7, 295. 10.1038/s41597-020-00634-8.
  47. Irwin J. J.; Sterling T.; Mysinger M. M.; Bolstad E. S.; Coleman R. G. ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model. 2012, 52, 1757. 10.1021/ci3001277.
  48. Nakata M.; Shimazaki T. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry. J. Chem. Inf. Model. 2017, 57, 1300. 10.1021/acs.jcim.7b00083.
  49. Ramakrishnan R.; Dral P. O.; Rupp M.; von Lilienfeld O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022. 10.1038/sdata.2014.22.
  50. Ruddigkeit L.; van Deursen R.; Blum L. C.; Reymond J. L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875. 10.1021/ci300415d.
  51. von Lilienfeld O. A.; Müller K. R.; Tkatchenko A. Exploring Chemical Compound Space with Quantum-Based Machine Learning. Nat. Rev. Chem. 2020, 4, 347–358. 10.1038/s41570-020-0189-9.
  52. Cai J.; Chu X.; Xu K.; Li H.; Wei J. Machine Learning-driven New Material Discovery. Nanoscale Adv. 2020, 2, 3115–3130. 10.1039/d0na00388c.
  53. Huber S. P. Automated Reproducible Workflows and Data Provenance with AiiDA. Nat. Rev. Phys. 2022, 4, 431. 10.1038/s42254-022-00463-1.
  54. Lundberg M.; Siegbahn P. E. M. Quantifying the Effects of the Self-interaction Error in DFT: When Do the Delocalized States Appear? J. Chem. Phys. 2005, 122, 224103. 10.1063/1.1926277.
  55. Sanchez-Lengeling B.; Aspuru-Guzik A. Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering. Science 2018, 361, 360–365. 10.1126/science.aat2663.
  56. Kim K.; Kang S.; Yoo J.; Kwon Y.; Nam Y.; Lee D.; Kim I.; Choi Y.-s.; Jung Y.; Kim S.; Son W.-j.; Son J.; Lee H. S.; Kim S.; Shin J.; Hwang S. Deep-learning-based Inverse Design Model for Intelligent Discovery of Organic Molecules. npj Comput. Mater. 2018, 4, 67. 10.1038/s41524-018-0128-1.

Supplementary Materials

ci2c00812_si_001.pdf (280.7KB, pdf)

Data Availability Statement

Our implementation is available on GitHub at https://github.com/molecule-generator-collection/QCforever. The version of RDKit (ref 30) we used is 2020.09.1.0. Gaussian 16 (ref 14; https://gaussian.com) was used for the QC computations. The list of molecules in the article is shown in the Supporting Information.


Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society
