ACS Omega. 2021 Jan 14;6(3):2001–2024. doi: 10.1021/acsomega.0c04981

Forecasting System of Computational Time of DFT/TDDFT Calculations under the Multiverse Ansatz via Machine Learning and Cheminformatics

Shuo Ma †,, Yingjin Ma †,§,*, Baohua Zhang †,§, Yingqi Tian †,, Zhong Jin †,§,*
PMCID: PMC7841786  PMID: 33521440

Abstract


With the view of achieving a better performance in task assignment and load-balancing, a top-level designed forecasting system for predicting computational times of density-functional theory (DFT)/time-dependent DFT (TDDFT) calculations is presented. The computational time is assumed to be an intrinsic property of the molecule. Based on this assumption, the forecasting system is established as a "reinforced concrete" that combines cheminformatics, several machine-learning (ML) models, and the framework of the many-world interpretation (MWI) in the multiverse ansatz. Herein, cheminformatics is used to recognize the topological structure of molecules, the ML models are used to build the relationships between topology and computational cost, and the MWI framework is used to hold various combinations of DFT functionals and basis sets in DFT/TDDFT calculations. Calculated results for molecules from the DrugBank dataset show that (1) the system gives quantitative predictions of computational costs; typical mean relative errors can be less than 0.2 for DFT/TDDFT calculations, with deviations within ±25%, when exactly pretrained ML models are used and (2) it can also be employed for various combinations of DFT functional and basis set without exactly pretrained ML models, with only slightly enlarged prediction errors.

1. Introduction

Ab initio electronic structure methods are becoming more and more popular in the chemistry community, as it has been reported that ab initio methods illustrate chemical mechanisms in their original view, that is, at the electron level.1−6 Normally, one needs to consider the computational costs of ab initio (i.e., first principles) methods when determining whether they are appropriate for the problem at hand. As shown in Figure 1, when compared to much less accurate approaches, such as molecular mechanics, ab initio methods often take larger amounts of computer time, memory, and disk space because of their scalings. For example, the Hartree–Fock self-consistent field (HFSCF) and density functional theory (DFT) already show scalings in the range from N2 to N4, with N being the system magnitude parameter, not specifically the number of basis functions.7,8 The quantitative solutions (i.e., electronic correlation approaches) like Møller–Plesset perturbation theory up to second order (MP2),9 coupled cluster approaches with single and double excitations (CCSD),10,11 and the perturbative triples treatment (CCSD(T)) increase the scaling by two or more orders. In general, the dynamic correlations can be well accounted for by the above-mentioned DFT, MP2, and CCSD/CCSD(T) approaches. However, the static correlations, that is, quasi-degeneracy, are normally accounted for by the multiconfigurational (MC) and multireference (MR) approaches.12 The scalings of MC/MR approaches can span a large range because of the so-called active space, in which the superposition of all possible configurations can be taken into account for describing the quasi-degeneracy.8,12

Figure 1. System scales under different simulating approaches.

Normally, one needs to choose the simulating approach carefully by considering the studied systems as well as the desired accuracy. Apparently, the choice of simulating approach, or even the choice of parameters within a given approach, is also dominated by the available computational resources.8 However, the predictable pace of Moore's law13 cannot easily compensate for the difficulties caused by the conflict between computational resources and computational scales. Thus, there are actual demands for predicting the possible computational costs (e.g., time, memory, or disk space) before performing the simulations. It can benefit both researchers and computer centers in terms of computational resource utilization and computational resource scheduling, respectively.14,15 Additionally, a potential and specific usage can be expected in high-throughput calculations using fragmentation approaches,16−20 in which task assignment and load-balancing are highly demanded because the different fragments correspond to different computational costs that cannot be well estimated before the practical calculations.

Several related works have been carried out in the area of computational cost modeling and job scheduling in the past few decades. Most of them chose to make use of historical data and adopt methods of statistical analysis and machine learning (ML), with which the relationship between job features and the computational cost can be established.21−32 One line of research lays emphasis on the meta information of jobs, including user types, the number of processors applied for, execution time limits, and so on. These works assume that different jobs with similar meta information come with similar costs, which was validated by Downey21 and Smith's22 works. Time series approaches have been frequently used in this kind of work. For instance, Gaussier et al.24 designed a feature set including varieties of job history data as the input of a linear regression model. Moreover, many works focused on the architectures of the targeted systems and used ML to make predictions. The training data usually come from hardware performance counters so that the states of computing components during executions can be considered. Li et al.26 used a support vector machine (SVM) model to predict the number of instructions per cycle and obtain the optimal thread mapping scheme. Helmy et al.28 employed SVM, artificial neural network, and decision tree models to predict CPU burst times, and this can be further extended to heterogeneous computing systems as shown by Shulga and co-workers.29 Apart from the above solutions targeting common programs, there are also some predictive methods specialized for a certain type of program.30,31 It is worthy of note that Matsunaga and Fortes32 applied the PQR2 model to predict the execution time, memory, and disk occupations of two bioinformatics applications (BLAST33 and RAxML34). It has been reported that taking features that are highly correlated with the application type, such as protein sequence length, yields much better results.

Regarding the field of computational chemistry, Papay et al. developed a least square fitting method for graph-based component-wise runtime estimates in parallel SCF atomic computations in 1996.35 Antony et al. used a linear model to simulate the runtime of SCF algorithms in Gaussian applications and to estimate the impacts of architectures in terms of the count of retired instructions and cache misses.36 Additionally, it is noteworthy that Mniszewski et al. designed a class of tools for predicting the runtime of a molecular dynamics code,37 allowing users to find the optimal combination of algorithmic methods and hardware options. However, as far as we know, there is nearly no related work concerning the prediction of computational cost in the field of quantum chemistry, except for the quantum machine learning (QML) models very recently introduced by Heinen and co-workers.14 They demonstrated that QML-based wall time predictions significantly improve job scheduling efficiency by reducing CPU time overhead by 10 to 90%. Until now, there has been no universal solution for predicting the computational cost. The current QML solution is restricted to a specified computational approach and specified parameters, and training of a corresponding ML model is essential before practical predictions.14 However, it may not be convenient to train a specific model each time before the practical calculations. Thus, generalization ability should be one of the essential elements of a universal solution for predicting the computational cost. Additionally, traditional ML models including ensemble learning,38 recurrent,39 or graph-based neural networks40 may also be good alternatives to the reported QML solutions.14

Herein, a top-level design-based forecasting framework is developed, which aims to yield reliable predictions of computational cost (mainly the computational time) with a high degree of generalization ability. At this stage, we focus on its design and its confirmatory usage via the prediction of computational times of DFT/time-dependent DFT (TDDFT) single-point calculations. In our design, cheminformatics is used to recognize the topological structure of molecules, and the ML models are used to establish the relation between topology and computational time. Additionally, the idea from the many-world interpretation (MWI) in the multiverse ansatz is used to gain generalization ability when treating various combinations of DFT functionals and basis sets, which are critical for practical calculations.41−43

For the DFT calculations, the computational scaling ranges from N2 to N4, as shown in Figure 1 and explained in detail in the Appendix. It stems from the evaluation of four-center two-electron repulsion integrals, that is

$(\mu\nu|\lambda\sigma) = \iint \phi_{\mu}(\mathbf{r}_{1})\,\phi_{\nu}(\mathbf{r}_{1})\,\frac{1}{r_{12}}\,\phi_{\lambda}(\mathbf{r}_{2})\,\phi_{\sigma}(\mathbf{r}_{2})\,\mathrm{d}\mathbf{r}_{1}\,\mathrm{d}\mathbf{r}_{2}$   (1)

where μ, ν, λ, and σ denote indices of atomic orbitals. This scaling is the upper boundary for the HF or DFT calculations.44,45 However, many two-electron integrals are of negligible magnitude for large molecules, and some rigorous upper boundary conditions can be applied to the integrals. For instance, the Schwarz inequality46

$|(\mu\nu|\lambda\sigma)| \leq \sqrt{(\mu\nu|\mu\nu)}\,\sqrt{(\lambda\sigma|\lambda\sigma)}$   (2)

allows strict mathematical upper bounds for all two-electron integrals to be computed in an N2 log N process, so that the predetermined negligible integrals can be safely ignored. Additionally, larger molecular systems have a higher fraction of atomic orbitals sufficiently distant from each other to be considered noninteracting, thereby yielding negligible two-electron integrals and further lowering the effective scaling exponent for HF or DFT calculations.44
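As a small illustration of how such a Schwarz bound can be used to prune the integral list, the following Python sketch screens a set of hypothetical Schwarz factors against a threshold; the diagonal integrals are random placeholders, not values from any quantum chemistry package.

```python
import numpy as np

# Hypothetical diagonal integrals (mu nu | mu nu) for a toy basis of 50 functions;
# in a real code these come from the integral engine before the full evaluation.
n = 50
rng = np.random.default_rng(0)
diag = np.abs(rng.lognormal(mean=-3.0, sigma=2.0, size=(n, n)))
diag = 0.5 * (diag + diag.T)            # symmetrize the (mu nu | mu nu) table

Q = np.sqrt(diag)                        # Schwarz factors Q_{mu,nu}
threshold = 1.0e-10

# Schwarz bound: |(mu nu | lambda sigma)| <= Q_{mu,nu} * Q_{lambda,sigma}
bounds = np.einsum("ij,kl->ijkl", Q, Q)
kept = np.count_nonzero(bounds > threshold)
print(f"integrals surviving screening: {kept} of {n ** 4} ({100.0 * kept / n ** 4:.1f}%)")
```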

It was demonstrated by Strout and Scuseria that the number of basis functions n can be connected with the number of integrals via the scaling exponent α for the same molecular series,44 for example

$I \propto n^{\alpha}$   (3)

where n denotes the number of basis functions and I denotes the number of integrals. Because the computational cost of the integrals is the upper boundary for the HF or DFT calculations, a formula of the type t = an2 + bn + c (the exponent 2 is roughly approximated from α) can be expected as the working equation for a rough prediction of the time. However, such a simple polynomial or exponential expression (eq 3) is only suitable for molecules in the same series, preferably without significant differences in size. When molecules have different spatial structures, the predicted results are normally too poor to be useful when using this type of regression equation.44 Additionally, it is not convenient for this regression analysis to consider multiple factors (e.g., the number of basis functions together with the number of electrons, bond types, etc.), which should be considered when better predictions are needed. Thus, the advantages of the proposed top-level design-based forecasting framework can be demonstrated in practical predictions.
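For completeness, such a rough t = an2 + bn + c regression can be set up in a few lines of NumPy; the timing data below are placeholders for a homologous series, not measurements from this work.

```python
import numpy as np

# Placeholder (number of basis functions, CPU time) pairs for one molecular series.
n_basis = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
cpu_time = np.array([12.0, 55.0, 130.0, 240.0, 390.0])   # seconds, illustrative only

coeffs = np.polyfit(n_basis, cpu_time, deg=2)            # fit t ~ a*n^2 + b*n + c
predict = np.poly1d(coeffs)
print(f"predicted time for n = 350: {predict(350.0):.1f} s")
```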

This article is organized as follows: in Section 2, we present the design of the proposed framework, implementation details, and the workflow of the forecasting system; computational details are presented in Section 3, while benchmark examples are presented in Section 4; and finally, we draw the conclusions of the work in Section 5.

2. Implementations

2.1. Chemical Spaces Containing the Computational Times

Chemical space is a concept in cheminformatics referring to the property space, which is spanned by all possible molecules adhering to a given set of construction principles and boundary conditions.47−51 As shown in Figure 2, the chemical space considered in this work is spanned by possible molecules and their computational times, which are treated as the intrinsic properties of molecules. Different computational schemes (e.g., hardware, software, approaches, etc.) lead to different computational times for a molecule. Thus, there are various chemical spaces for a given set of molecular suits. Even when a specific approach is employed, there are still various chemical spaces that are generated by the parameters. For instance, different choices of DFT functionals, basis sets, quantum chemical packages, and so forth can generate different chemical spaces, even if the same DFT approach is employed.

Figure 2. Illustration of chemical spaces that contain molecular suits and computational times. The chemical spaces can be split by different computational schemes (e.g., hardware, software, and approaches, etc.), and can be further split by approach’s parameters. The figure was captured using the CSVR tool developed by Probst and Reymond.52

2.2. Cheminformatics and the Employed ML Models

As mentioned previously, the proposed DFT chemical space can be further split by different computational parameters; hence, ML techniques can be employed in each split chemical space to train models that can be used to perform predictions. In each DFT chemical space, we employed several ML models, from simple to complex, to benchmark their capacities and correspondingly to select a reliable framework for predicting the computational times. Four carefully selected ML models are used within this framework; they are as follows:

  • (1)

    Random forest (RF),53,54 based on structural similarity. The composition proportion of a molecule with respect to the feature structures (e.g., linear, dendritic, ring, etc.) can be evaluated by the decision trees constructed from the designed feature structures via the simplified molecular input line entry specification (SMILES) codes.55−57 Then, the predicted computational time can be estimated by a linear combination of the computational times of the feature structures. The idea behind this process is quite similar to the linear combination of atomic orbitals approximation in quantum chemistry.

  • (2)

    Long short-term memory (LSTM),58,59 based on the recognition of the chemical formula. The connection between molecular structures and computational times is recognized by training on similar molecular suits. The molecular structures are identified via the SMILES codes using natural language processing (NLP),60 and the bidirectional variant of the LSTM model61−63 is practically implemented in this work.

  • (3)

    Message passing neural network (MPNN),64 based on the graph-based learning of spatial structures. The connection between molecular structures and computational times is recognized by training the similar molecular suits using their spatial information. The number of basis functions, number of electrons, bond types, molecular charges, and so forth can be considered as a whole in this model.

  • (4)

    Multilevel graph convolutional neural network (MGCN),65 similar to MPNN, has advantages in terms of generalizability and transferability.

A brief procedure is also shown in Figure 3. More details of these four ML models are given in the Appendix for interested readers.

Figure 3. Illustration of DFT chemical space and the chemical spaces split by the computational parameters (e.g., DFT functionals and basis sets). ML techniques are employed for the purpose of predicting computational times.

2.3. Chemical MWI Ansatz and the Generalization

The chemical MWI ansatz used in this article is inspired by Hugh Everett III's MWI multiverse ansatz,66−70 an interpretation of quantum mechanics that asserts that the universal wave function is objectively real and that there is no wave function collapse. It implies that all possible outcomes of quantum measurements are physically realized in some "world" or "universe", and they all share a unique starting point. As shown in Figure 3, in the chemical MWI ansatz, the unique starting point can be the chemical space in which the same molecular suits and Kohn–Sham working equation are deployed, and each resulting world is a split chemical space generated by the different DFT parameters, which in turn result in different wave functions and different computational times as the intrinsic properties for the same molecular suits of the starting point, as illustrated in Figure 4.

Figure 4. Comparative illustration of the original MWI using the "Schrödinger's cat" paradox71,72 and the chemical MWI ansatz used in predicting the computational times in this work. In the MWI ansatz, every quantum event is a branch point; the cat is both alive and dead, even before the box is opened, but the "alive" and "dead" cats are in different branches of the universe, both of which are equally real and depend on the "Radiation? → Poison Gas → Cat status" process; there is entanglement or a link between the states of the cats in the different spaces, that is, if the cat is alive in one space, then it is always dead in the other space, but the two do not interact with each other. In the chemical MWI ansatz, different branches are split by the computational parameters that affect the actual implementation/solution of the Hamiltonian operator (Ĥ) following the "Parameters → KS equation & molecule → Computational times" process. Analogously, the entanglement or link among chemical spaces can also be deduced, that is, if the computational time (tref) is known in one space, then ttar for another space can be estimated based on tref; but there is still no interaction between tref and ttar because of the different spaces.

It is worthy of note that the molecular suits are included in the starting point and shared by all split chemical spaces; thus, the relationships between the computational times among various split chemical spaces can be connected by the molecular suits. In the original MWI, the status of the cat (i.e., the observed object) in one space can always be determined/deduced from the status of the cat in the other space via the so-called quantum entanglement.73,74 In the chemical MWI of this work, we assume the computational times (i.e., the observed objects) in one space can also be deduced from the computational times in other spaces via fitting relationships. The fitting relationships can be deduced from the molecular suits or just some molecules in the suits. As shown in Figure 5, these molecules are defined as the joint molecules because they represent the joints of the various split chemical spaces and can connect these split chemical spaces through the molecular link and the link surface ("surface" implies that it contains the computational properties like computational times).

Figure 5. Illustration of split chemical spaces, molecular link via joint molecules, and link surface used in the chemical MWI.

Herein, it is worth mentioning that the fitting relationships obtained via one or a few molecular links may be generalized to the split chemical spaces if the solving manner for the KS equation is fixed. As such, the computational times within split chemical spaces, which are caused by various combinations of basis set and DFT functional, can be well considered within this ansatz. For DFT functionals, the correction coefficients of computational times for split chemical space can be approximated by the molecular link, and the Jacob’s ladder42,75 may be used as the entry point for further classifying the functionals that did not occur in the link surface. In this case, the correction coefficient for the target DFT functional under the specific basis can be obtained from the expression

$c_{\mathrm{dft}} = \dfrac{f_{\mathrm{dft}}(\mathrm{tar})}{f_{\mathrm{dft}}(\mathrm{ref})}$   (4)

in which fdft(tar) and fdft(ref) are the target and reference timing data, respectively, for the molecular link. If fdft(tar) is not available, the DFT functional within the same region of Jacob’s ladder can be used as the substitution. For basis sets, the correction coefficients can be approximated by the molecular link via a polynomial curve-fitting technique. In this case, the correction coefficient for the target basis set under the specific DFT functional can be obtained through the expression

$c_{\mathrm{bas}} = \dfrac{f_{\mathrm{bas}}(x_{\mathrm{tar}})}{f_{\mathrm{bas}}(x_{\mathrm{ref}})}$   (5)

in which f(x) denotes the fitted polynomial equation using various basis sets under the specific DFT functional, and x denotes the number of basis functions. If the second-order polynomial is used, then

$f_{\mathrm{bas}}(x) = a x^{2} + b x + c$   (6)

can be used for calculating the timing data for the target and reference basis sets, respectively. Nevertheless, it is worthy of note that the reliability of the correction coefficient is affected by the deviation between the target basis set and the reference basis set. Here, a similarity coefficient is introduced as a measurement of the deviation between the target and reference basis sets. The formula of the similarity coefficient (s) can be expressed as

(7)

where Ntar/ref represents the number of atomic basis functions, ρ measures the contractions of basis sets

$\rho = N_{\mathrm{pri}}/N_{\mathrm{con}}$   (8)

where Npri represents the number of primitive basis functions, Ncon represents the total number of contracted basis functions, and J denotes the Jaccard index76−78 reflecting the similarity of orbital composition of the two basis sets. Furthermore

$J = r/p$   (9)

where r represents the number of identical atomic basis functions between the target and reference basis sets, p represents the number of atomic basis functions in either of them. The closer that s is to 1, the closer the two basis sets are.
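To make the use of eqs 4−6 and 9 concrete, a minimal Python sketch of the correction coefficients is given below. The functional factors, polynomial coefficients, and basis-set compositions are illustrative assumptions only, and the full similarity coefficient of eq 7 additionally combines the basis-function counts and contraction measure as described above.

```python
import numpy as np

def c_dft(f_tar: float, f_ref: float) -> float:
    """Eq 4: ratio of the target and reference functional timing factors."""
    return f_tar / f_ref

def c_bas(poly_coeffs, x_tar: int, x_ref: int) -> float:
    """Eqs 5 and 6: ratio of the fitted polynomial evaluated at the two basis sizes."""
    f_bas = np.poly1d(poly_coeffs)
    return f_bas(x_tar) / f_bas(x_ref)

def jaccard(tar_funcs: set, ref_funcs: set) -> float:
    """Eq 9: identical atomic basis functions over those present in either set."""
    union = tar_funcs | ref_funcs
    return len(tar_funcs & ref_funcs) / len(union) if union else 0.0

# Illustrative numbers only (not taken from the paper).
print(c_dft(f_tar=1.2, f_ref=1.0))                        # e.g., meta-hybrid vs. GGA
print(c_bas([2.0e-3, 0.05, 1.0], x_tar=620, x_ref=410))   # assumed quadratic fit
print(jaccard({"C:s", "C:p", "C:d"}, {"C:s", "C:p"}))     # 2/3
```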

It needs to be emphasized that transfer learning,79 that is, the reuse of a pretrained ML model on new functional/basis set combinations with scaling factors, can be recognized as the rationale behind the chemical MWI ansatz. To be specific, domain adaptation,80 which is a subcategory of transfer learning with the ability to apply an algorithm trained in one or more "source domains" to a different but related "target domain", can be used to understand eqs 4−6 when assuming that only the scaling factor differs between the source domain and the target domain. Beyond domain adaptation, model-based transfer learning,81 which is normally used for the transfer of neural networks on the condition that parameters can be shared between source and target domains, should also be feasible through the so-called "fine-tuning" operations.82 Thus, the demand for training data can be lowered.

2.4. Forecasting System

Figure 6 shows the workflow of our proposed forecasting system, which has already been uploaded to Github.83 For any input molecule, the forecasting system has the capacity to give a predicted DFT computational time for specific hardware and software with any combination of DFT functional and basis set. The simplest one is the trained case (denoted as CASE-0), in which both the target DFT functional and the target basis set match those of the trained models. In this case, the computational time t can be predicted using the matched models without any correction. For the other cases (i.e., the either-trained case and the neither-trained case), the whole process is a bit more complex and is listed in the following (a minimal code sketch of the case handling is given after the CASE-3 list):

Figure 6. Workflow of the proposed forecasting system.

CASE-1: Either-trained (only basis set can match).

  • (1)

    Identify the type (LDA, GGA, etc.,) of DFT functional based on the Jacob’s ladder.

  • (2)

    Decide the specific reference DFT functional based on the functional type or other preference.

  • (3)

    Assign the values of fdft(tar) and fdft(ref) based on the type of DFT functional, for example, LDA, GGA, hybrid and range-separated, meta hybrid and range-separated Hybrid with values of 0.95, 1.0, 1.1, and 1.2 (see Figure 12 for how these values were obtained).

  • (4)

    If the timing database of DFTtar/basis and DFTref/basis calculations for the link molecules is available, then the values of fdft(tar) and fdft(ref) can also be obtained from the database.

  • (5)

    Obtain the correction coefficient cdft based on eq 4.

  • (6)

    Predict the computational time (t0) of the target molecule using the models of the reference functional and the reference basis set.

  • (7)

    Correct the computational time t via t = t0 × cdft

Figure 12. Top: Illustrations of the ratios (colored bars) between the CPU times of various DFT functionals and those of the PBE functional for the listed basis sets, as shown in the legend box. Bottom: The ratios can be reordered and roughly partitioned for the different types of DFT functionals (referred to Jacob's ladder).

CASE-2: Either-trained (only DFT functional can match).

  • (1)

    Calculate the similarity coefficients between the target basis set and the reference basis sets.

  • (2)

    Choose the specific reference basis set with the largest similarity coefficient.

  • (3)

    Obtain the total number of AO basis functions (x) for the target molecule using the reference basis set and the target basis set, respectively.

  • (4)

    Get the two fbas(x) values by setting x equal to xtar and xref in the fitted polynomial equation under the reference DFT functional.

  • (5)

    Obtain the correction coefficient cbas based on eq 5.

  • (6)

    Predict the computational time (t0) of target molecule using the reference basis set.

  • (7)

    Correct the computational time t via t = t0 × cbas.

CASE-3: Neither-trained (neither the DFT functional nor the basis set can match).

  • (1)

    Same as that in CASE-2.

  • (2)

    Same as that in CASE-2.

  • (3)

    Same as that in CASE-2.

  • (4)

    Get the two fbas(x) values by setting x equal to xtar and xref in the fitted polynomial equation under the specific/preferred DFT functional.

  • (5)

    Same as that in CASE-2.

  • (6)

    Same as (1) to (5) steps in CASE-1, then obtain the cdft.

  • (7)

    Predict the computational time (t0) of the target molecule using the reference basis set.

  • (8)

    Correct the computational time t via t = t0 × cbas × cdft.
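The following Python sketch summarizes the CASE-0 to CASE-3 logic above under stated assumptions. The pretrained "models", the per-rung factors, and the fitted polynomial are hypothetical stand-ins for the actual RF/LSTM/MPNN/MGCN models and the data of Figures 12 and 13; it is meant only to show how the corrections of eqs 4−6 are chained, not to reproduce the released code.

```python
import numpy as np

# Hypothetical pretrained timing models, keyed by (functional, basis set);
# each maps the molecule's basis-function count to a CPU time in seconds.
TRAINED = {
    ("PBE", "6-31G"):  lambda n: 2.0e-3 * n ** 2,
    ("PBE", "6-31G*"): lambda n: 3.0e-3 * n ** 2,
}
# Rough magnification factors per rung of Jacob's ladder (cf. Figure 12).
F_DFT = {"LDA": 0.95, "GGA": 1.0, "hybrid": 1.1, "meta-hybrid": 1.2}
# Assumed quadratic fit f_bas(x) of CPU time vs. number of basis functions (eq 6).
F_BAS = {"PBE": np.poly1d([2.0e-3, 0.0, 0.0])}


def predict_time(n_basis_tar, n_basis_ref, functional, rung, basis,
                 ref_functional="PBE", ref_rung="GGA", ref_basis="6-31G"):
    if (functional, basis) in TRAINED:                     # CASE-0: fully trained
        return TRAINED[(functional, basis)](n_basis_tar)
    if (ref_functional, basis) in TRAINED:                 # CASE-1: basis set trained
        t0 = TRAINED[(ref_functional, basis)](n_basis_tar)
        return t0 * F_DFT[rung] / F_DFT[ref_rung]          # eq 4
    f_bas = F_BAS[ref_functional]                          # eq 6: fitted curve
    c_bas = f_bas(n_basis_tar) / f_bas(n_basis_ref)        # eq 5
    if (functional, ref_basis) in TRAINED:                 # CASE-2: functional trained
        return TRAINED[(functional, ref_basis)](n_basis_ref) * c_bas
    t0 = TRAINED[(ref_functional, ref_basis)](n_basis_ref) # CASE-3: neither trained
    return t0 * c_bas * F_DFT[rung] / F_DFT[ref_rung]


# Example: basis counts of a target molecule in the target (540) and reference (410) sets.
print(predict_time(540, 410, "M06-2x", "meta-hybrid", "cc-pVDZ"))
```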

3. Computational Details

A self-written code, which has already been uploaded to Github,83 was used for the training and testing calculations. The Basis Set Exchange,84 a community database for quantum chemistry electronic structure calculations, was used to obtain the information on basis sets and electrons. LIBXC,85 a community database of DFT functionals, was used to obtain the information on functionals. The STK86 package together with the RDKIT87 package was used for generating the molecular suites. These two packages were also used for extracting and labeling properties of the molecular suites. The dataloader in Tencent Alchemy Tools88 was modified to port the necessary information when training or testing models. All the calculations were performed with the Gaussian89 (version 09.D01), NWChem90 (version 7.0.0), GAMESS91 (version 2018.R3), or OpenMolcas92 (version 8.4) packages. For all the Gaussian calculations, the Sugon W760-G20 server was used with two-way Intel Xeon E5-2680V3 processors (24 cores in total) and 128 GB memory. For the other calculations, the Sugon CB60-G16 server was used with two-way Intel Xeon E5-2680V2 processors (20 cores in total) and 64 GB memory. Self-written scripts using PYTHON with NUMPY and PYTORCH93 were used for automatic execution of the calculations, assembling the data, and analyzing the results.

The molecular suit of the RF with feed-forward neural network (FNN) model was artificially designed, containing 108 molecules with typical structures. When training the models, four typical molecular suits (i.e., single/double-bond linear, branch, and ring) were used as the training suits for the RF models. The molecular suits of the Bi-LSTM, MPNN, and MGCN models were sampled from the DrugBank dataset.94−96 The criteria applied to select the molecules for the training and test sets from the DrugBank dataset were as follows (a short sketch of the subgroup-based splitting in steps b and c is given after the list):

  • a.

    Concentrate the DrugBank suits: divide all molecules of DrugBank into groups based on the rows of periodic table of elements, manually select the desired groups (e.g., groups that contain the first two row elements) to form the molecular suits.

  • b.

    Generate the incremental molecular subgroups: further divide the selected molecular suits into incremental subgroups based on the number of atoms without the H atom.

  • c.

    Get the training and testing reservoirs: randomly select the training molecules and testing molecules in each subgroup with a fixed ratio (e.g., 4:1) to form the training reservoir and testing reservoir.

  • d.

    Obtain the training and testing suits: choose an appropriate number of molecules from the training/testing reservoirs to form the training/testing suits used in practice.
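The subgroup-based splitting of steps b and c can be sketched as follows; the SMILES strings, heavy-atom counts, and the fixed 4:1 ratio below are illustrative placeholders rather than the actual DrugBank selection.

```python
import random
from collections import defaultdict

def split_by_subgroups(molecules, heavy_atom_counts, ratio=4, seed=0):
    """Group molecules by heavy-atom count and split each subgroup ~ratio:1
    into training and testing reservoirs (steps b and c above)."""
    rng = random.Random(seed)
    subgroups = defaultdict(list)
    for mol, n_heavy in zip(molecules, heavy_atom_counts):
        subgroups[n_heavy].append(mol)
    train, test = [], []
    for members in subgroups.values():
        rng.shuffle(members)
        n_test = max(1, len(members) // (ratio + 1))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Illustrative usage with made-up SMILES strings and heavy-atom counts.
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)NO", "CCCC", "CCOC", "c1ccncc1", "CC(C)C"]
heavy = [3, 3, 6, 5, 4, 4, 6, 4]
train_set, test_set = split_by_subgroups(smiles, heavy)
print(len(train_set), len(test_set))
```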

When training the models, the width of the hidden layers was set to five, meaning that every hidden layer has five neurons. The mean absolute error (MAE) loss

$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| \hat{y}^{(i)} - y^{(i)} \right|$   (10)

was used as the target function, and the mean relative error (MRE)

$\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N} \frac{\left| \hat{y}^{(i)} - y^{(i)} \right|}{y^{(i)}}$   (11)

was used for evaluating the performance of the model. For an input sample i, ŷ(i) denotes the model output and y(i) denotes the real value of the prediction target, with N being the number of samples. The Adam optimizer97 was used to minimize the loss.
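For reference, the two metrics can be written as short NumPy functions; the numbers in the usage example are arbitrary.

```python
import numpy as np

def mae(y_pred, y_true):
    """Eq 10: mean absolute error over the sample set."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def mre(y_pred, y_true):
    """Eq 11: mean relative error, i.e., the absolute error normalized by the true value."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true))

print(mae([110.0, 95.0], [100.0, 100.0]))   # 7.5
print(mre([110.0, 95.0], [100.0, 100.0]))   # 0.075
```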

4. Results and Discussion

4.1. Predicting with Pretrained Models

At the beginning, the ML models in the forecasting system were evaluated to assess their capacity to distinguish molecules. Several molecules, which have nearly the same total number of basis functions but different geometrical configurations, were used for illustration. As shown in Figure 7, there are 25 sample molecules grouped into five rows: the A-row and B-row are molecules from the RF training suits, and the C-, D-, and E-rows are molecules randomly selected from the DrugBank database. The molecules in the same column have almost the same total number of basis functions but different geometrical configurations when using the 6-31G basis sets98 with the M06-2x DFT functional.99

Figure 7. Top left: Illustration of sampled geometrical configurations of the testing molecules. The A-row and B-row are molecules from the RF training suits, while the C-, D-, and E-rows are molecules randomly selected from the DrugBank database. The molecules in the same column possess almost the same total number of basis functions but different geometrical configurations when using the 6-31G basis sets. Top right: The distributions of the total number of basis functions and the CPU times for all the molecule suits, as well as the positions of the sampled molecules. Bottom: The relative errors of the predicted CPU times for all the sampled molecules.

Figure 7 presents the predicted total CPU time, the CPU time averaged over each SCF iteration, and their relative errors. It can be seen that the predicted total CPU times using the LSTM models show the best accuracy; the calculated MRE for the 25 testing molecules is about 0.13 (Table 1). The results of the MPNN and MGCN models also show relatively good accuracy, with MRE values of 0.17 and 0.21, respectively. The results of RF are the poorest, with an MRE value of 0.37. A similar tendency can be observed in Figure 7 for the averaged CPU times. To evaluate the capacity of the proposed method in identifying molecules of different sizes, the MRE values of the total CPU times for the column-grouped molecules with different sizes are also listed in Table 1. A decreasing accuracy was observed in the order LSTM → MPNN → MGCN → RF. Except for the RF model, the LSTM and MPNN/MGCN models can give reliable predictions for each column's molecules. This implies that these three models have the capacity to capture the changes in computational times caused by structural differences.

Table 1. MRE Values of Total Computational Times for the Testing Molecular Suits and Grouped Molecules (Shown in Figure 7) with Different Numbers of Basis Functions.

model testing suit (25 samples) Col-1 Col-2 Col-3 Col-4 Col-5
RF 0.37 0.89 0.29 0.06 0.23 0.36
LSTM 0.13 0.12 0.10 0.11 0.15 0.15
MPNN 0.17 0.17 0.06 0.16 0.16 0.27
MGCN 0.21 0.25 0.16 0.17 0.25 0.24

After evaluating the classification capacities of the models in the proposed forecasting system, we further checked how the size of the training suits affects the accuracy. Because the training suits in the RF models were fixed and artificially designed, only the other three models (i.e., LSTM, MPNN, and MGCN) were checked in this part by increasing or decreasing the number of molecules in the training suits. Herein, the starting point of the training suits includes only the four typical molecular suits that were used in the RF models; then, molecules from the DrugBank database were gradually added. The testing suit contained 116 molecules extracted from the DrugBank database. The results are shown in Figure 8. It can be seen that the MRE values of the testing suits can already attain a value lower than 0.2 for all three ML models with fewer than 1000 molecules in the training suits. This magnitude of training suits is far smaller than what is typically required in image recognition, because the SMILES or one-hot representations of chemical elements can already recognize the constituents of molecules. Thus, only the geometric constructions need to be learned when training the models.

Figure 8. Illustrations of the MRE of testing molecules for total/average CPU times along with the total number of molecules in the training suits.

Assuming that one has typical target molecules, for example, PE molecules or drug molecules, there may be ways to optimize the training of the ML models using smaller training suits or to achieve higher accuracies. Here, we propose "space-specific" (S.-S.) and "space-averaged" (S.-A.) ways of training the models. In the former case, only molecules of the same type as the target molecules were selected for the training suits. In the latter case, several types of molecules were used to ensure better generalization ability. For demonstration purposes, molecules of the PE suits (as part of the molecules shown in row A of Figure 7), branch suits (as part of the molecules shown in row B of Figure 7), and the DrugBank suits (as part of the molecules shown in rows C, D, and E of Figure 7) were used as the training and testing suits. The results for the LSTM, MPNN, and MGCN models are shown in Table 2. It can be seen that the MREs of the LSTM models for both the S.-S. and S.-A. cases were quite close to each other; the deviations were normally less than 0.04 for both total and averaged CPU times. For the graph-based MPNN and MGCN models, the MREs of the S.-S. case were in general larger than those of the S.-A. case for small training sets, for example, the total CPU results of MPNN's branch suit (MRE of 0.63), DrugBank/M100 (M100 denotes 100 training molecules; MRE of 0.42), and MGCN's DrugBank/M100 (MRE of 0.31). However, the MREs can be reduced by simply increasing the number of training samples; for example, in the total CPU results for the MPNN/MGCN DrugBank/M300−700 cases, the MREs were reduced to around 0.15. This is in line with the tendency shown in Figure 8 that a certain number of samples is needed for the graph-based models. Overall, there was no obvious difference between the S.-S. and S.-A. strategies when enough training samples were used (e.g., M > 300), and a decreasing accuracy was observed in the order LSTM → MGCN → MPNN → RF.

Table 2. MREs of Predicted Total CPU Times and Predicted Averaged CPU Times for Each Iteration (in Bracket) of the ML Models Using Space-Specific and Space-Averaged Training Approaches.

model training manner PE branch drug (M = 100) drug (M = 300) drug (M = 500) drug (M = 700)
LSTM S.-S. 0.13 (0.05) 0.11 (0.09) 0.12 (0.13) 0.09 (0.12) 0.10 (0.12) 0.10 (0.13)
  S.-A. 0.15 (0.07) 0.06 (0.05) 0.13 (0.13) 0.11 (0.11) 0.14 (0.19) 0.11 (0.10)
MPNN S.-S. 0.17 (0.11) 0.63 (0.12) 0.42 (0.19) 0.16 (0.15) 0.15 (0.15) 0.13 (0.14)
  S.-A. 0.21 (0.27) 0.20 (0.12) 0.21 (0.19) 0.16 (0.13) 0.16 (0.17) 0.20 (0.13)
MGCN S.-S. 0.14 (0.06) 0.11 (0.09) 0.31 (0.12) 0.20 (0.10) 0.12 (0.10) 0.15 (0.11)
  S.-A. 0.18 (0.09) 0.14 (0.16) 0.16 (0.10) 0.15 (0.11) 0.12 (0.15) 0.14 (0.15)

To check the predictions for various DFT/TDDFT calculations, different combinations of popular DFT functionals ("PBE",100 "BLYP",101,102 "bhandhlyp",103 "B3LYP",104 "LC-BLYP",105 "CAM-B3LYP",106 "M06",99 "M062x",99 and "ωB97XD"107) and Pople's basis sets ("6-31G", "6-31G*", and "6-31+G*")98 were used. The results for the total/average CPU times of the ground-state DFT calculations and the total CPU times of the singlet excited-state TDDFT calculations are illustrated in Figure 9. The S.-S. training approach was used in these calculations with 1000 training molecules and the selected 116 testing molecules. Overall, it can be concluded that MGCN ∼ LSTM > MPNN > RF in terms of overall performance. To be specific, in the case of predicting the total CPU times of ground-state DFT calculations (Figure 9), all four models showed good predicting capacities when using the 6-31G basis set, with MREs around 0.10, while the MREs increased when polarization and diffuse functions were added. For instance, the average MRE of the RF model rose to about 0.51 for the 6-31+G* case, and to 0.30 for the MPNN model. Meanwhile, the MGCN and LSTM models can still guarantee reliable predictions with MREs around 0.17. The results of MGCN showed better stability than those of the LSTM; this may benefit from the graph-based learning approach of MGCN, in which the nature of the convergence behavior (e.g., the total number of iterations) can also be implicitly integrated in the learning of the total CPU times. A similar tendency (MGCN ∼ LSTM > MPNN > RF) was observed for the prediction of the total CPU times of singlet excited-state TDDFT calculations, with slightly increased MRE values for all these models. Apart from the MREs of the predictions, the scattered error distributions are also illustrated in Figure 10 for the M06-2x functional with these three basis sets. It can be seen that MGCN and LSTM show good stability, with most scatter points located within the ±25% region, regardless of whether ground or excited states are predicted.

Figure 9. MREs of the predicted total times of the DFT calculations (cyan bars), the predicted average times for each DFT iteration (green bars), and the predicted total times of the TDDFT calculations (violet bars) for the ML models.

Figure 10. Error scatter distributions of the predicted times of the ground-state DFT and excited-state TDDFT calculations for the M06-2x functional with the 6-31G, 6-31G*, and 6-31+G* basis sets.

Before the end of this subsection, the NWChem, GAMESS, and MOLCAS packages were interfaced with the forecasting system using the same four models, to check the predicting capacity for general quantum chemical packages. The calculated MRE results are shown in Figure 11. It can be noticed that all the benchmarked packages basically resemble each other when using the same ML models. The LSTM and MGCN models still showed the best precision for the predictions and thus can be deployed in the forecasting system as the working models. Nevertheless, we should mention that the DFT calculation time is assumed to be an intrinsic property of the molecule; thus, single strong learners like MGCN/MPNN perform better than RF for this specific feature. The RF model is more suitable for tackling noisy data, where unknown or unobtainable features play roles. The LSTM shows good predictive ability because the recognized ordered tokens already connect to the structure via the SMILES code; thus, it can be regarded as a single strong learner similar to MGCN/MPNN.

Figure 11. Illustrations of the MREs of the predicted total times of DFT calculations using different QC packages.

4.2. Predicting without Pretrained Models

All the aforementioned ML models were trained in a given chemical space; now we continue to show how the forecasting system gives predictions without using pretrained models (i.e., with either-trained or neither-trained models) of the given chemical space using the chemical MWI ansatz. As mentioned in Section 2.3, the plethora of different combinations of DFT functional/basis set could be the main obstacle for general forecasting. Thus, it is necessary to benchmark separately how the DFT functionals and basis sets play their roles in the practical calculations. Herein, the relations between CPU times and the parameters of DFT functionals or basis sets are shown in Figures 12 and 13, respectively, via the sampled link molecule (i.e., simple CH3CONHOH as shown in Figure 5).

Figure 13. Top: Illustrations of the relationships between computational CPU times and different basis sets for the listed DFT functionals and basis sets. Bottom: Illustrated poly-fitted curves between computational CPU times and the number of basis functions for each listed DFT functional.

Figure 12 shows the ratios between the CPU times of various DFT functionals and those of the PBE functional. It can be seen that the variation of DFT functionals does not change the CPU times much; for example, the average magnifications of CPU time were in the range of 0.90–1.2. Furthermore, the magnifications of the various DFT functionals can be roughly assigned to four regions referring to Jacob's ladder, that is, the LDA, GGA, hybrid and range-separated, and meta hybrid and range-separated hybrid regions with magnifications of about 0.95, 1.0, 1.1, and 1.2, respectively.

For the basis sets, it can be observed in Figure 13 (top) that the computational CPU times (colored bars) matched well with the dimension of basis sets (solid line). Thus, in accordance with the change of basis sets, general polynomial curve-fitting techniques (Figure 13, bottom) can be employed to roughly consider the change of computational times.

For the case of either-trained predictions with trained (“trained” means the parameter was involved in the pretrained models) basis sets and untrained (“untrained” implies that the parameter was not involved in the pretrained models) DFT functionals, the trained models can be used to give the predictions with magnifications based on the types of DFT functionals. For instance, there were nine functionals together with three basis sets in Figure 9. Suppose only the models of the PBE and LC-BLYP DFT functionals were trained; then, all the timing predictions for the remaining DFT functional/basis set combinations can be deduced using these existing models as the references. The obtained results are shown in Figure 14. It can be found that most deviations of the predicted times were only slightly larger than the original ones, except in the BLYP case, in which the convergence behavior was quite different from the other models. The MREs of the predicted times with REF:PBE (PBE functional as the reference) and with REF:LC-BLYP (LC-BLYP functional as the reference) can still be less than 0.2 for the LSTM and MPNN/MGCN models. The scatter diagrams of the relative errors are also illustrated in Figure 15 for the M06-2x functional case as an instance. It can be seen that the error distributions of the REF:PBE and REF:LC-BLYP cases were quite similar to the original ones for all three basis sets. The MGCN showed the best accuracy among these three ML models, with relative errors mostly distributed within the ±25% region for the original case, and there were only 5 to 20% displacements for the REF:PBE and REF:LC-BLYP cases.

Figure 14. Illustrations of the MRE of predicted total CPU times with trained models (original) and either-trained models (REF:PBE and REF:LC-BLYP), respectively.

Figure 15. Illustrations of the error scatters of the predicted total CPU times using the LSTM, MPNN, and MGCN models for the M06-2x cases in Figure 14. Both the stereoscopic (top) and the radial (bottom) distributions are shown.

For the case of either-trained predictions with the trained DFT functional and an untrained basis set, the pretrained models can be used to produce predictions with magnifications based on the curves fitted to various basis sets. This procedure is similar to that of the untrained DFT functional case, while it is a bit more complex because the number of basis functions has a tight relationship with the actual computational times. To obtain the magnification between the reference basis set (REF-basis) and the target basis set (TAR-basis), a three-step procedure was used: (1) choose the fitted curve of eq 6 under the given DFT functional; (2) obtain fbas(xREF) and fbas(xTAR) from the fitted curve of eq 6; and (3) calculate the magnification coefficient cbas based on eq 5. Once cbas is obtained, the corrected timing predictions can be evaluated based on the predictions of the reference basis sets. Herein, the predicted total computational CPU times for several DFT functionals together with the basis sets 6-31G, 6-31G*, 6-31+G*, 6-31++G**, SV,108 SVP,108 cc-pVDZ,109 and cc-pVTZ109 obtained using the reference basis sets 6-31G, 6-31G*, and 6-31+G* are shown in Figure 16. It can be seen that there was considerable volatility in the MREs of the original predictions based on the three reference basis sets (shown in the center of Figure 16). The volatility in the different reference models was caused by the enlarged magnification errors when the target point and the reference point on the fitted curve were too far from each other. These magnification errors can be largely reduced by introducing the similarity coefficients as explained in the previous section. The similarity-corrected results are illustrated in the outer region of Figure 16, while the error scatters are shown in Figure 17 for the sampled M06-2x cases. It can be seen that the MREs can be largely reduced for all four ML models. The graph-based MPNN and MGCN models behaved best, the LSTM models worse, and the RF models worst. One may notice the relatively larger deviations for the cc-pVTZ case; this is because the 6-31G* basis set was used as the reference when applying the current similarity-corrected algorithm. The predicted timing results of the cc-pVTZ basis set can be improved greatly when using 6-31+G* as the reference basis set. Additionally, one can also expect better predicted results if more basis sets can be used as references.

Figure 16. Illustrations of the MREs of predicted total CPU times using RF, LSTM, MPNN, and MGCN models with 6-31G, 6-31G*, and 6-31+G* reference basis sets (center), and the similarity-corrected ones for practical predictions (outside).

Figure 17. Illustrations of the error scatters of the predicted total CPU times using the LSTM, MPNN, and MGCN models for the M06-2x cases in Figure 16. Both the stereoscopic (left) and the radial (right) distributions are shown.

For the case of neither-trained predictions with an untrained DFT functional and an untrained basis set, the trained models can be used to produce predictions with magnifications from both the reference DFT functional and the reference basis set under the chemical MWI ansatz. Herein, the trained models of the PBE/6-31G, PBE/6-31G*, and PBE/6-31+G* combinations were used as the references, and the total CPU times were predicted using the neither-trained models for several different combinations of DFT functionals and basis sets. The calculated MREs of the predicted total CPU times are illustrated in Figure 18, and the error scatters are shown in Figure 19. It can be noticed that the illustrated results are quite similar to those in Figure 16. The graph-based MPNN and MGCN models still behaved best, the LSTM models worse, and the RF models worst. Additionally, the reference basis set again played an important role in the predicted results.

Figure 18. Illustrations of the MREs of the predicted total CPU times with neither-trained models.

Figure 19. Illustrations of the error scatters of the predicted total CPU times using LSTM, MPNN, and MGCN models (neither-trained) for the M06-2x cases in Figure 18.

5. Conclusions

A forecasting system for the computational time of DFT/TDDFT calculations is presented in this work. Four popular ML methods, including the RF, LSTM, MPNN, and MGCN models, were used to produce reliable predicted timing results for DFT/TDDFT calculations. The structural similarity (in RF), recognition of the chemical formula (in LSTM), and spatial structures (in MPNN and MGCN) were the ideas behind the choice of working ML models. Cheminformatics with SMILES codes and graph-based spatial structures was employed for extracting structural information when training and testing the ML models. The use of cheminformatics also reduced the number of training molecules significantly compared with typical image recognition tasks. Moreover, various combinations of DFT functional and basis set can be treated by employing the proposed chemical MWI ansatz using either-trained or neither-trained models.

The four ML models can be used as the kernels for running the forecasting system. The overall performance followed the "LSTM → MGCN → MPNN → RF" order in the chemical space with pretrained ML models, and the typical MREs were 0.1 to 0.2 for the first two (LSTM/MGCN) models, with relative errors within ±25% for most molecules. The order in performance turned to "MGCN ∼ MPNN → LSTM → RF" in the chemical space without pretrained ML models, and the typical MREs were still within the scope of 0.1 to 0.2 for the first two (MGCN/MPNN) models, irrespective of whether the either-trained or neither-trained case was used; the distribution of relative errors was only slightly enlarged. The relatively small MREs and concentrated relative errors show that the forecasting system can be used in HPC task assignment and load-balancing applications. We are currently working in this direction, particularly in coordination with fragmentation approaches in quantum chemistry, such as molecular fractionation with conjugate caps110−112 and the renormalized exciton method.113−115

At this stage, mainly the CPU time predictions for solving the Kohn–Sham equations, as well as their TD expansions, in single-point calculations are presented. This type of calculation is representative of the computational cost of a molecule,8 and thus it is used as the first step in the design of the forecasting system. Routine calculations such as geometry optimizations and frequency calculations are not yet included in these predictions and are on the list of our future works.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (grant no. 2018YFB0203805), National Natural Science Foundation of China (grant no. 21703260), the Informationization Program of the Chinese Academy of Science (grant no. XXH13506-403), and the Guangdong Provincial Key Laboratory of Biocomputing (grant no. 2016B030301007). We also thank PARATERA company for their cooperation.

Appendix

6. Kohn–Sham DFT and Its Scaling

In the Born–Oppenheimer approximation, a stationary electronic state can be described using a wave function Ψ(r⃗1, ..., r⃗N) satisfying the many-electron time-independent Schrödinger equation

$\hat{H}\Psi = [\hat{T} + \hat{V} + \hat{U}]\Psi = E\Psi$   (A1)

where Ĥ denotes the electronic Hamiltonian, E denotes the total energy, T̂ denotes the kinetic energy, V̂ denotes the potential energy from an external field due to positively charged nuclei, and Û denotes the electron–electron interaction energy.

In the KS-DFT hypothesis, particles can be treated as noninteracting fermions, so that there exists an orthogonal and normalized function set {ϕiKS|i = 1, 2, ..., N} satisfying the condition

$\rho(\vec{r}) = \rho_{\mathrm{s}}(\vec{r}) = \sum_{i=1}^{N} \left| \phi_{i}^{\mathrm{KS}}(\vec{r}) \right|^{2}$   (A2)

here, ρ(r⃗) denotes the probability density of the ground-state electrons in the factual (interacting) system and ρs(r⃗) denotes that of the fictitious noninteracting system. Thus, the KS wave function is a single Slater determinant constructed from the set of functions (i.e., orbitals) that are the lowest-energy solutions to

$[\hat{h}(1) + \hat{J}(1) + V_{\mathrm{XC}}(1)]\,\phi_{i}(1) = \varepsilon_{i}\,\phi_{i}(1)$   (A3)

where the VXC(1) is referred to as exchange–correlation potential, and “(1)” following each operator symbol simply indicates that the operator is 1-electron in nature. This equation is very similar to the Fock equation

$[\hat{h}(1) + \hat{J}(1) - \hat{K}(1)]\,\phi_{i}(1) = \varepsilon_{i}\,\phi_{i}(1)$   (A4)

that is used in HF theory. Both eqs A3 and A4 can be solved iteratively using the so-called self-consistent field (SCF) methods. During the SCF iteration, orbital ϕi is updated iteratively, and is used to calculate electron density

$\rho(\vec{r}) = \sum_{i}^{\mathrm{occ}} \left| \phi_{i}(\vec{r}) \right|^{2}$   (A5)

which in turn determines the one-electron matrix (e.g., Fock matrix in SCF iterations) to be diagonalized. After several iterations, both molecular orbital ϕi and its energy εi can be obtained, and the total electronic energy can then be calculated.

Comparing eq A3 with eq A4, one can clearly see that the major difference between them is in the two-electron integrals component. The origin of the N4 scaling behavior is the calculation of four-center two-electron integrals, that is

$(\mu\nu|\lambda\sigma) = \iint \phi_{\mu}(\mathbf{r}_{1})\,\phi_{\nu}(\mathbf{r}_{1})\,\frac{1}{r_{12}}\,\phi_{\lambda}(\mathbf{r}_{2})\,\phi_{\sigma}(\mathbf{r}_{2})\,\mathrm{d}\mathbf{r}_{1}\,\mathrm{d}\mathbf{r}_{2}$   (A6)

where μ, ν, λ, and σ denote indices of atomic orbitals. This scaling is also the upper boundary for the HF or hybrid DFT calculations. However, many two-electron integrals are of negligible magnitude for large molecules, and some rigorous upper boundary conditions can be applied to the integrals. For instance, the Schwarz inequality46

$|(\mu\nu|\lambda\sigma)| \leq \sqrt{(\mu\nu|\mu\nu)}\,\sqrt{(\lambda\sigma|\lambda\sigma)}$   (A7)

allows strict mathematical upper bounds for all two-electron integrals to be computed in an N2 log N process. Apart from the calculation of the two-electron integrals, the diagonalization of the Fock or Fock-like matrix is expected to contribute significantly to the computational cost, because the diagonalization step scales intrinsically as N3, or even lower if the matrix to be diagonalized (e.g., in a large enough molecule) is sufficiently sparse.

Nevertheless, it can be noticed that for the hybrid DFT functionals, a hybrid exchange–correlation functional (i.e., the VXC(1) term in eq A3) is usually constructed as a linear combination involving the third term (the HF exact exchange) of eq A4. Hence, hybrid DFT methods scale in a similar manner to HF but are normally more expensive because of the larger proportionality term involved, while the pure DFT methods scale better than HF because there is no HF exchange.

7. ML Models Used in the Forecasting System

7.1. RF Model Together with Simple Feed-Forward Neural Networks

We used the FNN as the skeletal frame to obtain the model between the basis number and the computational time. Figure B1 shows an illustration of the FNN model. Four layers are used in our model: the input layer, two hidden layers, and the output layer. The "input layer" is constructed using the system magnitude features (e.g., the number of basis functions). These vectors are normalized and then fed to the hidden layers. Each "hidden layer" contains several neurons, and the tanh function is used as the activation function. The data passed from the hidden layers are directly used in the linear combination for the output results.
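A minimal PyTorch sketch of such a four-layer FNN (two tanh hidden layers with five neurons each, matching the width quoted in Section 3) is shown below; the single input feature and its value are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CostFNN(nn.Module):
    """Input layer -> two tanh hidden layers (5 neurons each) -> scalar time output."""
    def __init__(self, n_features: int = 1, n_hidden: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = CostFNN(n_features=1)
x = torch.tensor([[410.0]])     # e.g., the (normalized) number of basis functions
print(model(x).shape)           # torch.Size([1, 1])
```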

Figure B1. Simple FNN model.

The predicted results for a molecule that is far from the training set will still be very poor if we only use the FNN model of Figure B1. To overcome this dependency on the training set, we introduced the idea of "feature training". "Feature training" means that a few molecular suites with specific features [e.g., linear (L), dendritic (D), ring (R), etc.] are trained as "cost functions", upon which the computational cost (y) for any molecule can be calculated by a linear combination of these "cost bases", for example

$y = p_{\mathrm{L}}\,f_{\mathrm{L}}(x) + p_{\mathrm{D}}\,f_{\mathrm{D}}(x) + p_{\mathrm{R}}\,f_{\mathrm{R}}(x) + \cdots$   (B1)

where pL, pD, and pR denote the probabilities for each "cost basis" ffeature(x), ffeature denotes the "cost function", and ffeature(x) denotes the expected computational cost for the "feature" model with magnitude parameter x. Herein, one can notice that this "feature training" ansatz matches well with the RF model used in ML. Under this ansatz, a specific model (i.e., cost function) is trained and saved for each structure type of the molecular suit. Afterward, an RF classifier is used to place the molecules into the given categories, such as linear, dendritic, or ring molecules, and so forth, with probabilities (p). The classifier accepts the SMILES codes of molecules as its input. The number of each molecule's atoms (with hydrogen atoms excluded), branches, atoms on branches, and cycles is calculated and combined into an input vector according to its SMILES code. Figure B2 illustrates the entire process.

Figure B2. Architecture of RF with FNNs.

In training, the coefficients for each "cost basis" were obtained via the RF classifier in the Scikit-Learn (Sklearn)116 package, with all parameters set to their default values. The classifier provides a molecule's probability of falling into each category as the output, which can be used in eq B1 for the predictions.
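The combination of the RF classifier with the per-feature cost functions (eq B1) can be sketched as follows; the descriptor vectors, labels, and quadratic cost functions are placeholders, not the trained 108-molecule models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder descriptors per molecule: [heavy atoms, branches, branch atoms, cycles].
X_train = np.array([[6, 0, 0, 0], [8, 2, 3, 0], [6, 0, 0, 1], [10, 3, 4, 0]])
y_train = np.array(["linear", "dendritic", "ring", "dendritic"])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Assumed per-feature "cost functions" f_feature(x), with x the number of basis functions.
cost_fns = {
    "linear":    np.poly1d([1.5e-3, 0.0, 0.0]),
    "dendritic": np.poly1d([1.8e-3, 0.0, 0.0]),
    "ring":      np.poly1d([2.1e-3, 0.0, 0.0]),
}

def predict_cost(descriptor, n_basis):
    """Eq B1: y = sum_k p_k * f_k(n_basis), with p_k taken from the RF classifier."""
    probs = clf.predict_proba([descriptor])[0]
    return sum(p * cost_fns[label](n_basis) for p, label in zip(probs, clf.classes_))

print(predict_cost([7, 1, 2, 0], n_basis=350))
```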

7.2. Bi-LSTM with Attention

The kinds of features extracted by the RF classifier are designed manually and carry subjective preferences, so they may not be sufficient for aggregating molecular structural information. Considering that we use textual data (i.e., the SMILES code) as the representation of molecular structure, methods for NLP are thus well suited for feature extraction.

Here, we use the bidirectional LSTM (Bi-LSTM) with attention model proposed by Zhou and co-workers.117 It is a state-of-the-art model for relation-classification tasks; thus, we use it to extract structural features from SMILES. Figure B3 illustrates the architecture of the model. The input features include the SMILES code (in one-hot form) and the number of basis functions. Suppose every character in a T-length SMILES sequence is denoted by a one-hot vector x_i; x_i is converted to a real-number vector e_i (e_i = W x_i, where W denotes a parameter matrix automatically learned by the model during training). Then, E = {e_1, e_2, ..., e_T} is sent to the Bi-LSTM layer. The Bi-LSTM layer consists of a forward layer and a backward layer so that the model can learn from forward and backward sequences, as the past and future semantic information in a sentence are equally significant. The LSTM layers in the two directions generate two outputs, H_f and H_b

H_f = \overrightarrow{L}(E), \quad H_b = \overleftarrow{L}(E)   (B2)

where L denotes the operations performed by an LSTM layer (the arrows indicate the forward and backward directions). Then, an attention layer accepts the sum of the outputs from the Bi-LSTM layers

H = H_f + H_b   (B3)
Figure B3. Architecture of Bi-LSTM with the attention model.

The attention mechanism allows different context vectors to be generated from the Bi-LSTM layer's output at every time step by assigning different "attention weights" to the outputs. Without the attention mechanism, the feature extraction at every time step would be limited to a single fixed-length context vector that is invariant to the time step. The attention layer outputs the final representation c of a SMILES as

\alpha = \mathrm{softmax}\left(w^{T} \tanh(H)\right)   (B4)
c = H \alpha^{T}   (B5)

where α denotes the attention weight vector, and w denotes a trained parameter vector. The high-level features of molecular structures are produced after the attention layer. We combine the structural features and the number of basis functions and feed them into fully connected layers to get the predicted result.
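
For concreteness, a schematic PyTorch sketch of eqs B2–B5 is given below, assuming the SMILES characters have already been tokenized; all dimensions, and the way the basis-function count is appended before the fully connected head, are illustrative assumptions:

    import torch
    import torch.nn as nn

    class BiLSTMAttention(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)        # e_i = W x_i
            self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
            self.w = nn.Parameter(torch.randn(hidden))            # attention parameter vector
            self.head = nn.Sequential(nn.Linear(hidden + 1, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, tokens, n_basis):
            E = self.embed(tokens)                                # (batch, T, emb_dim)
            out, _ = self.bilstm(E)                               # forward/backward outputs
            half = out.size(-1) // 2
            H = out[..., :half] + out[..., half:]                 # eq B3: H = H_f + H_b
            alpha = torch.softmax(torch.tanh(H) @ self.w, dim=1)  # eq B4: attention weights
            c = (alpha.unsqueeze(-1) * H).sum(dim=1)              # eq B5: final representation
            return self.head(torch.cat([c, n_basis], dim=1))      # append basis-function count

    model = BiLSTMAttention(vocab_size=40)
    tokens = torch.randint(0, 40, (2, 30))          # two SMILES strings of length 30
    n_basis = torch.tensor([[300.0], [512.0]])
    print(model(tokens, n_basis).shape)             # torch.Size([2, 1])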

7.3. MPNN Model

As a representation of the molecular structure, SMILES is quite inexact because of the absence of spatial information. For a more accurate representation, it is rational to model a molecule using an undirected graph G, as shown in Figure B4. We use the MPNN model, a kind of graph neural network (GNN) proposed by Gilmer et al.64 as a solution for graph-based learning.

Figure B4. Architecture of the MPNN model.

The initial inputs of the model include a feature vector collection for the nodes of the graph, denoted by x_v, containing features of atom types, aromaticity, and hybridization types, and a feature vector collection for the edges, denoted by e_vw, containing features of bond types. The model has two phases: a message-passing phase and a readout phase. The message-passing phase runs T steps of graph convolutions, and each step t is defined in terms of a message function M_t and a vertex update function U_t. Before the message passing, the node vectors are mapped to an n × d matrix called the "node embedding" by a network (the "node net"), with n denoting the number of nodes and d representing the dimension of the hidden state of each node. During the message-passing phase, the hidden states h_v^t of each node are updated according to the messages m_v^{t+1}. The message-passing phase can be summarized as

m_v^{t+1} = \sum_{w \in N(v)} M_t\left(h_v^t, h_w^t, e_{vw}\right)   (B6)
h_v^{t+1} = U_t\left(h_v^t, m_v^{t+1}\right)   (B7)

where N(v) denotes the neighbors of v in G. Specifically, M_t is defined as M_t(h_v^t, h_w^t, e_vw) = A(e_vw) h_w^t, where A(e_vw) denotes a network (the edge net) mapping each edge vector e_vw to a d × d matrix (the edge embedding). The vertex update function U_t is a GRU, short for gated recurrent unit.118 In the readout phase, a feature vector can be obtained as a summary of the whole graph with a readout function R

\hat{y} = R\left(\{ h_v^T \mid v \in G \}\right)   (B8)

where R denotes the set2set119 model. The set2set model produces a graph-level embedding that is invariant to the order of the nodes. Finally, we combine the graph-level embedding and the number of basis functions and feed them to fully connected networks to get the prediction results. Figure B4 illustrates the architecture of the MPNN model.
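
A condensed sketch of one message-passing step in the spirit of eqs B6 and B7 is shown below; the explicit Python loop over edges and the layer sizes are illustrative simplifications (real implementations batch these operations), and the edge network and GRU cell follow the definitions in the text:

    import torch
    import torch.nn as nn

    d = 32                                  # hidden dimension of each node
    edge_feat = 4                           # dimension of raw edge features

    edge_net = nn.Sequential(nn.Linear(edge_feat, d * d))   # A(e_vw): edge features -> d x d matrix
    gru = nn.GRUCell(d, d)                                   # vertex update function U_t

    def message_passing_step(h, edges, edge_attr):
        """One step of eqs B6/B7.
        h:         (n, d) node hidden states
        edges:     list of (v, w) index pairs (undirected edges listed both ways)
        edge_attr: (n_edges, edge_feat) edge feature vectors
        """
        m = torch.zeros_like(h)
        for (v, w), e in zip(edges, edge_attr):
            A = edge_net(e).view(d, d)      # edge embedding A(e_vw)
            m[v] = m[v] + A @ h[w]          # eq B6: sum messages from the neighbors
        return gru(m, h)                    # eq B7: GRU update of the hidden states

    h = torch.randn(5, d)                   # 5 atoms
    edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
    edge_attr = torch.randn(len(edges), edge_feat)
    h_next = message_passing_step(h, edges, edge_attr)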

7.4. MGCN Model

Apart from MPNN, we also introduced another GNN model, MGCN,65 which is reported to have the advantages of generalizability and transferability. As shown in Figure B5, the architecture of MGCN includes five phases.

Figure B5. Architecture of the MGCN model.

First, the initial inputs of the model include a feature vector collection for the nodes of the graph (containing features of atom types, aromaticity, and hybridization types), a feature vector collection for the edges (containing features of bond types and bond lengths), and the number of basis functions. In the preprocessing phase, the embedding layer generates the node (atom) embeddings a_i^0 and the edge embeddings e_ij^0. The radial basis function (RBF)120,121 layer converts the bond lengths to a distance tensor, with d_ij representing the distance between atom i and atom j. The RBF layer's functional form can be expressed as

(B9)

where ∩ denotes concatenation and μ_i is taken from a set of K central points {μ_1, ..., μ_K}. In the message-passing phase, the interaction layers are constructed in a hierarchical architecture to simulate the many-body interactions, which are transformed at different levels (atom-wise, atom-pair, atom-triple, etc.). The l-th layer generates an edge representation e_ij^{l+1} and an atom representation a_i^{l+1}

(B10)
(B11)

where h_e denotes the edge update function and h_v denotes the message-passing function. The form of h_e is as follows

(B12)

here, η denotes a constant set to 0.8, Wue denotes a weight matrix, ⊕ denotes elementwise addition, and ⊙ is the elementwise dot product. The form of hv is

(B13)

where M(x) denotes a linear layer of the form M(x) = Wx + b. The outputs of the T interaction layers, along with a_i^0, are concatenated together as

a_i = a_i^0 \cap a_i^1 \cap \cdots \cap a_i^T   (B14)

Afterward, the Readout layer generates a graph-level embedding G as

(B15)

here, σ denotes the softplus function. Finally, G and the number of basis functions are concatenated and sent to a fully connected layer to get the predicted time.
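
To illustrate the distance-expansion idea behind the RBF layer, a generic Gaussian expansion over K centers is sketched below; the width parameter gamma and the uniform placement of the centers are assumptions and need not match the exact form used in MGCN:

    import torch

    def rbf_expand(d_ij, K=16, d_max=8.0, gamma=10.0):
        """Expand a scalar distance into K Gaussian radial basis values.

        Centers mu_1..mu_K are spread uniformly on [0, d_max]; the outputs are
        concatenated into a K-dimensional edge descriptor.
        """
        mu = torch.linspace(0.0, d_max, K)
        return torch.exp(-gamma * (d_ij - mu) ** 2)

    print(rbf_expand(torch.tensor(1.54)))    # e.g., a typical C-C bond length in angstroms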

8. Parameters When Training the Models

The numbers of samples in the training and testing sets, as well as some important parameters in the training process, are listed in Table C1.

Table C1. Number of Samples in Training (NSamplesTrain) and Testing (NSamplesTest) Sets and Some of the Important Parameters (BatchSize and StepSize) in Training Models.

data model NSamplesTrain NSamplesTest BatchSize StepSize
Table 1 and Figure 7 RF 108 25 40 0.005
  LSTM 500 25 100 0.005
  MPNN 1500 25 50 0.005
  MGCN 500 25 25 0.001
Table 2 LSTM 100–1500 25 100 0.001–0.005
  MPNN 100–1500 25 100 0.001–0.005
  MGCN 100–1500 25 100 0.001–0.005
Figures 9–11 RF 1478 238 40 0.005
(Figures 14–19)a LSTM 1478 238 100 0.005
  MPNN 1478 238 100 0.001
  MGCN 1478 238 100 0.005
a The trained models were reused for predicting the results with NSamplesTest = 49 in Figures 14–19.
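
For orientation, the sketch below shows how the BatchSize and StepSize entries of Table C1 would typically enter a PyTorch training loop, interpreting StepSize as the learning rate of the Adam optimizer; the data and model here are placeholders:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder data: magnitude features and measured computational times.
    X = torch.randn(500, 1)
    y = torch.randn(500, 1)
    loader = DataLoader(TensorDataset(X, y), batch_size=100, shuffle=True)   # BatchSize

    model = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=0.005)               # StepSize
    loss_fn = torch.nn.MSELoss()

    for epoch in range(10):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()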

The authors declare no competing financial interest.

References

  1. Friesner R. A. Ab initio quantum chemistry: Methodology and applications. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6648–6653. 10.1073/pnas.0408036102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Hergert H.; Bogner S. K.; Morris T. D.; Schwenk A.; Tsukiyama K. The In-Medium Similarity Renormalization Group: A novel ab initio method for nuclei. Phys. Rep. 2016, 621, 165–222. 10.1016/j.physrep.2015.12.007. [DOI] [Google Scholar]; , Memorial Volume in Honor of Gerald E. Brown
  3. Al-Douri Y.; Hashim U.; Khenata R.; Reshak A. H.; Ameri M.; Bouhemadou A.; Rahim Ruslinda A.; Md Arshad M. K. Ab initio method of optical investigations of CdS1-xTex alloys under quantum dots diameter effect. Sol. Energy 2015, 115, 33–39. 10.1016/j.solener.2015.02.024. [DOI] [Google Scholar]
  4. Scerri E. R.Has Chemistry Been at Least Approximately Reduced to Quantum Mechanics? PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 1994; pp 160–170.
  5. Bash P. A.; Ho L. L.; MacKerell A. D.; Levine D.; Hallstrom P. Progress toward chemical accuracy in the computer simulation of condensed phase reactions. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 3698–3703. 10.1073/pnas.93.8.3698. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Zhao L.; Pan S.; Holzmann N.; Schwerdtfeger P.; Frenking G. Chemical bonding and bonding models of main-group compounds. Chem. Rev. 2019, 119, 8781–8845. 10.1021/acs.chemrev.8b00722. [DOI] [PubMed] [Google Scholar]
  7. Jensen F.Introduction to Computational Chemistry; John Wiley & Sons, 2017; pp 80–81. [Google Scholar]
  8. Helgaker T.; Jorgensen P.; Olsen J.. Molecular Electronic-Structure Theory; John Wiley & Sons, 2014; pp 142–200. [Google Scholar]
  9. Møller C.; Plesset M. S. Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 1934, 46, 618–622. 10.1103/PhysRev.46.618. [DOI] [Google Scholar]
  10. Purvis G. D.; Bartlett R. J. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples. J. Chem. Phys. 1982, 76, 1910–1918. 10.1063/1.443164. [DOI] [Google Scholar]
  11. Cullen J. M.; Zerner M. C. The linked singles and doubles model: An approximate theory of electron correlation based on the coupled-cluster ansatz. J. Chem. Phys. 1982, 77, 4088–4109. 10.1063/1.444319. [DOI] [Google Scholar]
  12. Szalay P. G.; Müller T.; Gidofalvi G.; Lischka H.; Shepard R. Multiconfiguration Self-Consistent Field and Multireference Configuration Interaction Methods and Applications. Chem. Rev. 2012, 112, 108–181. 10.1021/cr200137a. [DOI] [PubMed] [Google Scholar]
  13. Brock D. C.; Moore G. E.. Understanding Moore’s Law: Four Decades of Innovation; Chemical Heritage Foundation, 2006. [Google Scholar]
  14. Heinen S.; Schwilk M.; von Rudorff G. F.; von Lilienfeld O. A. Machine learning the computational cost of quantum chemistry. Mach. Learn.: Sci. Technol. 2020, 1, 025002. 10.1088/2632-2153/ab6ac4. [DOI] [Google Scholar]
  15. Qin X.; Wang J.; Hu M.; Su Y.; Wan W.; Li B.; Dai R.; Wang J. Q. Practices on Monitoring, Scheduling, and Interconnection optimization of Super-Large Computing System. Front. Data Comput. 2020, 2, 55–69. [Google Scholar]
  16. Gordon M. S.; Fedorov D. G.; Pruitt S. R.; Slipchenko L. V. Fragmentation methods: A route to accurate calculations on large systems. Chem. Rev. 2012, 112, 632–672. 10.1021/cr200093j. [DOI] [PubMed] [Google Scholar]
  17. Li W.; Duan M.; Liao K.; Hong B.; Ni Z.; Ma J.; Li S. Improved generalized energy-based fragmentation approach and its applications to the binding energies of supramolecular complexes. Electron. Struct. 2019, 1, 044003. 10.1088/2516-1075/ab5049. [DOI] [Google Scholar]
  18. Wang Z.; Han Y.; Li J.; He X. Combining the Fragmentation Approach and Neural Network Potential Energy Surfaces of Fragments for Accurate Calculation of Protein Energy. J. Phys. Chem. B 2020, 124, 3027–3035. 10.1021/acs.jpcb.0c01370. [DOI] [PubMed] [Google Scholar]
  19. Fedorov D.; Kitaura K.. The Fragment Molecular Orbital Method: Practical Applications to Large Molecular Systems; CRC Press, 2009. [Google Scholar]
  20. Babu K.; Gadre S. R. Ab initio quality one-electron properties of large molecules: Development and testing of molecular tailoring approach. J. Comput. Chem. 2003, 24, 484–495. 10.1002/jcc.10206. [DOI] [PubMed] [Google Scholar]
  21. Downey A.Predicting Queue Times on Space-Sharing Parallel Computers. Proceedings 11th International Parallel Processing Symposium, Geneva, Switzerland, 1997; pp 209–218.
  22. Smith W.; Taylor V.; Foster I. In Job Scheduling Strategies for Parallel Processing; Goos G., Hartmanis J., van Leeuwen J., Feitelson D. G., Rudolph L., Eds.; Lecture Notes in Computer Science; Springer: Berlin, Heidelberg, 1999; Vol. 1659; pp 202–219. [Google Scholar]
  23. Tsafrir D.; Etsion Y.; Feitelson D. G. Backfilling Using System-Generated Predictions Rather than User Runtime Estimates. IEEE Trans. Parallel Distr. Syst. 2007, 18, 789–803. 10.1109/tpds.2007.70606. [DOI] [Google Scholar]
  24. Gaussier E.; Glesser D.; Reis V.; Trystram D.. Improving Backfilling by Using Machine Learning to Predict Running Times. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on—SC’15, Austin, Texas, 2015; pp 1–10.
  25. Sonmez O.; Yigitbasi N.; Iosup A.; Epema D.. Trace-Based Evaluation of Job Runtime and Queue Wait Time Predictions in Grids. Proceedings of the 18th ACM international symposium on High performance distributed computing—HPDC’09, Garching, Germany, 2009; p 111.
  26. Li C. V.; Petrucci V.; Mosse D.. Predicting Thread Profiles across Core Types via Machine Learning on Heterogeneous Multiprocessors. 2016 VI Brazilian Symposium on Computing Systems Engineering (SBESC), João Pessoa, Paraíba, Brazil, 2016; pp 56–62.
  27. Negi A.; Kumar P.. Applying Machine Learning Techniques to Improve Linux Process Scheduling. TENCON 2005—2005 IEEE Region 10 Conference, Melbourne, Australia, 2005; pp 1–6.
  28. Helmy T.; Al-Azani S.; Bin-Obaidellah O.. A Machine Learning-Based Approach to Estimate the CPU-Burst Time for Processes in the Computational Grids. 2015 3rd International Conference on Artificial Intelligence, Modelling and Simulation (AIMS), Kota Kinabalu, Malaysia, 2015; pp 3–8.
  29. Shulga D. A.; Kapustin A. A.; Kozlov A. A.; Kozyrev A. A.; Rovnyagin M. M.. The scheduling based on machine learning for heterogeneous CPU/GPU systems. Young Researchers in Electrical & Electronic Engineering Conference, 2016; p 4.
  30. Nadeem F.; Fahringer T. Optimizing execution time predictions of scientific workflow applications in the Grid through evolutionary programming. Future Generat. Comput. Syst. 2013, 29, 926–935. 10.1016/j.future.2012.10.005. [DOI] [Google Scholar]
  31. Singh K.; İpek E.; McKee S. A.; de Supinski B. R.; Schulz M.; Caruana R. Predicting parallel application performance via machine learning approaches. Concurr. Comput. 2007, 19, 2219–2235. 10.1002/cpe.1171. [DOI] [Google Scholar]
  32. Matsunaga A.; Fortes J. A.. On the Use of Machine Learning to Predict the Time and Resources Consumed by Applications. 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, Melbourne, Australia, 2010; pp 495–504.
  33. Altschul S. F.; Gish W.; Miller W.; Myers E. W.; Lipman D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. 10.1016/s0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
  34. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313. 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Papay J.; Atherton T. J.; Zemerly M. J.; Nudd G. R. Performance prediction of parallel self consistant field computation. Parallel Algorithm Appl. 1996, 10, 127–143. 10.1080/10637199608915612. [DOI] [Google Scholar]
  36. Antony J.; Rendell A. P.; Yang R.; Trucks G.; Frisch M. J. Modelling the Runtime of the Gaussian Computational Chemistry Application and Assessing the Impacts of Microarchitectural Variations. Procedia Comput. Sci. 2011, 4, 281–291. 10.1016/j.procs.2011.04.030. [DOI] [Google Scholar]
  37. Mniszewski S. M.; Junghans C.; Voter A. F.; Perez D.; Eidenbenz S. J. TADSim: Discrete Event-Based Performance Prediction for Temperature-Accelerated Dynamics. ACM Trans. Model Comput. Simulat. 2015, 25, 1–26. 10.1145/2699715. [DOI] [Google Scholar]
  38. Schuld M.; Sinayskiy I.; Petruccione F. An introduction to quantum machine learning. Contemp. Phys. 2015, 56, 172–185. 10.1080/00107514.2014.964942. [DOI] [Google Scholar]
  39. Biamonte J.; Wittek P.; Pancotti N.; Rebentrost P.; Wiebe N.; Lloyd S. Quantum machine learning. Nature 2017, 549, 195–202. 10.1038/nature23474. [DOI] [PubMed] [Google Scholar]
  40. Ciliberto C.; Herbster M.; Ialongo A. D.; Pontil M.; Rocchetto A.; Severini S.; Wossnig L. Quantum machine learning: a classical perspective. Proc. Math. Phys. Eng. Sci. 2018, 474, 20170551. 10.1098/rspa.2017.0551. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Baerends E. J.; Gritsenko O. V. A quantum chemical view of density functional theory. J. Phys. Chem. A 1997, 101, 5383–5403. 10.1021/jp9703768. [DOI] [Google Scholar]
  42. Perdew J. P.; Ruzsinszky A.; Tao J.; Staroverov V. N.; Scuseria G. E.; Csonka G. I. Prescription for the design and selection of density functional approximations: More constraint satisfaction with fewer fits. J. Chem. Phys. 2005, 123, 062201. 10.1063/1.1904565. [DOI] [PubMed] [Google Scholar]
  43. Cohen A. J.; Mori-Sánchez P.; Yang W. Challenges for density functional theory. Chem. Rev. 2012, 112, 289–320. 10.1021/cr200107z. [DOI] [PubMed] [Google Scholar]
  44. Strout D. L.; Scuseria G. E. A quantitative study of the scaling properties of the Hartree–Fock method. J. Chem. Phys. 1995, 102, 8448–8452. 10.1063/1.468836. [DOI] [Google Scholar]
  45. Pérez-Jordá J. M.; Yang W. Fast evaluation of the Coulomb energy for electron densities. J. Chem. Phys. 1997, 107, 1218–1226. 10.1063/1.474466. [DOI] [Google Scholar]
  46. Steele J. M.The Cauchy–Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities; Cambridge University Press, 2004. [Google Scholar]
  47. Kirkpatrick P.; Ellis C. Chemical space. Nature 2004, 432, 823. 10.1038/432823a. [DOI] [Google Scholar]
  48. Reymond J.-L.; van Deursen R.; Blum L. C.; Ruddigkeit L. Chemical space as a source for new drugs. MedChemComm 2010, 1, 30. 10.1039/c0md00020e. [DOI] [Google Scholar]
  49. Oprea T. I. Chemical space navigation in lead discovery. Curr. Opin. Chem. Biol. 2002, 6, 384–389. 10.1016/s1367-5931(02)00329-0. [DOI] [PubMed] [Google Scholar]
  50. Oprea T. I.; Gottfries J. Chemography: The Art of Navigating in Chemical Space. J. Comb. Chem. 2001, 3, 157–166. 10.1021/cc0000388. [DOI] [PubMed] [Google Scholar]
  51. Reymond J.-L. The Chemical Space Project. Acc. Chem. Res. 2015, 48, 722–730. 10.1021/ar500432k. [DOI] [PubMed] [Google Scholar]
  52. Probst D.; Reymond J. L. Exploring Drugbank in Virtual Reality Chemical Space. J. Chem. Inf. Model. 2018, 58, 1731–1735. 10.1021/acs.jcim.8b00402. [DOI] [PubMed] [Google Scholar]
  53. Breiman L. Random Forests. Mach. Learn. 2001, 45, 5–32. 10.1023/a:1010933404324. [DOI] [Google Scholar]
  54. Ho T. K.Random Decision Forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 1995; Vol. 1, pp 278–282.
  55. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. 10.1021/ci00057a005. [DOI] [Google Scholar]
  56. Weininger D.; Weininger A.; Weininger J. L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97–101. 10.1021/ci00062a008. [DOI] [Google Scholar]
  57. Weininger D. SMILES. 3. DEPICT. Graphical depiction of chemical structures. J. Chem. Inf. Comput. Sci. 1990, 30, 237–243. 10.1021/ci00067a005. [DOI] [Google Scholar]
  58. Hochreiter S.; Schmidhuber J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]
  59. Gers F. A.; Schmidhuber J.; Cummins F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. 10.1162/089976600300015015. [DOI] [PubMed] [Google Scholar]
  60. Morgan D. P.; Scofield C. L.. Neural Networks and Speech Processing; Springer US: Boston, MA, 1991; pp 245–288. [Google Scholar]
  61. Graves A.; Fernández S.; Schmidhuber J.. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. International Conference on Artificial Neural Networks, 2005; pp 799–804.
  62. Sundermeyer M.; Alkhouli T.; Wuebker J.; Ney H.. Translation Modeling with Bidirectional Recurrent Neural Networks. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014; pp 14–25.
  63. Kiperwasser E.; Goldberg Y.. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Transactions of the Association for Computational Linguistics, 2016; Vol. 4, pp 313–327.
  64. Gilmer J.; Schoenholz S. S.; Riley P. F.; Vinyals O.; Dahl G. E.. Neural Message Passing for Quantum Chemistry. 2017, arXiv:1704.01212. [Google Scholar]
  65. Lu C.; Liu Q.; Wang C.; Huang Z.; Lin P.; He L.. Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 2019; Vol. 33, pp 1052–1060.
  66. Everett H. “Relative State” Formulation of Quantum Mechanics. Rev. Mod. Phys. 1957, 29, 454–462. 10.1103/revmodphys.29.454. [DOI] [Google Scholar]
  67. Everett H.The Everett Interpretation of Quantum Mechanics: Collected Works 1955–1980 with Commentary; Princeton University Press, 2012. [Google Scholar]
  68. Wallace D.The Emergent Multiverse: Quantum Theory According to the Everett Interpretation; Oxford University Press, 2012. [Google Scholar]
  69. Tappenden P. Identity and probability in Everett’s multiverse. Br. J. Philos. Sci. 2000, 51, 99–114. 10.1093/bjps/51.1.99. [DOI] [Google Scholar]
  70. Bousso R.; Susskind L. Multiverse interpretation of quantum mechanics. Phys. Rev. D: Part., Fields, Gravitation, Cosmol. 2012, 85, 045007. 10.1103/physrevd.85.045007. [DOI] [Google Scholar]
  71. Schrödinger E. Die gegenwärtige Situation in der Quantenmechanik. Naturwissenschaften 1935, 23, 807–812. 10.1007/BF01491891. [DOI] [Google Scholar]
  72. Schrödinger E.Discussion of Probability Relations between Separated Systems. Mathematical Proceedings of the Cambridge Philosophical Society, 1935; pp 555–563.
  73. Einstein A.; Podolsky B.; Rosen N. Can quantum-mechanical description of physical reality be considered complete?. Phys. Rev. 1935, 47, 777. 10.1103/physrev.47.777. [DOI] [Google Scholar]
  74. Yin J.; Cao Y.; Yong H.-L.; Ren J.-G.; Liang H.; Liao S.-K.; Zhou F.; Liu C.; Wu Y.-P.; Pan G.-S.; et al. Lower bound on the speed of nonlocal correlations without locality and measurement choice loopholes. Phys. Rev. Lett. 2013, 110, 260407. 10.1103/physrevlett.110.260407. [DOI] [PubMed] [Google Scholar]
  75. Perdew J. P.; Schmidt K.. Jacob’s Ladder of Density Functional Approximations for the Exchange-Correlation Energy. AIP Conference Proceedings, 2001; pp 1–20.
  76. Moulton R.; Jiang Y.. Maximally Consistent Sampling and the Jaccard Index of Probability Distributions. IEEE International Conference on Data Mining, ICDM 2018, Singapore, November 17–20, 2018; pp 347–356.
  77. Levandowsky M.; Winter D. Distance between sets. Nature 1971, 234, 34–35. 10.1038/234034a0. [DOI] [Google Scholar]
  78. Jaccard P. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 547–579. [Google Scholar]
  79. Bozinovski S. Reminder of the First Paper on Transfer Learning in Neural Networks, 1976. Informatica 2020, 44, 291. 10.31449/inf.v44i3.2828. [DOI] [Google Scholar]
  80. Ben-David S.; Blitzer J.; Crammer K.; Kulesza A.; Pereira F.; Vaughan J. W. A theory of learning from different domains. Mach. Learn. 2010, 79, 151–175. 10.1007/s10994-009-5152-4. [DOI] [Google Scholar]
  81. Pan S. J.; Yang Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. 10.1109/TKDE.2009.191. [DOI] [Google Scholar]
  82. Yosinski J.; Clune J.; Bengio Y.; Lipson H. How transferable are features in deep neural networks?. Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328. [Google Scholar]
  83. Ma Y.A Forecasting System under MWI, ML, and Cheminformatics. https://github.com/yingjin-ma/Fcst_sys_public (accessed date Nov 19, 2020).
  84. Pritchard B. P.; Altarawy D.; Didier B.; Gibson T. D.; Windus T. L. A New Basis Set Exchange: An Open, Up-to-date Resource for the Molecular Sciences Community. J. Chem. Inf. Model. 2019, 59, 4814–4820. 10.1021/acs.jcim.9b00725. [DOI] [PubMed] [Google Scholar]
  85. Lehtola S.; Steigemann C.; Oliveira M. J. T.; Marques M. A. L. Recent developments in libxc—A comprehensive library of functionals for density functional theory. SoftwareX 2018, 7, 1–5. 10.1016/j.softx.2017.11.002. [DOI] [Google Scholar]
  86. Turcani L.; Berardo E.; Jelfs K. E. STK: A Python Toolkit for Supramolecular Assembly. J. Comput. Chem. 2018, 39, 1931–1942. 10.1002/jcc.25377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. The RDKit Documentation—The RDKit 2019.03.1 documentation. http://www.rdkit.org/docs/index.html (accessed Jan. 14, 2021).
  88. Tencent Quantum laboratory. https://alchemy.tencent.com/ (accessed Jan. 14, 2021).
  89. Frisch M. J.; et al. Gaussian09, Revision D.01; Gaussian Inc.: Wallingford CT, 2009.
  90. Aprà E.; Bylaska E. J.; De Jong W. A.; Govind N.; Kowalski K.; Straatsma T. P.; Valiev M.; van Dam H. J.; Alexeev Y.; Anchell J.; et al. NWChem: Past, present, and future. J. Chem. Phys. 2020, 152, 184102. 10.1063/5.0004997. [DOI] [PubMed] [Google Scholar]
  91. Barca G. M. J.; et al. Recent developments in the general atomic and molecular electronic structure system. J. Chem. Phys. 2020, 152, 154102. 10.1063/5.0005188. [DOI] [PubMed] [Google Scholar]
  92. Fdez Galván I.; Vacher M.; Alavi A.; Angeli C.; Aquilante F.; Autschbach J.; Bao J. J.; Bokarev S. I.; Bogdanov N. A.; Carlson R. K.; et al. OpenMolcas: From source code to insight. J. Chem. Theory Comput. 2019, 15, 5925–5964. 10.1021/acs.jctc.9b00532. [DOI] [PubMed] [Google Scholar]
  93. Paszke A.; Gross S.; Chintala S.; Chanan G.; Yang E.; DeVito Z.; Lin Z.; Desmaison A.; Antiga L.; Lerer A.. Automatic Differentiation in PyTorch. 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
  94. Wishart D. S.; Knox C.; Guo A. C.; Shrivastava S.; Hassanali M.; Stothard P.; Chang Z.; Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006, 34, D668–D672. 10.1093/nar/gkj067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Wishart D. S.; Knox C.; Guo A. C.; Cheng D.; Shrivastava S.; Tzur D.; Gautam B.; Hassanali M. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36, D901–D906. 10.1093/nar/gkm958. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Law V.; Knox C.; Djoumbou Y.; Jewison T.; Guo A. C.; Liu Y.; Maciejewski A.; Arndt D.; Wilson M.; Neveu V.; et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014, 42, D1091–D1097. 10.1093/nar/gkt1068. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Kingma D. P.; Ba J.. Adam: A Method for Stochastic Optimization. 2017, arXiv:1412.6980. [Google Scholar]
  98. Ditchfield R.; Hehre W. J.; Pople J. A. Self-consistent molecular-orbital methods. IX. An extended Gaussian-type basis for molecular-orbital studies of organic molecules. J. Chem. Phys. 1971, 54, 724–728. 10.1063/1.1674902. [DOI] [Google Scholar]
  99. Zhao Y.; Truhlar D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc. 2008, 120, 215–241. 10.1007/s00214-007-0310-x. [DOI] [Google Scholar]
  100. Perdew J. P.; Burke K.; Ernzerhof M. Generalized gradient approximation made simple. Phys. Rev. Lett. 1996, 77, 3865. 10.1103/physrevlett.77.3865. [DOI] [PubMed] [Google Scholar]
  101. Becke A. D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A: At., Mol., Opt. Phys. 1988, 38, 3098. 10.1103/physreva.38.3098. [DOI] [PubMed] [Google Scholar]
  102. Lee C.; Yang W.; Parr R. G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B: Condens. Matter Mater. Phys. 1988, 37, 785. 10.1103/physrevb.37.785. [DOI] [PubMed] [Google Scholar]
  103. Becke A. D. A new mixing of Hartree–Fock and local density-functional theories. J. Chem. Phys. 1993, 98, 1372–1377. 10.1063/1.464304. [DOI] [Google Scholar]
  104. Raghavachari K. Perspective on “Density functional thermochemistry. III. The role of exact exchange”. Theor. Chem. Acc. 2000, 103, 361–363. 10.1007/978-3-662-10421-7_60. [DOI] [Google Scholar]
  105. Iikura H.; Tsuneda T.; Yanai T.; Hirao K. A long-range correction scheme for generalized-gradient-approximation exchange functionals. J. Chem. Phys. 2001, 115, 3540–3544. 10.1063/1.1383587. [DOI] [Google Scholar]
  106. Yanai T.; Tew D. P.; Handy N. C. A new hybrid exchange–correlation functional using the Coulomb-attenuating method (CAM-B3LYP). Chem. Phys. Lett. 2004, 393, 51–57. 10.1016/j.cplett.2004.06.011. [DOI] [Google Scholar]
  107. Chai J.-D.; Head-Gordon M. Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections. Phys. Chem. Chem. Phys. 2008, 10, 6615–6620. 10.1039/b810189b. [DOI] [PubMed] [Google Scholar]
  108. Schäfer A.; Horn H.; Ahlrichs R. Fully optimized contracted Gaussian basis sets for atoms Li to Kr. J. Chem. Phys. 1992, 97, 2571–2577. 10.1063/1.463096. [DOI] [Google Scholar]
  109. Dunning T. H. Jr. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989, 90, 1007–1023. 10.1063/1.456153. [DOI] [Google Scholar]
  110. Zhang D. W.; Zhang J. Z. H. Molecular fractionation with conjugate caps for full quantum mechanical calculation of protein–molecule interaction energy. J. Chem. Phys. 2003, 119, 3599–3605. 10.1063/1.1591727. [DOI] [Google Scholar]
  111. He X.; Zhang J. Z. H. The generalized molecular fractionation with conjugate caps/molecular mechanics method for direct calculation of protein energy. J. Chem. Phys. 2006, 124, 184703. 10.1063/1.2194535. [DOI] [PubMed] [Google Scholar]
  112. Zhang B.; Ma Y.; Jin X.; Wang Y.; Suo B.; He X.; Jin Z. GridMol2. 0: Implementation and application of linear-scale quantum mechanics methods and molecular visualization. Int. J. Quantum Chem. 2020, 120, e26402 10.1002/qua.26402. [DOI] [Google Scholar]
  113. Al Hajj M.; Malrieu J.-P.; Guihéry N. Renormalized excitonic method in terms of block excitations: Application to spin lattices. Phys. Rev. B: Condens. Matter Mater. Phys. 2005, 72, 224412. 10.1103/physrevb.72.224412. [DOI] [Google Scholar]
  114. Ma Y.; Liu Y.; Ma H. A new fragment-based approach for calculating electronic excitation energies of large systems. J. Chem. Phys. 2012, 136, 024113. 10.1063/1.3675915. [DOI] [PubMed] [Google Scholar]
  115. Ma Y.; Ma H. Calculating excited states of molecular aggregates by the renormalized excitonic method. J. Phys. Chem. A 2013, 117, 3655–3665. 10.1021/jp401168s. [DOI] [PubMed] [Google Scholar]
  116. Pedregosa F.; et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  117. Zhou P.; Shi W.; Tian J.; Qi Z.; Li B.; Hao H.; Xu B.. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 2016; pp 207–212.
  118. Cho K.; van Merrienboer B.; Bahdanau D.; Bengio Y.. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. 2014, arXiv:1409.1259. [Google Scholar]
  119. Vinyals O.; Bengio S.; Kudlur M.. Order Matters: Sequence to sequence for sets. 2015, arXiv:1511.06391. [Google Scholar]
  120. Broomhead D. S.; Lowe D.. Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks, 1988.
  121. Schwenker F.; Kestler H. A.; Palm G. Three learning phases for radial-basis-function networks. Neural Network. 2001, 14, 439–458. 10.1016/s0893-6080(01)00027-2. [DOI] [PubMed] [Google Scholar]
