Author manuscript; available in PMC: 2008 Nov 13.
Published in final edited form as: J Chem Theory Comput. 2005;1(5):817–823. doi: 10.1021/ct0500287

Extension of the PDDG/PM3 Semiempirical Molecular Orbital Method to Sulfur, Silicon, and Phosphorus

Ivan Tubert-Brohman 1, Cristiano Ruch Werneck Guimarães 1, William L Jorgensen 1,*
PMCID: PMC2582878  NIHMSID: NIHMS60531  PMID: 19011692

Abstract

The PDDG/PM3 semiempirical molecular orbital method has been parameterized for molecules, ions, and complexes containing sulfur; the mean absolute error (MAE) for heats of formation, ΔHf, of 6.4 kcal/mol is 35−40% smaller than for PM3, AM1, and MNDO/d. For completeness, parameterization was also carried out for silicon and phosphorus. For 144 silicon-containing molecules, the ΔHf MAE for PDDG/PM3, PM3, and AM1 is 11−12 kcal/mol, while MNDO/d yields 9.4 kcal/mol. For the limited set of 43 phosphorus-containing molecules, MNDO/d also yields the best results, followed by PDDG/PM3, AM1, and PM3. The benefits of the d-orbitals in MNDO/d for hypervalent compounds are apparent for silicon and phosphorus, while they are masked in the larger dataset for sulfur by large errors for branched compounds. Overall, for 1480 molecules, ions, and complexes containing the elements H, C, N, O, F, Si, P, S, Cl, Br, and I, the MAEs in kcal/mol for ΔHf are 6.5 (PDDG/PM3), 8.7 (PM3), 10.3 (MNDO/d), 10.8 (AM1), and 19.8 (MNDO).

Introduction

Semiempirical methods based on the Neglect of Diatomic Differential Overlap (NDDO)1 approximation, such as MNDO,2 AM1,3 PM3,4 and MNDO/d,5 occupy an important place in computational chemistry due to their speed and excellent scaling with increasing system size. Even as constant advances in computer resources permit application of ab initio and DFT methods to ever larger systems of chemical interest, semiempirical methods allow the exploration of new frontiers such as full quantum mechanical calculations for proteins6 or long Monte Carlo and molecular dynamics simulations of reactions in solution and in enzymes by means of coupled quantum and molecular mechanics (QM/MM).7-16

In recent articles, a new NDDO-based method, PDDG/PM3, was introduced.17,18 It is derived from the PM3 method by addition of small pairwise distance directed Gaussians to the core repulsion function. The method was initially parameterized for the basic organic elements C, H, N, and O,17 and later extended to the halogens F, Cl, Br, and I.18 The use of the PDDG function in conjunction with extensive parameterization using large datasets resulted in a reduction of about 30% in the mean absolute error (MAE) for heats of formation in comparison to PM3. Several systematic errors such as for homologation and branching were overcome, and improvements for activation barriers for SN2 reactions involving halogens were also obtained. The PDDG/PM3 method yields heats of formation and isomerization energies that are more accurate than those obtained from B3LYP/6−31G* calculations, and for some classes of compounds chemical accuracy is approached, e.g., the MAE is 1.17 kcal/mol for alkanes. It has also been used successfully in QM/MM studies of nucleophilic aromatic substitution (SNAr) and SN2 reactions in solution.7,8

In this article, the PDDG/PM3 method is extended to sulfur, silicon, and phosphorus. With these additions, the complete set of “organic elements” is available for calculations. This will allow the application of the PDDG/PM3 method to a wide variety of molecules of chemical and biological interest. While it is expected that methods that do not include d-orbitals will have difficulties with hypervalent compounds, the question is how far one can go with an sp basis set and the PDDG approach.

PDDG Formalism

The rationale behind the PDDG formalism has been discussed.17 The key difference between the PDDG/PM3 method and its predecessor is the addition of pairwise distance directed Gaussian terms (PDDG) to the core repulsion function, as given in eq 1.

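$$
\mathrm{PDDG}(A,B) = \frac{1}{n_A + n_B} \left[ \sum_{i=1}^{2} \sum_{j=1}^{2} \left( n_A P_{A_i} + n_B P_{B_j} \right) \exp\!\left( -10\,\mathrm{\AA}^{-2} \left( R_{AB} - D_{A_i} - D_{B_j} \right)^{2} \right) \right] \qquad (1)
$$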

This equation is an empirical correction to the core repulsion between atoms A and B, which are separated by a distance RAB; each element requires four parameters PA1, PA2, DA1, and DA2. The function is weighted using nA and nB, which are the number of valence electrons for atoms A and B, respectively. For each pair of atoms there are four Gaussians, which depend only on atomic parameters. The Gaussians are small compared to those used in the original AM1 and PM3 core repulsion functions, and work by addressing systematic errors associated with bonds and functional groups.17,18
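A minimal Python sketch of eq 1 may make the bookkeeping concrete. It assumes a Gaussian exponent coefficient of 10 Å^−2 in eq 1, takes the P (eV) and D (Å) values from Table 2 together with the usual valence-electron counts (Si 4, P 5, S 6), and uses an illustrative function name, `pddg_term`, that does not correspond to any routine in MOPAC or BOSS.

```python
import math

# Valence-electron counts and PDDG parameters (P in eV, D in angstroms, Table 2).
PDDG_PARAMS = {
    "S": {"n": 6, "P": (0.120434, -0.002663), "D": (0.672870, 2.032340)},
    "Si": {"n": 4, "P": (-0.091928, -0.040753), "D": (1.163190, 2.190526)},
    "P": {"n": 5, "P": (0.462741, -0.020444), "D": (0.714296, 2.041209)},
}

EXPONENT = 10.0  # assumed Gaussian coefficient, in angstrom^-2


def pddg_term(elem_a, elem_b, r_ab):
    """Return the PDDG correction (eV) for atoms A and B at distance r_ab (angstroms)."""
    a, b = PDDG_PARAMS[elem_a], PDDG_PARAMS[elem_b]
    total = 0.0
    # Four Gaussians per atom pair: i and j each run over the two atomic terms.
    for p_a, d_a in zip(a["P"], a["D"]):
        for p_b, d_b in zip(b["P"], b["D"]):
            prefactor = a["n"] * p_a + b["n"] * p_b
            total += prefactor * math.exp(-EXPONENT * (r_ab - d_a - d_b) ** 2)
    return total / (a["n"] + b["n"])


if __name__ == "__main__":
    for r in (1.5, 2.0, 2.5, 4.0):
        print(f"S-S at {r:.1f} A: {pddg_term('S', 'S', r):+.6f} eV")
```

The term is symmetric in A and B and decays to essentially zero once R_AB is well beyond D_A2 + D_B2, so it only perturbs the core repulsion over a limited distance range.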

Another difference between the traditional NDDO methods and the PDDG method is the way in which the molecular energy obtained from the SCF calculation is converted to a heat of formation at 298 K. In both cases, the difference between the heat of formation of each atom and its electronic energy (eisol) is added to the molecular energy. In the traditional method, eisol is obtained as a derived parameter by calculating the energy of an isolated atom with a restricted single-determinant wave function using the semiempirical formalism and parameter set;19 for the PDDG methods, eisol is treated as an optimizable parameter, obtained from a through-origin linear regression so that the conversion from molecular energy to heat of formation gives as small an error as possible. A similar approach has been applied in recent work using density functional theory (DFT) by Winget and Clark.20
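The conversion described above, and one possible reading of the through-origin regression, can be sketched as follows. This is only an illustration under stated assumptions: all energies are taken to be in one consistent unit, a single element's eisol is fitted with everything else held fixed, and the numbers in the usage section are made up for demonstration (they are not data from this work). The function names are hypothetical.

```python
def heat_of_formation(e_mol, atoms, dhf_atom, eisol):
    """Heat of formation = E_mol + sum over atoms of (atomic heat of formation - eisol)."""
    return e_mol + sum(dhf_atom[el] - eisol[el] for el in atoms)


def fit_eisol_through_origin(counts, targets):
    """Least-squares slope of targets = eisol * counts with no intercept.

    counts[k]  : number of atoms of the element of interest in molecule k
    targets[k] : the quantity that counts[k] * eisol must reproduce so that
                 the calculated heat of formation of molecule k matches the reference
    """
    numerator = sum(n * y for n, y in zip(counts, targets))
    denominator = sum(n * n for n in counts)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up illustration values, not taken from the paper.
    counts = [1, 2, 1, 3]
    targets = [-166.1, -331.8, -165.9, -498.3]
    print(round(fit_eisol_through_origin(counts, targets), 4))
```

Treating eisol as a regression slope rather than a derived atomic energy lets residual systematic error in the molecular energies be absorbed element by element.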

Parameter Optimization

To extend the PDDG/PM3 method to sulfur, silicon, and phosphorus, a similar procedure to the one used for the halogens was followed: the Uss, Upp, βs, βp, ζs, ζp, and α MNDO parameters, as well as the PM3 and PDDG/PM3 Gaussian pre-exponential and distance terms were optimized by a combination of gradient-based methods (Fletcher–Powell) and simulated annealing.18 The available reference data had very few molecules involving more than one of the elements discussed in this paper, so little coupling between different elements’ parameters was expected, which allowed each element to be optimized separately. A total of 527 reference values were used in the error function during the optimization of the three elements, as detailed in Table 1. The error function that was minimized was the weighted sum of the square deviation between the calculated and reference values, as discussed previously.18 Most of the optimizations were done with fixed PM3 geometries, adding gradients to the error function to ensure that the geometric minimum for each molecule does not stray far from the PM3 minimum. Only the final optimization stage involved fully flexible geometries; in this stage the gradients were not included in the error function, since they are almost zero.
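The error function described above can be sketched in a few lines of Python. The weighted sum of squared deviations follows the text directly; the form of the gradient term (a weighted sum of squared gradient norms) and the names `weighted_sse`, `error_with_gradient_penalty`, and `gradient_weight` are assumptions made for illustration, since the exact weighting is given only in the earlier papers.

```python
def weighted_sse(calculated, reference, weights):
    """Weighted sum of squared deviations between calculated and reference values."""
    return sum(w * (c - r) ** 2 for c, r, w in zip(calculated, reference, weights))


def error_with_gradient_penalty(calculated, reference, weights,
                                gradient_norms, gradient_weight):
    """Fixed-geometry variant: penalize residual gradients at the PM3 geometries.

    The penalty form used here is an assumption, not the authors' stated formula.
    """
    penalty = gradient_weight * sum(g * g for g in gradient_norms)
    return weighted_sse(calculated, reference, weights) + penalty


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(weighted_sse([1.0, 2.0], [0.0, 0.0], [1.0, 0.5]))
    print(error_with_gradient_penalty([1.0, 2.0], [0.0, 0.0], [1.0, 0.5], [0.5], 2.0))
```

In the final, fully flexible stage the penalty would simply be dropped, which corresponds to calling `weighted_sse` alone.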

Table 1.

Composition of the Training Setsa

Data type S Si P
Heat of Formation 81 47 23
Ionization potential 28 15 13
Dipole moment 16 12 7
Bond length 78 96 16
Bond angle 54 33 8
(a) For detailed data and references, see the Supporting Information.
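As a quick arithmetic check, the counts in Table 1 can be tallied per element and in total; the short Python sketch below does only that, with the dictionary layout being an illustrative choice.

```python
# Reference-value counts copied from Table 1.
TRAINING_COUNTS = {
    "Heat of formation": {"S": 81, "Si": 47, "P": 23},
    "Ionization potential": {"S": 28, "Si": 15, "P": 13},
    "Dipole moment": {"S": 16, "Si": 12, "P": 7},
    "Bond length": {"S": 78, "Si": 96, "P": 16},
    "Bond angle": {"S": 54, "Si": 33, "P": 8},
}

per_element = {
    element: sum(row[element] for row in TRAINING_COUNTS.values())
    for element in ("S", "Si", "P")
}
total = sum(per_element.values())

print(per_element)  # {'S': 257, 'Si': 203, 'P': 67}
print(total)        # 527
```

The total agrees with the 527 reference values quoted in the text.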

The prior PDDG/PM3 parameterizations were performed as local optimizations that tried to stay close to the original PM3 parameters.17,18 However, preliminary results for sulfur showed that the local approach could provide little improvement over PM3, so more global optimization was undertaken. As a first step, to perturb the PM3 parameters reasonably far from their original values, 384 (3×2^7) initial parameter sets were generated by crossover between the Uss, Upp, βs, βp, ζs, ζp, and α parameters of PM3, MNDO, and AM1. These parameter sets were optimized in the usual way while using the PM3 Hamiltonian and local optimization for the PM3 Gaussians; the PDDG Gaussians were not added yet. The six best results were chosen and an exhaustive crossover was performed between them, resulting in 1920 (15×2^7) parameter sets. After optimizing the latter, the best set was chosen and 256 (4^4) different combinations of PM3 Gaussian parameters were tried in a grid-like fashion. After optimization of these, 256 combinations of PDDG Gaussians were added to the best result. The best parameter set from the last step was then subjected to 3000 steps of stochastic search; at each step, a random “kick” was applied to the parameters, which were then optimized using the same gradient-based algorithm. The best parameter set obtained so far was subjected to a flexible geometry optimization using simulated annealing, resulting in the final parameter set for sulfur. Suffice it to say that this protocol was not planned in advance in its entirety, but was the result of much trial and error. However, it resulted in remarkable improvement over the simple local optimization, as discussed below.
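The set counts quoted above (384, 1920, and 256) are consistent with a pairwise crossover in which each of the seven parameters is taken from one of two parent sets, followed by a four-level grid over four Gaussian parameters. The Python sketch below reproduces the counts under that assumed reading; the parent labels and the function name `pairwise_crossover` are placeholders, and no actual parameter values are involved.

```python
from itertools import combinations, product

PARAMETER_NAMES = ("Uss", "Upp", "beta_s", "beta_p", "zeta_s", "zeta_p", "alpha")


def pairwise_crossover(parents):
    """Yield every child that takes each parameter from one of two parents.

    Each child is a dict mapping a parameter name to the label of the parent
    set that supplies its value (assumed scheme, inferred from the counts).
    """
    for first, second in combinations(parents, 2):
        for choice in product((first, second), repeat=len(PARAMETER_NAMES)):
            yield dict(zip(PARAMETER_NAMES, choice))


# Stage 1: three parent methods -> 3 pairs x 2^7 children.
first_round = list(pairwise_crossover(["PM3", "MNDO", "AM1"]))

# Stage 2: six best sets -> 15 pairs x 2^7 children.
second_round = list(pairwise_crossover([f"best{k}" for k in range(1, 7)]))

# Stage 3: grid of four levels for each of four Gaussian parameters (assumed).
gaussian_grid = list(product(range(4), repeat=4))

print(len(first_round), len(second_round), len(gaussian_grid))  # 384 1920 256
```

Note that each pair contributes its two unmodified parents among the 2^7 children, so the counts include some duplicate sets.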

Similar procedures were tried for silicon and phosphorus, with a few variations. The MNDO method was no longer included in the crossover part, as it was found not to be helpful; also, the stochastic search was not performed for these elements. For silicon, it was found that the results for the global optimization were not better than those from the simple local optimization, while, for phosphorus, gains were made over PM3. The build-up procedure of parameterization for C, H, N, and O first, then the halogens, then S, Si, and P clearly leads to the largest errors for the last elements. Simultaneous optimization for all elements is desirable, but logistically taxing.

Results and Discussion

The optimized and dependent parameters for the PDDG/PM3 method are shown in Table 2 along with the original PM3 parameters for comparison. Some PDDG/PM3 parameters for sulfur, such as Uss, ζs, and βs, are quite different from the PM3 parameters. This is a result of the global optimization method along with the differences in the training set. However, the PDDG/PM3 parameters are still closer to the PM3 values than to the AM1 values, while the magnitude of the differences between PDDG/PM3 and PM3 parameters is similar to that of the differences between PM3 and AM1. The differences in the parameters for silicon and phosphorus are generally smaller than those for sulfur.

Table 2.

Optimized PDDG/PM3 Parameters for S, Si, and P, along with the Standard PM3 Parametersa

Parameter PDDG/PM3 (S) PDDG/PM3 (Si) PDDG/PM3 (P) PM3 (S) PM3 (Si) PM3 (P)
Uss −43.906366 −26.332522 −37.882113 −49.895371 −26.763483 −40.413096
Upp −43.461348 −22.602540 −30.312979 −44.392583 −22.813635 −29.593052
βs −2.953912 −3.376445 −12.676297 −8.827465 −2.862145 −12.615879
βp −8.507779 −3.620969 −7.093318 −8.091415 −3.933148 −4.160040
ζs 1.012002 1.586389 2.395882 1.891185 1.635075 2.017563
ζp 1.876999 1.485958 1.742213 1.658972 1.313088 1.504732
α 2.539751 2.215157 2.005294 2.269706 2.135809 1.940534
eisol −166.336554 −66.839000 −117.212854 −183.453740 −67.788214 −117.959174
DD 1.006989 1.310515 0.893978 1.121431 1.314455 1.064495
QQ 0.891487 1.126089 0.960457 1.008649 1.274340 1.112039
ρ0b 1.517625 2.695556 1.743870 1.517625 2.695556 1.743870
ρ1b 0.711672 1.630757 1.050851 0.748602 1.633605 1.160242
ρ2b 0.754336 0.949200 1.208907 0.814668 1.025130 1.339579
a1 −0.330692 −0.071314 −0.398055 −0.399191 −0.390600 −0.611421
b1 6.000000 6.000000 1.997272 6.000669 6.000054 1.997272
c1 0.823837 0.237995 0.950073 0.962123 0.632262 0.794624
a2 0.024171 0.089451 −0.079653 −0.054899 0.057259 −0.093935
b2 6.000000 6.000000 1.998360 6.001845 6.007183 1.998360
c2 2.017756 1.897728 2.336959 1.579944 2.019987 1.910677
PA1 0.120434 −0.091928 0.462741
PA2 −0.002663 −0.040753 −0.020444
DA1 0.672870 1.163190 0.714296
DA2 2.032340 2.190526 2.041209
(a) Units are: (eV) Uss, Upp, βs, βp, eisol, a1, a2, PA1, PA2; (au) ζs, ζp; (Bohr) DD, QQ, ρ0, ρ1, ρ2; (Å) c1, c2, DA1, DA2; (Å−1) α, b1, b2.

(b) For use in MOPAC 6, ρ0 = 0.5/AM, ρ1 = 0.5/AD, ρ2 = 0.5/AQ.
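The relation in footnote b can be inverted to obtain the AM, AD, and AQ values from the tabulated ρ terms. The Python sketch below does this for the PDDG/PM3 columns of Table 2; the function name `mopac6_additive_terms` is illustrative and is not a MOPAC routine.

```python
# PDDG/PM3 rho0, rho1, rho2 values (Bohr) copied from Table 2.
RHO = {
    "S": (1.517625, 0.711672, 0.754336),
    "Si": (2.695556, 1.630757, 0.949200),
    "P": (1.743870, 1.050851, 1.208907),
}


def mopac6_additive_terms(rho0, rho1, rho2):
    """Invert rho = 0.5 / A to recover the AM, AD, AQ terms."""
    return {"AM": 0.5 / rho0, "AD": 0.5 / rho1, "AQ": 0.5 / rho2}


if __name__ == "__main__":
    for element, rhos in RHO.items():
        terms = mopac6_additive_terms(*rhos)
        print(element, {name: round(value, 6) for name, value in terms.items()})
```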

Sulfur Thermochemistry

The performance of PDDG/PM3 and the other common semiempirical methods for the calculation of standard heats of formation is summarized in Table 3. Full details are provided for all individual compounds in the Supplementary Material. The mean absolute error for the entire set of 249 molecules is 6.4 kcal/mol for PDDG/PM3, which is 35−40% smaller than the MAE for the older methods. This large improvement was observed for most classes of compounds, but it could only be achieved after the extensive global optimization discussed above. Initial tests with local optimizations resulted in only ca. 15% improvements with respect to PM3. It is remarkable that the errors for hypervalent molecules with PDDG/PM3 and its sp basis set are not large, and that notably improved performance for classes such as sulfones is achieved, while simultaneously improving the performance for normal-valence functional groups. The MAEs for thiols, sulfides, and disulfides are decreased by 63%, 13%, and 76%, respectively, when compared to PM3. Compounds with both sulfur and halogens were the only notable class where PDDG/PM3 did not yield improvement over PM3 or MNDO/d. The problem cases here are predominantly small fluorine-containing species including SF2, FSSF, the transition state for the SH− + CH3F SN2 reaction, and the HF–HS+ complex. If more data were available for larger molecules, the benefits of PDDG for homologation and branching could be expected to dominate for this class too.

Table 3.

Mean Absolute Errors for Heats of Formation for Sulfur Compounds (kcal/mol)

N PDDG MNDO AM1 PM3 MNDO/d
All 249 6.4 41.2 10.6 10.5 10.0
Training 81 7.1 34.2 9.9 10.6 10.0
Test 168 6.0 44.6 10.9 10.4 10.1
Halides 15 13.7 56.5 14.5 6.6 5.1
Sulfoxides 7 5.4 43.2 3.8 6.1 5.6
Sulfones 36 5.9 143.8 18.9 18.1 10.4
Sulfates 5 1.7 162.9 10.5 5.1 5.9
Sulfites 4 8.9 54.1 22.0 11.2 7.6
Thiols 29 1.4 5.5 4.4 3.8 7.3
Sulfides 38 3.9 9.5 5.9 4.5 6.4
Disulfides 14 1.8 9.7 5.1 7.6 9.5
Aromatics 8 1.3 4.7 2.6 4.5 1.8
Thioamides 9 7.4 14.6 11.8 22.2 27.5
Dithiocarbamates 5 4.6 19.2 9.1 17.8 34.6
Thioesters 10 5.4 11.6 6.5 7.0 7.8
Thiocarbonates 7 4.6 11.8 5.6 15.6 8.2
Anions 7 5.3 5.7 5.3 7.2 3.5
Cations 7 11.3 19.2 18.0 23.0 11.0
Transition structs. 4 10.6 26.7 26.2 14.9 29.9
Complexes 18 13.0 41.6 15.0 17.5 19.6
Others 16 10.2 10.2 9.1 8.4 6.6
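The percentage improvements quoted in the text follow directly from the MAEs in Table 3. A short Python check, using an illustrative helper named `percent_reduction`:

```python
def percent_reduction(old_mae, new_mae):
    """Relative reduction of new_mae with respect to old_mae, in percent."""
    return 100.0 * (old_mae - new_mae) / old_mae


# (PM3 MAE, PDDG/PM3 MAE) in kcal/mol, copied from Table 3.
PM3_VS_PDDG = {
    "Thiols": (3.8, 1.4),
    "Sulfides": (4.5, 3.9),
    "Disulfides": (7.6, 1.8),
}

for name, (pm3, pddg) in PM3_VS_PDDG.items():
    print(f"{name}: {percent_reduction(pm3, pddg):.0f}% smaller than PM3")

# Whole sulfur set: PDDG/PM3 (6.4) against AM1, PM3, and MNDO/d.
for method, mae in (("AM1", 10.6), ("PM3", 10.5), ("MNDO/d", 10.0)):
    print(f"All vs {method}: {percent_reduction(mae, 6.4):.1f}%")
```

The class-level values round to 63%, 13%, and 76%, and the whole-set reductions fall between 36% and 40%, in line with the 35−40% range stated above.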

The latter point becomes apparent in Table 4, which highlights some of the systematic errors found with the different semiempirical methods. All methods except PDDG/PM3 have notable problems with homologation or branching or both. In particular, AM1 yields errors proportional to the number of methylene groups that results, for example, in a deviation of 14.6 kcal/mol for decanethiol. All the methods except PDDG/PM3 tend to overestimate the heats of formation of branched compounds. The problem is particularly large for both MNDO and MNDO/d; the latter method yields deviations of 31.6 kcal/mol for di-tert-butyl sulfide and 49.3 kcal/mol for di-tert-butyl sulfone. While a major part of the branching errors may be attributed to the hydrocarbon parameters, sulfur parameters must also play a role, because MNDO/d, while having the same parameters as MNDO for the first-row elements, tends to have larger branching errors. All of the methods, including PDDG/PM3, overestimate the heat of formation of sulfones to varying degrees. MNDO/d might be expected to be the best method for sulfones due to its inclusion of d functions; however, its advantages for small sulfones are minor and the branching problems for larger sulfones lead to large errors. PDDG/PM3 still overestimates the heat of formation of sulfones, but when all the sulfones in the dataset are considered, the mean signed error is +2.7 kcal/mol versus about +18 kcal/mol for AM1 and PM3.

Table 4.

ΔHf Results (kcal/mol) for Selected Molecules that Highlight Systematic Problems




Deviation (calc − exp)
Formula Name Exp PDDG PM3 AM1 MNDO MNDO/d
CH4S Methanethiol −5.4 0.2 −0.1 1.1 −1.9 1.1
C4H10S 1-Butanethiol −21.1 2.2 1.6 −3.1 −1.8 1.8
C10H22S 1-Decanethiol −50.7 −1.3 −1.3 −14.6 −0.5 3.1
C4H10S Isobutyl thiol −23.1 0.3 4.3 1.4 4.2 8.5
C5H12S Isopentyl thiol −27.4 −0.5 2.4 −1.1 4.4 8.0
C5H12S Neopentyl thiol −30.8 −0.7 7.7 7.0 15.3 20.4
C4H10S sec-Butyl thiol −23.0 1.3 4.4 1.0 2.9 7.3
C4H10S tert-Butyl thiol −26.0 1.0 8.1 7.3 10.9 14.9
C2H6S Dimethyl sulfide −8.9 −5.3 −2.1 −0.4 −8.2 0.4
C4H10S Diethyl sulfide −20.0 −0.7 2.8 −1.8 −9.0 0.5
C8H18S Diisobutyl sulfide −42.9 −2.4 5.7 −0.9 3.1 14.7
C6H14S Diisopropyl sulfide −33.9 −0.8 7.8 3.2 0.6 11.0
C8H18S Di-tert-butyl sulfide −45.1 −6.1 12.0 11.3 21.5 31.6
C2H6S2 Dimethyl disulfide −5.8 −3.5 1.0 1.6 −9.0 −0.7
C2H6S3 Dimethyl trisulfide −3.0 −2.0 −3.9 −1.7 −10.2 −2.4
C8H18S2 Dibutyl disulfide −37.9 −1.2 3.5 −6.8 −7.9 3.0
C8H18S2 Di-tert-butyl disulfide −47.8 0.8 20.1 16.0 21.0 34.5
C2H6O2S Dimethyl sulfone −89.2 1.7 12.9 18.9 142.9 3.6
C4H10O2S Diethyl sulfone −102.6 9.6 21.7 21.4 143.2 5.6
C8H18O2S Dibutyl sulfone −121.8 7.7 19.5 13.3 143.6 7.1
C8H18O2S Diisobutyl sulfone −128.0 7.9 27.6 24.4 160.2 25.0
C8H18O2S Di-tert-butyl sulfone −130.6 8.0 38.2 41.8 178.9 49.3
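The homologation error can be quantified from the three n-alkanethiols in Table 4 by fitting the deviation against the number of methylene groups. The Python sketch below uses an ordinary least-squares slope over only three points, so it is a rough indication rather than a statistical estimate; the helper name `least_squares_slope` is illustrative.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# CH2 groups in methanethiol, 1-butanethiol, and 1-decanethiol.
CH2_COUNTS = [0, 3, 9]

# Deviations (calc - exp, kcal/mol) for those three thiols, copied from Table 4.
DEVIATIONS = {
    "PDDG": [0.2, 2.2, -1.3],
    "PM3": [-0.1, 1.6, -1.3],
    "AM1": [1.1, -3.1, -14.6],
    "MNDO": [-1.9, -1.8, -0.5],
    "MNDO/d": [1.1, 1.8, 3.1],
}

for method, devs in DEVIATIONS.items():
    slope = least_squares_slope(CH2_COUNTS, devs)
    print(f"{method}: {slope:+.2f} kcal/mol per CH2")
```

For AM1 the fitted drift is close to −1.8 kcal/mol per methylene group, whereas for PDDG/PM3 it is roughly −0.2 kcal/mol.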

The 18 complexes in the dataset are considered further in Table 5, which focuses on the computed enthalpies of complexation, i.e., the difference between the heats of formation of the complex and the separated molecules. PDDG/PM3 gives the lowest MAEs for neutral as well as positively and negatively charged complexes. The overall MAE of 3.8 kcal/mol from PDDG/PM3 shows large improvement over the alternative semiempirical methods. The improvements here are clearly not from better treatment of branching since the molecules are all small. Perhaps surprisingly, the original MNDO method gives the second best results for all classes; it does not show the serious overestimate of binding for positively-charged complexes like AM1 or PM3. Thus, its problems with the heats of formation of hypervalent molecules such as SO2 and dimethyl sulfoxide mostly cancel out when calculating interaction enthalpies.

Table 5.

Interaction Enthalpies of Sulfur-Containing Complexes.

Complex Ref. PDDG MNDO AM1 PM3 MNDO/d
NH3···SO2 −4.49a −4.26 −2.93 −13.37 −16.55 −2.74
MeNH2···SO2 −6.70a −3.41 −2.48 −10.63 −10.69 −1.94
Me2NH···SO2 −7.93a −3.42 −2.18 −10.00 −1.05 −1.53
NMe3···SO2 −10.65a −2.98 −1.46 −8.12 −0.89 −0.92
NH3···SO2···NH3 −3.00a −7.60 −5.43 −23.34 −2.31 −4.68
MeNH2···SO2···MeNH2 −4.28a −6.06 −4.60 −18.25 −1.87 −3.83
CH3SH···F −38.04b −34.73 −60.55 −103.16 −65.37 −61.76
CH3F···HS −12.54b −9.78 −5.35 −7.77 −5.37 −4.30
CH3SH···H2Of −1.16c −3.56 −0.52 −1.90 −1.02 −0.57
CH3SH···H2Og −2.21c −3.42 −0.62 −2.37 −2.43 −0.51
CH3SCH3···H2O −2.88c −4.32 −0.69 −2.44 −2.45 −0.25
DMSO···H2O −6.56c −8.15 −2.20 −5.61 −6.17 −1.90
EtSH···CH3S −5.40d −12.39 −11.45 −9.90 −14.37 −3.32
NH3–HS+ −111.45e −104.00 −101.22 −138.30 −141.19 −70.1
H2O–HS+ −72.51e −64.71 −69.00 −92.11 −89.45 −42.4
H2S–HS+ −93.36e −93.88 −84.32 −124.81 −122.12 −86.61
HF–HS+ −33.17e −24.90 −48.45 −102.27 −64.40 −39.39
HCl–HS+ −60.87e −58.60 −49.50 −84.69 −90.34 −36.89
MAE (negative or neutral) 3.21 5.23 9.88 6.19 5.26
MAE (positive) 5.26 9.89 34.16 27.23 21.68
MAE (all, kcal/mol) 3.78 6.52 16.62 12.03 9.82
(a) MP3/6−31+G(2d,p) values at 298 K.21
(b) CCSD(T)/aug-cc-pVTZ, ZPE corrected with MP2 frequencies.22
(c) MP2(fc)/6−31++G(2d(X+),p), ZPE corrected.23
(d) MP2/6−31+G**, ZPE corrected.24
(e) G2 values at 0 K.25
(f) Water as hydrogen bond acceptor.
(g) Water as hydrogen bond donor.
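The three PDDG/PM3 MAE rows of Table 5 can be reproduced from the reference and PDDG columns. The Python sketch below copies those two columns and groups the complexes as in the table (the five HS+ adducts are the positively charged class); the tuple layout and names are illustrative.

```python
# (complex, is_cation, reference, PDDG/PM3) in kcal/mol, copied from Table 5.
TABLE5 = [
    ("NH3...SO2", False, -4.49, -4.26),
    ("MeNH2...SO2", False, -6.70, -3.41),
    ("Me2NH...SO2", False, -7.93, -3.42),
    ("NMe3...SO2", False, -10.65, -2.98),
    ("NH3...SO2...NH3", False, -3.00, -7.60),
    ("MeNH2...SO2...MeNH2", False, -4.28, -6.06),
    ("CH3SH...F", False, -38.04, -34.73),
    ("CH3F...HS", False, -12.54, -9.78),
    ("CH3SH...H2O (water as acceptor)", False, -1.16, -3.56),
    ("CH3SH...H2O (water as donor)", False, -2.21, -3.42),
    ("CH3SCH3...H2O", False, -2.88, -4.32),
    ("DMSO...H2O", False, -6.56, -8.15),
    ("EtSH...CH3S", False, -5.40, -12.39),
    ("NH3-HS+", True, -111.45, -104.00),
    ("H2O-HS+", True, -72.51, -64.71),
    ("H2S-HS+", True, -93.36, -93.88),
    ("HF-HS+", True, -33.17, -24.90),
    ("HCl-HS+", True, -60.87, -58.60),
]


def mean_absolute_error(rows):
    """MAE between the reference and PDDG/PM3 interaction enthalpies."""
    return sum(abs(calc - ref) for _, _, ref, calc in rows) / len(rows)


mae_neutral_or_negative = mean_absolute_error([r for r in TABLE5 if not r[1]])
mae_positive = mean_absolute_error([r for r in TABLE5 if r[1]])
mae_all = mean_absolute_error(TABLE5)

print(f"{mae_neutral_or_negative:.2f} {mae_positive:.2f} {mae_all:.2f}")  # 3.21 5.26 3.78
```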

A new method based on AM1, called AM1*, has been reported recently.26 This method includes d-orbitals for phosphorus, sulfur, and chlorine as well as a modified core-repulsion function with two-element parameters. The standard AM1 parameters and formalism are used for hydrogen and the first-row elements. The intersection of the dataset used in the AM1* study with the dataset used here has 82 sulfur-containing molecules and results in the following MAEs in kcal/mol: MNDO/d (6.0), PM3 (7.2), PDDG/PM3 (7.3), AM1* (8.0), AM1 (10.1), and B3LYP/6−311+G(2d,p)//B3LYP/6−31+G(d) (11.8). The dataset contains few large or branched molecules, so MNDO/d and PM3 perform relatively well. There is no clear benefit for the AM1* methodology, while the B3LYP results are the poorest. As discussed previously,17,18 PDDG/PM3 consistently outperforms the much slower DFT methods for calculating heats of formation owing, presumably, to some systematic errors with the DFT methods.

The lower errors for heats of formation with PDDG/PM3 carry over to lower errors for isomerization enthalpies, as illustrated for several series in Table 6. PDDG/PM3 somewhat overestimates the stability of sulfides relative to thiols, but does particularly well for dithiols relative to disulfides and for branching isomers.

Table 6.

ΔHf Results (kcal/mol) for Some Isomeric Series

Expt. PDDG MNDO AM1 PM3 MNDO/d
CH3CH2SCH3 −14.2 −17.4 −23.0 −15.6 −14.1 −14.0
CH3CH2CH2SH −16.2 −15.3 −18.2 −17.4 −14.1 −14.7
(CH3)2CHSH −18.2 −17.1 −16.2 −16.0 −14.4 −12.5
HSCH2CH2CH2SH −7.1 −6.2 −11.0 −9.8 −3.8 −4.0
HSCH2CHSHCH3 −7.1 −7.1 −8.3 −9.5 −5.0 −0.4
CH3CH2SSCH3 −11.8 −13.7 −20.6 −10.7 −8.8 −11.5
CH3CH2SCH2CH3 −20.0 −20.7 −29.0 −21.8 −17.2 −19.5
CH3CH2CH2CH2SH −21.1 −18.9 −22.9 −24.2 −19.5 −19.4
(CH3)2CHSCH3 −21.4 −24.8 −25.3 −20.1 −18.4 −15.9
CH3CH2CHSHCH3 −23.0 −21.7 −20.0 −21.0 −18.6 −15.7
(CH3)2CHCH2SH −23.1 −22.8 −18.9 −21.6 −18.8 −14.6
(CH3)3CSH −26.0 −25.0 −15.1 −18.7 −17.9 −11.1
HSCH2CH2CH2CH2SH −12.0 −11.5 −16.0 −17.1 −9.7 −8.9
CH3CH2SSCH2CH3 −17.9 −18.0 −26.5 −17.3 −12.7 −16.4
(CH3)2CHSSCH3 −18.9 −19.5 −23.4 −15.2 −11.9 −13.6
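Isomerization enthalpies follow from Table 6 as differences between heats of formation. As a worked example, the Python sketch below takes the four C4H10S thiols, uses 1-butanethiol as the reference isomer, and compares PDDG/PM3 with PM3; the choice of series and reference isomer is made here for illustration only.

```python
# Heats of formation (kcal/mol) copied from Table 6.
DHF = {
    "CH3CH2CH2CH2SH": {"Expt": -21.1, "PDDG": -18.9, "PM3": -19.5},
    "CH3CH2CHSHCH3": {"Expt": -23.0, "PDDG": -21.7, "PM3": -18.6},
    "(CH3)2CHCH2SH": {"Expt": -23.1, "PDDG": -22.8, "PM3": -18.8},
    "(CH3)3CSH": {"Expt": -26.0, "PDDG": -25.0, "PM3": -17.9},
}
REFERENCE_ISOMER = "CH3CH2CH2CH2SH"


def isomerization_enthalpies(method):
    """Enthalpy of each isomer relative to the reference isomer for one method."""
    base = DHF[REFERENCE_ISOMER][method]
    return {
        name: values[method] - base
        for name, values in DHF.items()
        if name != REFERENCE_ISOMER
    }


def isomerization_mae(method):
    """MAE of the relative enthalpies against experiment."""
    expt = isomerization_enthalpies("Expt")
    calc = isomerization_enthalpies(method)
    return sum(abs(calc[name] - expt[name]) for name in expt) / len(expt)


for method in ("PDDG", "PM3"):
    print(method, round(isomerization_mae(method), 2))
```

For this small series the branching isomers are ordered correctly by PDDG/PM3, while PM3 places the branched thiols above the linear one.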

Silicon Thermochemistry

Table 7 summarizes the results for heats of formation for various classes of silicon-containing compounds. Considering all 144 molecules in the dataset, MNDO/d provides the smallest MAE, and the differences between PM3, AM1, MNDO/d, and PDDG/PM3 are small. There are some classes of compounds for which MNDO/d is particularly advantageous, specifically, divalent silicon molecules, ions, and complexes. PDDG/PM3 gives results that are generally similar to PM3 or AM1 for most classes, though it does particularly well for silyl ethers. All of the methods have difficulties with the compounds containing halogens and silicon; however, most of these are very small polyhalosilanes.

Table 7.

Mean Absolute Errors for Heats of Formation for Silicon Compounds (kcal/mol).

N PDDG MNDO AM1 PM3 MNDO/d
All 144 11.9 16.2 11.7 11.1 9.4
Training 56 10.3 13.0 9.1 9.8 7.4
Test 88 12.9 18.2 13.3 11.9 10.7
CHSi 40 7.6 8.2 7.5 6.4 8.3
CHONSi 5 4.4 7.2 4.5 13.5 2.7
Halides 31 15.2 26.5 13.5 9.8 15.0
Ethers 9 6.9 11.1 7.8 8.6 9.6
Anions 4 17.0 21.7 14.4 9.2 6.7
Cations 10 13.8 21.0 8.5 24.6 7.3
Radicals 14 11.9 14.1 16.8 9.5 7.8
Complexes 9 21.7 14.0 20.6 28.5 11.9
Divalent 11 14.6 19.0 12.1 12.2 5.2
Others 9 13.0 14.3 11.8 6.3 6.8
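The "All" row of Table 7 should equal the size-weighted mean of the training and test rows. The Python sketch below checks this for each method; because the tabulated MAEs are rounded to one decimal, agreement is tested only to that precision. The helper name `pooled_mae` is illustrative.

```python
N_TRAIN, N_TEST = 56, 88

# (training MAE, test MAE, reported overall MAE) in kcal/mol, copied from Table 7.
TABLE7 = {
    "PDDG": (10.3, 12.9, 11.9),
    "MNDO": (13.0, 18.2, 16.2),
    "AM1": (9.1, 13.3, 11.7),
    "PM3": (9.8, 11.9, 11.1),
    "MNDO/d": (7.4, 10.7, 9.4),
}


def pooled_mae(n_a, mae_a, n_b, mae_b):
    """MAE of the union of two disjoint sets from their sizes and MAEs."""
    return (n_a * mae_a + n_b * mae_b) / (n_a + n_b)


for method, (train, test, reported) in TABLE7.items():
    pooled = pooled_mae(N_TRAIN, train, N_TEST, test)
    print(f"{method}: pooled {pooled:.2f}, reported {reported}")
```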

Phosphorus Thermochemistry

Reliable thermochemical data for organophosphorus compounds have historically been scarce,27 which makes the development of semiempirical methods difficult. The present dataset consists of only 43 molecules, taken mostly from the paper reporting the parameterization of MNDO/d.5 The dataset is both too small and poorly representative of key functional groups containing phosphorus. Thus, the parameterizations should be considered tentative and used with caution. The summary of results in Table 8 should be viewed in this light. PDDG/PM3 gives a similar MAE for heats of formation as AM1 and shows a 17% smaller error than PM3. The best method for this dataset is MNDO/d, with an MAE of 12.7 kcal/mol. Since the dataset contains mostly small molecules, the advantage of d-orbitals apparently outweighs the homologation and branching problems of MNDO. The MAE for phosphorus halides increases for PDDG/PM3 when compared to PM3, as for sulfur and silicon halides. This trend suggests that the problem may lie with the halogen parameters: though they give significantly improved results for CHNOX molecules, where X is a halogen, they are not optimal for molecules with second-row atoms.

Table 8.

Mean Absolute Errors for Heats of Formation for Phosphorus Compounds (kcal/mol)

N PDDG MNDO AM1 PM3 MNDO/d
All 43 17.9 44.8 18.2 21.5 12.7
Training 21 11.3 47.7 14.5 18.8 9.0
Test 22 24.1 42.1 21.7 24.0 16.3
Low valence 1 9.0 24.8 36.7 27.6 7.9
Normal valence 25 11.0 20.9 11.6 16.3 9.4
Hypervalent 14 31.0 94.6 28.3 30.9 18.5
Halides 8 20.0 56.5 20.8 14.7 14.4
Radicals 3 16.8 18.4 19.3 19.1 15.8
Phosphines 4 10.9 14.9 8.8 7.0 4.8
Phosphates 3 34.6 80.3 6.3 34.4 15.3
Phosphites 4 9.4 54.7 15.6 42.5 9.2

For comparison with the new AM1* method, the results for the intersection of the AM1* dataset for phosphorus and the dataset used in this work were analyzed. For the 33 common molecules, the MAEs (kcal/mol) are: MNDO/d (8.6), B3LYP/6−311+G(2d,p)//B3LYP/6−31+G(d) (12.6), AM1 (13.1), PDDG/PM3 (13.8), PM3 (17.4), and AM1* (17.6). MNDO/d again gives the best result; PDDG/PM3, AM1, and the DFT method give similar MAEs near 13 kcal/mol; and AM1* and PM3 yield MAEs of 17−18 kcal/mol.

Another recent development is the optimization by Lopez and York of the AM1/d method to treat nucleophilic attack on biological phosphates.28 The method was optimized using a set of B3LYP reference values for phosphates, phosphoranes, and transition structures. Because of the different scope of the AM1/d and the PDDG/PM3 parameterizations and the lack of overlap between the data sets used, making a fair comparison is difficult; it is to be expected that the AM1/d method will perform significantly better for the types of systems for which it was parameterized, whereas PDDG/PM3 has a wider scope in terms of functional groups included in the parameterization.

Other Observables

The parameterization of the PDDG/PM3 method has focused on improving heats of formation and, therefore, enthalpies of reaction in general, while retaining accuracy comparable to prior NDDO-based methods for other observables such as geometries, ionization potentials, and dipole moments. Table 9 shows the MAEs for these properties, while the complete results are again reported in the Supplementary Material. The performance of the various methods is similar overall with no striking problems.

Table 9.

Mean Absolute Errors for Ionization Potentials, Dipole Moments, and Geometries.

Property Element N PDDG MNDO AM1 PM3
IP (eV) S 32 0.66 0.71 0.48 0.35
IP (eV) Si 15 1.08 0.86 0.63 1.21
IP (eV) P 13 0.42 1.27 0.66 0.61
Dipole (D) S 28 0.54 0.44 0.43 0.46
Dipole (D) Si 26 0.92 0.84 0.45 0.72
Dipole (D) P 12 0.77 0.83 0.79 0.58
Bond length (Å) S 78 0.06 0.13 0.07 0.08
Bond length (Å) Si 96 0.09 0.13 0.06 0.11
Bond length (Å) P 16 0.12 0.08 0.08 0.06
Angle (deg) S 54 5.50 9.46 6.74 7.42
Angle (deg) Si 33 3.25 2.47 2.39 2.75
Angle (deg) P 8 3.36 3.39 3.76 1.91

Summary

The parameterization of PDDG/PM3 for sulfur showed the benefits of a global optimization over simpler local optimizations and the value of having a large amount of experimental data available for training and testing. Through global optimization, a method was obtained that yields MAEs for heats of formation that are 40% smaller than those from the alternatives, including semiempirical methods that employ d-orbitals and DFT at the B3LYP/6−311+G(2d,p)//B3LYP/6−31+G(d) level. Extension to silicon and phosphorus was also carried out, though it was hampered by limited experimental data.

PDDG/PM3 can now be used to treat molecules with any combination of the elements C, H, N, O, F, Cl, Br, I, S, Si, and P. Table 10 summarizes the results for all molecules, ions, and complexes that have been treated so far. The PDDG/PM3 method is the most accurate semiempirical method available for calculating heats of formation, with an MAE of 6.5 kcal/mol for the 1480 molecules in our full dataset; PM3 is the next best with an MAE of 8.7 kcal/mol. Particularly striking improvements have been obtained for molecules containing C, H, N, O, S, and the halogens. Moreover, the PDDG/PM3 method can be easily implemented in existing software such as MOPAC 6.29

Table 10.

Mean Absolute Errors for Heats of Formation (kcal/mol) for All Molecules Studied.

Standard NDDO methods: MNDO, AM1, PM3, MNDO/d. PDDG methods: PDDG/MNDO, PDDG/PM3.
Elements N MNDO AM1 PM3 MNDO/d PDDG/MNDO PDDG/PM3
CHNO(a) 622 8.4 6.7 4.4 8.4 5.2 3.2
Halogens(b) 422 14.0 11.1 8.1 13.4 6.6 5.6
S 249 41.2 10.6 10.5 10.0 n/a 6.4
Si 144 16.2 11.7 11.1 9.4 n/a 11.9
P 43 44.8 18.2 21.5 12.7 n/a 17.9
All 1480 19.8 10.8 8.7 10.3 n/a 6.5
(a) Reference 17.
(b) Reference 18.

Supplementary Material

si20050211_120

Acknowledgment

Gratitude is expressed to the National Science Foundation (CHE-013996) and to the National Institutes of Health (GM032136) for support of this work.

Footnotes

Supporting Information Available:

An Excel table with the experimental and calculated heats of formation, ionization potentials, and dipole moments for the molecules in this study. This material is available free of charge via the Internet at http://pubs.acs.org.

References

1. Pople JA, Beveridge DL. Approximate Molecular Orbital Theory. McGraw-Hill; New York: 1970.
2. Dewar MJS, Thiel W. J. Am. Chem. Soc. 1977;99:4899–4917.
3. Dewar MJS, Zoebisch EG, Healy EF, Stewart JJP. J. Am. Chem. Soc. 1985;107:3902–3909.
4. Stewart JJP. J. Comput. Chem. 1989;10:221–264.
5. Thiel W, Voityuk AA. J. Phys. Chem. 1996;100:616–626.
6. Gogonea V, Suarez D, Van der Vaart A, Merz KM Jr. Curr. Opin. Struct. Biol. 2001;11:217–223. doi: 10.1016/s0959-440x(00)00193-7.
7. Acevedo O, Jorgensen WL. Org. Lett. 2004;6:2881–2884. doi: 10.1021/ol049121k.
8. Vayner G, Houk KN, Jorgensen WL, Brauman JI. J. Am. Chem. Soc. 2004;126:9054–9058. doi: 10.1021/ja049070m.
9. Guimarães CRW, Repasky MP, Chandrasekhar J, Tirado-Rives J, Jorgensen WL. J. Am. Chem. Soc. 2003;125:6892–6899. doi: 10.1021/ja021424r.
10. Guimarães CRW, Udier-Blagović M, Jorgensen WL. J. Am. Chem. Soc. 2005;127:3577–3588. doi: 10.1021/ja043905b.
11. Chandrasekhar J, Shariffskul S, Jorgensen WL. J. Phys. Chem. B. 2002;106:8078–8085.
12. Ruiz-Pernia JJ, Silla E, Tunon I, Marti S, Moliner V. J. Phys. Chem. B. 2004;108:8427–8433.
13. Ranaghan KE, Mulholland AJ. Chem. Comm. 2004:1238–1239. doi: 10.1039/b402388a.
14. Devi-Kesavan LS, Garcia-Viloca M, Gao J. Theor. Chem. Acc. 2003;109:133–139.
15. Devi-Kesavan LS, Gao J. J. Am. Chem. Soc. 2003;125:1532–1540. doi: 10.1021/ja026955u.
16. Kaminski GA, Jorgensen WL. J. Phys. Chem. B. 1998;102:1787–1796.
17. Repasky MP, Chandrasekhar J, Jorgensen WL. J. Comput. Chem. 2002;23:1601–1622. doi: 10.1002/jcc.10162.
18. Tubert-Brohman I, Guimarães CRW, Repasky MP, Jorgensen WL. J. Comput. Chem. 2004;25:138–150. doi: 10.1002/jcc.10356.
19. Stewart JJP. J. Comput. Aid. Mol. Des. 1990;4:1–105. doi: 10.1007/BF00128336.
20. Winget P, Clark T. J. Comput. Chem. 2004;25:725–733. doi: 10.1002/jcc.10398.
21. Wong MW, Wiberg KB. J. Am. Chem. Soc. 1992;114:7527–7535.
22. Gonzales JM, Cox RS III, Brown ST, Allen WD, Schaefer HF III. J. Phys. Chem. 2001;105:11327.
23. Rablen PR, Lockman JW, Jorgensen WL. J. Phys. Chem. A. 1998;102:3782–3797.
24. Gronert S, Lee JM. J. Org. Chem. 1995;60:6731–6736.
25. Solling TI, Radom L. Chem. Eur. J. 2001;7:1516–1524. doi: 10.1002/1521-3765(20010401)7:7<1516::aid-chem1516>3.0.co;2-q.
26. Winget P, Horn AHC, Selcuki C, Martin B, Clark T. J. Mol. Model. 2003;9:408–414. doi: 10.1007/s00894-003-0156-7.
27. Pilcher G. In: The Chemistry of Organophosphorus Compounds. Hartley FR, editor. Vol. 1. John Wiley & Sons; New York: 1990. pp. 127–136.
28. Lopez X, York DM. Theor. Chem. Acc. 2003;109:149–159.
29. Directions, code, and parameter files for implementation of the PDDG methods in MOPAC are freely available for download at http://www.jorgensenresearch.com. The PDDG methods are also available in the BOSS program; see http://www.cemcomco.com.
