ChemPhysChem. 2022 Mar 18;23(8):e202200061. doi: 10.1002/cphc.202200061

Uncertainty Quantification of Reactivity Scales**

Jonny Proppe 1,2 and Johannes Kircher 1
PMCID: PMC9314972  PMID: 35189024

Abstract

According to Mayr, polar organic synthesis can be rationalized by a simple empirical relationship linking bimolecular rate constants to as few as three reactivity parameters. Here, we propose an extension to Mayr's reactivity method that is rooted in uncertainty quantification and transforms the reactivity parameters into probability distributions. Through uncertainty propagation, these distributions can be transformed into uncertainty estimates for bimolecular rate constants. Chemists can exploit these virtual error bars to enhance synthesis planning and to decrease the ambiguity of conclusions drawn from experimental data. We demonstrate the above using the example of the reference data set released by Mayr and co‐workers [J. Am. Chem. Soc. 2001, 123, 9500; J. Am. Chem. Soc. 2012, 134, 13902]. As a by‐product of the new approach, we obtain revised reactivity parameters for 36 π‐nucleophiles and 32 benzhydrylium ions.

Keywords: carbocations, kinetics, nucleophilic addition, reactivity scales, uncertainty quantification


Reliable uncertainty! We propose an extension to Mayr's reactivity scale method that is rooted in uncertainty quantification. It yields reliable uncertainty estimates for bimolecular rate constants. Chemists can exploit these virtual error bars to enhance synthesis planning and to decrease the ambiguity of conclusions drawn from experimental data (image by Prof. Ricardo A. Mata).


1. Introduction

Polar organic reactions are ubiquitous in Nature and the chemical industry. Synthesis planning involving reactions of this kind relies on two fundamental questions (among others): whether nucleophilic attack takes place on a relevant time scale, and whether this time scale interferes with that of another reaction in which either the same nucleophile or the same electrophile participates. The answers to both questions revolve around the quantification of reaction rates – absolute ones in the former case, relative ones in the latter case. For instance, in iminium‐activated reactions, [1] it is important that the nucleophile is strong enough (in absolute terms) to attack the intermediate iminium ion but also weak enough (in relative terms) not to react with the precursor carbonyl compound.

Herbert Mayr and co‐workers provided unambiguous evidence that a simple empirical relationship, known as the Mayr–Patz equation (MPE), [2] addresses scenarios of this kind reliably, [3]

log k_exp ≈ log k_MPE = s_N (N + E)  (1)

We define log k ≡ log₁₀ k₂(20 °C) for the sake of brevity. Here, the decadic logarithm of the bimolecular rate constant measured at 20 °C (log k_exp) is approximated as the sum of two reactivity parameters (nucleophilicity N and electrophilicity E), multiplied by a nucleophile‐specific sensitivity factor (s_N). The MPE allows for semi‐quantitative predictions of bimolecular rate constants over a remarkable range of about −5 < log k < 8. The philicities (N and E) of the species involved in reactions obeying this relationship cover a range of 30–40 orders of magnitude, which can be considered a unique achievement given that the accuracy of k_MPE is within a factor of 10 to 100. On the basis of these results, Mayr formulated an uncertainty principle of organic reactivity: the accuracy of k_MPE and chemical diversity cannot be maximized at the same time. Even though higher accuracy can be reached by considering a narrower range of chemical species, the small errors in k_MPE appear impressive given the diversity of Mayr's reactivity database,[ 4 , 5 ] which currently comprises reactivity parameters for 1251 nucleophiles and 345 electrophiles.
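As a minimal illustration, Eq. (1) amounts to a one-line prediction in code; the parameter values below are hypothetical placeholders rather than entries from the reactivity database:

```python
def log_k_mpe(s_n: float, n: float, e: float) -> float:
    """Mayr-Patz equation (Eq. 1): log k = s_N * (N + E)."""
    return s_n * (n + e)

# Hypothetical parameters for an illustrative nucleophile/electrophile pair
# (not entries from Mayr's database):
log_k = log_k_mpe(s_n=0.9, n=5.0, e=-3.0)  # 0.9 * (5.0 - 3.0) = 1.8
```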

In this work, we introduce uncertainty quantification (UQ) into Mayr's reactivity approach. This combined approach, which we made openly available, [6] enables users to perform virtual measurements of log k, which are reported as expectation ± deviation – just like physical measurements. Usually, virtual measurement uncertainty (or prediction uncertainty) is significantly larger than physical measurement uncertainty, which can be attributed to a more comprehensive list of uncertainty components including parameter uncertainty, model discrepancy/inadequacy, and numerical noise.[ 7 , 8 , 9 ] A key feature of our UQ approach is the transformation of reactivity parameters into probability distributions, which can be transformed – via uncertainty propagation – into probability distributions of log k_MPE. We argue that quantitative knowledge of uncertainty in k_MPE enhances the already powerful reactivity approach by Mayr, for three reasons.

First, virtual measurements of log k (expectation ± deviation) represent testable statistical hypotheses. That is, one can quantify an x % confidence interval of log k_MPE and count how often log k_exp is located within that interval (ideally x % of the time). The Guide to the Expression of Uncertainty in Measurement [10] recommends expressing uncertainty as a 95 % confidence interval, a recommendation supported by the community.[ 11 , 12 , 13 ] Such reporting standards help to identify shortcomings, thereby increasing the overall reliability of Mayr's reactivity approach and guiding the search for new research directions (e. g., proposing measurements of yet unobserved reactions).
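A minimal sketch of such a consistency check, with invented numbers in place of actual reference data:

```python
import numpy as np

def coverage_95(log_k_exp, log_k_mpe, u):
    """Fraction of experimental values inside log_k_mpe ± 1.96*u,
    where u is the one-standard-deviation prediction uncertainty
    per reaction; ideally the fraction is close to 0.95."""
    log_k_exp, log_k_mpe, u = map(np.asarray, (log_k_exp, log_k_mpe, u))
    return float(np.mean(np.abs(log_k_exp - log_k_mpe) <= 1.96 * u))

# Synthetic example: three of four intervals cover the experimental value.
cov = coverage_95([2.0, 4.1, -1.0, 6.0], [2.1, 4.0, -1.1, 5.0], [0.2] * 4)
```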

Second, in synthesis planning, where subtle reactivity differences may matter, our UQ‐based approach can support the decision‐making process. The larger the overlap of two log k_MPE distributions corresponding to competing reactions, the less certainly one can discriminate between the two, which makes it more difficult to predict selectivity. The closer the overlap is to zero (one), the more (less) certain the prediction of the relative species flux through competing reaction channels becomes.
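This overlap argument can be made quantitative with the overlapping coefficient of two normal log k distributions; the sketch below integrates the pointwise minimum of the two densities numerically (means and deviations are invented for illustration):

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

def ovl(mu1, sig1, mu2, sig2, num=200001):
    """Overlapping coefficient of two normal log k distributions:
    1 -> rates indistinguishable, 0 -> reaction channels cleanly separated."""
    lo = min(mu1 - 8 * sig1, mu2 - 8 * sig2)
    hi = max(mu1 + 8 * sig1, mu2 + 8 * sig2)
    x = np.linspace(lo, hi, num)
    return float(trapezoid(np.minimum(norm.pdf(x, mu1, sig1),
                                      norm.pdf(x, mu2, sig2)), x))

# Two competing reactions with hypothetical log k_MPE distributions
# 3.0 ± 0.2 and 3.5 ± 0.2 (one standard deviation each):
overlap = ovl(3.0, 0.2, 3.5, 0.2)
```

For the hypothetical pair above, the overlap of about 0.21 would still leave noticeable ambiguity about which channel dominates.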

Third, since rate constant uncertainty can also be quantified for reactions that have yet to be observed, new opportunities arise for benchmarking computational chemistry methods.[ 14 , 15 ] Even if the experimental benchmark for a reaction of interest is not yet available – which, so far, severely constrained the domain of application for theoreticians – our UQ approach still enables benchmarking, but under uncertainty. This way, the diversity of benchmark sets can be increased remarkably, which we anticipate to accelerate method developments in theoretical and computational chemistry.

To explore the potential of UQ for chemical research, we build upon previous work by Proppe and colleagues,[ 15 , 16 ] addressing Mössbauer spectroscopy,[ 9 , 17 ] dispersion corrections to density functional theory,[ 18 , 19 ] reaction kinetics,[ 20 , 21 ] acid‐base equilibria, [22] and exchange spin coupling. [23] This foundation will support our endeavor to pave the way for a novel approach to determining reactivity parameters with steadily increasing accuracy. For demonstration purposes, we selected more than 200 reactions of the two reference data sets published by Mayr and co‐workers,[ 24 , 25 ] which cover a wide range of log k values (−3.6 to +8.0).

1.1. Optimization of Reactivity Parameters

We employed the following objective function for optimizing reactivity parameters,

Δ² = Σ_{r=1}^{R} w_r (δ_r log k)²  (2)

δ_r log k = log k_exp,r − log k_MPE,r  (3)

Here, δ_r log k and w_r are the residual and the weight of the rth reaction (R reactions in total), respectively. We employed the basin‐hopping algorithm by Wales and Doye [26] as implemented in SciPy 1.5.0 [27] for minimizing the objective function. It is a global optimization algorithm suited for multivariate non‐convex problems. We used the default settings of the basin‐hopping algorithm except for the number of iterations (niter), which we reduced from the default of 100 to 1, as preliminary tests suggested that a single iteration is sufficient to find optimal reactivity parameters (see Section S1 of the Supporting Information for more details). In the original optimization studies,[ 24 , 25 ] a special case of this objective function was employed, where all weights are uniformly distributed, i. e., w_r = w_r′ for all r and r′ ≠ r. This special case may lead to less‐than‐optimal results in view of the practitioner's main interest in the reactivity scale approach – the quantitative prediction of (absolute or relative) reaction rates. In 2001, Mayr and co‐workers wrote: [24] “Imagine the case that a reaction series, investigated for the elucidation of the reactivity parameters of a structurally unique reagent, matches Eq. 1 only moderately. One would then have to decide whether the benefit of obtaining the new reactivity parameter compensates for the deterioration of the quality of the overall correlation, which is associated with the incorporation of a poorly matching reaction series. An unambiguous decision would often be impossible!” To avoid ambiguity, we argue in favor of a procedure that includes all reaction data but weights them depending on their individual quality.
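The minimization of Eq. (2) can be sketched as follows with SciPy's basin-hopping optimizer; the toy reaction list and initial guess are invented, and the anchoring constraints of the actual workflow (fixed E and s_N) are omitted for brevity:

```python
import numpy as np
from scipy.optimize import basinhopping

# Toy reaction list: (nucleophile index, electrophile index, log k_exp).
# All values are synthetic placeholders.
reactions = [(0, 0, 2.0), (0, 1, 4.0), (1, 0, -1.0), (1, 1, 1.0)]
weights = np.full(len(reactions), 1.0 / len(reactions))  # uniform case of Eq. (2)

def objective(p):
    """Weighted sum of squared residuals (Eqs. 2 and 3) for two
    nucleophiles (parameters s_N, N) and two electrophiles (parameter E)."""
    s_n, n, e = p[0:2], p[2:4], p[4:6]
    delta = np.array([lk - s_n[i] * (n[i] + e[j]) for i, j, lk in reactions])
    return float(np.sum(weights * delta ** 2))

# niter=1, as in the study; the local minimizer does most of the work here.
result = basinhopping(objective, x0=np.ones(6), niter=1)
```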
This procedure, which we term discrepancy weighting, assigns an importance (a value between 0 and 1) to each species depending on how well its associated reaction series matches the experimental data. (Approaches similar to discrepancy weighting exist, such as the iteratively reweighted least squares method[ 9 , 13 ] or the worst offender algorithm. [28] ) These species‐specific weights are then combined to yield the reaction‐specific weights w_r of Eq. 2 (see Section 2.4). We point the reader to Appendix A for a full derivation of those weights. Eventually, they can be utilized to determine the uncertainty of reactivity parameters and, as a consequence, of log k_MPE on the basis of Bayesian bootstrapping [29] (see Appendix B). The full optimization workflow is summarized in Figure 1.

Figure 1.

Figure 1

Flowchart illustrating our approach to optimizing reactivity parameters. Version labels (in blue rhomboid boxes) represent a hierarchy of distinct parametrizations.

1.2. Quantification of Uncertainty in log k_MPE

We define the model error as the root‐mean‐square error of the residuals,

RMSE ≡ ϵ = sqrt( R⁻¹ Σ_{r=1}^{R} (δ_r log k)² ) = sqrt( μ² + σ² )  (4)

It is equivalent to the error measure defined by Mayr and co‐workers (see Footnote 58 in Ref. [24]). In the case of uniform weights, w_r = R⁻¹ for all r = 1, …, R, the squared model error, ϵ², equals Δ². The model error combines information on both the model bias (μ) and the model dispersion (σ). The model bias or mean error (ME) represents the centroid of the residuals and is an estimate of the overall systematic error in log k_MPE,

ME ≡ μ = R⁻¹ Σ_{r=1}^{R} δ_r log k  (5)

The model dispersion represents the scatter of the residuals and is reflected by the root‐mean‐square deviation,

RMSD ≡ σ = sqrt( R⁻¹ Σ_{r=1}^{R} (δ_r log k − μ)² )  (6)

Under the assumption of normally distributed residuals (see Section S3 and Figure S3 of the Supporting Information for validation results), the model dispersion represents the model's contribution to prediction uncertainty, i. e., the uncertainty in log k_MPE. The second contribution to prediction uncertainty is parameter uncertainty, which can be estimated from the ensemble of bootstrap samples generated in the course of our optimization workflow. Since each bootstrap sample (B in total) yields slightly different reactivity parameters, we obtain an empirical distribution for each parameter. Uncertainty propagation is straightforward: for a given reaction, each bootstrap sample yields a slightly different log k_MPE value, leading again to an empirical distribution. We define the parameter‐related uncertainty in log k_MPE of the rth reaction as

β_r = sqrt( B⁻¹ Σ_{b=1}^{B} [ log k_MPE,r^(b) − B⁻¹ Σ_{b′=1}^{B} log k_MPE,r^(b′) ]² )  (7)

Assuming normally distributed variables and independence of the two uncertainty contributions, [30] the prediction uncertainty (95 % confidence) corresponding to the rth reaction can be estimated as

U_95,r = 1.96 · U_r = 1.96 · sqrt( σ² + β_r² )  (8)
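Given the model dispersion σ and a bootstrap ensemble of predictions for one reaction, Eqs. (7) and (8) reduce to a few lines (all numbers below are illustrative):

```python
import numpy as np

def u95(sigma, log_k_boot):
    """95 % prediction uncertainty for one reaction (Eq. 8):
    combines the model dispersion sigma (Eq. 6) with the
    parameter-related spread beta_r over B bootstrap samples (Eq. 7)."""
    beta = float(np.std(log_k_boot))  # population standard deviation, as in Eq. (7)
    return 1.96 * np.sqrt(sigma ** 2 + beta ** 2)

# Hypothetical bootstrap ensemble of log k_MPE values for one reaction:
u = u95(sigma=0.13, log_k_boot=[2.31, 2.28, 2.35, 2.30, 2.33])
```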

1.3. Data Selection

All 304 reactions (in dichloromethane) of the 2001 and 2012 studies were considered.[ 24 , 25 ] This pool (Figures 2 and 3, Table 1) encompasses 33 benzhydrylium ions (electrophiles) and 45 π‐nucleophiles, and covers a wide range of log k_exp values (−3.6 to +9.2). The two anchor species are E15 (E = 0.00) and N7 (s_N = 1.00); [25] their parameters E and s_N, respectively, were kept fixed throughout. We excluded those reactions from the optimization procedure (30 in total) for which log k_exp > 8, as the MPE (Eq. 1) loses its validity in that regime (diffusion limit). We also excluded those reactions (47 in total) that were not measured according to the standard protocol: measurement at 20 °C plus a least‐squares fit of absorbance data to a single exponential. While we do not doubt the quality of these 47 data points, we still neglect them in this study as we attempt to remove potential sources of bias and to draw conclusions from our UQ analysis that are as unambiguous as possible.

Figure 2.

Figure 2

The 2001/12 reference set of electrophiles (benzhydrylium ions). Substituents Y and Z for all electrophiles collectively addressed as Ex (x = 5–11, 15–33) are specified in Table 1. The 2022 reference set comprises the same systems except for species E33.

Figure 3.

Figure 3

The 2001/12 reference set of nucleophiles (π ‐systems). The 2022 reference set comprises the same systems except for species N6, N19, N33, N36, N37, N39, N41, N44, and N45.

Table 1.

Specification of substituents for benzhydrylium ions of the reference set (Figure 2).

Species   Y                    Z
E5        4‐(N‐pyrrolidino)    Y
E6        4‐N(Me)2             Y
E7        4‐N(Me)(Ph)          Y
E8        4‐(N‐morpholino)     Y
E9        4‐N(Ph)2             Y
E10       4‐N(Me)(CH2CF3)      Y
E11       4‐N(Ph)(CH2CF3)      Y
E15       4‐MeO                Y
E16       4‐MeO                4‐PhO
E17       4‐MeO                4‐Me
E18       4‐MeO                H
E19       4‐PhO                H
E20       4‐Me                 Y
E21       4‐Me                 H
E22       4‐F                  Y
E23       4‐F                  H
E24       3‐F, 4‐Me            Y
E25       H                    Y
E26       4‐Cl                 Y
E27       3‐F                  H
E28       4‐(CF3)              H
E29       3,5‐F2               H
E30       3‐F                  Y
E31       3,5‐F2               3‐F
E32       4‐(CF3)              Y
E33       3,5‐F2               Y

An entry “Y” in the Z column indicates Z = Y (symmetric substitution).

Since we do not know the true values of the experimental rate constants, we rely on an overdetermined system (more equations than unknowns). Therefore, we introduced and applied the 2E3N rule: First, every non‐anchor electrophile (single free parameter, E) needs to participate in at least two observed reactions. Second, every non‐anchor nucleophile (two free parameters, s_N and N) needs to participate in at least three observed reactions. Third, the two anchor species (E15: no free parameters; N7: single free parameter, N) need to participate in at least one (E15) or two (N7) observed reactions. In addition to the 2E3N rule, we required a fully connected network of reactions such that one can traverse from any node (species) to every other node through the edges that represent experimental reaction data (cf. Figure S1). In the 2012 study, [25] the reactivity parameters of several known [24] non‐anchor species (N1–N3, E1–E13, E16–E20) were kept fixed. As the 2E3N rule does not apply to non‐anchor species with fixed parameters, all systematic errors they carry will propagate through the reaction network. Reliable UQ, however, requires the elimination of all recognizable sources of systematic error. [10] Therefore, we relaxed all fixed reactivity parameters of non‐anchor species, which increased the number of species violating the 2E3N rule.
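The connectivity requirement can be checked with a breadth-first search over the bipartite species network; the species names and reactions below are placeholders:

```python
from collections import deque

def fully_connected(reactions, nucleophiles, electrophiles):
    """Breadth-first search over the bipartite reaction network.
    Returns True if every species can be reached from every other one
    via edges that represent observed reactions."""
    adj = {("N", n): set() for n in nucleophiles}
    adj.update({("E", e): set() for e in electrophiles})
    for n, e in reactions:  # one edge per observed nucleophile-electrophile pair
        adj[("N", n)].add(("E", e))
        adj[("E", e)].add(("N", n))
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adj[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adj)
```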

Applying parameter relaxation and the exclusion criteria mentioned above (log k_exp > 8, non‐standard protocol, violation of the 2E3N rule, isolated subnetworks) left us with 212 valid reactions shared among 32 electrophiles and 36 nucleophiles (Figure S1). This set of reactions comprises 102 free reactivity parameters, which were optimized as per Figure 1 (Section 2.2). For 30 of the 212 valid reactions, we extracted detailed experimental data from the supplementary material of the 2001/12 studies to quantify measurement uncertainty (Section 2.3). For each reaction, there exists a series of observed rate constants, k_obs (ordinate), measured at different excess nucleophile concentrations, [Nu] (abscissa). The slope of a linear regression model, f_k([Nu]), represents the bimolecular rate constant k_2,

k_obs ≈ f_k([Nu]) = k_2 [Nu] + constant  (9)

Here, we applied Bayesian linear regression [31] as implemented in Scikit‐learn 0.23.1 [32] to obtain uncertainty estimates of the regression coefficients (uncertainty estimates of this kind can also be obtained through ordinary least‐squares regression [33] ). The uncertainty associated with the slope (k_2) represents the experimental standard deviation of the mean, [10] which is the accepted definition of measurement uncertainty. See Table S2 for experimentally derived values of k_2 and associated uncertainty estimates.
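This regression step can be sketched with synthetic data as follows; the true slope, background, and noise level are invented, and scikit-learn's `BayesianRidge` is used as a stand-in for the exact settings of the study:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic kinetics data for one reaction (Eq. 9); the true slope of
# 150 M^-1 s^-1, the background, and the noise level are all invented.
rng = np.random.default_rng(7)
conc = np.linspace(1e-4, 1e-3, 12)  # excess nucleophile concentrations [M]
k_obs = 150.0 * conc + 0.02 + rng.normal(0.0, 1e-3, conc.size)

model = BayesianRidge().fit(conc.reshape(-1, 1), k_obs)
k2 = float(model.coef_[0])                 # slope = bimolecular rate constant
u_k2 = float(np.sqrt(model.sigma_[0, 0]))  # posterior std of the slope
```

The posterior covariance `sigma_` directly provides the slope uncertainty that enters the measurement-uncertainty analysis.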

2. Results and Discussion

The structure of this section is reflected by the following roadmap:

1. Reproduction of the 2012 results. [25]

2. Application of the data selection criteria introduced in Section 1.3 and re‐optimization (uniform weighting), yielding a new set of reactivity parameters referred to as version 1.1.

3. Quantification and assessment of measurement uncertainty.

4. Re‐optimization of reactivity parameters (version 1.2) based on non‐uniform weights determined through discrepancy weighting.

5. Estimation of empirical parameter distributions via discrepancy‐weighted bootstrapping, yielding the newest set of reactivity parameters (version 2.0).

6. Quantification and assessment of prediction uncertainty in log k_MPE.

2.1. Reproduction of the 2012 Parametrization

To validate our optimization procedure, we attempted to reproduce the results of the 2012 parametrization study. [25] Both the model error ϵ (Eq. 4) and model dispersion σ (Eq. 6) equal 0.13 and are 0.8 % smaller than the model error determined in 2012 (see also Table S1 for summary statistics). We find that the absolute difference of 0.17 in the nucleophilicity parameter (N) for N5 constitutes, by far, the largest deviation. When excluding this nucleophile from the optimization procedure, we still obtain a model error/dispersion of 0.13, but a decreased model error difference of 0.3 % (with the 2022 error being smaller). The largest absolute difference in reactivity parameters that remains equals 0.02. This difference cannot be explained by the truncation of reactivity parameter values (after the second decimal) reported in the original article, [25] which we used for this reproduction test. It is possible that the removal of N5 causes this remaining difference since all reactivity parameters are coupled to each other through the objective function (Eq. 2).

The remaining deviation or a fraction thereof could possibly also be traced back to differences in the optimization algorithms. Mayr and co‐workers used proprietary software and, hence, no detailed algorithmic information on the nonlinear optimizer is available. We can, however, estimate the magnitude of numerical noise that emerges from the customized settings of the basin‐hopping optimizer. In Section S1 of the Supporting Information, we show results that support the hypothesis that numerical errors are not the origin of the remaining difference.

We conclude that we can approximately, but not exactly, reproduce the 2012 results; the residual discrepancy remains unresolved. In particular, the deviation caused by N5 requires further investigation. Currently, we have no explanation other than a technical problem with the optimizer employed in the 2012 study, or a typo either in the parameters reported in the 2012 paper or in the data used in the 2012 optimization procedure.

2.2. Revised Reactivity Parameters, Version 1.1: The Effect of Data Cleaning

We defined several data selection criteria (cf. Section 1.3), which led to a decrease of the number of reference electrophiles and reference nucleophiles for the sake of consistency. Furthermore, the previously fixed parameters of some non‐anchor species [25] were relaxed. These changes affect the reactivity parameters of the reference species, which play a crucial role as they constitute the basis of determining reactivity parameters for any non‐reference (i. e., new) species. Given that many publications refer to the original reference parameters of the combined 2001/12 study, changing them can be considered a critical issue. See Section S2 of the Supporting Information for a detailed analysis on how each criterion affects the optimization outcome.

In Table 2, we report the reactivity parameters of the 2022 reference set, where version 1.0 refers to the original parameters by Mayr and co‐workers, and version 1.1 refers to the parameters of case 4. The sensitivity parameter s_N generally decreases; it increases mainly for nucleophiles that already exhibited above‐average sensitivity values. This behavior is observed, e. g., for N14–N16, N20, and N21, the five least reactive nucleophiles of the reference set when sorting by s_N·N. The sign of all nucleophilicity parameters N is preserved, but their magnitudes increase significantly in almost all cases. This increase is compensated by the increase (decrease) in s_N for nucleophiles with positive (negative) values of N. The large change in the nucleophilicity parameter N of N5 (causing a change of more than one order of magnitude in k_2) appears coherent with the findings of the previous subsection. On average, the nucleophilicity parameter N changes by 0.72 units, mostly toward larger values, which is compensated by changes in the electrophilicity parameter E toward consistently smaller values (average change: 0.51 units). The model error with respect to the new 2022 reference set decreases by 19 % (from 0.11 to 0.09) when employing reactivity parameters of version 1.1 compared to version 1.0.

Table 2.

Updated reactivity parameters (2.0) for reference nucleophiles and reference electrophiles. Each value represents the first moment (mean) of the associated empirical parameter distribution obtained through discrepancy‐weighted bootstrapping. We also report the original values (1.0),[ 24 , 25 ] those obtained by relaxing all fixed parameters corresponding to non‐anchor species (1.1), and those obtained by relaxation plus discrepancy weighting (1.2). The sN value (all versions) of the anchor nucleophile N7 is printed in italics as it was kept fixed during optimization. The anchor electrophile E15 is not shown as its electrophilicity parameter E=0.00 was kept fixed during optimization. Nucleophiles and electrophiles that have been sorted out according to the criteria outlined in Section 1.3 (i.e., N6, N19, N33, N36, N37, N39, N41, N44, N45, E33) are also not shown. RMSE(1.0) and RMSE(2.0) refer to the root‐mean‐square error with respect to versions 1.0 and 2.0, respectively. See Table S1 for the corresponding model errors and related statistics.

Nucleophiles:

           s_N(1.0)  s_N(1.1)  s_N(1.2)  s_N(2.0)   N(1.0)   N(1.1)   N(1.2)   N(2.0)
N1         0.98      0.84      0.87      0.87       9.00     10.13    9.84     9.84
N2         0.93      0.84      0.86      0.86       6.57     7.33     7.13     7.12
N3         0.96      0.86      0.89      0.89       4.41     4.92     4.78     4.77
N4         0.91      0.86      0.87      0.87       3.76     4.16     4.05     4.05
N5         1.17      0.98      0.95      0.95       1.18     2.50     2.63     2.64
N7         1.00      1.00      1.00      1.00       1.68     1.78     1.70     1.70
N8         1.06      1.07      1.05      1.05       0.84     0.87     0.92     0.91
N9         1.04      1.02      1.01      1.01       1.16     1.40     1.39     1.38
N10        1.07      1.05      1.04      1.04       0.79     1.02     1.01     1.01
N11        1.00      0.91      0.98      0.98       0.65     1.47     0.90     0.93
N12        1.07      1.07      1.09      1.08       0.06     0.20     0.07     0.15
N13        1.09      1.09      1.10      1.10      −0.25    −0.10    −0.21    −0.21
N14        1.06      1.25      1.24      1.24      −0.57    −1.31    −1.16    −1.14
N15        1.97      2.13      2.10      2.08      −3.65    −3.72    −3.72    −3.73
N16        1.41      1.54      1.51      1.52      −2.77    −2.87    −2.76    −2.77
N17        1.29      1.15      1.18      1.18       1.33     1.48     1.43     1.43
N18        0.99      0.88      0.90      0.92       1.35     1.50     1.47     1.43
N20        2.08      2.28      2.04      2.04      −3.57    −3.66    −3.54    −3.54
N21        1.77      1.88      1.52      1.57      −4.36    −4.36    −4.24    −4.24
N22        0.81      0.72      0.75      0.75       8.23     9.21     8.93     8.92
N23        0.84      0.75      0.78      0.78       8.57     9.58     9.30     9.29
N24        0.86      0.77      0.80      0.80      10.61    11.87    11.47    11.47
N25        0.83      0.74      0.77      0.77      11.40    12.76    12.37    12.35
N26        0.70      0.63      0.65      0.65      12.56    14.07    13.59    13.62
N27        0.81      0.72      0.76      0.76      13.36    14.98    14.42    14.42
N28        0.89      0.79      0.82      0.82       7.48     8.36     8.14     8.14
N29        1.00      0.89      0.91      0.91       5.21     5.82     5.66     5.65
N30        0.90      0.82      0.85      0.85       3.09     3.47     3.40     3.39
N31        0.91      0.82      0.86      0.86       5.41     6.04     5.88     5.87
N32        0.89      0.82      0.85      0.84       5.46     6.07     5.91     5.94
N34        0.96      0.85      0.89      0.89       6.22     6.93     6.73     6.73
N35        1.11      0.99      0.99      1.00       3.61     4.05     3.95     3.92
N38        1.17      1.11      1.18      1.18       0.65     0.81     0.69     0.69
N40        1.17      1.38      1.45      1.46       0.90     0.66     0.49     0.49
N42        0.98      1.09      1.06      1.07       1.11     0.98     1.00     0.98
N43        1.06      1.11      1.09      1.08       1.70     1.63     1.60     1.63
RMSE(1.0)    –       0.11      0.10      0.10        –       0.72     0.54     0.54
RMSE(2.0)  0.10      0.07      0.01       –        0.54     0.22     0.02      –

Electrophiles:

           E(1.0)   E(1.1)   E(1.2)   E(2.0)
E1        −10.04   −11.23   −10.87   −10.87
E2         −9.45   −10.59   −10.26   −10.25
E3         −8.76    −9.78    −9.51    −9.50
E4         −8.22    −9.18    −8.92    −8.92
E5         −7.69    −8.60    −8.39    −8.38
E6         −7.02    −7.82    −7.60    −7.60
E7         −5.89    −6.58    −6.38    −6.38
E8         −5.53    −6.17    −6.00    −5.99
E9         −4.72    −5.26    −5.14    −5.13
E10        −3.85    −4.30    −4.19    −4.18
E11        −3.14    −3.49    −3.42    −3.41
E12        −2.64    −2.97    −2.91    −2.91
E13        −1.36    −1.50    −1.37    −1.37
E14        −0.81    −0.87    −0.87    −0.87
E16         0.61     0.55     0.67     0.68
E17         1.48     1.41     1.45     1.45
E18         2.11     1.93     1.98     1.98
E19         2.90     2.81     2.81     2.80
E20         3.63     3.73     3.59     3.59
E21         4.43     4.42     4.49     4.50
E22         5.01     4.96     4.95     4.95
E23         5.20     5.17     5.28     5.29
E24         5.24     5.18     5.26     5.25
E25         5.47     5.40     5.52     5.52
E26         5.48     5.41     5.47     5.47
E27         6.23     6.13     6.19     6.19
E28         6.70     6.61     6.65     6.64
E29         6.74     6.64     6.69     6.68
E30         6.87     6.75     6.79     6.78
E31         7.52     7.31     7.23     7.23
E32         7.96     7.60     7.52     7.52
RMSE(1.0)    –       0.51     0.38     0.38
RMSE(2.0)  0.38      0.15     0.01      –

2.3. Quantification and Assessment of Measurement Uncertainty

Explicit consideration of (physical) measurement uncertainty in optimization procedures is often neglected in scientific studies. However, if its values are widely distributed and its magnitude becomes a dominant contribution to the model dispersion σ (Eq. 6), it can significantly alter the optimal values of the parameters under consideration. (Model dispersion and model error are interchangeable terms in this case as the model bias equals zero.) To estimate the importance of explicitly considering measurement uncertainty, we selected 30 of the 212 valid reactions (cf. Figure S1 and Table S2) that represent a diverse set of species and cover a wide range of log k_exp values (−2.5 to +7.8). We find a positive dependence of the measurement uncertainty, u, on the value of log k_exp (Figure 4). Laser flash photolysis experiments, [25] which were carried out to determine k_2 of faster reactions (log k_exp > ca. 6), appear to introduce larger measurement uncertainty than conventional and stopped‐flow UV/Vis spectrophotometry.[ 24 , 25 ] For the residuals, however, we find no such trend, indicating homogeneous quality of log k_exp over the full relevance domain (see also Figure S2).

Figure 4.

Figure 4

Absolute values of residuals (black dots), |δ log k|, versus log k_exp are shown for 30 selected reactions of the 2022 reference set. Version 1.1 parameters were used to calculate log k_MPE. Red error bars represent measurement uncertainty (95 % confidence), which shows a positive trend with respect to log k_exp (see also Figure S2 for a scatter plot of the measurement uncertainty versus log k_exp). The residuals are consistently larger than their associated 95 % confidence intervals (no error bar crosses the abscissa), which indicates that measurement uncertainty contributes negligibly to the model error.

We define the average measurement uncertainty (95 % confidence) as

u_95 = 1.96 · u = 1.96 · sqrt( ⟨u²⟩ )  (10)

⟨u²⟩ = 30⁻¹ Σ_{r=1}^{30} u_r²  (11)

We obtain u_95 = 2.78 × 10⁻³, which explains only 1.6 % of the model dispersion, σ_95 = 1.96 · σ = 1.70 × 10⁻¹. A direct comparison of the model residuals (Eq. 3) with individual measurement uncertainties (95 % confidence) shows that the former are consistently larger than the latter, by a factor of 2.78 up to several thousand (Figure 4). Assuming a factor of 2.78 for all residuals relative to the associated measurement uncertainties, one would obtain u_95/σ_95 = (1.96 · 2.78)⁻¹ = 18 %. This hypothetically high percentage would correspond to 1.96 · 2.78 = 5.45 standard deviations of the measurement uncertainty. Given that 1.96 standard deviations already correspond to 95 % of the area under a normal distribution, a factor of 5.45 effectively corresponds to 100 % of that area. We conclude that we can safely neglect measurement uncertainty in the context of reactivity scales.

2.4. Revised Reactivity Parameters 1.2: The Effect of Discrepancy Weighting

An insignificant contribution of measurement uncertainty to the model error is not a sufficient condition to neglect non‐uniform weights (cf. Eq. 2) in the optimization procedure. Consider the case where the species‐specific model dispersion (Eq. 19, see Appendix A) of species S is significantly larger than that of the other species. In such a scenario, species S may deteriorate the quality of the overall optimization outcome. One can resolve this situation and process data of potentially heterogeneous quality by applying discrepancy weighting. The discrepancy of a model is a measure of its inability to reproduce the reference data within their uncertainty range (here, originating from physical measurements and data post‐processing).[ 9 , 13 ] The quantification of model discrepancy is an iterative procedure because the weights of the objective function and the species‐specific model dispersions are functions of each other. Consequently, the former need to be refined until self‐consistency is reached, i. e., until weights and dispersions no longer change. Note that the weights in Eq. 2 refer to reactions and not to species. Hence, for a given reaction, the species‐specific model dispersions of the participating nucleophile and electrophile need to be combined to yield a reaction‐specific weight. The full procedure is outlined in Appendix A. See Table S3 for species‐specific weights. Reaction‐specific weights can be accessed through the project‐related GitLab repository. [6]
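The fixed-point idea behind this iteration can be sketched generically; the inverse-squared-residual weight update below is a simplified placeholder for the exact formula derived in Appendix A:

```python
import numpy as np

def self_consistent_weights(residuals_given_weights, n, tol=1e-10, max_iter=50):
    """Fixed-point iteration behind discrepancy weighting (generic sketch;
    the exact weight formula is derived in Appendix A of the paper).
    residuals_given_weights(w) must re-optimize the reactivity parameters
    under weights w and return the resulting residual vector."""
    w = np.full(n, 1.0 / n)                        # start from uniform weights
    for _ in range(max_iter):
        delta = residuals_given_weights(w)
        w_new = 1.0 / np.maximum(delta ** 2, 1e-12)  # down-weight poorly matching data
        w_new = w_new / w_new.sum()
        if np.max(np.abs(w_new - w)) < tol:        # self-consistency reached
            break
        w = w_new
    return w
```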

For the weighting procedure to be sound from a statistical perspective, it is important that, after reaching self‐consistency, the residuals of species S, {δ_r log k}_S, are zero‐centered (μ_S ≈ 0, cf. Eq. 19) and randomly distributed, i. e., they show no trend with respect to the absolute value of log k. We find that the latter condition is well met, as evidenced by the close‐to‐one correlation coefficient of log k_exp versus log k_MPE for each species of the reference set. The former condition is also fulfilled, as confirmed by σ_S²/ϵ_S² ≈ 1 (cf. Eq. 19) in most cases; for a small number of cases, however, the contribution of σ_S² to the overall species error ϵ_S² can be as small as 75 %. See Table S3 for details. We conclude that discrepancy weighting can be reliably applied in the optimization of reactivity parameters.

Figure 5 shows the weights of all 212 valid reactions as a function of log k_exp. They are homogeneously distributed around the red baseline, which represents uniform weights, and show no trend with respect to log k_exp. In the previous subsection, we found that measurement uncertainty is overall negligible but increases with log k_exp. If measurement uncertainty did, however, contribute significantly to the model error, we would expect a negative trend of the weights with respect to the value of log k. The absence of such a trend further supports our conclusion that measurement uncertainty contributes negligibly to the model error.

Figure 5.

Figure 5

Reaction‐specific and non‐uniform weights (black dots), {w_r}_{r=1}^{R}, obtained from discrepancy weighting versus log k_exp are shown for all 212 valid reactions of the 2022 reference set. The red baseline represents the case of uniform weighting, i. e., w_r = R⁻¹ for all r = 1, …, R. The non‐uniform weights show no trend with respect to log k_exp.

The revised version 1.2 of the reactivity parameters (Table 2) mitigates the upward shifts of N and the downward shifts of E to some degree, but it clearly resembles version 1.1 more than version 1.0. Consequently, the data selection criteria applied in this study affect the reactivity parameters of the reference set significantly more than discrepancy weighting does.

2.5. Revised Reactivity Parameters 2.0: The Effect of Bootstrapping

Due to the finite size of the reference set, which additionally covers only a fraction of the reaction matrix it spans (cf. Figure S1), the optimal values of the reactivity parameters can be expected to carry uncertainty. In order to estimate parameter uncertainty, we applied Bayesian bootstrapping (cf. Appendix B). With this technique, we generated 10,000 synthetic reference sets referred to as bootstrap samples. For each sample, we carried out an individual optimization (using the self‐consistent weights of parametrization 1.2), leading to a unique set of optimal reactivity parameters. The set of 10,000 values per reactivity parameter is referred to as empirical distribution.

For the first time, we can report reactivity parameters that are equipped with quantitative uncertainty measures. We define the first moment (mean) of the empirical parameter distributions as version 2.0 reactivity parameters (Table 2). This most recent parametrization is almost identical to version 1.2, which is indicative of a well‐balanced, representative set of reaction data. Uncertainty in s_N, N, and E (95 % confidence) is located in ranges of 0.02–0.55 (root‐mean‐square value, RMSV = 0.15), 0.04–1.10 (RMSV = 0.44), and 0.04–0.55 (RMSV = 0.26), respectively. The large uncertainty of 1.10 in the nucleophilicity parameter of N5 is another indication of bias, coherent with the above‐mentioned findings. Most of the empirical parameter distributions are symmetric and can be well approximated by a normal distribution, with a tendency to be slightly leptokurtic, i. e., more sharply peaked than the corresponding normal distribution. Parameter uncertainty estimates and histograms of empirical distributions can be accessed through the project‐related GitLab repository. [6] A representative example is provided in Figure 6.
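As an illustration, the mean and the symmetric 95 % confidence half‐width of an empirical parameter distribution can be computed as follows (a minimal Python sketch; the mock ensemble and its numerical parameters are assumptions for demonstration only):

```python
import numpy as np

def summarize(samples, level=0.95):
    """Mean and symmetric confidence half-width of an empirical
    (bootstrap) parameter distribution."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = np.quantile(samples, [(1 - level) / 2, (1 + level) / 2])
    return samples.mean(), (hi - lo) / 2

# Mock ensemble standing in for 10,000 bootstrap values of one parameter.
rng = np.random.default_rng(1)
ensemble = rng.normal(loc=12.56, scale=0.35, size=10_000)
mu, alpha95 = summarize(ensemble)
```

For a near‐normal ensemble, the quantile‐based half‐width approaches 1.96 times the standard deviation; for skewed ensembles, the quantile route remains valid while the normal approximation does not.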

Figure 6. Empirical distributions for nucleophile N28 (s_N and N), electrophile E11 (E), and the reaction N28+E11 (log k_MPE) obtained from Bayesian bootstrapping. Mean values (green solid line) and symmetric 95 % confidence intervals (green dashed lines) of the distributions are reported. Corresponding normal distributions are shown as black dashed curves and serve as reference frames. The mean values of the s_N, N, and E distributions (μ_sN, μ_N, and μ_E) correspond to the version 2.0 reactivity parameters reported in Table 2. The blue dashed line in the bottom‐right plot represents the value of log k_MPE obtained via μ_sN·(μ_N + μ_E). It is identical, within two decimals, to the mean value of the asymmetric log k_MPE distribution.

Empirical parameter distributions can be exploited in several ways to underpin, improve, and identify limitations of Mayr's reactivity approach. First, the uncertainty in reactivity parameters of non‐reference species can be estimated in analogy to Mayr's approach. For a non‐reference nucleophile/electrophile, measurements are performed on a series of reactions including reference electrophiles/nucleophiles. Least‐squares optimization in accord with Eq. 2 yields the reactivity parameter(s) of the non‐reference species. Since 10,000 values are available for s_N, N, and E of the reference species, we can repeat the optimization procedure 10,000 times (which is computationally efficient), resulting in empirical distributions of reactivity parameters also for non‐reference species. Second, combining empirical distributions of s_N, N, and E yields an empirical distribution of log k_MPE from which its uncertainty can be derived (see Section 2.6 and Figure 6).
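The second route can be sketched in a few lines: paired bootstrap ensembles of s_N, N, and E are propagated jointly through log k = s_N(N + E), yielding one log k value per sample. In the sketch below, the mock ensembles and their moments are our illustrative assumptions (real ensembles come from the repeated optimizations described above):

```python
import numpy as np

rng = np.random.default_rng(7)
B = 10_000

# Mock bootstrap ensembles for one nucleophile (s_N, N) and one
# electrophile (E); locations and spreads are illustrative only.
sN = rng.normal(0.85, 0.04, B)
N = rng.normal(13.0, 0.20, B)
E = rng.normal(-3.0, 0.12, B)

# Propagate jointly: one log k value per bootstrap sample.
logk = sN * (N + E)

mean_logk = logk.mean()
lo, hi = np.quantile(logk, [0.025, 0.975])
# Point estimate from the mean parameters; for skewed ensembles it need
# not coincide exactly with the ensemble mean.
point = sN.mean() * (N.mean() + E.mean())
```

Because each log k value is computed from one complete parameter set, correlations between the parameters within a bootstrap sample are automatically respected.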

We propose a third way of exploiting empirical parameter distributions. A series of theoretical models predicting Mayr‐type reactivity parameters has been proposed in the past.[ 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 ] The predictive power of these models was assessed with respect to some summary statistic (e. g., mean absolute error or root‐mean‐square error). However, to put the resulting statistics into context, it is necessary to know the uncertainty in the underlying reference values. Ours is the first study providing such uncertainty estimates on a rigorous basis, which allows for an assessment of previous theoretical work. For instance, regression models were previously employed to predict nucleophilicity N (Orlandi et al., [42] Table 3 of this work) and electrophilicity E (Hoffmann et al., [40] Table 4 of this work) on the basis of quantum‐mechanical and empirical descriptors. Version 1.0 reactivity parameters (reference species) and those derived therefrom (non‐reference species) served as reference values in both studies. Regarding the reference species, we find that only 21–45 % of the predicted reactivity parameters (both N and E) are located inside their 95 % confidence intervals, indicating that the theoretical models cannot reproduce the reference values within their uncertainty ranges.

Table 3.

Predictions of nucleophilicity, N(O21), by Orlandi et al. [42] for nucleophiles of the reference set. Differences with respect to parametrizations 1.0 (Mayr and co‐workers[ 24 , 25 ]) and 2.0 (this work) are provided, ΔN(1.0/2.0) = N(1.0/2.0) − N(O21). Parameter uncertainty (95 % confidence, assuming normally distributed variables) estimated by us, α_.95(N(2.0)), is reported. The root‐mean‐square value (RMSV) as well as the percentage of differences ΔN(1.0/2.0) located inside the 95 % confidence interval (PDCI95) are provided as summary statistics.

Species | N(O21) | ΔN(1.0) | ΔN(2.0) | α_.95(N(2.0))
------- | ------ | ------- | ------- | -------------
N1 | 10.62 | −1.62 | −0.78 | 0.51
N2 | 7.93 | −1.36 | −0.81 | 0.37
N5 | −0.21 | 1.39 | 2.85 | 1.08
N10 | 0.77 | 0.02 | 0.24 | 0.09
N12 | 0.02 | 0.04 | 0.13 | 0.52
N13 | 2.58 | −2.83 | −2.79 | 0.07
N15 | −2.09 | −1.56 | −1.64 | 0.37
N17 | 2.18 | −0.85 | −0.75 | 0.13
N18 | 1.66 | −0.31 | −0.23 | 0.32
N20 | −3.34 | −0.23 | −0.20 | 0.20
N21 | −4.07 | −0.29 | −0.16 | 0.34
N22 | 8.91 | −0.68 | 0.01 | 0.45
N25 | 11.02 | 0.38 | 1.33 | 0.61
N27 | 12.56 | 0.80 | 1.86 | 0.70
N34 | 5.29 | 0.93 | 1.44 | 0.35
N35 | 4.37 | −0.76 | −0.45 | 0.48
N38 | 2.76 | −2.11 | −2.07 | 0.08
N42 | 1.38 | −0.27 | −0.40 | 0.12
N43 | 1.60 | 0.10 | 0.03 | 0.52
RMSV | | 1.15 | 1.31 | 0.46
PDCI95(1.0) | | | | 32 %
PDCI95(2.0) | | | | 32 %

Table 4.

Predictions of electrophilicity, E(H20), by Hoffmann et al. [40] for electrophiles of the reference set. Differences with respect to parametrizations 1.0 (Mayr and co‐workers[ 24 , 25 ]) and 2.0 (this work) are provided, ΔE(1.0/2.0) = E(1.0/2.0) − E(H20). Parameter uncertainties (95 % confidence, assuming normally distributed variables) estimated by Hoffmann et al., α_.95(E(H20)), and by us, α_.95(E(2.0)), are reported. The root‐mean‐square value (RMSV) as well as the percentage of differences ΔE(1.0/2.0) located inside the 95 % confidence interval (PDCI95) are provided as summary statistics.

Species | E(H20) | ΔE(1.0) | ΔE(2.0) | α_.95(E(H20)) | α_.95(E(2.0))
------- | ------ | ------- | ------- | ------------- | -------------
E1 | −9.71 | −0.33 | −1.16 | 0.24 | 0.53
E2 | −9.78 | 0.33 | −0.47 | 0.25 | 0.51
E3 | −8.57 | −0.19 | −0.93 | 0.35 | 0.47
E4 | −8.44 | 0.22 | −0.48 | 0.24 | 0.44
E5 | −8.04 | 0.35 | −0.34 | 0.31 | 0.42
E6 | −7.02 | 0.00 | −0.58 | 0.39 | 0.39
E7 | −6.16 | 0.27 | −0.22 | 0.33 | 0.34
E8 | −6.55 | 1.02 | 0.56 | 0.43 | 0.32
E9 | −3.80 | −0.92 | −1.33 | 0.55 | 0.29
E10 | −3.62 | −0.23 | −0.56 | 0.43 | 0.25
E11 | −3.06 | −0.08 | −0.35 | 0.33 | 0.23
E13 | −1.17 | −0.19 | −0.20 | 0.47 | 0.12
E14 | −0.70 | −0.11 | −0.17 | 0.31 | 0.16
E16 | 0.16 | 0.45 | 0.52 | 0.33 | 0.07
E17 | 1.25 | 0.23 | 0.20 | 0.35 | 0.04
E18 | 2.30 | −0.19 | −0.32 | 0.41 | 0.05
E19 | 3.00 | −0.10 | −0.20 | 0.33 | 0.09
E20 | 3.22 | 0.41 | 0.37 | 0.24 | 0.14
E21 | 4.34 | 0.09 | 0.16 | 0.27 | 0.14
E23 | 6.07 | −0.87 | −0.78 | 0.20 | 0.11
E24 | 5.44 | −0.20 | −0.19 | 0.33 | 0.07
E25 | 5.15 | 0.32 | 0.37 | 0.20 | 0.09
E26 | 5.15 | 0.33 | 0.32 | 0.41 | 0.08
E27 | 6.10 | 0.13 | 0.09 | 0.24 | 0.11
E28 | 6.70 | 0.00 | −0.06 | 0.49 | 0.12
E29 | 6.82 | −0.08 | −0.14 | 0.35 | 0.12
E30 | 6.69 | 0.18 | 0.09 | 0.25 | 0.15
E31 | 6.73 | 0.79 | 0.50 | 0.29 | 0.15
E32 | 6.53 | 1.43 | 0.99 | 0.35 | 0.22
RMSV | | 0.48 | 0.54 | 0.35 | 0.26
PDCI95(1.0) | | | | 62 % | 45 %
PDCI95(2.0) | | | | 45 % | 21 %

It should be noted that Tables 3 and 4 paint an overly pessimistic picture. On the one hand, both studies included a much larger pool of species than reported here, comprising several non‐reference species. It is well known that the accuracy of reactivity parameters corresponding to non‐reference species is significantly lower than that observed for reference species. [3] This heterogeneity in accuracy obviously has an effect on theoretical predictions, which we did not take into account in our analysis due to the lack of empirical parameter distributions for non‐reference species. On the other hand, our comparison is based on uncertainties corresponding to version 2.0 reactivity parameters, even though the regression models by Orlandi et al. and Hoffmann et al. were trained with respect to the currently accepted set of reactivity parameters (version 1.0).[ 4 , 5 ] We would like to raise one issue, though. Hoffmann et al. [40] provided uncertainty estimates for reactivity parameters as a by‐product of their regression framework (Gaussian processes [44] ). Only 45–62 % of their predictions (with respect to reference species only) fall within their 95 % confidence intervals, indicating that their model underestimates parameter uncertainty. We observed this behavior of Gaussian processes in another context [21] and concluded that kernel functions should be selected not only with respect to predictive power; they should also yield statistically reliable results, i. e., about 95 % of the predictions should be located inside their 95 % confidence intervals.

2.6. Quantification and Assessment of Uncertainty in log k MPE

Due to the empirical nature of the reactivity parameter distributions, we can propagate uncertainty without assuming some parametrized distribution (e. g., a normal distribution parametrized by mean and variance). That is, for each set of reactivity parameters, we obtain one set of log k_MPE values. Histograms and statistics of the corresponding empirical distributions of log k_MPE can be accessed through the project‐related GitLab repository. All distributions are unimodal, and many of them are clearly asymmetric; see Figure 6 for a representative example. A consequence of this skewness is that log k_MPE calculated from version 2.0 reactivity parameters may not represent the mean of its empirical distribution well, even though we observe such behavior only for a handful of cases.

From the ensemble of log k_MPE values for a given reaction, we can estimate the contribution of parameter uncertainty to the overall prediction uncertainty (Eq. 8). A heat map comprising uncertainty estimates for log k_MPE (95 % confidence) of the full reaction matrix is shown in Figure 7. For many of the observed reactions (represented by crosses), the contribution of parameter uncertainty to the overall prediction uncertainty is effectively zero, and model dispersion remains the sole contribution, i. e., U_.95 ≈ σ_.95 = 0.21. For the set of observed reactions, we find prediction uncertainties of 0.21–0.92 (RMSV = 0.25). Taking all combinations of reference nucleophiles and reference electrophiles into account that lie within a range of −5 < log k_MPE < 8, we find a maximum prediction uncertainty of 2.14 (RMSV = 0.50). Consequently, the average accuracy of k_MPE that we can expect for any valid combination of reference nucleophile and reference electrophile is within a factor of 10. In most cases, a simple uncertainty pattern can be observed: the larger the distance to an observed reaction in terms of electrophilicity E, the larger the prediction uncertainty (no such trend can be observed with respect to nucleophilicity N or sensitivity‐weighted nucleophilicity s_N·N). This gradual change in uncertainty indicates that information is propagated from observed reactions to similar yet unobserved reactions. We can derive a simple rule for experimental design from this finding:

Figure 7. Uncertainty (95 % confidence) in log k_MPE, s_N·N, N, and E. Crosses represent observed reactions. Colored fields represent reactions within the range −5 < log k_MPE < 8. White fields in the main matrix indicate reactions outside that range. White fields outside the main matrix represent anchor species whose reactivity parameters (either s_N or E) are fixed.

For a given nucleophile, measure log k_exp for a series of electrophiles that are as equidistant as possible with respect to electrophilicity E.

To assess the quality of our uncertainty estimates, we counted how often the residual of a reaction is located within its 95 % confidence interval (hypothesis testing). The result is visualized in Figure 8A. Only a single residual (less than 1 % of all 212 residuals) is located outside its 95 % confidence interval (ideal value: 5 %). Hence, our UQ model is rather conservative, as it tends to overestimate prediction uncertainty. Overestimation is particularly strong when the contribution of parameter uncertainty to the overall prediction uncertainty tends toward zero. It appears that the model dispersion – a global/constant contribution to the overall prediction uncertainty – is too rough an approximation of the local/reaction‐specific model dispersion. Notably, we found a trend between the squared residual, [δ(log k)]², and the squared parameter‐related uncertainty in log k_MPE, β² (cf. Eq. 7). This trend is not linear but describes a monotonically increasing function. We found that a quadratic ordinary least‐squares regression model, g(β²), appropriately quantifies this trend,

[δ(log k)]² ≈ g(β²) = a + b_1·β² + b_2·β⁴ (12)

Figure 8. Assessment of uncertainty estimates (95 % confidence) for log k_MPE. (A) Prediction uncertainty is estimated according to Eq. 8. A single residual (<1 % of all residuals) is located outside its 95 % confidence interval. (B) Prediction uncertainty is estimated according to Eq. 13. Again, a single residual is located outside its 95 % confidence interval. (C) Equivalent to (A), but 46 of the 212 reference reactions were excluded from the optimization workflow and subsequent uncertainty quantification. Four (9 %) of the 46 validation residuals are located outside their 95 % confidence intervals. (D) Equivalent to (B), but the same procedure as outlined in (C) was applied. Again, four of the 46 validation residuals are located outside their 95 % confidence intervals.

Here, a, b_1, and b_2 are the coefficients of the model. We can generalize Eq. 8 to resolve local model dispersion (LMD),

U_.95,r^LMD = 1.96 · sqrt[(c_r·σ)² + β_r²] (13)

Eqs. 8 and 13 are identical if c_r = 1. The weight c_r is not to be confused with the weight w_r of the objective function defined in Eq. 2. The quadratic regression model offers a way to re‐define the weights of Eq. 13 such that Σ_{r=1}^R (c_r·σ)² = σ² · Σ_{r=1}^R c_r² = σ² · R is a conservation law,

c_r = sqrt[ R · g(β_r²) / Σ_{s=1}^R g(β_s²) ] (14)

Replacing the uniform weights with those obtained according to Eq. 14, we obtain an expression of the prediction uncertainty with an effectively local model dispersion. The resulting hypothesis test is visualized in Figure 8B. The shape of the updated prediction uncertainty band better reflects the increasing scatter of the residuals (from left to right). Again, a single residual is located outside its 95 % confidence interval after the update. The UQ model remains conservative, but overall is a much better fit to the actual distribution of residuals.
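The three steps (Eqs. 12–14) can be sketched as follows; the synthetic residual trend and all numerical values are assumptions for demonstration, not fitted reference data:

```python
import numpy as np

rng = np.random.default_rng(0)
R = 212
sigma = 0.21 / 1.96                 # global model dispersion (sigma_.95 = 0.21)

# Synthetic inputs: squared parameter uncertainties beta^2 and squared
# residuals following a monotonic quadratic trend (illustrative values).
beta2 = rng.uniform(0.0, 0.25, R)
delta2 = 0.01 + 0.5 * beta2 + 1.2 * beta2**2 + rng.normal(0.0, 0.005, R)

# Eq. 12: ordinary least squares in the monomials 1, beta^2, beta^4.
X = np.column_stack([np.ones(R), beta2, beta2**2])
a, b1, b2 = np.linalg.lstsq(X, delta2, rcond=None)[0]
g = a + b1 * beta2 + b2 * beta2**2

# Eq. 14: rescale so that sum_r (c_r * sigma)^2 = R * sigma^2 is conserved.
c = np.sqrt(R * g / g.sum())

# Eq. 13: prediction uncertainty with a locally resolved model dispersion.
U95_local = 1.96 * np.sqrt((c * sigma)**2 + beta2)
```

The conservation law guarantees that the locally resolved dispersions redistribute, rather than inflate or deflate, the total model dispersion.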

It should be noted that the hypothesis test is somewhat biased and possibly presents an overly optimistic picture, as the reactions included in this test were also used to optimize reactivity parameters and quantify prediction uncertainty. As a preliminary test, we split the 212 reference reactions into a training set (R_train = 166 reactions) and a validation set (R_val = 46 reactions). We selected the validation set reactions (cf. Figure S1) in such a way that the 2E3N rule was not violated for any species of the 2022 reference set. The training set was subjected to the optimization workflow and subsequent UQ. A hypothesis test (Figures 8C and 8D) reveals that 9 % of the validation set residuals are located outside their 95 % confidence intervals. This finding may suggest that our UQ model is too optimistic. Confidence intervals, however, are sample statistics and as such functions of the underlying data set. To estimate the uncertainty of the 95 % confidence interval corresponding to the validation set, we calculated the standard error of a binomial distribution, [31] sqrt[p(1−p)/R_val] = 4 % with p = 9 %. Hence, we cannot reject compatibility with a 95 % coverage, although the validation sample size appears to be too small to draw a robust conclusion. The decreased training sample size is also problematic: as the number of reactivity parameters remains unchanged, uncertainty estimates are expected to be of lower quality than in the previous scenario (Figures 8A and 8B). Finally, we would like to point out that the model dispersion can also be understood as a tunable parameter through which a correct calibration of the prediction uncertainty can be ensured. This approach is known as parameter uncertainty inflation. [45]
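The quoted standard error can be reproduced directly (Python sketch; only the counts 4 and 46 are taken from the text):

```python
import math

def coverage_stderr(p, n):
    """Standard error of an empirical coverage fraction, modeling the
    in/out counts as draws from a binomial distribution."""
    return math.sqrt(p * (1.0 - p) / n)

p_obs = 4 / 46                   # 4 of 46 validation residuals outside the CI
se = coverage_stderr(p_obs, 46)  # ~0.04, i.e. the 4 % quoted in the text
# The nominal miscoverage is 5 %; the observed ~9 % deviates by roughly
# one standard error, so a 95 % coverage cannot be rejected.
```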

3. Conclusions

We showed that the incorporation of uncertainty quantification (UQ) into the reactivity scale method by Mayr [3] sheds new light on the topic. As a by‐product of the UQ‐extended reactivity approach, we obtained revised reactivity parameters for 68 reference species. Compared to the original parametrization by Mayr and co‐workers,[ 24 , 25 ] the revised parameters differ by as much as one unit. It remains to be discussed how these changes could be integrated into Mayr's reactivity database.[ 4 , 5 ] Since the reactivity parameters of all non‐reference species (about 1200 nucleophiles and 300 electrophiles) are derived from the ones of the reference species, our revised set of parameters would affect the entire database.

Our results suggest that the prediction uncertainty associated with log k_MPE (95 % confidence) amounts to 0.21–0.92 units for the set of 212 observed reference reactions. For combinations of reference nucleophiles and reference electrophiles that have not yet been observed and lie within the relevant range of −5 < log k_MPE < 8, we found a maximum prediction uncertainty of 2.14 units. These numbers reflect the accuracy in k_MPE estimated previously. [3] To take into account potential non‐normality of the empirical log k_MPE distributions computed by us, we define the following “best practice”. For a rough estimation of log k, which is still expected to be highly accurate in most cases, we recommend using the revised reactivity parameters (version 2.0) reported in Table 2. For a critical analysis of log k, we recommend explicitly calculating the empirical distribution of log k_MPE with the interactive tool that can be accessed through the project‐related GitLab repository. [6] We further encourage the community to assess future theoretical predictions of reactivity parameters in the context of parameter uncertainty (as discussed in this study, cf. Tables 3 and 4). Such benchmarks ensure that theoreticians interpret their predictions as critically as possible, but they also enable experimentalists to unambiguously evaluate theoretical work.

Uncertainty estimates for log k_MPE also allowed us to formulate testable statistical hypotheses, on the basis of which we could assess their quality. The estimates appear to be reliable, but the results are not yet conclusive due to the small sample size. In future UQ‐related work on reactivity scales, the pool of both species and reactions needs to be increased to draw more robust conclusions. We would also appreciate support from the community in this context. For instance, there are many unobserved combinations (−5 < log k_MPE < 8) present in the reaction matrix of the reference set (Figures 7 and S1). Measurements of these combinations will further increase the accuracy of reactivity parameters and uncertainty estimates corresponding to reference electrophiles and reference nucleophiles, which are at the heart of Mayr's reactivity scale approach.

In the long run, we aim at deriving reactivity parameters from first‐principles calculations, especially for species not yet listed in Mayr's reactivity database. An achievement of this kind would facilitate reactivity predictions to an unprecedented extent due to the resource efficiency and the high automation capacity of computations, thereby reducing experimental expense and accelerating research on polar organic reactivity. Despite their first‐principles character, thermochemical calculations are based on approximations that require benchmarking, i. e., an assessment with respect to reference values of well‐defined accuracy (here, experimental rate constants, log k_exp). We anticipate that the incorporation of UQ supports our ambition, as the reaction matrix spanned by the electrophiles and nucleophiles of Mayr's reactivity database is rather sparse (cf. Figures S1 and 7). The vacancies of the reaction matrix (representing unobserved reactions) can be filled by means of our UQ‐based approach, allowing for benchmarking under uncertainty. The diversity of benchmarkable reactions can be increased remarkably in this way, which increases the significance of conclusions drawn from theoretical studies and is particularly important in the context of data‐driven chemical design.[ 46 , 47 , 48 ] Currently, we are benchmarking first‐principles models of reactivity against the results of this study.

Appendix A: Discrepancy Weighting

We define the global discrepancy d as the square root of the difference between the squared model error (Eq. 4) and the average squared measurement uncertainty (Eq. 11),

d = sqrt(ϵ² − u²) (15)

Note that ϵ² is generally significantly larger than u² and, hence, the square root of a positive number is taken. It is required that the model has been corrected for bias (Eq. 5), such that

ϵ² = μ² + σ² ≈ σ² (16)

In uniformly weighted least‐squares optimization (cf. Eq. 2), the model bias is zero by definition (provided it contains a constant term). Assuming that measurement uncertainty is also negligible (σ² ≫ u²), as is shown in Section 2.3, we can write

d² = ϵ² − u² ≈ σ² (17)

By design, this approximation also holds true when determining discrepancies for individual species, S ∈ {N1, ..., N45, E1, ..., E33}, i. e.,

d_S² ≈ σ_S² (18)

Since electrophiles and nucleophiles can participate in as few as two or three reactions, we take the statistical degrees of freedom of each species explicitly into account,

σ_S = sqrt[ ν_S⁻¹ · Σ_{r∈ℛ_S} (δ_r(log k) − μ_S)² ] ≈ sqrt[ ν_S⁻¹ · Σ_{r∈ℛ_S} δ_r(log k)² ] = ϵ_S (19)
ν_S = R_S − γ_S (20)

Here, ν_S constitutes the degrees of freedom of species S, R_S is its number of occurrences, γ_S represents its number of free reactivity parameters, and ℛ_S is the index set of reactions in which it participates. Hence, the smaller R_S, the larger the effect of γ_S on σ_S will become. The discrepancy d_.95,r of the rth reaction, in which species S_N,r and S_E,r participate, can then be calculated under the assumption of t‐distributed, independent errors,

d_.95,r = t_.95,r · sqrt( σ_{S_N,r}² + σ_{S_E,r}² ) (21)

The reaction‐specific t‐factor t_.95,r corresponds to the folded t‐distribution for degrees of freedom ν_r that defines an interval encompassing 95 % of the distribution. The degrees of freedom for the rth reaction, ν_r, can be estimated on the basis of the Welch–Satterthwaite equation[ 49 , 50 ] (in particular, we refer the reader to Eq. 17 of the latter reference),

ν_r = ( c_{S_N,r}·σ_{S_N,r}² + c_{S_E,r}·σ_{S_E,r}² )² / ( c_{S_N,r}²·σ_{S_N,r}⁴/ν_{S_N,r} + c_{S_E,r}²·σ_{S_E,r}⁴/ν_{S_E,r} ) (22)
c_S = (ν_S + 1)⁻¹ (23)

The reaction‐specific degrees of freedom, ν_r, are at most as large as the sum of the species‐specific degrees of freedom, ν_{S_N,r} and ν_{S_E,r},

ν_r ≤ ν_{S_N,r} + ν_{S_E,r} (24)

The inverse of the squared discrepancy, d_.95,r⁻², constitutes the weight of the rth reaction. We additionally normalize the weights such that they sum up to one,

w_r = d_.95,r⁻² / Σ_{s=1}^R d_.95,s⁻² (25)

In the unweighted case, normalization leads to w_r = R⁻¹ for all possible values of r. Normalization does not affect the position of the global minimum of the objective function, but it allows for comparability between different sets of weights. It should be noted that discrepancy weighting is an iterative procedure, as reaction‐specific weights and errors are functions of each other. Hence, we need to update the weights until self‐consistency is reached.
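One pass of this procedure for a single reaction can be sketched as follows, using toy residuals for one nucleophile and one electrophile. All numbers, including the t‐factor, are illustrative assumptions; in practice, t_.95,r would be obtained from the folded t‐distribution (e. g., scipy.stats.t.ppf(0.975, nu_r)):

```python
import numpy as np

# Toy residuals for one nucleophile and one electrophile; the index sets
# R_S would come from the reaction matrix (all values are illustrative).
res = {"N": np.array([0.10, -0.20, 0.05, 0.15]),
       "E": np.array([-0.05, 0.30, -0.10])}
gamma = {"N": 2, "E": 1}               # free parameters: (s_N, N) vs. (E)

# Eqs. 19-20: species dispersions with explicit degrees of freedom (mu_S ~ 0).
nu = {S: r.size - gamma[S] for S, r in res.items()}
sig2 = {S: (r**2).sum() / nu[S] for S, r in res.items()}

# Eqs. 22-23: Welch-Satterthwaite degrees of freedom for the reaction.
c = {S: 1.0 / (nu[S] + 1) for S in res}
num = (c["N"] * sig2["N"] + c["E"] * sig2["E"])**2
den = (c["N"]**2 * sig2["N"]**2 / nu["N"] + c["E"]**2 * sig2["E"]**2 / nu["E"])
nu_r = num / den

# Eq. 21: reaction discrepancy; t95 is an assumed value standing in for the
# 95 % factor of the folded t-distribution at nu_r (~3.9).
t95 = 2.80
d95 = t95 * np.sqrt(sig2["N"] + sig2["E"])
weight_unnormalized = d95**-2          # normalized over all reactions (Eq. 25)
```

For this toy reaction, nu_r evaluates to about 3.9, which respects the bound of Eq. 24 (nu_N + nu_E = 4). In the full procedure, the resulting weights feed back into the optimization, and the loop is repeated until weights and dispersions no longer change.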

Appendix B: Bayesian Bootstrapping

This technique [29] simulates drawing new samples from an underlying but unknown population by assuming that the data set at hand is itself the population. Consequently, only the available data are used to draw samples, each of which yields slightly different parameters.

The following procedure describes sampling from a uniform Dirichlet distribution. [31] Given R data points, R−1 real numbers between zero and one are sampled from a uniform distribution. The numbers 0.0 and 1.0 are added to the tuple of R−1 sampled numbers. The tuple is then sorted in ascending order, yielding q_0 = 0.0 < q_1 < ... < q_{R−1} < q_R = 1.0. We define p_r = q_r − q_{r−1} as the weight of the rth data point (i. e., p_r = w_r), which is a number between zero and one. Summing over all weights yields Σ_{r=1}^R p_r = 1 and, therefore, each weight can be considered the probability of drawing the corresponding data point from the underlying population. Note that if both discrepancy weighting and bootstrapping are applied, the weight of the rth reaction reads

w_r = p_r · d_.95,r⁻² / Σ_{s=1}^R p_s · d_.95,s⁻² (26)

We repeat this random procedure B times, representing B bootstrap samples, each characterized by an individual set 𝒫_b := {p_r^b}_{r=1}^R. The original sample (the data set at hand) can be characterized by the set 𝒫_0 with a uniform distribution of weights, i. e., p_r^0 = R⁻¹ for all possible values of r.
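The sampling procedure described above can be implemented in a few lines (Python sketch; the generator seed and sample sizes are arbitrary choices). It is equivalent to drawing from the flat Dirichlet distribution:

```python
import numpy as np

def bayesian_bootstrap_weights(R, rng):
    """One draw of data-point weights via sorted uniform spacings, exactly
    as described above; equivalent to rng.dirichlet(np.ones(R))."""
    q = np.sort(rng.uniform(0.0, 1.0, R - 1))
    q = np.concatenate(([0.0], q, [1.0]))
    return np.diff(q)                  # p_r = q_r - q_{r-1}

rng = np.random.default_rng(42)        # arbitrary seed
B, R = 1_000, 212                      # illustrative sample sizes
P = np.stack([bayesian_bootstrap_weights(R, rng) for _ in range(B)])
```

Each row of P is one bootstrap sample 𝒫_b: a set of R non‐negative weights summing to one that re‐weights the objective function for one re‐optimization.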

Conflict of interest

The authors declare no conflict of interest.


Supporting information

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.


Acknowledgements

J.P. acknowledges funding of this research by the German Research Foundation (DFG) via project 389479699/GRK2455. The authors appreciate advice and artistic input (table‐of‐contents graphic) by Prof. Ricardo A. Mata and thank him, Prof. Herbert Mayr, Dr. Verena Kraehmer, and Dr. Christopher Stein for insightful discussions and proof‐reading of this manuscript. Open Access funding enabled and organized by Projekt DEAL.

J. Proppe, J. Kircher, ChemPhysChem 2022, 23, e202200061.

**

A previous version of this manuscript has been deposited on a preprint server (DOI: 10.26434/chemrxiv‐2021‐hwh2d‐v2)

Data Availability Statement

The data that support the findings of this study are openly available in GitLab at https://gitlab.com/jproppe/mayruq, reference number 27888788.

References

  • 1. Mayr H., Lakhdar S., Maji B., Ofial A. R., Beilstein J. Org. Chem. 2012, 8, 1458–1478. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Mayr H., Patz M., Angew. Chem. Int. Ed. 1994, 33, 938–957; [Google Scholar]; Angew. Chem. 1994, 106, 990–1010. [Google Scholar]
  • 3. Mayr H., Tetrahedron 2015, 71, 5095–5111. [Google Scholar]
  • 4. Mayr H., Ofial A. R., SAR QSAR Environ. Res. 2015, 26, 619–646. [DOI] [PubMed] [Google Scholar]
  • 5.H. Mayr, A. R. Ofial, Mayr's Database of Reactivity Parameters, https://www.cup.lmu.de/oc/mayr/reaktionsdatenbank2/, last accessed on 25 January 2022.
  • 6.J. Proppe, Uncertainty Quantification of Reactivity Scales, https://gitlab.com/jproppe/mayruq, last accessed on 25 November 2021. [DOI] [PMC free article] [PubMed]
  • 7. Kennedy M. C., O'Hagan A., J. R. Stat. Soc. Series B 2001, 63, 425–464. [Google Scholar]
  • 8. Pernot P., Cailliez F., AIChE J. 2017, 63, 4642–4665. [Google Scholar]
  • 9. Proppe J., Reiher M., J. Chem. Theory Comput. 2017, 13, 3297–3317. [DOI] [PubMed] [Google Scholar]
  • 10.JCGM, Evaluation of Measurement Data – Guide to the Expression of Uncertainty in Measurement, 2008.
  • 11.B. N. Taylor, C. E. Kuyatt, NIST Technical Note 1297, 1994.
  • 12. Ruscic B., Int. J. Quantum Chem. 2014, 114, 1097–1101. [Google Scholar]
  • 13. Pernot P., Civalleri B., Presti D., Savin A., J. Phys. Chem. A 2015, 119, 5288–5304. [DOI] [PubMed] [Google Scholar]
  • 14. Mata R. A., Suhm M. A., Angew. Chem. Int. Ed. 2017, 56, 11011–11018; [DOI] [PMC free article] [PubMed] [Google Scholar]; Angew. Chem. 2017, 129, 11155–11163. [Google Scholar]
  • 15. Simm G. N., Proppe J., Reiher M., Chimia 2017, 71, 202–208. [DOI] [PubMed] [Google Scholar]
  • 16. Friederich P., Häse F., Proppe J., Aspuru-Guzik A., Nat. Mater. 2021, 20, 750–761. [DOI] [PubMed] [Google Scholar]
  • 17. Gallenkamp C., Kramm U. I., Proppe J., Krewald V., Int. J. Quantum Chem. 2021, 121, e26394. [Google Scholar]
  • 18. Weymuth T., Proppe J., Reiher M., J. Chem. Theory Comput. 2018, 14, 2480–2494. [DOI] [PubMed] [Google Scholar]
  • 19. Proppe J., Gugler S., Reiher M., J. Chem. Theory Comput. 2019, 15, 6046–6060. [DOI] [PubMed] [Google Scholar]
  • 20. Proppe J., Husch T., Simm G. N., Reiher M., Faraday Discuss. 2016, 195, 497–520. [DOI] [PubMed] [Google Scholar]
  • 21. Proppe J., Reiher M., J. Chem. Theory Comput. 2019, 15, 357–370. [DOI] [PubMed] [Google Scholar]
  • 22. Uranga J., Hasecke L., Proppe J., Fingerhut J., Mata R. A., J. Chem. Inf. Model. 2021, 61, 1942–1953. [DOI] [PubMed] [Google Scholar]
  • 23. Bahlke M. P., Mogos N., Proppe J., Herrmann C., J. Phys. Chem. A 2020, 124, 8708–8723. [DOI] [PubMed] [Google Scholar]
  • 24. Mayr H., Bug T., Gotta M. F., Hering N., Irrgang B., Janker B., Kempf B., Loos R., Ofial A. R., Remennikov G., Schimmel H., J. Am. Chem. Soc. 2001, 123, 9500–9512. [DOI] [PubMed] [Google Scholar]
  • 25. Ammer J., Nolte C., Mayr H., J. Am. Chem. Soc. 2012, 134, 13902–13911. [DOI] [PubMed] [Google Scholar]
  • 26. Wales D. J., Doye J. P. K., J. Phys. Chem. A 1997, 101, 5111–5116. [Google Scholar]
  • 27. Virtanen P. et al., Nat. Methods 2020, 17, 261–272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Ruscic B., Pinzon R. E., Morton M. L., von Laszevski G., Bittner S. J., Nijsure S. G., Amin K. A., Minkoff M., Wagner A. F., J. Phys. Chem. A 2004, 108, 9979–9997. [Google Scholar]
  • 29. Rubin D. B., Ann. Statist. 1981, 9, 130–134. [Google Scholar]
  • 30. Hastie T., Tibshirani R., Friedman J. H., The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics), Springer: New York (NY), United States, 2nd ed., 2009. [Google Scholar]
  • 31.C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer: New York (NY), United States, 2006.
  • 32. Pedregosa F. et al., J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  • 33. Altman N., Krzywinski M., Nat. Methods 2015, 12, 999–1000. [DOI] [PubMed] [Google Scholar]
  • 34. Pérez P., Toro-Labbé A., Aizman A., Contreras R., J. Org. Chem. 2002, 67, 4747–4752. [DOI] [PubMed] [Google Scholar]
  • 35. Schindele C., Houk K. N., Mayr H., J. Am. Chem. Soc. 2002, 124, 11208–11214. [DOI] [PubMed] [Google Scholar]
  • 36. Wang C., Fu Y., Guo Q. X., Liu L., Chem. Eur. J. 2010, 16, 2586–2598. [DOI] [PubMed] [Google Scholar]
  • 37. Pereira F., Latino D. A. R. S., Aires-de-Sousa J., J. Org. Chem. 2011, 76, 9312–9319. [DOI] [PubMed] [Google Scholar]
  • 38. Zhuo L. G., Liao W., Yu Z. X., Asian J. Org. Chem. 2012, 1, 336–345. [Google Scholar]
  • 39. Hoffmann G., Tognetti V., Joubert L., Chem. Phys. Lett. 2019, 724, 24–28. [Google Scholar]
  • 40. Hoffmann G., Balcilar M., Tognetti V., Héroux P., Gaüzére B., Adam S., Joubert L., J. Comput. Chem. 2020, 41, 2124–2136. [Google Scholar]
  • 41. Mood A., Tavakoli M., Gutman E., Kadish D., Baldi P., Van Vranken D. L., J. Org. Chem. 2020, 85, 4096–4102. [DOI] [PubMed] [Google Scholar]
  • 42. Orlandi M., Escudero-Casao M., Licini G., J. Org. Chem. 2021, 86, 3555–3564. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Kadish D., Mood A. D., Tavakoli M., Gutman E. S., Baldi P., Van Vranken D. L., J. Org. Chem. 2021, 86, 3721–3729. [DOI] [PubMed] [Google Scholar]
  • 44. Rasmussen C. E., Williams C. K. I., Gaussian Processes for Machine Learning, The MIT Press: Cambridge (MA), United States, 2006. [Google Scholar]
  • 45. Pernot P., J. Chem. Phys. 2017, 147, 104102. [DOI] [PubMed] [Google Scholar]
  • 46. Aspuru-Guzik A., Lindh R., Reiher M., ACS Cent. Sci. 2018, 4, 144–152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. dos Passos Gomes G., Pollice R., Aspuru-Guzik A., Trends Chem. 2021, 3, 96–110. [Google Scholar]
  • 48. Pollice R., dos Passos Gomes G., Aldeghi M., Hickman R. J., Krenn M., Lavigne C., Lindner-D'Addario M., Nigam A., Ser C. T., Yao Z., Aspuru-Guzik A., Acc. Chem. Res. 2021, 54, 849–860. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Welch B. L., Biometrika 1938, 29, 350–362. [Google Scholar]
  • 50. Satterthwaite F. E., Biometrics Bull. 1946, 2, 110–114. [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Data Availability Statement

The data that support the findings of this study are openly available in GitLab at https://gitlab.com/jproppe/mayruq, reference number 27888788.


Articles from Chemphyschem are provided here courtesy of Wiley