2025 May 12;15(11):8885–8893. doi: 10.1021/acscatal.5c01626

A Practical Start-Up Guide for Synthetic Chemists to Implement Design of Experiments (DoE)

Brendan J Wall 1, Mason T Koeritz 1, Levi M Stanley 1,*, Brett VanVeller 1,*
PMCID: PMC12150262  PMID: 40502975

Abstract

Experimental design in the synthetic chemistry community primarily focuses on optimizing reaction conditions by modifying one variable at a time (OVAT). Design of Experiments (DoE) methodology, already commonplace in the chemical industry, captures optimal reaction conditions while shrinking the number of experiments required. Additionally, DoE can provide interaction effects between variables that are not captured in OVAT optimizations. Despite the cost- and time-saving benefits that DoE can provide, it has not been widely adopted by the academic community. This start-up guide is the first entry-level manual, designed for synthetic chemists, with worked examples for the practical implementation of DoE methodology in the context of synthetic method development.

Keywords: design of experiments, one variable at a time, start-up guide, synthetic chemistry, optimization



Introduction

After the discovery of any chemical transformation, a synthetic chemist is faced with the task of optimizing the conditions (or independent variables) of a reaction. Frequently, the chemical yield serves as the response (or dependent variable) of the chemical reaction. Asymmetric chemical transformations introduce added complexity to the optimization process as the chemist must now optimize reaction conditions for multiple responses (e.g., yield and stereoselectivity). It is no secret to the synthetic community that the optimization of organic reactions can be one of the most time-consuming, expensive, and challenging aspects of reaction development.

Chemists approach this optimization problem using a “controlled experiment” methodology known as One Variable At a Time (OVAT) optimization to ensure that an experimental outcome is due to one identifiable factor. In an OVAT optimization, a chemist interested in how temperature might affect the yield of a reaction could run the reaction at 0 °C, 25 °C, 50 °C, and 75 °C. After locating the optimal temperature, the chemist might consider another variable such as the catalyst loading, testing the effect of 1 mol %, 5 mol %, and 10 mol % catalyst. At the end of the OVAT optimization, the chemist might believe that these conditions are optimal for the transformation; however, the fraction of chemical space probed by the optimization is minimal (Figure 1, left).

Figure 1. One Variable At a Time (OVAT) versus Design of Experiments (DoE) methodology.

The OVAT methodology treats variables independently of one another, meaning interaction effects between variables are not captured (Figure 1). This approach often leads to erroneous conclusions about the true optimal reaction conditions. Additionally, since each variable is treated independently, a minimum of 3 reactions (high, middle, and low) is required to understand the effect of each variable. Finally, and perhaps most troubling, OVAT offers no systematic way to optimize multiple responses together, as is required for asymmetric reactions in which yield and enantiomeric ratio must both be optimized. Instead, chemists must perform a separate optimization for each additional response (such as selectivity).

Design of Experiments (DoE) methodology has emerged as a workhorse method in the chemical industry, assisting chemists in the optimization of chemical reactions. DoE uses statistical methods to shrink the number of reactions necessary to capture the effects of the variables, and the dependencies among them, on the response(s) of interest. DoE simply requires chemists to define feasible upper and lower limits for each independent variable they are interested in exploring. For instance, a chemist could explore temperatures between 0 and 75 °C, catalyst loading from 1 to 10 mol %, ligand stoichiometry from 1 to 3 equiv, and concentration from 0.1 to 0.3 M. A traditional OVAT optimization requires an open-ended number of experiments to map the effect of each variable, and it still does not reveal how the variables interact with one another. Moreover, despite these optimization efforts, the OVAT optimum may not be the true optimum because the portion of chemical space sampled with OVAT is restricted (Figure 1, left). Conversely, DoE tests multiple variables simultaneously in each experiment, enabling the design to account for interactions between variables and to model the chemical space within the chosen limits (Figure 1, right). Depending on the design, the number of experiments typically scales as 2ⁿ or 3ⁿ, where n is the number of variables tested. DoE also uses statistical tests to determine each variable’s significance early in the optimization process, enabling chemists to reduce the number of variables carried forward. By using DoE, the chemist gains (1) material cost savings, (2) time savings in experimental setup and analysis, (3) a more complete understanding of variable effects, and (4) a systematic approach to optimizing multiple responses.
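For a rough sense of scale, the Python sketch below (our illustration, not part of the original guide) counts the runs in two-level and three-level full factorial designs and enumerates the 2⁴ = 16 low/high corner conditions for the four example variables above. The dictionary keys are hypothetical names chosen for readability; practical designs usually also add center points and randomize the run order.

```python
from itertools import product

# Hypothetical variable names; the (low, high) limits are the example ranges from the text.
bounds = {
    "temperature_C": (0, 75),
    "catalyst_mol_pct": (1, 10),
    "ligand_equiv": (1, 3),
    "concentration_M": (0.1, 0.3),
}

def full_factorial_runs(n_vars: int, levels: int) -> int:
    """Number of runs in a full factorial design: levels ** n_vars."""
    return levels ** n_vars

def two_level_corners(bounds: dict) -> list[dict]:
    """Every combination of low/high settings (the corners of the design space)."""
    names = list(bounds)
    return [dict(zip(names, combo)) for combo in product(*(bounds[n] for n in names))]

n = len(bounds)
print(f"{n} variables: 2-level = {full_factorial_runs(n, 2)} runs, "
      f"3-level = {full_factorial_runs(n, 3)} runs")
corners = two_level_corners(bounds)
print(len(corners), "corner conditions; first:", corners[0])
```

Enumerating the corners explicitly, rather than varying one setting at a time, is what lets every run contribute information about every variable at once.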

When our laboratories learned of the principles of DoE and its ability to reduce the number of experiments necessary to optimize reaction outcomes, we were excited to implement DoE methodology in our own research. Unfortunately, after reading through existing articles and comprehensive resources, we were left unsure where to start. After much trial and error, we learned valuable lessons that inspired us to create the tutorial start-up guide we wish we had had at the outset.

Academic researchers who have limited time and resources should, in principle, benefit most from implementing DoE. However, OVAT optimization persists in both academic publishing and methodology training despite the aforementioned advantages. Why has academia been slow to adopt DoE in the development of chemical reactions? We believe that the slow adoption arises from long-standing beliefs that DoE is complex and requires significant expertise in statistics to implement. For this reason, the “activation barrier” to adopting DoE is perceived as costing more time than simply using familiar, intuitive OVAT methods. The purpose of this work is to disrupt these norms by providing the first entry-level guide on how to practically implement the DoE methodology as a synthetic chemist. We take special care to provide a jargon-free working knowledge of the statistics behind DoE. In-depth discussions of DoE methodology and the statistical principles behind the approach are available in other resources and are outside the scope of this discussion.

A flowchart describing the suggested DoE workflow for reaction optimization is provided to guide the reader through each step (Figure 2). The section headings of this article match the numbered boxes in the flowchart, with key takeaways at the end of each section. This work emphasizes considerations and best practices when developing a DoE optimization, highlights commonly controlled variables in synthesis, and simplifies the logical workflow for a DoE optimization. Additionally, explanations of when to use certain DoE design types, troubleshooting tips, and options for data visualization are provided. Finally, we provide two case studies that illustrate the strategy in practice. As the field of chemoinformatics continues to develop, it is imperative that synthetic chemists begin to interface with and utilize predictive models and computer-assisted experimental design.

Figure 2. Flowchart describing the suggested DoE workflow for reaction optimization. Numbers correspond to the section numbers in the text describing special considerations and best practices for each step.

Statistical Principles of DoE Methodology

The first step in optimizing any reaction is deciding what outcome should be optimized. Commonly, the yield of the reaction is the desired outcome, but selectivity factors (such as stereoselectivity) are often considered. In DoE, these responses can be modeled by the equation in Figure 3A. While this equation initially looks complicated, each component (highlighted with different colors in Figure 3A) can be deciphered without any prior statistics knowledge. The response of the system (for example, chemical yield) is equal to a base constant (β₀) plus a modifier (β₁) for each variable (x₁) in the reaction, which can positively or negatively affect the yield. These terms (β₁x₁, β₂x₂, β₃x₃, etc.) are referred to as the main effects, and each variable is evaluated for its impact on the response without interaction from any other variables. The data for variable main effects will feel intuitive to most synthetic chemists, as they resemble the information captured by OVAT optimization methods. In OVAT, it is often assumed that the variable main effects make the most significant contribution to the overall response.
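For readers who prefer symbols, one standard way to write the full second-order model sketched in Figure 3A is given below; the exact notation in the figure may differ, and the random-error term ε is our addition. Here y is the response, the xᵢ are the variable settings, and the three sums correspond to the main-effect, two-variable interaction, and squared (quadratic) terms discussed in this section and the next.

```latex
y = \beta_0
  + \sum_{i=1}^{n} \beta_i x_i
  + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \beta_{i,j}\, x_i x_j
  + \sum_{i=1}^{n} \beta_{i,i}\, x_i^2
  + \varepsilon
```

A main-effects design keeps only the first sum, a two-level full factorial adds the second, and a response surface design adds the third.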

Figure 3. (A) Design of Experiments equation, which is the statistical model for the response (often chemical yield). (B) Basic introduction to common DoE design types, which increase in complexity as new variable effects are added to the modeling equation.

Different types of designs include different combinations of these terms. A fractional factorial design will typically describe only the variable main effects (β₁x₁, β₂x₂, β₃x₃, etc.), which explain how a higher or lower value of each variable affects the response (Figure 3B). The next level, a two-level full factorial design, adds interaction terms (β₁,₂x₁x₂, β₁,₃x₁x₃, etc.) that capture effects between variables (Figure 3B). For example, the reaction being investigated could have an interaction effect in which temperature and reagent stoichiometry affect one another. In practice, main effects and interaction effects are often captured simultaneously to determine which variables are the most significant and to eliminate insignificant variables. Once the most significant effects are determined, the synthetic chemist may choose to use a response surface design to determine the precise value of each variable that produces the optimal response. This design type includes squared (quadratic, x²) terms (β₁,₁x₁x₁, β₂,₂x₂x₂, etc.) to evaluate any nonlinear effects and give curvature to the overall response surface (Figure 3B). For example, the reaction being investigated could perform optimally at 60 °C, with diminished yield at both higher and lower temperatures. Details on when to choose each design type are discussed in later sections.
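Because each β term is an unknown to be estimated, a design must contain at least as many distinct runs as the model has coefficients. The short Python sketch below (our illustration; the model-type labels are hypothetical names) counts the coefficients in the three model types of Figure 3B for n variables. Real designs add runs beyond this minimum so that experimental error can also be estimated.

```python
from math import comb

def n_coefficients(n_vars: int, model: str) -> int:
    """Count the beta terms in each model type, including the constant beta_0."""
    main = n_vars                   # beta_i * x_i
    interactions = comb(n_vars, 2)  # beta_ij * x_i * x_j (two-variable terms only)
    quadratic = n_vars              # beta_ii * x_i ** 2
    if model == "main_effects":
        return 1 + main
    if model == "two_factor_interactions":
        return 1 + main + interactions
    if model == "quadratic":
        return 1 + main + interactions + quadratic
    raise ValueError(f"unknown model type: {model!r}")

for model in ("main_effects", "two_factor_interactions", "quadratic"):
    print(model, n_coefficients(4, model))
```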

1. Response Considerations

The next step of any DoE is determining the response(s), or experimental outcome(s), that should be optimized. In synthetic method development, these outcomes are often percent yield and/or selectivity factors such as diastereoselectivity, enantioselectivity, or regioselectivity. A major benefit of DoE is that multiple responses can be systematically optimized at one time, whereas OVAT optimization can treat only one response at a time. OVAT optimizations of more than one response often result in conditions that are a compromise between yield and selectivity rather than a true optimization of both responses.

DoE uses a statistical framework that determines the relationships between variables and their effects on the responses being monitored, enabling the location of the true optimum for multiple responses at a time. Synthetic chemists might consider using multiple responses to minimize certain variables, such as the stoichiometry of an expensive reagent or catalyst. In industrial process chemistry, DoE has been used to reduce the formation of unwanted side products, minimize cost, and reduce chemical waste. Because the desired response is not always maximization, a desirability function is included in the analysis to guide the user toward conditions that produce the desired outcome, whether it is maximization, minimization, or a balance of multiple responses (more on this in Section 4).

A common challenge when implementing DoE is ensuring that every experiment provides a quantifiable amount of the desired product for all required analyses. Responses of 0% yield, or a nonselective 50:50 mixture, are effectively empty data points that do not contribute to the optimization. In traditional OVAT optimization, a 0% yield simply indicates that a combination of variables is not productive; in DoE, too many null results can create severe outliers that skew the overall optimization. For this reason, DoE is often challenging to implement as a tool for pure reaction discovery and performs best when optimizing reactions after preliminary reaction conditions have been identified.

Key Takeaways

  • Unlike OVAT, DoE provides a systematic framework for optimizing multiple reaction responses at the same time, each of which can be maximized or minimized, using fewer experiments (e.g., yield and stereoselectivity, yield of product and side products).

  • In practice, DoE works best when optimizing reaction conditions after initial conditions are identified and is not an effective tool for pure reaction discovery.

  • Experimental results that give no yield or nonselective reactivity skew the optimization in DoE. (See Section 5, Troubleshooting DoE, for more information.)

2. Variable Considerations

When reaction variables are considered, there are two types of variables: continuous and categorical. Continuous variables are those that have an infinite number of values between two set points. These include factors such as temperature or concentration where the synthetic chemist may set these parameters to a high or low value and any value in between. Categorical variables are those that do not have a spectrum of values, such as the choice of solvent, metal catalyst, type of ligand, or presence of additional reagents. The primary strength of DoE is the evaluation of continuous variables, although a limited set of categorical variables may also be included. Categorical variables may require evaluation in combination with OVAT as part of the initial reaction discovery prior to DoE.

How to Add Categorical Variables without Getting Overwhelmed

To simplify the interpretation of the DoE results, we suggest:

  • (1) Identify “best” initial conditions that produce an appreciable, or at least measurable, yield of product.

  • (2) Take the categorical variables from (1) and perform an evaluation of the continuous variables using DoE.

  • (3) Use the optimized conditions from (2) to evaluate new categorical variables (i.e., start back at (1)).

The practical reality is that categorical variables described by many parameters (such as solvent, ligand, or catalyst) are often extremely significant. It is necessary to determine which set of categorical variables performs best and to remove any insignificant variables before proceeding to subsequent higher-order DoE designs (more on this in Section 3).

A Note on Describing Categorical Variables as Continuous Variables

One option for evaluating categorical variables in the context of DoE is to use a continuous parameter to describe the categorical variable. For example, monodentate phosphine ligands can be described by their Tolman cone angle or percent buried volume (%Vbur). One can imagine describing these ligands using one or more of these parameters in the DoE. However, a potential drawback of including ligands as continuous variables is that the optimal cone angle, %Vbur, or ligand electronics predicted by the DoE optimization may not correspond to any real ligand, and effects from ligand denticity may not be accurately described by DoE. For those interested in describing ligands as continuous variables, advanced resources such as the KRAKEN phosphine ligand database are available. For beginners, we do not suggest implementing categorical variables in this way, as drawing conclusions from the data can be challenging, especially when comparing ligands from different classes.

Solvent is another potentially parametrized categorical variable as it can be described by its dielectric constant (ϵ) and utilized directly in the DoE. Caution is still warranted, however, because dielectric constant does not always capture the full effect of a solvent on the chemical reaction and may not consider crucial factors such as solubility of each reagent or hydrogen-bond donating and accepting ability. Further, conclusions from a DoE that suggest a specific solvent dielectric may be meaningless if a solvent with that dielectric constant is not available or not compatible with the reaction conditions. It is our opinion that the inclusion of solvent in DoE is often cumbersome and may overcomplicate the optimization process, especially for beginners.
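To make the availability problem concrete: a model written in terms of dielectric constant returns a number, and the chemist must map that number back to a real solvent. The hypothetical Python sketch below snaps a predicted ε to the nearest solvent in a short list. The ε values are approximate room-temperature figures included only for illustration, and the nearest-ε rule deliberately ignores the solubility and hydrogen-bonding factors noted above.

```python
from collections.abc import Mapping

# Approximate dielectric constants near room temperature (illustrative values only).
SOLVENT_DIELECTRIC: dict[str, float] = {
    "toluene": 2.4,
    "THF": 7.6,
    "dichloromethane": 8.9,
    "acetone": 20.7,
    "acetonitrile": 37.5,
    "DMSO": 46.7,
}

def nearest_solvent(predicted_eps: float,
                    table: Mapping[str, float] = SOLVENT_DIELECTRIC) -> tuple[str, float]:
    """Return (solvent, |difference|) for the listed solvent closest in dielectric constant."""
    name = min(table, key=lambda s: abs(table[s] - predicted_eps))
    return name, abs(table[name] - predicted_eps)

# Hypothetical DoE optimum of eps = 28: the closest listed solvent is still several units away.
print(nearest_solvent(28.0))
```

A large returned difference is a warning that the "optimal" dielectric constant cannot be realized with the solvents at hand.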

How to Define the Upper and Lower Limits for Each Variable

When determining the range (or chemical space) for each continuous variable being evaluated in the DoE, synthetic chemists must use their chemical intuition to identify upper and lower limits that ensure that each reaction gives a quantifiable response. For example, if it is known from the initial reaction discovery that the transformation requires elevated temperatures, performing the reaction at 0 °C will likely give no product and lead to an empty data point for that experiment. Instead, a range of 50 to 100 °C would provide better insight into how temperature affects the reaction while still providing a quantifiable response. Variable boundaries should be set as wide as possible to “stress test” the reaction, but not so wide that they lead to null results. For beginners, this can pose a challenge, as the chemist may not know how sensitive the reaction is to changes in certain variables. We advise using a combination of chemical intuition and preliminary data from reaction discovery when setting the variable boundaries, keeping in mind that the limits for each variable can be adjusted in subsequent DoE designs.
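DoE software generally works with each continuous variable in "coded" units, in which the chosen lower limit maps to -1, the upper limit to +1, and the midpoint to 0. The small Python helpers below (our sketch; the function names are hypothetical) perform that conversion for the 50–100 °C temperature window mentioned above, which is also a quick way to check where a proposed setting sits relative to the limits.

```python
def to_coded(value: float, low: float, high: float) -> float:
    """Map a natural-unit setting onto the -1 (low limit) ... +1 (high limit) scale."""
    if high <= low:
        raise ValueError("upper limit must exceed lower limit")
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

def to_natural(coded: float, low: float, high: float) -> float:
    """Inverse of to_coded: convert a coded setting back to natural units."""
    return (high + low) / 2 + coded * (high - low) / 2

# Temperature window of 50-100 °C from the example above.
for t in (50, 75, 100):
    print(t, "°C ->", to_coded(t, 50, 100))
print("coded 0.5 ->", to_natural(0.5, 50, 100), "°C")
```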

Key Takeaways

  • Commonly controlled continuous synthetic variables (easily modeled with DoE): Temperature, concentration, reagent stoichiometry, catalyst loading, reaction time.

  • Commonly controlled categorical synthetic variables (not easily modeled with DoE): Solvent, ligand, catalyst type, inclusion of reaction additives (e.g., sacrificial reagents for catalytic turnover).

  • Ensure that both the upper and lower limits of each variable result in a quantifiable response (0% yield is not useful).

3. Choosing a Design Type

DoE has many different types of structured approaches (designs), each with its own level of complexity. With all of the available options, it can be overwhelming to choose which design type will best fit a reaction optimization. Below, we describe a simple two-step option as a starting point that will work for most reaction optimizations. If this process does not fit your needs, more complex design options are available.

Start Here! 2-Level Full Factorial Design

In Figure 3B, we describe three design types, each with increasing complexity. While we showed the fractional factorial and full factorial designs separately, these exist on a continuum in which additional terms can be added to evaluate the response and obtain higher resolution of the chemical space. In OVAT optimization, we often assume that the variable main effects have the most significant effect on the response, and we ignore any interactions between variables. However, as stated previously, the benefits of DoE are its ability to evaluate multiple variables at once and to provide information about the interactions between them.

Typically, interactions between just two variables (second-level, or two-factor, interactions) are the most significant interaction effects in a chemical optimization. While it is possible for three or more variables to have interactions that depend on one another, these interactions usually make insignificant contributions to the overall response, and the additional reactions required to evaluate them can often be eliminated from a design. Therefore, we suggest a 2-Level Full Factorial Design as the ideal starting point to evaluate the variables and cover a significant amount of chemical space without needing to conduct an exorbitant number of reactions. In a 2-level full factorial design, each variable is evaluated at a high and a low value, which allows both main effects and interaction effects to be evaluated on the response of interest. This design type requires 2ⁿ reactions; if this is too many reactions for the number of variables being evaluated, you may consider a fractional factorial design to first evaluate whether any variables are insignificant (more on this in Section 4). We believe that in practice the 2-Level Full Factorial Design will be the most effective starting point for the majority of synthetic optimizations. Examples of how to set up this design in JMP are provided in the Supporting Information.
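The Python sketch below (our illustration, standard library only) builds a 2-level full factorial design for three variables in coded units and adds the two-variable interaction columns. Every pair of columns has a zero dot product (the columns are orthogonal), which is why this design can estimate the main effects and the interactions independently of one another. Run order would still be randomized before the experiments are actually carried out.

```python
from itertools import combinations, product

def full_factorial(n_vars: int) -> list[tuple[int, ...]]:
    """All 2**n_vars combinations of coded low (-1) and high (+1) settings."""
    return list(product((-1, 1), repeat=n_vars))

def model_columns(design, names):
    """Main-effect columns plus every two-variable interaction column."""
    cols = {name: [run[i] for run in design] for i, name in enumerate(names)}
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        cols[a + b] = [run[i] * run[j] for run in design]
    return cols

names = ["A", "B", "C"]
design = full_factorial(len(names))
cols = model_columns(design, names)
print(len(design), "runs; columns:", list(cols))

# Orthogonality check: every pair of columns has a zero dot product.
orthogonal = all(
    sum(x * y for x, y in zip(cols[p], cols[q])) == 0
    for p, q in combinations(cols, 2)
)
print("all columns orthogonal:", orthogonal)
```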

The Next Step: Response Surface Design

If the 2-Level Full Factorial Design has not met the optimization goal, the next method we suggest is a Response Surface Design. A second, “higher-level” optimization is typically needed because the two-level linear model cannot describe curvature in the chemical space and therefore cannot indicate whether an intermediate value between the high and low points would give the best results. If improved reaction conditions are all that is needed and a complete optimization with curvature is not required (e.g., when preparing an important starting material), validating the “best” conditions from the linear model may be sufficient.

Response Surface Designs include a middle value for each variable, which allows the quadratic terms of the DoE equation (Figure 3A) to be included. For example, evaluating temperature between 25 and 75 °C with a linear model will only indicate whether 25 or 75 °C gives the better response, whereas a quadratic model could predict that an intermediate temperature, such as 55 °C, is optimal. As a best practice, only the most significant variables should be evaluated in a Response Surface Design, as this design type requires many more reactions (up to 3ⁿ) than linear designs. If many (>3) variables are significant, we suggest choosing the three most significant variables and setting the remaining variables to the best conditions predicted by the full factorial design. However, additional variables may be added if a greater number of reactions is feasible or if analysis is simple.
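The middle level matters because three settings are enough to fit a curve. As a purely synthetic illustration (the "yields" below are generated from an assumed parabola peaking at 55 °C, not measured data), the Python sketch fits y = b0 + b1·x + b2·x² through the coded points x = -1, 0, +1 and locates the stationary point x* = -b1/(2·b2); with a negative b2, that point is a maximum inside the range.

```python
def fit_three_level(y_low: float, y_mid: float, y_high: float) -> tuple[float, float, float]:
    """Exact quadratic through coded points x = -1, 0, +1: y = b0 + b1*x + b2*x**2."""
    b0 = y_mid
    b1 = (y_high - y_low) / 2
    b2 = (y_high + y_low) / 2 - y_mid
    return b0, b1, b2

def stationary_point(b1: float, b2: float) -> float:
    """Coded x where dy/dx = b1 + 2*b2*x = 0 (a maximum when b2 < 0)."""
    if b2 == 0:
        raise ValueError("no curvature: the response is linear in this variable")
    return -b1 / (2 * b2)

def synthetic_yield(temp_c: float) -> float:
    """Assumed response used only for illustration: a parabola peaking at 55 °C."""
    return 90 - 0.02 * (temp_c - 55) ** 2

low, high = 25.0, 75.0                  # temperature limits in °C
temps = (low, (low + high) / 2, high)   # coded -1, 0, +1
b0, b1, b2 = fit_three_level(*(synthetic_yield(t) for t in temps))
x_star = stationary_point(b1, b2)
t_star = (low + high) / 2 + x_star * (high - low) / 2  # back to °C
print(f"b2 = {b2:.2f}; predicted optimum = {t_star:.1f} °C")
```

A two-level design sees only the 25 and 75 °C endpoints here and would simply favor 75 °C; the middle point is what reveals the interior optimum.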

A Note on the Importance of Design Resolution

When deviating from the design types prescribed above, special care must be taken to ensure that the resolution of the design enables the type of analysis desired. Consider a design with 4 variables: A, B, C, and D. In the 2-Level Full Factorial Design, the variable main effects (A, B, C, and D), the two-variable (binary) interactions (AB, AC, AD, BC, BD, and CD), and the higher-order interactions (third- and fourth-level interactions such as ABC, ABD, ACD, BCD, and ABCD) will all be included.

In a Fractional Factorial Design, typically implemented when there are many variables to be evaluated, effects are confounded (sometimes called aliased) with one another to shrink the number of reactions required. This means that interactions expected to be insignificant, such as the third- and fourth-level interactions, are “grouped” with other effects and cannot be estimated separately. In simplified terms, Fractional Factorial Designs operate under the assumption that these higher-order interactions have an insignificant effect on the overall response. Therefore, fractional factorial designs are particularly useful for determining variable significance, but care should be taken not to use them for analysis of complex interaction effects without understanding which effects are confounded. Further discussion of when to use full factorial versus fractional factorial designs, and of design resolution, is available in more detailed DoE references.
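Confounding is easiest to see by constructing a fraction. The Python sketch below (our illustration) builds the common half fraction of a four-variable design, 2⁴⁻¹ = 8 runs, by setting D equal to the product ABC, the usual textbook construction. In the resulting runs the D column is identical to the ABC column and the AB column is identical to CD, so those effects cannot be separated; each main effect is aliased only with a three-variable interaction (for example, A with BCD).

```python
from itertools import product

def half_fraction_4() -> list[dict[str, int]]:
    """2**(4-1) design: full factorial in A, B, C with the generator D = A*B*C."""
    return [
        {"A": a, "B": b, "C": c, "D": a * b * c}
        for a, b, c in product((-1, 1), repeat=3)
    ]

def column(design, effect: str) -> list[int]:
    """Coded column for an effect such as 'AB' (the product of its letters' columns)."""
    out = []
    for run in design:
        value = 1
        for letter in effect:
            value *= run[letter]
        out.append(value)
    return out

design = half_fraction_4()
print(len(design), "runs instead of", 2 ** 4)
for left, right in [("D", "ABC"), ("A", "BCD"), ("AB", "CD"), ("AC", "BD"), ("AD", "BC")]:
    print(left, "aliased with", right, ":", column(design, left) == column(design, right))
```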

This section discusses design types only in general terms. A step-by-step explanation for practically setting this up in JMP has been provided for Case Study II (discussed below) in the Supporting Information.

Key Takeaways

  • For most optimizations, a 2-Level Full Factorial Design will be a sufficient starting design.

  • The Response Surface Design can be used to further refine the reaction and obtain the exact middle values of each variable that result in the optimal response.

  • In most cases, these two design types will result in optimal conditions, though more complex design types are available.

  • Special care should be taken when selecting design types with lower resolution to ensure the correct conclusions are made from the DoE data.

4. Performing Experiments, Data Analysis Considerations, and Model Validation

For the synthetic chemist, understanding and interpreting the statistical results from the DoE can be another barrier to entry. However, a few key pieces of information reveal which variables are significant in the reaction and which set of variables gives the optimal response. More in-depth statistics tutorials are available, but the average synthetic chemist can interpret DoE results with a rudimentary understanding of statistics.

DoE is available in statistical software packages (e.g., JMP, Design-Expert, and MODDE) as well as in packages for programming languages (e.g., pyDOE for Python, or R). JMP has introduced academic licensing plans that are available free of charge to researchers and students. We have included a step-by-step guide to setting up and analyzing DoE results using JMP in the Supporting Information. We believe that JMP is a good software package for DoE beginners, as many support articles are available on its Web site and visualization tools are built into the software.

How to Determine Which Variables Are Statistically Significant

Variable significance refers to the degree of impact a variable has on the response. The significance of each variable (or interaction between variables) is expressed as a p-value, where a lower value indicates stronger evidence that the variable has a real effect. In statistics, a typical significance threshold is a p-value below 0.05, meaning that, if the variable truly had no effect, an effect at least as large as the one observed would be expected to arise by chance less than 5% of the time. In practice, this threshold is not a hard requirement for significance, as error in the reaction analysis must also be considered; for example, a p-value of 0.06 does not necessarily mean that the variable has no impact on the response. We suggest using both p-values and chemical intuition to determine the most significant variables to be re-evaluated and further optimized. In the event that many (or all) variables or interactions are determined to be significant, consider shrinking the variable boundaries, introducing curvature into the design, or increasing the number of replicates to limit error.
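To show where such numbers come from, the pure-Python sketch below fits a main-effects model to the nine runs in Table 1 (Case Study I) after coding each variable to -1/+1, and reports a t-statistic for each coefficient. This is our own illustrative re-analysis, not the authors' JMP output. It assumes that entry 4 can be treated as the design center point (coded 0), although its listed concentration and temperature (1.3 M, 43 °C) sit slightly off the exact midpoints (1.25 M, 42 °C). Because the coded columns are orthogonal, each coefficient is simply Σxy/Σx². A coefficient whose |t| exceeds the two-sided 5% critical value for the residual degrees of freedom (about 2.776 for 4 degrees of freedom) corresponds to p < 0.05.

```python
from math import sqrt

# Table 1 (Case Study I), each variable coded to -1 (low limit) / +1 (high limit):
# A = equiv of 2 (1-5), B = catalyst (5-25 mol %), C = [1] (0.5-2.0 M), D = temp (23-61 °C).
# Assumption: entry 4 is treated as the design center point (all codes 0).
data = [
    # (A,  B,  C,  D, yield %)
    (+1, +1, -1, -1, 82),  # entry 1
    (+1, -1, +1, -1, 81),  # entry 2
    (+1, +1, +1, +1, 76),  # entry 3
    ( 0,  0,  0,  0, 75),  # entry 4 (center)
    (-1, +1, +1, -1, 70),  # entry 5
    (+1, -1, -1, +1, 63),  # entry 6
    (-1, +1, -1, +1, 46),  # entry 7
    (-1, -1, -1, -1, 37),  # entry 8
    (-1, -1, +1, +1, 17),  # entry 9
]
NAMES = ("equiv 2", "catalyst", "conc", "temp")
T_CRIT_4DF = 2.776  # two-sided 5% critical value of Student's t for 4 degrees of freedom

y = [row[4] for row in data]
cols = [[row[k] for row in data] for k in range(4)]

# Orthogonal, zero-sum coded columns: least squares reduces to
# intercept = mean(y) and slope_k = sum(x_k * y) / sum(x_k ** 2).
b0 = sum(y) / len(y)
slopes = [sum(x * yi for x, yi in zip(col, y)) / sum(x * x for x in col) for col in cols]

# Residual mean square of the main-effects model (9 runs - 5 coefficients = 4 df).
fitted = [b0 + sum(b * col[i] for b, col in zip(slopes, cols)) for i in range(len(y))]
dof = len(y) - (1 + len(slopes))
mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / dof

t_values = {}
for name, b, col in zip(NAMES, slopes, cols):
    t_values[name] = b / sqrt(mse / sum(x * x for x in col))
    verdict = "p < 0.05" if abs(t_values[name]) > T_CRIT_4DF else "p >= 0.05"
    print(f"{name:9s} coef {b:+6.2f}  t {t_values[name]:+5.2f}  {verdict}")
```

In this simplified model the center run has the largest residual, which inflates the error estimate; several coefficients then land near, rather than clearly below, the 0.05 threshold, which is exactly the situation in which chemical judgment should accompany the p-values.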

A Note on Desirability Functions and DoE Optimizations with Multiple Responses

When optimizing multiple responses in DoE, the ideal variables for maximizing yield are often not the ideal set of variables for maximizing another response (for example, % e.e.). In these cases, a degree of compromise is needed to locate conditions that satisfy both responses. In JMP, the software asks the user to specify whether each response should be maximized or minimized. After the experimental data are entered, a desirability function is generated for each response individually. In its simplest form, a desirability function translates the response data onto a scale from 0 to 1, where 1 is the most desirable outcome (e.g., 100% yield or 99% e.e.). The individual desirability functions are then combined (commonly as a geometric mean) into an overall desirability function capable of weighing multiple responses at the same time. In practice, the desirability function is less helpful when there is only one response, and it is most straightforward to use the response value directly (see the Prediction Profiler in Figures S7 and S8 in the Supporting Information).
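A minimal sketch of the calculation, assuming simple linear desirability ramps between user-chosen limits and a geometric-mean combination of the individual desirabilities (one common way to average them). The limits and response values below are hypothetical, and JMP's own desirability curves may be shaped differently, so this illustrates the idea rather than reproducing the software.

```python
from math import prod

def desirability_max(y: float, low: float, target: float) -> float:
    """For a response to maximize: 0 at or below `low`, 1 at or above `target`, linear between."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return (y - low) / (target - low)

def desirability_min(y: float, target: float, high: float) -> float:
    """For a response to minimize (e.g., a side product): 1 at or below `target`, 0 at or above `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return (high - y) / (high - target)

def overall(ds: list[float]) -> float:
    """Geometric mean: any single fully undesirable response (d = 0) drives the total to 0."""
    return prod(ds) ** (1 / len(ds))

# Hypothetical limits: yield desirable from 50% up to 100%, e.e. from 80% up to 99%.
d_yield = desirability_max(85.0, low=50.0, target=100.0)
d_ee = desirability_max(92.0, low=80.0, target=99.0)
print(round(d_yield, 3), round(d_ee, 3), round(overall([d_yield, d_ee]), 3))
```

The geometric mean is used here instead of an arithmetic mean so that conditions giving excellent yield but no selectivity (or the reverse) cannot score well overall.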

Validating the DoE Model

After determining the significant variables and performing subsequent optimizations, the optimal conditions should always be re-evaluated to ensure that the DoE-optimized conditions are valid and that the predicted optimal conditions lead to similar and reproducible results (see Figure 2). This means actually running the “optimized” conditions and checking whether they produce results similar to those predicted by the model. DoE should not be treated as a metaphorical “black box” that always provides a true optimum. If the predicted conditions do not lead to optimal or consistent results, refer to the Troubleshooting section of this guide.

5. Troubleshooting DoE

Once the results of a DoE are obtained, the synthetic chemist may need to troubleshoot certain situations to achieve the desired results. If several reactions resulted in no measurable response, then the boundaries of the chemical space may need to be adjusted. For example, if each reaction trial run at higher temperatures results in no yield, then the temperature boundaries should be lowered to explore a range of lower temperatures that will lead to measurable yield.

If the statistical analysis states that none of the variables or interactions are significant, then there may be another variable that has not been included in the design that should be considered. There may also be some source of systematic error leading to insignificant results. In the DoE design, we recommend including multiple center points to check for systematic error, as they should lead to similar results in each trial. Additionally, the DoE experiments may be separated into blocks that are run in separate sets to identify systematic error between experiments performed on different days. Finally, categorical variables such as the choice of solvent, ligands, or additives may also be reevaluated.
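The Python sketch below (our illustration) splits a 2³ full factorial into two blocks of four runs, for example two days of experiments, by assigning each run according to the sign of the three-variable interaction ABC, and adds center points to each block. Confounding the block difference with ABC, an interaction that is usually negligible, keeps the main effects and two-variable interactions free of any day-to-day shift, and comparing the center points across blocks gives a direct check for systematic error. The choice of two center points per block is an arbitrary assumption.

```python
from itertools import product

def blocked_design(center_points_per_block: int = 2) -> dict[int, list[tuple[int, int, int]]]:
    """2**3 factorial split into two blocks by the sign of A*B*C, plus center points in each."""
    blocks: dict[int, list[tuple[int, int, int]]] = {1: [], 2: []}
    for a, b, c in product((-1, 1), repeat=3):
        blocks[1 if a * b * c < 0 else 2].append((a, b, c))
    for runs in blocks.values():
        runs.extend([(0, 0, 0)] * center_points_per_block)
    return blocks

blocks = blocked_design()
for block, runs in blocks.items():
    print("block", block, runs)
```

Within each block, every main effect is still balanced (two runs at -1 and two at +1), so a shift between days does not bias those estimates.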

After the results of a DoE are obtained, if the reaction conditions do not reach the desired level of optimization or do not give consistent results, the most significant variables may be re-evaluated in a higher-resolution design, such as a Response Surface Design. Linear models often cannot accurately describe the chemical space, and quadratic models (though they require more experiments) can lead to better optimization. If neither design yields a reproducible model, consider constructing a smaller design space around the ideal conditions and increasing the number of replicates in the design.
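One simple way to build a smaller design space around the ideal conditions is to center new limits on the current best setting and shrink the range by a chosen factor, without stepping outside the original limits. The Python sketch below does this; the function name, the 50% shrink factor, and the example numbers are arbitrary assumptions for illustration.

```python
def shrink_bounds(best: float, low: float, high: float, factor: float = 0.5) -> tuple[float, float]:
    """New (low, high) of width factor*(high - low), centered on `best` but kept inside [low, high]."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    width = factor * (high - low)
    new_low = min(max(best - width / 2, low), high - width)
    return new_low, new_low + width

print(shrink_bounds(55.0, 25.0, 75.0))  # best temperature well inside the range
print(shrink_bounds(72.0, 25.0, 75.0))  # best value near the upper limit: window slides inward
```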

Case Study I: Thiooxime Ester Condensation

The reported conditions for oxime condensation between S-benzylthiohydroxylamine (SBTHA, 1) and acetophenone (2) gave insufficient yields of the condensation product, thiooxime ester 3 (typically <40%). Moreover, a large excess (>10 equiv) of 2 was required for the reaction, and this excess proved difficult to separate from the desired product 3. We therefore sought to use DoE to understand the variable main effects in the synthesis of 3. Since 3 was an intermediate in our synthetic route and derived from readily accessible starting materials, we used a Fractional Factorial Design. In this case, the interaction effects between variables are confounded and therefore sacrificed for the sake of fewer overall reactions. Instead of describing the solvent or acid catalyst as continuous variables, we used the previously reported solvent and acid catalyst from the literature and optimized all other continuous variables (e.g., concentration, temperature, stoichiometry). For the variable boundaries, we explored the effects of temperature (23–61 °C), equivalents of 2 (1–5 equiv), acid catalyst loading (5–25 mol %), and reaction concentration (0.5–2 M).

The design generated the set of reactions shown in Table 1. After these 9 reactions, the model showed that the higher catalyst loading of 25 mol % was generally preferred and that serviceable yields could be maintained with only 5 equiv of the ketone partner 2. Analysis of these data using the surface profiler tool in JMP (Figure 4) shows that increasing the temperature lowered the yield of the reaction, whereas increasing the TFA catalyst loading and the reaction concentration could enable consistently good yields (>95%) without a >10 equiv excess of 2. Further optimization could have been performed on this system to characterize variable interaction effects and curvature and to build a complete model of the chemical space. For our purposes, however, a reaction that provided >80% yield with reduced reagent stoichiometry was sufficient, so no higher-order DoE design was needed.

Table 1. DoE Optimization for the Synthesis of Thiooxime Esters.

entry   2 (equiv)   catalyst (mol %)   [1] (M)   temp. (°C)   yield of 3 (%)ᵃ
1       5           25                 0.5       23           82
2       5           5                  2.0       23           81
3       5           25                 2.0       61           76
4       3           15                 1.3       43           75
5       1           25                 2.0       23           70
6       5           5                  0.5       61           63
7       1           25                 0.5       61           46
8       1           5                  0.5       23           37
9       1           5                  2.0       61           17

ᵃYield determined by NMR integration against anisole internal standard.
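To show what the JMP surface profiler summarizes, the sketch below computes each main effect by hand from the eight corner runs of Table 1 (main effect = mean yield at the high level minus mean yield at the low level) and uses them in a first-order model, holding out the center point (entry 4) as a check. This is a simplified stand-in for the JMP analysis, which may also use the center point and report significance tests, so its numbers are illustrative only.

```python
# Table 1 corner runs in coded units (A = equiv 2, B = catalyst loading,
# C = [1], D = temperature; -1 = low, +1 = high) with the observed yield of 3.
runs = [
    # (A,  B,  C,  D, yield %)   Table 1 entry
    (+1, +1, -1, -1, 82),        # 1
    (+1, -1, +1, -1, 81),        # 2
    (+1, +1, +1, +1, 76),        # 3
    (-1, +1, +1, -1, 70),        # 5
    (+1, -1, -1, +1, 63),        # 6
    (-1, +1, -1, +1, 46),        # 7
    (-1, -1, -1, -1, 37),        # 8
    (-1, -1, +1, +1, 17),        # 9
]
names = ["equiv 2", "catalyst", "[1]", "temp"]
yields = [r[4] for r in runs]
grand_mean = sum(yields) / len(yields)

# Main effect = mean yield at the high level minus mean yield at the low level.
effects = {}
for j, name in enumerate(names):
    hi = [r[4] for r in runs if r[j] == +1]
    lo = [r[4] for r in runs if r[j] == -1]
    effects[name] = sum(hi) / len(hi) - sum(lo) / len(lo)


def predict(coded):
    """Main-effects model in coded units: y = mean + sum(effect / 2 * x)."""
    return grand_mean + sum(effects[n] / 2 * x for n, x in zip(names, coded))


print(effects)                    # effects of +33.0, +19.0, +4.0 and -17.0 points
print(predict((+1, +1, +1, -1)))  # 5 equiv, 25 mol %, 2.0 M, 23 °C -> 95.5
print(predict((0, 0, 0, 0)), "vs observed center point (entry 4): 75")
```

The signs agree with the discussion above: more equivalents of 2 and higher catalyst loading raise the yield, higher temperature lowers it, and concentration has a small positive effect. The model predicts 95.5% at 5 equiv of 2, 25 mol % catalyst, 2.0 M, and 23 °C, a combination not run in the half fraction, which is in line with the >95% figure quoted above. The observed center-point yield (75%) is well above the model's 59%, hinting at curvature that a linear model cannot capture; with a single center point, however, this difference cannot be separated from run-to-run error.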

Figure 4. Surface Profiler from JMP showing the effect of each variable on the response (percent yield of 3) of the reaction.

Case Study II: Oxaborane Catalyst Optimization

Oxaboranes, a class of boron–oxygen heterocycles, are produced via a nickel-catalyzed dearylative cyclocondensation of aldehydes and alkynes with triphenylborane. Initial studies of this reaction produced the oxaborane product in 70% yield in the presence of 10 mol % Ni(cod)2 as the precatalyst. DoE was leveraged to increase the yield of the reaction and to lower the loadings of catalyst and triphenylborane. First, the categorical variables, including ligand and solvent, were evaluated. Tributylphosphine was the optimal ligand, and changing the solvent from toluene to THF produced the oxaborane product in quantitative yield.

The initial DoE used a linear model to evaluate the significance of four continuous factors: catalyst loading, reaction concentration, temperature, and triphenylborane loading. After 14 reactions, the model showed that temperature and concentration did not significantly affect the yield over the ranges tested, whereas increasing the amount of triphenylborane increased the yield of the oxaborane product.

A second DoE, using a Response Surface Design, reevaluated catalyst loading, temperature, and triphenylborane loading (Table 2). Although temperature showed a low level of significance in the linear model, we hypothesized that this was due to the high range of temperatures evaluated in the first DoE. Because lower temperatures were predicted to be slightly better, we reevaluated temperature over a lower range and included it in the quadratic model. After 15 reactions, the JMP prediction profiler showed that the optimal temperature was in the middle of the range tested (Figure 5).
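The run pattern in Table 2 (twelve runs that place one factor at its middle level and the other two at their extremes, plus three replicated center points) matches a three-factor Box-Behnken layout, a common Response Surface design. The text does not name the specific design type, so treat this as our reading of the table. The sketch builds the design in coded units and maps it onto the Table 2 levels through explicit level lists, because the middle catalyst and triphenylborane levels (5 mol %, 1.5 equiv) are not exact midpoints of their ranges (5.5 mol %, 1.55 equiv).

```python
from itertools import combinations, product

# Levels tested in Table 2 for each factor: (low, middle, high).
LEVELS = [
    (1, 5, 10),       # catalyst loading (mol %)
    (1.1, 1.5, 2.0),  # BPh3 (equiv)
    (25, 50, 75),     # temperature (°C)
]


def box_behnken(k, n_center):
    """Coded Box-Behnken design built from all factor pairs.

    For every pair of factors, run a 2^2 factorial at -1/+1 while the other
    factors sit at 0, then append n_center replicated center points. For
    three factors this pairwise construction is the standard layout.
    """
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs += [(0,) * k] * n_center
    return runs


coded = box_behnken(k=3, n_center=3)
# Coded -1/0/+1 selects the low/middle/high entry of each factor's levels.
real = [tuple(levels[x + 1] for x, levels in zip(run, LEVELS)) for run in coded]

print(len(real))  # -> 15 (12 edge midpoints + 3 center points)
for run in real:
    print(run)
```

One practical feature of this layout is that no run combines all factors at their extremes at once (for example, 10 mol % catalyst with 2.0 equiv of triphenylborane at 75 °C), which can help when corner conditions are costly or impractical.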

Table 2. DoE Optimization for the Synthesis of Oxaboranes.

entry   catalyst (mol %)   BPh3 (equiv)   temp. (°C)   yield of 6 (%)ᵃ
1       1                  2              50           99
2       1                  1.1            50           58
3       5                  2              25           99
4       1                  1.5            25           33
5       1                  1.5            75           99
6       10                 1.5            75           31
7       5                  1.1            25           82
8       10                 2              50           99
9       5                  1.5            50           98
10      10                 1.1            50           76
11      10                 1.5            25           80
12      5                  1.5            50           93
13      5                  1.1            75           55
14      5                  2              75           77
15      5                  1.5            50           89

ᵃYield determined by NMR integration against dibromomethane internal standard.

Figure 5. Surface Profiler from JMP showing the effect of each variable on the response (percent yield of 6) of the reaction.

To minimize the amount of catalyst used, the equivalents of triphenylborane were increased to maintain a >90% yield. Note that the prediction profiler may show a predicted yield greater than 100%, which is, of course, not possible. Rather than a single specific set of reaction conditions leading to quantitative yield, there may be a range of reaction conditions that all do so; predicted yields greater than 100% therefore indicate conditions expected to give quantitative yield. While 1 mol % catalyst (with 1.5 equiv of triphenylborane at 50 °C) furnished our desired product in high yield (Figure ), 5 mol % was used in the optimized conditions because this catalyst loading consistently led to near-quantitative yield of the oxaborane product. The response surface plots (Figure 6A,B) show the curvature of the reaction surface captured by the quadratic model, allowing a visual interpretation of the effect of each variable on the percent yield of the reaction.
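Predictions above 100% arise because a polynomial model is not bounded by physical limits. As a hedged sketch (not the JMP implementation), the code below fits a full quadratic model to the 15 runs in Table 2 by ordinary least squares in coded units and reports each prediction both raw and capped at 100%. The coding (midpoint and half-range of each tested range) and all helper names are our own assumptions; JMP's coding and term selection may differ, so the coefficients need not match Figure 5.

```python
from itertools import combinations

# Table 2 runs: (catalyst mol %, BPh3 equiv, temp °C, yield of 6 %).
DATA = [
    (1, 2.0, 50, 99), (1, 1.1, 50, 58), (5, 2.0, 25, 99), (1, 1.5, 25, 33),
    (1, 1.5, 75, 99), (10, 1.5, 75, 31), (5, 1.1, 25, 82), (10, 2.0, 50, 99),
    (5, 1.5, 50, 98), (10, 1.1, 50, 76), (10, 1.5, 25, 80), (5, 1.5, 50, 93),
    (5, 1.1, 75, 55), (5, 2.0, 75, 77), (5, 1.5, 50, 89),
]
# Assumed coding: map each factor's tested (low, high) range onto (-1, +1).
RANGES = [(1, 10), (1.1, 2.0), (25, 75)]


def coded_row(cat, bph3, temp):
    return [(v - (lo + hi) / 2) / ((hi - lo) / 2)
            for v, (lo, hi) in zip((cat, bph3, temp), RANGES)]


def quad_terms(z):
    """Full quadratic terms: intercept, linear, two-factor interactions, squares."""
    pairs = combinations(range(len(z)), 2)
    return [1.0] + list(z) + [z[i] * z[j] for i, j in pairs] + [zi * zi for zi in z]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


X = [quad_terms(coded_row(*row[:3])) for row in DATA]
y = [row[3] for row in DATA]
p = len(X[0])  # 10 coefficients for three factors

# Ordinary least squares via the normal equations (X^T X) beta = X^T y.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)
residuals = [yi - sum(b * t for b, t in zip(beta, r)) for r, yi in zip(X, y)]


def predict(cat, bph3, temp):
    """Return (raw model prediction, prediction capped at 100% yield)."""
    raw = sum(b * t for b, t in zip(beta, quad_terms(coded_row(cat, bph3, temp))))
    return raw, min(raw, 100.0)


print(predict(1, 1.5, 50))  # the 1 mol % condition discussed above
```

Capping is only a reporting convention: as noted in the text, any raw value above 100% simply flags conditions expected to give quantitative yield.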

Figure 6. (A) 3D response surface analysis of yield of oxaborane 6 versus catalyst loading and equivalents of triphenylborane at 50 °C. (B) 3D response surface analysis of yield of oxaborane 6 versus catalyst loading and temperature with 1.5 equiv of triphenylborane.

Other Helpful Reviews & Resources

Design of Experiments for Nanocrystal Syntheses: A How-To Guide for Proper Implementation

Chemistry of Materials 2022, 34, 9823–9835

While this article utilizes DoE for optimizing the properties of nanocrystal materials, synthetic organic chemists will find utility in the article’s more advanced explanation of core DoE concepts. We recommend this article as a logical next step for beginners looking for an intermediate guide to DoE.

The Application of Design of Experiments (DoE) Reaction Optimization and Solvent Selection in the Development of New Synthetic Chemistry

Organic & Biomolecular Chemistry 2016, 14, 2373

This article provides an open-source solvent map generated with principal component analysis (PCA) and discusses parametrizing solvents for DoE.

Design of Experiments (DoE) and Process Optimization. A Review of Recent Publications

Organic Process Research & Development 2015, 19, 1605–1633

This article provides a review of examples where DoE was utilized in industrial process chemistry to assist in impurity rejection and scale-up.

Efficiency by Design: Optimization in Process Research

Organic Process Research & Development 2001, 5, 308–323

This article provides a framework for DoE implementation in industrial process chemistry and guides the reader through a case study.

A Brief Introduction to Chemical Reaction Optimization

Chemical Reviews 2023, 123, 3089–3126

For those interested in other ways to optimize chemical reactions, this review provides an entry point for understanding other approaches to reaction optimization such as kinetic modeling, optimization algorithms, high-throughput experimentation (HTE), data mining, and machine learning.

Conclusions

Design of Experiments methodology represents a powerful tool for the optimization of chemical reactions. Its limited use in academic settings has long stemmed from the lack of an entry-level guide that provides a basic starting point for implementation. The framework provided here enables new DoE users to rapidly understand the best practices that underlie the selection of DoE responses, variables, design types, and data analysis. Case Studies I and II provide examples of this logical flow in practice, with careful explanation of data interpretation provided in the main text and the Supporting Information guide. We hope that, as the academic community continues to embrace chemoinformatics, statistically guided data analysis such as DoE will become commonplace. We believe that this guide will enable future scientists to quickly understand and apply DoE concepts in their research.

Supplementary Material

cs5c01626_si_001.pdf (1.1MB, pdf)

Acknowledgments

The authors acknowledge the National Science Foundation under award number 2404390.

The data underlying this study are available in the published article and its Supporting Information.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c01626.

  • General methods, step-by-step guides to 2-Level Full Factorial Design setup in JMP and to Response Surface Designs setup in JMP, references (PDF)


B.J.W. and M.T.K. conceptualized and developed the original draft of the manuscript. All authors contributed to the review and editing of the final manuscript.

The authors declare no competing financial interest.

References

  1. Carlson R., Carlson J. Design and Optimization in Organic Synthesis, Second Revised and Enlarged ed.; Data Handling in Science and Technology; Elsevier Science, 2005.
  2. Owen M. R., Luscombe C., Lai Godbert S., Crookes D. L., Emiabata-Smith D. Efficiency by Design: Optimisation in Process Research. Org. Process Res. Dev. 2001;5:308–323. doi: 10.1021/op000024q.
  3. Wahid Z., Nadir N. World Applied Sciences Journal. 2012;21:56–61.
  4. Lendrem D. W., Lendrem B. C., Woods D., Rowland-Jones R., Burke M., Chatfield M., Isaacs J. D., Owen M. R. Drug Discovery Today. 2015;20:1365–1371. doi: 10.1016/j.drudis.2015.09.015.
  5. Denmark S. E., Butler C. R. J. Am. Chem. Soc. 2008;130:3690–3704. doi: 10.1021/ja7100888.
  6. Stone S., Wang T., Liang J., Cochran J., Green J., Gu W. Org. Biomol. Chem. 2015;13:10471–10476. doi: 10.1039/C5OB01154J.
  7. Murray P. M., Bellany F., Benhamou L., Bučar D.-K., Tabor A. B., Sheppard T. D. Org. Biomol. Chem. 2016;14:2373–2384. doi: 10.1039/C5OB01892G.
  8. Bowden G. D., Pichler B. J., Maurer A. Design of Experiments (DoE) Approach Accelerates the Optimization of Copper-Mediated 18F-Fluorination Reactions of Arylstannanes. Sci. Rep. 2019;9:11370. doi: 10.1038/s41598-019-47846-6.
  9. Koeritz M. T., Burgett R. W., Kadam A. A., Stanley L. M. Org. Lett. 2020;22:5731–5736. doi: 10.1021/acs.orglett.0c01607.
  10. Bouveyron T., Bratenberg P., Bell P., Eisenacher M. Catalysts. 2024;14(6):360. doi: 10.3390/catal14060360.
  11. Kucmierczyk P., Duehren R., Sang R., Jackstell R., Beller M., Franke R. ACS Sustainable Chem. Eng. 2022;10:4822–4830. doi: 10.1021/acssuschemeng.1c05871.
  12. Weissman S. A., Anderson N. G. Org. Process Res. Dev. 2015;19:1605–1633. doi: 10.1021/op500169m.
  13. Aggarwal V. K., Staubitz A. C., Owen M. Org. Process Res. Dev. 2006;10:64–69. doi: 10.1021/op058013q.
  14. Lendrem D., Owen M., Godbert S. Org. Process Res. Dev. 2001;5:324–327. doi: 10.1021/op000025i.
  15. Linsley M., McGeeney D. Design of Experiments for Chemists: Introductory Statistical Methods; Royal Society of Chemistry, 2023.
  16. Leardi R. Experimental design in chemistry: A tutorial. Analytica Chimica Acta. 2009;652(1):161–172. doi: 10.1016/j.aca.2009.06.015.
  17. Taylor C. J., Pomberger A., Felton K. C., Grainger R., Barecka M., Chamberlain T. W., Bourne R. A., Johnson C. N., Lapkin A. A. Chem. Rev. 2023;123:3089–3126. doi: 10.1021/acs.chemrev.2c00798.
  18. Goos P., Jones B. Optimal Design of Experiments; John Wiley & Sons, Ltd, 2011; pp 9–45. doi: 10.1002/9781119974017.
  19. Gensch T., dos Passos Gomes G., Friederich P., Peters E., Gaudin T., Pollice R., Jorner K., Nigam A., Lindner-D’Addario M., Sigman M. S., Aspuru-Guzik A. J. Am. Chem. Soc. 2022;144:1205–1217. doi: 10.1021/jacs.1c09718.
  20. Tolman C. A. Chem. Rev. 1977;77:313–348. doi: 10.1021/cr60307a002.
  21. Dorta R., Stevens E. D., Scott N. M., Costabile C., Cavallo L., Hoff C. D., Nolan S. P. J. Am. Chem. Soc. 2005;127:2485–2495. doi: 10.1021/ja0438821.
  22. Escayola S., Bahri-Laleh N., Poater A. Chem. Soc. Rev. 2024;53:853–882. doi: 10.1039/D3CS00725A.
  23. Newman-Stonebraker S. H., Smith S. R., Borowski J. E., Peters E., Gensch T., Johnson H. C., Sigman M. S., Doyle A. G. Science. 2021;374:301–308. doi: 10.1126/science.abj4213.
  24. Diorazio L. J., Hose D. R. J., Adlington N. K. Org. Process Res. Dev. 2016;20:760–773. doi: 10.1021/acs.oprd.6b00015.
  25. Tye H. Drug Discovery Today. 2004;9:485–491. doi: 10.1016/S1359-6446(04)03086-7.
  26. SAS Institute Inc. JMP Pro 16.0. https://www.jmp.com/, 1989–2025; Accessed: 2025-04-20.
  27. Sartorius AG. MODDE. https://www.sartorius.com/en/products/process-analytical-technology/data-analytics-software/doe-software/modde/, 2025; Accessed: 2025-04-20.
  28. Stat-Ease Inc. Design-Expert. https://www.statease.com/software/design-expert/, 1988–2025; Accessed: 2025-04-20.
  29. Williamson E. M., Sun Z., Mora-Tamez L., Brutchey R. L. Chem. Mater. 2022;34:9823–9835. doi: 10.1021/acs.chemmater.2c02924.
  30. NIST/SEMATECH e-Handbook of Statistical Methods, 5.3.3.4.4. Fractional factorial design specifications and design resolution. https://www.itl.nist.gov/div898/handbook/pri/section3/pri3344.htm/, Accessed: 2025-04-20.
  31. Martinez J.-M., Collette Y., Christopoulou M., Baudin M. pyDOE: The experimental design package for python. Scilab, 2009.
  32. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  33. Islam M. A., Sakkas V., Albanis T. A. Journal of Hazardous Materials. 2009;170:230–238. doi: 10.1016/j.jhazmat.2009.04.106.
  34. Harrington E. C. The Desirability Function. Industrial Quality Control. 1965;21:494–498.
  35. Derringer G., Suich R. Simultaneous Optimization of Several Response Variables. Journal of Quality Technology. 1980;12:214–219.
  36. Foster J. C., Powell C. R., Radzinski S. C., Matson J. B. Org. Lett. 2014;16:1558–1561. doi: 10.1021/ol500385a.
  37. Wall B. J., Byerly-Duke J., VanVeller B. J. Org. Chem. 2024;89:15312–15316. doi: 10.1021/acs.joc.4c01571.
  38. Koeritz M. T., Banovetz H. K., Prell S. A., Stanley L. M. Chem. Sci. 2022;13:7790–7795. doi: 10.1039/D2SC01840C.



Articles from ACS Catalysis are provided here courtesy of American Chemical Society
