Abstract
Differential Evolution (DE) has become one of the leading metaheuristics in the class of Evolutionary Algorithms, which consists of methods that operate on survival-of-the-fittest principles. This general purpose optimization algorithm is viewed as an improvement over Genetic Algorithms, which are widely used to find solutions to chemometric problems. Using straightforward vector operations and random draws, DE can provide fast, efficient optimization of any real, vector-valued function. This article reviews the basic algorithm and a few of its enhanced variants. We provide guidance for practitioners, discuss implementation issues and give illustrative applications of DE with the corresponding R codes to find different types of optimal designs for various statistical models in chemometrics that involve the Arrhenius equation, reaction rates, concentration measures and chemical mixtures.
Key words and phrases: D-optimality, Evolutionary Algorithms, Experimental Design, Mixture Experiments, Reaction Rates
1. Introduction
Nature-inspired, metaheuristic approaches to solve all kinds of optimization problems have skyrocketed in the last few decades, especially in engineering and computer science [1–2]. Some key reasons for their popularity are their speed, simplicity, flexibility, availability of computer codes and ease of implementation. Additional compelling reasons for their widespread use are: (a) they make essentially no regularity assumptions about the objective function, so they can be applied to solve very different types of optimization problems, including those in which the objective function is non-differentiable, multi-modal, multi-objective or high dimensional, and (b) they tend to provide exact or approximate solutions of high quality to complicated optimization problems even though there is often no rigorous proof of convergence.
As the name suggests, nature-inspired metaheuristic algorithms are motivated by the behavior of animals or other natural processes. Some examples of popular nature-inspired metaheuristic algorithms are Genetic Algorithms [3], Cuckoo Search [4], Ant Colony Optimization [5], Grey Wolf Optimization [6], Jumping Frogs Optimization [7], Bat Algorithm [8], and Particle Swarm Optimization (PSO) [9], among many others. While all belong to the same general family of evolutionary algorithms, each of these methods operates with its own set of tuning parameters and behavioral characteristics. Naturally, some tend to perform better than others in selected situations. References [1–2] provide an overview of the development of nature-inspired metaheuristic algorithms and show systematically how they now dominate the optimization literature for solving real world problems.
In the field of metaheuristic algorithms, there are two primary classes: swarm intelligence and genetic representations. The former includes methods in which a group of agents work together to explore the search space and share information as necessary. While many of the algorithms listed above fall into this category, we will instead focus on the latter in this work. Methods in the sub-class of genetic representations involve a set of agents, each made up of chromosomes and genes, in an evolutionary, survival-of-the-fittest battle to be part of the best generation. Fraser appeared to be the first to study candidate solutions in the framework of genetic representations [10]. From this framework, two main algorithms have been developed. Holland established the original Genetic Algorithm (GA) in 1975, and more recently, as computational power has grown, Price and Storn created Differential Evolution (DE) in 1997 [11–12]. Since then, many extensions have been proposed to improve their performance in different situations [13–15].
The primary difference between these two algorithms is that GA is used to search exclusively over discrete spaces while DE can also be used over any continuous space. Current optimization literature suggests that in many situations, DE produces better, more stable solutions than GA [16–17]. DE employs only vector operations and random number generators, allowing it to compete effectively with the speed of the simpler, yet limited, GA. Additionally, DE has been shown to work well for complex problems, such as solving multiple-objective problems, finding clustering weights, and training neural networks [15]. However, despite these advantages, GA is still better known and more popular than DE in many disciplines. For example, in chemometrics, there are well over 100 recently published articles that use GA as of June 2019, but only a handful that apply DE, see [18–20] and [21–23]. These few articles do not provide details and implementation issues for DE or its many variants. Interestingly, DE is also rarely used for optimization in the statistical literature. To date, there are only a handful of papers in statistical journals that use DE as a general tool for solving various types of optimization problems [24–26].
The aim of this paper is to describe DE in more detail and show the usefulness and versatility of DE as yet another powerful tool that requires few, if any, assumptions to solve complex optimization problems. In what follows, we review the fundamental concepts of DE and demonstrate how it solves optimization problems in chemometrics through the design of efficient chemical experiments. We do not compare the performance of DE with its competitors, such as PSO or Cuckoo search, since several papers, including [27], have reported that DE outperforms several other nature-inspired metaheuristics.
In the next section, we describe the key components of DE, the various steps of the algorithm and both the empirical and theoretically-motivated rules for tuning its parameters. Section 3 briefly reviews optimal design methodology and Section 4 demonstrates how DE can be applied to find optimal designs for various linear and non-linear models. Section 5 concludes with a discussion of recent enhancements of the basic DE. Some of these improvements target specific types of optimization problems, while others refine the algorithm itself, such as more effective ways of finding suitable tuning parameters. Codes for generating the optimal designs discussed in this paper can be found in the supplementary materials.
2. DE Algorithm and Developments
Differential Evolution is a general purpose evolutionary algorithm and was proposed by Rainer Storn and Kenneth Price in 1997 as a means of quickly optimizing functions that do not necessarily have nice properties, like differentiability or continuity, which are required for many standard optimization procedures [11, 28]. The goal of DE is to find the optimal solution efficiently and to do so in an easy-to-implement way. Below we present the standard formulation of DE and then discuss additional considerations for choosing the key features that determine its computational cost and convergence.
2.1. Algorithm Overview
Without loss of generality, suppose we want to minimize a real-valued function h of V variables by finding a point x* in the search space X such that h(x*) ≤ h(x) for all x ∈ X. The search space X of candidate solutions is defined by the limits of each of the V variables and constitutes the landscape of the fitness function. These limits may be specified naturally by the application or selected by the experimenter.
Having defined the problem, the next phase is to initialize the DE algorithm. The first step is to choose the number of candidate solutions per generation, P, also known as the population size. Each solution is represented by a vector of length V, so each generation of candidate solutions has dimension V × P. In addition to the population size, there are two parameters F and CR to be selected. We defer discussing the roles and bounds of these parameters but note that there are several approaches for choosing their values. The final step in the initialization process is to specify a stopping rule or condition. In our case we consider a stopping condition defined by a maximum number of generations Gmax. Once these preliminary steps are complete, the five basic steps for the DE algorithm are as follows:
- Genetic Representation: The initial population must be of size P > 4 to ensure that there is enough genetic diversity. Members of the population are candidate solutions of the optimization problem and are called agents. Each is represented by a vector of length V and labeled x1, … , xP. These comprise the first generation of solutions. The initial value for each entry in each agent is randomly chosen over the interval specified for the particular variable.
- Mutation: Much like GA, a mutation process helps to expand the search space of DE. For each target vector xi, this mutation produces a “donor” vector vi by adding the weighted difference of two agents to a third, all randomly chosen and distinct from the target. For this process the weighting factor F is chosen on the interval [0, 2]. Thus, with this constant, P donor vectors are created according to vi = xr0 + F(xr1 − xr2), where r0 ≠ r1 ≠ r2 ≠ i. The agent xr0 is referred to as the base vector used for generating the donor vector vi. Depending on the bounds and the choice of F, this process may in some cases lead to values that fall outside of the acceptable region for a variable. There are many strategies in the general optimization literature for dealing with this problem, but we do not explicitly discuss them here.
- Crossover: Crossover blends the current generation of agents with the population of donor vectors in order to form candidates for the next generation known as “trial” vectors. This process differs from the crossover mechanisms typically used in GA in that a decision is made for each element of the vector and not at a few defined points. In DE this technique requires the crossover constant CR, chosen from [0, 1]. For each i from 1 to P, one of the V elements of the donor vector vi is randomly selected to directly enter the trial vector ui. In this way one variable is forced to change so that each ui will certainly differ from its original target vector xi. Next, with probability CR, further elements are taken from vi and placed in the trial vector; whichever variables do not take their value from the donor vector inherit their original value from xi. Assuming element j* is the one randomly chosen to come from the donor and randj denotes a uniform draw on [0, 1] for element j, this process is driven by the following equation:
ui,j = vi,j if randj ≤ CR or j = j*, and ui,j = xi,j otherwise, for j = 1, … , V. (1)
- Selection: Selection creates the next generation of agents by comparing each target vector to its respective trial vector. Whichever is measured to be the more fit using h becomes the agent of the next generation. In the case of minimization this process is given by
xi,G+1 = ui,G if h(ui,G) ≤ h(xi,G), and xi,G+1 = xi,G otherwise. (2)
- Repeat: Repeat steps 2 through 4 over many generations until Gmax is reached or another specified stopping condition is satisfied.
These 5 steps are summarized in Algorithm 1.
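The five steps above can be condensed into a compact, self-contained sketch. The supplementary materials provide the R codes used in this paper; the sketch below is written in Python purely for illustration, and the function name `de_minimize` and the boundary-clipping repair rule are our own choices rather than part of the canonical algorithm.

```python
import random

def de_minimize(h, bounds, P=20, F=0.8, CR=0.9, Gmax=100, seed=1):
    """Minimal DE/rand/1/bin sketch following the five steps above."""
    rng = random.Random(seed)
    V = len(bounds)
    # Step 1 (genetic representation): random initial population in bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(P)]
    fit = [h(x) for x in pop]
    for _ in range(Gmax):
        for i in range(P):
            # Step 2 (mutation): donor = base + F * (difference of two agents),
            # with r0, r1, r2 distinct and different from the target i.
            r0, r1, r2 = rng.sample([j for j in range(P) if j != i], 3)
            donor = [pop[r0][j] + F * (pop[r1][j] - pop[r2][j]) for j in range(V)]
            # Clip the donor back into the search space (one common repair rule).
            donor = [min(max(donor[j], bounds[j][0]), bounds[j][1]) for j in range(V)]
            # Step 3 (crossover): binomial crossover with one forced coordinate.
            jrand = rng.randrange(V)
            trial = [donor[j] if (j == jrand or rng.random() < CR) else pop[i][j]
                     for j in range(V)]
            # Step 4 (selection): keep whichever of target/trial is fitter.
            ft = h(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    best = min(range(P), key=lambda i: fit[i])
    return pop[best], fit[best]

# Example: minimize the 2-D sphere function on [-5, 5]^2.
x, fx = de_minimize(lambda x: sum(v * v for v in x), [(-5, 5)] * 2)
```

With the settings shown, the best agent of the final generation lies very close to the global minimum of the sphere function at the origin.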

2.2. Parameter Tuning
The DE algorithm has only 3 tuning parameters, P, F, and CR, which is fewer than many other metaheuristic algorithms, but their values play a large role in the convergence of the algorithm. Improper selection of their values could cause the algorithm to become stuck in a local optimum if the agents do not efficiently explore the space. The process for tuning each parameter is in part driven by the application of interest and prior knowledge of the function landscape. However, it is also important to consider whether any modifications should be made to the standard algorithm and how these may affect the selection of parameter values. In this section we describe some of the adaptations at each step in basic DE and discuss general guidelines for tuning each parameter. For a full study of the individual parameters and choices for other adjustable features of DE, see [12]. There are also several variants of DE that change the underlying process and in some cases add additional parameters or change the usage of the existing ones. We review some of the most substantial of these variants and hybrid algorithms in Section 5.
In the initialization phase, an important choice is the number of agents P to be used per generation. The search space itself is usually determined by the application of interest, but the value of P must be carefully considered to enhance performance. In setting this parameter it is important to strike a balance between having enough points to explore the space and the computational time of the program. For example, in a space where P = 6 gives sufficient new information in each generation for the algorithm to converge in 100 generations, choosing P = 12 may not provide any more information per generation but will double the time complexity of each one. If the fitness function is computationally difficult this could result in a drastic slow-down of the method. In fact, in many instances in which the stopping criterion is a fixed number of generations Gmax, it is better to fix an appropriate number of allowable function evaluations and ensure that P × Gmax does not exceed this number for all choices of P. This approach necessitates balancing between P and Gmax to achieve the best result. Beyond this, a general rule of thumb is that P should be at least 10 times the number of function inputs V to ensure sufficient diversity [12].
The mutation factor F is arguably the most sensitive DE parameter. Its value determines the trade-off between exploration and exploitation of the search space: a large mutation factor will move points quickly across the space towards the general direction of the global optimum, but it will be difficult to make the smaller, exact steps necessary to reach a precise value. There are many solutions available to this problem, including altering the underlying mutation procedure to one that is better suited to the search space of interest, dithering (using a new F for each agent), jittering (using a new F for each variable), and implementing a self-adaptive DE variant that chooses a new mutation strategy and parameter value for each agent. Such methods could be adjusted to give an annealing effect to F that shrinks it as the agents get closer to the optimum so that they can make more precise steps. As far as general rules for selecting F, [29] found a lower bound based on the values of P and CR such that choosing a value lower than that will cause the diversity of each generation to decrease as the total number of generations increases. On the other hand, empirical evidence has indicated that values chosen between 0.6 and 0.9 will perform well for general optimization problems [12].
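Dithering and jittering are simple to implement: each replaces the single scalar F with fresh random draws. The Python sketch below is illustrative only; the sampling range [0.5, 1.0] is an assumed, commonly quoted choice rather than a prescription from this paper.

```python
import random

rng = random.Random(0)
F_LO, F_HI = 0.5, 1.0  # assumed dithering/jittering range for illustration
P, V = 10, 3           # population size and number of variables

# Dithering: draw one mutation factor per agent (i.e. per donor vector).
F_dither = [rng.uniform(F_LO, F_HI) for _ in range(P)]

# Jittering: draw one mutation factor per variable of each donor vector.
F_jitter = [[rng.uniform(F_LO, F_HI) for _ in range(V)] for _ in range(P)]

# In the mutation step, the scalar F is then replaced agent-wise (dithering)
# or element-wise (jittering); e.g. for agent i and variable j:
#   donor[j] = base[j] + F_jitter[i][j] * (x_r1[j] - x_r2[j])
```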
The crossover constant CR is a measure of the rate of mutation of the population. Even in cases where an appropriate F has been chosen, an inappropriate CR will prevent the agents from evolving at the right pace for escaping local optima. There are many procedures for altering the crossover strategy to better accommodate the search space and also self-adaptive procedures similar to those for the mutation factor, such as the one presented in [30]. If the standard selection procedure described in Section 2.1 is used, then the value of CR is naturally bounded between 0 and 1. However, the original authors of DE discovered through empirical study that the appropriate range of values is actually limited to CR ∈ [0, 0.3] ∪ [0.8, 1]. Small values of CR lead to convergence in cases where the fitness function can be rewritten as the sum of single-variable optimization problems (i.e. separable), and larger values are used if this is not possible [12].
In addition to the tuning parameters, the fitness function and the stopping condition determine the total amount of time the algorithm will run. The more complex the function to be optimized, in terms of magnitude, order, non-linearity, discontinuity, etc., the longer it will take to reach the optimum. Restricting the search region as much as possible can alleviate some of the costs of evaluating a complex fitness function many times. The stopping condition is user-specified and often based on a maximum number of iterations or function evaluations. Other popular options are computational time limits, convergence measures to test for small changes between generations, and stopping values if the algorithm is being used for benchmarking. The selection of a stopping condition is driven mainly by the application and the computational resources available.
3. An Overview of Optimal Design Methodology
Before showing how DE can be usefully applied to solve optimal design problems, we first briefly review background for constructing a model-based optimal design given a statistical model and an objective. Throughout we assume that there are resources to collect a set of N pre-determined observations for the study and that the assumed statistical model takes the form
yi = f(xi, β) + ϵi,  i = 1, … , N, (3)
where yi is the response at the vector of explanatory variables xi. The errors ϵi are identically and independently distributed with zero mean and some constant variance. f(xi, β) is a known continuous function assumed to capture the relationship between the input x and output y through the vector of unknown model parameters β. The region available for experimentation, known as the design space, is denoted by X and contains all possible values of the vectors xi, i = 1, … , N. We require the design space to be a compact set so that the optimum exists. If the model f(x, β) can be written as f(x)Tβ, we refer to it as linear. Otherwise we refer to it as non-linear.
Given the study objective and the fixed total number of observations N available for the study, our goal is to optimally select a set of so-called “support points” xi’s, i = 1, 2, … , k, from X to observe responses. If the model of interest has v experimental factors then each xi can be represented as a vector with components xi1, … , xiv. If each xi is replicated ni times, we have an exact design, subject to n1 + ⋯ + nk = N. It follows that the total number of variables to optimize over can increase rapidly as the number of support points k increases. For example, if we wish to estimate all parameters in a linear model with v = 5 and all main effects and two-factor interactions, then there are 16 parameters to estimate and k must be at least 16. This means that our optimization problem consists of a total of 95 variables, 16 × 5 = 80 factor settings plus 16 − 1 = 15 replicate counts. In practice, optimal designs for a high-dimensional model with several factors are rarely minimally supported (i.e. contain one support point per parameter), so the number of variables to optimize can be even larger.
The optimality criteria for measuring design fitness are generally very complicated mathematical formulations and very few model-criterion pairings result in simple, closed-form solutions to this challenging optimization problem. The analytic descriptions that have been found are primarily for simple models, such as low-order, linear polynomials or nonlinear models with one or two predictors. Consequently, determining optimal exact designs is difficult as there is no general theory for finding them or confirming the optimality of candidate designs. For this reason we refer to the exact designs found by DE as optimal exact designs, but note that their optimality cannot be confirmed; instead we rely on comparisons to designs implemented in practice.
An alternative approach is to instead work with approximate designs. These designs are defined by the proportion of runs pi = ni/N at each support point, with p1 + ⋯ + pk = 1. When the criterion is a convex function of the design, there are numerous important advantages of working with approximate designs versus exact designs. They include (i) a unified framework for studying and finding optimal approximate designs, (ii) theory to confirm the optimality of a design, and (iii) an assessment of proximity of a design to the optimum without knowing the latter. For background in approximate designs, see design monographs, such as [31–32], among several others.
In practice, an optimal design is implemented by rounding each Npi to the nearest integer and making sure the resulting replicate counts sum to N. This approximate design formulation was first presented by Kiefer and is now the standard approach to designing a study for nonlinear models and special cases of linear models [31]. For this reason our primary goal in each example is to find optimal approximate designs, but in many cases we also present the results of searching for optimal exact designs. We denote a k-point approximate design with weight pi at xi, i = 1, … , k by ψ. In what follows, we drop the “approximate” qualifier and assume that all designs are approximate unless otherwise specified.
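The rounding step just described must be done so that the replicate counts still sum to N. The sketch below (in Python, for illustration; the largest-remainder repair shown is one simple choice, not the only rounding procedure studied in the design literature) converts the weights pi of an approximate design into replicate counts ni with the correct total.

```python
def round_design(weights, N):
    """Round N*p_i to integer replicate counts that sum to N.

    Largest-remainder sketch: floor each N*p_i, then hand the leftover
    runs to the support points with the largest fractional parts.
    """
    exact = [N * p for p in weights]
    counts = [int(e) for e in exact]          # floor of each N*p_i
    shortfall = N - sum(counts)
    order = sorted(range(len(weights)), key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:shortfall]:
        counts[i] += 1
    return counts

# A two-point design with equal weights and N = 75 runs: naive rounding of
# 37.5 and 37.5 would give 76 runs, while this scheme returns 38 and 37.
n = round_design([0.5, 0.5], 75)
```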
Let ∇f(x, β) be the vector of partial derivatives of the mean function with respect to the model parameters β and define the normalized information matrix for the design ψ by
M(ψ, β) = p1∇f(x1, β)∇f(x1, β)T + ⋯ + pk∇f(xk, β)∇f(xk, β)T. (4)
The covariance matrix of the maximum likelihood estimates for the parameters β is inversely proportional to M(ψ, β), so making the information matrix large in some sense is desirable. One popular choice of criterion to meet this goal is to maximize the determinant of M. When errors are independent and normally distributed, it can be shown that such a design minimizes the volume of the confidence ellipsoid for β and thus provides the most precise estimates. This is commonly called D-optimality (with D standing for determinant) and the fitness function is defined as
h(ψ) = −log det M(ψ, β). (5)
The logarithmic function in this criterion serves a critical purpose; for fixed β, it can be shown that Equation (5) is a convex function of ψ in the space of all designs defined over X. Accordingly, a design that minimizes it over all designs on X is labeled D-optimal. For linear models, the information matrix does not depend on the parameters β, so any convex function of it can be minimized directly by choices of k, x1, … , xk and p1, … , pk. In the case of a non-linear model, the matrix M(ψ, β) depends directly on β, the very quantity we aim to estimate. The simplest way to overcome this circular issue is to assume a set of initial estimates or nominal values β0 for the parameters. These can come from expert knowledge, related experiments or a pilot study. With this approximation, the criterion only depends on ψ and can be optimized directly in the same manner as the linear case. We refer to such designs for non-linear models as locally D-optimal since they depend on the choice of nominal values [33].
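To make the D-criterion in Equation (5) concrete, the following Python sketch builds the information matrix of Equation (4) for a textbook case: quadratic regression on [−1, 1], whose D-optimal approximate design is classically known to put weight 1/3 at each of −1, 0 and 1. The helper names are our own, and the sketch is kept dependency-free rather than being the paper's R implementation.

```python
import math

def info_matrix(design, grad):
    """M(psi) = sum_i p_i * g(x_i) g(x_i)^T for a design [(x_i, p_i), ...]."""
    q = len(grad(design[0][0]))
    M = [[0.0] * q for _ in range(q)]
    for x, p in design:
        g = grad(x)
        for a in range(q):
            for b in range(q):
                M[a][b] += p * g[a] * g[b]
    return M

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def d_criterion(design, grad):
    """Fitness h(psi) = -log det M(psi); smaller is better."""
    return -math.log(det3(info_matrix(design, grad)))

# Quadratic regression y = b0 + b1*x + b2*x^2: the gradient is (1, x, x^2).
grad = lambda x: (1.0, x, x * x)

# The classical D-optimal design, equal weights at -1, 0, 1 ...
h_opt = d_criterion([(-1.0, 1/3), (0.0, 1/3), (1.0, 1/3)], grad)
# ... beats, for example, equal weights at -1, 0.5, 1.
h_alt = d_criterion([(-1.0, 1/3), (0.5, 1/3), (1.0, 1/3)], grad)
```

For the optimal design the determinant works out to 4/27, so any competing three-point design on [−1, 1] yields a larger value of the criterion (5).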
While there are many design criteria based on other functions of the information matrix, D-optimal designs are by far the most widely used in practice. Another design criterion for estimating parameters is to minimize the sum of the variances of the estimated parameters. This is tantamount to finding a design that minimizes the trace of M−1(ψ, β). Such a criterion can also be shown to be convex, and the design that minimizes it over all designs on X is known as the A-optimal design.
Sometimes a researcher may be interested in estimating a function of the model parameters. In this case, a different type of criterion is required. If c(β) is the user-selected function of interest, β0 is the nominal value for β and ∇c(β) is the derivative of c(β) with respect to β, we minimize ∇c(β0)T M−1(ψ, β0)∇c(β0) over all designs on X. This criterion minimizes the asymptotic variance of the estimated function of interest. We call such designs c-optimal. Applications of c-optimal designs are abundant. For example, suppose we want to find the dose level of a drug that produces an 80% response rate, or we want to estimate the lethal dose that results in a 5% death rate. In both of these cases, the quantities of interest can be expressed as a nonlinear function of the model parameters β and c-optimal designs can be used to properly estimate them.
For each of these criteria, we compare designs by measuring the relative magnitude of their fitness function values. For instance, if we want to compare two designs ψ1 and ψ2 under A-optimality, we use the ratio of their criterion values. Specifically, the A-efficiency of ψ1 relative to ψ2 is
A-eff(ψ1, ψ2) = tr M−1(ψ2, β) / tr M−1(ψ1, β). (6)
If the ratio is less than 1, ψ1 is less A-efficient than ψ2, and vice versa. It can be shown that if the above ratio is 0.5, then the design ψ1 requires twice as many observations to perform as well as ψ2. If ψ2 is known to be A-optimal, then we refer to this ratio as the A-efficiency of ψ1. Similar formulae hold for other design criteria, except for D-optimality, in which the ratio must be raised to the power 1/q, where q is the length of the parameter vector β. This transformation allows us to maintain the interpretation that design ψ1 with efficiency x requires 1/x times as many runs to achieve the same performance as ψ2.
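These efficiency calculations reduce to a few lines of arithmetic. The helper below (Python, illustrative; the function name and argument convention are our own) computes the relative efficiency from two criterion values, applying the 1/q power in the D-optimality case.

```python
def rel_efficiency(h1, h2, criterion="A", q=None):
    """Relative efficiency of design psi1 versus psi2 from criterion values.

    For A- (or c-) optimality, pass the criterion values h = tr M^{-1}
    (or the asymptotic variance); for D-optimality, pass h = det M and
    the number of model parameters q, so the ratio is raised to 1/q.
    """
    if criterion == "D":
        return (h1 / h2) ** (1.0 / q)
    return h2 / h1  # smaller trace/variance is better, so psi2 in numerator

# A design whose determinant is half that of a D-optimal competitor with
# q = 2 parameters has efficiency (1/2)^(1/2), i.e. it needs about 1.41
# times as many runs to match the optimal design's performance.
eff = rel_efficiency(0.5, 1.0, criterion="D", q=2)
```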
Finding optimal designs, regardless of the criterion, is always a challenging problem, both mathematically and computationally. In the case of approximate designs with a convex optimality criterion, there is a powerful and simple tool known as the general equivalence theorem for checking the optimality of a design over all designs defined on X. This result is based on directional derivative considerations of the convex design criterion and is widely discussed in the primary optimal design literature: see [34] and design monographs [32, 35].
Each convex optimality criterion has a unique equivalence theorem. For example, when we have a nonlinear model and the nominal value for the vector of model parameters is β0, then for the three criteria discussed above, a design ψ* is locally
- D-optimal if and only if it satisfies ∇f(x, β0)T M−1(ψ*, β0)∇f(x, β0) − q ≤ 0 for all x ∈ X, (7)
- A-optimal if and only if it satisfies ∇f(x, β0)T M−2(ψ*, β0)∇f(x, β0) − tr M−1(ψ*, β0) ≤ 0 for all x ∈ X, (8)
- c-optimal if and only if it satisfies (∇f(x, β0)T M−1(ψ*, β0)∇c(β0))2 − ∇c(β0)T M−1(ψ*, β0)∇c(β0) ≤ 0 for all x ∈ X. (9)
Moreover, each of the above inequalities becomes an equality at the support points of the optimal design ψ*. The functions on the left hand side of the inequalities are sometimes called sensitivity functions. In practice, achieving exact equality at every support point is unlikely due to finite-precision arithmetic, so we treat equality up to a small threshold (e.g. 10−6 for our purposes) as confirming optimality.
The equivalence theorems have several important consequences. If X is one- or two-dimensional, the optimality of a design can easily be confirmed by plotting the sensitivity function across the design space and ascertaining whether the conditions of the equivalence theorem are met. If X is three-dimensional or higher, the features of the multivariate sensitivity plot are harder to appreciate visually. Equivalence theorems can also be directly used in construction algorithms for generating optimal designs and checking whether a design is close to the optimum. If the design is not optimal, its proximity to the optimum can be measured by an efficiency lower bound. In practice a design with a high efficiency lower bound may be adequate. In the following section we implement DE to quickly locate various types of optimal designs for chemical experiments.
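In low dimensions, the equivalence-theorem check can be automated by evaluating the sensitivity function over a dense grid instead of inspecting a plot. The Python sketch below does this for the D-optimality condition (7), using the classical quadratic-regression example in which the equally weighted design at −1, 0, 1 is D-optimal; the small Gaussian-elimination solver is included only to keep the sketch dependency-free, and all names are our own.

```python
def solve(A, b):
    """Solve A y = b by Gauss-Jordan elimination (A small, nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][k] - f * M[c][k] for k in range(n + 1)]
    return [M[i][n] / M[i][i] for i in range(n)]

def d_sensitivity(x, M, grad):
    """Left-hand side of (7): g(x)^T M^{-1} g(x) - q."""
    g = grad(x)
    y = solve(M, list(g))                     # y = M^{-1} g(x)
    return sum(gi * yi for gi, yi in zip(g, y)) - len(g)

# Quadratic regression, D-optimal design equally weighted at -1, 0, 1.
grad = lambda x: (1.0, x, x * x)
M = [[1.0, 0.0, 2/3],                         # its information matrix
     [0.0, 2/3, 0.0],
     [2/3, 0.0, 2/3]]
grid = [i / 1000 - 1 for i in range(2001)]    # fine grid on [-1, 1]
vals = [d_sensitivity(x, M, grad) for x in grid]
# The maximum over the grid is 0 (up to rounding), attained at -1, 0, 1,
# so condition (7) confirms D-optimality of this design.
```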
4. DE Applications to Optimal Design
This section provides illustrative applications of DE to find optimal designs for a variety of chemical experiments. We start by finding designs for simple models and incrementally increase the problem difficulty. For each application, we describe the problem setup and how the features of DE can be adapted to fit the situation. We then show that DE can effectively find optimal designs that coincide with published results or outperform them. The utility of DE becomes clearer when it is laborious to calculate the optimal design from theory and DE finds it almost instantly with the proper setup.
DE algorithms, like other notable nature-inspired metaheuristic algorithms, are widely and freely available in different formats. We choose to use the R codes available from the package DEoptim in R Version 3.4.3 [36]. All computations were carried out using a 2016 Lenovo P50 Thinkpad 2.9 GHz Intel Core i7 with 16GB RAM on 64bit Windows 10. In what follows, we first use DE to search for a minimally supported optimal design. This is a common starting strategy because the optimization problem is simpler with fewer variables to optimize and the search is confined to all designs on X with q points. However, the resulting minimally supported optimal design may not be optimal among the class of all designs on X. An equivalence theorem is needed to confirm its optimality among all designs on X. Later, we discuss what happens if DE is initialized with more points than the optimal design requires and how DE may be altered to solve more difficult design problems.
4.1. Estimating Parameters in the Arrhenius Equation
We begin by finding locally D-optimal designs for estimating the two parameters in the Arrhenius equation commonly used in chemical experiments [37]. This model describes the relationship between the mean temperature and reaction rate of a process. Its basic form is given by
Er = A exp(−B/T) (10)
Here Er denotes the expectation of the reaction time r. The mean is an exponential function of the parameter B, the activation temperature, while the parameter A is a multiplicative constant called the Arrhenius frequency parameter. The design variable is the temperature T and good choices for its settings produce efficient estimates for A and B. Many experiments have been performed to determine the parameter values for different chemical reactions to better understand their dependence on temperature [38–40].
Our specific application concerns the reaction given by NO + O3 → NO2 + O2, which captures the loss process for ozone in the troposphere and stratosphere [41]. In order to study the temperature dependence of this reaction we seek an experimental design that is supported at two temperature points. Since the equation is non-linear with respect to B, any optimal design will depend on a set of initial parameter values β0. For this example we choose β0 = (3.0 × 10−12, 1500)T according to preliminary results from NASA’s Jet Propulsion Lab [42]. The acceptable range of temperatures, the design space X, is fixed at [212, 422], so that we may compare the results of DE with the designs found in [43]. From this we calculate the vector ∇Er as the first-order partial derivatives of the model with respect to A and B, which is given by
∇Er = (exp(−B/T), −(A/T) exp(−B/T))T. (11)
Evaluating ∇Er at the nominal values β0, we calculate the information matrix in Equation (4) and plug it into our fitness function. Assuming the design of interest ψ has two points T1 and T2 with weights p1 and p2, the D-optimality criterion is
h(ψ) = −log det[ p1∇Er(T1)∇Er(T1)T + p2∇Er(T2)∇Er(T2)T ]. (12)
We implement DE to find the locally D-optimal design. In addition to the two temperatures we also need to search for the proportion of observations to take at one of the temperature levels (since the other is then determined by the unity constraint). The number of variables to optimize is therefore V = 3, so this is a relatively small optimization problem. Accordingly, we set a population size of P = 10 agents and limit our DE search for the optimum to Gmax = 50 generations, resulting in a total of 500 evaluations of the fitness function. Each candidate design is represented by a vector with the two design points first followed by the weight of the first point. We choose the standard values F = 0.8 and CR = 0.9 from Section 2 for the parameters since the optimization of h is a non-separable problem.
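Our computations use the DEoptim package in R, but the setup just described is easy to reproduce from scratch. The Python sketch below is an independent illustration, not the paper's code: it assumes the common Arrhenius parameterization Er = A exp(−B/T), encodes a candidate design as (T1, T2, p1) and runs a basic DE/rand/1/bin loop (here with P = 20 agents and 200 generations so that the pure-Python version converges comfortably).

```python
import math
import random

A0, B0 = 3.0e-12, 1500.0        # nominal values beta_0
T_LO, T_HI = 212.0, 422.0       # design space for the temperature

def grad_Er(T):
    """Gradient of Er = A*exp(-B/T) w.r.t. (A, B) at the nominal values.

    This parameterization is a common form of the Arrhenius model and
    is assumed here for illustration."""
    e = math.exp(-B0 / T)
    return (e, -A0 / T * e)

def neg_log_det_M(z):
    """Fitness: -log det M for the two-point design z = (T1, T2, p1)."""
    T1, T2, p1 = z
    g1, g2 = grad_Er(T1), grad_Er(T2)
    m11 = p1 * g1[0] ** 2 + (1 - p1) * g2[0] ** 2
    m12 = p1 * g1[0] * g1[1] + (1 - p1) * g2[0] * g2[1]
    m22 = p1 * g1[1] ** 2 + (1 - p1) * g2[1] ** 2
    det = m11 * m22 - m12 * m12
    return -math.log(det) if det > 0 else float("inf")

# Basic DE/rand/1/bin loop over V = 3 variables: two temperatures, one weight.
rng = random.Random(7)
bounds = [(T_LO, T_HI), (T_LO, T_HI), (0.01, 0.99)]
P, F, CR, Gmax = 20, 0.8, 0.9, 200
pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(P)]
fit = [neg_log_det_M(z) for z in pop]
for _ in range(Gmax):
    for i in range(P):
        r0, r1, r2 = rng.sample([j for j in range(P) if j != i], 3)
        jrand = rng.randrange(3)
        trial = []
        for j, (lo, hi) in enumerate(bounds):
            if j == jrand or rng.random() < CR:
                v = pop[r0][j] + F * (pop[r1][j] - pop[r2][j])
                trial.append(min(max(v, lo), hi))   # clip back into bounds
            else:
                trial.append(pop[i][j])
        ft = neg_log_det_M(trial)
        if ft <= fit[i]:
            pop[i], fit[i] = trial, ft
best = min(pop, key=neg_log_det_M)
T_low, T_high = sorted(best[:2])
```

With these settings the search settles on an equally weighted design with one support point at the upper boundary of the temperature range and the other near 329, in agreement with the design reported below.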
The algorithm took only 0.1 seconds to produce a design equally supported (0.5 weight at each point) at the temperature values 329.3 and 422.0. Figure 3 shows the evolution of the ten candidate solutions through the fifty generations across the two temperature dimensions. Since the algorithm is able to find the correct weight in the first few generations this variable is omitted from the plot. Candidates from earlier generations are given a lighter color to represent that they are far from optimal while those from generation 50 are dark, indicating their optimality. We observe that the algorithm locates an optimal support point at the boundary of the design space after just twenty generations and the remaining generations are used to finely tune the second point.
Figure 3:

DE convergence to the locally D-optimal design for the Arrhenius Equation.
To confirm that the design found by DE is locally D-optimal, Figure 4 shows the sensitivity function from (7) for the generated design. We observe that over the design space, the function is bounded above by 0, with equality at the two temperature settings. By the equivalence theorem, this confirms the local D-optimality of the DE-generated design when the vector of nominal values is β0 = (3.0 × 10−12, 1500)T.
Figure 4:

Sensitivity function for local D-optimality of the DE-generated design for the Arrhenius Equation.
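The equivalence-theorem check shown in Figure 4 can also be done numerically: for a minimally supported D-optimal design with two parameters, the sensitivity function f(T)ᵀM⁻¹f(T) − 2 should be at most 0 over the design space, with equality at the support points. A minimal sketch under the same assumed (rescaled) Arrhenius gradient, which leaves the sensitivity function unchanged:

```python
import numpy as np

B0 = 1500.0

def grad(T):
    e = np.exp(-B0 / T)  # rescaled Arrhenius gradient, as before
    return np.array([e, -e / T])

# DE-generated design: weight 0.5 at each of the two support temperatures
support, weights = [329.3, 422.0], [0.5, 0.5]
M = sum(w * np.outer(grad(t), grad(t)) for t, w in zip(support, weights))
M_inv = np.linalg.inv(M)

def sensitivity(T):
    f = grad(T)
    return f @ M_inv @ f - 2  # 2 = number of model parameters

values = [sensitivity(t) for t in np.linspace(212, 422, 2101)]
```

The maximum over the grid is (numerically) zero, attained at the two support points, mirroring Figure 4.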
While the approximate design we found is locally D-optimal, conducting an experiment in the laboratory would require an exact design. One such experiment that studied the reaction we are considering was conducted in [44]. Following their setup, we use DE to search for a 75-run D-optimal exact design. Increasing the population size to P = 50 and the number of iterations to Gmax = 200 while leaving the other parameters and nominal values the same, we were able to find a 75-run exact design with a lower criterion value (85.77 versus 86.50) than the implemented design. This result further demonstrates DE’s ability to locate practical designs for non-linear models with ease. A visual representation of the design found by DE is given in Figure 5.
Figure 5:

Design points for 75-point exact designs for the Arrhenius Equation.
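For an N-run exact design every run carries weight 1/N, so the same D-criterion applies with the weights fixed. The sketch below (our own helper, under the same rescaled Arrhenius gradient) shows that simply rounding the approximate design to a 37/38 split of 75 runs costs almost nothing in the criterion, which is why DE’s exact-design search starts from a very good region:

```python
import numpy as np

B0 = 1500.0

def grad(T):
    # Rescaled Arrhenius gradient; dropping the constant A-factor leaves
    # D-optimal designs and criterion differences unchanged
    e = np.exp(-B0 / T)
    return np.array([e, -e / T])

def neg_log_det(design):
    """D-criterion -log|M| for design = [(T, weight), ...]."""
    M = sum(w * np.outer(grad(t), grad(t)) for t, w in design)
    sign, logdet = np.linalg.slogdet(M)
    return -logdet if sign > 0 else 1e10

# Approximate design from above and its 75-run rounding (37 + 38 runs)
approx  = neg_log_det([(329.3, 0.5), (422.0, 0.5)])
rounded = neg_log_det([(329.3, 37 / 75), (422.0, 38 / 75)])
loss = rounded - approx  # small, positive rounding loss
```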
Having shown that DE can find exact and locally optimal designs for this simple model, we now apply it to an extended version known as the Modified Arrhenius equation [45]. This modification is used when temperature settings are allowed to vary over a wide range. The model is given by
$E_{r,\mathrm{mod}}(T) = A'\, T^{-m} e^{-B/T}$ | (13) |
where A′ is now a positive parameter independent of the temperature setting, m is a new parameter that describes how the exponential scales with temperature and Ermod is the mean reaction time under this modified model. Following [43] we fix m = 5 and set the nominal parameters to β0 = (A′ = 1, B = 1500)T. This setting for B was retained from the standard model and we set A′ = 1 for simplicity because the determinant of the information matrix does not depend on its value.
To find a locally D-optimal design for this 2-parameter model (since m is fixed) using DE, we first compute the gradient of the mean function
$\nabla E_{r,\mathrm{mod}}(T) = \big( T^{-m} e^{-B/T},\; -A'\, T^{-m-1} e^{-B/T} \big)^\top$ | (14) |
We then repeat as before and use a 2-point design to initiate the DE algorithm to search for a locally D-optimal design for estimating A′ and B using the criterion in Equation (5) with the above gradient. For this more challenging problem, we increase Gmax to 60 and leave the remaining DE parameters as they were. We keep the design space as [212, 422]. After reaching the maximum number of generations in 0.1 second, DE found a design equally supported at 392.72 and 212.60. This design has 99.9% D-efficiency relative to the D-optimal design supported equally at 390.5 and 209.5 on an unbounded design space. See [42] for details.
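The quoted 99.9% D-efficiency can be checked directly: for two 2-point designs under a 2-parameter model, the relative D-efficiency is the square root of the ratio of their information determinants. The sketch below assumes the modified Arrhenius mean takes the form A′ T^(−m) e^(−B/T) (consistent with the reported optimal support points); since A′ = 1, no rescaling is needed:

```python
import numpy as np

B0, m = 1500.0, 5  # nominal B and the fixed power m; A' = 1

def grad(T):
    """Gradient of the assumed modified mean A' * T**(-m) * exp(-B/T)
    with respect to (A', B), evaluated at A' = 1."""
    e = T ** (-m) * np.exp(-B0 / T)
    return np.array([e, -e / T])

def log_det(design):
    M = sum(w * np.outer(grad(t), grad(t)) for t, w in design)
    return np.linalg.slogdet(M)[1]

de_design  = [(212.60, 0.5), (392.72, 0.5)]  # DE design on [212, 422]
ref_design = [(209.50, 0.5), (390.50, 0.5)]  # unbounded-space optimum
d_eff = np.exp((log_det(de_design) - log_det(ref_design)) / 2)  # 2 parameters
```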
The above examples show that DE can solve simple optimization problems very fast with little or no tuning of the parameters required and minimal computational expense. The next few applications are more complicated to optimize; they either have additional constraints on the optimization problems or more variables to optimize. We show that DE can find the optimum with similar ease even if we initialize the algorithm with candidate designs that have more than the required number of support points.
4.2. Estimating the Effect of Reaction Order and Decay Rate on Concentration
For a more complicated example, consider a modification of the model in [46] for studying the influence of reaction order and decay rate on the concentration of a chemical at a given time. The original model is given by
| (15) |
where c is the concentration at time T, λ is the reaction order and θ is the decay rate. In some circumstances, it is appropriate to include a third parameter ν to make the model more flexible. The modified version of Equation (15) now takes the form
| (16) |
For this model we seek a locally A-optimal design with 3 support points. Taking inspiration from [46] we select the nominal parameters β0 = (0.5, 0.5, 0.1)T. Following a similar procedure as in the previous example, we begin by calculating the first order partial derivatives of the mean function to obtain the information matrix M(ψ, β0), where ψ is the design and β0 is the nominal value for the vector of model parameters. We recall that the A-optimality criterion is given by
$\Psi_A(\psi) = \operatorname{tr} M^{-1}(\psi, \beta_0)$ | (17) |
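This trace criterion is easy to code generically once the gradient of the mean function is available. Since the exact form of the modified model is not reproduced here, the sketch below illustrates the computation with a hypothetical stand-in, simple linear regression with f(x) = (1, x)ᵀ on [−1, 1], for which the equally weighted design at ±1 is A-optimal:

```python
import numpy as np

def a_criterion(design, grad):
    """tr M^{-1} for design = [(x, w), ...]; smaller is better."""
    M = sum(w * np.outer(grad(x), grad(x)) for x, w in design)
    return float(np.trace(np.linalg.inv(M)))

# Hypothetical stand-in model: simple linear regression, f(x) = (1, x)
f = lambda x: np.array([1.0, x])
equal  = a_criterion([(-1.0, 0.5), (1.0, 0.5)], f)  # A-optimal: tr M^{-1} = 2
skewed = a_criterion([(-1.0, 0.3), (1.0, 0.7)], f)  # worse (larger) value
```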
We seek a minimally supported design with 3 points over the design space. Since there is only a single explanatory variable, the number of dimensions to search over is 3 × 2 − 1 = 5. We set P = 50, F = 0.8, CR = 0.9 and Gmax = 100. Running the algorithm with these settings did not allow enough exploration of the space, so Gmax was increased from 100 in increments of 50 until it reached 300. This led to a design equally supported at 0.000, 1.151, and 3.343.
Generating this design took only 1.5 seconds of CPU time and required minimal tuning of the parameters. Figure 6 shows that the sensitivity function of the DE-generated design satisfies the general equivalence theorem (8), confirming the local A-optimality of the design with the nominal values β0. For this example we also used DE to search for A-optimal exact designs for this model, but in all cases we considered (N = 3, 10, 40, 100) the design found by DE is simply the rounded approximate design, so the details are omitted.
Figure 6:

Sensitivity function for local A-optimality of the DE-generated design for the modified Atkinson model.
Can DE find an optimal design if we instead start the search using candidate designs with more points than the (unknown) number of support points of the optimal design? To investigate this issue, we repeat the above analysis, but instead of optimizing over 5 variables with 3 support points and 2 weights, we supply designs with 6 support points and 5 weights, making for a total of 11 variables. DE was again able to converge to the equally-supported A-optimal design by clustering the excess points. Due to the increased dimension of the search space, achieving this result required 300 generations, with all other parameters keeping their values from the 5-variable situation.
Figure 7 shows the best design from every 100 generations. The x-axis gives the values of the 6 support points while the y-axis shows the corresponding weights. As before, support points from later generations are given a darker color to indicate their convergence to the optimal design. When the best design from a given generation has several points that are indistinguishable their weights are summed and a single point is shown. This plot demonstrates the rapid pace at which DE begins to cluster the additional points and move towards the optimal weights. Table 1 gives the raw values from each design shown. We observe that earlier generations have six unique support points; however, by generation 200 the points form three clusters and the focus shifts to finding the appropriate weights. By generation 300 the collective weight at the three points yields the equally-supported design we report. From this example, we observe that even when the user starts with candidate designs with more points than needed, DE is still able to find the optimal design.
Figure 7:

Convergence of DE for the Atkinson example initialized with 6 support points. Each panel (a)-(d) displays the best design found after every 100 generations.
Table 1:
The best design found by DE for Atkinson’s model after every 100 generations.
| Gen | T1 (p1) | T2 (p2) | T3 (p3) | T4 (p4) | T5 (p5) | T6 (p6) |
|---|---|---|---|---|---|---|
| 1 | 0.038(0.318) | 1.004(0.201) | 1.143(0.094) | 2.910(0.091) | 3.329(0.174) | 3.832(0.123) |
| 100 | 0.000(0.357) | 1.036(0.055) | 1.187(0.246) | 3.267(0.066) | 3.435(0.099) | 3.339(0.177) |
| 200 | 0.000(0.347) | 1.628(0.153) | 1.632(0.281) | 3.338(0.048) | 3.339(0.004) | 3.345(0.167) |
| 300 | 0.000(0.333) | 1.151(0.268) | 1.151(0.062) | 3.344(0.046) | 3.344(0.219) | 3.344(0.068) |
| Final | 0.000(0.333) | 1.151(0.333) | 3.344(0.334) | |||
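The merging of indistinguishable support points described above, summing the weights of points that coincide up to a tolerance, can be sketched as a small helper. The tolerance and the weighted-average rule for the merged location are our own illustrative choices:

```python
import numpy as np

def merge_support(points, weights, tol=1e-2):
    """Collapse indistinguishable support points, summing their weights."""
    order = np.argsort(points)
    pts, wts = np.asarray(points)[order], np.asarray(weights)[order]
    merged_p, merged_w = [pts[0]], [wts[0]]
    for p, w in zip(pts[1:], wts[1:]):
        if p - merged_p[-1] <= tol:
            # weighted average keeps the cluster's centre of mass
            merged_p[-1] = (merged_w[-1] * merged_p[-1] + w * p) / (merged_w[-1] + w)
            merged_w[-1] += w
        else:
            merged_p.append(p)
            merged_w.append(w)
    return merged_p, merged_w

# Generation-300 design from Table 1: six points collapse to three
pts = [0.000, 1.151, 1.151, 3.344, 3.344, 3.344]
wts = [0.333, 0.268, 0.062, 0.046, 0.219, 0.068]
P, W = merge_support(pts, wts)
```

Applied to the generation-300 row of Table 1, the helper recovers the equally supported 3-point design reported in the final row.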
4.3. Estimating the Effect of Microemulsion Mixtures on Drug Solubility
In our third application, we apply DE to find an optimal design for a mixture experiment with several factors. In mixture experiments, each run is a mixture of the same ingredients, but the relative proportion of each ingredient in each run may be different, and thus affect the mean measured response. These problems arise frequently in life and physical sciences. For example, [47] studied the effect of microemulsion formulation composition on the solubility and dissolution of drugs.
There are several possible models for a mixture experiment with a fixed number of ingredients. If there are d ingredients, the most frequently used linear model is the Scheffé polynomial [48] given by
$E(y) = \sum_{i=1}^{d} \beta_i x_i + \sum_{i<j}^{d} \beta_{ij} x_i x_j$ | (18) |
In the above equation, there are parameters β1, … , βd, β12, … , β(d−1)d and d inputs x1, … , xd. This model is similar to a standard linear model with d factors and all two-factor interactions, except that the design space is a d − 1 simplex, i.e. each xi ≥ 0 subject to x1 + x2 + … + xd = 1.
To find D-optimal designs for this problem over the (d − 1)-dimensional simplex using DE, we proceed as before by first computing the information matrix. In this experiment, there are d = 3 ingredients: oil, water, and a surfactant to study the drug dissolution properties y of microemulsion formulations. The model used by [47] to study this relationship is given by
$E(y) = \sum_{i=1}^{3} \beta_i x_i + \sum_{i<j}^{3} \beta_{ij} x_i x_j + \beta_{123}\, x_1 x_2 x_3$ | (19) |
which is equivalent to Equation (18) with the addition of a three-way interaction of all ingredients. In [47] the authors fit this model with the 13-point exact design in Table 3(b). We apply DE to find an approximate design of comparable size with a better ability to estimate the model parameters. Since the three ingredient proportions must sum to 1 in each run, we only need to optimize over two experimental factors. A search among all 13-point designs shows the total number of variables that DE has to optimize is 38. We initially set P = 175 and ran the algorithm for Gmax = 2000 generations. Clearly this setting does not follow the rule of thumb P ≥ 10V in Section 2. However, we find that the marginal improvement to the final design does not warrant using additional resources to produce 380 agents; it is more efficient to use only half as many agents and run the algorithm for twice as many generations. We set F = 0.8 and started with CR = 0.7, slowly increasing CR to 0.9 through multiple trials as this improved performance. We observe that the basic form of DE is not able to quickly locate an optimal design for this problem because of the large amount of time required per generation, which limits the number of generations possible. To overcome this issue we implemented a parallel version of DE [49] that spreads the calculations required in each generation over many nodes. DE naturally lends itself to parallel computing since many of its operations can be performed in isolation and then reassembled. Further details of this approach and other DE variants are discussed in Section 5.
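To make the fitness evaluation concrete, the sketch below scores a candidate mixture design under the special cubic model: each run is stored as two free proportions (the third is 1 − x1 − x2), and candidates that leave the simplex are penalized. This is an illustrative serial fitness function, not the authors’ parallel R implementation; as a check, it is evaluated at the equally weighted simplex-centroid design, which matches the 7 support points in Table 2:

```python
import numpy as np

def f(x1, x2):
    """Regression vector of the special cubic model; x3 = 1 - x1 - x2."""
    x3 = 1.0 - x1 - x2
    return np.array([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

def neg_log_det(design):
    """design = [(x1, x2, w), ...]; -log|M|, penalizing infeasible points."""
    if any(x1 < 0 or x2 < 0 or x1 + x2 > 1 for x1, x2, _ in design):
        return 1e10
    M = sum(w * np.outer(f(x1, x2), f(x1, x2)) for x1, x2, w in design)
    sign, logdet = np.linalg.slogdet(M)
    return -logdet if sign > 0 else 1e10

# Simplex-centroid design: 3 vertices, 3 edge midpoints, overall centroid
centroid = [(1, 0), (0, 1), (0, 0), (.5, .5), (.5, 0), (0, .5), (1 / 3, 1 / 3)]
design = [(a, b, 1 / 7) for a, b in centroid]
val = neg_log_det(design)
```

Because this design is minimally supported (7 points for 7 parameters), any departure from equal weights strictly worsens the criterion, which the test below exploits.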
Table 3:
13-point exact designs for the emulsion mixture experiment. (a) is the 13-point design found by DE and (b) is the design implemented in [47].
| (a) DE design | | | (b) design from [47] | | |
|---|---|---|---|---|---|
| x1 | x2 | x3 | x1 | x2 | x3 |
| 0 | 0 | 1 | 0.65 | 0.18 | 0.17 |
| 0.01 | 0.01 | 0.98 | 0.65 | 0.22 | 0.13 |
| 0 | 0.50 | 0.50 | 0.65 | 0.25 | 0.10 |
| 0.01 | 0.49 | 0.50 | 0.69 | 0.16 | 0.15 |
| 0 | 1 | 0 | 0.69 | 0.19 | 0.12 |
| 0.01 | 0.97 | 0.02 | 0.72 | 0.12 | 0.17 |
| 0.32 | 0.33 | 0.35 | 0.73 | 0.13 | 0.14 |
| 0.34 | 0.34 | 0.32 | 0.75 | 0.15 | 0.10 |
| 0.49 | 0 | 0.51 | 0.76 | 0.09 | 0.15 |
| 0.51 | 0 | 0.49 | 0.78 | 0.05 | 0.17 |
| 0.49 | 0.51 | 0 | 0.79 | 0.09 | 0.12 |
| 0.53 | 0.47 | 0 | 0.82 | 0.05 | 0.13 |
| 1 | 0 | 0 | 0.85 | 0.05 | 0.10 |
Using this variant of DE we were able to locate the D-optimal design given in Table 2(a). It took 24 seconds of CPU time across 8 nodes to converge to a 13-point design with many duplicated points. Further investigation reveals that the design is equally weighted at 7 points, and its D-optimality can be confirmed visually using the equivalence contour plot in Figure 8, which shows that the contour values at each of the 7 points of our design are approximately 0. In this plot x1 and x2 are given on the x and y axes respectively; with these two values determined, x3 can be inferred. The contour value is derived from the sensitivity function in Equation (7). The key implication is that the 13-point design used by the authors is not optimal for the proposed model: a direct calculation shows that the relative D-efficiency of their design to the one found with DE is only 79%. Through this comparison we observe the benefits of DE: it is fast and finds the optimal design effortlessly and accurately.
Table 2:
13-point approximate design for the emulsion mixture experiment. (a) is the 13-point design found by DE and (b) is a 7-point D-optimal design that results from merging the rows of (a).
| Run | x1 | x2 | x3 | pi | x1 | x2 | x3 | Σ pi |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0.143 | 1 | 0 | 0 | |
| 2 | 0 | 1 | 0 | 0.142 | 0 | 1 | 0 | |
| 3 | 0 | 0 | 1 | 0.143 | 0 | 0 | 1 | |
| 5 | 0 | 0.5 | 0.5 | 0.059 | | | | |
| 7 | 0.5 | 0 | 0.5 | 0.065 | | | | |
| 8 | 0.5 | 0.5 | 0 | 0.143 | | | | |
| 13 | 0.33 | 0.34 | 0.33 | 0.002 | | | | |
Figure 8:

Sensitivity contours of the DE-generated design for the emulsion mixture experiment. The function is maximized at the 7 points included in the design, confirming its D-optimality.
We note that this comparison between the design in [47] and the DE-generated design is not exactly fair. While both designs have 13 points, our design has only 7 unique support points and is an approximate design, as defined in Section 3. To implement this design in practice we need to multiply each weight by the desired run size (in this case 13). Since the weights are not regular fractions of 13, some rounding is needed to achieve a usable design, and through this process we lose some efficiency. To mitigate this loss we could instead find a 13-point D-optimal exact design directly, but this is known to be a very difficult problem to solve analytically for general run sizes. DE, however, is able to overcome this challenge. To do this we search over the 26-dimensional space for a good exact design. Setting P = 150, F = 0.8 and CR = 0.9, DE required 15 seconds across 8 nodes and Gmax = 2000 generations to find the design presented in Table 3(a). We observe that the design points all come from the optimal approximate design, with some points replicated once. This design has 99% efficiency relative to the optimal one found earlier, again showing a clear advantage over the design chosen by the researchers. Thus, DE is capable of producing both exact and approximate designs for problems with a large number of optimization parameters.
4.4. Estimating the Effect of Substrate Concentration and Inhibition Amount on Reaction Velocity
Until now we have used DE to find optimal approximate and exact designs for linear and non-linear models of various complexities. However, we have so far only considered non-linear models with a single input variable and with prior knowledge of a nominal set of parameter values. It is useful to consider a final example in which neither of these conditions holds. For this example we search for Bayesian D-optimal designs for the 4-parameter mixed inhibition model described in [50–51]. This model considers the relationship between substrate concentration s, inhibition amount i, and reaction velocity v and is given by
$v = \dfrac{V_{\max}\, s}{k_m (1 + i/k_{ic}) + s (1 + i/k_{iu})}$ | (20) |
In this model the vector of parameters is β = (Vmax, km, kic, kiu)T, representing the maximum reaction velocity, the Michaelis-Menten constant, and two dissociation constants, respectively. For this model we still assume that we have some prior knowledge of the true value of these parameters, but unlike the previous examples it takes the form of a multivariate distribution. The goal is to use DE to find a Bayesian D-optimal design that minimizes the D-optimality criterion in Equation (5) averaged over the prior distribution. Formally, we seek a design that minimizes
$h_B(\psi) = \displaystyle\int \log \big| M^{-1}(\psi, \beta) \big| \, \pi(\beta)\, d\beta$ | (21) |
where π(β) is the multivariate prior we assume for β.
We consider several choices for π(β) including both discrete and continuous distributions with and without correlation. Following [50–51] we assume that the mean of each continuous prior is given by βμ = (7.298, 4.386, 2.582, 5.0)T. Furthermore, we consider two design spaces: 9 ≤ s ≤ 30, 0 ≤ i ≤ 60 and 0 ≤ s ≤ 30, 18 ≤ i ≤ 60. The set of plausible parameter values is given by km ∈ [4, 5], kic ∈ [2, 3], and kiu ∈ [4, 5]. To approximate the integral in Equation (21) we average over a systematic sample from the prior distribution. Taking inspiration from [52], the sample is derived from a series of Halton draws. This reduces computational time and provides a reasonable estimate of the integral. To further speed up the algorithm we use the same parallel version of DE that was implemented in the previous example.
We consider uniform priors, both continuous and discrete, and two multivariate normal distributions, one with independence and the other with weak to moderate correlation. For each prior we search for a minimally supported design by initializing DE with a population size of P = 120 for Gmax = 250 generations. We begin with CR = 0.1 and F = 0.8 and increase CR in steps of size 0.1 between iterations if the algorithm fails to converge. For each continuous prior we use 125 Halton draws to evaluate the integral [52].
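A sketch of this Halton-based approximation: quasi-random draws are pushed through the multivariate normal prior (here the independent case Σ1), and the criterion is the average of −log|M| over the draws. The mixed inhibition mean is assumed to take the standard form v = Vmax s / [km(1 + i/kic) + s(1 + i/kiu)]; the function names and the clipping guard are our own:

```python
import numpy as np
from scipy.stats import norm, qmc

mu = np.array([7.298, 4.386, 2.582, 5.0])   # prior mean beta_mu
Sigma = np.diag([0.50, 0.11, 0.11, 0.20])   # Sigma_1: the independent case
L = np.linalg.cholesky(Sigma)

def grad(s, i, b):
    """Gradient of the assumed mixed inhibition mean w.r.t. (Vmax, km, kic, kiu)."""
    Vmax, km, kic, kiu = b
    D = km * (1 + i / kic) + s * (1 + i / kiu)
    return np.array([s / D,
                     -Vmax * s * (1 + i / kic) / D**2,
                     Vmax * s * km * i / (kic**2 * D**2),
                     Vmax * s**2 * i / (kiu**2 * D**2)])

def bayes_crit(design, betas):
    """Average of -log|M(psi, beta)| over the sampled parameter vectors."""
    total = 0.0
    for b in betas:
        M = sum(w * np.outer(grad(s, i, b), grad(s, i, b)) for s, i, w in design)
        sign, logdet = np.linalg.slogdet(M)
        if sign <= 0:
            return 1e10  # singular information matrix: reject the design
        total -= logdet
    return total / len(betas)

# 125 Halton draws pushed through the multivariate normal prior
u = qmc.Halton(d=4, seed=1).random(125)
betas = mu + norm.ppf(np.clip(u, 1e-12, 1 - 1e-12)) @ L.T

# First design from Table 4 (s in [9, 30], i in [0, 60], MVNorm prior)
table4 = [(30, 4.07, 0.25), (9, 3.57, 0.25), (30, 0, 0.25), (9, 0, 0.25)]
val = bayes_crit(table4, betas)
```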
Table 4 shows the DE designs under each combination of prior and design space. In this table Σ1 and Σ2 are variance-covariance matrices with diagonal equal to (0.50, 0.11, 0.11, 0.20) and with average correlations of 0 and 0.46, respectively.
Table 4:
Bayesian D-optimal designs for the enzyme kinetic model under four prior distributions for β and two design spaces.
| s ∈ [9, 30], i ∈ [0, 60] | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | MVNorm(βμ, Σ1) | | | MVNorm(βμ, Σ2) | | | U(βμ − 1, βμ + 1) | | | Ud | | |
| Point | s | i | p | s | i | p | s | i | p | s | i | p |
| 1 | 30 | 4.07 | 0.25 | 30 | 4.07 | 0.25 | 30 | 4.32 | 0.25 | 30 | 4.35 | 0.25 |
| 2 | 9 | 3.57 | 0.25 | 9 | 3.61 | 0.25 | 9 | 3.82 | 0.25 | 9 | 3.81 | 0.25 |
| 3 | 30 | 0 | 0.25 | 30 | 0 | 0.25 | 30 | 0 | 0.25 | 30 | 0 | 0.25 |
| 4 | 9 | 0 | 0.25 | 9 | 0 | 0.25 | 9 | 0 | 0.25 | 9 | 0 | 0.25 |
| s ∈ [0, 30], i ∈ [18, 60] | | | | | | | | | | | | |
| | MVNorm(βμ, Σ1) | | | MVNorm(βμ, Σ2) | | | U(βμ − 1, βμ + 1) | | | Ud | | |
| Point | s | i | p | s | i | p | s | i | p | s | i | p |
| 1 | 29.81 | 18.03 | 0.25 | 29.95 | 18.11 | 0.25 | 29.92 | 18 | 0.25 | 29.87 | 18.03 | 0.25 |
| 2 | 4.28 | 18.06 | 0.25 | 4.64 | 18.03 | 0.25 | 4.75 | 18.14 | 0.25 | 4.59 | 18.05 | 0.25 |
| 3 | 29.96 | 41.44 | 0.25 | 29.71 | 39.11 | 0.25 | 29.59 | 40.16 | 0.25 | 29.77 | 42.58 | 0.25 |
| 4 | 4.77 | 39.56 | 0.25 | 4.78 | 39.17 | 0.25 | 5.25 | 41.05 | 0.25 | 5.06 | 40.47 | 0.25 |
Ud is a discrete uniform distribution over 4⁴ = 256 grid points on the parameter space.
For the design space that requires a larger concentration (s ≥ 9) we see that each design places points at extreme values of s, two where there is no inhibition and two where there is slight inhibition. Under the second design space that requires more inhibition (i ≥ 18), the optimal design points lie away from the boundaries of the space; they remain at the extreme ends of the concentration range but move towards the lower bound of the inhibition range. Using the Bayesian extension of the general equivalence theorem we can visually verify that the designs found by DE are indeed optimal under each particular prior and design space. Figure 9 displays one such plot of the sensitivity function of the DE-generated design.
Figure 9:

Sensitivity contour of the design found by DE under the independent multivariate normal prior and design space s ∈ [9, 30], i ∈ [0, 60].
The above approximate designs are implementable if the sample size is a multiple of 4. Researchers may be interested in a design of a specific size that does not meet this requirement. For this reason it is also useful to showcase DE’s ability to find optimal exact designs in a situation where prior information about the parameters is limited. To do this we consider the 21-point design implemented in [51]. We repeat the same search as before, but this time over a 42-dimensional space. Due to the increased size of the problem we initialize DE with a population size of P = 210 for Gmax = 1250 generations. In the same manner as the approximate case we begin with CR = 0.1 and F = 0.8 and increase CR in steps of size 0.1 as necessary.
Table 5 reports that the exact designs found by DE are substantially more D-efficient than the design used in practice, whose relative D-efficiency is only about 79% in every case. This indicates that even when a large number of design points is desired, DE is still capable of finding an efficient design, and the observation holds regardless of the prior distribution we assume for the parameter values.
Table 5:
Relative D-efficiency of design from [51] to DE 21-point exact designs for the enzyme kinetic model under four prior distributions for β and two design spaces.
| Prior | s ∈ [9, 30], i ∈ [0, 60] | s ∈ [0, 30], i ∈ [18, 60] |
|---|---|---|
| MVNorm(βμ, Σ1) | 79.0% | 78.9% |
| MVNorm(βμ, Σ2) | 79.0% | 79.0% |
| U(βμ − 1, βμ + 1) | 79.0% | 78.9% |
| Ud | 78.9% | 79.1% |
5. Popular DE Variants and Hybrid Algorithms
In the two previous examples we employed a parallel version of DE to arrive at designs that outperform the ones available in the published literature. There are many such advancements to the basic DE algorithm. The most popular variations fall into two classes: adaptive-tuning and multiple-objective. For many optimization problems solved with metaheuristic algorithms the selection of tuning parameters is a bottleneck. DE mitigates this issue by having only a few such parameters, yet sometimes many trials are required to choose their correct values. Many of the enhancements for DE use a self-adaptive parameter search that studies the history of past agents to appropriately tune parameters for future generations. Some popular members of this class include JADE, SaDE, and SHADE [53–55], which have collectively been cited hundreds of times. The other major class of DE variants concerns optimization problems with multiple objectives. There are many novel algorithms based on such multi-objective approaches to DE (for example, see [56–58]); the leading methods involve approximating the Pareto frontier or utilizing Lagrange multipliers.
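To convey the flavor of adaptive tuning without reproducing JADE or SHADE themselves, the sketch below implements the simpler jDE-style rule of Brest et al.: each agent carries its own (F, CR) pair, which is resampled with a small probability before use and retained only when the resulting trial vector wins the selection step. All constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def sphere(x):
    return float(np.sum(x ** 2))

def jde(fitness, dim=5, pop=20, gens=200, lo=-5.0, hi=5.0):
    """DE/rand/1/bin with jDE-style self-adaptive F and CR."""
    X = rng.uniform(lo, hi, (pop, dim))
    fit = np.array([fitness(x) for x in X])
    F, CR = np.full(pop, 0.5), np.full(pop, 0.9)
    for _ in range(gens):
        for i in range(pop):
            # occasionally resample this agent's control parameters
            Fi = rng.uniform(0.1, 1.0) if rng.random() < 0.1 else F[i]
            CRi = rng.uniform(0.0, 1.0) if rng.random() < 0.1 else CR[i]
            r1, r2, r3 = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            donor = X[r1] + Fi * (X[r2] - X[r3])
            cross = rng.random(dim) < CRi
            cross[rng.integers(dim)] = True  # guarantee one donor coordinate
            trial = np.clip(np.where(cross, donor, X[i]), lo, hi)
            f_trial = fitness(trial)
            if f_trial <= fit[i]:  # successful settings survive with the agent
                X[i], fit[i], F[i], CR[i] = trial, f_trial, Fi, CRi
    best = int(np.argmin(fit))
    return X[best], float(fit[best])

best_x, best_f = jde(sphere)
```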
There are also many new algorithms developed by hybridizing DE with other metaheuristic algorithms. Akin to crafting ensemble statistical models, these hybrid algorithms allow for a more thorough search of the space, and the relative weight given to each method’s approach when crafting the algorithm determines its efficacy. Some examples of hybrid DE algorithms blend it with Simulated Annealing, PSO, and k-means clustering [59–61]. These methods have shown great success at tackling optimization problems that the individual algorithms struggled to solve. This overview covers some DE variants, but many other modifications of the basic DE algorithm have been proposed since its initial conception. Some of these methods are motivated by the application at hand while others are motivated by further heuristic considerations aimed at improving the performance of DE in various ways. For a recent review of the latest innovations in DE see [62].
6. Conclusion
In this article we reviewed the basic formulation of Differential Evolution and applied it to find various types of optimal designs in the chemical sciences. The algorithm is metaheuristic and is based on ideas from evolutionary biology. Through the genetic representation of candidate solutions to a real-valued fitness function, DE uses mutation and crossover of the genes of each agent to produce sequential improvements in the fitness of subsequent generations. We discussed the foundational ideas of the algorithm and the specific methods and ideas behind tuning each parameter.
We implemented DE codes and demonstrated DE’s ability to solve design problems for statistical models with different numbers of parameters. Both approximate and exact optimal designs can be found by DE in a few seconds to estimate some or all parameters in linear and non-linear models. DE or its variants can also be effective in solving high dimensional and other complex design problems. For high dimensional optimization problems it would be appropriate to apply different nature-inspired metaheuristic algorithms and observe whether they produce the same or approximately the same solution. If they do, we are much more assured that the solution is likely to be correct. If needed, readers can hybridize two or three such algorithms to take advantage of the positive qualities of each to enhance the hybridized algorithm’s performance. We close by reminding the reader that DE is a general purpose optimization tool and can solve both design and non-design optimization problems. We hope that this introduction to DE stimulates further chemometric research in this direction.
Supplementary Material
Figure 1:

An illustration of mutation for a single agent in the Differential Evolution algorithm where a donor vector is created by blending 3 randomly drawn agents. In this case , and were chosen.
Figure 2:

An illustration of the crossover procedure for a single agent in the Differential Evolution algorithm. Here we create trial vector by combining and . The light green elements of come from the donor vector and the light red elements come from the target vector . The jth element of is marked in dark green as it comes from with probability one.
Highlights.
Differential Evolution is a useful metaheuristic algorithm for solving general optimization problems.
Optimal experimental designs can be located quickly using standard Differential Evolution or a slight modification.
By employing Differential Evolution, researchers in chemometrics may find more robust solutions to challenging problems.
7. Acknowledgement
Wong was partially supported by a grant from the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Supplementary Materials
The R codes for generating the designs presented in this paper have been provided.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Contributor Information
Zack Stokes, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095.
Abhyuday Mandal, Department of Statistics, University of Georgia, Athens, GA 30602.
Weng Kee Wong, Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095.
9 References
- [1]. Whitacre JM (2011a). Recent trends indicate rapid growth of nature-inspired optimization in academia and industry. Computing 93, pp 121–133.
- [2]. Whitacre JM (2011b). Survival of the flexible: explaining the recent dominance of nature-inspired optimization within a rapidly evolving world. Computing 93, pp 135–146.
- [3]. Holland JH (1975). Adaptation in Natural and Artificial Systems, 2nd edition, MIT Press.
- [4]. Yang XS and Deb S (2009). Cuckoo Search via Levy Flights. World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE Publications, pp 210–214.
- [5]. Dorigo M, Maniezzo V and Colorni A (1996). Ant System: Optimization by a Colony of Cooperating Agents. IEEE Transactions on Systems, Man, and Cybernetics Part B 26, pp 29–41.
- [6]. Mirjalili S, Mirjalili SM and Lewis A (2014). Grey Wolf Optimizer. Advances in Engineering Software 69, pp 46–61.
- [7]. García FJM and Moreno-Pérez JA (2008). Jumping frogs optimization: a new swarm method for discrete optimization. Technical Report DEIOC 3/2008, Dept. of Statistics, O.R. and Computing, Univ. of La Laguna, Tenerife, Spain.
- [8]. Yang XS (2010). A new metaheuristic bat-inspired algorithm. In: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010). Springer, pp 65–74.
- [9]. Engelbrecht AP (2005). Fundamentals of Computational Swarm Intelligence. Wiley.
- [10]. Fraser AS (1957). Simulation of genetic systems by automatic digital computers. I. Introduction. Aust. J. Biol. Sci. 10, pp 484–491.
- [11]. Storn R and Price K (1997). Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, pp 341–359.
- [12]. Price K, Storn RM, and Lampinen JA (2005). Differential Evolution: A Practical Approach to Global Optimization, Springer.
- [13]. Mitchell M (1998). An Introduction to Genetic Algorithms. MIT Press.
- [14]. Feoktistov V (2006). Differential Evolution: In Search of Solutions. Springer.
- [15]. Plagianakos VP, Tasoulis DK, and Vrahatis MN (2008). A Review of Major Application Areas of Differential Evolution. In: Advances in Differential Evolution. Springer, pp 197–238.
- [16]. Tusar T and Filipic B (2007). Differential Evolution Versus Genetic Algorithms in Multi-objective Optimization. Evolutionary Multi-Criterion Optimization, 4th International Conference, Matsushima, Japan, March 2007, pp 257–271.
- [17]. Lilla AD, Khan MA, and Barendse P (2013). Comparison of differential evolution and genetic algorithm in the design of permanent magnet generators. Proc. IEEE International Conf. Industr. Tech., Feb. 2013, pp 266–271.
- [18]. Tayyebi S and Soltanali S (2017). A new approach of GA-based type reduction of interval type-2 fuzzy model for nonlinear MIMO system: Application in methane oxidation process. Chemometrics and Intelligent Laboratory Systems 167, pp 152–160.
- [19]. Yang Q, Wang M, Xiao H, Yang L, Zhu B, Zhang T, and Zeng X (2015). Feature selection using a combination of genetic algorithm and selection frequency curve analysis. Chemometrics and Intelligent Laboratory Systems 148, pp 106–114.
- [20]. Mercader AG and Duchowicz PR (2015). Enhanced replacement method integration with genetic algorithms populations in QSAR and QSPR theories. Chemometrics and Intelligent Laboratory Systems 149, pp 117–122.
- [21]. Yu K, Wang X, and Wang Z (2015). Self-adaptive multi-objective teaching-learning-based optimization and its application in ethylene cracking furnace operation optimization. Chemometrics and Intelligent Laboratory Systems 146, pp 198–210.
- [22]. Salcedo-Sanz S, Portilla-Figueras JA, Ortiz-Garcia EG, Perez-Bellido AM, Garcia-Herrera R, and Elorrieta JI (2009). Spatial regression analysis of NOx and O3 concentrations in Madrid urban area using Radial Basis Function networks. Chemometrics and Intelligent Laboratory Systems 99, pp 79–90.
- [23]. Deeb O, da Cunha EFF, Cormanich RA, Ramalho TC, and Freitas MP (2012). Computer-assisted assessment of potentially useful non-peptide HIV-1 protease inhibitors. Chemometrics and Intelligent Laboratory Systems 116, pp 123–127.
- [24]. Cizek P (2008). Robust and Efficient Adaptive Estimation of Binary-Choice Regression Models. Journal of the American Statistical Association 103, pp 687–696.
- [25]. Miao H, Wu H and Xue H (2014). Generalized Ordinary Differential Equation Models. Journal of the American Statistical Association 109, pp 1672–1682.
- [26]. Favaro S and James LF (2015). A note on nonparametric inference for species variety within Gibbs-type priors. Electronic Journal of Statistics 9, pp 2884–2902.
- [27]. Wahab MN, Nefti-Meziani S and Atyabi A (2015). A Comprehensive Review of Swarm Optimization Algorithms. PLoS ONE 10(5).
- [28]. Storn R (1996). On the usage of differential evolution for function optimization. NAFIPS 1996 Biennial Conference of the North American Fuzzy Information Processing Society 5, pp 519–523.
- [29]. Zaharie D (2002). Parameter adaptation in differential evolution by controlling the population diversity. Proceedings of the International Workshop on Symbolic and Numeric Algorithms for Scientific Computing, pp 385–397.
- [30]. Fan Q and Zhang Y (2016). Self-adaptive differential evolution algorithm with crossover strategies adaptation and its application in parameter estimation. Chemometrics and Intelligent Laboratory Systems 151, pp 164–171.
- [31]. Kiefer JC (1959). Optimum experimental designs. J. Roy. Stat. Soc. B 21, pp 272–319.
- [32]. Silvey SD (1980). Optimal Design, Chapman and Hall.
- [33]. Chernoff H (1953). Locally Optimal Designs for Estimating Parameters. Annals of Mathematical Statistics 24, pp 586–602.
- [34].Kiefer JC (1961). Optimum designs in regression problems, II. Annals of Mathematical Statistics 32, pp 298–325. [Google Scholar]
- [35].Fedorov VV (1972). Theory of Optimal Designs, Academic Press. [Google Scholar]
- [36].Mullen KM, Ardia D, Gil D, Windover D, and Cline J (2011). DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software 40, pp 1–26. [Google Scholar]
- [37].Arrhenius SA (1889). Über die Dissociationswärme und den Einflu der Temperatur auf den Dissociationsgrad der Elektrolyte. Z. Phys. Chem 4, pp 96–116. [Google Scholar]
- [38].Stráská J, Stráský J, and Jancek M (2015). Activation Energy for Grain Growth of the Isochronally Annealed Ultrafine Grained Magnesium Alloy after Hot Extrusion and Equal-Channel Angular Pressing (EX-ECAP). Proceedings of the International Symposium on Physics of Materials 128, pp 578–581. [Google Scholar]
- [39].Qin Z, Balasubramanian SK, Wolkers WF, Pearce JA, and Bischof JC (2014). Correlated Parameter Fit of Arrhenius Model for Thermal Denaturation of Proteins and Cells. Ann. Biomed Eng. 42, pp 2392–2404. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [40].Marsi I and Seres L (2000). Determination of the Arrhenius parameters of the decomposition of azoisopropane: investigation of possible systematic errors via computer simulation. Chemometrics and Intelligent Laboratory Systems 50, pp 53–61. [Google Scholar]
- [41].Lippmann HH, Jesser B, and Schurath U (1980). The Rate Constant of NO + O3 → NO2 + O2 in the Temperature Range of 283–443K. International Journal of Chemical Kinetics 7, pp 547–554. [Google Scholar]
- [42].Jet Propulsion Laboratory, Chemical Kinetics and Photochemical Data for Use in Atmospheric Studies. California Institute of Technology, NASA, February 2003, Evaluation Number 14, Pasadena, California. [Google Scholar]
- [43].Rodríguez-Aragóna LJ and López-Fidalgo J (2005). Optimal designs for the Arrhenius equation. Chemometrics and Intelligent Laboratory Systems 77, pp 131–138. [Google Scholar]
- [44].Ray GW and Watson RT (1981). Kinetics of the Reaction NO + O3 → NO2 +O2 from 212 to 422 K. Journal of Physical Chemistry 85, pp 1673–1676. [Google Scholar]
- [45].Rodríguez-Díaz JM and Santos-Martín MT (2008). Study of the best designs for modifications of the Arrhenius equation. Chemometrics and Intelligent Laboratory Systems 95, pp 199–208. [Google Scholar]
- [46].Atkinson AC and Bogacka B (1997). Compound D- and Ds-Optimum Designs for Determining the Order of a Chemical Reaction. Technometrics 39, pp 347–356. [Google Scholar]
- [47].Furlanetto S, Cirri M, Piepel G, Mennini N, and Mura P (2011). Mixture experiment methods in the development and optimization of microemulsion formulations. Journal of Pharmaceutical and Biomedical Analysis 55, pp 610–617. [DOI] [PubMed] [Google Scholar]
- [48].Scheffé H (1958). Experiments With Mixtures. Journal of the Royal Statistical Society, Series B. 20, pp 344–360. [Google Scholar]
- [49].Tasoulis DK, Pavlidis NG, Plagianakos VP and Vrahatis MN (2004). Parallel Differential Evolution. Proceedings of the 2004 Congress on Evolutionary Computation, pp 2023–2029. [Google Scholar]
- [50].Chen P-Y, Chen R-B, Tung H-C and Wong WK (2017). Standardized maximim D-optimal designs for enzyme kinetic inhibition models. Chemometrics and Intelligent Laboratory Systems 169, pp 79–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [51].Bogacka B, Patan M, Johnson PJ, Youdim K, and Atkinson AC (2011). Optimum Design of Experiments for Enzyme Inhibition Kinetic Models. Journal of Biopharmaceutical Statistics. 21(3), pp 555–572. [DOI] [PubMed] [Google Scholar]
- [52].Ruseckaite A, Goos P and Fok D (2017). Bayesian D-optimal choice designs for mixtures. Applied Statistics 66, pp 363–386. [Google Scholar]
- [53].Zhang J and Sanderson AC (2007). JADE: Self-adaptive differential evolution with fast and reliable convergence performance. 2007 IEEE Congress on Evolutionary Computation, Singapore, 2007, pp 2251–2258. [Google Scholar]
- [54].Qin AK and Suganthan PN (2005). Self-adaptive differential evolution algorithm for numerical optimization. 2005 IEEE Congress on Evolutionary Computation 2, Edinburgh, Scotland, pp 1785–1791. [Google Scholar]
- [55].Tanabe R, Fukunaga A (2013). Success-history based parameter adaptation for differential evolution. 2013 IEEE Congress on Evolutionary Computation (CEC), pp 71–78. [Google Scholar]
- [56].Robic T and Filipic B (2005). DEMO: Differential Evolution for Multiobjective Optimization. International Conference on Evolutionary Multi-Criterion Optimization EMO 2005: Evolutionary Multi-Criterion Optimization, pp 520–533. [Google Scholar]
- [57].Long W, Liang X, Huang Y and Chen Y (2013). A hybrid differential evolution augmented Lagrangian method for constrained numerical and engineering optimization. Computer-Aided Design 45, pp 1562–1574. [Google Scholar]
- [58].Saha I, Maullik U, Lukasik M and Plewczynski D (2014). Multiobjective Differential Evolution: A Comparative Study on Benchmark Problems. Man-Machine Interactions 3, pp 529–536. [Google Scholar]
- [59].Chen B, Zeng W, Lin Y and Zhong W (2014). An Enhanced Differential Evolution Based Algorithm with Simulated Annealing for Solving Multiobjective Optimization Problems. Journal of Applied Mathematics 2014. [Google Scholar]
- [60].Yu X, Cao J, Shan H, Zhu L and Guo J (2014). An Adaptive Hybrid Algorithm Based on Particle Swarm Optimization and Differential Evolution for Global Optimization. The Scientific World Journal 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [61].Sahin O and Akay B (2016). Comparisons of metaheuristic algorithms and fitness functions on software test data generation. Applied Soft Computing, 49, pp 1202–1214. [Google Scholar]
- [62].Xu Weinan, Wong WK, Tan KC and Xu JX (2019). Finding High-Dimensional D-Optimal Designs for Logistic Models via Differential Evolution. IEEE Access 7, pp 7133–7146. [DOI] [PMC free article] [PubMed] [Google Scholar]