Progress analysis of a multi-recombinative evolution strategy on the highly multimodal Rastrigin function

Amir Omeradzic; Hans-Georg Beyer

doi:10.1016/j.tcs.2023.114179

Author manuscript; available in PMC: 2024 Feb 16.

Published in final edited form as: Theor Comput Sci. 2023 Sep 22;978:114179. doi: 10.1016/j.tcs.2023.114179


PMCID: PMC7615653 EMSID: EMS189094 PMID: 38371370

Abstract

A first and second order progress rate analysis was conducted for the intermediate multi-recombinative Evolution Strategy (μ/μ_I, λ)-ES with isotropic scale-invariant mutations on the highly multimodal Rastrigin test function. Closed-form analytic solutions for the progress rates are obtained in the limit of large dimensionality and large populations. The first order results are able to model the one-generation progress including local attraction phenomena. Furthermore, a second order progress rate is derived yielding additional correction terms and further improving the progress model. The obtained results are compared to simulations and show good agreement, even for moderately large populations and dimensionality. The progress rates are applied within a dynamical systems approach, which models the evolution using difference equations. The obtained dynamics are compared to real averaged optimization runs and yield good agreement. The results improve further when dimensionality and population size are increased. Local and global convergence is investigated within the given model, showing that large mutations are needed to maximize the probability of global convergence, which comes at the expense of efficiency. An outlook regarding future research goals is provided.

Keywords: Evolution strategy, Progress rate, Global optimization, Rastrigin function

1. Introduction

The theoretical analysis of the performance of Evolution Strategies (ES) [8] optimizing functions f (y) in real-valued N-dimensional search spaces y ∈ ℝ^N is a challenge. Due to the probabilistic nature of these algorithms, a dynamic progress analysis has so far been possible only on simple test functions such as the sphere model [2,5], the ridge function class [3,14], and the ellipsoid model [7]. These test functions are simple w.r.t. their optimization landscape (also referred to as fitness landscape) in that they have at most one optimizer (i.e., the location y of the optimum). Analyzing the dynamical behavior of ES on more complex and multimodal test functions appears to be even more demanding. However, ES and other evolutionary algorithms are designed especially for such problems. There is empirical evidence that ES are able to globally optimize highly multimodal optimization problems [11] whose number of local optima grows exponentially in N. The question arises how and when these ES are able to locate the global optimizer. The long-term goal is to find conditions the ES must fulfill in order not to get trapped in one of the vast number of local optimizers. Ideally, a theoretical analysis should provide the success probability P_S (of locating the global optimum) as a function of the ES parameters, such as the population size λ, and of the test function to be optimized. Furthermore, one is interested in the computational complexity of the optimization process.

One approach successfully applied to the analysis of the ES-performance on simple unimodal test functions mentioned above is the dynamical systems approach [5] which is based on progress rate analysis. The progress rate is a measure of expected positional change in search space between two generations depending on location, strategy and test function parameters. The idea of investigating global search behavior from expected local progress was successfully applied, among others, in [3,7]. It will be shown in this paper that this approach can be extended to the highly multimodal Rastrigin test function

f (y) = \sum_{i = 1}^{N} f_{i} (y_{i}) = \sum_{i = 1}^{N} [y_{i}^{2} + A (1 - \cos (α y_{i}))],

(1)

where y ∈ ℝ^N, with oscillation amplitude A and frequency parameter α. The i-th fitness component in Eq. (1) is defined as

f_{i} (y_{i}) : = y_{i}^{2} + A (1 - \cos (α y_{i})) .

(2)

Depending on A and α, each component f_i exhibits a finite number M of local minima. Hence, the overall number of local minima scales as M^N, which makes (1) a highly multimodal minimization problem with the global optimizer located at ŷ = 0. An exemplary optimization landscape of the Rastrigin function is shown in Fig. 1.

Fig. 1 — The heat map shows the optimization landscape for A = 1, α = 2π, and N = 2. The global minimizer located at the origin (dark blue) is surrounded by multiple local minima. On the right side the same parameter set is shown for N = 1. For increasing |y|, the relative contribution of the oscillation term decreases. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)
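The finite number M of local minima per component can be checked numerically. The sketch below is our own illustration, not taken from the paper; the helper `count_local_minima` and its grid resolution are assumptions. It counts sign changes of f_i'(y) = 2y + αA sin(αy) from negative to positive, using that every stationary point satisfies 2|y| ≤ αA, so a finite search interval suffices.

```python
import math


def count_local_minima(A: float, alpha: float, n_grid: int = 200_000) -> int:
    """Count the local minima M of f_i(y) = y^2 + A(1 - cos(alpha*y)), Eq. (2).

    A minimum is detected where f_i'(y) changes sign from negative to
    non-negative on a uniform midpoint grid. Every stationary point obeys
    2|y| = alpha*A*|sin(alpha*y)| <= alpha*A, so [-L, L] with
    L = alpha*A/2 + 1 contains all of them. An even n_grid keeps y = 0
    off the grid and makes the grid symmetric about the origin.
    """
    def df(y: float) -> float:
        # f_i'(y) = k_i + d_i with k_i = 2y and d_i = alpha*A*sin(alpha*y)
        return 2.0 * y + alpha * A * math.sin(alpha * y)

    if n_grid % 2:
        n_grid += 1
    L = alpha * A / 2.0 + 1.0
    h = 2.0 * L / n_grid
    count = 0
    prev = df(-L + 0.5 * h)
    for k in range(1, n_grid):
        cur = df(-L + (k + 0.5) * h)
        if prev < 0.0 <= cur:  # descent followed by ascent: a local minimum
            count += 1
        prev = cur
    return count


if __name__ == "__main__":
    M = count_local_minima(A=1.0, alpha=2.0 * math.pi)  # parameters of Fig. 1
    for N in (1, 2, 10):
        print(f"N = {N}: M = {M}, M^N = {M ** N}")
```

With the parameters of Fig. 1 (A = 1, α = 2π) the count is M = 7, so the N = 2 landscape contains 7² = 49 local minima including the global one; setting A = 0 recovers the single minimum of the sphere.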

The remarkable observation is that ES – unlike classical nonlinear optimization algorithms (e.g. BFGS) – do not follow the local gradient or Hessian ending in one of the M^N − 1 local optimizers. That is, ES perform a rather global search. A deeper understanding of this behavior is still missing. Recently, attempts have been made to analyze the problem from the viewpoint of relaxation using kernel smoothing [15]. However, the sampling process needed to transform the original problem into a convex optimization problem is still lacking a link to the ES.

In this paper a simplified and scale-invariant (μ/μ_I, λ)-ES with step-size control defined in Eq. (4) is analyzed, see Algorithm 1. Starting from the so-called parental centroid vector y^(g), a population of λ offspring is generated by adding isotropic Gaussian mutations x ~ σ𝒩(0, 1) with mutation strength σ in Lines 6 and 7. Thereafter, the fitness is evaluated in Line 8. Selection of the μ best individuals is done in Line 10 for a given selection (truncation) ratio defined as

ϑ : = \frac{μ}{λ},

(3)

with ϑ ∈ (0, 1). It will be an essential quantity for the progress rate results in the limit of large population sizes. Using intermediate recombination with equal weights, the best m = 1, …, μ individuals are recombined in Line 11, yielding the new parental centroid y^(g+1). In the following, the subscript “m; λ” denotes the m-th best solution out of λ candidate solutions. In Line 12 the simplified step-size adaptation is performed. To this end, a constant normalized mutation strength σ* is defined using the spherical normalization with R^(g) = ‖y^(g)‖ as

σ^{*} : = \frac{σ^{(g)} N}{‖ y^{(g)} ‖} = \frac{σ^{(g)} N}{R^{(g)}} .

(4)

This property ensures scale invariance and therefore global convergence of the algorithm, as the mutation strength σ^(g) decreases if and only if the residual distance R^(g) decreases. Since σ* requires the distance R^(g) to the optimizer, it is not available during black-box optimization, but it is very useful for theoretical investigations to obtain scale-invariant mutation strengths.

Algorithm 1. (μ/μ_I, λ)-ES with constant σ*.

1: g ← 0

2: y⁽⁰⁾ ← y^(init)

3: σ⁽⁰⁾ ← σ*‖y⁽⁰⁾‖/N

4: repeat

5: for l ← 1, …, λ do

6: x̃_l ← σ^(g) 𝒩_l(0, 1)

7: ỹ_l ← y^(g) + x̃_l

8: f̃_l ← f(ỹ_l)

9: end for

10: (ỹ_{1;λ}, …, ỹ_{μ;λ}) ← sort(ỹ w.r.t. ascending f̃)

11: y^(g+1) ← (1/μ) Σ_{m=1}^{μ} ỹ_{m;λ}

12: σ^(g+1) ← σ*‖y^(g+1)‖/N

13: g ← g + 1

14: until termination criterion
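A minimal Python sketch of Algorithm 1 using only the standard library. The function names, the default Rastrigin parameters (A = 10 and α = 2π, the values used in the figures) and the fixed number of generations used as termination criterion are assumptions for illustration; the comments refer to the line numbers of Algorithm 1.

```python
import math
import random
from typing import List, Optional, Sequence


def rastrigin(y: Sequence[float], A: float = 10.0, alpha: float = 2.0 * math.pi) -> float:
    """Rastrigin function, Eq. (1)."""
    return sum(v * v + A * (1.0 - math.cos(alpha * v)) for v in y)


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


def es_constant_sigma_star(y_init: Sequence[float], mu: int, lam: int,
                           sigma_star: float, generations: int,
                           A: float = 10.0, alpha: float = 2.0 * math.pi,
                           seed: Optional[int] = None) -> List[float]:
    """(mu/mu_I, lambda)-ES with constant normalized mutation strength sigma*.

    Termination (Line 14) is replaced by a fixed number of generations.
    Returns the residual distances R^(g) = ||y^(g)|| for g = 0, ..., generations.
    """
    rng = random.Random(seed)
    n = len(y_init)
    y = list(y_init)                                         # Lines 1-2
    sigma = sigma_star * norm(y) / n                         # Line 3, Eq. (4)
    history = [norm(y)]
    for _ in range(generations):                             # Lines 4, 13, 14
        offspring = []
        for _ in range(lam):                                 # Lines 5-9
            x = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]      # Line 6
            y_tilde = [yi + xi for yi, xi in zip(y, x)]             # Line 7
            offspring.append((rastrigin(y_tilde, A, alpha), y_tilde))  # Line 8
        offspring.sort(key=lambda pair: pair[0])             # Line 10: ascending f
        best = [y_t for _, y_t in offspring[:mu]]
        y = [sum(col) / mu for col in zip(*best)]            # Line 11: centroid
        sigma = sigma_star * norm(y) / n                     # Line 12
        history.append(norm(y))
    return history
```

Setting `A=0.0` reduces (1) to the sphere model, which gives a quick sanity check of the scale-invariant rule (4); for A > 0 the returned R^(g) sequence can be inspected for different σ* to observe local versus global convergence empirically.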

The remainder of this paper is organized as follows. The next section introduces the local performance measures, which form the basis of both the progress rate analysis and the dynamical systems approach. Section 3 is devoted to the determination and evaluation of the first order progress rate. Section 4 describes the derivation of the second order progress rate, which relies on the first order progress rate results. Section 5 uses the local performance measures to establish the evolution equations that govern the dynamical behavior of the ES, and experiments are presented to show the usefulness of the approach. Finally, Section 6 draws conclusions and, based on open problems, outlines directions for further research.

2. Local performance measures and quality gain distribution

The performance of an ES between two generations can be evaluated in both fitness and search space. The quality gain Q_y(x) of fitness f at a position y^(g) due to an isotropic mutation x ~ σ𝒩(0, 1) is defined as

Q_{y} (x) : = f (y^{(g)} + x) - f (y^{(g)}),

(5)

and yields in the case of fitness improvement (minimization considered) a negative value Q_y < 0. The definition (5) measures the fitness change before selection and will be needed for the evaluation of the two progress rates (7) and (8). The quality gain components are decomposed using f_i from Eq. (2) as Q_i := f_i(y_i + x_i) − f_i(y_i), such that

Q_{y} (x) = \sum_{i = 1}^{N} Q_{i} (x_{i}) = \sum_{i = 1}^{N} [f_{i} (y_{i}^{(g)} + x_{i}) - f_{i} (y_{i}^{(g)})] .

(6)

That is, the quality gain corresponds to the difference between the fitness values after and before the mutation is applied. A probabilistic model for the distribution of quality gain values will be presented below. It will be important for the subsequent progress rate derivations, as selection is based on fitness values.
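Definitions (5) and (6) translate directly into code. The sketch below (all helper names are hypothetical) computes Q_y(x), its decomposition into the components Q_i(x_i) and, for comparison, the expanded trigonometric form (24) of Q_i(x_i) derived later in this section.

```python
import math
from typing import List, Sequence


def f_i(y_i: float, A: float, alpha: float) -> float:
    """Fitness component f_i(y_i), Eq. (2)."""
    return y_i * y_i + A * (1.0 - math.cos(alpha * y_i))


def quality_gain(y: Sequence[float], x: Sequence[float], A: float, alpha: float) -> float:
    """Q_y(x) = f(y + x) - f(y), Eq. (5); negative values mean improvement."""
    f_new = sum(f_i(yi + xi, A, alpha) for yi, xi in zip(y, x))
    f_old = sum(f_i(yi, A, alpha) for yi in y)
    return f_new - f_old


def quality_gain_components(y: Sequence[float], x: Sequence[float],
                            A: float, alpha: float) -> List[float]:
    """Summands Q_i(x_i) = f_i(y_i + x_i) - f_i(y_i) of Eq. (6)."""
    return [f_i(yi + xi, A, alpha) - f_i(yi, A, alpha) for yi, xi in zip(y, x)]


def quality_gain_component_expanded(y_i: float, x_i: float, A: float, alpha: float) -> float:
    """Q_i(x_i) in the expanded form (24), after applying the cosine addition theorem."""
    return (x_i * x_i + 2.0 * y_i * x_i + A * math.cos(alpha * y_i)
            - A * math.cos(alpha * y_i) * math.cos(alpha * x_i)
            + A * math.sin(alpha * y_i) * math.sin(alpha * x_i))
```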

Regarding the progress towards the optimizer in search space, the first order progress rate on the Rastrigin function has already been investigated in [17] as a first approach. In this paper, a new approach is presented which significantly improves the prediction quality.

The first order progress rate between two generations for the parental component y_i is defined as

φ_{i} : = E [y_{i}^{(g)} - y_{i}^{(g + 1)} ∣ y^{(g)}, σ^{(g)}],

(7)

given parental position y^(g) and mutation strength σ^(g) at generation g. It measures the expected positional change in search space. The expected progress is positive, φ_i > 0, if $y_{i}^{(g)} > E [y_{i}^{(g + 1)}]$; for $y_{i}^{(g)} > 0$ and $E [y_{i}^{(g + 1)}] > 0$ this means that the distance to the optimizer ŷ_i = 0 is reduced in expectation. This interpretation is only valid as long as the sign of $E [y_{i}^{(g + 1)}]$ does not change, i.e., for mutations that are small compared to the residual distance. Therefore, φ_i has limited applicability when studying the convergence behavior in the vicinity of the optimizer. As has been shown in [7] for the performance analysis on the ellipsoid model, a second order progress rate is needed. It is defined as

φ_{i}^{II} : = E [{(y_{i}^{(g)})}^{2} - {(y_{i}^{(g + 1)})}^{2} ∣ y^{(g)}, σ^{(g)}] .

(8)

Because the positions are squared, $φ_{i}^{II} > 0$ holds whenever the distance to ŷ_i = 0 decreases in expectation, independent of the sign of y_i. Additionally, the derivation will yield expressions containing a progress gain part and a progress loss part, which is necessary for a more accurate model of convergence. Both progress rates will be expressed as integrals for the expected values, and approximations will be necessary to find closed-form solutions. In a second step, the progress rates can be applied within difference equations to model the expected dynamics over many generations in order to investigate the global convergence behavior.
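Definitions (7) and (8) can also be estimated by brute force: repeat the single-generation update (34) from a fixed parent many times and average the positional changes. The sketch below is such a Monte Carlo estimator, a reference for checking analytic progress rates rather than part of the analysis; the function names, the number of trials and the seed are assumptions, and the estimates carry sampling noise.

```python
import math
import random
from typing import List, Sequence, Tuple


def rastrigin(y: Sequence[float], A: float, alpha: float) -> float:
    return sum(v * v + A * (1.0 - math.cos(alpha * v)) for v in y)


def one_generation(y, sigma, mu, lam, A, alpha, rng) -> List[float]:
    """Mutation, selection and intermediate recombination, cf. Eq. (34)."""
    n = len(y)
    mutations = [[sigma * rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
    mutations.sort(key=lambda x: rastrigin([yi + xi for yi, xi in zip(y, x)], A, alpha))
    selected = mutations[:mu]                 # the mu best mutation vectors x_{m;lambda}
    return [y[i] + sum(x[i] for x in selected) / mu for i in range(n)]


def estimate_progress_rates(y, sigma, mu, lam, A, alpha,
                            trials=2000, seed=0) -> Tuple[List[float], List[float]]:
    """Monte Carlo estimates of phi_i, Eq. (7), and phi_i^II, Eq. (8), for all i."""
    rng = random.Random(seed)
    n = len(y)
    phi = [0.0] * n
    phi2 = [0.0] * n
    for _ in range(trials):
        y_new = one_generation(y, sigma, mu, lam, A, alpha, rng)
        for i in range(n):
            phi[i] += y[i] - y_new[i]
            phi2[i] += y[i] ** 2 - y_new[i] ** 2
    return [p / trials for p in phi], [p / trials for p in phi2]
```

For the sphere (A = 0) with all y_i equal, symmetry implies that every φ_i is positive, which gives a simple consistency check of the estimator.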

The selection of individuals is based on the attained fitness values. The quality gain measures the fitness change before selection according to (5). When the progress rate of an ES is modeled, the cumulative distribution function (CDF) P_Q (q) of the quality gain and its probability density function (PDF) p_Q (q) are needed as a function of y and σ. Obtaining an exact CDF for Q_y(x) is not feasible at this point. Since $Q_{y} (x) = \sum_{i = 1}^{N} Q_{i} (x_{i})$ with independent random variables Q_i, the application of the Central Limit Theorem seems appropriate to show that the distribution is asymptotically normal.¹ However, proving its validity rigorously seems hard or even impossible for arbitrary y. Therefore, we resort to normality as an approximation for the quality gain distribution. This is backed up by experimental results in Fig. 2, where sampled Q_y(x)-values are compared to the normal approximation. A standard Anderson-Darling test was performed to check whether the sampled data was drawn from a normal distribution with known mean and variance according to (9). The hypothesis test fails to reject the normality assumption at p-values p = 0.48 (left) and p = 0.53 (right), where rejection is usually defined for p < 0.05. Even at relatively small N = 10 the results agree well. Good experimental agreement is also observed for the variation of the location y and mutation strength σ (not shown). Therefore, the normality assumption does not pose a strong restriction on the overall prediction quality of the progress rates in the subsequent sections, such that we approximate

Q_{y} (x) = \sum_{i = 1}^{N} Q_{i} (x_{i}) \sim 𝒩 (E [Q_{y} (x)], Var [Q_{y} (x)]) .

(9)

Fig. 2 — The histograms show sampled values of Q_y(x) from (5) with fixed y by applying random mutations x_k ~ σ𝒩(0, 1) (σ = 1 with k = 1, …, 10⁴ samples) at N = 10 (left) and N = 100 (right) with A = 10. The y-values were initialized randomly at ‖y‖ = 10 where local attraction is significant. The red envelope curves show the respective normal approximation (9) using mean value (30) and variance (31). The p-values of the Anderson-Darling-test for normality are p = 0.48 (left) and p = 0.53 (right).
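The normality check of Fig. 2 can be sketched with the standard library. The Anderson-Darling statistic A² for a fully specified normal distribution (mean and variance known, here taken from (30) and (31)) is compared with the commonly tabulated asymptotic 5% critical value 2.492 for this case; p-values are not computed. The helper names, the random choice of y with ‖y‖ = 10 and the sample size are assumptions, and the result of any single run depends on the seed.

```python
import math
import random
from statistics import NormalDist


def anderson_darling_known(samples, mean, std):
    """A^2 statistic for H0: samples ~ N(mean, std^2) with known parameters."""
    dist = NormalDist(mean, std)
    z = sorted(samples)
    n = len(z)
    tiny = 1e-300  # avoids log(0) for points far in the tails
    s = 0.0
    for i in range(n):
        s += (2 * i + 1) * (math.log(max(dist.cdf(z[i]), tiny))
                            + math.log(max(1.0 - dist.cdf(z[n - 1 - i]), tiny)))
    return -n - s / n


def quality_gain_moments(y, sigma, A, alpha):
    """E_Q from Eq. (30) and D_Q as the square root of Eq. (31)."""
    e1 = math.exp(-0.5 * (alpha * sigma) ** 2)
    u = e1 * e1  # exp(-(alpha*sigma)^2)
    E_Q = sum(sigma ** 2 + A * math.cos(alpha * v) * (1.0 - e1) for v in y)
    D2 = sum(4 * sigma ** 2 * v ** 2 + 2 * sigma ** 4
             + 0.5 * A ** 2 * (1.0 - u) * (1.0 - math.cos(2 * alpha * v) * u)
             + 2 * A * alpha * sigma ** 2 * e1
             * (alpha * sigma ** 2 * math.cos(alpha * v) + 2 * v * math.sin(alpha * v))
             for v in y)
    return E_Q, math.sqrt(D2)


def sample_quality_gains(y, sigma, A, alpha, n_samples, rng):
    """Samples of Q_y(x), Eq. (5), for isotropic mutations x ~ sigma*N(0, 1)."""
    f = lambda v: sum(c * c + A * (1.0 - math.cos(alpha * c)) for c in v)
    f_y = f(y)
    return [f([yi + sigma * rng.gauss(0.0, 1.0) for yi in y]) - f_y
            for _ in range(n_samples)]


if __name__ == "__main__":
    rng = random.Random(42)
    N, A, alpha, sigma = 100, 10.0, 2 * math.pi, 1.0
    direction = [rng.gauss(0.0, 1.0) for _ in range(N)]
    scale = 10.0 / math.sqrt(sum(d * d for d in direction))
    y = [scale * d for d in direction]  # random y with ||y|| = 10
    E_Q, D_Q = quality_gain_moments(y, sigma, A, alpha)
    q = sample_quality_gains(y, sigma, A, alpha, 2000, rng)
    a2 = anderson_darling_known(q, E_Q, D_Q)
    print(f"A^2 = {a2:.3f} (normality rejected at 5% if A^2 > 2.492)")
```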

Furthermore, the following abbreviations are introduced

E_{Q} := E [Q_{y} (x)] = \sum_{i = 1}^{N} E [Q_{i}]

(10)

D_{Q}^{2} : = Var [Q_{y} (x)] = \sum_{i = 1}^{N} Var [Q_{i}] .

(11)

At this point an additional assumption for the coordinates y = (y₁, …, y_N) has to be made to justify subsequent variance approximations (13) and (14). Given the search vector y = (y₁, …, y_N) and residual distance R² = ‖y‖² it is assumed that the components contribute approximately equally (in expectation) to the residual distance, i.e., there is no dominating component, such that

\frac{y_{i}^{2}}{R^{2}} \approx \frac{1}{N}, \quad \text{for all } i = 1, \dots, N .

(12)

Property (12) will also be referred to as component equipartition. The concept was introduced in [6] and proven for the noisy ellipsoid in [12]. Its applicability to the Rastrigin function was shown in [19]. The equipartition assumption is necessary in order to justify certain approximation steps and to provide a closed-form solution for the progress rate. Furthermore, it will be a reasonable assumption to obtain a model of the algorithm’s progress and dynamics in expectation. This assumption also justifies a linear scaling of the variance with dimensionality N provided that the components are contributing equally to the overall variance, such that

D_{Q}^{2} = \sum_{i = 1}^{N} Var [Q_{i}] = Θ (N) .

(13)

Additionally, for large N an important approximation will be used for the variance to significantly simplify the lengthy results obtained. If no single i-th component dominates the sum, i.e., Var [Q_i] / Σ_{j≠i} Var [Q_j] → 0 (for any i in the limit N → ∞), the contribution of a single term is negligible for N → ∞. Therefore, the two sums over N and N − 1 terms, respectively, are asymptotically equal with

D_{Q}^{2} = \sum_{i = 1}^{N} Var [Q_{i}] ≃ \sum_{j \neq i} Var [Q_{j}] = D_{i}^{2} .

(14)

Note that quantity $D_{i}^{2}$ is formally introduced in (20). Returning to Eq. (9), the expression is rewritten using a standardized random variate Z as

Z = \frac{Q_{y} (x) - E_{Q}}{D_{Q}} \overset{N \to \infty}{\sim} 𝒩 (0, 1) .

(15)

Approximation 1 (Quality gain distribution)

The local quality gain at position y due to random mutation vector x ~ 𝒩(0, σ²1) is approximately normally distributed. Therefore, P_Q (q) and p_Q (q) can be approximated as

{\tilde{P}}_{Q} (q) = Φ (\frac{q - E_{Q}}{D_{Q}})

(16)

{\tilde{p}}_{Q} (q) = \frac{1}{\sqrt{2 π} D_{Q}} \exp [- \frac{1}{2} {(\frac{q - E_{Q}}{D_{Q}})}^{2}] .

(17)

Within the normal approximation (16) the inverse ${\tilde{P}}_{Q}^{- 1} (p)$ given some probability p can be easily obtained by using the quantile function Φ⁻¹(p) of the normal distribution. This relation will be used later to obtain a quality gain for some given probability p using

q = E_{Q} + D_{Q} Φ^{- 1} (p) .

(18)

For the derivation of the i-th component progress rate, the conditional distribution function P_Q (q|x_i) of the quality gain given the mutation component x_i is needed. Its expected value and variance are given by

E_{Q ∣ x_{i}} := E [Q_{y} (x) ∣ x_{i}] = Q_{i} (x_{i}) + \sum_{j \neq i} E [Q_{j}]

(19)

D_{i}^{2} : = Var [Q_{y} (x) ∣ x_{i}] = \sum_{j \neq i} Var [Q_{j}],

(20)

where the sum j ≠ i is taken for fixed i over the remaining N − 1 components. Therefore, a normal approximation for the conditional CDF is introduced using (19) and (20).

Approximation 2 (Quality gain distribution given x_i)

The quality gain at position y, given a fixed mutation component x_i and random mutation components x_j ~ 𝒩(0, σ²) for all j ≠ i, is approximately normally distributed. Therefore, P_Q (q|x_i) and p_Q (q|x_i) can be approximated as

{\tilde{P}}_{Q} (q ∣ x_{i}) = Φ (\frac{q - E_{Q ∣ x_{i}}}{D_{i}})

(21)

{\tilde{p}}_{Q} (q ∣ x_{i}) = \frac{1}{\sqrt{2 π} D_{i}} \exp [- \frac{1}{2} {(\frac{q - E_{Q ∣ x_{i}}}{D_{i}})}^{2}] .

(22)
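Approximations 1 and 2 amount to evaluating normal CDFs, densities and quantiles with the moments (10), (11), (19) and (20). A minimal sketch with Python's `statistics.NormalDist`; the function names are hypothetical and the moments are passed in as plain numbers (in practice they would be computed from (30) and (31)).

```python
from statistics import NormalDist


def quality_gain_approximation(E_Q: float, D_Q: float):
    """Approximation 1: CDF (16), PDF (17) and quantile (18) of Q ~ N(E_Q, D_Q^2).

    dist.inv_cdf(p) equals E_Q + D_Q * Phi^{-1}(p), which is relation (18).
    """
    dist = NormalDist(E_Q, D_Q)
    return dist.cdf, dist.pdf, dist.inv_cdf


def conditional_quality_gain_cdf(q: float, Q_i_xi: float, sum_E_Qj: float, D_i: float) -> float:
    """Approximation 2, Eq. (21): P_Q(q | x_i) with mean E_{Q|x_i} = Q_i(x_i) + sum_{j != i} E[Q_j],
    Eq. (19), and standard deviation D_i from Eq. (20)."""
    return NormalDist(Q_i_xi + sum_E_Qj, D_i).cdf(q)
```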

Having derived approximations of the quality gain distribution functions, the quantities E [Q_i] and Var [Q_i] remain to be determined. As the components are independent, it is sufficient to consider a single component and then perform the summation. Starting from definition (6), one can evaluate the quality gain of a single component Q_i(x_i). After applying the trigonometric identity cos (α(y_i + x_i)) = cos (αy_i) cos (αx_i) − sin (αy_i) sin (αx_i), one gets

Q_{i} (x_{i}) = f_{i} (y_{i} + x_{i}) - f_{i} (y_{i})

(23)

= x_{i}^{2} + 2 y_{i} x_{i} + A \cos (α y_{i}) - A \cos (α y_{i}) \cos (α x_{i}) + A \sin (α y_{i}) \sin (α x_{i}),

(24)

of which E [Q_i] and $Var [Q_{i}] = E [Q_{i}^{2}] - E {[Q_{i}]}^{2}$ need to be evaluated. The results will be expressed as expected values containing trigonometric functions. As a remark, terms containing moments of x_i ~ 𝒩(0, σ²), i.e., $E [x_{i}^{k}]$ with k ≥ 1, are evaluated without further comment, as these moments are standard. Starting with E[Q_i] one has

E[Q_{i}] = σ^{2} + A \cos (α y_{i}) (1 - E[\cos (α x_{i})]),

(25)

where the odd moments $E [x_{i}^{k}]$ (k odd) vanish; since sin is an odd function, this also yields E[sin (αx_i)] = 0. Evaluating Var [Q_i] yields

\begin{array}{l} Var [Q_{i}] = E [Q_{i}^{2}] - E[Q_{i}]^{2} \\ = 2 σ^{4} + 4 y_{i}^{2} σ^{2} + A^{2} \sin^{2} (α y_{i}) Var [\sin (α x_{i})] \\ + A^{2} \cos^{2} (α y_{i}) Var [\cos (α x_{i})] - 2 A \cos (α y_{i}) E [x_{i}^{2} \cos (α x_{i})] \\ + 2 A σ^{2} \cos (α y_{i}) E[\cos (α x_{i})] + 4 A y_{i} \sin (α y_{i}) E[x_{i} \sin (α x_{i})] . \end{array}

(26)

Expectations of the form $E [x_{i}^{k} \cos (α x_{i})]$ and $E [x_{i}^{k} \sin (α x_{i})]$ for k ≥ 0 can be obtained from the characteristic function χ of a random variate x ~ 𝒩(μ, σ²) (here μ denotes the mean, not the number of parents) and its known closed form [1]

χ_{x} (α) = E [e^{ı α x}] = e^{ı α μ - \frac{1}{2} α^{2} σ^{2}} = e^{- \frac{1}{2} α^{2} σ^{2}} [\cos (α μ) + ı \sin (α μ)],

(27)

with the imaginary unit denoted by $ı = \sqrt{- 1}$ in (27) and (28). Now the k-th derivatives with respect to α can be applied to both sides

\begin{array}{l} \frac{d^{k}}{d α^{k}} E [e^{ı α x}] = E [\frac{d^{k}}{d α^{k}} e^{ı α x}] = E [\frac{d^{k}}{d α^{k}} \cos (α x)] + ı E [\frac{d^{k}}{d α^{k}} \sin (α x)] \\ \overset{!}{=} \frac{d^{k}}{d α^{k}} [e^{- \frac{{(α σ)}^{2}}{2}} [\cos (α μ) + ı \sin (α μ)]], \end{array}

(28)

such that the corresponding real and imaginary parts can be identified by comparing both sides (denoted by $\overset{!}{=}$) of Eq. (28). Setting μ = 0 and k ∈ {0, 1, 2}, the required expectations of the trigonometric terms can be derived. Additionally, the trigonometric identities cos²(x) = 1/2 + cos(2x)/2 and sin²(x) = 1/2 − cos(2x)/2 are used. The results are

\begin{array}{c} E [\cos (α x)] = e^{- \frac{{(α σ)}^{2}}{2}}, E [\cos^{2} (α x)] = \frac{1}{2} + \frac{1}{2} e^{- \frac{{(2 α σ)}^{2}}{2}} \\ E [\sin^{2} (α x)] = \frac{1}{2} - \frac{1}{2} e^{- \frac{{(2 α σ)}^{2}}{2}}, E [x \sin (α x)] = α σ^{2} e^{- \frac{{(α σ)}^{2}}{2}} \\ E [x^{2} \cos (α x)] = (σ^{2} - α^{2} σ^{4}) e^{- \frac{{(α σ)}^{2}}{2}}, Var [(\cdot)] = E [{(\cdot)}^{2}] - E {[(\cdot)]}^{2} . \end{array}

(29)
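The closed forms (29) can be cross-checked by numerical integration against the normal density. The sketch below uses a composite Simpson rule on ±12σ (grid size and truncation width are our assumptions) and compares the quadrature values with the right-hand sides of (29).

```python
import math
from typing import Callable, Dict


def gauss_expectation(g: Callable[[float], float], sigma: float,
                      n: int = 4000, width: float = 12.0) -> float:
    """E[g(x)] for x ~ N(0, sigma^2) by composite Simpson on [-width*sigma, width*sigma]."""
    a, b = -width * sigma, width * sigma
    h = (b - a) / n  # n must be even for Simpson's rule

    def w(x: float) -> float:  # integrand g(x) * p_x(x)
        return g(x) * math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

    total = w(a) + w(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * w(a + k * h)
    return total * h / 3.0


def closed_forms(alpha: float, sigma: float) -> Dict[str, float]:
    """Right-hand sides of Eq. (29) for x ~ N(0, sigma^2)."""
    e1 = math.exp(-0.5 * (alpha * sigma) ** 2)
    e2 = math.exp(-0.5 * (2.0 * alpha * sigma) ** 2)
    return {
        "cos": e1,
        "cos^2": 0.5 + 0.5 * e2,
        "sin^2": 0.5 - 0.5 * e2,
        "x sin": alpha * sigma ** 2 * e1,
        "x^2 cos": (sigma ** 2 - alpha ** 2 * sigma ** 4) * e1,
    }


def numerical_forms(alpha: float, sigma: float) -> Dict[str, float]:
    """The same expectations evaluated by quadrature."""
    return {
        "cos": gauss_expectation(lambda x: math.cos(alpha * x), sigma),
        "cos^2": gauss_expectation(lambda x: math.cos(alpha * x) ** 2, sigma),
        "sin^2": gauss_expectation(lambda x: math.sin(alpha * x) ** 2, sigma),
        "x sin": gauss_expectation(lambda x: x * math.sin(alpha * x), sigma),
        "x^2 cos": gauss_expectation(lambda x: x * x * math.cos(alpha * x), sigma),
    }
```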

Inserting relations (29) into (25) and (26), summing over all N components and collecting the resulting terms one obtains the expected value

E_{Q} = \sum_{i = 1}^{N} [σ^{2} + A \cos (α y_{i}) (1 - e^{- \frac{{(α σ)}^{2}}{2}})] .

(30)

Analogously, the variance of the Rastrigin quality gain yields

\begin{array}{l} D_{Q}^{2} = \sum_{i = 1}^{N} \Big[ 4 σ^{2} y_{i}^{2} + 2 σ^{4} + \frac{A^{2}}{2} (1 - e^{- {(α σ)}^{2}}) (1 - \cos (2 α y_{i}) e^{- {(α σ)}^{2}}) \\ \qquad + 2 A α σ^{2} e^{- \frac{1}{2} {(α σ)}^{2}} (α σ^{2} \cos (α y_{i}) + 2 y_{i} \sin (α y_{i})) \Big] . \end{array}

(31)

The quantities E_{Q|x_i} from (19) and $D_{i}^{2}$ from (20) are obtained analogously by summing over the N − 1 components j ≠ i. Expressions E_Q and D_Q could be inserted into (16), and E_{Q|x_i} (with Q_i(x_i)) and D_i into (21); this substitution is omitted here for better readability.
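The summands of (30) and (31) can likewise be compared with E[Q_i] and Var[Q_i] computed by quadrature directly from the definition (23), without going through (24) to (29). The sketch below is such a cross-check; function names, test points and tolerances are illustrative assumptions.

```python
import math


def E_Qi(y_i, sigma, A, alpha):
    """Summand of Eq. (30)."""
    return sigma ** 2 + A * math.cos(alpha * y_i) * (1.0 - math.exp(-0.5 * (alpha * sigma) ** 2))


def Var_Qi(y_i, sigma, A, alpha):
    """Summand of Eq. (31)."""
    e1 = math.exp(-0.5 * (alpha * sigma) ** 2)
    u = e1 * e1  # exp(-(alpha*sigma)^2)
    return (4 * sigma ** 2 * y_i ** 2 + 2 * sigma ** 4
            + 0.5 * A ** 2 * (1.0 - u) * (1.0 - math.cos(2 * alpha * y_i) * u)
            + 2 * A * alpha * sigma ** 2 * e1
            * (alpha * sigma ** 2 * math.cos(alpha * y_i) + 2 * y_i * math.sin(alpha * y_i)))


def moments_by_quadrature(y_i, sigma, A, alpha, n=4000, width=12.0):
    """E[Q_i] and Var[Q_i] with Q_i from Eq. (23), via composite Simpson quadrature."""
    f = lambda v: v * v + A * (1.0 - math.cos(alpha * v))
    a, b = -width * sigma, width * sigma
    h = (b - a) / n
    m1 = m2 = 0.0
    for k in range(n + 1):
        x = a + k * h
        c = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)  # Simpson weights
        w = c * math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
        q = f(y_i + x) - f(y_i)
        m1 += w * q
        m2 += w * q * q
    m1 *= h / 3.0
    m2 *= h / 3.0
    return m1, m2 - m1 * m1


def E_Q_and_D_Q(y, sigma, A, alpha):
    """E_Q, Eqs. (10)/(30), and D_Q, Eqs. (11)/(31), by summing over all components."""
    E_Q = sum(E_Qi(v, sigma, A, alpha) for v in y)
    D_Q = math.sqrt(sum(Var_Qi(v, sigma, A, alpha) for v in y))
    return E_Q, D_Q
```

Under the same assumptions, D_i² from (20) follows as D_Q² minus `Var_Qi` of component i.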

As an important remark, expression (23) can be linearized w.r.t. mutation x_i to obtain analytically solvable progress rate integrals, see also discussion after Eq. (51). Taylor-expanding f_i around y_i for small x_i gives $f_{i} (y_{i} + x_{i}) = f_{i} (y_{i}) + \frac{\partial f_{i}}{\partial y_{i}} x_{i} + O (x_{i}^{2})$ , such that after setting $f_{i}^{'} : = \frac{\partial f_{i}}{\partial y_{i}}$ and evaluating the derivative one has

\begin{array}{l} Q_{i} (x_{i}) = f_{i} (y_{i} + x_{i}) - f_{i} (y_{i}) = f_{i}^{'} x_{i} + O (x_{i}^{2}) \\ = (2 y_{i} + α A \sin (α y_{i})) x_{i} + O (x_{i}^{2}) = (k_{i} + d_{i}) x_{i} + O (x_{i}^{2}), \end{array}

(32)

with following definitions applied to (32)

f_{i}^{'} := k_{i} + d_{i}, \quad \text{with} \quad k_{i} := 2 y_{i} \quad \text{and} \quad d_{i} := α A \sin (α y_{i}) .

(33)

Component k_i is the derivative of the quadratic term $y_{i}^{2}$, cf. Eq. (2), and follows the global quadratic structure of the function. The derivative d_i, in contrast, follows the local oscillation and will therefore be very important for modeling local attraction in the progress rate derivations of Secs. 3 and 4.
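A short numerical check of the linearization (32) and (33): the sum k_i + d_i should match a central finite-difference derivative of f_i, and (k_i + d_i)x_i should approximate Q_i(x_i) up to O(x_i²). The helper names and step sizes are illustrative assumptions.

```python
import math


def f_i(y_i: float, A: float, alpha: float) -> float:
    """Fitness component f_i(y_i), Eq. (2)."""
    return y_i * y_i + A * (1.0 - math.cos(alpha * y_i))


def gradient_parts(y_i: float, A: float, alpha: float):
    """k_i (global quadratic part) and d_i (local oscillation part) of f_i', Eq. (33)."""
    return 2.0 * y_i, alpha * A * math.sin(alpha * y_i)


def linearized_quality_gain(y_i: float, x_i: float, A: float, alpha: float) -> float:
    """First order approximation Q_i(x_i) ~ (k_i + d_i) x_i, Eq. (32)."""
    k_i, d_i = gradient_parts(y_i, A, alpha)
    return (k_i + d_i) * x_i
```

For example, at y_i = 0.25 with A = 10 and α = 2π one gets k_i = 0.5 but d_i = 20π ≈ 62.8, i.e., close to the optimizer the local oscillation dominates the gradient.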

3. First order progress rate

While the first order progress rate (7) does not suffice to completely describe the convergence behavior of the ES on Rastrigin, see Sec. 5, it is a necessary step in the calculation of the second order progress rate in Sec. 4. Given definition (7) and the parental location y^(g), one has to find the expected value $E [y_{i}^{(g + 1)}]$ of the i-th component of the new parental centroid. The positional update y^(g) → y^(g+1) performed by the ES is realized by consecutively applying mutation, selection, and recombination (see Algorithm 1), such that one can write

y^{(g + 1)} = \frac{1}{μ} \sum_{m = 1}^{μ} (y^{(g)} + x_{m; λ}) = y^{(g)} + \frac{1}{μ} \sum_{m = 1}^{μ} x_{m; λ},

(34)

where x_m;λ denotes the mutation vector of the m-th best offspring after selection. Considering the i-th component of Eq. (34), abbreviating the mutation component as x_m;λ := (x_m;λ)_i, and taking the expected value thereof yields

E [y_{i}^{(g + 1)} ∣ y^{(g)}, σ^{(g)}] = y_{i}^{(g)} + \frac{1}{μ} \sum_{m = 1}^{μ} E [x_{m; λ} ∣ y^{(g)}, σ^{(g)}] .

(35)

The progress rate can therefore be evaluated by inserting (35) into (7) giving

φ_{i} = - \frac{1}{μ} \sum_{m = 1}^{μ} E [x_{m; λ} ∣ y^{(g)}, σ^{(g)}] .

(36)

Before starting the derivation of (36), the important large population theorem is stated which will be used during the derivation of both first and second order progress rate. Its application also yields the so-called asymptotic generalized progress coefficients presented in Eq. (45).

Theorem 1

Let λ > μ + 1 and μ > a with a ≥ 1, and let ϑ = μ/λ with 0 < ϑ < 1, such that $t^{λ - μ - 1} {(1 - t)}^{μ - a}$ attains its maximum on (0, 1) and vanishes at t ∈ {0, 1}. Let f_x(t) be a function defined for constant x ∈ ℝ such that f_x : [0, 1] → [0, 1] with bounded derivatives on [0, 1], and let B denote the beta function. Furthermore, let p_x denote the PDF of a normally distributed variate and let p_n(x) denote a polynomial of degree n in x. For μ, λ → ∞ with constant ϑ = μ/λ, the following limit holds

\begin{array}{l} \lim_{\begin{matrix} μ, λ \to \infty \\ ϑ = const . \end{matrix}} \int_{- \infty}^{\infty} p_{n} (x) p_{x} (x) \frac{1}{B (λ - μ, μ)} \int_{0}^{1} t^{λ - μ - 1} {(1 - t)}^{μ - a} f_{x} (t) d t d x \\ \qquad = \frac{1}{ϑ^{a - 1}} \int_{- \infty}^{\infty} p_{n} (x) p_{x} (x) f_{x} (1 - ϑ) d x . \end{array}

(37)

Proof

The dominated convergence theorem is applied. First, the following sequence is defined for μ = 1, 2, …, with λ(μ) = μ/ϑ and constant ϑ

g_{μ} (x) : = \frac{1}{B (λ - μ, μ)} \int_{0}^{1} t^{λ - μ - 1} (1 - t)^{μ - a} f_{x} (t) d t .

(38)

Note that g_μ(x) enters (37) weighted with the normal density p_x(x). In [18] it was shown that g_μ(x) converges for any x according to

\lim_{\begin{matrix} μ, λ \to \infty \\ ϑ = const . \end{matrix}} g_{μ} (x) = \frac{f_{x} (1 - ϑ)}{ϑ^{a - 1}} .

(39)

An upper bound of g_μ can be estimated using 0 ≤ f_x ≤ 1 and the definition of the beta function $B (z_{1}, z_{2}) = \int_{0}^{1} t^{z_{1} - 1} {(1 - t)}^{z_{2} - 1} d t$ as

\begin{array}{l} | g_{μ} (x) | \leq \frac{B (λ - μ, μ - a + 1)}{B (λ - μ, μ)} = \frac{(λ - μ - 1)! (μ - a)!}{(λ - a)!} \frac{(λ - 1)!}{(λ - μ - 1)! (μ - 1)!} \\ = \frac{(λ - 1) (λ - 2) \dots (λ - a + 1) (λ - a)!}{(μ - 1) (μ - 2) \dots (μ - a + 1) (μ - a)!} \frac{(μ - a)!}{(λ - a)!} \\ = {(\frac{λ}{μ})}^{a - 1} \frac{(1 - 1 / λ) \dots (1 - (a - 1) / λ)}{(1 - 1 / μ) \dots (1 - (a - 1) / μ)} \\ \leq \frac{1}{ϑ^{a - 1}} \frac{1}{{(1 - (a - 1) / μ)}^{a - 1}} . \end{array}

(40)

A lower bound for the denominator of (40) can be given as

{(1 - \frac{a - 1}{μ})}^{a - 1} \geq \frac{1}{a^{a - 1}} .

(41)

Inequality (41) can be shown easily by setting μ = a + k with integers a ≥ 1 and k ≥ 1 (ensuring μ > a). This yields

1 - \frac{a - 1}{μ} = \frac{k + 1}{a + k} \geq \frac{1}{a} \quad \Longleftrightarrow \quad a k \geq k,

(42)

which is fulfilled for any a ≥ 1 and k ≥ 1. Raising both (positive) sides of the first inequality in (42) to the power a − 1 ≥ 0 gives (41). Using (41) in (40) one gets

| g_{μ} (x) | \leq {(\frac{a}{ϑ})}^{a - 1} .

(43)

As there is a constant upper bound of |g_μ(x)|, it remains to show that

\begin{array}{l} \int_{- \infty}^{\infty} | p_{n} (x) | p_{x} (x) d x \leq \int_{- \infty}^{\infty} | \sum_{k = 0}^{n} a_{k} x^{k} | p_{x} (x) d x \leq \int_{- \infty}^{\infty} \sum_{k = 0}^{n} | a_{k} | | x^{k} | p_{x} (x) d x \\ \leq 2 \sum_{k = 0}^{n} | a_{k} | \int_{0}^{\infty} x^{k} p_{x} (x) d x < \infty, \end{array}

(44)

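which is finite because all moments of the normal density p_x(x) exist. Consequently, (a/ϑ)^{a−1} |p_n(x)| p_x(x) is an integrable majorant of p_n(x) p_x(x) g_μ(x) for all μ, and by the dominated convergence theorem the limit in Eq. (37) can be exchanged with the integral over x. Using the limit (39), the desired result is obtained.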
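Both ingredients of the proof can be checked numerically. The sketch below evaluates the beta-function ratio bounded in (40) to (43) via log-gamma functions, and it illustrates the limit (39) for the monomial test function f_x(t) = t^k, for which the t-integral has a closed form in terms of the raw moments of a beta distribution. The choice of test function and all parameter values are assumptions for illustration.

```python
import math


def log_beta(z1: float, z2: float) -> float:
    """ln B(z1, z2) = ln Gamma(z1) + ln Gamma(z2) - ln Gamma(z1 + z2)."""
    return math.lgamma(z1) + math.lgamma(z2) - math.lgamma(z1 + z2)


def beta_ratio(mu: int, lam: int, a: int) -> float:
    """Quantity bounded in (40): B(lam - mu, mu - a + 1) / B(lam - mu, mu)."""
    return math.exp(log_beta(lam - mu, mu - a + 1) - log_beta(lam - mu, mu))


def g_mu_monomial(mu: int, theta: float, a: int, k: int) -> float:
    """g_mu from Eq. (38) for the test function f_x(t) = t**k (independent of x).

    The normalized t-integral equals beta_ratio times the k-th raw moment of
    T ~ Beta(lam - mu, mu - a + 1), i.e. prod_j (p + j) / (p + q + j).
    """
    lam = round(mu / theta)
    p, q = lam - mu, mu - a + 1
    moment = 1.0
    for j in range(k):
        moment *= (p + j) / (p + q + j)
    return beta_ratio(mu, lam, a) * moment
```

For μ = 1000, ϑ = 1/2, a = 2 and k = 2, g_μ is already within about 0.3% of the limit (1 − ϑ)^k / ϑ^{a−1} = 0.5.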

The limit (39) was used in [16] to define the so-called asymptotic generalized progress coefficients for integers a ≥ 1, b ≥ 0, and truncation ratio 0 < ϑ < 1 as

e_{ϑ}^{a, b} : = {[\frac{e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}}{\sqrt{2 π} ϑ}]}^{a} {[- Φ^{- 1} (ϑ)]}^{b} .

(45)

These are characteristic coefficients describing the progress in the limit μ, λ → ∞ with constant ϑ = μ/λ, and are related to the generalized progress coefficients [5, Eq. (5.112)]. They will reappear during the derivation of both φ_i and $φ_{i}^{II}$ . The derivation of φ_i is presented now.
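Eq. (45) is straightforward to evaluate with the normal quantile function. For a = 1, b = 0 it reduces to φ(Φ⁻¹(ϑ))/ϑ, the known large-population limit of the progress coefficient c_{μ/μ,λ}, i.e., of the expected mean of the μ largest of λ standard normal samples at fixed ϑ = μ/λ. The Monte Carlo comparison below is our own sanity check; the function names, trial count and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def e_theta(theta: float, a: int, b: int) -> float:
    """Asymptotic generalized progress coefficient e_theta^{a,b}, Eq. (45)."""
    z = NormalDist().inv_cdf(theta)  # Phi^{-1}(theta)
    base = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * theta)
    return base ** a * (-z) ** b


def c_mu_mu_lambda_mc(mu: int, lam: int, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of c_{mu/mu,lambda}: mean of the mu largest of lam N(0,1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = sorted((rng.gauss(0.0, 1.0) for _ in range(lam)), reverse=True)
        total += sum(z[:mu]) / mu
    return total / trials
```

For ϑ = 1/2 one gets e_{1/2}^{1,0} = 2/√(2π) ≈ 0.798, while e_{1/2}^{a,b} vanishes for b ≥ 1 because Φ⁻¹(1/2) = 0.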

Proposition 1

Let μ, λ ∈ ℕ with 1 ≤ μ < λ, and let p_x denote the PDF of the random mutation x ~ 𝒩(0, σ²). Let x_m;λ denote the i-th mutation component (x_m;λ)_i of the m-th best offspring (out of λ). Furthermore, let P_Q and $P_{Q}^{- 1}$ denote the quality gain CDF and its inverse, respectively, and let B denote the beta function. Then, the first order component-wise progress rate is given by

\begin{array}{l} φ_{i} = - \frac{1}{μ} \sum_{m = 1}^{μ} E [x_{m; λ}] \\ = - \frac{λ}{μ} \int_{x_{i} = - \infty}^{x_{i} = \infty} x_{i} p_{x} (x_{i}) \frac{1}{B (λ - μ, μ)} \int_{t = 0}^{t = 1} t^{λ - μ - 1} {(1 - t)}^{μ - 1} P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{i}) d t d x_{i} . \end{array}

(46)

Proof

From now on the conditional dependency on y^(g) and σ^(g) will be implicitly assumed as given for better readability of the equations. The expected value of the i-th mutation component x_m;λ after selection can be expressed as an integral over the order statistic density p_m;λ(x_i) of the m-th best individual, such that (36) is rewritten as

φ_{i} = - \frac{1}{μ} \sum_{m = 1}^{μ} E [x_{m; λ}] = - \frac{1}{μ} \sum_{m = 1}^{μ} \int_{- \infty}^{\infty} x_{i} p_{m; λ} (x_{i}) d x_{i} .

(47)

The subsequent task will be to derive the density p_m;λ as a function of mutation and quality gain distributions. Mutations are distributed normally with zero mean and variance σ² according to the normal density

p_{x} (x_{i}) = \frac{1}{\sqrt{2 π} σ} \exp [- \frac{1}{2} {(\frac{x_{i}}{σ})}^{2}] .

(48)

Given mutation x_i (and implicitly position y), a random quality gain value Q is distributed according to the conditional probability density p_Q (q|x_i). If the m-th best individual attains a quality gain within [q, q + dq], there must be m − 1 better individuals with a smaller quality gain, which occurs with probability [Pr{Q ≤ q}]^{m−1} = [P_Q (q)]^{m−1}, and λ − m worse individuals with a larger value, which occurs with probability [Pr{Q > q}]^{λ−m} = [1 − P_Q (q)]^{λ−m}. Since the remaining λ − 1 offspring are mutated independently of x_i, their quality gains follow the unconditional CDF P_Q. To account for all relevant combinations one has the factor $\frac{λ!}{(m - 1)! (λ - m)!}$, where 1/(m − 1)! and 1/(λ − m)! remove the irrelevant orderings within the groups of better and worse individuals, respectively. The conditional density for the m-th best individual as a function of the quality gain q is therefore

p_{Q; m; λ} (q ∣ x_{i}) = \frac{λ!}{(m - 1)! (λ - m)!} p_{Q} (q ∣ x_{i}) P_{Q} (q)^{m - 1} [1 - P_{Q} (q)]^{λ - m} .

(49)

Multiplying (49) with the mutation density p_x(x_i) and integrating over all attainable quality gain values q ∈ [q_l, q_u], one arrives at the density

p_{m; λ} (x_{i}) = p_{x} (x_{i}) \frac{λ!}{(m - 1)! (λ - m)!} \int_{q_{l}}^{q_{u}} p_{Q} (q ∣ x_{i}) P_{Q} {(q)}^{m - 1} {[1 - P_{Q} (q)]}^{λ - m} d q .

(50)

Inserting the order statistic density from (50) into the progress rate (47), one obtains the intermediate result

φ_{i} = - \frac{1}{μ} \sum_{m = 1}^{μ} \frac{λ!}{(m - 1)! (λ - m)!} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) \int_{q_{l}}^{q_{u}} p_{Q} (q ∣ x_{i}) P_{Q} {(q)}^{m - 1} {[1 - P_{Q} (q)]}^{λ - m} d q d x_{i} .

(51)

A few important remarks can be made regarding Eq. (51). A closed-form analytic solution cannot be obtained without applying further approximations. It can be approached analogously to the φ_i-derivation for the ellipsoid in [13], yielding a solution in terms of the well-known progress coefficient c_{μ/μ,λ} [5, p. 216]. However, a closed-form solution with this approach requires a linear relation of Q_i w.r.t. x_i, see relation (32). The effect of a linearized quality gain on the progress rate of the Rastrigin function was already studied in [17], which showed that the progress due to local attraction is not modeled correctly, as the oscillation terms have to be either dropped or linearized for small x_i.

Therefore, a different approach is followed here, assuming the infinite population limit; this approach was applied in the analysis of functions with noise-induced multimodality [9]. It will yield correction terms that include the effects of the trigonometric terms from (24), in contrast to only taking the linearized terms from (32). Starting from Eq. (51) and moving the sum including the m-dependent prefactors into the innermost integral yields

φ_{i} = - \frac{λ!}{μ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) \int_{q_{l}}^{q_{u}} p_{Q} (q ∣ x_{i}) \sum_{m = 1}^{μ} \frac{P_{Q} {(q)}^{m - 1} {[1 - P_{Q} (q)]}^{λ - m}}{(m - 1)! (λ - m)!} d q d x_{i} .

(52)

Now a transformation can be applied for the sum Σ_m(·) yielding an expression as a function of the regularized incomplete beta function [5, p. 147]. One has

\sum_{m = 1}^{μ} \frac{P {(q)}^{m - 1} {[1 - P (q)]}^{λ - m}}{(m - 1)! (λ - m)!} = \frac{1}{(λ - μ - 1)! (μ - 1)!} \int_{0}^{1 - P (q)} t^{λ - μ - 1} {(1 - t)}^{μ - 1} d t .

(53)
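Identity (53) is the binomial-sum representation of the regularized incomplete beta function. As a sanity check, the following minimal Python sketch (standard library only; the helper names `lhs_sum`, `rhs_integral`, and `simpson` and the test values of P, μ, λ are illustrative choices, not taken from the paper) evaluates both sides, approximating the t-integral with a composite Simpson rule.

```python
import math

def lhs_sum(P, mu, lam):
    """Left-hand side of (53): finite sum over m = 1, ..., mu."""
    return sum(
        P ** (m - 1) * (1.0 - P) ** (lam - m)
        / (math.factorial(m - 1) * math.factorial(lam - m))
        for m in range(1, mu + 1)
    )

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def rhs_integral(P, mu, lam):
    """Right-hand side of (53): scaled incomplete beta integral up to 1 - P."""
    pref = 1.0 / (math.factorial(lam - mu - 1) * math.factorial(mu - 1))
    integrand = lambda t: t ** (lam - mu - 1) * (1.0 - t) ** (mu - 1)
    return pref * simpson(integrand, 0.0, 1.0 - P)

# Arbitrary test values with 1 <= mu < lam.
cases = [(0.37, 3, 10), (0.8, 5, 12), (0.05, 1, 4)]
for P, mu, lam in cases:
    print(f"P={P}, mu={mu}, lam={lam}: sum={lhs_sum(P, mu, lam):.10e}, "
          f"integral={rhs_integral(P, mu, lam):.10e}")
```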

Furthermore, one can rewrite the resulting population-dependent factor as follows

\frac{λ!}{μ} \frac{1}{(λ - μ - 1)! (μ - 1)!} = \frac{λ}{μ} \frac{(λ - 1)!}{(λ - μ - 1)! (μ - 1)!} = \frac{λ}{μ} \frac{Γ (λ)}{Γ (λ - μ) Γ (μ)} = \frac{λ}{μ} \frac{1}{B (λ - μ, μ)},

(54)

where we have used the property of the gamma function Γ(n) = (n − 1)! (for any integer n > 0) and the known relation between gamma and beta functions $\frac{Γ (x) Γ (y)}{Γ (x + y)} = B (x, y)$ . These replacements will be useful later. After replacing the sum and refactoring we arrive at the following progress rate integral

φ_{i} = - \frac{λ}{μ} \frac{1}{B (λ - μ, μ)} \int_{x_{i} = - \infty}^{x_{i} = \infty} x_{i} p_{x} (x_{i}) \int_{q = q_{l}}^{q = q_{u}} p_{Q} (q ∣ x_{i}) \int_{t = 0}^{t = 1 - P_{Q} (q)} t^{λ - μ - 1} {(1 - t)}^{μ - 1} d t d q d x_{i} .

(55)

Now the integration order of t and q is exchanged. In Eq. (55) one has the bounds

q_{l} \leq q \leq q_{u}, 0 \leq t \leq 1 - P_{Q} (q) .

(56)

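Exchanging the order such that t becomes the outer integration variable, the condition t ≤ 1 − P_Q(q) is equivalent to q ≤ $P_{Q}^{- 1} (1 - t)$, as P_Q is monotonically increasing. With this inverse transformation, one obtains the new ranges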

0 \leq t \leq 1, q_{l} \leq q \leq P_{Q}^{- 1} (1 - t) .

(57)

The progress rate then reads

φ_{i} = - \frac{λ}{μ} \frac{1}{B (λ - μ, μ)} \int_{x_{i} = - \infty}^{x_{i} = \infty} x_{i} p_{x} (x_{i}) \int_{t = 0}^{t = 1} t^{λ - μ - 1} {(1 - t)}^{μ - 1} \int_{q = q_{l}}^{q = P_{Q}^{- 1} (1 - t)} p_{Q} (q | x_{i}) d q d t d x_{i} .

(58)

Now the innermost integral can be solved using p_Q (q|x_i) = dP_Q (q|x_i)/dq

\int_{q_{l}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q | x_{i}) d q = P_{Q} (P_{Q}^{- 1} (1 - t) | x_{i}) - P_{Q} (q_{l} | x_{i}) = P_{Q} (P_{Q}^{- 1} (1 - t) | x_{i}),

(59)

where P_Q (q_l|x_i) = Pr(Q ≤ q_l|x_i) = 0, since q_l is the lower bound of the attainable quality gain values and Q is continuously distributed. Inserting (59) into (58), we arrive at the progress rate integral (46).

Unfortunately a closed-form solution of (46) after inserting Approximation 1 and Approximation 2 for the quality gain CDF is not possible due to the underlying structure of the integrand. Hence, asymptotic approximations will be introduced assuming large populations and large dimensionality to successively simplify the integral in a way that closed-form solutions can be provided. First, the large population theorem will be applied and then the quality gain CDF is inserted. Thereafter, the normal CDF is Taylor-expanded with the first two terms yielding analytically solvable results and higher order terms vanishing as O(1/N). The results are further simplified in the end assuming component equipartition (12), which finally gives the progress rate result in (96).

Theorem 2

Let p_x denote the PDF of the random mutation x ~ 𝒩 (0, σ²). Let P_Q denote the quality gain CDF with its quantile function given by $P_{Q}^{- 1}$ . For a truncation ratio ϑ = μ/λ with 0 < ϑ < 1 the component-wise progress rate for large populations yields

\lim_{\begin{matrix} μ, λ \to \infty \\ ϑ = const . \end{matrix}} φ_{i} = - \frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) P_{Q} (P_{Q}^{- 1} (ϑ) | x_{i}) d x_{i} .

(60)

Proof

Starting from Eq. (46) and applying the infinite population size limit, the result of Theorem 1 can be applied with a = 1, p_n(x_i) = x_i, and $f_{x} (t) = P_{Q} (P_{Q}^{- 1} (1 - t) | x_{i})$ . Evaluating f_x(t) at t = 1 − ϑ gives

{f_{x} (t) |}_{t = 1 - ϑ} = {P_{Q} (P_{Q}^{- 1} (1 - t) | x_{i}) |}_{t = 1 - ϑ} = P_{Q} (P_{Q}^{- 1} (ϑ) | x_{i}),

(61)

which yields the result (60).
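The weight t^{λ−μ−1}(1−t)^{μ−1}/B(λ−μ, μ) in (46) is the density of a Beta(λ−μ, μ) distribution, which gives an intuition for the limit in Theorem 2: its mean is (λ−μ)/λ = 1−ϑ and its variance vanishes as λ grows at fixed ϑ, so the t-integral picks out f_x(t) at t = 1−ϑ, while the prefactor λ/μ equals 1/ϑ. The minimal sketch below (standard-library Python; `beta_moments` is a hypothetical helper) illustrates this concentration with the exact Beta moments, using E[t³] as an example of a non-linear test function.

```python
import math

def beta_moments(lam, theta):
    """Mean, standard deviation and E[t**3] of Beta(lam - mu, mu) with mu = theta * lam."""
    mu = round(theta * lam)
    a, b = lam - mu, mu
    n = a + b
    mean = a / n
    std = math.sqrt(a * b / (n * n * (n + 1)))
    third = a * (a + 1) * (a + 2) / (n * (n + 1) * (n + 2))
    return mean, std, third

theta = 0.25  # truncation ratio mu/lambda, kept fixed while lambda grows
for lam in (8, 80, 800, 8000):
    mean, std, third = beta_moments(lam, theta)
    print(f"lambda={lam:5d}: mean={mean:.4f}, std={std:.5f}, "
          f"E[t^3]={third:.6f} vs (1-theta)^3={(1 - theta) ** 3:.6f}")
```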

The next step requires the use of Approximation 1 and Approximation 2 for the quality gain distributions in Eq. (60). To this end, one uses the conditional normal distribution function $Φ (\frac{q - E_{Q | x_{i}}}{D_{i}})$ , see (21), and the inverse transformation q = E_Q + D_Q Φ⁻¹(p) evaluated at p = ϑ, see (18). One obtains

{\tilde{P}}_{Q} ({\tilde{P}}_{Q}^{- 1} (ϑ) ∣ x_{i}) = Φ (\frac{E_{Q} + D_{Q} Φ^{- 1} (ϑ) - E_{Q ∣ x_{i}}}{D_{i}}) .

(62)

Given the normal approximation (62), an expression for E_{Q|x_i} is needed. Using definition (19) with Q_i-result (24) the (conditional) expected value is written as

E_{Q | x_{i}} = Q_{i} (x_{i}) + \sum_{j \neq i} E [Q_{j}] = k_{i} x_{i} + δ_{i} (x_{i}) + E_{i} .

(63)

In (63) the following definitions are introduced as abbreviations

\begin{aligned} k_{i} &:= 2 y_{i} \\ δ_{i} (x_{i}) &:= x_{i}^{2} + A \cos (α y_{i}) (1 - \cos (α x_{i})) + A \sin (α y_{i}) \sin (α x_{i}) \\ E_{i} &:= \sum_{j \neq i} E [Q_{j}] . \end{aligned}

(64)

Given Eq. (63), quantity δ(x_i) includes all non-linear terms in x_i. This will be important when the normal CDF is expanded and analytically solved. Inserting relation (63) into (62) and the result into (60) yields

φ_{i} ≃ - \frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) Φ (\frac{E_{Q} + D_{Q} Φ^{- 1} (ϑ) - (k_{i} x_{i} + δ_{i} (x_{i}) + E_{i})}{D_{i}}) d x_{i} .

(65)

A closed-form solution of (65) cannot be obtained, since the argument of Φ contains the non-linear term δ_i(x_i). However, an approximate solution in terms of a Taylor expansion can be provided by introducing the decomposition Φ(g(x_i) + h(x_i)), with g(x_i) being a linear function and h(x_i) being a small non-linear perturbation, according to

g (x_{i}) := - \frac{k_{i}}{D_{i}} x_{i} + \frac{E_{Q_{i}} + D_{Q} Φ^{- 1} (ϑ)}{D_{i}}

(66)

h (x_{i}) := - \frac{δ (x_{i})}{D_{i}} .

(67)

In (66), the abbreviation E_{Q_i} = E_Q − E_i = E[Q_i], cf. Eq. (10), is used to denote the expected value of the i-th summand of the quality gain (6). Using functions g(x_i) and h(x_i) Eq. (65) becomes

φ_{i} ≃ - \frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) Φ (g (x_{i}) + h (x_{i})) d x_{i} .

(68)

Approximation 3 (Truncated cumulative distribution function series)

Under the assumption of a normally distributed quality gain, see Approximation 1 and Approximation 2, and a quality gain variance scaling with N according to Eq. (13), the CDF of the normal distribution is expanded at g(x_i) in the limit of N → ∞ as

φ_{i} ≃ - \frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) (Φ (g (x_{i})) + ϕ (g (x_{i})) h (x_{i}) + O (\frac{1}{N})) d x_{i} .

(69)

Relation (69) is derived now. Starting from (68), the Taylor-expansion of Φ(·) up to first order with the remainder denoted by r yields

Φ (g + h) = \sum_{n = 0}^{\infty} \frac{1}{n!} \frac{d^{n} Φ}{d g^{n}} h^{n} = Φ (g) + ϕ (g) h + r (N) .

(70)

Note that all derivatives of the normal density exist as $\frac{d^{n} ϕ (x)}{d x^{n}} = (- 1)^{n} {He}_{n} (x) ϕ (x)$, with He_n(x) denoting the probabilist’s Hermite polynomial of order n. In the following, the scaling of the remainder with N is investigated, and it will be shown that r = O(1/N). To this end, (70) is rewritten as

r (N) = Φ (g + h) - Φ (g) - ϕ (g) h .

(71)

For the further analysis of r(N), the equipartition of components is assumed as introduced in Eqs. (12), (13), and (14). Hence, the standard deviation D_i can be written as a function of N as

D_{i} = s \sqrt{N},

(72)

where the prefactor s is independent of N and depends on A, α, y, and σ. With these assumptions, the functions g and h are written as (using E := E_{Q_i}, $Φ_{ϑ}^{- 1} := Φ^{- 1} (ϑ)$, dropping the subscript i for brevity, and using D_i ≃ D_Q)

g = \frac{E - k x}{s \sqrt{N}} + Φ_{ϑ}^{- 1}, h = - \frac{δ}{s \sqrt{N}} .

(73)

As h → 0 for N → ∞, the remainder (71) vanishes accordingly. Therefore, in order to show r(N) = O(1/N), the limit $\lim_{N \to \infty} r (N) N$ is investigated by applying l’Hôpital’s rule

\lim_{N \to \infty} r (N) N = \lim_{N \to \infty} \frac{r (N)}{1 / N} = \lim_{N \to \infty} \frac{\frac{\partial r (N)}{\partial N}}{\frac{\partial (1 / N)}{\partial N}} = - \lim_{N \to \infty} N^{2} \frac{\partial r (N)}{\partial N} .

(74)

To evaluate (74), the derivative of r from (71) w.r.t. N is computed as

\begin{aligned} \frac{\partial r}{\partial N} &= \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} (g + h)^{2}} \left(\frac{\partial g}{\partial N} + \frac{\partial h}{\partial N}\right) - \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} g^{2}} \frac{\partial g}{\partial N} + \frac{g h}{\sqrt{2 π}} e^{- \frac{1}{2} g^{2}} \frac{\partial g}{\partial N} - \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} g^{2}} \frac{\partial h}{\partial N} \\ &= \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} g^{2}} \left[\left(e^{- g h - \frac{1}{2} h^{2}} - 1\right) \left(\frac{\partial g}{\partial N} + \frac{\partial h}{\partial N}\right) + g h \frac{\partial g}{\partial N}\right] . \end{aligned}

(75)

The term $(e^{- g h - \frac{1}{2} h^{2}} - 1)$ of (75) is expanded up to first order discarding higher orders $O ({(g h + \frac{1}{2} h^{2})}^{2})$

\begin{aligned} \frac{\partial r}{\partial N} &≃ \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} g^{2}} \left[\left(- g h - \frac{1}{2} h^{2}\right) \left(\frac{\partial g}{\partial N} + \frac{\partial h}{\partial N}\right) + g h \frac{\partial g}{\partial N}\right] \\ &= \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} g^{2}} \left[- \frac{1}{2} h^{2} \left(\frac{\partial g}{\partial N} + \frac{\partial h}{\partial N}\right) - g h \frac{\partial h}{\partial N}\right] . \end{aligned}

(76)

The derivatives of g and h from Eq. (73) are

\frac{\partial g}{\partial N} = - \frac{E - k x}{2 s N^{3 / 2}}, \frac{\partial h}{\partial N} = \frac{δ}{2 s N^{3 / 2}} .

(77)

Inserting (73) and (77) into (76) yields after refactoring

\begin{aligned} \frac{\partial r}{\partial N} &≃ \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} \left(\frac{E - k x}{s \sqrt{N}} + Φ_{ϑ}^{- 1}\right)^{2}} \left[- \frac{δ^{2}}{2 s^{2} N} \left(- \frac{E - k x}{2 s N^{3 / 2}} + \frac{δ}{2 s N^{3 / 2}}\right) + \left(\frac{E - k x}{s \sqrt{N}} + Φ_{ϑ}^{- 1}\right) \frac{δ^{2}}{2 s^{2} N^{2}}\right] \\ &= \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} \left(\frac{E - k x}{s \sqrt{N}} + Φ_{ϑ}^{- 1}\right)^{2}} \left(- \frac{δ^{2}}{2 s^{2} N^{2}}\right) \left[\frac{δ}{2 s \sqrt{N}} - \frac{3}{2} \frac{E - k x}{s \sqrt{N}} - Φ_{ϑ}^{- 1}\right] . \end{aligned}

(78)

Taking the limit (74) of (78) therefore yields

\begin{aligned} \lim_{N \to \infty} r (N) N = - \lim_{N \to \infty} N^{2} \frac{\partial r (N)}{\partial N} &= \lim_{N \to \infty} \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} \left(\frac{E - k x}{s \sqrt{N}} + Φ_{ϑ}^{- 1}\right)^{2}} \frac{δ^{2}}{2 s^{2}} \left[\frac{δ}{2 s \sqrt{N}} - \frac{3}{2} \frac{E - k x}{s \sqrt{N}} - Φ_{ϑ}^{- 1}\right] \\ &= - \frac{δ^{2} Φ_{ϑ}^{- 1}}{2 \sqrt{2 π} s^{2}} e^{- \frac{1}{2} \left(Φ_{ϑ}^{- 1}\right)^{2}} , \end{aligned}

(79)

such that the remainder r(N) can be given as

r (N) ≃ - \frac{δ^{2} Φ_{ϑ}^{- 1}}{2 \sqrt{2 π} s^{2}} e^{- \frac{1}{2} {(Φ_{ϑ}^{- 1})}^{2}} \frac{1}{N} = O (\frac{1}{N}),

(80)

which concludes the derivation of (69).
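The O(1/N) scaling of the remainder can also be observed numerically. The minimal sketch below (standard-library Python; all values of E, k, x, δ, s, and ϑ are arbitrary placeholders, not values from the paper) evaluates r(N) from (71) with g and h from (73) and compares N·r(N) with the limit (79) for increasing N.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf = Phi, pdf = phi, inv_cdf = Phi^{-1}

def remainder(N, E, k, x, delta, s, theta):
    """Remainder r(N) of (71), with g and h taken from (73)."""
    g = (E - k * x) / (s * math.sqrt(N)) + STD_NORMAL.inv_cdf(theta)
    h = -delta / (s * math.sqrt(N))
    return STD_NORMAL.cdf(g + h) - STD_NORMAL.cdf(g) - STD_NORMAL.pdf(g) * h

def remainder_limit(delta, s, theta):
    """Limit of N * r(N) according to (79)."""
    b = STD_NORMAL.inv_cdf(theta)
    return -delta**2 * b / (2.0 * math.sqrt(2.0 * math.pi) * s**2) * math.exp(-0.5 * b * b)

# Hypothetical, fixed values for E, k, x, delta, s, and theta.
params = dict(E=1.0, k=2.0, x=0.3, delta=0.7, s=1.5, theta=0.3)
target = remainder_limit(params["delta"], params["s"], params["theta"])
for N in (1e2, 1e4, 1e6):
    print(f"N={N:.0e}: N*r(N)={N * remainder(N, **params):.6f}, limit (79)={target:.6f}")
```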

Both integrals of (69) are analytically solvable.² The zeroth order term yields a closed form solution due to g(x_i) being linear w.r.t. x_i and gives progress contributions due to the sphere function, i.e., the linear part of the quality gain (63). The first order term can be solved by applying quadratic completion to the Gaussian product p_x(x_i)ϕ(g(x_i)) yielding an expected value over a normal density. The expected value over h(x_i) can be regarded as a perturbation of the sphere containing A and α dependencies.

The determination of φ_i via (69) was done in [18] by evaluating both integrals. As the derivation and the final result for φ_i are very lengthy and therefore not practical for further analytic treatment, the obtained expression for φ_i was simplified as a last step assuming large dimensionality N. However, the same result as in [18] can be obtained in a quicker way by simplifying the integrands of (69) under the same assumptions before the integration, instead of simplifying the result afterwards. This will enable a more concise derivation of the final progress rate result.

First, the functions g and h from (66) and (67), respectively, are simplified. For large N, one has D_i ≃ D_Q using (14). As E_{Q_i} is only the quality gain expectation of a single component, it can be neglected compared to D_Q, which scales as $\sqrt{N}$ according to (13). Hence, one has

g (x_{i}) ≃ - \frac{k_{i} x_{i}}{D_{Q}} + Φ^{- 1} (ϑ)

(81)

h (x_{i}) ≃ - \frac{δ (x_{i})}{D_{Q}} .

(82)

Another approximation is introduced regarding the density p_x(x_i)ϕ(g(x_i)) for the second term of (69). By completing the square one can derive a resulting normal density with mean m and variance ς² by demanding

p_{x} (x_{i}) ϕ (g (x_{i})) = \frac{1}{\sqrt{2 π} σ} e^{- \frac{1}{2} \frac{x_{i}^{2}}{σ^{2}}} \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} g {(x_{i})}^{2}} \overset{!}{=} C e^{- \frac{1}{2} \frac{{(x_{i} - m)}^{2}}{ς^{2}}} .

(83)

Simple calculations yield

m = Φ^{- 1} (ϑ) \frac{D_{Q} k_{i} σ^{2}}{D_{Q}^{2} + k_{i}^{2} σ^{2}}, ς^{2} = \frac{1}{1 / σ^{2} + (k_{i}^{2} / D_{Q}^{2})}, C = \frac{1}{2 π σ} \exp \left[- \frac{1}{2} \frac{{[Φ^{- 1} (ϑ)]}^{2} D_{Q}^{2}}{D_{Q}^{2} + k_{i}^{2} σ^{2}}\right] .

(84)

Noting that $D_{Q}^{2} = Θ (N)$ and neglecting contributions of single components for N → ∞, i.e., $k_{i}^{2} ≪ D_{Q}^{2}$ , ${(k_{i} σ)}^{2} ≪ D_{Q}^{2}$ , the quantities m, ς², and C from (84) yield the asymptotic results

m ≃ 0, ς^{2} ≃ σ^{2}, C ≃ \frac{e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}}{2 π σ},

(85)

such that the density of the first order term yields

p_{x} (x_{i}) ϕ (g (x_{i})) ≃ \frac{e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}}{\sqrt{2 π}} p_{x} (x_{i}) .

(86)
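A quick numerical check of the completed square in (83) can be helpful, since the constant C only enters the final result through its large-N limit. The minimal sketch below (standard-library Python; function names and parameter values are hypothetical) compares p_x(x_i)ϕ(g(x_i)) with C exp(−(x_i−m)²/(2ς²)), using m and ς² from (84) and the exactly completed constant C = exp(−½[Φ⁻¹(ϑ)]² D_Q²/(D_Q²+k_i²σ²))/(2πσ), which approaches e^{−½[Φ⁻¹(ϑ)]²}/(2πσ) for k_i²σ² ≪ D_Q².

```python
import math
from statistics import NormalDist

def product_density(x, sigma, k, D_Q, theta):
    """Left-hand side of (83): p_x(x) * phi(g(x)) with g(x) = -k*x/D_Q + Phi^{-1}(theta), cf. (81)."""
    b = NormalDist().inv_cdf(theta)
    p_x = math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
    g = -k * x / D_Q + b
    return p_x * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)

def completed_square(x, sigma, k, D_Q, theta):
    """Right-hand side of (83) with m, varsigma^2 from (84) and the exact constant C."""
    b = NormalDist().inv_cdf(theta)
    m = b * D_Q * k * sigma**2 / (D_Q**2 + k**2 * sigma**2)
    var = 1.0 / (1.0 / sigma**2 + k**2 / D_Q**2)
    C = math.exp(-0.5 * b * b * D_Q**2 / (D_Q**2 + k**2 * sigma**2)) / (2.0 * math.pi * sigma)
    return C * math.exp(-0.5 * (x - m) ** 2 / var)

# Hypothetical values: sigma = 0.3, k_i = 2 y_i = 1.2, D_Q = 4, theta = 0.25.
for x in (-0.5, 0.0, 0.7):
    print(x, product_density(x, 0.3, 1.2, 4.0, 0.25), completed_square(x, 0.3, 1.2, 4.0, 0.25))
```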

Using the results from Eqs. (81), (82), and (86), the progress rate integral (69) is further simplified. The prefactors of the resulting integral yield the asymptotic progress coefficient (45)

c_{ϑ} := e_{ϑ}^{1, 0} = \frac{1}{\sqrt{2 π} ϑ} e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}} .

(87)
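The asymptotic coefficient (87) can be set against the finite-population progress coefficient c_μ/μ,λ, i.e., the expected mean of the μ largest out of λ independent standard normal samples. The minimal sketch below (standard-library Python; the Monte Carlo estimator, seed, and number of trials are illustrative assumptions) estimates c_μ/μ,λ for the population sizes of Figs. 4 and 5 and prints it next to c_ϑ.

```python
import math
import random
from statistics import NormalDist, fmean

def c_theta(theta):
    """Asymptotic progress coefficient c_theta from (87)."""
    b = NormalDist().inv_cdf(theta)
    return math.exp(-0.5 * b * b) / (math.sqrt(2.0 * math.pi) * theta)

def c_mu_mu_lambda_mc(mu, lam, trials=2000, seed=1):
    """Monte Carlo estimate of c_{mu/mu,lambda}: expected mean of the mu largest of lam N(0,1) samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        z = sorted((rng.gauss(0.0, 1.0) for _ in range(lam)), reverse=True)
        estimates.append(fmean(z[:mu]))
    return fmean(estimates)

for mu, lam in [(10, 40), (100, 200)]:
    print(f"({mu}/{mu}, {lam}): Monte Carlo c_mu/mu,lambda = {c_mu_mu_lambda_mc(mu, lam):.3f}, "
          f"c_theta = {c_theta(mu / lam):.3f}")
```

The gap between the two printed values is expected to shrink as λ grows at fixed ϑ, which is the sense in which c_μ/μ,λ ≃ c_ϑ.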

Approximation 4 (Progress rate integral for large dimensionality)

Based on the result of Approximation 3 only the first two terms are considered. Furthermore, the integrands of (69) are approximated and simplified assuming large dimensionality using Eqs. (81), (82), (86), and (87). Hence, one obtains

φ_{i} ≃ I_{i}^{0} + I_{i}^{1}, with

(88)

I_{i}^{0} := - \frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) Φ (- \frac{k_{i} x_{i}}{D_{Q}} + Φ^{- 1} (ϑ)) d x_{i}, and

(89)

I_{i}^{1} := \frac{c_{ϑ}}{D_{Q}} \int_{- \infty}^{\infty} x_{i} δ (x_{i}) p_{x} (x_{i}) d x_{i} .

(90)

Calculating $I_{i}^{0}$ from (89) by inserting mutation density p_x(x_i) from (48) and applying the substitution z = x_i/σ, one gets

I_{i}^{0} = - \frac{σ}{\sqrt{2 π} ϑ} \int_{- \infty}^{\infty} z e^{- \frac{1}{2} z^{2}} Φ (- \frac{k_{i} σ}{D_{Q}} z + Φ^{- 1} (ϑ)) d z .

(91)

The following integral identity [5, Eq. (A.12)] can be applied

\int_{- \infty}^{\infty} t e^{- \frac{1}{2} t^{2}} Φ (a t + b) d t = \frac{a}{\sqrt{1 + a^{2}}} \exp [- \frac{1}{2} \frac{b^{2}}{1 + a^{2}}] .

(92)
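Identity (92) follows from integration by parts, but it is also easy to confirm numerically. The minimal sketch below (standard-library Python; helper names and the test values of a and b are arbitrary) compares a Simpson quadrature of the left-hand side, truncated to |t| ≤ 12 where the Gaussian tail is negligible, with the closed form.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def lhs_92(a, b, L=12.0):
    """Left-hand side of (92); the Gaussian tail beyond |t| = L is negligible."""
    return simpson(lambda t: t * math.exp(-0.5 * t * t) * Phi(a * t + b), -L, L)

def rhs_92(a, b):
    """Right-hand side of (92)."""
    return a / math.sqrt(1.0 + a * a) * math.exp(-0.5 * b * b / (1.0 + a * a))

cases = [(-0.3, -0.67), (1.5, 0.2), (-2.0, 1.0)]
for a, b in cases:
    print(f"a={a}, b={b}: quadrature={lhs_92(a, b):.10f}, closed form={rhs_92(a, b):.10f}")
```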

Evaluating (92) with a = −k_iσ/D_Q and b = Φ⁻¹(ϑ) yields for the right-hand side of (92)

\frac{a}{\sqrt{1 + a^{2}}} \exp [- \frac{1}{2} \frac{b^{2}}{1 + a^{2}}] = - \frac{k_{i} σ}{D_{Q}} \frac{1}{\sqrt{1 + {(k_{i} σ / D_{Q})}^{2}}} \exp [- \frac{1}{2} \frac{{[Φ^{- 1} (ϑ)]}^{2}}{1 + {(k_{i} σ / D_{Q})}^{2}}] .

(93)

Again assuming ${(k_{i} σ)}^{2} ≪ D_{Q}^{2}$ , expression (93) simplifies and the result for (89) is obtained with (87) as

I_{i}^{0} ≃ \frac{e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}}{\sqrt{2 π} ϑ} \frac{k_{i} σ^{2}}{D_{Q}} = c_{ϑ} \frac{k_{i} σ^{2}}{D_{Q}} .

(94)

Now $I_{i}^{1}$ is evaluated. The integrand contains $x_{i} δ (x_{i}) = x_{i} (x_{i}^{2} + A \cos (α y_{i}) (1 - \cos (α x_{i})) + A \sin (α y_{i}) \sin (α x_{i}))$, see (64), which is integrated over the zero-mean, symmetric density p_x. Therefore, the terms odd in x_i, namely $x_{i}^{3}$ and $x_{i} A \cos (α y_{i}) (1 - \cos (α x_{i}))$, yield no contribution, and only the even term x_i sin(αx_i) needs to be evaluated. One gets

\begin{aligned} I_{i}^{1} &≃ c_{ϑ} \frac{A \sin (α y_{i})}{D_{Q}} \frac{1}{\sqrt{2 π} σ} \int_{- \infty}^{\infty} x_{i} \sin (α x_{i}) e^{- \frac{1}{2} \left(\frac{x_{i}}{σ}\right)^{2}} d x_{i} \\ &= c_{ϑ} \frac{A \sin (α y_{i})}{D_{Q}} E [x_{i} \sin (α x_{i})] \\ &= c_{ϑ} \frac{A \sin (α y_{i})}{D_{Q}} α σ^{2} e^{- \frac{1}{2} (α σ)^{2}} \\ &= c_{ϑ} \frac{d_{i} σ^{2}}{D_{Q}} e^{- \frac{1}{2} (α σ)^{2}} . \end{aligned}

(95)

In the second line of (95) the expected value definition is used. From second to third line the expected value of x_i sin (αx_i) is evaluated using (29). In the last line the derivative d_i = αA sin (αy_i) from (33) is recovered. Using the results from (94) and (95) the first order progress rate approximation for large N and μ can finally be given.

First order progress rate

The first order component-wise progress rate on the Rastrigin function in the asymptotic limits of infinitely large population size μ (constant ϑ = μ/λ) and infinitely large dimensionality N yields

φ_{i} ≃ c_{ϑ} \frac{σ^{2}}{D_{Q}} (k_{i} + e^{- \frac{1}{2} {(α σ)}^{2}} d_{i}) = c_{ϑ} \frac{σ^{2}}{D_{Q}} (2 y_{i} + e^{- \frac{1}{2} {(α σ)}^{2}} α A \sin (α y_{i})) .

(96)

The expressions for $c_{ϑ} = e_{ϑ}^{1, 0}$ from (45) and D_Q from (31) were not inserted to improve readability. Result (96) differs notably from [17, Eq. (26)], where a linearized quality gain approximation resulted in

φ_{i, lin} ≃ c_{μ / μ, λ} \frac{σ^{2}}{\sqrt{{(f_{i}^{'} σ)}^{2} + D_{i}^{2}}} f_{i}^{'} .

(97)

First note that (96) contains the asymptotic form c_ϑ of the progress coefficient, since c_μ/μ,λ ≃ c_ϑ for large populations. The difference between the variance terms in the denominators of (96) and (97) is negligible for large N, as $D_{Q}^{2} \approx D_{i}^{2} + {(f_{i}^{'} σ)}^{2}$, see also (14). The most notable difference, however, lies between the derivative term $f_{i}^{'} = k_{i} + d_{i}$, see definition (33), and the newly obtained term $k_{i} + e^{- \frac{1}{2} {(α σ)}^{2}} d_{i}$. The latter contains the unchanged sphere-dependent term k_i and the Rastrigin-specific term d_i, which is damped exponentially with increasing ασ. This characteristic form is discussed below. The result (96) will be essential for the determination of the second order progress rate $φ_{i}^{II}$.
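Result (96) is simple enough to evaluate directly. The minimal sketch below (standard-library Python; function names and all numerical inputs are hypothetical) returns the total first order progress rate together with its sphere part (k_i-term) and its damped Rastrigin part (d_i-term). Since the expression (31) for D_Q is not repeated here, D_Q is passed in as a parameter; this is an assumption of the sketch, not how D_Q is obtained in the paper.

```python
import math
from statistics import NormalDist

def c_theta(theta):
    """Asymptotic progress coefficient c_theta from (87)."""
    b = NormalDist().inv_cdf(theta)
    return math.exp(-0.5 * b * b) / (math.sqrt(2.0 * math.pi) * theta)

def phi_first_order(y_i, sigma, D_Q, theta, A=1.0, alpha=2.0 * math.pi):
    """First order progress rate (96) of component i.

    D_Q (the quality gain standard deviation of (31)) is supplied by the caller.
    Returns (total, sphere_part, rastrigin_part).
    """
    prefactor = c_theta(theta) * sigma**2 / D_Q
    k_i = 2.0 * y_i                              # sphere derivative term, cf. (64)
    d_i = alpha * A * math.sin(alpha * y_i)      # Rastrigin derivative term, cf. (33)
    sphere_part = prefactor * k_i
    rastrigin_part = prefactor * math.exp(-0.5 * (alpha * sigma) ** 2) * d_i
    return sphere_part + rastrigin_part, sphere_part, rastrigin_part

# Hypothetical inputs: D_Q = 5 and theta = 0.5 are placeholders, not values from the paper.
for y_i in (0.3, 0.7):
    for sigma in (0.1, 1.0):
        total, sph, ras = phi_first_order(y_i, sigma, D_Q=5.0, theta=0.5)
        print(f"y_i={y_i}, sigma={sigma}: phi={total:+.5f} (sphere {sph:+.5f}, Rastrigin {ras:+.5f})")
```

With these placeholder values, the Rastrigin part for y_i = 0.7 and σ = 0.1 is negative and outweighs the sphere part, whereas for σ = 1 the damping factor e^{−2π²} makes it negligible.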

At this point, one-generation experiments can be performed and compared to the progress rate (96) to investigate its accuracy. To this end, a random position vector y is initialized isotropically with ‖y‖ = R for a given residual distance R. Then, repeated simulations are performed and quantity (7) is averaged over 10⁶ trials. The difficulty with the choice of R is that the “interesting” region with a high density of local minima scales with N, such that a relation R(N) is needed. The following argument can be given. Assume w.l.o.g. y > 0 and that all components of the parental position are located at the same local minimum, denoted by ŷ^(j). Index j identifies the local attractor along the half-axis, e.g., j ∈ {1, 2, 3} in Fig. 1 on the right side. For N = 1, one has y = [ŷ^(j)] and therefore R² = (ŷ^(j))². Having all N components at the same j-th local minimum yields y = [ŷ^(j), ŷ^(j), …, ŷ^(j)], such that R² = N(ŷ^(j))². A scaling $R = O (\sqrt{N})$ is therefore needed to stay within a certain region of local attractors when N is increased.
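The argument can be made concrete for a single component f(y) = y² − A cos(αy) of the Rastrigin function (additive constants dropped). Its positive local minimizers ŷ^(j) solve f′(y) = 2y + αA sin(αy) = 0 with f″(y) > 0, and since |sin| ≤ 1 they can only exist for y ≤ αA/2. The minimal sketch below (standard-library Python; the helper `positive_local_minima` and the grid resolution are illustrative choices) locates them by a sign scan plus bisection and prints the residual distances R = √N ŷ^(j) obtained when all N components sit at the same minimum.

```python
import math

def positive_local_minima(A=1.0, alpha=2.0 * math.pi, n=200000):
    """Local minimizers y > 0 of f(y) = y**2 - A*cos(alpha*y).

    Roots of f'(y) = 2y + alpha*A*sin(alpha*y) with a sign change from - to +
    are local minima; all of them lie in (0, alpha*A/2].
    """
    fprime = lambda y: 2.0 * y + alpha * A * math.sin(alpha * y)
    y_max = 0.5 * alpha * A + 1.0        # safely beyond the last possible minimizer
    h = y_max / n
    minima = []
    for k in range(1, n):
        lo, hi = k * h, (k + 1) * h
        if fprime(lo) < 0.0 <= fprime(hi):
            for _ in range(60):          # bisection refinement of the bracketed root
                mid = 0.5 * (lo + hi)
                if fprime(mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            minima.append(0.5 * (lo + hi))
    return minima

minima = positive_local_minima()
for j, y_hat in enumerate(minima, start=1):
    radii = ", ".join(f"N={N}: R={math.sqrt(N) * y_hat:.3f}" for N in (1, 20, 100))
    print(f"j={j}: y_hat={y_hat:.4f} -> {radii}")
```

For A = 1 and α = 2π, this finds three positive local minimizers slightly below 1, 2, and 3, consistent with j ∈ {1, 2, 3}.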

The progress rates of two exemplary components for a single experiment are shown in Fig. 3. For both plots, σ ∈ [0, 1] was chosen in order to resolve the effects of the oscillation, whose period is 2π/α = 1 for α = 2π. On the left, one observes enhanced progress for moderate σ-values due to local attraction, as the local and the global attractor lie in the same direction. On the right, there is negative progress for moderate σ, as the local attractor drives the ES away from the global attractor. For larger σ, the overall spherical shape dominates and both components exhibit positive progress. A decomposition of the progress rate in terms of φ_i = φ_i(d_i, k_i)|_{k_i=0} + φ_i(d_i, k_i)|_{d_i=0} is also displayed in Fig. 3. It shows the large-scale behavior of the k_i-term (dashed cyan) and the limited range of the d_i-term (dotted green). As $k_{i} = \partial (y_{i}^{2}) / \partial y_{i}$, its progress term models the global quadratic structure of the Rastrigin function, see derivative definitions (33). The second term $e^{- \frac{1}{2} {(α σ)}^{2}} d_{i}$ models the Rastrigin-specific local oscillation, whose range is limited depending on the mutation strength σ (or α). By defining scale-invariant mutations using (4) with σ = σ*R/N, the oscillations vanish via $e^{- \frac{1}{2} {(α σ^{*} R / N)}^{2}}$ for large residual distance R, where the sphere function is recovered. This model significantly improves on the progress rate formula (97) from [17].

As a note, changing one of the fitness parameters A or α directly affects Fig. 3. The change of amplitude A rescales both the (local) peak and dip heights accordingly, increasing the effects of local attraction for larger A. Increasing frequency α has mostly short-range effects as the overall range is reduced due to suppression via $e^{- \frac{1}{2} {(α σ)}^{2}}$ of (96). In the subsequent parts, the progress rate is investigated for A = 1 and α = 2π as an example.
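To see how the oscillation term fades with the residual distance, the damping factor e^{−(ασ)²/2} of (96) can be tabulated with the scale-invariant mutation strength σ = σ*R/N. The minimal sketch below uses the hypothetical values N = 100 and σ* = 1 together with the radii R = 0.1√N, √N, and 10√N; a larger α shortens the range further, since the factor depends only on the product ασ.

```python
import math

def rastrigin_damping(sigma_star, R, N, alpha=2.0 * math.pi):
    """Damping factor exp(-(alpha*sigma)^2 / 2) of the d_i-term in (96), with sigma = sigma_star * R / N."""
    sigma = sigma_star * R / N
    return math.exp(-0.5 * (alpha * sigma) ** 2)

N, sigma_star = 100, 1.0                      # hypothetical illustration values
radii = [0.1 * math.sqrt(N), math.sqrt(N), 10.0 * math.sqrt(N)]
factors = [rastrigin_damping(sigma_star, R, N) for R in radii]
for R, f in zip(radii, factors):
    print(f"R={R:6.2f}: sigma={sigma_star * R / N:.3f}, damping={f:.3e}")
```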

In Figs. 4 and 5, the progress rate is evaluated over the scale-invariant σ* for two different values of N and population sizes. The approximation quality improves for larger N and μ, as expected from the applied approximations. The overall agreement between simulation and approximation is good for both the larger and the smaller residual distance R, see left and right plots, respectively. The σ*-range was chosen large enough that the progress rate of the corresponding sphere function [5, Eq. (6.54)] becomes negative due to excessively large mutations. This boundary directly carries over to the Rastrigin function, as its global structure is the same. However, since (96) is only a first order result, it predicts no negative progress even for large σ*. Therefore, the second order progress rate $φ_{i}^{II}$ is derived in Sec. 4, where loss terms provide additional correction terms.

Fig. 4 — Progress rate *φ_i* as a function of the normalized mutation σ* for (10/10, 40)-ES with N = 20, A = 1, α = 2π, at two residual distances $R = 10 \sqrt{N}$ with *y_i* = 11.6 (left) and $R = 0.1 \sqrt{N}$ with *y_i* = 0.116 (right). As in Fig. 3, black dots depict the simulation, while the red dash-dotted line shows result (96). The error bars are very small and therefore not visible.

Fig. 5 — Progress rate *φ_i* as a function of the normalized mutation σ* for (100/100, 200)-ES with N = 100, A = 1, α = 2π, at two residual distances $R = 10 \sqrt{N}$ with *y_i* = 11.9 (left) and $R = 0.1 \sqrt{N}$ with *y_i* = 0.119 (right). The approximation quality improves compared to Fig. 4 and shows very good agreement.

4. Second order progress rate

The second order progress rate (8) requires the evaluation of $E [{(y_{i}^{(g + 1)})}^{2}] .$ Starting with intermediate result (34) and referring to the i-th component, the expression yields after squaring

\begin{aligned} \left(y_{i}^{(g + 1)}\right)^{2} &= \left(y_{i}^{(g)} + \frac{1}{μ} \sum_{m = 1}^{μ} x_{m; λ}\right)^{2} \\ &= \left(y_{i}^{(g)}\right)^{2} + 2 y_{i}^{(g)} \frac{1}{μ} \sum_{m = 1}^{μ} x_{m; λ} + \frac{1}{μ^{2}} \left(\sum_{m = 1}^{μ} x_{m; λ}\right)^{2} . \end{aligned}

(98)

The square in the last term is evaluated by separating the sum into terms with equal and unequal indices

\begin{aligned} \left(\sum_{m = 1}^{μ} x_{m; λ}\right)^{2} &= \left(\sum_{k = 1}^{μ} x_{k; λ}\right) \left(\sum_{l = 1}^{μ} x_{l; λ}\right) = \sum_{m = 1}^{μ} (x_{m; λ})^{2} + \sum_{k \neq l} x_{k; λ} x_{l; λ} \\ &= \sum_{m = 1}^{μ} (x_{m; λ})^{2} + 2 \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} x_{k; λ} x_{l; λ} . \end{aligned}

(99)
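The index bookkeeping of (99) is easy to check numerically. In the minimal sketch below (standard-library Python; random numbers merely stand in for the selected mutation components x_{m;λ}, and the helper names are hypothetical), the square of the sum is compared with the split form, and the μ(μ−1)/2 pairs k < l entering the double sum are counted.

```python
import math
import random

def split_square(x):
    """Right-hand side of (99): diagonal terms plus twice the cross terms with k < l."""
    mu = len(x)
    diagonal = sum(v * v for v in x)
    # 0-based indices: l = 1..mu-1 and k = 0..l-1 correspond to l = 2..mu, k = 1..l-1 in (99).
    cross = sum(x[k] * x[l] for l in range(1, mu) for k in range(l))
    return diagonal + 2.0 * cross

def pair_count(mu):
    """Number of terms in the double sum of (99)."""
    return sum(1 for l in range(1, mu) for k in range(l))

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(7)]  # stand-in for x_{1;lambda}, ..., x_{7;lambda}
print(sum(x) ** 2, split_square(x), pair_count(7), 7 * 6 // 2)
```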

Inserting (99) into (98) and taking the expected value (conditional variables y^(g) and σ^(g) are implicitly assumed to be given) yields

E [{(y_{i}^{(g + 1)})}^{2}] = {(y_{i}^{(g)})}^{2} + 2 y_{i}^{(g)} \frac{1}{μ} \sum_{m = 1}^{μ} E [x_{m; λ}] + \frac{1}{μ^{2}} \sum_{m = 1}^{μ} E [{(x_{m; λ})}^{2}] + \frac{2}{μ^{2}} \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} E [x_{k; λ} x_{l; λ}] .

(100)

Noting that $φ_{i} = - \frac{1}{μ} \sum_{m = 1}^{μ} E [x_{m; λ}],$ see Eq. (36), and using (100) in $φ_{i}^{II}$ -definition (8) yields the second order i-th component progress rate

φ_{i}^{II} = 2 y_{i}^{(g)} φ_{i} - \frac{1}{μ^{2}} E^{(2)} - \frac{2}{μ^{2}} E^{(1, 1)},

(101)

for which the two following expected values need to be determined

\frac{1}{μ^{2}} E^{(2)} := \frac{1}{μ^{2}} \sum_{m = 1}^{μ} E [{(x_{m; λ})}^{2}]

(102)

\frac{1}{μ^{2}} E^{(1, 1)} := \frac{1}{μ^{2}} \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} E [x_{k; λ} x_{l; λ}] .

(103)

In the subsequent parts the solutions to Eqs. (102) and (103) will be derived. Starting with (102), the solution requires order statistic density (50) for the m-th individual, large population identity (37), and the expansion of the normal CDF (69) up to first order. The resulting two integrals can then be solved analytically for large N and the results will simplify significantly.

Proposition 2

Let μ,λ ∈ ℕ with μ ≥ 1 and μ < λ and let p_x denote the PDF of the random mutation x ~ 𝒩 (0, σ²). Let x_m;λ denote the m-th best value (out of λ) of the i-th mutation component (x_m;λ)_i. Furthermore, let P_Q and $P_{Q}^{- 1}$ denote the quality gain CDF (and its inverse), respectively, with B denoting the beta function. Then, the second order expected value reads

\frac{1}{μ} \sum_{m = 1}^{μ} E [{(x_{m; λ})}^{2}] = \frac{λ}{μ} \int_{x_{i} = - \infty}^{x_{i} = \infty} x_{i}^{2} p_{x} (x_{i}) \frac{1}{B (λ - μ, μ)} \int_{t = 0}^{t = 1} t^{λ - μ - 1} (1 - t)^{μ - 1} P_{Q} (P_{Q}^{- 1} (1 - t) | x_{i}) d t d x_{i} .

(104)

Proof

Starting from (102) and rewriting the expected value as an integral over order statistic density p_m;λ(x_i) yields

\frac{1}{μ} \sum_{m = 1}^{μ} E [{(x_{m; λ})}^{2}] = \frac{1}{μ} \sum_{m = 1}^{μ} \int_{- \infty}^{\infty} x_{i}^{2} p_{m; λ} (x_{i}) d x_{i} .

(105)

After inserting p_m;λ(x_i) from (50), Eqs. (47) and (105) share the same structure: apart from the prefactor, they differ only in the weighting factor x_i versus $x_{i}^{2}$ of the integration over the mutation component, which is performed as the last step. The same steps as presented in the proof of Proposition 1 can therefore be applied with $x_{i}^{2}$ in place of x_i, which directly gives the result (104).

Analogously to the derivation of the first order progress rate in Sec. 3, a closed-form solution for (104) can only be provided by first applying the limit of large populations and then introducing approximations assuming large dimensionality N.

Theorem 3

Let p_x denote the PDF of the random mutation x ~ 𝒩 (0, σ²) and let x_m;λ denote the m-th best value (out of λ) of the i-th mutation component (x_m;λ)_i. Let P_Q denote the quality gain CDF with its quantile function given by $P_{Q}^{- 1}$ . For a truncation ratio ϑ = μ/λ with 0 < ϑ < 1, the limit of the second order expected value reads

\lim_{\begin{matrix} μ, λ \to \infty \\ ϑ = const . \end{matrix}} \frac{1}{μ} \sum_{m = 1}^{μ} E [{(x_{m; λ})}^{2}] = \frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i}^{2} p_{x} (x_{i}) P_{Q} (P_{Q}^{- 1} (ϑ) | x_{i}) d x_{i} .

(106)

Proof

Starting from Eq. (104) and applying the infinite population size limit, the result of Theorem 1 can be applied with a = 1, $p_{n} (x_{i}) = x_{i}^{2}$, and $f_{x} (t) |_{t = 1 - ϑ} = P_{Q} (P_{Q}^{- 1} (ϑ) | x_{i})$, which yields the result (106).

Given result (106), approximations are again applied to provide closed-form solutions. Inserting Approximation 1 and Approximation 2 for the quality gain via Eq. (62) into (106) again leads to an integral that cannot be solved analytically, due to the non-linear terms in x_i within Φ(·). Therefore, the CDF is expanded using Approximation 3, neglecting higher order terms O(1/N). Finally, the integrands are simplified assuming large dimensionality as in Approximation 4. The result is therefore given after inserting g(x_i) and h(x_i) from (81) and (82) as

\frac{1}{μ^{2}} E^{(2)} ≃ I_{i}^{0} + I_{i}^{1}, with

(107)

I_{i}^{0} := \frac{1}{μ ϑ} \int_{- \infty}^{\infty} x_{i}^{2} p_{x} (x_{i}) Φ (- \frac{k_{i} x_{i}}{D_{Q}} + Φ^{- 1} (ϑ)) d x_{i}, and

(108)

I_{i}^{1} := - \frac{c_{ϑ}}{μ D_{Q}} \int_{- \infty}^{\infty} x_{i}^{2} δ (x_{i}) p_{x} (x_{i}) d x_{i} .

(109)

The two integrals abbreviated as $I_{i}^{0}$ and $I_{i}^{1}$ are evaluated now. For $I_{i}^{0}$ , the substitution z = x_i/σ is introduced

I_{i}^{0} = \frac{σ^{2}}{\sqrt{2 π} μ ϑ} \int_{- \infty}^{\infty} z^{2} e^{- \frac{1}{2} z^{2}} Φ (- \frac{k_{i} σ z}{D_{Q}} + Φ^{- 1} (ϑ)) d z .

(110)

The following integral identity [16] is applied for real parameters a and b

\frac{1}{\sqrt{2 π}} \int_{- \infty}^{\infty} t^{2} e^{- \frac{1}{2} t^{2}} Φ (a t + b) d t = Φ (\frac{b}{{(1 + a^{2})}^{1 / 2}}) - \frac{1}{\sqrt{2 π}} \frac{a^{2} b}{{(1 + a^{2})}^{3 / 2}} e^{- \frac{1}{2} \frac{b^{2}}{1 + a^{2}}} .

(111)
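Identity (111) can be checked in the same way as (92). The minimal sketch below (standard-library Python; helper names and the test values of a and b are arbitrary) compares a Simpson quadrature of the left-hand side, truncated to |t| ≤ 12, with the closed form.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def lhs_111(a, b, L=12.0):
    """Left-hand side of (111); the integral is truncated to [-L, L]."""
    f = lambda t: t * t * math.exp(-0.5 * t * t) * Phi(a * t + b)
    return simpson(f, -L, L) / math.sqrt(2.0 * math.pi)

def rhs_111(a, b):
    """Right-hand side of (111)."""
    s = 1.0 + a * a
    return (Phi(b / math.sqrt(s))
            - a * a * b / (math.sqrt(2.0 * math.pi) * s ** 1.5) * math.exp(-0.5 * b * b / s))

cases = [(-0.3, -0.67), (1.5, 0.2), (-2.0, 1.0)]
for a, b in cases:
    print(f"a={a}, b={b}: quadrature={lhs_111(a, b):.10f}, closed form={rhs_111(a, b):.10f}")
```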

Evaluating (111) with a = −k_iσ/D_Q, b = Φ⁻¹(ϑ) from (108) yields for the right-hand side of (111)

\begin{aligned} &Φ \left(\frac{b}{(1 + a^{2})^{1 / 2}}\right) - \frac{1}{\sqrt{2 π}} \frac{a^{2} b}{(1 + a^{2})^{3 / 2}} e^{- \frac{1}{2} \frac{b^{2}}{1 + a^{2}}} \\ &\quad = Φ \left(\frac{Φ^{- 1} (ϑ)}{\left(1 + (k_{i} σ)^{2} / D_{Q}^{2}\right)^{1 / 2}}\right) - \frac{1}{\sqrt{2 π}} \frac{(k_{i} σ)^{2} Φ^{- 1} (ϑ)}{D_{Q}^{2} \left(1 + (k_{i} σ)^{2} / D_{Q}^{2}\right)^{3 / 2}} e^{- \frac{1}{2} \frac{[Φ^{- 1} (ϑ)]^{2}}{1 + (k_{i} σ)^{2} / D_{Q}^{2}}} . \end{aligned}

(112)

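Assuming ${(k_{i} σ)}^{2} ≪ D_{Q}^{2}$ for large N, the first term of (112) reduces to Φ(Φ⁻¹(ϑ)) = ϑ and the exponential factor to $e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}$, while the second term is kept to leading order in ${(k_{i} σ)}^{2} / D_{Q}^{2}$. Inserting this into (110), one obtains the result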

I_{i}^{0} ≃ \frac{σ^{2}}{μ} [1 - Φ^{- 1} (ϑ) [\frac{e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}}{\sqrt{2 π} ϑ}] \frac{{(k_{i} σ)}^{2}}{D_{Q}^{2}}] .

(113)

For (113) the asymptotic generalized progress coefficient definition $e_{ϑ}^{1, 1}$ from (45) can be applied with parameters a = 1 and b = 1

e_{ϑ}^{1, 1} = - Φ^{- 1} (ϑ) [\frac{e^{- \frac{1}{2} {[Φ^{- 1} (ϑ)]}^{2}}}{\sqrt{2 π} ϑ}] .

(114)

This leads to the following result for the first integral $I_{i}^{0}$

I_{i}^{0} ≃ \frac{σ^{2}}{μ} [1 + e_{ϑ}^{1, 1} \frac{{(k_{i} σ)}^{2}}{D_{Q}^{2}}] .

(115)

The second integral $I_{i}^{1}$ from (109) is expressed via expected values over the normal density p_x of the individual terms of $x_{i}^{2} δ (x_{i})$. With δ(x_i) given in Eq. (64), one gets

I_{i}^{1} ≃ - \frac{c_{ϑ}}{μ D_{Q}} (E [x_{i}^{4}] + A \sin (α y_{i}) E [x_{i}^{2} \sin (α x_{i})] + A \cos (α y_{i}) E [x_{i}^{2}] - A \cos (α y_{i}) E [x_{i}^{2} \cos (α x_{i})]) .

(116)

One has $E [x_{i}^{4}] = 3 σ^{4}$ and $E [x_{i}^{2}] = σ^{2}$ . Using results from (29) the remaining expected values read

E [x_{i}^{2} \sin (α x_{i})] = 0, E [x_{i}^{2} \cos (α x_{i})] = (σ^{2} - α^{2} σ^{4}) e^{- \frac{1}{2} {(α σ)}^{2}} .

(117)

Therefore, one gets

I_{i}^{1} ≃ - \frac{c_{ϑ} σ^{2}}{μ D_{Q}} [3 σ^{2} + A \cos (α y_{i}) (1 - e^{- \frac{1}{2} {(α σ)}^{2}} + α^{2} σ^{2} e^{- \frac{1}{2} {(α σ)}^{2}})] .

(118)
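The Gaussian expectations used in (117), together with E[x_i sin(αx_i)] used in (95), can be confirmed by quadrature. The minimal sketch below (standard-library Python; the helper `gauss_expectation`, the integration settings, and the σ-values are illustrative choices) integrates each term over x ~ 𝒩(0, σ²) in the standardized variable z = x/σ and compares the result with the closed forms.

```python
import math

def gauss_expectation(g, sigma, L=12.0, n=4000):
    """E[g(x)] for x ~ N(0, sigma^2), via Simpson's rule in z = x / sigma on [-L, L]."""
    f = lambda z: g(sigma * z) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    h = 2.0 * L / n
    total = f(-L) + f(L)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(-L + k * h)
    return total * h / 3.0

alpha = 2.0 * math.pi  # frequency used in the figures; the sigma values below are arbitrary
closed_forms = {
    "E[x^4]": (lambda x: x**4, lambda s: 3 * s**4),
    "E[x sin(ax)]": (lambda x: x * math.sin(alpha * x),
                     lambda s: alpha * s**2 * math.exp(-0.5 * (alpha * s) ** 2)),
    "E[x^2 sin(ax)]": (lambda x: x**2 * math.sin(alpha * x), lambda s: 0.0),
    "E[x^2 cos(ax)]": (lambda x: x**2 * math.cos(alpha * x),
                       lambda s: (s**2 - alpha**2 * s**4) * math.exp(-0.5 * (alpha * s) ** 2)),
}
for sigma in (0.05, 0.2, 0.5):
    for name, (g, closed) in closed_forms.items():
        print(f"sigma={sigma}: {name}: quadrature={gauss_expectation(g, sigma):+.3e}, "
              f"closed form={closed(sigma):+.3e}")
```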

Collecting the results (115) and (118) with k_i = 2y_i and inserting them back into (107) the expected value finally reads

\frac{1}{μ^{2}} E^{(2)} ≃ \frac{σ^{2}}{μ} \left\{1 + e_{ϑ}^{1, 1} \frac{{(2 y_{i})}^{2} σ^{2}}{D_{Q}^{2}} - \frac{c_{ϑ}}{D_{Q}} \left[3 σ^{2} + A \cos (α y_{i}) \left(1 - e^{- \frac{1}{2} {(α σ)}^{2}} + α^{2} σ^{2} e^{- \frac{1}{2} {(α σ)}^{2}}\right)\right]\right\} .

(119)
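For later use in (101), the closed-form expression (119) can be coded directly. The minimal sketch below (standard-library Python; function names and numerical inputs are hypothetical) implements c_ϑ from (87), e_ϑ^{1,1} from (114), and the right-hand side of (119); as in the first order case, D_Q from (31) is passed in as a parameter rather than computed.

```python
import math
from statistics import NormalDist

def c_theta(theta):
    """Asymptotic progress coefficient c_theta = e_theta^{1,0}, see (87)."""
    b = NormalDist().inv_cdf(theta)
    return math.exp(-0.5 * b * b) / (math.sqrt(2.0 * math.pi) * theta)

def e11_theta(theta):
    """Asymptotic generalized progress coefficient e_theta^{1,1}, see (114)."""
    return -NormalDist().inv_cdf(theta) * c_theta(theta)

def second_moment_term(y_i, sigma, D_Q, mu, theta, A=1.0, alpha=2.0 * math.pi):
    """Right-hand side of (119), approximating E^(2) / mu^2; D_Q is supplied by the caller."""
    damp = math.exp(-0.5 * (alpha * sigma) ** 2)
    oscill = A * math.cos(alpha * y_i) * (1.0 - damp + alpha**2 * sigma**2 * damp)
    bracket = (1.0
               + e11_theta(theta) * (2.0 * y_i) ** 2 * sigma**2 / D_Q**2
               - c_theta(theta) / D_Q * (3.0 * sigma**2 + oscill))
    return sigma**2 / mu * bracket

# Hypothetical inputs (D_Q, mu, theta are placeholders, not values from the paper).
print(e11_theta(0.25), e11_theta(0.5))
print(second_moment_term(y_i=0.7, sigma=0.1, D_Q=5.0, mu=100, theta=0.5))
```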

The solution of the second expected value $\frac{1}{μ^{2}} E^{(1, 1)}$ from (103) is presented now. First an exact integral is derived. Then, approximations are applied to give closed-form solutions.

Proposition 3

Let μ, λ ∈ ℕ with μ ≥ 1 and μ < λ and let p_x denote the PDF of the random mutation x ~ 𝒩 (0, σ²). Let x_k;λ denote the k-th best value (out of λ) of the i-th mutation component (x_k;λ)_i. Furthermore, let P_Q and $P_{Q}^{- 1}$ denote the quality gain CDF (and its inverse), respectively, with B denoting the beta function. Then, the second order expected value reads

\begin{aligned} \frac{1}{μ^{2}} \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} E [x_{k; λ} x_{l; λ}] &= \frac{1}{2} \frac{λ}{μ} \frac{μ - 1}{μ} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \\ &\quad \times \left(\frac{1}{B (λ - μ, μ)} \int_{0}^{1} t^{λ - μ - 1} (1 - t)^{μ - 2} P_{Q} \left(P_{Q}^{- 1} (1 - t) ∣ x_{1}\right) P_{Q} \left(P_{Q}^{- 1} (1 - t) ∣ x_{2}\right) d t\right) d x_{2} d x_{1} \end{aligned}

(120)

Proof

First, a joint order statistic density has to be derived for the expected value. Then, the double sum is converted into a single integral using a known identity. The resulting five-fold integration is restructured by exchanging bounds and then successively solved.

Starting with (103), the double sum includes mixed contributions from the k-th and l-th best elements of the i-th mutation component. To avoid confusion with the summation indices k and l, the integration variables associated with the k-th element are denoted as x₁ (mutation) and q₁ (quality), while the l-th element is integrated over x₂ and q₂. The ordering 1 ≤ k < l ≤ λ is assumed, such that the k-th element has the smaller (better) quality value, q₁ < q₂. Additionally, the joint probability density p_k,l;λ(x₁, x₂) is needed, such that the expected value can be formulated as

\frac{1}{μ^{2}} E^{(1, 1)} = \frac{1}{μ^{2}} \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} x_{1} x_{2} p_{k, l; λ} (x_{1}, x_{2}) d x_{2} d x_{1} .

(121)

The mutation densities are independent and denoted by p_x(x₁) and p_x(x₂), respectively. Given the mutation components x₁ and x₂, the conditional densities of the quality values q₁ and q₂ are p_Q (q₁|x₁) and p_Q (q₂|x₂), respectively. Given q₁ and q₂, the remaining λ − 2 offspring must split into k − 1 values smaller than q₁, l − k − 1 values between q₁ and q₂, and λ − l values larger than q₂, which occurs with probabilities

\begin{aligned} \Pr \{Q \leq q_{1}\}^{k - 1} & = P_{Q} {(q_{1})}^{k - 1} \\ \Pr \{q_{1} \leq Q \leq q_{2}\}^{l - k - 1} & = {[P_{Q} (q_{2}) - P_{Q} (q_{1})]}^{l - k - 1} \\ \Pr \{Q > q_{2}\}^{λ - l} & = {[1 - P_{Q} (q_{2})]}^{λ - l}, \end{aligned}

(122)

and P_Q (q) denoting the quality gain CDF. The joint probability density can therefore be written as

\begin{aligned} p_{k, l; λ} (x_{1}, x_{2}) = {} & p_{x} (x_{1}) p_{x} (x_{2}) \int_{q_{\min}}^{\infty} p_{Q} (q_{1} ∣ x_{1}) \int_{q_{1}}^{\infty} p_{Q} (q_{2} ∣ x_{2}) \\ & \times λ! \frac{P_{Q} {(q_{1})}^{k - 1} {[P_{Q} (q_{2}) - P_{Q} (q_{1})]}^{l - k - 1} {[1 - P_{Q} (q_{2})]}^{λ - l}}{(k - 1)! (l - k - 1)! (λ - l)!} d q_{2} d q_{1}, \end{aligned}

(123)

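with integration ranges q_min ≤ q₁ < ∞ and q₁ < q₂ < ∞ as k < l. The lower bound q_min denotes the smallest possible quality value, which is resolved later. The multinomial factor λ!/[(k − 1)! (l − k − 1)! (λ − l)!] counts the ways of assigning the λ offspring to the ranks k and l and to the three groups given in (122); permutations within a group do not change the order statistics and are divided out by the factorials. Plugging (123) into (121) and moving the sum into the innermost integral gives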

\begin{aligned} \frac{1}{μ^{2}} E^{(1, 1)} = {} & \frac{λ!}{μ^{2}} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \int_{q_{\min}}^{\infty} p_{Q} (q_{1} ∣ x_{1}) \int_{q_{1}}^{\infty} p_{Q} (q_{2} ∣ x_{2}) \\ & \times \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} \frac{P_{Q} {(q_{1})}^{k - 1} {[P_{Q} (q_{2}) - P_{Q} (q_{1})]}^{l - k - 1} {[1 - P_{Q} (q_{2})]}^{λ - l}}{(k - 1)! (l - k - 1)! (λ - l)!} d q_{2} d q_{1} d x_{2} d x_{1} . \end{aligned}

(124)

The double sum of (124) over the P_Q -values will be expressed by an integral. This can be done using an identity from [4, p. 113]. Setting ν = 2 and identifying the indices as i₁ = l and i₂ = k, the identity yields

\sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} \frac{Q_{1}^{λ - l} {[Q_{2} - Q_{1}]}^{l - k - 1} {[1 - Q_{2}]}^{k - 1}}{(λ - l)! (l - k - 1)! (k - 1)!} = \frac{1}{(λ - μ - 1)! (μ - 2)!} \int_{0}^{Q_{1}} t^{λ - μ - 1} {(1 - t)}^{μ - 2} d t,

(125)
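Since the right-hand side of (125) does not depend on Q₂, the identity has a probabilistic reading: multiplied by (λ − 2)!, each summand is a multinomial probability, and the double sum becomes the probability that at least λ − μ of λ − 2 independent trials fall into a bin of probability Q₁, which is a regularized incomplete beta function. The following Python sketch checks (125) in exact rational arithmetic for a few small (μ, λ) pairs; the test values of Q₁ and Q₂ are arbitrary.

```python
from fractions import Fraction
from math import comb, factorial


def lhs_double_sum(Q1, Q2, mu, lam):
    """Left-hand side of (125), evaluated term by term."""
    total = Fraction(0)
    for l in range(2, mu + 1):
        for k in range(1, l):
            num = Q1 ** (lam - l) * (Q2 - Q1) ** (l - k - 1) * (1 - Q2) ** (k - 1)
            den = factorial(lam - l) * factorial(l - k - 1) * factorial(k - 1)
            total += num / den
    return total


def rhs_integral(Q1, mu, lam):
    """Right-hand side of (125); the integral of t^a (1-t)^b over [0, Q1]
    is expanded binomially so that it can be evaluated exactly."""
    a, b = lam - mu - 1, mu - 2
    integral = sum(
        Fraction(comb(b, j) * (-1) ** j) * Q1 ** (a + j + 1) / (a + j + 1)
        for j in range(b + 1)
    )
    return integral / (factorial(lam - mu - 1) * factorial(mu - 2))


if __name__ == "__main__":
    Q1, Q2 = Fraction(1, 3), Fraction(3, 4)
    for mu, lam in [(2, 5), (3, 4), (4, 9), (6, 11)]:
        print(mu, lam, lhs_double_sum(Q1, Q2, mu, lam) == rhs_integral(Q1, mu, lam))
```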

for real values Q₁ and Q₂, with integers ν ≤ μ < λ. Now the substitution Q₁ = 1 − P_Q (q₂), Q₂ = 1 − P_Q (q₁) can be performed and the double sum of (124) can be recognized by comparing with (125). Applying the identity therefore yields

\sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} \frac{{[1 - P_{Q} (q_{2})]}^{λ - l} {[P_{Q} (q_{2}) - P_{Q} (q_{1})]}^{l - k - 1} {[P_{Q} (q_{1})]}^{k - 1}}{(λ - l)! (l - k - 1)! (k - 1)!} = \frac{1}{(λ - μ - 1)! (μ - 2)!} \int_{0}^{1 - P_{Q} (q_{2})} t^{λ - μ - 1} (1 - t)^{μ - 2} d t .

(126)

Hence, Eq. (124) is expressed as

\begin{aligned} \frac{1}{μ^{2}} E^{(1, 1)} = {} & \frac{λ!}{μ^{2}} \frac{1}{(λ - μ - 1)! (μ - 2)!} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \\ & \times \int_{q_{\min}}^{\infty} p_{Q} (q_{1} ∣ x_{1}) \int_{q_{1}}^{\infty} p_{Q} (q_{2} ∣ x_{2}) \int_{0}^{1 - P_{Q} (q_{2})} t^{λ - μ - 1} {(1 - t)}^{μ - 2} d t d q_{2} d q_{1} d x_{2} d x_{1} . \end{aligned}

(127)

The prefactor of Eq. (127) can be evaluated as

\frac{λ!}{μ^{2}} \frac{1}{(λ - μ - 1)! (μ - 2)!} = \frac{λ (λ - 1)! (μ - 1)}{μ^{2} (λ - μ - 1)! (μ - 1)!} = \frac{1}{ϑ} \frac{μ - 1}{μ} \frac{1}{B (λ - μ, μ)} .

(128)
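The prefactor identity (128) is pure integer arithmetic and can be confirmed exactly. The Python sketch below uses B(λ − μ, μ) = (λ − μ − 1)! (μ − 1)!/(λ − 1)!, which holds for integer arguments, together with ϑ = μ/λ.

```python
from fractions import Fraction
from math import factorial


def prefactor_lhs(mu, lam):
    """Left-hand side of (128): lam! / (mu^2 (lam-mu-1)! (mu-2)!)."""
    return Fraction(factorial(lam), mu**2 * factorial(lam - mu - 1) * factorial(mu - 2))


def prefactor_rhs(mu, lam):
    """Right-hand side of (128): (1/theta) ((mu-1)/mu) / B(lam-mu, mu)."""
    theta = Fraction(mu, lam)
    inv_beta = Fraction(factorial(lam - 1), factorial(lam - mu - 1) * factorial(mu - 1))
    return (1 / theta) * Fraction(mu - 1, mu) * inv_beta


if __name__ == "__main__":
    print(all(prefactor_lhs(m, l) == prefactor_rhs(m, l)
              for m in range(2, 12) for l in range(m + 1, 40)))
```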

Now the integration order will be exchanged twice in (127). First the order between t and q₂ is exchanged. Then the order between t and q₁ is exchanged, such that both q-integrations are performed before the t-integration enabling the application of the large population identity (37). Starting with integration bounds

q_{1} \leq q_{2} < \infty, 0 \leq t \leq 1 - P_{Q} (q_{2}),

(129)

and using the inverse function $P_{Q}^{- 1}$ with $q_{2} = P_{Q}^{- 1} (1 - t)$ the exchanged bounds between t and q₂ are

0 \leq t \leq 1 - P_{Q} (q_{1}), q_{1} \leq q_{2} \leq P_{Q}^{- 1} (1 - t) .

(130)

Using factor (128) and exchanged bounds (130), the expression (127) is reformulated as

\begin{aligned} \frac{1}{μ^{2}} E^{(1, 1)} = {} & \frac{1}{ϑ} \frac{μ - 1}{μ} \frac{1}{B (λ - μ, μ)} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \\ & \times \int_{q_{\min}}^{\infty} p_{Q} (q_{1} ∣ x_{1}) \int_{0}^{1 - P_{Q} (q_{1})} t^{λ - μ - 1} {(1 - t)}^{μ - 2} \int_{q_{1}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{2} ∣ x_{2}) d q_{2} d t d q_{1} d x_{2} d x_{1} . \end{aligned}

(131)

Now the integration order between t and q₁ is exchanged starting from

q_{\min} \leq q_{1} < \infty, 0 \leq t \leq 1 - P_{Q} (q_{1}),

(132)

yielding exchanged bounds

0 \leq t \leq 1, q_{\min} \leq q_{1} \leq P_{Q}^{- 1} (1 - t) .

(133)

Therefore, one arrives at the following integral to be solved (the beta function has been moved inside, as it is evaluated together with the t-integration)

\begin{aligned} \frac{1}{μ^{2}} E^{(1, 1)} = {} & \frac{1}{ϑ} \frac{μ - 1}{μ} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \\ & \times \Bigg(\frac{1}{B (λ - μ, μ)} \int_{0}^{1} t^{λ - μ - 1} {(1 - t)}^{μ - 2} \\ & \times \left[\int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) \left\{\int_{q_{1}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{2} ∣ x_{2}) d q_{2}\right\} d q_{1}\right] d t\Bigg) d x_{2} d x_{1} . \end{aligned}

(134)

The integrals in (134) are now solved successively. Starting with the integral {·} over q₂, one has

\int_{q_{1}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{2} ∣ x_{2}) d q_{2} = {[P_{Q} (q_{2} ∣ x_{2})]}_{q_{1}}^{P_{Q}^{- 1} (1 - t)} = P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) - P_{Q} (q_{1} ∣ x_{2}) .

(135)

The q₁-integration within [·] using (135) yields

\int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) (P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) - P_{Q} (q_{1} ∣ x_{2})) d q_{1}

(136)

= P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) d q_{1}

(137)

- \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) P_{Q} (q_{1} ∣ x_{2}) d q_{1} .

(138)

The first integral (137) is evaluated directly, since integrating the conditional density p_Q(q₁|x₁) over q₁ yields its CDF P_Q(q₁|x₁), giving

\begin{aligned} P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) d q_{1} & = P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) {[P_{Q} (q_{1} ∣ x_{1})]}_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} \\ & = P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{1}), \end{aligned}

(139)

with P_Q (q_min|x₁) = Pr{Q ≤ q_min|x₁} = 0. Note that the resulting factors are equal up to the conditional variables x₁ and x₂.

The second integral (138) will be simplified using integration by parts. Thereafter, one can exchange the x₁ and x₂ variables to find a simpler expression for the original integral. Integration by parts yields

\int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) P_{Q} (q_{1} ∣ x_{2}) d q_{1} = P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{1}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) - \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} P_{Q} (q_{1} ∣ x_{1}) p_{Q} (q_{1} ∣ x_{2}) d q_{1} .

(140)

Equation (140) inserted into (134) has to be integrated over x₁ and x₂, of which the order can be exchanged. For the following step the t-integration and the prefactors of (134) have no influence, such that they are dropped for better readability. Integrating both sides of (140) yields

\begin{array}{l} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) P_{Q} (q_{1} ∣ x_{2}) d q_{1} d x_{2} d x_{1} \\ = \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{1}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) d x_{2} d x_{1} \\ - \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} P_{Q} (q_{1} ∣ x_{2}) p_{Q} (q_{1} ∣ x_{1}) d q_{1} d x_{1} d x_{2}, \end{array}

(141)

where in the last line the integration variables x₁ and x₂ were renamed (x₁ ↔ x₂); after exchanging the order of the x-integrations, this term is identical to the left-hand side of (141). Collecting the terms, Eq. (141) can be formulated as

\int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) P_{Q} (q_{1} ∣ x_{2}) d q_{1} d x_{2} d x_{1} = \frac{1}{2} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{1}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) d x_{2} d x_{1} .

(142)
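The symmetrization step leading to (142) can be checked with a Monte Carlo sketch. Both sides of (142) have a probabilistic reading: for independent Q₁ ~ p_Q(·|x₁) and Q₂ ~ p_Q(·|x₂), the q₁-integral on the left equals Pr{Q₂ ≤ Q₁ ≤ c}, and the product on the right equals Pr{Q₁ ≤ c, Q₂ ≤ c}, with c playing the role of P_Q⁻¹(1 − t). The toy quality model Q = (y + x)² − y² + Z with Z ~ 𝒩(0, 1), as well as y, σ, c, and the seed, are assumptions made only for this illustration; the identity itself should hold for any continuous conditional quality distribution.

```python
import random


def symmetrization_check(y=1.0, sigma=1.0, c=0.0, n=200_000, seed=7):
    """Monte Carlo estimates of the left-hand side of (142) and of its right-hand side."""
    rng = random.Random(seed)
    lhs_sum = rhs_sum = 0.0
    for _ in range(n):
        x1 = rng.gauss(0.0, sigma)
        x2 = rng.gauss(0.0, sigma)
        # Toy quality gain (assumption): Q = (y + x)^2 - y^2 + Z, Z ~ N(0, 1)
        q1 = (y + x1) ** 2 - y * y + rng.gauss(0.0, 1.0)
        q2 = (y + x2) ** 2 - y * y + rng.gauss(0.0, 1.0)
        w = x1 * x2
        if q2 <= q1 <= c:  # event behind the q1-integral of p_Q(q1|x1) P_Q(q1|x2) up to c
            lhs_sum += w
        if q1 <= c and q2 <= c:  # event behind P_Q(c|x1) P_Q(c|x2)
            rhs_sum += w
    return lhs_sum / n, 0.5 * rhs_sum / n


if __name__ == "__main__":
    lhs, rhs = symmetrization_check()
    print(f"left-hand side {lhs:.4f}, one half of right-hand side {rhs:.4f}")
```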

Noting that the right-hand side of result (142) is one half of the first integration result (139) after x-integration and noting the minus sign in (138), one gets for (136) the expression

\begin{array}{l} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) \int_{q_{\min}}^{P_{Q}^{- 1} (1 - t)} p_{Q} (q_{1} ∣ x_{1}) (P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) - P_{Q} (q_{1} ∣ x_{2})) d q_{1} d x_{2} d x_{1} \\ = \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) (1 - \frac{1}{2}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{1}) P_{Q} (P_{Q}^{- 1} (1 - t) ∣ x_{2}) d x_{2} d x_{1} . \end{array}

(143)

Inserting the result (143) back into [·] of (134) and including all prefactors, the five-fold integral reduces to the desired result (120).

Theorem 4

Let p_x denote the density of the i-th component mutation x ~ 𝒩 (0, σ²) and let x_k;λ denote the k-th best value (out of λ) of the i-th mutation component (x_k;λ)_i. Let P_Q denote the quality gain CDF with its quantile function $P_{Q}^{- 1}$. For a constant truncation ratio ϑ = μ/λ, the limit of the second order expected value reads

\lim_{\begin{matrix} μ, λ \to \infty \\ ϑ = const . \end{matrix}} \frac{1}{μ (μ - 1)} \sum_{l = 2}^{μ} \sum_{k = 1}^{l - 1} E [x_{k; λ} x_{l; λ}] = \frac{1}{2} {[\frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) P_{Q} (P_{Q}^{- 1} (ϑ) ∣ x_{i}) d x_{i}]}^{2} .

(144)

Proof

Starting from Eq. (120), both sides are multiplied by μ/(μ − 1), which gives the normalization 1/(μ(μ − 1)) on the left-hand side of (144). In this way, the factor (μ − 1)/μ is not absorbed into the limit and can be restored afterwards. Formally, one could include (μ − 1)/μ in the sequence (38) and take the limit. However, it is desirable to keep the factor in the progress rate as a correction for finite μ-values. As a next step, one defines $f_{x} (t) = P_{Q} (P_{Q}^{- 1} (1 - t) | x_{1}) P_{Q} (P_{Q}^{- 1} (1 - t) | x_{2})$. As 0 ≤ f_x(t) ≤ 1, the same bound estimation as in (43) holds. Furthermore, both mutation integrals over the density p_x are finite, see also (44). Therefore, the limit is evaluated with $f_{x} (t) |_{t = 1 - ϑ} = P_{Q} (P_{Q}^{- 1} (ϑ) | x_{1}) P_{Q} (P_{Q}^{- 1} (ϑ) | x_{2})$ and a = 2 as

\begin{aligned} \lim_{\begin{matrix} μ, λ \to \infty \\ ϑ = const . \end{matrix}} \frac{1}{μ (μ - 1)} E^{(1, 1)} & = \frac{1}{2} \frac{1}{ϑ^{2}} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) P_{Q} (P_{Q}^{- 1} (ϑ) ∣ x_{1}) P_{Q} (P_{Q}^{- 1} (ϑ) ∣ x_{2}) d x_{2} d x_{1} \\ & = \frac{1}{2} \frac{1}{ϑ^{2}} \int_{- \infty}^{\infty} x_{1} p_{x} (x_{1}) P_{Q} (P_{Q}^{- 1} (ϑ) ∣ x_{1}) d x_{1} \int_{- \infty}^{\infty} x_{2} p_{x} (x_{2}) P_{Q} (P_{Q}^{- 1} (ϑ) ∣ x_{2}) d x_{2} \\ & = \frac{1}{2} {[\frac{1}{ϑ} \int_{- \infty}^{\infty} x_{i} p_{x} (x_{i}) P_{Q} (P_{Q}^{- 1} (ϑ) ∣ x_{i}) d x_{i}]}^{2}, \end{aligned}

(145)

with x_i re-introduced in the last line to denote the i-th mutation component, which gives Eq. (144).
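The limit in (145) rests on the behavior of the kernel t^{λ−μ−1}(1 − t)^{μ−2}/B(λ − μ, μ): comparing (120), (144), and (145), its t-integral against f_x tends to f_x(1 − ϑ)/ϑ, i.e., the kernel concentrates at t = 1 − ϑ with total mass tending to 1/ϑ. The Python sketch below illustrates this for the hypothetical test function f(t) = t^m, for which the t-integral is a ratio of beta functions; ϑ = 1/4 and m = 2 are arbitrary choices.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def kernel_average(mu, lam, m):
    """(1/B(lam-mu, mu)) * int_0^1 t^(lam-mu-1) (1-t)^(mu-2) t^m dt for f(t) = t^m."""
    return math.exp(log_beta(lam - mu + m, mu - 1) - log_beta(lam - mu, mu))


def predicted_limit(theta, m):
    """Limit value f(1 - theta) / theta for f(t) = t^m."""
    return (1.0 - theta) ** m / theta


if __name__ == "__main__":
    theta, m = 0.25, 2
    for mu in (10, 100, 1000, 10000):
        lam = round(mu / theta)
        print(mu, lam, kernel_average(mu, lam, m), predicted_limit(theta, m))
```

For m = 2 the kernel average equals (λ − μ)(λ − μ + 1)/(λ(μ − 1)), which approaches (1 − ϑ)²/ϑ = 2.25 as μ and λ grow at fixed ϑ = 1/4.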

In [·] of result (144), one can identify the first order progress rate −φ_i within the large population limit derived in Eq. (60). Multiplying (144) by (μ − 1)/μ to recover $\frac{1}{μ^{2}} E^{(1, 1)}$, one can insert the φ_i-approximation from (96). Noting that $c_{ϑ}^{2} = e_{ϑ}^{2, 0}$ via (45), one gets

\begin{aligned} \frac{1}{μ^{2}} E^{(1, 1)} & ≃ \frac{1}{2} \frac{μ - 1}{μ} φ_{i}^{2} \\ & ≃ \frac{1}{2} \frac{μ - 1}{μ} e_{ϑ}^{2, 0} \frac{σ^{4}}{D_{Q}^{2}} {(2 y_{i} + e^{- \frac{1}{2} {(α σ)}^{2}} α A \sin (α y_{i}))}^{2} . \end{aligned}

(146)

Finally, inserting the results from (119) and (146) into (101), one obtains the second order progress rate

\begin{aligned} φ_{i}^{II} ≃ {} & c_{ϑ} \frac{σ^{2}}{D_{Q}} \left(4 y_{i}^{2} + e^{- \frac{1}{2} {(α σ)}^{2}} 2 α A y_{i} \sin (α y_{i})\right) \\ & - \frac{σ^{2}}{μ} \bigg\{1 + e_{ϑ}^{1, 1} \frac{{(2 y_{i})}^{2} σ^{2}}{D_{Q}^{2}} - \frac{c_{ϑ}}{D_{Q}} \left[3 σ^{2} + A \cos (α y_{i}) \left(1 - e^{- \frac{1}{2} {(α σ)}^{2}} + α^{2} σ^{2} e^{- \frac{1}{2} {(α σ)}^{2}}\right)\right] \\ & \qquad + (μ - 1) e_{ϑ}^{2, 0} \frac{σ^{2}}{D_{Q}^{2}} {\left(2 y_{i} + e^{- \frac{1}{2} {(α σ)}^{2}} α A \sin (α y_{i})\right)}^{2}\bigg\}, \end{aligned}

(147)

which serves as an approximation in the asymptotic limit of infinitely large dimensionality and population size. However, experimental investigations will also show good agreement for finite N, μ, and λ.

For future investigations of the convergence and step-size adaptation properties of the (μ/μ_I,λ)-ES, a simpler expression than (147) is needed. To this end, the N-dependency of the terms within {·} of (147) is investigated. It will be shown that for N → ∞ and μ = o (N) only the term −σ²/μ yields relevant contributions. The relevant terms in {·} of Eq. (147) are labeled by their respective prefactors as $e_{ϑ}^{1, 1}$, c_ϑ/D_Q, and $e_{ϑ}^{2, 0}$. To bound the absolute values of the individual terms from above, a lower bound for $D_{Q}^{2}$ is needed. Given the form of $D_{Q}^{2}$ from Eq. (31), no useful lower bound satisfying $D_{Q}^{2}$ > 0 for all y_i could be established due to the trigonometric terms. Therefore, we will restrict the analysis to the sphere limit case A → 0. This assumption might seem crude. However, the most important characteristics are already contained in the first, φ_i-dependent term of (147), referred to as the gain term in sphere model theory [5]. The loss term, on the other hand, is mostly dominated by its leading contribution −σ²/μ. Experiments will affirm this assumption.

As the $φ_{i}^{II}$ -approximation shall be valid for a constant σ* given any R-value, the mutation strength is re-normalized using (4)

σ = \frac{σ^{*} R}{N} .

(148)

Setting A = 0, σ = σ*R/N, and $\sum_{i} y_{i}^{2} = R^{2}$ in (31), one obtains the sphere variance for constant normalized mutation strength as

\begin{aligned} D_{Q, sph}^{2} & = \sum_{i = 1}^{N} [4 σ^{2} y_{i}^{2} + 2 σ^{4}] = 4 σ^{2} R^{2} + 2 N σ^{4} = 4 R^{4} {(\frac{σ^{*}}{N})}^{2} + 2 N {(\frac{σ^{*} R}{N})}^{4} \\ & = 4 R^{4} {(\frac{σ^{*}}{N})}^{2} (1 + \frac{σ^{* 2}}{2 N}) . \end{aligned}

(149)

In the limit N → ∞ the second term of (149) is negligible for constant σ* giving

D_{Q, sph}^{2} ≃ 4 R^{4} {(\frac{σ^{*}}{N})}^{2} .

(150)
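Equations (149) and (150) can be checked for an arbitrary parent position. The Python sketch below evaluates the component sum in (149) directly, compares it with the factored form, and reports the ratio of the asymptotic form (150) to the exact value, which equals 1/(1 + σ*²/(2N)). The random positions and the values of N and σ* are illustrative assumptions.

```python
import math
import random


def sphere_variance(y, sigma_star):
    """Return (component sum of (149), factored form of (149), asymptotic form (150))."""
    N = len(y)
    R = math.sqrt(sum(v * v for v in y))
    sigma = sigma_star * R / N  # normalization (148)
    exact = sum(4 * sigma**2 * v * v + 2 * sigma**4 for v in y)
    factored = 4 * R**4 * (sigma_star / N) ** 2 * (1 + sigma_star**2 / (2 * N))
    asymptotic = 4 * R**4 * (sigma_star / N) ** 2
    return exact, factored, asymptotic


if __name__ == "__main__":
    rng = random.Random(3)
    for N, sigma_star in [(100, 30.0), (10_000, 5.0)]:
        y = [rng.gauss(0.0, 1.0) for _ in range(N)]
        exact, factored, asymptotic = sphere_variance(y, sigma_star)
        print(N, sigma_star, factored / exact, asymptotic / exact)
```

Since the ratio is 1/(1 + σ*²/(2N)), the asymptotic form (150) is accurate only when N ≫ σ*²/2 for the chosen σ*.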

Having obtained the sphere variance asymptotic in (150), the terms within {·} of (147) are evaluated. The term with prefactor $e_{ϑ}^{1, 1}$ yields with σ = σ*R/N and using (150)

e_{ϑ}^{1, 1} \frac{σ^{2}}{D_{Q}^{2}} {(2 y_{i})}^{2} = e_{ϑ}^{1, 1} \frac{{(σ^{*} \frac{R}{N})}^{2}}{4 R^{4} {(σ^{*} / N)}^{2}} {(2 y_{i})}^{2} = e_{ϑ}^{1, 1} \frac{y_{i}^{2}}{R^{2}} = O (\frac{1}{N}) .

(151)

It was used in (151) that a single component $y_{i}^{2}$ contributes on average a fraction 1/N to the squared residual distance $R^{2} = \sum_{j = 1}^{N} y_{j}^{2}$, i.e., y_i²/R² = O(1/N), see also (12). The second term with prefactor c_ϑ/D_Q is evaluated using D_Q ≃ 2R²σ*/N (A = 0) as

\frac{3 c_{ϑ} {(σ^{*} \frac{R}{N})}^{2}}{D_{Q}} = \frac{3 c_{ϑ} {(σ^{*} \frac{R}{N})}^{2}}{2 R^{2} σ^{*} / N} = \frac{3 c_{ϑ} σ^{*}}{2 N} = O (\frac{1}{N}) .

(152)

The last term with prefactor $e_{ϑ}^{2, 0}$ yields with A = 0 and using (150)

\begin{aligned} (μ - 1) e_{ϑ}^{2, 0} \frac{σ^{2}}{D_{Q}^{2}} {(2 y_{i})}^{2} & = (μ - 1) e_{ϑ}^{2, 0} \frac{{(σ^{*} \frac{R}{N})}^{2}}{4 R^{4} {(σ^{*} / N)}^{2}} {(2 y_{i})}^{2} \\ & = (μ - 1) e_{ϑ}^{2, 0} \frac{y_{i}^{2}}{R^{2}} = \begin{cases} O (\frac{1}{N}) & \text{if } μ (N) = const., \\ O (\frac{μ (N)}{N}) & \text{otherwise.} \end{cases} \end{aligned}

(153)
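The orders of magnitude in (151) to (153) can be made concrete. The Python sketch below evaluates the three correction terms inside {·} of (147) for A = 0 with D_Q from (150). The coefficients e^{1,1}_ϑ, c_ϑ, and e^{2,0}_ϑ are set to hypothetical placeholder values (default 1), since only the N- and μ-scaling matters here; for a position with all components equal, y_i²/R² = 1/N holds exactly.

```python
import math


def loss_corrections(y, i, sigma_star, mu, e11=1.0, c_theta=1.0, e20=1.0):
    """Terms (151), (152), (153) for A = 0; e11, c_theta, e20 are placeholder values."""
    N = len(y)
    R2 = sum(v * v for v in y)
    sigma2 = sigma_star**2 * R2 / N**2           # sigma^2 with sigma = sigma* R / N
    D2 = 4 * R2**2 * (sigma_star / N) ** 2       # D_Q^2 from (150)
    t1 = e11 * sigma2 / D2 * (2 * y[i]) ** 2      # (151): e11 y_i^2 / R^2
    t2 = 3 * c_theta * sigma2 / math.sqrt(D2)     # (152): 3 c_theta sigma* / (2N)
    t3 = (mu - 1) * e20 * sigma2 / D2 * (2 * y[i]) ** 2  # (153): (mu-1) e20 y_i^2 / R^2
    return t1, t2, t3


if __name__ == "__main__":
    for N in (100, 1_000, 10_000):
        print(N, loss_corrections([1.0] * N, 0, sigma_star=4.0, mu=10))
```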

In (153) the notation μ(N) was introduced to emphasize that the population size is usually chosen depending on the dimensionality of the search space. Finally, inserting the results of the loss term investigation for the three terms (151), (152), and (153) back into progress rate (147), one gets for the loss term in {·} of (147)

- \frac{σ^{2}}{μ} \left\{1 + O (\frac{1}{N}) + O (\frac{μ (N)}{N})\right\} .

(154)

Provided that the population size satisfies μ = o (N), i.e., μ grows sub-linearly with N, all terms except “1” in {·} can be neglected for N → ∞. Theoretical results concerning population sizing, i.e., choosing the μ(N) necessary to achieve a high global convergence probability (success probability), are not available at this point; obtaining them is one of the main future goals of the current research project. Note that treating μ as a constant is also not satisfactory, since for large N an increase of μ is necessary to maintain a high success rate on a highly multimodal problem. However, experimental investigations on the Rastrigin function including step-size adaptation suggest a sub-linear relation, which supports the approximation. Finally, the lengthy result (147) is simplified using the loss term asymptotics (154), and the second order progress rate approximation is obtained.

Second order progress rate

The second order component-wise progress rate on the Rastrigin function in the asymptotic limits of infinitely large population size μ (constant ϑ = μ/λ) and infinitely large dimensionality N with μ = o (N) yields

φ_{i}^{II} ≃ 2 y_{i} φ_{i} - \frac{σ^{2}}{μ}

(155)

≃ c_{ϑ} \frac{σ^{2}}{D_{Q}} (4 y_{i}^{2} + e^{- \frac{1}{2} {(α σ)}^{2}} 2 α A y_{i} \sin (α y_{i})) - \frac{σ^{2}}{μ} .

(156)

The expressions for $c_{ϑ} = e_{ϑ}^{1, 0}$ from (45) and D_Q from (31) were not inserted to improve readability. The first line (155) emphasizes the dependence $φ_{i}^{II} (φ_{i})$ and can be regarded as a more general formula, provided that φ_i is known and the loss term behaves similarly to the sphere function loss term −σ²/μ. The second line (156) shows the explicit result for the Rastrigin function. The results (155) and (156) can be mapped to the Evolutionary Progress Principle [5], as the expressions contain a progress gain term and a loss term. Here, the gain part scales with c_ϑ and depends on y_i. Since c_ϑσ²/D_Q > 0, the gain term becomes negative whenever $2 α A e^{- \frac{1}{2} {(α σ)}^{2}} y_{i} \sin (α y_{i}) < - 4 y_{i}^{2}$, i.e., for sufficiently negative values of y_i sin (α y_i). Such negative contributions are due to local attraction, which moves the ES away from the global optimizer, cf. Fig. 3. The loss term −σ²/μ is characteristic for intermediate recombination. It introduces significant loss for large σ, but can be decreased using a larger μ due to recombination effects.
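Because c_ϑσ²/D_Q > 0, the sign of the gain term in (156) is decided by the bracket 4y_i² + 2αA e^{−(ασ)²/2} y_i sin(αy_i), which needs neither c_ϑ nor D_Q. The Python sketch below evaluates this bracket for A = 1 and α = 2π as in Figs. 6 and 7; the sample values of y_i and σ are illustrative choices.

```python
import math


def gain_bracket(y_i, sigma, A=1.0, alpha=2 * math.pi):
    """Bracket of the gain term in (156); its sign is the sign of the gain term."""
    damp = math.exp(-0.5 * (alpha * sigma) ** 2)
    return 4 * y_i**2 + 2 * alpha * A * damp * y_i * math.sin(alpha * y_i)


if __name__ == "__main__":
    for y_i, sigma in [(0.75, 0.05), (0.75, 1.0), (11.6, 0.05)]:
        print(y_i, sigma, gain_bracket(y_i, sigma))
```

At y_i = 0.75 with small σ the bracket is negative (local attraction), while σ = 1.0 damps the oscillatory part by e^{−(ασ)²/2} ≈ 3·10⁻⁹ and the bracket turns positive. Far from the optimizer (y_i = 11.6) the quadratic part dominates regardless of σ.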

Results of one-generation experiments are presented in Figs. 6 and 7, obtained by evaluating (8) over 10⁶ trials (black dots with vanishing error bars), and compared with the obtained approximations. The red dash-dotted line shows the simplified result (156), while the blue dashed line shows (147). The positions y were initialized randomly (given R) and kept constant over all repetitions. Fig. 6 shows a smaller dimensionality N = 20 and truncation ratio ϑ = 1/4, while Fig. 7 shows larger values N = 100 with ϑ = 1/2. These two parameter sets illustrate the approximation quality in different settings.

Fig. 6 — Second order progress rate $φ_{i}^{II}$ as a function of σ* for (10/10,40)-ES with N = 20, A = 1, α = 2π, at two residual distances $R = 10 \sqrt{N}$ with *y_i* = 11.6 (left) and $R = 0.1 \sqrt{N}$ with *y_i* = 0.116 (right). The dashed blue curves show Eq. (147) and the dash-dotted red curves Eq. (156).

Fig. 7 — Second order progress rate $φ_{i}^{II}$ as a function of σ* for (100/100, 200)-ES with N = 100, A = 1, α = 2π, at two residual distances $R = 10 \sqrt{N}$ with *y_i* = 11.9 (left) and $R = 0.1 \sqrt{N}$ with *y_i* = 0.119 (right). The dashed blue curves show Eq. (147) and the dash-dotted red curves Eq. (156).

The first thing to note is that the loss term allows for negative progress at large σ*, which was not the case for φ_i. The approximation quality is good for different R-values (see left and right plots, respectively) and improves for larger N and μ in Fig. 7, as expected. The simplified expression $φ_{i}^{II}$ from (156) [red, dash-dotted] yields good results compared to (147) [blue, dashed], with (147) giving slightly better results for smaller σ* and (156) better results at larger σ*. This indicates that additional terms of the Taylor expansion (70) would be needed to further improve the results of (147). However, this would make the expression more involved, which is not desired. Furthermore, the results of Fig. 6 are relatively good considering that a rather small population (10/10, 40)-ES was used at a low dimensionality N = 20. One can conclude that (156) yields very good results considering its “simplicity”. It will therefore be used in Sec. 5 to investigate the dynamical behavior of the ES. It should be noted that at this point there is no aggregated progress measure over all N components, such as the R-dependent sphere progress rate. Given some y^(g), one can evaluate $φ_{i}^{II}$ for all i = 1, …, N and obtain a progress vector, but the overall effect on R^(g) → R^(g+1) is not known. This will be part of future research. However, the cumulative effect of all N progress rates can be evaluated within a dynamical systems model, which is presented in the next section.

5. Evolution equations

In the previous sections one-generation experiments were conducted and compared against progress rate results (96), (147), and (156). In order to have an aggregated measure over all components and many generations, φ_i and $φ_{i}^{II}$ will be used within the evolution equations and compared to real optimization runs of Algorithm 1. Using this method the (mean) global convergence behavior can be investigated.

Given the definitions of the first and second order progress rates (7) and (8), the expressions can be reformulated as stochastic iterative mappings between two generations g → g + 1 according to

y_{i}^{(g + 1)} = y_{i}^{(g)} - φ_{i} (σ^{(g)}, y^{(g)}) + ϵ^{(1)} (σ^{(g)}, y^{(g)})

(157)

{(y_{i}^{(g + 1)})}^{2} = {(y_{i}^{(g)})}^{2} - φ_{i}^{II} (σ^{(g)}, y^{(g)}) + ϵ^{(2)} (σ^{(g)}, y^{(g)}) .

(158)

The two terms ϵ⁽¹⁾ and ϵ⁽²⁾ can be interpreted as fluctuations w.r.t. the expected values (provided by φ_i and $φ_{i}^{II}$ ). Thus, it holds E[ϵ⁽¹⁾] = 0 = E[ϵ⁽²⁾]. However, the exact transition densities for g → g + 1 are not known at this point. In principle, they could be approximated using a finite number of higher order moments (or cumulants) to model the fluctuations [5, Ch. 7]. However, for a first study of the progress rate results on the dynamics, the fluctuations are neglected by setting ϵ⁽¹⁾ = 0 = ϵ⁽²⁾. Therefore, one arrives at the (deterministic) equations describing the mean-value dynamics of the parental position coordinates

y_{i}^{(g + 1)} = y_{i}^{(g)} - φ_{i} (σ^{(g)}, y^{(g)})

(159)

{(y_{i}^{(g + 1)})}^{2} = {(y_{i}^{(g)})}^{2} - φ_{i}^{II} (σ^{(g)}, y^{(g)}),

(160)

with constant normalized mutation strength σ* from Eq. (4) giving

σ^{(g)} = σ^{*} ‖ y^{(g)} ‖ / N .

(161)

Two important issues need to be discussed. Firstly, the positional iterations are defined for a single component i. For large N, however, it is not feasible to display each component individually. While the components are iterated separately, the dynamics are presented as a function of the residual distance R = ‖y^(g)‖. Secondly, since $φ_{i}^{II}$ depends on y^(g), the square root of the iterated components ${(y_{i}^{(g)})}^{2}$ has to be taken after each iteration, giving two solutions $\pm y_{i}^{(g)}$. As the corresponding terms of $φ_{i}^{II}$ and $D_{Q}^{2} (y)$ are even in $y_{i}^{(g)}$, both solutions are equivalent.
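A minimal sketch of the deterministic iteration (160) with the rescaling (161) can be written for the sphere limit A = 0, where D_Q is replaced by its asymptotic form (150), D_Q ≃ 2R²σ*/N. This simplification does not reproduce the Rastrigin dynamics, which require A > 0 and the full D_Q from (31). The value of c_ϑ is an assumption: instead of (45), the standard large-population asymptotic c_ϑ ≃ (1/ϑ)(2π)^{−1/2} exp(−½[Φ⁻¹(1 − ϑ)]²) is used. Summing (160) over i in this limit gives R^(g+1)² = R^(g)² (1 − 2c_ϑσ*/N + σ*²/(Nμ)), which the sketch reproduces.

```python
import math
from statistics import NormalDist


def c_theta_asymptotic(theta):
    """Assumed large-population progress coefficient (1/theta) * phi(Phi^-1(1 - theta))."""
    nd = NormalDist()
    return nd.pdf(nd.inv_cdf(1.0 - theta)) / theta


def iterate_sphere_limit(y0, sigma_star, mu, theta, generations):
    """Iterate (160) with (156) for A = 0 and D_Q = 2 R^2 sigma* / N; return R^(g) for all g."""
    N = len(y0)
    c = c_theta_asymptotic(theta)
    y_sq = [v * v for v in y0]  # the iteration acts on the squared components
    history = [math.sqrt(sum(y_sq))]
    for _ in range(generations):
        R = math.sqrt(sum(y_sq))
        sigma = sigma_star * R / N            # rescaling (161)
        D_Q = 2.0 * R * R * sigma_star / N    # square root of (150)
        phi_II = [c * sigma**2 / D_Q * 4.0 * ys - sigma**2 / mu for ys in y_sq]  # (156), A = 0
        y_sq = [ys - p for ys, p in zip(y_sq, phi_II)]                           # (160)
        history.append(math.sqrt(sum(y_sq)))
    return history


if __name__ == "__main__":
    N = 100
    hist = iterate_sphere_limit([20.0] * N, sigma_star=30.0, mu=100, theta=0.5, generations=10)
    print([f"{r:.3e}" for r in hist])
```

The start y_i = 20 for all i gives R⁽⁰⁾ = 20√N. With these values the per-generation factor on R² is below one, so R decreases geometrically, which corresponds to the linear convergence phase on a logarithmic scale.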

In the following, the deterministic iterations (159) and (160) using the mutation strength rescaling (161) are compared to real optimization runs. For the initialization, y⁽⁰⁾ is chosen randomly such that ‖y⁽⁰⁾‖ = R⁽⁰⁾ for a given R⁽⁰⁾. The starting position is kept constant for consecutive runs of the same experiment. The magnitude of R⁽⁰⁾ is chosen such that the strategy starts far enough away from the local minima landscape. Given Fig. 1 with A = 1, the farthermost local minimizer is at y_i ≈ 3, resulting in $R \approx 3 \sqrt{N}$ for N components, such that $R^{(0)} = 20 \sqrt{N} > 3 \sqrt{N}$ is chosen.
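The location of the farthermost local minimizer can be reproduced numerically. The Python sketch below assumes the component function y² + A(1 − cos(αy)), whose derivative 2y + αA sin(αy) corresponds to the term 2y_i + e^{−(ασ)²/2} αA sin(αy_i) of (146) for σ → 0. It scans this derivative for sign changes from negative to positive and refines each by bisection; A = 1 and α = 2π follow the figures, while the scan range and step are arbitrary choices. For y > αA/2 the derivative is strictly positive, so no minimizer lies beyond that point.

```python
import math


def component_derivative(y, A=1.0, alpha=2 * math.pi):
    """Derivative of the assumed component function y^2 + A (1 - cos(alpha y))."""
    return 2 * y + alpha * A * math.sin(alpha * y)


def positive_local_minimizers(A=1.0, alpha=2 * math.pi, y_max=10.0, step=1e-3):
    """Positive local minimizers, located by sign changes (- to +) of the derivative."""
    roots = []
    y = step
    while y < y_max:
        g_lo = component_derivative(y, A, alpha)
        g_hi = component_derivative(y + step, A, alpha)
        if g_lo < 0.0 <= g_hi:  # derivative changes sign from - to +: local minimum
            lo, hi = y, y + step
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                if component_derivative(mid, A, alpha) < 0.0:
                    lo = mid
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
        y += step
    return roots


if __name__ == "__main__":
    print(positive_local_minimizers())
```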

Considering the choice of σ*, one observes in experiments that larger mutation strengths (compared to a sphere-optimal σ*) increase the success probability P_S of individual trials to converge to the global optimizer. This is due to the fact that large steps tend to overcome local attraction more easily. However, this comes at the expense of efficiency, since large steps often overshoot the global optimizer. Therefore, in Fig. 8, σ* is chosen larger than the sphere-optimal value ${\hat{σ}}_{sph}^{*}$, which can be obtained numerically from [5, Eq. (6.54)], but small enough to prevent negative progress. The aim was to obtain P_S ≈ 1.

In order to aggregate the R^(g)-data of multiple dynamic experiments, the median has proven to be a suitable measure of central tendency. Due to fluctuations, the R^(g)-values of distinct ES-runs may differ by orders of magnitude, such that their distribution is strongly skewed and the mean is dominated by the few largest values. The median is insensitive to such outliers and therefore a more stable measure.
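The effect described above is easy to reproduce. The Python sketch below aggregates hypothetical per-run residual distances generation by generation; the numbers are made up for illustration and are not data from the experiments.

```python
from statistics import fmean, median


def aggregate_runs(runs):
    """runs[r][g] holds R^(g) of run r; return per-generation medians and means."""
    n_gen = min(len(run) for run in runs)
    medians = [median(run[g] for run in runs) for g in range(n_gen)]
    means = [fmean(run[g] for run in runs) for g in range(n_gen)]
    return medians, means


if __name__ == "__main__":
    # Five hypothetical runs over three generations; one run stagnates at R = 3.0
    runs = [
        [100.0, 1e-2, 1e-6],
        [100.0, 2e-2, 2e-6],
        [100.0, 5.0, 3.0],
        [100.0, 1e-2, 3e-6],
        [100.0, 3e-2, 4e-6],
    ]
    medians, means = aggregate_runs(runs)
    print(medians[-1], means[-1])
```

In the last generation the single stagnating run lifts the mean to about 0.6, whereas the median stays at 3·10⁻⁶, the typical value of the converging runs.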

In Fig. 8 one can observe three phases within the dynamics. First, linear convergence is observed for large R^(g)-values, where the sphere function dominates. Then, a slowdown is observed due to the increasing effects of local attraction. For small R^(g)-values, the ES descends into the global attractor basin and linear convergence can be observed again. One can see that the φ_i-iteration (blue) predicts far too much progress compared to the $φ_{i}^{II}$-iteration. This is due to the first order model, which does not include loss terms and significantly overestimates the progress, see also the discussion of result (96). The iteration via $φ_{i}^{II}$ (red) shows good agreement with the median curve, especially for larger μ and N (right plot). The better agreement for large populations is also due to reduced fluctuation effects, which were neglected at the beginning of Sec. 5.

In Fig. 9 the effect of a reduced σ* is investigated, which increases the probability of local convergence. The left plot shows σ* = 5 with no globally converging runs, as the mutation strength is too low. Technically, for constant σ* there is no local convergence, as the algorithm never stops if R is not decreasing. Still, the experiments are stopped after some g-threshold is reached. The stagnating behavior of the ES around some R^(g) can be illustrated using Fig. 3. For σ = 0.2 one has σ* = σ N/R ≈ 0.9, which is small compared to ${\hat{σ}}_{sph}^{*} \approx 5.7$. Both left and right progress components of Fig. 3 are significantly influenced by the local attraction region at σ = 0.2. While some components may be improved (positive value left), others are worsened (negative value right), resulting in a cumulative effect of R^(g)-stagnation. One remedy is to increase σ (or equivalently σ*). However, the local minima landscape changes with changing R, and an arbitrary σ*-increase is not possible. Stagnation may appear at different σ*- and R^(g)-values depending on fitness and strategy parameters. For an active step-size adaptation, changing σ appropriately, without converging locally, poses a major challenge.

In the central plot of Fig. 9, roughly half of the runs converge globally at the increased $σ^{*} = {\hat{σ}}_{sph}^{*}$. In this case the deterministic iteration follows a single converging path, as no fluctuations are modeled. The residual distance of the locally converging runs is reduced compared to ES-runs with σ* = 5. Note that the convergence speed of the globally converging runs is faster (steeper negative slope) than for σ* = 30 in Fig. 8, due to the sphere-optimal ${\hat{σ}}_{sph}^{*}$. However, this comes with the disadvantage of a lower P_S, as more trials converge locally. The right plot with σ* = 25 is similar to σ* = 30 of Fig. 8, but with several non-converging runs. Again, the ES convergence speed is faster if σ* is chosen closer to ${\hat{σ}}_{sph}^{*}$, but P_S is slightly reduced. The overall prediction quality of the iterative mapping (160) is good, and the results affirm the expectation that relatively large mutations are favorable to maximize P_S on the Rastrigin function.

To confirm the expectation that the approximation quality increases further for larger μ and N, experiments are shown in Fig. 10. The first thing to notice is that the positional fluctuations of the ES trials decrease further, such that nearly all runs show a similar R-dynamics. This is related to the intermediate recombination, see Eq. (34), as the position y^(g+1) is obtained by averaging over a large number of individuals. The agreement is good, but the left plot still leaves some room for improvement. This is related to the truncation ratio ϑ = 1/4, for which the Taylor expansion point in Eq. (70) via function g(x_i) is shifted by Φ⁻¹(ϑ). For ϑ = 1/2 and even larger N and μ (right plot), very good agreement is observed.

6. Conclusion and outlook

In this paper the full first and second order progress rate analysis of the (μ/μ_I, λ)-ES has been presented. In order to obtain closed-form expressions for φ_i and $φ_{i}^{II}$, it was necessary to assume large dimensionality and large populations. While the latter assumption does not present a serious issue, because large populations are needed to ensure global convergence anyway, it was the key prerequisite to solve and simplify the expected value integrals. As the experiments have shown, the approximation quality of the progress rate expressions is rather good even for N as small as 20 and comparatively small populations of μ = 10. For larger N and μ the approximation quality improves further, as expected. The first order progress rate result is able to model the local attraction effects on the Rastrigin function. This is a very important step, as all subsequent investigations in this paper are based on the φ_i-results. The second order progress rate derivation was needed to obtain additional loss terms completing the progress model, which was especially needed for larger mutation strengths and close to the global optimizer.

Using the progress rate expressions, the dynamics of the evolution process have been investigated. There is a good agreement between the iterations and real ES-runs using median aggregation of the residual distance R to the global optimizer. As has been shown, depending on the choice of the normalized mutation strength, one can model global as well as local convergence behavior. Additionally, one observes a trade-off between efficiency and success rate, as relatively large mutations have to be chosen to maximize the success probability.

The conducted experiments assume scale-invariance, i.e., the mutation strength is controlled by the residual distance R. This is in contrast to the full self-adaptive ES where σ evolves during the ES run either by mutative self-adaptation (SA), cumulative step-size adaptation (CSA), or Meta-ES. The incorporation of the self-adaptation process will be the next step completing the analysis of the (μ/μ_I, λ)-ES on Rastrigin. To this end, the self-adaptation response (SAR) function must be derived. Combining N progress rates with the SAR function yields N + 1 evolution equations. In order to get manageable expressions that allow for analytic population sizing and expected runtime investigations, additional aggregation is needed. One possible approach would be the aggregation of individual parental y_i components into the parental distance R, modeling the expected progress as a function of the residual distance. This would reduce the number of evolution equations to two and make further analytic treatment more accessible. A first step in this direction was taken in [19].

Finally, the presented approach to modeling the ES dynamics is based on mean value considerations; that is, fluctuations have not been considered so far. Whether the approach can be extended to calculate the probability of convergence to the global attractor as a function of the strategy and fitness parameters remains an open question.

Acknowledgements

This work was supported by the Austrian Science Fund (FWF) under grant P33702-N.

Footnotes

For independent quality gain components Q_i(x_i) with finite mean and variance, the Central Limit Theorem holds [10] if, for some δ > 0, the Lyapunov condition

$$\lim_{N \to \infty} \frac{\sum_{i = 1}^{N} E\left[\left| Q_{i} - E[Q_{i}] \right|^{2 + \delta}\right]}{D_{Q}^{2 + \delta}} = 0$$

is satisfied. Verifying this condition could be approached by evaluating higher order moments of the respective quantities using Eqs. (24), (25), and (26).
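A numerical plausibility check of the Lyapunov condition can be sketched as below. This sketch replaces the analytic moments of Eqs. (24)–(26) with Monte Carlo estimates, uses δ = 1, and takes D_Q² = Σ_i Var[Q_i], the normalization of the standard Lyapunov condition. It further assumes Q_i = f_i(y_i + x_i) − f_i(y_i) with x_i ~ N(0, σ²) and the component form f_i(v) = v² + A(1 − cos(α v)), A = 1, α = 2π; the sign convention of Q_i does not affect the ratio, since only central absolute moments enter. The function names are hypothetical.

```python
import math
import random


def f_component(v, A=1.0, alpha=2.0 * math.pi):
    """Assumed Rastrigin component f_i(v) = v^2 + A * (1 - cos(alpha * v))."""
    return v * v + A * (1.0 - math.cos(alpha * v))


def component_moments(y_i, sigma, samples, rng):
    """Monte Carlo estimates of Var[Q_i] and E|Q_i - E[Q_i]|^3 for
    Q_i = f_i(y_i + x_i) - f_i(y_i) with x_i ~ N(0, sigma^2)."""
    base = f_component(y_i)
    qs = [f_component(y_i + sigma * rng.gauss(0.0, 1.0)) - base
          for _ in range(samples)]
    mean = sum(qs) / samples
    var = sum((q - mean) ** 2 for q in qs) / samples
    abs_m3 = sum(abs(q - mean) ** 3 for q in qs) / samples
    return var, abs_m3


def lyapunov_ratio(ys, sigma, samples=1000, seed=0):
    """Lyapunov ratio for delta = 1: sum_i E|Q_i - E[Q_i]|^3 / D_Q^3,
    with D_Q^2 = sum_i Var[Q_i]."""
    rng = random.Random(seed)
    total_var = 0.0
    total_m3 = 0.0
    for y_i in ys:
        var, abs_m3 = component_moments(y_i, sigma, samples, rng)
        total_var += var
        total_m3 += abs_m3
    return total_m3 / total_var ** 1.5


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, round(lyapunov_ratio([1.3] * n, sigma=0.5), 4))
```

For identical components the ratio should shrink roughly like 1/√N, which is the behavior the condition requires; for parental vectors with heterogeneous y_i, the same estimator can be applied to check that no single component dominates D_Q.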

Using the result from Eq. (80), one could even calculate a closed-form second order approximation for Eq. (69). However, the resulting formula would be rather complex.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

[1] Abramowitz M, Stegun IA. Pocketbook of Mathematical Functions. Verlag Harri Deutsch; Thun: 1984.
[2] Agapie A, Solomon O, Bădin L. Theory of (1+1) ES on SPHERE revisited. IEEE Trans Evol Comput. 2022:938–948.
[3] Agapie A, Solomon O, Giuclea M. Theory of (1+1) ES on the RIDGE. IEEE Trans Evol Comput. 2022;26(3):501–511.
[4] Arnold DV. Noisy Optimization with Evolution Strategies. Kluwer Academic Publishers; Dordrecht: 2002.
[5] Beyer H-G. The Theory of Evolution Strategies. Springer; Heidelberg: 2001. (Natural Computing Series).
[6] Beyer H-G, Arnold DV, Meyer-Nieberg S. A new approach for predicting the final outcome of evolution strategy optimization under noise. Genet Program Evol Mach. 2005;6(1):7–24.
[7] Beyer H-G, Melkozerov A. The dynamics of self-adaptive multi-recombinant evolution strategies on the general ellipsoid model. IEEE Trans Evol Comput. 2014;18(5):764–778. doi: 10.1109/TEVC.2013.2283968.
[8] Beyer H-G, Schwefel H-P. Evolution strategies: a comprehensive introduction. Nat Comput. 2002;1(1):3–52.
[9] Beyer H-G, Sendhoff B. Toward a steady-state analysis of an evolution strategy on a robust optimization problem with noise-induced multi-modality. IEEE Trans Evol Comput. 2017;21(4):629–643. doi: 10.1109/TEVC.2017.2668068.
[10] Billingsley P. Probability and Measure. Wiley; 1995. (Wiley Series in Probability and Statistics).
[11] Hansen N, Kern S. Evaluating the CMA evolution strategy on multimodal test functions. In: Yao X, et al., editors. Parallel Problem Solving from Nature 8; Berlin. 2004. pp. 282–291.
[12] Hellwig M, Beyer H-G. On the steady state analysis of covariance matrix self-adaptation evolution strategies on the noisy ellipsoid model. Theor Comput Sci. 2018. doi: 10.1016/j.tcs.2018.05.016.
[13] Melkozerov A, Beyer H-G. On the analysis of self-adaptive evolution strategies on elliptic model: first results. In: Branke J, et al., editors. GECCO'10: Proceedings of the Genetic and Evolutionary Computation Conference; New York. 2010. pp. 369–376.
[14] Meyer-Nieberg S. Self-Adaptation in Evolution Strategies. PhD thesis, University of Dortmund, CS Department; Dortmund, Germany: 2007.
[15] Müller N, Glasmachers T. Non-local optimization: imposing structure on optimization problems by relaxation. Foundations of Genetic Algorithms; 2021. pp. 1–10.
[16] Omeradzic A, Beyer H-G. Progress Analysis of a Multi-Recombinative Evolution Strategy on the Highly Multimodal Rastrigin Function, Report. Vorarlberg University of Applied Sciences; 2022. https://opus.fhv.at/frontdoor/index/index/docId/4722
[17] Omeradzic A, Beyer H-G. Progress rate analysis of evolution strategies on the Rastrigin function: first results. In: Rudolph G, Kononova AV, Aguirre H, Kerschke P, Ochoa G, Tušar T, editors. Parallel Problem Solving from Nature – PPSN XVII; 2022. pp. 499–511.
[18] Omeradzic A, Beyer H-G. Rastrigin Function: Quality Gain and Progress Rate for (μ/μ_I, λ)-ES, Report. Vorarlberg University of Applied Sciences; 2023. https://opus.fhv.at/frontdoor/index/index/docId/5151
[19] Omeradzic A, Beyer H-G. Convergence properties of the (μ/μ_I, λ)-ES on the Rastrigin function. Proceedings of the 17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, FOGA '23; New York, NY, USA. 2023. pp. 117–128.

