Published in final edited form as: Parallel Probl Solving Nat. 2022 Aug 15;13399:499–511. doi: 10.1007/978-3-031-14721-0_35

Progress Rate Analysis of Evolution Strategies on the Rastrigin Function: First Results

Amir Omeradzic 1, Hans-Georg Beyer 1
PMCID: PMC7615766  EMSID: EMS189095  PMID: 38532780

Abstract

A first order progress rate is derived for the intermediate multi-recombinative Evolution Strategy (μ/μI, λ)-ES on the highly multimodal Rastrigin test function. The progress is derived within a linearized model applying the method of so-called noisy order statistics. To this end, the mutation-induced variance of the Rastrigin function is determined. The obtained progress approximation is compared to simulations and shows strengths and limitations depending on the mutation strength and the distance to the optimizer. Furthermore, the progress is iterated using the dynamical systems approach and compared to averaged optimization runs. The property of global convergence within the given approximation is discussed. As an outlook, the need for an improved first order progress rate as well as for an extension to higher order progress including positional fluctuations is explained.

Keywords: Evolution Strategies, Rastrigin function, Progress rate analysis, Global optimization

1. Introduction

Evolution Strategies (ES) [12,13] are well-recognized Evolutionary Algorithms suited for real-valued non-linear optimization. State-of-the-art ES such as the CMA-ES [8] or its simplification [5] are also well-suited for locating global optimizers in highly multimodal fitness landscapes. While the CMA-ES was originally intended mainly for non-differentiable optimization problems and regarded as a locally acting strategy, it was already observed in [7] that using a large population size can make the ES a strategy that is able to locate the global optimizer among a huge number of local optima. This is a surprising observation when considering the ES as a strategy that acts mainly locally in the search space, following some kind of gradient or natural gradient [3,6,11]. As one can easily check using standard (highly) multimodal test functions such as Rastrigin, Ackley, and Griewank, to name a few, this ES property is not intimately related to the covariance matrix adaptation (CMA) ES, which generates non-isotropic correlated mutations, but can also be found in the (μ/μI, λ)-ES with isotropic mutations. Therefore, if one wants to understand the underlying working principles by which the ES locates the global optimizer, the analysis of the (μ/μI, λ)-ES should be the starting point.

The question why and when optimization algorithms originally designed for local search are able to locate global optima has gained attention in the last few years. A recurring idea comes from relaxation procedures that transform the original multimodal optimization problem into a convex one, known as Gaussian continuation [9]. Gaussian continuation is nothing other than a convolution of the original optimization problem with a Gaussian kernel. As has been shown in [10], using the right Gaussian, Rastrigin-like functions can be transformed into convex optimization problems, thus making them accessible to gradient following strategies. However, this raises the question of how to perform the convolution efficiently. One road, followed in [14], uses high-order Gauss-Hermite integration in conjunction with a gradient descent strategy, yielding surprisingly good results. The other road coming to mind is approximating the convolution by Gaussian sampling. This resembles the procedure ES follow: starting from a parental state, offspring are generated by Gaussian mutations. The problem is, however, that in order to obtain a reliable gradient from the convolution, a huge number of samples, i.e. offspring in the ES, must be generated. This number seems much larger than the offspring population size needed in the ES experiments conducted in [7], which showed an approximately linear relation between problem dimension N and population size for the Rastrigin function. Therefore, understanding the ES performance from the viewpoint of Gaussian relaxation does not seem to help much.

The approach followed in this paper will incorporate two main concepts, namely a progress rate analysis as well as its application within the so-called evolution equations modeling the transition dynamics of the ES [2]. The progress rate measure yields the expected positional change in search space between two generations depending on location, strategy and test function parameters. Aiming to investigate and understand the dynamics of globally converging ES runs, the progress rate is an essential quantity to model the expected evolution dynamics over many generations.

This paper provides first results of a scientific program that aims at an analysis of the performance of the (μ/μI, λ)-ES on Rastrigin’s test function based on a first order progress rate. After a short introduction of the (μ/μI, λ)-ES, the N-dimensional first order progress will be defined and an approximation will be derived, resulting in a closed form expression. Its predictive power and its limitations will be checked by one-generation experiments. The progress rate will then be used to simulate the ES dynamics on Rastrigin using difference equations. This simulation will be compared with real runs of the (μ/μI, λ)-ES. In the concluding section a summary of the results and an outlook on future research will be given.

2. Rastrigin Function and Local Quality Change

The real-valued minimization problem is defined for an N-dimensional search vector y = (y1, …, yN) on the Rastrigin test function f given by

f(\mathbf{y}) = \sum_{i=1}^{N} f_i(y_i) = \sum_{i=1}^{N} \left( y_i^2 + A - A \cos(\alpha y_i) \right), (1)

with A denoting the oscillation amplitude and α = 2π the corresponding frequency. The quadratic term with superimposed oscillations yields a finite number of local minima M for each dimension i, such that the overall number of minima scales exponentially as M^N, posing a highly multimodal minimization problem. The global optimizer is located at ŷ = 0.

For the progress rate analysis in Sect. 4 the local quality function Qy(x) at y due to mutation vector x = (x1, …, xN) is needed. In order to reuse results from the noisy progress rate theory, it will be formulated for the maximization case of F(y) = −f(y) with Fi(yi) = −fi(yi), such that the local quality change reads

Q_{\mathbf{y}}(\mathbf{x}) = F(\mathbf{y} + \mathbf{x}) - F(\mathbf{y}) = f(\mathbf{y}) - f(\mathbf{y} + \mathbf{x}). (2)

Qy(x) can be evaluated for each component i independently giving

Q_{\mathbf{y}}(\mathbf{x}) = \sum_{i=1}^{N} Q_i(x_i) = \sum_{i=1}^{N} \left[ f_i(y_i) - f_i(y_i + x_i) \right] (3)
= -\sum_{i=1}^{N} \left( x_i^2 + 2 y_i x_i + A \cos(\alpha y_i) \left( 1 - \cos(\alpha x_i) \right) + A \sin(\alpha y_i) \sin(\alpha x_i) \right). (4)
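
As a quick numerical check of (1)-(4), the following Python sketch (not part of the original study's code; all names and parameter values are illustrative) evaluates the Rastrigin function, the quality change of (2), and its component-wise form (4), confirming that both expressions coincide.

```python
import numpy as np

A, alpha = 10.0, 2.0 * np.pi   # oscillation amplitude and frequency, cf. Eq. (1)

def f(y):
    """Rastrigin function, Eq. (1)."""
    return np.sum(y**2 + A - A * np.cos(alpha * y))

def Q(y, x):
    """Local quality change Q_y(x) = f(y) - f(y + x), Eq. (2)."""
    return f(y) - f(y + x)

def Q_componentwise(y, x):
    """Component-wise form of the quality change, Eq. (4)."""
    return -np.sum(x**2 + 2.0 * y * x
                   + A * np.cos(alpha * y) * (1.0 - np.cos(alpha * x))
                   + A * np.sin(alpha * y) * np.sin(alpha * x))

rng = np.random.default_rng(0)
y = rng.normal(size=10)
x = 0.3 * rng.normal(size=10)
print(Q(y, x), Q_componentwise(y, x))   # both values agree up to rounding
```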

A closed form solution of the progress rate appears to be obtainable only for a linearized expression of Qi(xi). A first approach taken in this paper is based on a Taylor expansion in the mutation xi, discarding higher order terms

Q_i(x_i) = F_i(y_i + x_i) - F_i(y_i) = \frac{\partial F_i}{\partial y_i} x_i + O(x_i^2) (5)
\simeq \left( -2 y_i - \alpha A \sin(\alpha y_i) \right) x_i =: f_i' x_i, (6)

using the following derivative terms

k_i = -2 y_i \quad \text{and} \quad d_i = -\alpha A \sin(\alpha y_i), \quad \text{such that} \quad -\frac{\partial f_i}{\partial y_i} = f_i' = k_i + d_i. (7)

A second approach is to consider only the linear term of Eq. (4) and neglect all non-linear terms denoted by δ(xi) according to

Q_i(x_i) = -2 y_i x_i - x_i^2 - A \cos(\alpha y_i) \left( 1 - \cos(\alpha x_i) \right) - A \sin(\alpha y_i) \sin(\alpha x_i) (8)
= -2 y_i x_i + \delta(x_i) \approx -2 y_i x_i = k_i x_i. (9)

The linearization using fi′ is a local approximation of the function incorporating the oscillation parameters A and α. Using only ki (setting di = 0) discards the oscillations by approximating the quadratic term via ki = ∂(−yi^2)/∂yi = −2yi, with the negative sign due to maximization. Both approximations will be evaluated later.

3. The (μ/μI, λ)-ES with Normalized Mutations

The Evolution Strategy under investigation consists of a population of μ parents and λ offspring (μ < λ) per generation g. Algorithm 1 is presented below and offspring variables are denoted with overset “∼”.

Population variation is achieved by applying an isotropic normally distributed mutation x ∼ σ𝒩(0, 1) with strength σ to the parental recombinant in Lines 6 and 7. The recombinant is obtained by intermediate recombination of all μ parents, equally weighted, in Line 11. Selection of the m = 1, …, μ best search vectors ỹm;λ (out of λ) according to their fitness is performed in Line 10.

Note that the ES in Algorithm 1 operates under constant normalized mutation σ* in Lines 3 and 12 using the spherical normalization

\sigma^{*} = \frac{\sigma^{(g)} N}{\lVert \mathbf{y}^{(g)} \rVert} = \frac{\sigma^{(g)} N}{R^{(g)}}. (10)

This property ensures global convergence of the algorithm as the mutation strength σ(g) decreases if and only if the residual distance ║y(g)║ = R(g) decreases. While σ* is not known during black-box optimizations, it is used here to investigate the dynamical behavior of the ES using the first order progress rate approach to be developed in this paper. Incorporating self-adaptation of σ or cumulative step-size adaptation remains for future research.

Algorithm 1. (μ/μI, λ)-ES with constant σ*.

1: g ← 0
2: y(0) ← y(init)
3: σ(0) ← σ* ║y(0)║ / N
4: repeat
5:     for l = 1, …, λ do
6:         x̃l ← σ(g) 𝒩l(0, 1)
7:         ỹl ← y(g) + x̃l
8:         f̃l ← f(ỹl)
9:     end for
10:    (ỹ1;λ, …, ỹμ;λ) ← sort(ỹ w.r.t. ascending f̃)
11:    y(g+1) ← (1/μ) Σ_{m=1}^{μ} ỹm;λ
12:    σ(g+1) ← σ* ║y(g+1)║ / N
13:    g ← g + 1
14: until termination criterion
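
For concreteness, a minimal NumPy sketch of Algorithm 1 follows. The function and parameter names as well as the stopping test on the residual distance are illustrative assumptions; the paper itself leaves the termination criterion unspecified.

```python
import numpy as np

def mu_mu_lambda_es(f, y_init, mu, lam, sigma_star, g_max=2000, r_stop=1e-3, seed=0):
    """(mu/mu_I, lambda)-ES with constant normalized mutation strength, cf. Algorithm 1."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_init, dtype=float)
    N = y.size
    sigma = sigma_star * np.linalg.norm(y) / N            # Line 3
    for g in range(g_max):
        x = sigma * rng.standard_normal((lam, N))         # Line 6: isotropic mutations
        offspring = y + x                                 # Line 7
        fitness = np.array([f(o) for o in offspring])     # Line 8
        best = np.argsort(fitness)[:mu]                   # Line 10: mu best (ascending f)
        y = offspring[best].mean(axis=0)                  # Line 11: intermediate recombination
        sigma = sigma_star * np.linalg.norm(y) / N        # Line 12
        if np.linalg.norm(y) < r_stop:                    # assumed termination criterion
            break
    return y, g

# Example usage on the Rastrigin function (1) with A = 1, N = 100, R(0) = 10:
A, alpha = 1.0, 2.0 * np.pi
rastrigin = lambda y: np.sum(y**2 + A - A * np.cos(alpha * y))
N = 100
y0 = np.full(N, 10.0 / np.sqrt(N))
y_end, gens = mu_mu_lambda_es(rastrigin, y0, mu=150, lam=300, sigma_star=30.0)
print(gens, np.linalg.norm(y_end))
```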

4. Progress Rate

4.1. Definition

Having introduced the Evolution Strategy, we are interested in the expected one-generation progress of the optimization on the Rastrigin function (1) before investigating the dynamics over multiple generations.

A first order progress rate φi for the i-th component between two generations g → g + 1 can be defined as the expectation value over the positional difference of the parental components

\varphi_i = \mathrm{E}\left[ y_i^{(g)} - y_i^{(g+1)} \,\middle|\, \sigma^{(g)}, \mathbf{y}^{(g)} \right] = y_i^{(g)} - \mathrm{E}\left[ y_i^{(g+1)} \,\middle|\, \sigma^{(g)}, \mathbf{y}^{(g)} \right], (11)

given the mutation strength σ(g) and the position y(g). First, an expression for y(g+1) is needed, see Algorithm 1, Line 11. It is the result of mutation, selection and recombination of the m = 1, …, μ offspring vectors yielding the highest fitness, such that y(g+1) = (1/μ) Σ_{m=1}^{μ} ỹm;λ = (1/μ) Σ_{m=1}^{μ} (y(g) + x)m;λ. Considering the i-th component, noting that y(g) is the same for all offspring and denoting the i-th component of the m-th selected mutation vector by xm;λ, one has

y_i^{(g+1)} = \frac{1}{\mu} \sum_{m=1}^{\mu} \left( y_i^{(g)} + x_{m;\lambda} \right) = y_i^{(g)} + \frac{1}{\mu} \sum_{m=1}^{\mu} x_{m;\lambda}. (12)

Taking the expectation E[yi(g+1)], setting x = σz with z ∼ 𝒩(0, 1), and inserting the expression back into (11) yields

\varphi_i = -\frac{1}{\mu} \, \mathrm{E}\left[ \sum_{m=1}^{\mu} x_{m;\lambda} \,\middle|\, \sigma^{(g)}, \mathbf{y}^{(g)} \right] = -\frac{\sigma}{\mu} \, \mathrm{E}\left[ \sum_{m=1}^{\mu} z_{m;\lambda} \,\middle|\, \sigma^{(g)}, \mathbf{y}^{(g)} \right]. (13)

Therefore progress can be evaluated by averaging over the expectations of μ selected mutation contributions. In principle this task can be solved by deriving the induced order statistic density pm;λ for the m-th best individual and subsequently solving the integration over the i-th component

\varphi_i = -\frac{1}{\mu} \sum_{m=1}^{\mu} \int_{-\infty}^{\infty} x_i \, p_{m;\lambda}\!\left( x_i \,\middle|\, \sigma^{(g)}, \mathbf{y}^{(g)} \right) \mathrm{d}x_i. (14)

However, the task of computing expectations of sums of order statistics under noise disturbance has already been discussed and solved by Arnold in [1]. Therefore the problem of Eq. (13) will be reformulated in order to apply the solutions provided by Arnold.
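
Definition (11) can also be estimated directly by Monte Carlo simulation of a single generation, which is how the one-generation experiments of Sect. 4.3 are obtained. A sketch of such an estimator (an illustration under the Rastrigin setup of Sect. 2; names and default values are assumptions) could look as follows.

```python
import numpy as np

def progress_rate_mc(y, sigma, mu, lam, i, trials=10**4, A=10.0, alpha=2.0*np.pi, seed=0):
    """Monte Carlo estimate of the first order progress rate phi_i, Eq. (11)."""
    rng = np.random.default_rng(seed)
    f = lambda Y: np.sum(Y**2 + A - A * np.cos(alpha * Y), axis=-1)
    phi = 0.0
    for _ in range(trials):
        x = sigma * rng.standard_normal((lam, y.size))   # lambda offspring mutations
        offspring = y + x
        best = np.argsort(f(offspring))[:mu]             # select the mu best (minimization)
        y_next_i = offspring[best, i].mean()             # i-th component of the recombinant
        phi += y[i] - y_next_i                           # one realization of Eq. (11)
    return phi / trials

# Example: y chosen randomly on the sphere surface R = 10, N = 100 (as in Fig. 1)
rng = np.random.default_rng(1)
y = rng.standard_normal(100)
y *= 10.0 / np.linalg.norm(y)
print(progress_rate_mc(y, sigma=0.5, mu=150, lam=300, i=1))
```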

4.2. Expectations of Sums of Noisy Order Statistics

Let z be a random variate with density pz(z) and zero mean. The density is expanded into a Gram-Charlier series by means of its cumulants κi (i ≥ 1) according to [1, p. 138, D.15]

p_z(z) = \frac{1}{\sqrt{2\pi\kappa_2}} \, \mathrm{e}^{-\frac{z^2}{2\kappa_2}} \left( 1 + \frac{\gamma_1}{6} \mathrm{He}_3\!\left( \frac{z}{\sqrt{\kappa_2}} \right) + \frac{\gamma_2}{24} \mathrm{He}_4\!\left( \frac{z}{\sqrt{\kappa_2}} \right) + \cdots \right), (15)

with expectation κ1 = 0, variance κ2, skewness γ1 = κ3/κ2^(3/2), excess γ2 = κ4/κ2^2 (higher order terms not shown) and He_k denoting the k-th order probabilist’s Hermite polynomial. For the problem at hand, see Eq. (13), the mutation variate z ∼ 𝒩(0, 1) with κ2 = 1 and κi = 0 for i ≠ 2, yielding a standard normal density.

Furthermore, let ϵ ∼ 𝒩(0, σϵ^2) model an additive noise disturbance, such that the resulting observed values are v = z + ϵ. Selection of the m-th largest out of λ values yields

v_{m;\lambda} = \left( z + \mathcal{N}(0, \sigma_\epsilon^2) \right)_{m;\lambda}, (16)

and the distribution of selected source terms zm;λ follows a noisy order statistic with density pm;λ. Given this definition and a linear relation between zm;λ and vm;λ the method of Arnold is applicable.

In our case the i-th mutation component xm;λ of Eq. (13) is related to selection via the quality change defined in Eq. (3). Maximizing the fitness Fi(yi + xi) conforms to maximizing quality Qi(xi) with Fi(yi) being a constant offset.

Aiming at an expression of the form (16) and starting with (3), we first isolate the component Qi from the remaining N − 1 components, denoted by Σ_{j≠i} Qj. Then, approximations are applied to both terms, yielding

Q_{\mathbf{y}}(\mathbf{x}) = Q_i(x_i) + \sum_{j \neq i} Q_j(x_j) (17)
\simeq f_i' x_i + \mathcal{N}(E_i, D_i^2), (18)

with linearization (6) applied to Qi(xi). Additionally, Σ_{j≠i} Qj ∼ 𝒩(Ei, Di^2), as the sum of independent random variables asymptotically approaches a normal distribution in the limit N → ∞ due to the Central Limit Theorem. This is ensured by Lyapunov’s condition, provided that there are no dominating components within the sum due to largely different values of yj. The corresponding Rastrigin quality variance Di^2 = Var[Σ_{j≠i} Qj(xj)] is calculated in the supplementary material (https://github.com/omam-evo/paper/blob/main/ppsn22/PPSN22_OB22.pdf). As the expectation Ei = E[Σ_{j≠i} Qj(xj)] is only an offset to Qy(x), it has no influence on the selection and its calculation can be dropped.

Using xi = σzi and fi′ = sgn(fi′)|fi′|, expression (18) is reformulated as

Q_{\mathbf{y}}(\mathbf{x}) = \mathrm{sgn}(f_i') \, |f_i'| \, \sigma z_i + E_i + \mathcal{N}(0, D_i^2) (19)
\frac{Q_{\mathbf{y}}(\mathbf{x}) - E_i}{|f_i'| \, \sigma} = \mathrm{sgn}(f_i') \, z_i + \mathcal{N}\!\left( 0, \frac{D_i^2}{(f_i' \sigma)^2} \right). (20)

The decomposition using sign function and absolute value is needed for correct ordering of selected values w.r.t. zi in (20).

Given result (20), one can define the linearly transformed quality measure vi := (Qy(x) − Ei)/(|fi′|σ) and the noise variance σϵ^2 := (Di/(fi′σ))^2, such that the selection of the mutation component sgn(fi′)zi is disturbed by a noise term due to the remaining N − 1 components. A relation of the form (16) is obtained up to the sign function.

In [1] Arnold calculated the expected value of arbitrary sums SP of products of noisy ordered variates containing ν factors per summand

S_P = \sum_{\{n_1, \ldots, n_\nu\}} z_{n_1;\lambda}^{p_1} \cdots z_{n_\nu;\lambda}^{p_\nu}, (21)

with the random variate z introduced in Eqs. (15) and (16). The vector P = (p1, …, pν) denotes the positive exponents and the distinct summation indices are denoted by the set {n1, …, nν}. The generic result for the expectation of (21) is provided in [1, p. 142, D.28] and was adapted to account for the sign difference between (16) and (20), which results in a possibly exchanged ordering. Performing simple substitutions in Arnold’s calculations in [1] and recalling that in our case γ1 = γ2 = 0, the expected value yields

\mathrm{E}[S_P] = \left( \mathrm{sgn}(f_i') \sqrt{\kappa_2} \right)^{\lVert \mathbf{P} \rVert_1} \frac{\mu!}{(\mu - \nu)!} \sum_{n=0}^{\nu} \sum_{k \geq 0} \zeta_{n,0}^{(\mathbf{P})}(k) \, h_{\mu,\lambda}^{\nu - n, k}. (22)

Note that expression (22) deviates from Arnold’s formula only in the sign in front of κ2. The coefficients ζn,0(P)(k) are defined in terms of a noise coefficient a according to

a = \sqrt{\frac{\kappa_2}{\kappa_2 + \sigma_\epsilon^2}} \quad \text{with} \quad \zeta_{n,0}^{(\mathbf{P})}(k) = \mathrm{Polynomial}(a), (23)

for which tabulated results are presented in [1, p. 141]. The coefficients hμ,λi,k are numerically obtainable solving

hμ,λi,k=λμ2π(λμ)Hek(x)e12x2[ϕ(x)]i[Φ(x)]λμ1[1Φ(x)]μidx. (24)

Now we are in a position to calculate expectation (13) using (22). Since z ∼ 𝒩(0, 1), it holds that κ2 = 1. Identifying P = (1), ║P║1 = 1 and ν = 1 yields

\mathrm{E}\left[ \sum_{m=1}^{\mu} z_{m;\lambda} \right] = \mathrm{sgn}(f_i') \, \frac{\mu!}{(\mu - 1)!} \sum_{n=0}^{1} \sum_{k \geq 0} \zeta_{n,0}^{(1)}(k) \, h_{\mu,\lambda}^{1-n,k} = \mathrm{sgn}(f_i') \, \mu \, \zeta_{0,0}^{(1)}(0) \, h_{\mu,\lambda}^{1,0} = \mathrm{sgn}(f_i') \, \mu \, a \, c_{\mu/\mu,\lambda}, (25)

with ζ1,0^(1)(k) = 0 for any k, and ζ0,0^(1)(k) ≠ 0 only for k = 0, yielding a. The expression hμ,λ^(1,0) is equivalent to the progress coefficient definition cμ/μ,λ [2, p. 216]. Inserting (25) back into (13), using a = 1/√(1 + (Di/(fi′σ))^2) = |fi′|σ/√((fi′σ)^2 + Di^2) with the requirement a > 0, and noting that fi′ = sgn(fi′)|fi′|, one finally obtains the first order progress rate for the i-th component

\varphi_i(\sigma, \mathbf{y}) = -\, c_{\mu/\mu,\lambda} \, \frac{f_i'(y_i) \, \sigma^2}{\sqrt{ \left( f_i'(y_i) \, \sigma \right)^2 + D_i^2\!\left( \sigma, (\mathbf{y})_{j \neq i} \right) }}. (26)

The population dependency is given by the progress coefficient cμ/μ,λ. The fitness-dependent parameters are contained in fi′, see (7), and in Di^2, calculated in the supplementary material (https://github.com/omam-evo/paper/blob/main/ppsn22/PPSN22_OB22.pdf). For better readability, the derivative fi′ and the variance Di^2 are not inserted into (26). An exemplary evaluation of Di^2 as a function of the residual distance R using normalization (10) is also shown in the supplementary material.
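
A numerical sketch of evaluating (26) is given below. Since the closed-form expression for Di^2 is only derived in the supplementary material, the variance of Σ_{j≠i} Qj(xj) is estimated here by sampling Eq. (4), and cμ/μ,λ is estimated as the expected average of the μ largest of λ standard normal variates; both are stand-ins for the exact expressions, and all names and defaults are illustrative assumptions.

```python
import numpy as np

A, alpha = 10.0, 2.0 * np.pi

def c_mu_lambda(mu, lam, samples=10**4, seed=0):
    """Monte Carlo estimate of the progress coefficient c_{mu/mu,lambda}."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((samples, lam))
    return np.sort(z, axis=1)[:, -mu:].mean()            # mean of the mu largest of lambda

def f_prime(y_i):
    """Derivative term f_i' = k_i + d_i, Eq. (7)."""
    return -2.0 * y_i - alpha * A * np.sin(alpha * y_i)

def D2_sampled(y, sigma, i, samples=10**4, seed=1):
    """Sampled variance of sum_{j != i} Q_j(x_j); stand-in for the closed-form Di^2."""
    rng = np.random.default_rng(seed)
    yj = np.delete(y, i)
    x = sigma * rng.standard_normal((samples, yj.size))
    Qj = -(x**2 + 2.0 * yj * x
           + A * np.cos(alpha * yj) * (1.0 - np.cos(alpha * x))
           + A * np.sin(alpha * yj) * np.sin(alpha * x))  # per-component quality, Eq. (4)
    return Qj.sum(axis=1).var()

def phi_i(y, sigma, i, mu, lam):
    """First order progress rate approximation, Eq. (26)."""
    fp = f_prime(y[i])
    return -c_mu_lambda(mu, lam) * fp * sigma**2 / np.sqrt((fp * sigma)**2 + D2_sampled(y, sigma, i))

# Example: the setting of Fig. 1 (N = 100, R = 10, A = 10)
rng = np.random.default_rng(2)
y = rng.standard_normal(100)
y *= 10.0 / np.linalg.norm(y)
print(phi_i(y, sigma=0.5, i=1, mu=150, lam=300))
```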

4.3. Comparison of Simulation and Approximation

Figure 1 shows an experimentally obtained progress rate compared to the result of (26). Due to the large N, one exemplary φi-graph is shown on the left, and the corresponding errors for i = 1, …, N are shown on the right.

Fig. 1. One-generation experiments with the (150/150, 300)-ES, N = 100, A = 10 are performed and quantity (11) is measured by averaging over 10^5 runs. Left: φi over σ for i = 2 at position y2 ≈ 1.19, where y was chosen randomly such that ║y║ = R = 10. Right: error measure φi − φi,sim between (26) and simulation for i = 1, …, N, evaluated at σ = {0.1, 1}. The colors are set according to the legend. (Color figure online)

The left plot shows the progress rate over the σ-range [0, 1]. This magnitude was chosen in order to study the oscillation, since the frequency is α = 2π. The initial position was chosen randomly on the sphere surface with R = 10.

The red dashed curve uses fi′ as the linearization, while the blue dash-dotted curve assumes fi′ = ki (with di = 0), see also (7). As fi′ approximates the quality change only locally, agreement with the measured progress is given only for very small mutations σ. For larger σ very large deviations may occur, depending on the local derivative.

The blue curve φi(ki) neglects the oscillation (di = 0) and therefore follows the progress of the quadratic function f(y) = Σi yi^2 for large σ with very good agreement. Due to the linearized form of Qi(xi) in (6), neither approximation can reproduce the oscillation for moderately large σ.

To verify the approximation quality, the error between (26) and the simulation is displayed on the right side of Fig. 1 for all i = 1, …, N. This was done for small σ = 0.1 and large σ = 1. The deviations are very similar in magnitude for all i, given randomly chosen yi. Note that for σ = 1 the red points show very large errors compared to the blue ones, which was expected.

Figure 2 shows the progress rate φi over σ*, for i = 2 as in Fig. 1, with y chosen randomly on sphere surfaces of radii R = {100, 10, 1, 0.1}. Using σ*, the mutation strength σ is normalized by the residual distance R with the spherical normalization (10). Far from the origin, with R = {100, 10}, the quadratic terms dominate, giving better results for φi(ki). Reaching R = 1, local minima become more relevant and mixed results are obtained, with φi(fi′) better for smaller σ* and φi(ki) better for larger σ*. Within the global attractor, R = 0.1, the local structure dominates and φi(fi′) yields better results. These observations will be relevant when analyzing the dynamics in Fig. 3, where both approximations show strengths and weaknesses.

Fig. 2. One-generation progress φi (i = 2) over normalized mutation σ* for the (150/150, 300)-ES, N = 100, A = 1 and R = {100, 10, 1, 0.1}. Simulations are averaged over 10^5 runs. These experiments are preliminary investigations related to the dynamics shown in Fig. 3 with σ* = 30. Given a constant σ*, the approximation quality varies over different magnitudes of R.

Fig. 3. Comparison of the average of 100 optimization runs of Algorithm 1 (black, solid) with the iterated dynamics from Eq. (27) under constant σ* = 30 for A = 1 and N = 100. Large population sizes are chosen to ensure global convergence (left: μ = 150; right: μ = 1500; constant μ/λ = 0.5). The iteration using progress (26) is performed for both fi′ = ki + di (red/orange dashed) and fi′(di = 0) = ki (blue dash-dotted) using Equations (27) and (28). The orange dashed iteration was initialized with R(0) = 0.1 and translated to the corresponding position of the simulation for easier comparison. The evaluation of the quality variance Di^2(R) is shown in the supplementary material (https://github.com/omam-evo/paper/blob/main/ppsn22/PPSN22_OB22.pdf). (Color figure online)

5. Evolution Dynamics

As we are interested in the dynamical behavior of the ES, averaged real optimization runs of Algorithm 1 will be compared to the iterated dynamics using the progress result (26) by applying the dynamical systems approach [2]. Neglecting fluctuations, i.e., setting yi(g+1) = E[yi(g+1) | σ(g), y(g)], the mean value dynamics of the mapping yi(g) → yi(g+1) immediately follows from (11), giving

y_i^{(g+1)} = y_i^{(g)} - \varphi_i\!\left( \sigma^{(g)}, \mathbf{y}^{(g)} \right). (27)

The control scheme of σ(g) was introduced in Eq. (10) and yields simply

\sigma^{(g)} = \sigma^{*} \lVert \mathbf{y}^{(g)} \rVert / N. (28)

Equations (27) and (28) describe a deterministic iteration in search space and a rescaling of the mutation strength according to the residual distance. For a convergence analysis, we are interested in the dynamics of R(g) = ║y(g)║ rather than the actual position values y(g). Hence, Fig. 3 shows the R(g)-dynamics of the conducted experiments.
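
A deterministic iteration of (27) and (28) can be sketched as follows; here phi denotes any evaluation of the progress rate (26), for instance the sampled version above, and the stopping threshold is an illustrative assumption. Plotting the returned residual distances over g yields R(g)-curves of the kind compared in Fig. 3.

```python
import numpy as np

def iterate_dynamics(phi, y0, sigma_star, g_max=500, r_stop=1e-2):
    """Mean value dynamics: y_i <- y_i - phi_i per Eq. (27), sigma per Eq. (28)."""
    y = np.asarray(y0, dtype=float)
    N = y.size
    R_history = [np.linalg.norm(y)]
    for g in range(g_max):
        sigma = sigma_star * np.linalg.norm(y) / N                  # Eq. (28)
        y = y - np.array([phi(y, sigma, i) for i in range(N)])      # Eq. (27)
        R_history.append(np.linalg.norm(y))
        if R_history[-1] < r_stop:                                  # illustrative stop
            break
    return np.array(R_history)

# Example (using phi_i from the previous sketch, sigma* = 30 as in Fig. 3):
# R = iterate_dynamics(lambda y, s, i: phi_i(y, s, i, mu=150, lam=300),
#                      y0=np.full(100, 10.0), sigma_star=30.0)
```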

In Fig. 3, all runs of Algorithm 1 exhibit global convergence, with the black line showing the average. The left and right plots differ by population size. Iteration φi(ki), the blue dash-dotted curve, also converges globally, though very slowly, and is therefore not shown entirely. The convergence behavior of iteration φi(fi′), the red and orange dashed curves, strongly depends on the initialization and is discussed below.

Three phases can be observed for the simulation. It shows linear convergence at first, followed by a slow-down due to local attractors. After reaching the global attractor, the convergence speed increases again. Iteration φi(ki) is able to model the first two phases to some degree. Within the global attractor the slope information di is missing, such that the progress is largely underestimated.

Iteration φi(fi′) converges at first, but reaches a stationary state with Rst ≈ 20 when the progress φi becomes dominated by the derivative term di. Starting from R(0) = 10^2, the stationary values yi^st are either fixed or alternate between coordinates, depending on σ, Di, ki, and di. This effect is due to the attraction of local minima and due to the deterministic iteration disregarding fluctuations. It also occurs for varying initial positions. Initialized at R(0) = 10^-1, the orange iteration φi(fi′) converges globally.

It turns out that the splitting point of the two approximations in Fig. 3 occurs at a distance R to the global optimizer where the ES approaches the attractor region of the “first” local minima. For the model parameters considered in the experiment this is at about R ≈ 28.2, the distance of the farthest local minimizer from the global optimizer (obtained by numerical analysis).

The plots in Fig. 3 differ by population size. The convergence speeds, i.e. the slopes, show better agreement for large populations, which can be attributed to the fluctuations neglected in (27). Investigations on the unimodal sphere [2] and ellipsoid [4] functions have shown that progress is decreased by fluctuations due to a loss term scaling with 1/μ, which agrees with Fig. 3. On the left the iterated progress is faster due to neglected but present fluctuations, while on the right better agreement is observed because the fluctuations are insignificant. These observations will be investigated in future research.

6. Summary and Outlook

A first order progress rate φi on the Rastrigin function (1) was derived for the (μ/μI, λ)-ES in (26) by means of noisy order statistics. To this end, the mutation-induced variance of the quality change Di^2 is needed. Starting from (4), a derivation yielding Di^2 has been presented in the supplementary material. Furthermore, the approximation quality of φi was investigated using the Rastrigin and quadratic derivatives fi′ and ki, respectively, by comparing with one-generation experiments.

Linearization fi′ shows good agreement for small-scale mutations, but very large deviations for large mutations. Conversely, linearization ki yields significantly better results for large mutations, as the quadratic fitness term dominates there. A progress rate modeling the transition between the regimes is yet to be determined. First numerical investigations of (14) including all terms of (4) indicate that nonlinear terms are needed for a better progress rate model, which is an open challenge and part of future research.

The obtained progress rate was used to investigate the dynamics by iterating (27) using (28) and comparing with ES runs. Iteration via fi′ converges globally only if initialized close to the optimizer, since local attraction strongly dominates. The dynamics via ki converges globally independent of the initialization, but the observed rate matches only in the initial phase and for very large populations. This confirms the need for a higher order progress rate modeling the effect of fluctuations, especially when function evaluations are expensive and small populations must be used. Additionally, an advanced progress rate formula is needed combining the effects of global and local attraction to model all three phases of the dynamics correctly.

The investigations done so far are a first step towards a full dynamical analysis of the ES on the multimodal Rastrigin function. Future investigations must also include the complete dynamical modeling of the mutation strength control. One aim is the tuning of mutation control parameters such that the global convergence probability is increased while still maintaining search efficiency. Our final goal will be the theoretical analysis of the full evolutionary process yielding also recommendations regarding the choice of the minimal population size needed to converge to the global optimizer with high probability.

Acknowledgments

This work was supported by the Austrian Science Fund (FWF) under grant P33702-N. Special thanks goes to Lisa Schönenberger for providing valuable feedback and helpful discussions.

References

  • 1. Arnold D. Noisy Optimization with Evolution Strategies. Kluwer Academic Publishers; Dordrecht: 2002.
  • 2. Beyer HG. The Theory of Evolution Strategies. Natural Computing Series. Springer; Heidelberg: 2001.
  • 3. Beyer HG. Convergence analysis of evolutionary algorithms that are based on the paradigm of information geometry. Evol Comput. 2014;22(4):679–709. doi: 10.1162/EVCO_a_00132.
  • 4. Beyer HG, Melkozerov A. The dynamics of self-adaptive multi-recombinant evolution strategies on the general ellipsoid model. IEEE Trans Evol Comput. 2014;18(5):764–778. doi: 10.1109/TEVC.2013.2283968.
  • 5. Beyer HG, Sendhoff B. Simplify your covariance matrix adaptation evolution strategy. IEEE Trans Evol Comput. 2017;21(5):746–759. doi: 10.1109/TEVC.2017.2680320.
  • 6. Glasmachers T, Schaul T, Sun Y, Wierstra D, Schmidhuber J. Exponential natural evolution strategies. In: Branke J, et al., editors. GECCO 2010: Proceedings of the Genetic and Evolutionary Computation Conference; New York. 2010. pp. 393–400.
  • 7. Hansen N, Kern S. Evaluating the CMA evolution strategy on multimodal test functions. In: Yao X, et al., editors. PPSN 2004. LNCS, Vol. 3242. Springer; Heidelberg: 2004. pp. 282–291.
  • 8. Hansen N, Müller S, Koumoutsakos P. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evol Comput. 2003;11(1):1–18. doi: 10.1162/106365603321828970.
  • 9. Mobahi H, Fisher J. A theoretical analysis of optimization by Gaussian continuation. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. AAAI Press; 2015. pp. 1205–1211.
  • 10. Müller N, Glasmachers T. Non-local optimization: imposing structure on optimization problems by relaxation. In: Foundations of Genetic Algorithms. Vol. 16. ACM; 2021. pp. 1–10.
  • 11. Ollivier Y, Arnold L, Auger A, Hansen N. Information-geometric optimization algorithms: a unifying picture via invariance principles. J Mach Learn Res. 2017;18(18):1–65.
  • 12. Rechenberg I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag; Stuttgart: 1973.
  • 13. Schwefel HP. Numerical Optimization of Computer Models. Wiley; Chichester: 1981.
  • 14. Zhang J, Bi S, Zhang G. A directional Gaussian smoothing optimization method for computational inverse design in nanophotonics. Mater Des. 2021;197:109213. doi: 10.1016/j.matdes.2020.109213.
