Abstract
Evolutionary polynomial regression (EPR) is a data mining tool that has been widely used to solve various geotechnical engineering problems. The fitness function is the core of EPR. However, overfitting may still occur in EPR, causing the model to perform poorly on the testing dataset. Improving the EPR fitness function through L1 and L2 regularization methods is therefore critical to avoid overfitting and ensure good generalization. First, the appropriate values of the regularization parameter λ of the L1 regularization method (L1RM) and the L2 regularization method (L2RM) are determined by comparing results on the testing datasets. Then, EPR with the classical fitness function is compared with EPR combined with L1RM or L2RM to evaluate their abilities to develop regressions and produce accurate predictions. The results show that a fitness function combined with a regularization method can improve EPR, and that L1RM performs better in prediction than L2RM. EPR improved by L1RM could be used to construct constitutive models or to make predictions in geotechnical engineering.
1. Introduction
Evolutionary polynomial regression (EPR) is a numerical regression method that was first developed in 2006 [1]. Recently, EPR has been successful in modeling nonlinear complex problems. As presented in [2], EPR uses the genetic algorithm (GA) [3] to construct the general functional structure from elementary functions, while the parameters aj are estimated by the least squares method (LSM). In 2009, Javadi and Rezania [4, 5] were the first to use EPR to solve problems associated with geomaterial constitutive models. Ahangar-Asr et al. [6] presented a new approach using EPR in 2011 for predicting permeability, maximum dry density, and optimum moisture content as functions of some physical soil properties. Faramarzi et al. [7] developed material models for soils based on EPR, in which volumetric behavior was taken into account. Javadi et al. [8] proposed a new approach to implementing the finite element method using EPR for modeling soil behavior under cyclic loading. Cuisinier et al. [9] analyzed the impact of high-pH fluid on the mechanical behavior of backfill material for deep galleries via the EPR model.
Besides constructing geomaterial constitutive models, the EPR model is also used as a regression model to find the best fit to observed data and to solve geotechnical problems. Ahangar-Asr et al. [10] analyzed the stability of soil and rock slopes with EPR, and the results showed that the proposed approach was robust in modeling slope behavior and provided a unified approach for the analysis of slope stability problems. Doglioni et al. [11] introduced EPRMODA to model the relationship between the expected displacements, the past measured values of displacement, and past cumulative rainfall values. Wu et al. [12] combined EPR and a real-coded genetic algorithm to correlate the compression index with the physical properties of clay. Yin et al. [13] proposed a new approach for evaluating the compressibility of remolded clay using EPR and optimization methods. Rezania et al. [14] introduced a new approach using EPR for determining the liquefaction potential of sands. Rezania et al. [15] developed another model based on EPR for evaluating the potential of liquefaction and lateral spreading. Balf et al. [16] used EPR in one-dimensional water quality models to improve the simulation of solute concentration in natural rivers. EPR is also effective in geotechnics for predicting soil saturated water content [17], evaluating pile-bearing capacity [18, 19], and describing soil behaviors [20]. Overall, EPR has proved more robust in solving geotechnical problems than other soft computing methods, such as artificial neural networks (ANNs) [21, 22] or genetic programming (GP) [23].
However, in most current EPR models, the fitness function measures only the training error, such as the sum of squared errors (SSE) or the coefficient of determination. As a result, the obtained models tend to overfit and generalize poorly [24]. Thus, the EPR technique must be improved to avoid overfitting and achieve good generalization ability. Nikolaev and Iba [25] presented a regularization approach for inductive genetic programming tuned for learning polynomials. Montaña et al. [26] presented a regularization method for controlling program complexity in linear genetic programming tuned for transcendental elementary functions. Their goal was to improve the performance of evolutionary methods on symbolic regression tasks. Jin et al. [24] proposed a novel EPR procedure using a newly enhanced differential evolution (DE) algorithm. Marasco et al. [27] presented a robust and well-conditioned EPR technique that removes potential outliers, leverage points, and biased data. Roozbeh, Babaie-Kafaki, and Aminifard [28, 29] used different techniques to improve modern regression methods.
This study proposes an improved EPR method that enhances the fitness function through regularization. First, we determine the appropriate value of the regularization parameter (λ) for the L1 regularization method (L1RM) and the L2 regularization method (L2RM) by comparing results on the testing datasets. Then, we compare EPR with the classical fitness function against EPR combined with L1RM or L2RM to test their regression and prediction performance. Finally, we analyze the results to determine the level of improvement achieved by the enhanced fitness functions.
2. Classical EPR and regularization methods
2.1 Overview of classical EPR methods
EPR is a numerical regression method based on evolutionary theory aimed at identifying the best polynomial expressions between input and output data. The classical form of the EPR is presented in (1) [1]:
$$Y_{N \times 1}(\theta, Z) = Z_{N \times d} \times \theta_{d \times 1} \tag{1}$$
where Y_{N×1}(θ, Z) is the least squares estimate of the vector of N target values; θ_{d×1} is the vector of the d = m + 1 parameters aj (j = 1, …, m, with m the number of terms) and a0; and Z_{N×d} is a matrix formed by I, a unitary column vector for the bias a0, together with m column vectors of variables Zj. Each Zj is a product of the independent predictor variables (inputs) X = [X1 X2 ⋯ Xk], where k is the number of independent variables.
The matrix of inputs X is given as follows:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nk} \end{bmatrix} = \begin{bmatrix} X_1 & X_2 & \cdots & X_k \end{bmatrix} \tag{2}$$
where the k columns of X contain the candidate input variables for the terms in (1). The j-th term in (1) is therefore written as:
$$Z_j = (X_1)^{ES(j,1)} \cdot (X_2)^{ES(j,2)} \cdots (X_k)^{ES(j,k)} \tag{3}$$
where Zj is the j-th column vector, whose elements are products of the candidate independent inputs, and ES is the matrix of exponents. Eq (1) is then expanded as follows:
$$\hat{y}_i = a_0 + \sum_{j=1}^{m} a_j \cdot (x_{i1})^{ES(j,1)} \cdot (x_{i2})^{ES(j,2)} \cdots (x_{ik})^{ES(j,k)} \tag{4}$$
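To make the structure of (1)–(4) concrete, the following minimal Python/NumPy sketch builds the term matrix Z from a candidate exponent matrix ES and fits the coefficients θ by least squares. The function names (build_terms, fit_coefficients) are hypothetical illustrations, not from the original EPR implementation:

```python
import numpy as np

def build_terms(X, ES):
    """Build the term matrix Z of Eq (3): column j+1 is the product of the
    inputs X_1..X_k raised to the exponents in row j of ES; column 0 is the
    unitary bias column for a0. X has shape (N, k), ES has shape (m, k)."""
    N, m = X.shape[0], ES.shape[0]
    Z = np.ones((N, m + 1))
    for j in range(m):
        # Negative exponents require nonzero inputs, as in Eq (11) below.
        Z[:, j + 1] = np.prod(X ** ES[j], axis=1)
    return Z

def fit_coefficients(Z, y):
    """Least squares estimate of theta = [a0, a1, ..., am] in Eq (1)."""
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return theta
```

For example, with k = 3 inputs the ES row [1, -1, 0] produces the term X1/X2, i.e., the term Z1 used in the benchmark dataset of Section 3.2.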
The key idea of EPR is to start from (1) and first use the GA to search for the best form of the function, i.e., a combination of the vectors of independent variables (inputs) Z_{N×d}. The next step is to determine the adjustable parameters θ (the vector of aj) for each combination of inputs using least squares regression. For the GA search, Giustolisi and Savic [1] proposed the penalization-of-complexity (PCS) fitness function to trade off the quality of the SSE fit against the model complexity (number of input combinations), as shown below:
$$PCS = \frac{SSE}{\left( \dfrac{N_d - p_x + 1}{N_d} \right)^{a}} \tag{5}$$
with
$$SSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \tag{6}$$
where Nd = k · m is the maximum number of inputs that can be considered, px is the actual number of inputs selected by the GA, and a is an adjustable exponent that controls the degree of pressure on complexity. Moreover, ŷi is the predicted value computed by the polynomial expression obtained by EPR, and yi is the target value in the training dataset. In (5), the complexity of the regression model is controlled by penalizing the total number of inputs. However, the risk of model overfitting remains.
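As a minimal sketch, assuming the reconstructed form of Eq (5) above (to be minimized by the GA), the PCS fitness can be written as follows; the divisor approaches 1 for simple models and shrinks as more inputs are used, inflating (worsening) the fitness of complex models:

```python
import numpy as np

def sse(y, y_hat):
    """Sum of squared errors of Eq (6), averaged over the N data points."""
    return np.mean((y - y_hat) ** 2)

def pcs_fitness(y, y_hat, Nd, px, a=1.0):
    """Penalization-of-complexity (PCS) fitness, reconstructed Eq (5):
    the same SSE scores worse when more inputs (larger px) are used."""
    return sse(y, y_hat) / (((Nd - px + 1) / Nd) ** a)
```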
2.2 Regularization method
Regularization/penalty functions are common practice in machine learning and are generally applied in GA [30]. L1 regularization and L2 regularization are the most commonly used regularization techniques for addressing overfitting, performing feature selection, and creating a less complex (parsimonious) model when a dataset has numerous features.
L1 regularization, also known as the L1 norm or Lasso regression, adds the absolute values of the coefficients as a penalty term to the loss function [31] (analogous to the fitness function in GA), as shown below:
$$L_{L1} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{i} \left| \theta_i \right| \tag{7}$$
where ŷi is the predicted value, yi is the actual value, λ is the regularization parameter, and θi is a weight/coefficient of the model.
L2 regularization, also known as the L2 norm or Ridge regression, combats overfitting by forcing the weights/coefficients to be small, although not exactly 0. It adds the sum of squares of all the feature weights as a penalty term to the loss function:
$$L_{L2} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{i} \theta_i^2 \tag{8}$$
The penalty term is the key difference between the two regularization methods. L1 regularization penalizes the sum of the absolute values of the coefficients, whereas L2 regularization penalizes the sum of their squares. As a result, L1 regularization shrinks the coefficients of the less important features to 0, removing some features altogether. In other words, L1 regularization works effectively for feature selection when there are many features. L2 regularization does not perform feature selection, since coefficients are only reduced to values near 0. In either case, the robustness and generalization ability of the regression model are improved.
The value of the regularization parameter λ is vital to the regularization method. As shown in (7) and (8), if λ is 0, the loss function reduces to the mean squared error (MSE); if λ is too large, the coefficients are driven to 0 and the regression model underfits. The appropriate value of λ is therefore determined first, through comparison and discussion.
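In code, the two regularized losses (7) and (8) differ only in the penalty term; a minimal sketch, with theta holding the model coefficients:

```python
import numpy as np

def l1_loss(y, y_hat, theta, lam):
    """Eq (7): MSE plus lambda times the sum of absolute coefficients (Lasso-style)."""
    return np.mean((y - y_hat) ** 2) + lam * np.sum(np.abs(theta))

def l2_loss(y, y_hat, theta, lam):
    """Eq (8): MSE plus lambda times the sum of squared coefficients (Ridge-style)."""
    return np.mean((y - y_hat) ** 2) + lam * np.sum(theta ** 2)
```

With lam = 0, both reduce to the MSE; a very large lam drives the coefficients toward 0 and the model underfits, as noted above.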
3. EPR improved by regularization methods
3.1. New fitness function combined with L1RM and L2RM
The fitness function is the core of EPR because it guides the GA toward the best exponent matrix, which determines the structure of the polynomial regression model. Once the structure is determined, the regression model is obtained uniquely through the LSM. Therefore, the performance of EPR depends on the fitness function, as shown in (5). The classical fitness function (PCS) of EPR uses the term (Nd − px + 1) to penalize the complexity of the regression model by reducing the actual number of inputs (the actual number of polynomial terms). In contrast, the improved fitness functions combined with L1RM or L2RM penalize the complexity of the regression model by reducing the values of the coefficients to address overfitting (as mentioned in Subsection 2.2). The two new fitness functions combined with L1RM and L2RM are expressed as follows:
$$F_{L1RM} = SSE + \lambda \sum_{i} \left| \theta_i \right| \tag{9}$$

$$F_{L2RM} = SSE + \lambda \sum_{i} \theta_i^2 \tag{10}$$
where N is the number of data points, SSE is the sum of squared errors calculated by (6), λ is the regularization parameter, and θi is a coefficient of the regression model.
When EPR runs, a random exponent matrix is first generated, and a trial EPR model is obtained through the LSM. The fitness function is then evaluated to determine whether the EPR model is optimal. The goal in this process is to make the error of the model as small as possible while accurately predicting new samples; minimizing the training error alone can still cause overfitting. A regularization method such as L1RM can solve this problem: L1RM controls overfitting through a sparse coefficient vector, adding the sum of the absolute values of the coefficients to the fitness function. The evaluation of one candidate model is sketched below.
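A minimal sketch of this evaluation step, reusing the hypothetical build_terms, fit_coefficients, and sse helpers sketched in Section 2 and the improved fitness of Eq (9):

```python
import numpy as np

def evaluate_candidate(X, y, ES, lam):
    """Score one candidate exponent matrix: build the terms, fit the
    coefficients by the LSM, then apply the L1-regularized fitness of
    Eq (9). The GA would minimize this value over candidate ES matrices."""
    Z = build_terms(X, ES)            # term matrix from Eq (3)
    theta = fit_coefficients(Z, y)    # LSM estimate of [a0, ..., am]
    y_hat = Z @ theta
    return sse(y, y_hat) + lam * np.sum(np.abs(theta))
```

For the L2RM fitness of Eq (10), the penalty is replaced by lam * np.sum(theta ** 2).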
3.2 Discussion of regularization parameters
The regularization parameter λ is vital to the regularization method, so it is discussed first. A testing dataset of 100 points (dataset 1) was generated by (11) [1]:
$$Y = 10 + \frac{X_1}{X_2} + \frac{X_2}{X_3} + \frac{X_3}{X_1} + N(0, \sigma) \tag{11}$$
where Y plays the role of the output; X1, X2, and X3 are three random variables (uniformly distributed in the range [0, 1]) that play the role of inputs; and N(0, σ) denotes normally distributed random numbers with zero mean and standard deviation σ, which play the role of the error term. Here, σ = 0.5. Some statistics of Y and the terms Zi (Z1 = X1/X2, Z2 = X2/X3, Z3 = X3/X1) are shown in Table 1. The outputs Y with errors and the values of the errors are shown in Figs 1 and 2; a sketch of the data generation appears after Fig 2.
Table 1. Statistics of output Y and terms Zi (dataset 1).
| | Y | Z1 | Z2 | Z3 |
|---|---|---|---|---|
| Maximum | 97.530 | 19.783 | 86.821 | 30.689 |
| Minimum | 13.012 | 0.028 | 0.044 | 0.017 |
| Mean | 18.389 | 2.032 | 3.220 | 3.137 |
| Variance | 104.210 | 10.492 | 88.134 | 35.538 |
Fig 1. Outputs Y with errors in dataset 1.
Fig 2. Values of errors in dataset 1.
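For reference, dataset 1 can be reproduced along the following lines, assuming the reconstructed form of Eq (11); the seed is arbitrary, so the statistics will only approximate Table 1:

```python
import numpy as np

rng = np.random.default_rng(0)                # arbitrary seed, an assumption
X = rng.uniform(0.0, 1.0, size=(100, 3))      # X1, X2, X3 ~ U[0, 1]
X1, X2, X3 = X[:, 0], X[:, 1], X[:, 2]
noise = rng.normal(0.0, 0.5, size=100)        # N(0, sigma) with sigma = 0.5
Y = 10 + X1 / X2 + X2 / X3 + X3 / X1 + noise  # Eq (11), as reconstructed above
```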
The exponents were constrained to integers in the range [−2, 2], and the number of terms m was constrained to [2, 8]. For the GA, the population size was set to 80, the crossover probability to 0.4, and the mutation probability to 0.1. The coefficient of determination (R2) was selected to evaluate the performance of the obtained model; R2 values closer to 1 indicate better performance.
A series of calculations with different values of λ (λ = 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100) was conducted to evaluate how λ affects EPR combined with L1RM and L2RM. The range of λ was determined from the mean value of the output Y: the maximum value was 10 times the order of magnitude of Y, and the minimum value was 10^−7 times the order of magnitude of Y. Figs 3 and 4 show the results (R2) for the different values of λ in L1RM and L2RM as the number of terms (denoted "m" in the figures) ranges from 3 to 8.
Fig 3. Comparison of EPR results (R2) for different values of λ with L1RM (dataset 1).
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Fig 4. Comparison of EPR results (R2) for different values of λ with L2RM (dataset 1).
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
As shown in Figs 3 and 4, R2 increases as λ of L1RM and L2RM decreases from 100 to 0.000001. In most cases, once λ decreases to 0.1, R2 reaches its maximum and remains stable. The appropriate value of λ is the largest one for which R2 is still large (close to 1), which gives the trade-off between performance and generalization. Moreover, the performance of EPR does not improve significantly as the number of terms increases, because of the simple data structure and small errors. This selection rule is sketched below; a more complex dataset is then created in Subsection 3.3 to simulate the elastic constitutive model of geotechnical engineering and to test the EPR performance with the regularization methods.
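The rule just stated (take the largest λ whose R2 stays near its maximum) can be sketched as follows; r2_for is a hypothetical callable that trains an EPR model for a given λ and returns its test R2, and the 0.01 tolerance is an assumption, not a value from the paper:

```python
def pick_lambda(lambdas, r2_for, tol=0.01):
    """Return the largest lambda whose R2 is within `tol` of the best R2."""
    scores = {lam: r2_for(lam) for lam in lambdas}
    best = max(scores.values())
    return max(lam for lam, r2 in scores.items() if r2 >= best - tol)

# The candidate values used above for dataset 1:
lambdas = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
```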
3.3. Performance of EPR with L1RM and L2RM
3.3.1. Determination of the appropriate value of λ
In this section, a testing dataset of 100 points (dataset 2) was generated by the elastic constitutive Eq (12) of geotechnical engineering. To test the performance of EPR with L1RM and L2RM, the values of the parameters and variables in (12) were randomly generated in specified intervals similar to engineering values, as shown in Table 2. The appropriate value of λ for this elastic dataset must be determined separately because of its different orders of magnitude and greater complexity. The value of λ ranges from 1 to 1×10^8 in steps of ten times, following the determination rules in Subsection 3.2; the other EPR parameters are the same as those in Subsection 3.2, and the coefficient of determination (R2) was again selected to evaluate the performance of the obtained model. Larger errors were used, as shown in Figs 5 and 6.
$$\sigma_x = \frac{E}{1 - \mu^2} \left( \varepsilon_x + \mu \, \varepsilon_y \right) + N(0, \sigma) \tag{12}$$
where σx is the stress in the x-axis direction; E is the elastic modulus; εx and εy are the strains in the x-axis and y-axis directions, respectively; μ is Poisson's ratio; and N(0, σ) denotes normally distributed random numbers with zero mean and standard deviation σ, which play the role of the error term.
Table 2. Statistics of the parameters and variables in dataset 2.
| | σx | E | εx | εy | μ |
|---|---|---|---|---|---|
| | (×10^7 N/m²) | (×10^8 N/m²) | | | |
| Maximum | 3.679 | 3.988 | 0.0995 | 0.0097 | 0.396 |
| Minimum | 0.0329 | 2.004 | 0.00009 | 0.00018 | 0.200 |
| Mean | 1.978 | 3.002 | 0.043 | 0.0054 | 0.306 |
| Variance | 1.137×10^7 | 3.580×10^7 | 0.00077 | 0.000008 | 0.0034 |
Fig 5. Outputs σx with errors in dataset 2.
Fig 6. Values of errors in dataset 2.
Figs 7 and 8 compare the EPR results (R2) for different values of λ with L1RM and L2RM, respectively. Because the effect of the error was larger, the regression results (R2) were volatile, and the variation of R2 with λ in L1RM and L2RM was irregular. The rule for finding the appropriate λ is still the same: λ should be as large as possible on the premise that R2 remains large (close to 1). As shown in Fig 7, when λ equals 100, R2 is always at a high level, revealing that the appropriate value of λ for L1RM is 100. As shown in Fig 8, when λ equals 10^5, R2 is always at a high level, so the appropriate value of λ for L2RM is 10^5. For comparison, the results (R2) of EPR with λ = 0 (whose fitness function is only the MSE, without a regularization method) are listed alongside those obtained with the regularization methods at their appropriate λ (Table 3). When m is less than 5, the results (R2) of EPR with a regularization method at its appropriate λ are very close to those with λ = 0; some are even better. When m is ≥ 5, the differences become slightly larger, but the largest relative difference is 2.5%, which is still small.
Fig 7. Comparison of EPR results (R2) for different values of λ with L1RM (dataset 2).
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Fig 8. Comparison of EPR results (R2) for different values of λ with L2RM (dataset 2).
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Table 3. Comparisons of R2 with different λ.
| m (number of terms) | R2 of EPR without regularization (λ = 0) | R2 of EPR with L1RM having an appropriate λ | R2 of EPR with L2RM having an appropriate λ |
|---|---|---|---|
| 3 | 0.82554 | 0.82349 | 0.82344 |
| 4 | 0.83563 | 0.84128 | 0.83677 |
| 5 | 0.85499 | 0.84266 | 0.84805 |
| 6 | 0.8602 | 0.84605 | 0.84414 |
| 7 | 0.86402 | 0.85212 | 0.84225 |
| 8 | 0.8631 | 0.85092 | 0.85353 |
3.3.2 Performance of EPR with L1RM and L2RM
After the appropriate regularization parameters (λ) of the two regularization methods were determined, the performance of EPR with L1RM and L2RM was discussed and compared with that of classical EPR with the PCS fitness function (5). Using (12), two datasets of different sizes were built: a 30-point dataset (dataset 3), with 20 points for regression and 10 for prediction, and a 100-point dataset (dataset 4), with 70 points for regression and 30 for prediction. All parameters and variables in the two datasets were randomly generated in specified intervals similar to engineering values. Three indicators were then used to evaluate the performance of the three EPR models obtained with the three different fitness functions (the classical fitness function (PCS) and the two improved fitness functions with L1RM and L2RM): the coefficient of determination (R2), the root mean square error (RMSE), and the mean absolute error (MAE):
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( Y_i - Y_{p,i} \right)^2}{\sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^2} \tag{13}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Y_i - Y_{p,i} \right)^2} \tag{14}$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - Y_{p,i} \right| \tag{15}$$
where N is the number of data points, Y is the vector of outputs (σx) in (12), Ȳ is the mean of the outputs, and Yp is the vector of predicted values. A higher R2 or lower RMSE and MAE indicates better performance.
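The three indicators of Eqs (13)–(15) correspond directly to the following sketch:

```python
import numpy as np

def r2(y, y_pred):
    """Coefficient of determination, Eq (13)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_pred):
    """Root mean square error, Eq (14)."""
    return np.sqrt(np.mean((y - y_pred) ** 2))

def mae(y, y_pred):
    """Mean absolute error, Eq (15)."""
    return np.mean(np.abs(y - y_pred))
```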
(a) A 30-point dataset (dataset 3, used to simulate the small-data situation): 20 points for the regression test and 10 for the prediction test. The statistics of dataset 3 are shown in Table 4 and Figs 9 and 10.
Table 4. Statistics of the parameters and variables in dataset 3.
| | σx | E | εx | εy | μ |
|---|---|---|---|---|---|
| | (×10^7 N/m²) | (×10^8 N/m²) | | | |
| Maximum | 4.310 | 3.959 | 0.097 | 0.0097 | 0.388 |
| Minimum | 0.304 | 2.127 | 0.007 | 0.00035 | 0.202 |
| Mean | 2.245 | 3.189 | 0.058 | 0.0048 | 0.301 |
| Variance | 1.384×10^7 | 2.740×10^7 | 0.00082 | 0.000008 | 0.0028 |
Fig 9. Outputs σx with errors in dataset 3.
Fig 10. Values of errors in dataset 3.
R2 is used to evaluate the regression performance of the three EPR models with different fitness functions: the classical fitness function (PCS) and the two improved fitness functions with L1RM and L2RM (Figs 11–13). R2 calculated by EPR with the classical fitness function (PCS) increases from 0.81145 to 0.89845 as the number of terms increases from 3 to 8 (Fig 11). R2 calculated by EPR with the improved fitness function with L1RM increases from 0.85774 to 0.95967 (Fig 12), and R2 calculated by EPR with the improved fitness function with L2RM increases from 0.83613 to 0.93808 (Fig 13). A higher R2 demonstrates better performance, revealing that the regression performance of EPR with L1RM is the best, and that EPR with L2RM performs better than classical EPR with PCS. After comparing the regression performance, the prediction performance (generalization ability) is discussed for six terms (m = 6) (Table 5).
Fig 11. R2 calculated by EPR with PCS in dataset 3.
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Fig 12. R2 calculated by EPR with L1RM in dataset 3.
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Fig 13. R2 calculated by EPR with L2RM in dataset 3.
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Table 5. Prediction performance of 3 EPR models in dataset 3.
| EPR models | EPR with PCS | EPR with L1RM | EPR with L2RM |
|---|---|---|---|
| R2 | 0.24437 | 0.34889 | 0.46732 |
| RMSE | 9186821.38 | 8527800.09 | 7713323.60 |
| MAE | 6116553.31 | 7646176.93 | 5647908.66 |
Table 5 reveals the prediction performances (generalization ability) of the three EPR models based on the three indicators. EPR with L2RM ranks first and offers the best performance; EPR with L1RM ranks second, and EPR with PCS is third. The complexities of the three EPR models differ: the model obtained with PCS contains fewer variable occurrences (the total count of E, μ, εx, and εy is 10), whereas the models obtained with L1RM and L2RM contain more (19 and 17 occurrences, respectively). The models with L1RM or L2RM also have smaller coefficient values than the model with PCS. The performances of the three EPR models differ because PCS penalizes the number of terms, whereas L1RM and L2RM penalize the values of the coefficients. Because the amount of data is small, the data variability is high; consequently, R2 for regression is very different from R2 for prediction.
(b) A 100-point dataset (dataset 4, used to simulate a normal amount of data): 70 points for the regression test and 30 for the prediction test. The statistics of dataset 4 are shown in Table 6 and Figs 14 and 15.
Table 6. Statistics of the parameters and variables in dataset 4.
| | σx | E | εx | εy | μ |
|---|---|---|---|---|---|
| | (×10^7 N/m²) | (×10^8 N/m²) | | | |
| Maximum | 5.042 | 3.985 | 0.097 | 0.001 | 0.398 |
| Minimum | -0.770 | 2.035 | 0.002 | 0.00003 | 0.206 |
| Mean | 1.591 | 2.966 | 0.049 | 0.0054 | 0.302 |
| Variance | 1.219×10^7 | 2.760×10^7 | 0.00083 | 0.000008 | 0.0033 |
Fig 14. Outputs σx with errors in dataset 4.
Fig 15. Values of errors in dataset 4.
R2 was used to evaluate the regression performance of the three EPR models with different fitness functions: the classical fitness function (PCS) and the two improved fitness functions with L1RM and L2RM (Figs 16–18). As the number of terms increases from 3 to 8, R2 calculated by EPR with the classical fitness function (PCS) increases from 0.70421 to 0.74111 (Fig 16), R2 calculated by EPR with the improved fitness function with L1RM increases from 0.74354 to 0.78099 (Fig 17), and R2 calculated by EPR with the improved fitness function with L2RM increases from 0.73808 to 0.78509 (Fig 18). Thus, the regression performance of EPR with L1RM is close to that of EPR with L2RM, and both perform better than classical EPR (whose fitness function is PCS). After comparing the regression performance, the prediction performance (generalization ability) is discussed for six terms (m = 6) (Table 7).
Fig 16. R2 calculated by EPR with PCS in dataset 4.
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Fig 17. R2 calculated by EPR with L1RM in dataset 4.
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Fig 18. R2 calculated by EPR with L2RM in dataset 4.
(a) m = 3, (b) m = 4, (c) m = 5, (d) m = 6, (e) m = 7 and (f) m = 8.
Table 7. Prediction performance of 3 EPR models in dataset 4.
| EPR models | EPR with PCS | EPR with L1RM | EPR with L2RM |
|---|---|---|---|
| R2 | 0.76728 | 0.81959 | 0.8087 |
| RMSE | 5901792.48 | 5196360.01 | 5350863.83 |
| MAE | 4880189.75 | 4329662.19 | 4456582.20 |
Table 7 presents the prediction performances (generalization ability) of the three EPR models in terms of the three indicators. EPR with L1RM offers the best performance; EPR with L2RM ranks second, although its performance is close to that of EPR with L1RM; and EPR with PCS is third. The R2 of EPR with L1RM is 6.4% higher than that of EPR with PCS, the RMSE of EPR with L1RM is 12.0% lower than that of EPR with PCS, and the MAE of EPR with L1RM is 11.3% lower than that of EPR with PCS. The complexities of the three EPR models again differ: the model obtained with PCS contains fewer variable occurrences (the total count of E, μ, εx, and εy is 6), whereas the models obtained with L1RM and L2RM contain more (18 and 19 occurrences, respectively). Note that the model obtained by EPR with PCS does not contain the term εy; although its results are relatively accurate, it ignores the variable εy, which makes it vulnerable. R2 for regression is close to R2 for prediction because the amount of data is normal.
4. Conclusions
In this study, EPR was significantly improved using the L1RM and L2RM regularization methods. Testing datasets simulating geotechnical engineering data were built. The appropriate values of the regularization parameter (λ) in L1RM and L2RM were first studied and determined. Then, the regression ability and generalization ability of EPR with both regularization methods were tested and compared with those of EPR with PCS under various conditions, i.e., the small-data and normal-data situations.
EPR with PCS (the classic EPR) controls the complexity of its model by penalizing the number of terms, whereas EPR with L1RM or L2RM controls complexity by penalizing the coefficient values. Therefore, the models obtained by EPR with PCS had fewer terms and were more concise; however, they could lose independent variables, which makes them vulnerable. The models obtained by EPR with L1RM or L2RM had more terms and lower coefficient values; although they appeared more complex, they contained all the independent variables. Furthermore, lower coefficient values promote the generalization ability of EPR models and help avoid overfitting.
In the small-data situation, EPR with L1RM or L2RM had better regression and prediction (generalization) abilities than EPR with PCS. In the normal-data situation, EPR with L1RM had the best regression and prediction abilities, and EPR with L2RM was still better than EPR with PCS in both. In summary, EPR with L1RM and EPR with L2RM both outperform EPR with PCS in regression ability and generalization ability, and EPR with L1RM performs slightly better than EPR with L2RM.
Thus, both the regression ability and the generalization ability of EPR are improved by introducing the regularization methods (L1RM and L2RM). Moreover, the classic EPR's vulnerability of losing independent variables is eliminated, so EPR is improved through the regularization methods.
Data Availability
All relevant data are within the paper.
Funding Statement
The author(s) received no specific funding for this work.
References
- 1. Giustolisi O, Savic DA. A symbolic data-driven technique based on evolutionary polynomial regression. Journal of Hydroinformatics. 2006;8(3):207–22.
- 2. Gao W. A comprehensive review on identification of the geomaterial constitutive model using the computational intelligence method. Advanced Engineering Informatics. 2018;38:420–40. doi: 10.1016/j.aei.2018.08.021
- 3. Holland JH. Genetic Algorithms and Adaptation. In: Selfridge OG, Rissland EL, Arbib MA, editors. Adaptive Control of Ill-Defined Systems. Boston, MA: Springer US; 1984. p. 317–33.
- 4. Javadi A, Rezania M. Applications of artificial intelligence and data mining techniques in soil modeling. Geomechanics and Engineering. 2009;1(1):53–74.
- 5. Javadi AA, Rezania M. Intelligent finite element method: An evolutionary approach to constitutive modeling. Advanced Engineering Informatics. 2009;23(4):442–51. doi: 10.1016/j.aei.2009.06.008
- 6. Ahangar-Asr A, Faramarzi A, Mottaghifard N, Javadi AA. Modeling of permeability and compaction characteristics of soils using evolutionary polynomial regression. Computers & Geosciences. 2011;37(11):1860–9.
- 7. Faramarzi A, Javadi AA, Alani AM. EPR-based material modelling of soils considering volume changes. Computers & Geosciences. 2012;48:73–85.
- 8. Javadi AA, Ahangar-Asr A, Johari A, Faramarzi A, Toll D. Modelling stress–strain and volume change behaviour of unsaturated soils using an evolutionary based data mining technique, an incremental approach. Engineering Applications of Artificial Intelligence. 2012;25(5):926–33.
- 9. Cuisinier O, Javadi AA, Ahangar-Asr A, Masrouri F. Identification of coupling parameters between shear strength behaviour of compacted soils and chemical's effects with an evolutionary-based data mining technique. Computers and Geotechnics. 2013;48:107–16.
- 10. Ahangar-Asr A, Faramarzi A, Javadi AA. A new approach for prediction of the stability of soil and rock slopes. Engineering Computations. 2010;27(7):878–93.
- 11. Doglioni A, Crosta GB, Frattini P, Melidoro NL, Simeone V. Predicting Landslide Displacements by Multi-objective Evolutionary Polynomial Regression. In: Engineering Geology for Society and Territory - Volume 5. Cham: Springer International Publishing; 2015.
- 12. Wu Z-x, Ji H, Yu C, Zhou C. EPR-RCGA-based modelling of compression index and RMSE-AIC-BIC-based model selection for Chinese marine clays and their engineering application. Journal of Zhejiang University-SCIENCE A. 2018;19(3):211–24.
- 13. Yin Z-Y, Jin Y-F, Huang H-W, Shen S-L. Evolutionary polynomial regression based modelling of clay compressibility using an enhanced hybrid real-coded genetic algorithm. Engineering Geology. 2016;210:158–67.
- 14. Rezania M, Javadi AA, Giustolisi O. Evaluation of liquefaction potential based on CPT results using evolutionary polynomial regression. Computers and Geotechnics. 2010;37(1):82–92.
- 15. Rezania M, Faramarzi A, Javadi AA. An evolutionary based approach for assessment of earthquake-induced soil liquefaction and lateral displacement. Engineering Applications of Artificial Intelligence. 2011;24(1):142–53.
- 16. Balf MR, Noori R, Berndtsson R, Ghaemi A, Ghiasi B. Evolutionary polynomial regression approach to predict longitudinal dispersion coefficient in rivers. Journal of Water Supply: Research and Technology-Aqua. 2018;67(5):447–57.
- 17. Khoshkroudi SS, Sefidkouhi MAG, Ahmadi MZ, Ramezani M. Prediction of soil saturated water content using evolutionary polynomial regression (EPR). Archives of Agronomy and Soil Science. 2014;60(8):1155–72.
- 18. Ebrahimian B, Movahed V. Evaluation of axial bearing capacity of piles in sandy soils by CPT results. Evaluation. 2013;29:31.
- 19. Ebrahimian B, Movahed V. Application of an evolutionary-based approach in evaluating pile bearing capacity using CPT results. Ships and Offshore Structures. 2017;12(7):937–53.
- 20. Nassr A, Javadi A, Faramarzi A. Developing constitutive models from EPR-based self-learning finite element analysis. 2018;42(3):401–17.
- 21. Kohestani VR, Hassanlourad M. Modeling the Mechanical Behavior of Carbonate Sands Using Artificial Neural Networks and Support Vector Machines. 2016;16(1):04015038.
- 22. He S, Li J. Modeling nonlinear elastic behavior of reinforced soil using artificial neural networks. Applied Soft Computing. 2009;9(3):954–61.
- 23. Alemdag S, Gurocak Z, Cevik A, Cabalar AF, Gokceoglu C. Modeling deformation modulus of a stratified sedimentary rock mass using neural network, fuzzy inference and genetic programming. Engineering Geology. 2016;203:70–82. doi: 10.1016/j.enggeo.2015.12.002
- 24. Jin Y-F, Yin Z-Y, Zhou W-H, Yin J-H, Shao J-F. A single-objective EPR based model for creep index of soft clays considering L2 regularization. Engineering Geology. 2019;248:242–55.
- 25. Nikolaev NY, Iba H. Regularization approach to inductive genetic programming. IEEE Transactions on Evolutionary Computation. 2001;5(4):359–75.
- 26. Montaña J, Alonso CL, Borges CE, Tirnauca C. Model-driven regularization approach to linear genetic programming. 2015.
- 27. Marasco S, Fiore A, Greco R, Cimellaro GP, Marano GC. Evolutionary Polynomial Regression Algorithm Enhanced with a Robust Formulation: Application to Shear Strength Prediction of RC Beams without Stirrups. 2021;35(6):04021017.
- 28. Roozbeh M, Babaie-Kafaki S, Aminifard Z. Two penalized mixed-integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models. Journal of Industrial and Management Optimization. 2021;17(6):3475.
- 29. Roozbeh M, Babaie-Kafaki S, Aminifard Z. Improved high-dimensional regression models with matrix approximations applied to the comparative case studies with support vector machines. Optimization Methods and Software. 2022:1–18.
- 30. Gupta RK, Bhunia AK, Roy D. A GA based penalty function technique for solving constrained redundancy allocation problem of series system with interval valued reliability of components. Journal of Computational and Applied Mathematics. 2009;232:275–84.
- 31. Jaiswal S, Mehta A, Nandi GC. Investigation on the Effect of L1 and L2 Regularization on Image Features Extracted Using Restricted Boltzmann Machine. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS); 2018 June 14–15.