Abstract

Recently, production optimization has gained increasing interest in the petroleum industry. The most computationally intensive and critical part of production optimization is evaluating the production function with a numerical reservoir simulator. Proxy models have been proposed as substitutes for the reservoir simulator to reduce this high computational cost. In this study, a new approach to constructing adaptive proxy models for production optimization problems is proposed. A least-squares support vector machine (LSSVM) optimized by a self-adaptive differential evolution algorithm (SaDE) is used as the approximation function, and training data are generated with a self-adaptive response surface experimental design (SaRSE). SaDE selects the optimal hyperparameters of the LSSVM during training to improve the prediction accuracy of the proxy model. Cross-validation is used in the recursive training and model evaluation phases. The developed method is applied to optimize production for reservoir models of a shale gas block. Computational results confirm that the developed adaptive proxy model outperforms traditional regression methods. It is further verified that when the experimental data are updated, the proxy model still achieves high prediction accuracy in objective function evaluation. The results show that the proposed proxy modeling approach enhances the entire optimization process by providing a fast and more accurate approximation of the actual reservoir simulation model.
1. Introduction
The development of onshore oil and gas has progressed rapidly over the past decade, primarily owing to unconventional resources, particularly shale gas.1−7 Although shale gas extraction has reached industrial-scale production, it sits at the margin of profitability, which makes economically beneficial development difficult. Shale gas production optimization is therefore receiving increasing attention in the oil industry. Under complex geological and engineering conditions, production evaluation is one of the most critical and difficult steps in production optimization, and reasonable, accurate production evaluation is essential.
Numerical reservoir simulation, developed by reservoir geologists, reservoir engineers, and simulation specialists, is a physics-driven modeling approach recognized as the standard decision-making tool in the petroleum industry.8 Numerical reservoir simulation can effectively represent discrete fracture networks, fluid flow, and key physical effects that may be important in shale reservoirs. Zhang et al.9 used a porous medium model with a matrix grid to describe unsteady gas flow from the matrix to the fractures and used numerical simulation software to study the effects of porosity, permeability, matrix–fracture coupling parameters, and fracture half-length and spacing on shale gas productivity. Hu et al.10 simulated the sensitivity of productivity to the number of fractures, fracture conductivity, and fracture spacing, which provided guidance for designing fracturing programs for shale gas wells. High-accuracy numerical simulations are frequently used as a powerful tool for engineering design and optimization. However, optimization tasks require hundreds or thousands of reservoir simulation runs, and a single run of a model with thousands or even millions of grid blocks can take several hours. In addition, a large number of control parameters makes simulation-based optimization even more expensive.
Essentially, a proxy model is a data mining tool. Advances in data mining, especially machine learning, have made it possible to build machine learning models that outperform many existing computer models. Hassani et al.11 used traditional quadratic, multiplicative, and radial basis function (RBF) models to approximate cumulative production from the reservoir when drilling a new horizontal well, significantly reducing computational time. Schuetter et al.12 employed simple regression and more advanced methods such as random forests (RFs), support vector regression (SVR), gradient boosting machines (GBM), and multidimensional kriging to build prediction models for production indicators. Zhang et al.13 proposed a new multipoint adaptive Gaussian process surrogate model for the design domain (MAGPSM-IDD) and applied it to an integrated field and an actual reservoir, showing superior computational performance. Wang et al.14 used a deep neural network (DNN) and Sobol global sensitivity analysis to analyze 2919 wells (2780 multistage hydraulically fractured horizontal wells and 139 vertical wells) in the Bakken Formation. One-hot encoding was used to process categorical data, and a reliable DNN model was built by evaluating Xavier initialization, dropout, and batch normalization. Their results show that the DNN model can be integrated directly into existing hydraulic fracturing design programs. Brantson et al.15 applied a back-propagation artificial neural network (BPANN), a radial basis function neural network (RBNN), and a generalized regression neural network (GRNN) as proxy models to predict the historical production decline of extralow-porosity tight gas reservoirs and further validated the effectiveness of these proxy models.
Although many studies have shown the broad application of data mining techniques in various engineering disciplines, their application in petroleum engineering is still in its infancy. Moreover, the performance of production forecasting in unconventional shale reservoirs has been unimpressive.16−18 One important reason is that reservoir properties (permeability, porosity, gas adsorption, gas saturation distribution, and reservoir pressure) and the fracture stimulation design both strongly affect shale gas production. More significantly, the prediction performance of machine learning algorithms depends strongly on their hyperparameters, such as the learning rate, initial weights and thresholds, activation function, number of neurons, and number of layers of a multilayer neural network, as well as on the characteristics of the training samples. Hyperparameter tuning (or optimization) is the process of automatically testing different configurations when training a machine learning model. Prabusankarlal et al.19 combined the self-adaptive differential evolution algorithm (SaDE) with an extreme learning machine (ELM), using SaDE to optimize the hidden-node learning parameters of the ELM, which significantly improved prediction accuracy and generalization performance.
We used a self-adaptive response surface experimental design to generate the data set. Response surface design is a statistical method that obtains data through a rational experimental design. It fits the functional relationship between factors and response values with multivariate quadratic regression equations and seeks optimal process parameters by analyzing these equations, thereby solving multivariate problems.20−22 If the training samples are limited, it is difficult for the proxy model to extract discriminative features automatically and to guarantee regression accuracy. As the number of training samples increases, the estimated accuracy approaches the true regression accuracy, that is, the accuracy of a proxy designed with full knowledge of the actual situation. This study uses a self-adaptive response surface design that generates training samples until a specified accuracy is reached. This adaptive approach ensures that the training samples reflect actual reservoir characteristics, avoiding poor learning results caused by errors in the training samples. In addition, we introduce the least-squares support vector machine (LSSVM) as the proxy model. SaDE is used to select the unknown hyperparameters of the LSSVM optimally, and the results are compared with those of other optimization algorithms (genetic algorithm (GA), particle swarm optimization (PSO), and grey wolf optimizer (GWO)) to produce a reasonable production proxy model. Our approach efficiently and automatically searches for the best hyperparameter values for a given machine learning problem.
2. Constructing the Proxy Model
2.1. Methodology of Least-Squares Support Vector Machines
The support vector machine (SVM) is a typical machine learning method based on statistical learning theory. It is effective for small-sample, nonlinear, and high-dimensional problems, and it is widely used for complex nonlinear modeling in various fields. The main disadvantage of the standard SVM is its computational load: training requires solving a quadratic programming problem, which becomes expensive as the number of samples grows. LSSVM, an SVM variant proposed by Suykens23 in 1999, overcomes this by solving a set of linear equations instead.
The basic principle of the SVM classifier underlying LSSVM is to find, for a given data set, the hyperplane that partitions the data correctly and has the largest geometric margin. As shown in Figure 1, ω·x + b = 0 is the separating hyperplane. For a linearly separable data set there are infinitely many such hyperplanes, but the one with the largest geometric margin is unique.
Figure 1.

Schematic diagram of basic principles of LSSVM.
This study uses the regression form of LSSVM as the proxy model. Through a kernel function, LSSVM maps data into a high-dimensional space, where data that are not linearly separable in the original space become linearly separable. The role of the kernel function in LSSVM is shown in Figure 2.
Figure 2.

Function of the kernel function in LSSVM.
For a training sample set S = {(xi, yi)}, i = 1, ···, n, of size n, consisting of input vectors xi ∈ R^d and the corresponding outputs yi ∈ R, the LSSVM regression model is
$$ y(x) = \omega^{T}\varphi(x) + b \tag{1} $$
where ω is the weight vector, b is the bias, and the nonlinear mapping φ(x) maps the low-dimensional input space to a high-dimensional feature space. Following the principle of structural risk minimization, LSSVM uses a squared error loss in the objective, which transforms the regression problem into the following quadratic optimization problem
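$$ \min_{\omega,b,e} J(\omega,e) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2} \tag{2} $$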
where ei is the error (slack) variable and γ is the penalty (regularization) parameter. The problem is subject to the constraints
$$ y_i = \omega^{T}\varphi(x_i) + b + e_i,\quad i = 1, 2, \ldots, n \tag{3} $$
To solve this constrained optimization problem, the Lagrangian function is introduced
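$$ L(\omega,b,e,\alpha) = J(\omega,e) - \sum_{i=1}^{n}\alpha_i\left[\omega^{T}\varphi(x_i) + b + e_i - y_i\right] \tag{4} $$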
where αi are the Lagrange multipliers. Setting the partial derivatives of L with respect to ω, b, ei, and αi to zero gives the Karush–Kuhn–Tucker (KKT) optimality conditions
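$$ \begin{cases} \omega = \sum_{i=1}^{n}\alpha_i\varphi(x_i) \\ \sum_{i=1}^{n}\alpha_i = 0 \\ \alpha_i = \gamma e_i \\ \omega^{T}\varphi(x_i) + b + e_i - y_i = 0 \end{cases} \;\Longrightarrow\; \sum_{k=1}^{n}\alpha_k K(x_i,x_k) + b + \frac{\alpha_i}{\gamma} = y_i,\quad K(x_i,x_k) = \varphi(x_i)^{T}\varphi(x_k) \tag{5} $$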
where K(xi,xk) is the kernel function. In this study, the radial basis function was selected as the kernel of the LSSVM regression model for its good generalization ability and wide convergence domain. Its expression is
$$ K(x_i,x_k) = \exp\left(-\frac{\lVert x_i - x_k\rVert^{2}}{2\sigma^{2}}\right) \tag{6} $$
where σ is the kernel width. Both γ and σ strongly affect the learning and generalization ability of LSSVM models; therefore, optimization algorithms (e.g., PSO, GA, SaDE, GWO) are generally used to select these hyperparameters.
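Eliminating ω and ei, the quadratic optimization problem reduces to the following system of linear equations
$$ \begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \tag{7} $$
where 1 = [1, ···, 1]T, α = [α1, ···, αn]T, y = [y1, ···, yn]T, I is the n × n identity matrix, and Ω is the kernel matrix with Ωik = K(xi,xk).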
The resulting LSSVM regression function is
$$ y(x) = \sum_{i=1}^{n}\alpha_i K(x, x_i) + b \tag{8} $$
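The training and prediction steps of the LSSVM (RBF kernel, the bordered linear system, and the kernel expansion) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: dense matrices, the 2σ² form of the RBF kernel, and a synthetic test function. The names `rbf_kernel`, `lssvm_fit`, and `lssvm_predict` are hypothetical helpers, not functions from any toolbox used in the study.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix, K[i, k] = exp(-||A_i - B_k||^2 / (2 sigma^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative distances caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the (n+1) x (n+1) bordered linear system for b and alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                            # first row: [0, 1^T]
    A[1:, 0] = 1.0                                            # first column: [0; 1]
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma   # Omega + I / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]                          # b, alpha


def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Usage on a synthetic two-input function (not data from this study).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=0.5)
y_fit = lssvm_predict(X, b, alpha, 0.5, X)
```

Two consequences of the KKT conditions can be checked directly on the result: the multipliers sum to zero, and each training residual yi − y(xi) equals αi/γ, because αi = γei.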
2.2. Methodology of Parameter Optimization of the LSSVM
Alongside the genetic algorithm (GA) and particle swarm optimization (PSO), the differential evolution (DE) algorithm is a widely used population-based optimization algorithm. DE has been applied extensively because of its simple structure, few control parameters, easy implementation with real-number encoding, and fast convergence.
2.2.1. Differential Evolution Algorithm
The basic procedure of the DE algorithm consists of initialization, mutation, crossover, and selection. Initialization generates individuals randomly in the search space; mutation and crossover then generate new individuals, and selection determines which individuals enter the next generation. This process repeats until a termination condition is met.
2.2.1.1. Initialization
To minimize an objective function f(x) with respect to the parameter vector x ∈ R^D, a population of NP parameter vectors evolves toward the global minimum. The ith individual of generation G is
$$ x_{i,G} = \left[x_{i,G}^{1}, x_{i,G}^{2}, \ldots, x_{i,G}^{D}\right] \tag{9} $$
where i = 1, 2, ···, NP, and G denotes the generation index.
The initial population of NP parameter vectors xi,0 is generated randomly so as to cover the parameter space as evenly as possible
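$$ x_{i,0}^{j} = x_{\min}^{j} + \mathrm{rand}(0,1)\cdot\left(x_{\max}^{j} - x_{\min}^{j}\right),\quad j = 1, 2, \ldots, D \tag{10} $$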
where xmin = [xmin^1, xmin^2, ···, xmin^D] and xmax = [xmax^1, xmax^2, ···, xmax^D] are the lower and upper parameter bounds, respectively, and rand(0,1) is a uniformly distributed random number in [0, 1].
2.2.1.2. Mutation
Mutation converts existing vectors into new mutant vectors by adding scaled differences between randomly selected population vectors. For each parameter vector xi,G of the current generation, a mutant vector Vi,G is generated by a mutation strategy. Several popular mutation strategies (MT) are listed below.
MT1: DE/rand/1
$$ V_{i,G} = x_{r_1^i,G} + F\cdot\left(x_{r_2^i,G} - x_{r_3^i,G}\right) \tag{11} $$
MT2: DE/rand/2
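$$ V_{i,G} = x_{r_1^i,G} + F\cdot\left(x_{r_2^i,G} - x_{r_3^i,G}\right) + F\cdot\left(x_{r_4^i,G} - x_{r_5^i,G}\right) \tag{12} $$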
MT3: DE/rand-to-best/2
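$$ V_{i,G} = x_{i,G} + F\cdot\left(x_{\mathrm{best},G} - x_{i,G}\right) + F\cdot\left(x_{r_1^i,G} - x_{r_2^i,G}\right) + F\cdot\left(x_{r_3^i,G} - x_{r_4^i,G}\right) \tag{13} $$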
MT4: DE/current-to-rand/1
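$$ V_{i,G} = x_{i,G} + K\cdot\left(x_{r_1^i,G} - x_{i,G}\right) + F\cdot\left(x_{r_2^i,G} - x_{r_3^i,G}\right) \tag{14} $$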
where F is a positive amplification factor that scales the difference vectors and is usually chosen in the range 0 ≤ F ≤ 2; K is a control parameter randomly generated in the range 0 ≤ K ≤ 1; xbest,G is the best individual in generation G; and the indices r1i, r2i, r3i, r4i, and r5i are mutually exclusive integers randomly drawn from {1, 2, ···, NP}, each different from i.
Among these strategies, MT1 has strong exploration capability and is suitable for multimodal problems, but it usually converges slowly. The two-difference-vector strategies MT2 and MT3 produce stronger perturbations than the one-difference-vector strategy MT1, at a higher computational cost. MT3 relies on the best solution found so far; it converges quickly and performs well on unimodal problems, but on multimodal problems it is likely to be trapped in a local optimum, leading to premature convergence. MT4 is rotationally invariant and has been reported to be effective for multiobjective optimization problems.
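The four mutation strategies MT1 to MT4 can be sketched in Python as follows. This is an illustration assuming a NumPy population matrix of shape NP × D and zero-based indices; `pick_distinct` and `mutate` are hypothetical helper names. MT2 needs at least six individuals, because it draws five indices that must also differ from i.

```python
import numpy as np


def pick_distinct(rng, NP, i, k):
    """Draw k mutually exclusive indices from {0, ..., NP-1}, all different from i."""
    candidates = [j for j in range(NP) if j != i]
    return rng.choice(candidates, size=k, replace=False)


def mutate(pop, i, strategy, F, best, rng):
    """Return the mutant vector V_i for target pop[i] under strategy MT1..MT4."""
    NP = pop.shape[0]
    x = pop[i]
    if strategy == 1:    # MT1: DE/rand/1
        r1, r2, r3 = pick_distinct(rng, NP, i, 3)
        return pop[r1] + F * (pop[r2] - pop[r3])
    if strategy == 2:    # MT2: DE/rand/2
        r1, r2, r3, r4, r5 = pick_distinct(rng, NP, i, 5)
        return pop[r1] + F * (pop[r2] - pop[r3]) + F * (pop[r4] - pop[r5])
    if strategy == 3:    # MT3: DE/rand-to-best/2
        r1, r2, r3, r4 = pick_distinct(rng, NP, i, 4)
        return x + F * (best - x) + F * (pop[r1] - pop[r2]) + F * (pop[r3] - pop[r4])
    if strategy == 4:    # MT4: DE/current-to-rand/1, with K drawn uniformly from [0, 1)
        K = rng.random()
        r1, r2, r3 = pick_distinct(rng, NP, i, 3)
        return x + K * (pop[r1] - x) + F * (pop[r2] - pop[r3])
    raise ValueError(f"unknown strategy {strategy}")
```

Because every strategy is built from differences of population members, a population that has fully collapsed to a single point produces mutants equal to that point, which is one way to see why diversity matters in DE.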
2.2.1.3. Crossover
To increase the diversity of the perturbed parameter vectors, crossover is applied. At generation G, for each mutant vector Vi,G = [Vi,G^1, Vi,G^2, ···, Vi,G^D], a trial vector Ui,G = [Ui,G^1, Ui,G^2, ···, Ui,G^D] is generated according to the following crossover rule
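$$ U_{i,G}^{j} = \begin{cases} V_{i,G}^{j}, & \text{if } \mathrm{rand}_j \le CR \text{ or } j = j_{\mathrm{rand}} \\ x_{i,G}^{j}, & \text{otherwise} \end{cases},\quad j = 1, 2, \ldots, D \tag{15} $$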
where CR ∈ [0, 1] is the crossover rate, which controls the fraction of parameter values copied from the mutant vector; randj is the jth draw of a uniform random number generator on [0, 1]; and jrand is an integer chosen randomly from [1, D], which ensures that Ui,G takes at least one parameter from the mutant vector and therefore differs from the target vector xi,G.
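The binomial crossover rule can be sketched in Python. This assumes NumPy vectors and zero-based positions; `binomial_crossover` is a hypothetical helper name.

```python
import numpy as np


def binomial_crossover(x, v, CR, rng):
    """Take v[j] when rand_j <= CR or j == j_rand; otherwise keep x[j]."""
    D = x.size
    j_rand = rng.integers(D)              # this position always comes from v
    take_from_v = rng.random(D) <= CR
    take_from_v[j_rand] = True
    return np.where(take_from_v, v, x)


rng = np.random.default_rng(2)
x = np.zeros(5)   # target vector
v = np.ones(5)    # mutant vector
u = binomial_crossover(x, v, CR=0.5, rng=rng)
```

The two limiting cases illustrate the role of jrand: with CR = 0 the trial vector still inherits exactly one component from the mutant, and with CR = 1 it equals the mutant.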
2.2.1.4. Selection
For a minimization problem, each target vector xi,G is compared with its corresponding trial vector Ui,G using the fitness function, and the vector with the lower (or equal) fitness value survives into the next generation.
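$$ x_{i,G+1} = \begin{cases} U_{i,G}, & \text{if } f(U_{i,G}) \le f(x_{i,G}) \\ x_{i,G}, & \text{otherwise} \end{cases} \tag{16} $$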
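Initialization, DE/rand/1 mutation, binomial crossover, and greedy selection combine into the classic DE loop, sketched below in Python. This is plain DE/rand/1/bin with fixed F and CR and no self-adaptation. Clipping mutants to the bounds is an assumption, because the text does not say how out-of-range mutants are handled, and the sphere function is only a stand-in objective. `de_rand_1_bin` is a hypothetical name.

```python
import numpy as np


def de_rand_1_bin(f, x_min, x_max, NP=20, F=0.5, CR=0.9, max_gen=200, seed=0):
    """Classic DE/rand/1/bin with fixed F and CR (no self-adaptation)."""
    rng = np.random.default_rng(seed)
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    D = x_min.size
    # Initialization: uniform sampling inside the bounds.
    pop = x_min + rng.random((NP, D)) * (x_max - x_min)
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(NP):
            # Mutation (DE/rand/1): three distinct indices, all different from i.
            others = [j for j in range(NP) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            v = np.clip(v, x_min, x_max)          # assumed bound handling
            # Binomial crossover.
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True
            u = np.where(mask, v, pop[i])
            # Selection: the trial vector replaces the target if it is no worse.
            fu = f(u)
            if fu <= fit[i]:
                new_pop[i], new_fit[i] = u, fu
        pop, fit = new_pop, new_fit               # synchronous generation update
    best = int(np.argmin(fit))
    return pop[best], fit[best]


def sphere(x):
    return float(np.sum(x**2))


x_best, f_best = de_rand_1_bin(sphere, [-5.0] * 3, [5.0] * 3)
```

The new generation is assembled in separate arrays so that every comparison in generation G uses only generation-G vectors, matching the xi,G to xi,G+1 indexing of the selection rule.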
2.2.2. Self-Adaptive Differential Evolution Algorithm
The performance of the DE algorithm is determined by its control parameters (CR, F, and NP) together with the learning strategy. To achieve good performance on a given problem, these key control parameters must be fine-tuned, and, more importantly, the available learning strategies must be tested in the mutation phase to find a suitable one. A better solution is the SaDE algorithm, which automatically adjusts the control parameters and learning strategies during the evolutionary process through a self-adaptive mechanism.
Of the three key control parameters of DE (CR, F, and NP), NP is kept as a user-defined value chosen according to the problem dimension. F is related to the convergence speed: smaller values of F favor local search and larger values favor global search, and both should be maintained throughout the evolutionary process to produce effective mutant individuals. Therefore, for each individual in the current population, F is drawn randomly from a normal distribution with mean 0.5 and standard deviation 0.3, restricted to the range (0, 2).
CR generally affects the convergence speed and robustness of the search, and a well-chosen CR gives better results under different learning strategies. Therefore, CR is adjusted dynamically over a given interval of generations by accumulating previous learning experience: CR values are drawn from a normal distribution, and the values that produced successful trial vectors (trial vectors that entered the next generation) are recorded. After a given number of generations, the mean of the CR normal distribution is recalculated from the recorded successful CR values. As this process repeats with the updated mean, the range of suitable CR values for the current problem is learned automatically.
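The F and CR sampling just described can be sketched in Python. Several details are assumptions not stated in the text: F values outside (0, 2) are redrawn rather than clipped, CR is drawn with a standard deviation of 0.1 and clipped to [0, 1], and the initial CR mean is 0.5. The function names are hypothetical.

```python
import numpy as np


def sample_F(rng, size, mean=0.5, std=0.3):
    """Draw one F per individual from N(0.5, 0.3), redrawing values outside (0, 2)."""
    F = rng.normal(mean, std, size)
    bad = (F <= 0.0) | (F >= 2.0)
    while bad.any():
        F[bad] = rng.normal(mean, std, int(bad.sum()))
        bad = (F <= 0.0) | (F >= 2.0)
    return F


def sample_CR(rng, size, CR_mean, CR_std=0.1):
    """Draw CR values from N(CR_mean, CR_std), clipped to [0, 1]; CR_std = 0.1 is assumed."""
    return np.clip(rng.normal(CR_mean, CR_std, size), 0.0, 1.0)


def update_CR_mean(successful_CR, old_mean):
    """New CR mean = average of CR values whose trial vectors entered the next generation."""
    if len(successful_CR) == 0:
        return old_mean          # no successes recorded: keep the previous mean
    return float(np.mean(successful_CR))


rng = np.random.default_rng(3)
F = sample_F(rng, 50)
CR = sample_CR(rng, 50, CR_mean=0.5)   # initial mean 0.5 is an assumption
```

In a full SaDE loop, each CR value would be stored alongside its trial vector, and only those whose trial vector won the selection step would be passed to `update_CR_mean` at the end of the interval.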
This work introduces a self-adaptive mechanism based on statistical experience and roulette wheel selection to choose a suitable mutation strategy from a pool of candidate strategies at different stages of evolution. MT1, MT2, MT3, and MT4 are the four candidate learning strategies, applied to each individual in the current population with probabilities P1, P2, P3, and P4, respectively. Initially, all four strategies have equal probability (0.25). For a population of size NP, a vector of NP random numbers uniformly distributed in [0, 1] is generated.
If the ith element of this vector is less than or equal to P1, strategy MT1 is applied to the ith individual in the current population; if it lies in (P1, P1 + P2], MT2 is applied, and so on for MT3 and MT4. After all trial vectors are evaluated, the numbers of trial vectors generated by each strategy that successfully enter the next generation are recorded as NS1, NS2, NS3, and NS4, and the numbers of discarded trial vectors as NF1, NF2, NF3, and NF4. These counts are accumulated over a specified number of generations, called the learning period. Then, the probability pl of strategy l is updated as follows
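$$ p_l = \frac{S_l}{\sum_{k=1}^{4} S_k}, \qquad S_l = \frac{NS_l}{NS_l + NF_l}, \qquad l = 1, 2, 3, 4 \tag{17} $$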
At the end of each learning period, the probabilities of applying the four learning strategies are updated automatically, and the counts NS1–NS4 and NF1–NF4 from the previous period are reset. In this way, the most suitable learning strategy for the given problem gradually emerges during evolution.
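Roulette wheel strategy assignment and the end-of-period probability update can be sketched in Python. The update assumes a success-rate form, pl proportional to NSl/(NSl + NFl), plus a small constant `eps` that keeps every strategy selectable; `eps` is an assumption not given in the text. The function names are hypothetical.

```python
import numpy as np


def assign_strategies(probs, rng, NP):
    """Roulette wheel: individual i gets strategy l when r_i falls in the l-th cumulative interval."""
    r = rng.random(NP)
    cumulative = np.cumsum(probs)
    idx = np.searchsorted(cumulative, r, side="left")   # first l with r_i <= P1 + ... + Pl
    idx = np.minimum(idx, len(probs) - 1)                # guard against round-off at the top end
    return idx + 1                                       # strategies numbered MT1..MT4


def update_probs(NS, NF, eps=0.01):
    """Success-rate update over one learning period; eps (assumed) keeps all strategies alive."""
    NS = np.asarray(NS, dtype=float)
    NF = np.asarray(NF, dtype=float)
    total = NS + NF
    success = np.divide(NS, total, out=np.zeros_like(NS), where=total > 0) + eps
    return success / success.sum()


rng = np.random.default_rng(4)
probs = np.full(4, 0.25)                      # equal initial probabilities
labels = assign_strategies(probs, rng, NP=30)
probs = update_probs(NS=[12, 3, 8, 5], NF=[10, 20, 14, 15])
```

Without a floor such as `eps`, a strategy with zero successes in one period would get probability zero and could never recover, even if it became useful later in the run.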
The performance of the LSSVM is affected to a significant degree by two key hyperparameters (eqs 5–8): γ and σ. Therefore, these hyperparameters are to be optimized using the SaDE algorithm, and the optimization process is shown in Figure 3.
Figure 3.
Flow diagram for LSSVM’s parameter optimization with SaDE.
The decision variables to be optimized are the hyperparameters γ and σ, and the fitness function is
$$ R(\gamma,\sigma) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[y(x_i) - y_e(x_i)\right]^{2}} \tag{18} $$
where y(xi) and ye(xi) are the true and proxy-predicted values at test point xi, respectively, N is the number of test points, and R(γ,σ) is the root-mean-square error (RMSE) of the predictions, which depends on the LSSVM parameters γ and σ. For proxy models, every error or correlation metric is computed on a separate test data set. Because a single metric does not fully characterize a complex proxy model, several metrics are used to assess its accuracy. The main indicator of regression model performance is the regression (model) accuracy. Besides RMSE, the relative error (RE) is an important metric.
Because it expresses the error relative to the magnitude of the true value, the relative error better reflects the reliability of the calculation. It is defined as
$$ RE = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|y(x_i) - y_e(x_i)\right|}{\left|y(x_i)\right|} \tag{19} $$
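The two error metrics and an RMSE-based fitness for a candidate (γ, σ) can be sketched in Python. The fitness trains an LSSVM on a training split and returns the RMSE on a validation split; a SaDE (or any DE) minimizer would call it once per candidate. Assumptions: RE is the mean absolute relative error over targets that are never zero, the search runs over log10 γ and log10 σ, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error over N test points."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_error(y_true, y_pred):
    """Mean of |y - y_e| / |y|; assumes no zero targets."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


def _rbf(A, B, sigma):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def fitness(params, X_tr, y_tr, X_val, y_val):
    """R(gamma, sigma): validation RMSE of an LSSVM trained with the given hyperparameters.

    params holds (log10 gamma, log10 sigma); searching on a log scale is an assumption.
    """
    gamma, sigma = 10.0 ** params[0], 10.0 ** params[1]
    n = len(y_tr)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0                      # border of the linear system
    A[1:, 1:] = _rbf(X_tr, X_tr, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y_tr)))
    y_hat = _rbf(X_val, X_tr, sigma) @ sol[1:] + sol[0]
    return rmse(y_val, y_hat)


rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, size=(60, 2))
y = 1.0 + X[:, 0] ** 2 + 0.5 * X[:, 1]    # synthetic, strictly positive response
good = fitness([2.0, -0.5], X[:45], y[:45], X[45:], y[45:])
poor = fitness([-3.0, 2.0], X[:45], y[:45], X[45:], y[45:])
```

The "poor" setting (tiny γ, very wide σ) over-regularizes the model toward a constant prediction, so its validation RMSE is close to the spread of the targets, which is the kind of candidate a hyperparameter search should reject.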
3. Constructing Integrated Optimization Model
The use of hydraulic fracturing and horizontal drilling has made it possible to recover oil and gas from shale and other tight formations, so horizontal well design and fracture optimization have become the most critical aspects of low-permeability reservoir development design. In this section, the characteristics of horizontal well placement and fracturing design are analyzed and combined with the proxy model of Section 2 to establish an integrated mathematical model for determining the optimal strategy.
Most conventional horizontal well design and fracturing optimization methods seek the design parameters that optimize a specific metric, such as estimated ultimate recovery (EUR), cumulative gas production over a given period, or net present value (NPV).24−30 Economic analysis recommends the decision with the maximum NPV (the present value of net benefits, i.e., benefits minus costs, over time) or the maximum benefit–cost ratio (the present value of benefits divided by the present value of costs). NPV discounts future income to its present value using standard economic techniques: it is the difference between the present values of the cash inflows and outflows associated with a project, and it is the most widely used objective function in reservoir optimization. For the optimization of horizontal well placement and fracturing design in shale gas reservoirs, the objective function (NPV) is calculated as follows
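$$ \mathrm{NPV}(x_f, L_w) = \sum_{t_n=1}^{N_t}\frac{\left(P_g - c_{tax}\right) Q_{gas}^{t_n} - C_{operate}}{(1+b)^{t_n}} - C_{ground} - C_{drill} - C_{fracture} \tag{20} $$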
where xf and Lw are the optimization variables fracture half-length and horizontal well length, respectively; Nt is the total number of years and tn is the nth year; b is the discount rate; Pg is the natural gas price; Qgas^tn is the gas production in year tn; Coperate is the operating cost in year tn; ctax is the tax constant (per unit volume of gas); Cground is the cost of ground works; and Cdrill and Cfracture are the drilling and fracturing costs, respectively.
Drilling costs are usually defined as a linear function of horizontal well length
$$ C_{drill} = C_{d}^{fix} + C_{d}\, L_w \tag{21} $$
where Cd fix denotes the fixed cost of drilling a new horizontal well and Cd is the drilling penetration cost per lateral length of the horizontal well. Table 1 provides the values for the fixed cost and penetration cost of the drilling process.
Table 1. Parameters for the NPV Calculation.
| parameter | value | unit | symbol |
|---|---|---|---|
| discount rate | 8 | % | b |
| gas price | 1.3 | CNY/m³ | Pg |
| operating cost | 2.64 | 10⁶ CNY | Coperate |
| tax constant | 0.3277 | CNY/m³ | ctax |
| cost of ground works | 5 | 10⁶ CNY | Cground |
| fixed cost of drilling | 19.923 | 10⁶ CNY | Cd fix |
| drilling cost per lateral length | 2269 | CNY/m | Cd |
| fixed fracturing cost | 8.57 | 10⁶ CNY | Cfix |
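A worked NPV calculation with the Table 1 values can be sketched in Python. It assumes an NPV form in which the tax constant is charged per cubic meter of gas and the operating cost is incurred every year, and it takes Cfracture as an input because the fracturing cost model depends on quantities (cf, cp, rp, and the Fin,j regression) whose values are not listed here. The function names, the 15-year production profile, and the fracturing cost in the usage lines are hypothetical.

```python
import numpy as np

# Table 1 values; amounts given in 10^6 CNY are converted to CNY.
B = 0.08              # discount rate b
P_G = 1.3             # gas price Pg, CNY/m^3
C_OPERATE = 2.64e6    # operating cost Coperate, CNY per year (assumed annual)
C_TAX = 0.3277        # tax constant ctax, CNY/m^3
C_GROUND = 5.0e6      # cost of ground works Cground, CNY
C_D_FIX = 19.923e6    # fixed drilling cost Cd fix, CNY
C_D = 2269.0          # drilling cost per lateral length Cd, CNY/m


def drilling_cost(L_w):
    """Fixed cost plus a cost proportional to the horizontal length L_w (m)."""
    return C_D_FIX + C_D * L_w


def npv(Q_gas, L_w, C_fracture):
    """NPV for yearly gas production Q_gas (m^3 per year, year 1 first)."""
    Q_gas = np.asarray(Q_gas, dtype=float)
    years = np.arange(1, Q_gas.size + 1)
    yearly_cash = (P_G - C_TAX) * Q_gas - C_OPERATE
    discounted = yearly_cash / (1.0 + B) ** years
    return float(discounted.sum() - C_GROUND - drilling_cost(L_w) - C_fracture)


# Hypothetical 15-year decline profile and fracturing cost, for illustration only.
Q = 3.0e7 * 0.8 ** np.arange(15)
value = npv(Q, L_w=2000.0, C_fracture=20.0e6)
```

For example, a 2000 m lateral costs 19.923 × 10⁶ + 2269 × 2000 = 24.461 × 10⁶ CNY to drill under this linear cost model, which is subtracted in full at time zero.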
In multistage fracturing of horizontal wells, large volumes of fracturing fluid and proppant are pumped into the formation. After the fracturing pressure is released, the proppant holds the created fractures open, preserving a high-permeability flow path for formation fluids. Therefore, the fracturing cost is determined primarily by the fracturing fluid and proppant. The fracturing cost is defined as
| 22 |
where cf and cp are the fracturing fluid and proppant costs per fracturing section, respectively; nf,j is the number of fracturing sections, which is determined by the horizontal section length and the fracture spacing; Cfix is the fixed fracturing cost per well; rp is the proppant concentration in the fracturing fluid; and Fin,j is the total volume of fracturing fluid. Typically, xf and Fin,j are related nonlinearly. One reason is that as the fracture half-length increases, the contact area between the fracturing fluid and the formation grows, increasing fluid leakoff during injection; another is that formation frictional resistance reduces the injection pressure as the fracture half-length increases. Consequently, a linear increase in injection volume does not produce a linear increase in fracture half-length. In this study, following refs 31 and 32, we regressed the relationship between Fin,j and xf from statistics of actual fracturing data as follows
| 23 |
Production forecasting and optimal design should be treated as an integrated system. With advances in computer technology, researchers combine optimization algorithms with proxy models to address production problems such as well placement and fracturing design optimization.33−36 Once the proxy model is sufficiently accurate, the optimization results depend mainly on the optimization algorithm.
Guyaguler et al.37 proposed a hybrid genetic algorithm (HGA) that reduces the computational burden of running many simulations by combining GA with kriging. They obtained results consistent with observations from the Pompano field in the Gulf of Mexico and also applied the HGA to the PUNQ-S3 reservoir. Ma et al.38 applied gradient-based finite difference (FD), discrete simultaneous perturbation stochastic approximation (DSPSA), and GA to the hydraulic fracture placement problem. Plaksina et al.39 proposed an algorithm that provides quantitative and qualitative measures of the “goodness” of optimal production plans; it handles multiple objectives effectively and produces Pareto-optimal solutions without requiring the user to assign weights to each objective in an aggregate function. Li et al.30 proposed a dynamic simplex interpolation-based alternating subspace (DSIAS) search method for the mixed-integer optimization problem of shale gas development projects and validated its optimization performance in a case study of Barnett Shale gas field development.
4. Results and Discussion
In this section, we apply the SaDE-LSSVM model to an actual block (the lower Silurian Longmaxi Formation in the southern Sichuan Basin). First, a production proxy model is built from the self-adaptive response surface experimental design, using a simulation model history-matched to regional production data. The applicability of the proxy model is then verified, showing that it remains efficient and accurate when the data set is updated.
4.1. Model Building and Experiment Designing
The main target formation in the study area is the Longmaxi shale gas formation, with a reservoir thickness of 30 m, a depth of about 2500–3000 m, an average porosity of 4%, an average permeability of 3 × 10⁻⁵ mD, a low pressure coefficient, and well-developed faults and microfractures. The reservoir is assumed to be homogeneous, with uniform fracture spacing and porosity, and permeability independent of stress. Single-phase gas flow is assumed, and gas desorption is taken into account. Detailed reservoir parameters are listed in Table 2.
Table 2. Parameters Used in Reservoir Simulation.
| parameter | value | unit |
|---|---|---|
| porosity | 4 | % |
| matrix permeability | 3.6 × 10⁻⁵ | mD |
| gas saturation | 65 | % |
| natural fracture spacing | 1 | m |
| hydraulic fracture half-length | 105 | m |
| hydraulic fracture height | 30 | m |
| fracture conductivity | 1.2 | mD·m |
Self-adaptive response surface experimental design (SaRSE) was applied based on six uncertain parameters with reasonable ranges of parameter values:
porosity (φ)
matrix permeability (k)
fracture half-length (xf)
horizontal section length (lw)
gas content saturation (Sg)
In SaRSE, the number of experiments is determined by the response surface regression accuracy: experiments are added until the specified accuracy is reached. The resulting experimental data better reflect the actual reservoir characteristics and flow behavior. Numerical simulation software was used to calculate the 15-year cumulative gas production (Cg). Part of the experimental design results is shown in Supporting Information Table S1.
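The self-adaptive idea behind SaRSE (keep adding runs until the response surface reaches a specified accuracy) can be sketched in Python with a full quadratic response surface fitted by least squares. Several choices here are assumptions for illustration, since the text does not specify them: the accuracy check is the R² of the current surface on each new batch of runs, new runs are sampled uniformly in a unit hypercube, and `simulator` is a cheap analytic stand-in for a reservoir simulation. All names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement


def quadratic_features(X):
    """Second-order response surface terms: 1, x_j, and x_j * x_k for j <= k."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(d)]
    cols += [X[:, j] * X[:, k] for j, k in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)


def fit_surface(X, y):
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return coef


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def adaptive_response_surface(simulator, d, target_r2=0.99, batch=5, max_runs=200, seed=0):
    rng = np.random.default_rng(seed)
    n_terms = 1 + d + d * (d + 1) // 2
    X = rng.random((n_terms + batch, d))       # start just above the number of terms
    y = np.array([simulator(x) for x in X])
    score = -np.inf
    while len(y) < max_runs:
        coef = fit_surface(X, y)
        X_new = rng.random((batch, d))         # new runs also serve as an accuracy check
        y_new = np.array([simulator(x) for x in X_new])
        score = r2(y_new, quadratic_features(X_new) @ coef)
        X, y = np.vstack([X, X_new]), np.concatenate([y, y_new])
        if score >= target_r2:                 # specified accuracy reached: stop adding runs
            break
    return fit_surface(X, y), X, y, score


def simulator(x):
    """Hypothetical analytic stand-in for one reservoir simulation run."""
    return 1.0 + 2.0 * x[0] - x[1] + 0.5 * x[0] * x[1] + x[0] ** 2


coef, X_runs, y_runs, score = adaptive_response_surface(simulator, d=2)
```

Because the stand-in response is exactly quadratic, the first accuracy check already passes and only 16 runs are used; a real simulator response would generally require more batches, and could fail to reach the target, which is the situation in which a more flexible proxy such as LSSVM becomes attractive.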
4.2. Proxy Model Accuracy
The response surface regression formula is obtained from SaRSE, and its regression results are shown in Figure 4. Because the response surface formula predicts with low accuracy, a different method is needed to build the proxy model.
Figure 4.

Regression results of adaptive response surface.
Cross-validation is introduced to improve the model’s reliability. The total data set was divided into a training set, a validation set, and a test set. Specifically, 25% of the data set was first randomly selected as the test set and excluded from training and validation. The remaining 75% was then randomly divided into 10 parts, and fivefold cross-validation was applied: in each fold, 60% of the total data were used for training and 15% for validation.
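The splitting scheme can be sketched in Python: 25% of the samples are held out for testing, the remaining 75% are split into 10 parts, and each of the 5 folds validates on 2 parts (about 15% of all data) and trains on the other 8 (about 60%). Grouping the 10 parts into consecutive pairs is an assumption about how the parts map to folds; `split_indices` is a hypothetical helper.

```python
import numpy as np


def split_indices(n, test_frac=0.25, n_parts=10, n_folds=5, seed=0):
    """Return test indices and a list of (train, validation) index pairs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    test_idx, rest = idx[:n_test], idx[n_test:]
    parts = np.array_split(rest, n_parts)
    per_fold = n_parts // n_folds              # 2 parts per validation fold
    folds = []
    for k in range(n_folds):
        val_idx = np.concatenate(parts[k * per_fold:(k + 1) * per_fold])
        train_idx = np.concatenate([p for j, p in enumerate(parts) if j // per_fold != k])
        folds.append((train_idx, val_idx))
    return test_idx, folds


test_idx, folds = split_indices(100)
```

The test indices never appear in any training or validation fold, so the test error reported for the final model is not influenced by hyperparameter selection.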
Figure 5 compares the predicted and target Cg for the training, validation, test, and total data sets. The errors on the training and test data sets differ only slightly, indicating that the training process is reliable (no overfitting).
Figure 5.
SaDE-LSSVM comparison of predicted and target Cg for (a) training set, (b) validation set, (c) testing set, and (d) total set.
The predictions agree well with the target values, and the accuracy is within the acceptable range. We compared the SaDE-LSSVM model with the PSO-LSSVM, GA-LSSVM, and GWO-LSSVM models (Figures 6–8); the SaDE-optimized LSSVM achieves the highest prediction accuracy. Table 3 lists a more comprehensive set of error metrics.
Figure 6.
PSO-LSSVM comparison of predicted and target Cg for (a) training set, (b) validation set, (c) testing set, and (d) total set.
Figure 7.
GA-LSSVM comparison of predicted and target Cg for (a) training set, (b) validation set, (c) testing set, and (d) total set.
Figure 8.
GWO-LSSVM comparison of predicted and target Cg for (a) training set, (b) validation set, (c) testing set, and (d) total set.
Table 3. Comparison of LSSVM Model Errors (RMSE and RE).
| data set | metric | SaDE-LSSVM | PSO-LSSVM | GA-LSSVM | GWO-LSSVM |
|---|---|---|---|---|---|
| training set | RMSE | 0.002264 | 0.005254 | 0.007137 | 0.005262 |
|  | RE | 0.001350 | 0.002044 | 0.002488 | 0.002047 |
| validation set | RMSE | 0.002610 | 0.005651 | 0.007552 | 0.005659 |
|  | RE | 0.001368 | 0.002677 | 0.003418 | 0.002680 |
| testing set | RMSE | 0.002955 | 0.007006 | 0.009437 | 0.007016 |
|  | RE | 0.001429 | 0.002632 | 0.003381 | 0.002636 |
| total set | RMSE | 0.002503 | 0.005780 | 0.007808 | 0.005788 |
|  | RE | 0.001372 | 0.002290 | 0.002856 | 0.002293 |
4.3. Sensitivity Analysis and Optimization Results
A first sensitivity analysis was carried out using the experimental design in Supporting Information Table S1. The F-value from analysis of variance is a good indicator of the degree of influence of each parameter. Figure 9 shows that all investigated parameters influence Cg and NPV to different degrees, and that Cg and NPV respond to parameter variations in the same way. Cg and NPV are highly sensitive to the geological properties φ and Sg, whereas the effect of k on Cg is relatively small. Because shale gas is produced mainly through hydraulic fractures with high flow capacity, changing the matrix permeability does not significantly change production; this is also why hydraulic fracturing is used to develop shale gas.
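The use of an ANOVA F-value as a sensitivity indicator can be illustrated with a partial F-test on a first-order (linear) regression: for each factor, the F-value measures how much the residual sum of squares grows when that factor is removed, relative to the residual variance of the full model. This is a simplified sketch; the response surface ANOVA in the study likely also includes quadratic and interaction terms, and the synthetic data below are hypothetical. `partial_f_values` is a hypothetical helper.

```python
import numpy as np


def partial_f_values(X, y):
    """Partial F statistic for each factor of a linear model y ~ 1 + x_1 + ... + x_d."""
    n, d = X.shape

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        return float(resid @ resid)

    full = np.column_stack([np.ones(n), X])
    rss_full = rss(full)
    dof = n - full.shape[1]                        # residual degrees of freedom
    F = np.empty(d)
    for j in range(d):
        reduced = np.delete(full, j + 1, axis=1)   # drop factor j (column 0 is the intercept)
        F[j] = (rss(reduced) - rss_full) / (rss_full / dof)   # one term removed
    return F


rng = np.random.default_rng(6)
X = rng.random((200, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.1, 200)   # the third factor has no effect
F = partial_f_values(X, y)
```

The ranking of the F-values recovers the ranking of influence built into the synthetic data: the strongly weighted factor has by far the largest F, and the irrelevant factor has an F-value near the level expected from noise alone.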
Figure 9.
Degree of influence of geological and engineering parameters on (a) Cg and (b) NPV.
A second sensitivity study was then performed for each influencing parameter, based on the experimental data in Supporting Information Table S1, to determine the effect of individual parameters on Cg and NPV. The effect patterns are shown in Figure 10. Increasing ⌀ or Sg directly increases Cg without any additional cost, so NPV follows the same trend as Cg. In contrast, increasing xf increases Cg but also raises the fracturing cost Cfracture, so Cg and NPV follow different trends. For lw, both Cg and the drilling cost Cdrill vary almost linearly with lw, so Cdrill does not constrain NPV; as a result, NPV follows the same trend as Cg, and the optimal lw takes an extreme value of its range.
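The trade-off described above can be reproduced with a toy model in which every functional form and coefficient is hypothetical and none is taken from the study: revenue from Cg saturates with xf while Cfracture grows linearly, and both revenue and Cdrill grow linearly with lw. Under these assumptions NPV peaks at an interior xf but keeps rising with lw, so a grid search places the NPV-optimal lw at the upper bound of its range.

```python
import math

# Hypothetical coefficients (arbitrary units), chosen only to illustrate the trends.
GAS_VALUE_PER_M_LW = 0.001   # revenue per metre of lateral length lw
FRAC_COST_PER_M = 0.0025     # fracturing cost per metre of half-length xf
DRILL_COST_PER_M = 0.0004    # drilling cost per metre of lateral length lw


def revenue_from_xf(xf: float) -> float:
    """Concave (saturating) revenue from Cg as xf grows."""
    return 1.0 - math.exp(-xf / 100.0)


def npv_vs_xf(xf: float) -> float:
    """Revenue minus fracturing cost: rises, then falls once cost outpaces the gain."""
    return revenue_from_xf(xf) - FRAC_COST_PER_M * xf


def npv_vs_lw(lw: float) -> float:
    """Revenue and drilling cost are both linear in lw, so NPV is linear as well."""
    return GAS_VALUE_PER_M_LW * lw - DRILL_COST_PER_M * lw


xf_grid = range(50, 301, 5)       # hypothetical xf range (m)
lw_grid = range(1000, 2001, 50)   # hypothetical lw range (m)

best_xf = max(xf_grid, key=npv_vs_xf)
best_lw = max(lw_grid, key=npv_vs_lw)
print(f"NPV-optimal xf on the grid: {best_xf} m")
print(f"NPV-optimal lw on the grid: {best_lw} m")
```

Here revenue keeps increasing with xf, yet NPV does not, which mirrors the divergence between the Cg and NPV trends for xf in Figure 10.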
Figure 10.
Effects of (a) ⌀, (c) Sg, (e) xf, and (g) lw on Cg, and effects of (b) ⌀, (d) Sg, (f) xf, and (h) lw on NPV.
In this study, the SaDE-LSSVM proxy model, which has the highest accuracy among the models compared, is applied to single-well NPV optimization with xf and lw as the optimization variables. The optimal values are xf = 155 m and lw = 2000 m, the latter consistent with the extreme-value behavior of lw described above.
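This section does not state which optimizer searches the proxy for the NPV-optimal (xf, lw). As a sketch under that gap, the code below maximizes a hypothetical proxy with classic DE/rand/1/bin using fixed F and CR. SaDE (Qin et al.) additionally adapts its mutation strategies and control parameters during the run, which is omitted here. The function `proxy_npv` and the bounds are hypothetical stand-ins for the trained SaDE-LSSVM model and do not reproduce the reported optimum.

```python
import numpy as np


def proxy_npv(x: np.ndarray) -> float:
    """Hypothetical stand-in for a trained proxy model: NPV as a function of (xf, lw)."""
    xf, lw = x
    return float(1.0 - np.exp(-xf / 100.0) - 0.0025 * xf + 0.0006 * lw)


def de_maximize(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Classic DE/rand/1/bin with fixed F and CR, maximizing f inside box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)   # uniform initial population
    fit = np.array([f(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals, all different from i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover; force at least one coordinate from the mutant.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: keep the trial if it is at least as good.
            f_trial = f(trial)
            if f_trial >= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmax(fit))
    return pop[best], fit[best]


bounds = [(50.0, 300.0), (1000.0, 2000.0)]   # hypothetical (xf, lw) ranges in m
x_best, npv_best = de_maximize(proxy_npv, bounds)
print(f"xf = {x_best[0]:.1f} m, lw = {x_best[1]:.1f} m, NPV = {npv_best:.4f}")
```

Because each proxy call is cheap, the thousands of evaluations such a population-based search needs are affordable, which is the motivation for replacing the simulator in this loop.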
4.4. Practical Application of Proxy Model
Because the numerical simulation is itself a simplified mathematical model, it is comparatively easy for LSSVM to fit its output closely, so the reliability of the LSSVM method should also be tested with actual field data. Therefore, to further validate the applicability of the proxy models, we developed an estimated ultimate recovery (EUR) proxy model and an absolute open flow (AOF) proxy model for the study area based on actual data. Data from 16 production wells were selected as training samples.
The characteristic (input) variables are formation pressure, cluster number, fracturing fluid volume, proppant amount, ⌀, k, Sg, rejection rate, fracturing pressure, and pumping stop pressure. Five wells were also selected as the test set to verify the prediction accuracy of the proxy models. Figure 11 compares the predictions of the SaDE-LSSVM and PSO-LSSVM models, and Table 4 gives a comprehensive set of error metrics. SaDE-LSSVM gives lower errors on every EUR metric, whereas PSO-LSSVM gives lower RMSE for QAOF on both the training and testing sets. The errors of both proxy models are within acceptable limits, indicating that actual production data can reliably be used to build proxy models.
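The LSSVM regression underlying these proxies reduces training to a single linear system (Suykens and Vandewalle): with kernel matrix K and regularization constant γ, one solves [[0, 1ᵀ], [1, K + I/γ]]·[b, α] = [0, y] and predicts ŷ(x) = Σ αₖ K(x, xₖ) + b. The sketch below assumes an RBF kernel of width σ, since the kernel is not stated in this section; γ and σ are the two hyperparameters that SaDE would tune, and the values used here are arbitrary. Synthetic, already-standardized data stand in for the 16 training and 5 test wells with 10 features.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / sigma^2)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)


class LSSVMRegressor:
    """LS-SVM regression: training reduces to one (n+1) x (n+1) linear system."""

    def __init__(self, gamma: float = 10.0, sigma: float = 1.0):
        self.gamma = gamma   # regularization constant (hyperparameter)
        self.sigma = sigma   # RBF kernel width (hyperparameter)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = X.shape[0]
        # Assemble [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y].
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        solution = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b, self.alpha = solution[0], solution[1:]
        self.X_train = X
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.sigma)
        return K @ self.alpha + self.b


# Synthetic stand-in for field data: 21 wells x 10 standardized features, split 16 / 5.
rng = np.random.default_rng(42)
X = rng.normal(size=(21, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=21)
X_train, X_test, y_train, y_test = X[:16], X[16:], y[:16], y[16:]

model = LSSVMRegressor(gamma=100.0, sigma=3.0).fit(X_train, y_train)
test_rmse = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
print(f"test RMSE = {test_rmse:.4f}")
```

With so few wells, the choice of γ and σ strongly affects test error, which is why an automatic hyperparameter search such as SaDE is attractive for small field data sets.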
Figure 11.
Predicted results of QAOF and EUR under SaDE-LSSVM and PSO-LSSVM, respectively.
Table 4. Comparison of Errors in Predicting QAOF and EUR by LSSVM Models.
| target | data set | metric | SaDE-LSSVM | PSO-LSSVM |
|---|---|---|---|---|
| QAOF | training set | RMSE | 0.039221 | 0.022444 |
| QAOF | training set | RE | 0.828875 | 0.425939 |
| QAOF | testing set | RMSE | 0.030152 | 0.019894 |
| QAOF | testing set | RE | 0.633160 | 0.812785 |
| EUR | training set | RMSE | 0.013228 | 0.054516 |
| EUR | training set | RE | 0.032447 | 0.042602 |
| EUR | testing set | RMSE | 0.098707 | 0.100341 |
| EUR | testing set | RE | 0.079433 | 0.100720 |
Conclusions
In this study, a proxy model of shale gas production, SaDE-LSSVM, is developed by combining an evolutionary optimization algorithm (SaDE) with machine learning (LSSVM). Comparison of its errors with those of PSO-, GA-, and GWO-optimized LSSVM models shows that the proxy model is highly accurate, and it remains applicable when the data set is updated. The main conclusions are as follows:
(1) The developed adaptive proxy modeling method is highly accurate because the self-adaptive response surface design selects a sufficient number of samples for training.
(2) The developed proxy model reproduces the reservoir simulator response with acceptable accuracy and can therefore replace numerical simulation in production optimization, which requires many iterations over multiple control parameters. This is particularly important for production optimization applications.
(3) The proxy models can be easily updated whenever new data become available, including geological, drilling, fracturing, or other custom design-of-experiment data. Because proxy models can be built quickly, this data-driven approach has strong potential for application to other shale reservoir data sets.
Acknowledgments
The authors are grateful for the financial support provided by the National Major Science and Technology Project (2016ZX05061) and research project funded by SINOPEC (P19017-3).
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.1c05158.
Complete experimental design results (Table S1) (XLSX)
The authors declare no competing financial interest.
References
- Li W.; Lu S.; Li J.; Wei Y.; Feng W.; Zhang P.; Song Z. Geochemical modeling of carbon isotope fractionation during methane transport in tight sedimentary rocks. Chem. Geol. 2021, 566, 120033. 10.1016/j.chemgeo.2020.120033.
- Chen G.; Gang W.; Chang X.; Wang N.; Zhang P.; Cao Q.; Xu J. Paleoproductivity of the Chang 7 unit in the Ordos Basin (North China) and its controlling factors. Palaeogeogr., Palaeoclimatol., Palaeoecol. 2020, 551, 109741. 10.1016/j.palaeo.2020.109741.
- He T.; Lu S.; Li W.; Wang W.; Sun D.; Pan W.; Zhang B. Geochemical characteristics and effectiveness of thick, black shales in southwestern depression, Tarim Basin. J. Pet. Sci. Eng. 2020, 185, 106607. 10.1016/j.petrol.2019.106607.
- Hu T.; Pang X.; Jiang F.; Wang Q.; Liu X.; Wang Z.; Jiang S.; Wu G.; Li C.; Xu T.; et al. Movable oil content evaluation of lacustrine organic-rich shales: Methods and a novel quantitative evaluation model. Earth-Sci. Rev. 2021, 214, 103545. 10.1016/j.earscirev.2021.103545.
- Wang H.; Qiao L.; Lu S.; Chen F.; He X.; Zhang J.; He T.; et al. A novel shale gas production prediction model based on machine learning and its application in optimization of multistage fractured horizontal wells. Front. Earth Sci. 2021, 9, 675. 10.3389/feart.2021.726537.
- Zou C.; Zhu R.; Chen Z.; Ogg J. G.; Wu S.; Dong D.; Qiu Z.; Wang Y.; Wang L.; Lin S.; et al. Organic-matter-rich shales of China. Earth-Sci. Rev. 2019, 189, 51–78. 10.1016/j.earscirev.2018.12.002.
- Wang D.; Wang X.; Ge H.; Sun D.; Yu B. Insights into the Effect of Spontaneous Fluid Imbibition on the Formation Mechanism of Fracture Networks in Brittle Shale: An Experimental Investigation. ACS Omega 2020, 5, 8847–8857. 10.1021/acsomega.0c00452.
- Kulga B.; Artun E.; Ertekin T. Development of a data-driven forecasting tool for hydraulically fractured, horizontal wells in tight-gas sands. Comput. Geosci. 2017, 103, 99–110. 10.1016/j.cageo.2017.03.009.
- Zhang X.; Du C.; Deimbacher F.; Crick M.; Harikesavanallur A. K. In Sensitivity Studies of Horizontal Wells with Hydraulic Fractures in Shale Gas Reservoirs, International Petroleum Technology Conference, 2009.
- Jia H.; Meng Y. Multiple fracturing of horizontal well in shale gas productivity factors numerical simulation researching. Pet. Ind. Appl. 2013, 32, 34–39.
- Hassani H.; Sarkheil H.; Foroud T.; Kaimpooli S. In A Proxy Modeling Approach to Optimization Horizontal Well Placement, 45th U.S. Rock Mechanics/Geomechanics Symposium, 2011; pp 26–29.
- Schuetter J.; Mishra S.; Zhong M.; LaFollette R. A Data-Analytics Tutorial: Building Predictive Models for Oil Production in an Unconventional Shale Reservoir. SPE J. 2018, 23, 1075–1089. 10.2118/189969-PA.
- Zhang L.; Li Z. P.; Lai F. P.; Li H.; Adenutsi C. D.; Wang K.; Yang S.; Xu W. Integrated optimization design for horizontal well placement and fracturing in tight oil reservoirs. J. Pet. Sci. Eng. 2019, 178, 82–96. 10.1016/j.petrol.2019.03.006.
- Wang S.; Chen Z.; Chen S. Applicability of deep neural networks on production forecasting in Bakken shale reservoirs. J. Pet. Sci. Eng. 2019, 179, 112–125. 10.1016/j.petrol.2019.04.016.
- Brantson E. T.; Ju B.; Ziggah Y. Y.; Akwensi P.; Sun Y.; Wu D.; Addo B. J. Forecasting of Horizontal Gas Well Production Decline in Unconventional Reservoirs using Productivity, Soft Computing and Swarm Intelligence Models. Nat. Resour. Res. 2019, 28, 717–756. 10.1007/s11053-018-9415-2.
- Montgomery J. B.; O’Sullivan M. Spatial variability of tight oil well productivity and the impact of technology. Appl. Energy 2017, 195, 344–355. 10.1016/j.apenergy.2017.03.038.
- Zhou Q.; Dilmore R.; Kleit A.; Wang J. Y. Evaluating Gas Production Performances in Marcellus Using Data Mining Technologies. J. Pet. Sci. Eng. 2014, 20, 109–120.
- Lolon E.; Hamidieh K.; Weijers L.; Mayerhofer M.; Mechler H.; Oduda O. In Evaluating the Relationship between Well Parameters and Production Using Multi-Variate Statistical Models: A Middle Bakken and Three Forks Case History, SPE Hydraulic Fracturing Technology Conference, 2016.
- Prabusankarlal K. M.; Thirumoorthy P.; Manavalan R. Classification of breast masses in ultrasound images using self-adaptive differential evolution extreme learning machine and rough set feature selection. J. Med. Imaging 2017, 4, 024507. 10.1117/1.JMI.4.2.024507.
- Yu W.; Kamy S. Optimization of Multiple Hydraulically Fractured Horizontal Wells in Unconventional Gas Reservoirs. J. Petrol. Sci. Eng. 2013, 2013, 1–16. 10.1155/2013/151898.
- Yu W.; Sepehrnoori K. An Efficient Reservoir-Simulation Approach to Design and Optimize Unconventional Gas Production. J. Can. Petrol. Technol. 2014, 53, 109–121. 10.2118/165343-PA.
- Nguyen-Le V.; Shin H. Development of reservoir economic indicator for Barnett Shale gas potential evaluation based on the reservoir and hydraulic fracturing parameters. J. Nat. Gas. Sci. Eng. 2019, 66, 159–167. 10.1016/j.jngse.2019.03.024.
- Suykens J. A. K.; Vandewalle J. Least Squares Support Vector Machine Classifiers. Neural Process Lett. 1999, 9, 293–300. 10.1023/A:1018628609742.
- Fonseca R. M.; Leeuwenburgh O.; Rossa E. D.; Paul M. J.; Jansen J. D. In Ensemble-Based Multi-Objective Optimization of On-Off Control Devices Under Geological Uncertainty, SPE Reservoir Simulation Symposium, 2015.
- Awotunde A. A. In On the Joint Optimization of Well Placement and Control, Society of Petroleum Engineers SPE Saudi Arabia Section Technical Symposium and Exhibition, 2014.
- Oliveira D. F.; Reynolds A. An Adaptive Hierarchical Multiscale Algorithm for Estimation of Optimal Well Controls. SPE J. 2014, 19, 909–930. 10.2118/163645-PA.
- Wilson K. C.; Durlofsky L. J. Optimization of shale gas field development using direct search techniques and reduced-physics models. J. Pet. Sci. Eng. 2013, 108, 304–315. 10.1016/j.petrol.2013.04.019.
- Xu S.; Feng Q.; Wang S.; Farzam J.; Li Y. Optimization of multistage fractured horizontal well in tight oil based on embedded discrete fracture model. Comput. Chem. Eng. 2018, 117, 291–308. 10.1016/j.compchemeng.2018.06.015.
- Wang X.; Peng X.; Zhang S.; Ying L.; Peng F.; Zeng F. Guidelines for Economic Design of Multistage Hydraulic Fracturing, Yanchang Tight Formation, Ordos Basin. Nat. Resour. Res. 2019, 29, 1413–1426. 10.1007/s11053-019-09500-w.
- Li J. C.; Gong B.; Wang H. G. Mixed integer simulation optimization for optimal hydraulic fracturing and production of shale gas fields. Eng. Optim. 2015, 1378–1400. 10.1080/0305215X.2015.1111002.
- Zhang M. L.; Zhang T. Y.; Fan J. Y. Calculation and analysis of fracture extension parameters based on PKN model. Sci. Technol. Eng. 2019, 19, 116–123.
- Wang W.; Chen Z. H.; Mei J. W.; Ren J. H.; Zeng Q. D. Post-fracturing numerical simulation for geology-engineering integration of normal pressure shale gas: A case study of the well area DP2. Pet. Geol. Recovery Effic. 2021, 28, 1–9.
- Qin A. K.; Huang V. L.; Suganthan P. N. Differential Evolution Algorithm with Strategy Adaptation for Global Numerical Optimization. IEEE Trans. Evol. Comput. 2009, 13, 398–417. 10.1109/TEVC.2008.927706.
- Pratimsarangi P.; Sahu A.; Panda M. A Hybrid Differential Evolution and Back-Propagation Algorithm for Feedforward Neural Network Training. Int. J. Comput. Appl. 2013, 84, 1–9. 10.5120/14641-2943.
- Knudsen B. R.; Foss B. Designing shale-well proxy models for field development and production optimization problems. J. Nat. Gas. Sci. Eng. 2015, 27, 504–514. 10.1016/j.jngse.2015.08.005.
- Marongiu-Porcu M.; Economides M. J.; Holditch S. A. Economic and physical optimization of hydraulic fracturing. J. Nat. Gas. Sci. Eng. 2013, 14, 91–107. 10.1016/j.jngse.2013.06.001.
- Guyaguler B.; Horne R. Optimization of Well Placement. J. Energy Resour. Technol. 2000, 122, 64–70. 10.1115/1.483164.
- Ma X.; Plaksina T.; Gildin E. In Optimization of Placement of Hydraulic Fracture Stages in Horizontal Wells Drilled in Shale Gas Reservoirs, Unconventional Resources Technology Conference, 2013; pp 1479–1489.
- Plaksina T.; Gildin E. Practical Handling of Multiple Objectives Using Evolutionary Strategy for Optimal Placement of Hydraulic Fracture Stages in Unconventional Gas Reservoirs. J. Nat. Gas. Sci. Eng. 2015, 27, 443–451. 10.1016/j.jngse.2015.06.049.