ACS Omega. 2021 Nov 22;6(48):32987–32999. doi: 10.1021/acsomega.1c05032

Modeling the Solubility of Sulfur in Sour Gas Mixtures Using Improved Support Vector Machine Methods

Yu-Chen Wang 1, Zheng-Shan Luo 1,*, Yi-Qiong Gao 1, Yu-Lei Kong 1
PMCID: PMC8655918  PMID: 34901650

Abstract


The study of sulfur solubility is of great significance to the safe development of sulfur-containing gas reservoirs. However, because of measurement difficulties, experimental data on sulfur solubility remain limited. Against this background of small samples and limited information, a weighted least-squares support vector machine (WLSSVM)-based machine learning model suitable for a wide temperature and pressure range is proposed to improve the prediction accuracy of sulfur solubility in sour gas. First, the comprehensive gray relational analysis method is used to extract the important factors affecting sulfur solubility as the model input parameters. Then, the whale optimization algorithm (WOA) and gray wolf optimizer (GWO) intelligence algorithms are used to find the optimal values of the penalty factor and kernel coefficient, which are substituted into three common kernel functions. The optimal kernel function is selected, and the final WOA-WLSSVM and GWO-WLSSVM models are established. Finally, four evaluation indicators and an outlier diagnostic method are introduced to test the proposed model's performance. The empirical results show that the WOA-WLSSVM model has better performance and reliability; its average absolute relative deviation is as low as 3.45%, its determination coefficient (R2) is as high as 0.9987, and its prediction accuracy is much higher than that of the other models.

1. Introduction

The amount of harmful gas emitted by natural gas combustion is far lower than that of other fossil energy sources, which plays an important role in supporting the low-carbon and green development of the world. At present, unconventional oil and gas (such as sour gas reservoirs) account for an increasing proportion of the world’s new oil and gas production and reserves. China’s proven geological reserves of sour gas with high H2S and CO2 exceed 5000 × 108 m3, accounting for approximately one-fourth of the total reserves of gas reservoirs in China.1 The discovery of large high-sulfur gas reservoirs such as the Luojiazhai Gas Field, Puguang Gas Field, Dukouhe Gas Field, Tieshanpo Gas Field, and Yuanba Gas Field provides an important gas source guarantee for the national “West-East Gas Pipeline Project.”2 The sulfur deposition damage of high-sulfur gas reservoirs is the main feature that distinguishes them from conventional gas reservoirs, and it is also one of the main factors that affect the economic benefits of high-sulfur gas field development. Since the 1950s, scholars from the United States, Canada, Germany, and other countries have successively carried out much research on sulfur deposition during the exploitation of sulfur-bearing gas fields. They believe that sulfur solubility is an important condition for identifying sulfur deposition, so accurate prediction of sulfur solubility in sour gas is very important for the development of the sulfur gas field.3,4

At present, there are four methods for obtaining the solubility of sulfur in sulfur-containing gas: experimental measurement, equations of state (EOS), empirical models, and machine learning methods. As early as 1960, Kennedy conducted the first experiments on the solubility of elemental sulfur in single-component and multicomponent mixed gases. In China, since 1990, Gu Mingxing, Zeng Ping, Yang Xuefeng, Bian Xiaoqiang, Sun Changyu, Hu Jinghong, and others have also analyzed the solubility of elemental sulfur.5 Sulfur solubility experiments usually need to be carried out at high temperatures (303.2–433.15 K) and high pressures (6.7–155 MPa), and H2S is toxic and corrosive, making such experiments difficult. Therefore, experimental data on sulfur solubility are scarce and valuable compared with other solubility data and are an important basis for subsequent studies. EOS and empirical formulas are not only computationally demanding but also have certain limitations.6,7 Machine learning (ML), as a relatively young and important branch of artificial intelligence, can now also be used to predict sulfur solubility and has gradually revealed its excellent performance and practicality.8 Table 1 compares several ML methods for predicting sulfur solubility. Most previous studies used an artificial neural network (ANN) to make predictions, for example, feedforward neural networks (Mohammadi),9 the GA-LM-BP hybrid model (Chen),10 and the cascaded forward neural network (CFNN) hybrid model (Amar M N).11 Although the ANN is an efficient and long-established ML model, the complexity of the model itself (the increase in layers and parameters) necessitates a large amount of data for training. However, the precipitation of sulfides in sour gas reservoirs is a long-term process, and it is difficult to obtain comprehensive first-hand data.
Therefore, compared with the ANN, the support vector machine (SVM), an ML model suitable for small samples and poor information, is more in line with the background of sulfur solubility research. To date, it has been uncommon for scholars to use SVM to calculate sulfur solubility. For the first time, Bian et al.12,13 combined the gray wolf optimizer (GWO) algorithm with a least-squares support vector machine (LSSVM) and used 70% of the experimental sulfur solubility data in 184 groups of mixed gases as the training set to train the LSSVM. The model’s average absolute relative deviation (AARD) = 3.5029% and R2 = 0.9976 showed excellent predictive performance. In addition, Liu et al.14 used SVR to predict the thermodynamic properties of pure fluids and their mixtures and also obtained ideal and excellent prediction results.

Table 1. Comparison of Several Machine Learning Methods for Sulfur Solubility Prediction.

ML model | differences
Mohammadi (2008) | a feedforward neural network (FNN) is first used to predict the dissolution of sulfur in pure H2S at high temperatures (316–433 K) and high pressure (60 MPa). The results show that the average relative error between the predicted and experimental values is 6.1%.
Chen (2014) | a GA-LM-BP ANN model is proposed, and 74 sets of data are used to train and test the model. The simulation results show that the average absolute relative deviation (AARD) between the training results and measured values is 5.90%, and the AARD for the test results is 5.54%.
Bian (2019) | using the GWO-LSSVM hybrid model, five influencing factors are considered. This model shows good performance, with the minimum average absolute relative deviation (AARD = 3.5029%) and the maximum determination coefficient (R2 = 0.9976) over all 239 data points (for pure H2S and sour gas).
Amar M. N. (2020) | three models (CFNN, GEP, and MLP) are established, and it is concluded that for calculating the solubility of sulfur in pure H2S and in sour gas, the cascaded forward neural network (CFNN) prediction model outperforms the other methods. The overall RMSE values of the CFNN model are 3.8101 and 0.0232, respectively.

In summary, the experimental measurement method has a long cycle, high cost, and low safety, while the EOS and empirical models have low universality and excessive calculation. Among ML methods, prediction models based on the ANN have been widely used in research on sulfur solubility and have excellent practical performance, whereas prediction models based on SVM have received little previous attention and have broad development prospects in sulfur solubility prediction.13 It is important to note that although ML methods allow direct modeling from existing data, the development of the other methods remains encouraged and invaluable and should not be superseded.

In this work, a comprehensive gray relational analysis (CGRA) method that combines difference and division methods is first constructed to screen out the main factors affecting sulfur solubility and thereby determine the model input parameters.15 Then, an SVM-based hybrid machine learning model (WOA&GWO-WLSSVM) is proposed to predict sulfur solubility in sour gas. The input parameters of the model are reservoir temperature, pressure, and the mole fractions of CH4, H2S, and CO2, and the target parameter is sulfur solubility. The model is developed and tested using 245 data sets from the public literature, evaluated by four statistical indicators (average absolute relative deviation (AARD), root-mean-squared error (RMSE), standard deviation (SD), and R2), and compared with the prediction results of three empirical formulas and three ML models. After rigorous calculations, the results show that the AARD and R2 of WOA-WLSSVM reached 3.45% and 0.9987, respectively, both superior to those of the other models, indicating that the model performs well and predicts more accurately. In addition, outlier diagnosis is carried out with the leverage method, and only individual data points fall outside the valid range, which proves that the model passes the statistical test and has good validity and reliability.

This research is organized as follows, and the research process is shown in Figure 1. In Section 2, the modeling technique is described in detail. Section 3 describes the data analysis and model training. In Section 4, the prediction results of the model are evaluated through statistical indicators and leverage methods, and a rigorous quantitative evaluation of the performance of the new model is conducted. Section 5 gives the conclusion of this research.

Figure 1. Research process description.

2. Modeling Techniques

2.1. CGRA: An Improvement Based on GRA

When dealing with problems that have complex interrelationships, we often lack complete information and sufficient data. The gray relational analysis (GRA) method does not require a large amount of sample data; it mainly focuses on the degree of relevance between the impact indices and the research question.16 Traditional GRA uses the absolute value of the difference between two data sequences to calculate the correlation degree. It considers only the degree of geometric similarity between data sequences and ignores the degree of numerical proximity.17 For example, if two curves are parallel, traditional GRA computes a correlation degree of 1 between them, even though their actual correlation is not 1, so the calculated correlation degree does not match reality. Therefore, a CGRA method that combines the difference and division methods is constructed; it uses both distance similarity and shape similarity to describe the degree of relevance, which addresses the disadvantage of traditional GRA of ignoring numerical proximity. To enhance the generalization ability and robustness of the weighted LSSVM (WLSSVM) model, CGRA is used to extract and analyze the features. The CGRA procedure is as follows:

Construction of the feature matrix: Let X0 be the quantity that characterizes the behavior of the system, where its observed value at sequence number k is x0(k); then, X0 = (x0(1), x0(2), ···, x0(m)) is called the characteristic behavior sequence of the system. Let Xi be a system factor, where its observed value at serial number k is xi(k); then, Xi = (xi(1), xi(2), ···, xi(m)) (i = 1, 2, ···, n) is the behavior sequence of the system's related factors. These n + 1 sequences form a characteristic matrix of order m × (n + 1), as shown in eq 1:

\[
X = \begin{pmatrix}
x_0(1) & x_1(1) & \cdots & x_n(1) \\
x_0(2) & x_1(2) & \cdots & x_n(2) \\
\vdots & \vdots & & \vdots \\
x_0(m) & x_1(m) & \cdots & x_n(m)
\end{pmatrix}
\tag{1}
\]

where m is the dimension of the eigenvector; n is the sample number; the subscript k = 1,2, ···, m; and i = 1,2, ···, n (the same is true below).

Calculation of the difference matrix: The difference between each component of the characteristic behavior sequence of the system and the behavior sequence of the related factors is calculated to form a difference matrix, as shown in eq 2.

\[
\Delta = \bigl( \Delta x_{i0}(k) \bigr)_{m \times n}, \qquad
\Delta x_{i0}(k) = \bigl| x_0(k) - x_i(k) \bigr|
\tag{2}
\]

where Δxi0(k) represents the difference between the kth eigenvalue of the system feature and the kth eigenvalue of the ith sample in the sequence of related factors.

Δxi0 is introduced into the following formula to form the gray correlation degree of the shape similarity:

\[
r_i = \frac{1}{m} \sum_{k=1}^{m}
\frac{\Delta_{\min} + \rho \Delta_{\max}}{\Delta x_{i0}(k) + \rho \Delta_{\max}}
\tag{3}
\]

where \(\Delta_{\min} = \min_i \min_k \Delta x_{i0}(k)\), \(\Delta_{\max} = \max_i \max_k \Delta x_{i0}(k)\), and \(\rho \in (0, 1)\) is the resolution coefficient (usually taken as 0.5).

Calculation of the quotient matrix: The quotient of each component in the system characteristic behavior sequence and the related factor behavior sequence is calculated to form a quotient matrix, as shown in eq 4.

\[
\Delta' = \bigl( \Delta x'_{i0}(k) \bigr)_{m \times n}, \qquad
\Delta x'_{i0}(k) = \frac{x_0(k)}{x_i(k)}
\tag{4}
\]

where Δxi0′(k) represents the quotient of the kth eigenvalue of the system feature and the kth eigenvalue of the ith sample in the behavior sequence of related factors.

Δxi0′(k) is introduced into the following formula to form the gray correlation degree of the distance similarity:

\[
r'_i = \frac{1}{m} \sum_{k=1}^{m}
\frac{\Delta'_{\min} + \rho \Delta'_{\max}}{\bigl| \Delta x'_{i0}(k) - 1 \bigr| + \rho \Delta'_{\max}}
\tag{5}
\]

where \(\Delta'_{\min}\) and \(\Delta'_{\max}\) are the minimum and maximum of \(\bigl| \Delta x'_{i0}(k) - 1 \bigr|\) over all i and k.

Calculation of the comprehensive gray correlation degree: Combining eqs 3 and 5, the formula for the comprehensive gray relational degree is defined as follows:

\[
\gamma_i = \theta r_i + (1 - \theta) r'_i, \qquad \theta \in [0, 1]
\tag{6}
\]

where θ weights the shape-similarity and distance-similarity correlation degrees (θ = 0.5 gives them equal weight).
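The whole CGRA procedure can be condensed into a short Python sketch. The resolution coefficient `rho` and the shape/distance weight `theta` are illustrative defaults rather than values taken from this paper, and the exact combination rule may differ from the paper's eq 6.

```python
def cgra(x0, xs, rho=0.5, theta=0.5):
    """Comprehensive gray relational degree of each factor sequence in xs
    against the system behavior sequence x0 (a sketch; rho and theta are
    assumed defaults, not values from the paper)."""
    m = len(x0)
    # eq 2: difference matrix (shape similarity)
    diff = [[abs(x0[k] - xi[k]) for k in range(m)] for xi in xs]
    # eq 4: quotient matrix, measured by its departure from 1 (distance similarity)
    quot = [[abs(x0[k] / xi[k] - 1.0) for k in range(m)] for xi in xs]

    def gra(delta):
        # classic GRA relational coefficient averaged over the sequence
        lo = min(min(row) for row in delta)
        hi = max(max(row) for row in delta)
        return [sum((lo + rho * hi) / (d + rho * hi) for d in row) / m
                for row in delta]

    r_shape, r_dist = gra(diff), gra(quot)
    # weighted combination of the two correlation degrees
    return [theta * a + (1 - theta) * b for a, b in zip(r_shape, r_dist)]
```

A factor sequence that tracks the system sequence closely in both shape and magnitude receives a degree near 1; an unrelated sequence receives a much lower degree.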

2.2. WLSSVM: An Improvement Based on SVM

SVM is an ML model suitable for small samples and limited information. Because the precipitation of sulfide in acid gas reservoirs is long-term and continuous, comprehensive first-hand data are difficult to obtain, which matches the conditions under which SVM performs well. LSSVM is a special extension of SVM; although it reduces the computational complexity, it also reduces the robustness of the model.18 In 2002, Suykens proposed an improved LSSVM algorithm, WLSSVM. Its core idea is to assign weights to the training errors of LSSVM, which can effectively reduce the impact of noise in the training samples and improve the rate of convergence.19

WLSSVM is based on the LSSVM optimization problem but weights the error ξi of each item with a coefficient vi.20 The optimization problem can be described as eq 7:

\[
\min_{\omega, b, \xi} \; J(\omega, \xi) =
\frac{1}{2} \omega^{\mathrm{T}} \omega + \frac{\vartheta}{2} \sum_{i=1}^{N} v_i \xi_i^2
\quad \text{s.t.} \quad
y_i = \omega^{\mathrm{T}} \phi(x_i) + b + \xi_i, \; i = 1, 2, \cdots, N
\tag{7}
\]

where b is the threshold value; ω is the weight coefficient vector; ϕ( · ) is the mapping from the input space to a high-dimensional space; ϑ is the regularization parameter; ξi is the error sequence, and vi is the weight value, which is calculated according to the sample training error.

We introduce the Lagrange function:

\[
L(\omega, b, \xi, \alpha) = J(\omega, \xi) -
\sum_{i=1}^{N} \alpha_i \bigl( \omega^{\mathrm{T}} \phi(x_i) + b + \xi_i - y_i \bigr)
\tag{8}
\]

where αi (i = 1, 2, ···, N) are the Lagrange multipliers. According to the Karush–Kuhn–Tucker conditions:

\[
\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{N} \alpha_i \phi(x_i); \quad
\frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0; \quad
\frac{\partial L}{\partial \xi_i} = 0 \Rightarrow \alpha_i = \vartheta v_i \xi_i; \quad
\frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow \omega^{\mathrm{T}} \phi(x_i) + b + \xi_i - y_i = 0
\tag{9}
\]

In the feature space, the inner product operation in the mapping space is simplified by introducing a kernel function. There are three main types of kernel functions, as follows:

  • 1)

    Sigmoid kernel functions:

\[
K(x, x_i) = \tanh\bigl( \kappa (x \cdot x_i) + c \bigr)
\tag{10}
\]
  • 2)

    Polynomial kernel functions:

\[
K(x, x_i) = \bigl( (x \cdot x_i) + 1 \bigr)^{d}
\tag{11}
\]
  • 3)

    Radial basis function (RBF) kernel functions:

\[
K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^2}{2 \sigma^2} \right)
\tag{12}
\]

where σ is the parameter of the kernel function.
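For reference, the three candidate kernels can be written directly in Python. The parameters `kappa`, `c`, and `d` are illustrative choices, and `sigma2` corresponds to the kernel coefficient σ2 that is tuned later by the WOA and GWO.

```python
import math

def sigmoid_kernel(x, xi, kappa=1.0, c=1.0):
    # K1: tanh(kappa * <x, xi> + c); kappa and c are illustrative values
    return math.tanh(kappa * sum(a * b for a, b in zip(x, xi)) + c)

def polynomial_kernel(x, xi, d=2):
    # K2: (<x, xi> + 1)^d; d is the polynomial degree
    return (sum(a * b for a, b in zip(x, xi)) + 1) ** d

def rbf_kernel(x, xi, sigma2=1.0):
    # K3 (RBF): exp(-||x - xi||^2 / (2 * sigma^2)); sigma2 is the kernel coefficient
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma2))
```

The RBF kernel always equals 1 when x = xi and decays with distance, which is the locality property that later makes it the best-performing choice in Table 3.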

Then, the optimization problem for eq 7 can be transformed into the following problem:

\[
\begin{pmatrix} 0 & l_{1 \times N} \\ l_{N \times 1} & \Omega + V_{\vartheta} \end{pmatrix}
\begin{pmatrix} b \\ \alpha \end{pmatrix}
=
\begin{pmatrix} 0 \\ y \end{pmatrix}
\tag{13}
\]

where l1×N is the 1 × N row vector of ones and lN×1 is the N × 1 column vector of ones;

\[ y = (y_1, y_2, \cdots, y_N)^{\mathrm{T}} \tag{14} \]
\[ \alpha = (\alpha_1, \alpha_2, \cdots, \alpha_N)^{\mathrm{T}} \tag{15} \]
\[ \Omega_{ij} = \phi(x_i)^{\mathrm{T}} \phi(x_j) = K(x_i, x_j), \quad i, j = 1, 2, \cdots, N \tag{16} \]
\[ V_{\vartheta} = \mathrm{diag}\left( \frac{1}{\vartheta v_1}, \cdots, \frac{1}{\vartheta v_N} \right) \tag{17} \]

By solving eq 13, the expression of the WLSSVM model can be obtained as follows:

\[
y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b
\tag{18}
\]
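A minimal sketch of the resulting training procedure: build the linear system of eq 13 with an RBF kernel, solve it for b and α, and predict with eq 18. The toy data, the uniform weights, and the hyperparameter values are all illustrative, not taken from the paper.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rbf(x, xi, sigma2):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma2))

def wlssvm_train(X, y, weights, theta=10.0, sigma2=1.0):
    """Assemble and solve eq 13; the first unknown is b, the rest are alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]                 # top row: sum of alphas = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / (theta * weights[i])   # V on the diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]

    def predict(x):                         # eq 18
        return b + sum(a * rbf(x, xj, sigma2) for a, xj in zip(alpha, X))
    return predict
```

With a large penalty factor the regularization term on the diagonal becomes negligible and the model nearly interpolates the training targets, which is how the fit below is checked.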

2.3. Swarm-Based Algorithm

The swarm-based algorithm is an emerging class of intelligent algorithms that has attracted an increasing number of researchers. It has a close connection with artificial life, especially evolutionary strategies and genetic algorithms. Some classic intelligent algorithms are often used to optimize WLSSVM models, such as differential evolution, the GA, and the ant lion optimizer. Although these classic algorithms improve the performance of the WLSSVM to a certain extent, they do not easily escape local extremes, resulting in low accuracy.21 Compared with these algorithms, the GWO and WOA adopt a new search mechanism. They have the advantages of simple and fast calculation, few parameters, and strong global search capability, so they have a high probability of avoiding local extremes. They have also been used in various ML applications.22

2.3.1. GWO Algorithm

The GWO is a new type of swarm-based algorithm derived from the social hierarchy mechanism and hunting behavior of gray wolves in nature.23 At present, the GWO algorithm has been successfully applied to power systems, UAV path planning, economic dispatch assignment, PI controller optimization, workshop schedules, and other fields.24,25

In the GWO, there are four wolves of different social classes. α, β, and δ wolves are the first three categories (classes: α > β > δ), which play an important role in guiding the main search direction, and a large number of ω wolves attack prey at the lowest level. The algorithm mechanism is shown in Figure 2. The main optimization process can be divided into four stages.26

  • (1)

    Encircling prey

Figure 2. Principles of the GWO.

The position of each gray wolf in the search space is updated according to the position of the prey. The update equation is as follows:

\[
X(t + 1) = X_p(t) - A \cdot D
\tag{19}
\]

where t is the number of iterations, Xp is the position of the prey, X is the position of the gray wolf, and D is the distance between the prey and the gray wolf, which is defined as follows:

\[
D = \bigl| C \cdot X_p(t) - X(t) \bigr|
\tag{20}
\]

A and C are vector coefficients, and the calculation formulas are as follows:

\[ A = 2 a \cdot r_1 - a \tag{21} \]
\[ C = 2 \cdot r_2 \tag{22} \]

where a linearly decreases from 2 to 0 as the iteration progresses, and r1 and r2 are random numbers in [0,1].

  • (2)

    Hunting

According to the information of α wolves, β wolves, and δ wolves, the positions of individual gray wolves in the wolf pack are updated. The update formula is as follows:

\[
X(t + 1) = \frac{X_1 + X_2 + X_3}{3}
\tag{23}
\]

where X1, X2, and X3 are defined in eqs 24–26:

\[ X_1 = X_\alpha - A_1 \cdot D_\alpha \tag{24} \]
\[ X_2 = X_\beta - A_2 \cdot D_\beta \tag{25} \]
\[ X_3 = X_\delta - A_3 \cdot D_\delta \tag{26} \]

where Xα, Xβ, and Xδ are the three best solutions in the tth iteration, and Dα, Dβ, and Dδ are defined in eqs 27–29:

\[ D_\alpha = \bigl| C_1 \cdot X_\alpha - X \bigr| \tag{27} \]
\[ D_\beta = \bigl| C_2 \cdot X_\beta - X \bigr| \tag{28} \]
\[ D_\delta = \bigl| C_3 \cdot X_\delta - X \bigr| \tag{29} \]
  • (3)

    Attacking prey

Attacking prey is the final stage of the hunting process, which is equivalent to strengthening the local search during the search process. Through the above process, the wolf terminates the attack on the prey when the prey stops moving, which is also controlled by A and a. A change in A can be achieved by a change in a, and the interval of a is [0,2] in the whole iteration process. When |A| < 1, the wolf can move to any position between its current position and its prey. When |A| > 1, the wolves look for new spaces to find better prey.
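The stages above can be condensed into a short Python sketch. The population size, iteration count, and search bounds are illustrative, and the boundary clamping is an implementation convenience not described in the paper.

```python
import random

def gwo(f, dim, n_wolves=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Minimize f over [lb, ub]^dim with a basic grey wolf optimizer
    (eqs 19-29): alpha/beta/delta lead, and a decays linearly from 2 to 0."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    for t in range(iters):
        wolves.sort(key=f)                       # best three wolves lead the pack
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2.0 * (1 - t / iters)                # eq: a decreases from 2 to 0
        for i in range(n_wolves):
            new = []
            for d in range(dim):
                xs = []
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a          # eq 21
                    C = 2 * rng.random()                  # eq 22
                    D = abs(C * leader[d] - wolves[i][d])  # eqs 27-29
                    xs.append(leader[d] - A * D)           # eqs 24-26
                new.append(max(lb, min(ub, sum(xs) / 3)))  # eq 23, clamped
            wolves[i] = new
    return min(wolves, key=f)
```

On a simple convex test function such as the 2-D sphere, this sketch converges close to the origin within the default budget.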

2.3.2. WOA Algorithm

The WOA was also proposed by Professor Mirjalili,27 but it was slightly later than the GWO, so we can see some influence of the GWO on the WOA. Relatively speaking, the main feature of the WOA is the use of random individuals or optimal individuals to simulate the hunting behavior of humpback whales and the use of spirals to simulate the bubble-net attack mechanism of humpback whales.28,29

The predation process of the whale is summarized as follows:

  • (1)

    Encircling prey

When a whale is looking for prey, it should first determine the position of the prey and then encircle it. Assuming that the current optimal position is the target prey, the individuals in the group move to the optimal position. The vector D is the distance between an individual and the optimal whale position. The location is updated as eqs 30 and 31:

\[ D = \bigl| C \cdot X^*(t) - X(t) \bigr| \tag{30} \]
\[ X(t + 1) = X^*(t) - A \cdot D \tag{31} \]

where t is the current iteration number, X*(t) is the position of the best whale in generation t, and X(t) is the position of the whale in generation t.

The definitions of random vectors A and C are as follows:

\[ A = 2 a \cdot r - a \tag{32} \]
\[ C = 2 \cdot r \tag{33} \]

where r is a random vector in [0,1]; a = 2 – 2t/Tmax (Tmax is the maximum number of iterations.)

When |A| ≤ 1, the whale thinks that it has found its prey and can launch a bubble attack.

  • (2)

    Bubble-net attacking method

In the WOA, two whale predation methods are established, namely, the shrinking hunting method and the spiral bubble-net attacking method. Shrinking hunting is achieved by reducing the vector a (the size of vector A is in [−a, a]); when the spiral bubble-net attack is launched, the individual whales attack their prey in a spiral path. The updated position equation used is as follows:

\[
X(t + 1) = D' \cdot e^{bl} \cdot \cos(2 \pi l) + X^*(t)
\tag{34}
\]

where D′ = |X*(t) – X(t)| represents the distance between the whale and the current optimal position, the constant b represents the shape of the spiral, and l is a random number in [−1,1].

To simulate the attack of whale groups on prey, both shrinking envelopment and spiral paths are used. The WOA sets a probability p, where p is a random number in [0,1]. It is assumed that the probabilities of the whales using the two predation methods are both 0.5, and the iterative mathematical model of the whale position is as follows:

\[
X(t + 1) =
\begin{cases}
X^*(t) - A \cdot D, & p < 0.5 \\
D' \cdot e^{bl} \cdot \cos(2 \pi l) + X^*(t), & p \geq 0.5
\end{cases}
\tag{35}
\]
  • (3)

    Searching for prey

During the predation process, in addition to updating the position of the whale following the optimal position, the whale will randomly update its position; this forces the whale to have a larger search range so that the WOA has a better global search capability. When |A| ≥ 1, the whale conducts a random search for prey, and the mathematical expression at this stage is as follows:

\[ D = \bigl| C \cdot X_{\mathrm{rand}} - X \bigr| \tag{36} \]
\[ X(t + 1) = X_{\mathrm{rand}} - A \cdot D \tag{37} \]

where Xrand is the random agent position vector in the population.
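The three predation mechanisms can likewise be sketched in Python. Applying the |A| < 1 test per dimension rather than per whale is a simplification, greedy tracking of the best whale is an implementation choice, and all parameter values are illustrative.

```python
import math
import random

def woa(f, dim, n_whales=20, iters=100, lb=-5.0, ub=5.0, b=1.0, seed=0):
    """Minimize f with a basic whale optimization algorithm (eqs 30-37):
    shrinking encirclement, spiral bubble-net attack, and random search."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=f)[:]
    for t in range(iters):
        a = 2.0 * (1 - t / iters)                # a decreases linearly from 2 to 0
        for i in range(n_whales):
            p = rng.random()                     # eq 35: pick a predation mode
            new = []
            for d in range(dim):
                if p < 0.5:
                    A = 2 * a * rng.random() - a          # eq 32
                    C = 2 * rng.random()                  # eq 33
                    if abs(A) < 1:                        # shrinking encirclement
                        D = abs(C * best[d] - whales[i][d])        # eq 30
                        x = best[d] - A * D                        # eq 31
                    else:                                 # search around a random whale
                        rand = whales[rng.randrange(n_whales)]
                        D = abs(C * rand[d] - whales[i][d])        # eq 36
                        x = rand[d] - A * D                        # eq 37
                else:                                     # spiral bubble-net attack
                    l = rng.uniform(-1, 1)
                    Dp = abs(best[d] - whales[i][d])
                    x = Dp * math.exp(b * l) * math.cos(2 * math.pi * l) + best[d]  # eq 34
                new.append(max(lb, min(ub, x)))
            whales[i] = new
            if f(new) < f(best):
                best = new[:]
    return best
```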

2.4. K-Fold Cross-Validation

Before K-fold cross-validation was proposed, hold-out validation was commonly used; there, each datum is used only once and the data are not fully utilized. In model training, however, the number of samples is often insufficient. K-fold cross-validation uses the data set efficiently and helps avoid over- and underfitting.

The principle of K-fold cross-validation is to divide the entire sample set into K groups, using K − 1 groups as the training set in turn and the remaining group as the testing set; each time the model is trained, the corresponding score is obtained, and the average of the K scores is used as the model evaluation criterion.30
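A generic sketch of this scheme, with a user-supplied `train_and_score` routine standing in for WLSSVM training and evaluation:

```python
import random

def k_fold_scores(samples, train_and_score, k=10, seed=0):
    """Split samples into k folds; each fold serves once as the testing set
    while the remaining k - 1 folds form the training set; return the k
    scores (their mean is the cross-validation score)."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)                       # random, non-overlapping folds
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = [samples[j] for j in folds[i]]
        train = [samples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train, test))
    return scores
```

For example, with 20 samples and k = 5, each call trains on 16 samples and tests on the remaining 4, and every sample appears in exactly one testing fold.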

The K-fold cross-validation structure is shown in Figure 3.

Figure 3. K-fold cross-validation structure.

2.5. Establishment of the WOA&GWO-WLSSVM Model

According to the basic principle of the WLSSVM algorithm, it is important to obtain the appropriate parameters (penalty factor ϑ and kernel coefficient σ2) for the WLSSVM model. Therefore, this study uses two intelligence algorithms—the GWO and WOA—to optimize the parameters to improve the regression performance of the model. Figure 4 shows the overall framework of the WOA&GWO-WLSSVM model. The establishment of the model is divided into two major stages:

  • (1)

Training stage: The training sample data are read and normalized; the WLSSVM is optimized through the GWO and WOA to find the optimal values of the penalty factor and the kernel coefficient. The optimal values are then substituted into three common kernel functions, and the MSE and R2 are used as the verification standards to select the kernel function and determine the final prediction model.

  • (2)

    Prediction stage: The normalized test set is substituted into the final prediction model for calculation, the predicted value is denormalized, and the MSE and R2 between the actual value and the predicted value are calculated. In this iterative loop, when the predicted value of MSE is the smallest and R2 is the largest (within the maximum number of iterations), the iteration ends, and the final sulfur solubility prediction result is output.
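Schematically, the two stages reduce to "normalize, propose hyperparameters, score by MSE, keep the best." In this sketch a simple random search and a toy RBF smoother stand in for the WOA/GWO and the WLSSVM, respectively, and only the kernel coefficient is tuned, whereas the paper searches both ϑ and σ2.

```python
import math
import random

def normalize(v, lo, hi):
    # scale a value to [-1, +1], the input data form used in Table 4
    return 2 * (v - lo) / (hi - lo) - 1

def rbf_smoother(train, sigma2):
    # toy stand-in for WLSSVM: kernel-weighted average of training targets
    def predict(x):
        w = [math.exp(-(x - xi) ** 2 / (2 * sigma2)) for xi, _ in train]
        return sum(wi * yi for wi, (_, yi) in zip(w, train)) / sum(w)
    return predict

def tune_sigma2(train, valid, trials=200, seed=0):
    """Random search standing in for WOA/GWO: propose sigma2 candidates and
    keep the one with the lowest validation MSE."""
    rng = random.Random(seed)
    best, best_mse = None, float("inf")
    for _ in range(trials):
        sigma2 = 10 ** rng.uniform(-3, 1)        # log-uniform proposals
        model = rbf_smoother(train, sigma2)
        mse = sum((model(x) - y) ** 2 for x, y in valid) / len(valid)
        if mse < best_mse:
            best, best_mse = sigma2, mse
    return best, best_mse
```

Replacing the random proposals with GWO or WOA position updates, and the smoother with the WLSSVM solve, recovers the workflow of Figure 4.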

Figure 4. Overall framework of the WOA&GWO-WLSSVM model.

3. Data Analysis and Model Training

3.1. Experimental Data

A total of 245 sets of experimental sulfur solubility data6,7,31–35 were collected from the previous literature for the establishment and evaluation of the model. The data set used in the study is shown in Table 2 (temperature, 303.2–486 K; pressure, 6.7–155 MPa; H2S content, 1–26.62%). Compared with the data sets of previous studies (Bian, Liang Fu, Amar), it is more extensive. The Sun data set is used as an independent checking set to evaluate the application performance of the model in actual gas reservoir engineering. The predicted values over the k testing folds are averaged as the final predicted result of the testing set (k = 10 in the present study).36

Table 2. Sulfur Solubility Data Sets Used in the Study.

author temperature (K) pressure (MPa)
Brunner and Woll (1980) 373.15–433.15 10–60
Brunner (1988) 398–486 6.7–155
Gu (1993) 363.2–383.2 10–50
Sun CY (2003) 303.2–363.2 20–45
Yang XF (2009) 373.15 24–36
Bian XQ (2010) 336.2–396.6 10–55.2
Zhang GD (2014) 373.15–425.65 20–66.52

The training set is used to adjust the parameters ϑ and σ2; the testing set does not participate in training and is used to evaluate the generalization ability of the final model.

3.2. Selection of the Model Input Parameters

When determining the input parameters of the model, it is necessary to investigate the main factors affecting the solubility of sulfur in the mixed gas. The CGRA method is used to obtain the gray correlation coefficient value.37,38 The larger the gray correlation coefficient value of a factor is, the greater its impact on the research objective is.39 As shown in Figure 5, the most influential factor is H2S content followed by CO2 content, reservoir pressure, temperature, and CH4 content. The gray correlation coefficient values of N2 and C2H6 content are less than 0.5, so these two factors are eliminated. Therefore, the new model aims to obtain the best regression between sulfur solubility and H2S content, CO2 content, reservoir pressure, temperature, and CH4 content.

Figure 5. CGRA for sulfur solubility in mixed acid gas.

3.3. Determination of the Model Details

The accuracy of model prediction is closely related to the choice of kernel function; different kernel functions cause WLSSVM to select different support vectors.40–42 The three common kernel functions are substituted in turn, with the MSE and R2 (eqs 38 and 39) used as the verification standards; the running results are shown in Table 3.

\[
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i^{\mathrm{exp}} - y_i \bigr)^2
\tag{38}
\]
\[
R^2 = 1 - \frac{\sum_{i=1}^{N} \bigl( y_i^{\mathrm{exp}} - y_i \bigr)^2}
{\sum_{i=1}^{N} \bigl( y_i^{\mathrm{exp}} - y_{\mathrm{ave}}^{\mathrm{exp}} \bigr)^2}
\tag{39}
\]

where N is the number of experimental sulfur solubility data points and yi^exp, yi, and yave^exp represent the experimental value of sulfur solubility, the predicted value, and the average of the experimental data, respectively.

Table 3. MSE and R2 of Different Kernel Functions.

kernel function validation criteria WOA-WLSSVM GWO-WLSSVM
K1(x,xi) MSE 0.866 0.741
R2 0.714 0.622
K2(x,xi) MSE 0.511 0.642
R2 0.318 0.253
K3(x,xi) MSE 0.029 0.044
R2 0.945 0.914

Table 3 shows that the MSE and R2 corresponding to the RBF kernel function (K3(x, xi)) are both the best, and its prediction accuracy is significantly higher than that of the other two kernel functions. Its approximation characteristics are thus better suited to the sulfur solubility data used in this study, so the RBF kernel is adopted for the regression prediction. During training, trial and error is used to determine the parameters of the GWO-WLSSVM and WOA-WLSSVM models. The parameters are listed in Table 4.

Table 4. Parameters of the Trained Model.

parameter GWO-WLSSVM WOA-WLSSVM
input data form [−1, +1] [−1, +1]
input variables 5 5
max iterations 200 200
search agents 30 30
ϑBest 0.7833 2.3718
σ2Best 8.8485 12.9816

4. Results and Discussion

4.1. Quantitative Evaluation

To verify the prediction effect of the model, the following four statistical indicators were selected for quantitative evaluation: R2, AARD, RMSE, and SD. They are calculated using eqs 39–42:

\[
\mathrm{AARD} = \frac{100\%}{N} \sum_{i=1}^{N}
\frac{\bigl| y_i^{\mathrm{exp}} - y_i \bigr|}{y_i^{\mathrm{exp}}}
\tag{40}
\]
\[
\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i^{\mathrm{exp}} - y_i \bigr)^2 }
\tag{41}
\]
\[
\mathrm{SD} = \sqrt{ \frac{1}{N - 1} \sum_{i=1}^{N}
\left( \frac{y_i^{\mathrm{exp}} - y_i}{y_i^{\mathrm{exp}}} \right)^2 }
\tag{42}
\]
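The four indicators can be computed together in a single routine; the exact SD convention used here (relative deviations with an N − 1 denominator) is one common convention and may differ from the paper's exact definition.

```python
import math

def evaluate(y_exp, y_pred):
    """Return (AARD in %, RMSE, SD, R2) for experimental vs predicted values."""
    n = len(y_exp)
    y_ave = sum(y_exp) / n
    aard = 100.0 / n * sum(abs(e - p) / e for e, p in zip(y_exp, y_pred))
    rmse = math.sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n)
    sd = math.sqrt(sum(((e - p) / e) ** 2
                       for e, p in zip(y_exp, y_pred)) / (n - 1))
    r2 = 1 - (sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
              / sum((e - y_ave) ** 2 for e in y_exp))
    return aard, rmse, sd, r2
```

A perfect prediction gives AARD = RMSE = SD = 0 and R2 = 1, which is the limit the reported values (AARD = 3.45%, R2 = 0.9987) approach.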

The comparison between the prediction results of the training, testing, and checking sets and the experimental data is shown in Table 5 and Figure 6. For the training set, both the WOA-WLSSVM model and the GWO-WLSSVM model have a low AARD value and a high R2 value. The calculated data points are in good agreement with the experimental data points, which indicates that the two models have strong fitting ability. For the testing set, the WOA-WLSSVM model achieves AARD = 3.68% and R2 = 0.9985, and the GWO-WLSSVM model achieves AARD = 3.84% and R2 = 0.9983; the predictions of the former agree slightly better with the experimental values, which shows that the WOA-WLSSVM model has a better prediction effect. To further demonstrate the accuracy of the two models, three widely used data sets6,32,33 are used to compare the prediction results with the experimental data, as shown in Figures 7–9.

Table 5. Statistical Evaluation Results of the Sulfur Solubility Prediction Model (a, b).

data sets AARD (%) SD RMSE R2
(a) WOA-WLSSVM
training sets 3.35 0.06 0.03 0.9991
testing sets 3.68 0.07 0.04 0.9985
checking sets 3.87 0.09 0.01 0.9896
all sets 3.45 0.07 0.02 0.9987
(b) GWO-WLSSVM
training sets 3.43 0.06 0.03 0.9987
testing sets 3.84 0.08 0.05 0.9983
checking sets 3.89 0.08 0.01 0.9888
all sets 3.47 0.07 0.02 0.9983

Figure 6. (a,b) Results of training and testing.

Figure 7. (a,b) Predicted results compared with the experimental results: Brunner.

Figure 8. Predicted results compared with the experimental results: Bian.

Figure 9. Predicted results compared with the experimental results: Zhang.

To evaluate the application performance of the model in actual gas reservoir engineering, a new data set, that of Sun35 (whose experimental data are more representative and suitable for most sour gas reservoirs), is used as a checking set for application performance testing, as shown in Table 6. The relative error (RE) indicates that the predicted values of the two new models do not differ greatly from the experimental values. Of the two, the RE of WOA-WLSSVM is lower, which proves that its performance is better and that it can better predict sulfur solubility in acid gas reservoirs.

Table 6. Performance Testing with a New Data Set.

gas composition | temperature (K) | pressure (MPa) | experimental value (g/m3) | WOA-WLSSVM calculated value (g/m3) | WOA-WLSSVM RE (%) | GWO-WLSSVM calculated value (g/m3) | GWO-WLSSVM RE (%)
4.95% H2S, 7.40% CO2, 87.65% CH4 303. 2 30 0.057 0.055 3.509 0.091 2.247
303. 2 40 0.105 0.102 2.857 0.123 2.500
323. 2 30 0.083 0.082 1.205 0.111 5.932
323. 2 40 0.128 0.121 5.469 0.153 1.325
343. 2 35 0.152 0.145 4.605 0.165 5.096
343. 2 40 0.175 0.182 4.000 0.203 3.571
363. 2 40 0.220 0.221 0.455 0.284 2.899
363. 2 45 0.284 0.283 0.352 0.355 0.281
9.93% H2S, 7.16% CO2, 82.91% CH4 303. 2 30 0.089 0.087 2.247 0.091 2.247
303. 2 40 0.120 0.123 2.500 0.123 2.500
323. 2 30 0.118 0.115 2.542 0.111 5.932
323. 2 40 0.151 0.148 1.987 0.153 1.325
343. 2 35 0.157 0.160 1.911 0.165 5.096
343. 2 40 0.196 0.195 0.510 0.203 3.571
363. 2 40 0.276 0.272 1.449 0.284 2.899
363. 3 45 0.356 0.359 0.843 0.355 0.281
14.98% H2S, 7.31% CO2, 77.71% CH4 303. 2 30 0.118 0.123 4.237 0.122 3.390
303. 2 40 0.139 0.138 0.719 0.142 2.158
323. 2 30 0.142 0.143 0.704 0.145 2.113
323. 2 40 0.190 0.187 1.579 0.188 1.053
343. 2 35 0.231 0.235 1.732 0.227 1.732
343. 2 40 0.287 0.261 9.059 0.268 6.620
363. 2 40 0.497 0.523 5.231 0.484 2.616
363. 2 45 0.666 0.681 2.252 0.671 0.751
17.71% H2S, 6.81% CO2, 75.48% CH4 303. 2 20 0.012 0.014 16.667 0.013 8.333
303. 2 30 0.133 0.112 15.789 0.113 15.038
303. 2 40 0.162 0.172 6.173 0.157 3.086
323. 2 30 0.148 0.144 2.703 0.144 2.703
323. 2 40 0.244 0.239 2.049 0.249 2.049
343. 2 35 0.267 0.271 1.498 0.271 1.498
343. 2 40 0.351 0.345 1.709 0.345 1.709
363. 2 40 0.618 0.633 2.427 0.623 0.809
363 .2 45 0.814 0.812 0.246 0.832 2.211
26.62% H2S, 7.00% CO2, 66.38% CH4 303. 2 30 0.193 0.202 4.663 0.213 10.363
303. 2 40 0.248 0.271 9.274 0.246 0.806
323. 2 30 0.240 0.235 2.083 0.237 1.250
323. 2 40 0.368 0.375 1.902 0.372 1.087
343. 2 35 0.488 0.451 7.582 0.495 1.434
343. 2 40 0.657 0.761 15.830 0.703 7.002
363. 2 40 1.194 1.231 3.099 1.201 0.586
363. 2 45 1.455 1.475 1.375 1.507 3.574
10.00% H2S, 0.86% CO2, 89.14% CH4 303.2 30 0.081 0.084 3.704 0.085 4.938
303. 2 40 0.113 0.116 2.655 0.102 9.735
323. 2 30 0.117 0.123 5.128 0.125 6.838
323. 2 40 0.124 0.129 4.032 0.119 4.032
343. 2 35 0.152 0.148 2.632 0.148 2.632
343. 2 40 0.180 0.186 3.333 0.179 0.556
363. 2 40 0.225 0.230 2.222 0.242 7.556
363. 2 45 0.317 0.345 8.833 0.312 1.577
10.03%H2S, 10.39%CO2, 79.58% CH4 303. 2 30 0.091 0.085 6.593 0.088 3.297
303. 2 40 0.127 0.133 4.724 0.123 3.150
323. 2 30 0.130 0.136 4.615 0.136 4.615
323. 2 40 0.155 0.159 2.581 0.152 1.935
343. 2 35 0.160 0.164 2.500 0.165 3.125
343. 2 40 0.204 0.198 2.941 0.199 2.451
363. 2 40 0.293 0.287 2.048 0.285 2.730
363. 2 45 0.366 0.382 4.372 0.372 1.639
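The RE column in Table 6 is the plain relative error between experiment and prediction. A minimal sketch, checked against the first WOA-WLSSVM row of the table:

```python
def relative_error(experimental, calculated):
    """Relative error in percent: |x_exp - x_calc| / x_exp * 100."""
    return abs(experimental - calculated) / experimental * 100.0

# First Table 6 row (4.95% H2S mixture, 303.2 K, 30 MPa, WOA-WLSSVM)
print(round(relative_error(0.057, 0.055), 3))  # 3.509
```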

4.2. Model Comparison

The accuracy and reliability of the models were further verified using all of the data and the four statistical indicators mentioned above. The WOA-WLSSVM and GWO-WLSSVM models were compared with three widely used empirical models (those of Roberts,43 Guo–Wang,44 and Hu45) and three ML models (those of Chen,10 Amar,11 and Bian12); the analysis results are shown in Table 7. The statistical indicators of the empirical models' predictions are generally inferior to those of the ML methods. Moreover, among the ML methods, WOA-WLSSVM obtains the best statistical indicators: its AARD is 0.70, 0.06, 0.05, and 0.02% lower than that of the Chen model, Amar model, Bian model, and GWO-WLSSVM, respectively; the SD of the model is reduced by 0.01 compared with that of the Chen model; its RMSE is 0.011, 0.002, 0.003, and 0.001 lower than that of the Chen model, Amar model, Bian model, and GWO-WLSSVM, respectively; and its R2 reaches 0.9987, higher than that of any other model, indicating a higher degree of fit and a better prediction effect.

Table 7. Comparison of the New Model with Other Models.

| model | AARD (%) | SD | RMSE | R2 |
|---|---|---|---|---|
| Roberts model | 64.36 | 0.86 | 0.67 | 0.6792 |
| Guo–Wang model | 12.84 | 0.15 | 0.17 | 0.9833 |
| Hu model | 17.32 | 0.22 | 0.21 | 0.9731 |
| Chen model | 4.15 | 0.06 | 0.032 | 0.9968 |
| Amar model | 3.51 | 0.07 | 0.023 | 0.9981 |
| Bian model | 3.50 | 0.07 | 0.024 | 0.9976 |
| WOA-WLSSVM | 3.45 | 0.07 | 0.021 | 0.9987 |
| GWO-WLSSVM | 3.47 | 0.07 | 0.022 | 0.9983 |
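The four indicators in Table 7 can be computed as follows. This is a sketch in which SD is assumed to be the standard deviation of the absolute relative deviations; the paper does not restate its exact SD formula in this section:

```python
import numpy as np

def metrics(y_exp, y_pred):
    """AARD (%), SD, RMSE, and R^2 between experimental and predicted values."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ard = np.abs(y_pred - y_exp) / y_exp              # absolute relative deviations
    aard = 100.0 * ard.mean()                         # average absolute relative deviation, %
    sd = ard.std()                                    # SD (assumed: std of the relative deviations)
    rmse = np.sqrt(np.mean((y_pred - y_exp) ** 2))    # root-mean-square error
    ss_res = np.sum((y_exp - y_pred) ** 2)
    ss_tot = np.sum((y_exp - np.mean(y_exp)) ** 2)
    r2 = 1.0 - ss_res / ss_tot                        # determination coefficient
    return aard, sd, rmse, r2
```

A perfect prediction gives AARD = 0, RMSE = 0, and R2 = 1, which is a quick sanity check for any reimplementation.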

The model we propose is based on the SVM, and the approach used in this work is broadly similar to Bian's approach12 at the macro level. Therefore, in this section, we compare the scores of 10-fold cross-validation, which not only evaluates the prediction effect but also reflects the stability of the model.30,46 The stability of the model is directly related to its application effect in actual engineering and is therefore a focus of our attention.

The scores of the proposed models and Bian's model after 10-fold cross-validation are shown in Table 8. The mean score of WOA-WLSSVM was as high as 0.8941, and its SD σ was 0.0192 lower than that of GWO-WLSSVM. This indicates that WOA finds the optimal parameters of WLSSVM more precisely and better satisfies the pursuit of high accuracy and precision for the model. The mean score of GWO-WLSSVM is 12.80% higher than that of Bian's model, with a lower SD, indicating that the improved WLSSVM model outperforms the improved LSSVM model in both prediction accuracy and stability. It should be noted that, as Table 7 shows, the model of Bian et al. also performs very well and predicts much better than the empirical models, indicating that the GWO-LSSVM model is likewise reliable and applicable. This fully illustrates that the improved SVM model is an efficient method for sulfur solubility prediction.

Table 8. 10-Fold Cross-Validation Score.

| fold | WOA-WLSSVM | GWO-WLSSVM | Bian model (GWO-LSSVM) |
|---|---|---|---|
| 1 | 0.7991 | 0.7732 | 0.6112 |
| 2 | 0.8713 | 0.7361 | 0.6301 |
| 3 | 0.8863 | 0.8890 | 0.7119 |
| 4 | 0.9211 | 0.8213 | 0.7702 |
| 5 | 0.9502 | 0.8402 | 0.7322 |
| 6 | 0.8899 | 0.9031 | 0.8071 |
| 7 | 0.8818 | 0.9004 | 0.8102 |
| 8 | 0.9033 | 0.8912 | 0.7969 |
| 9 | 0.9075 | 0.7919 | 0.7829 |
| 10 | 0.9306 | 0.8989 | 0.8346 |
| mean score | 0.8941 | 0.8445 | 0.7487 |
| standard deviation (σ) | 0.0390 | 0.0582 | 0.0729 |

4.3. Outlier Diagnosis

Outlier diagnosis tests for unreasonable data: the leverage method is used to search the data set for outliers for reliability analysis, and a Williams plot is drawn to show the correlation between the standardized cross-validation residuals and the hat index (H).47−49 H is defined as (eq 43)

H = X(X^t X)^(−1) X^t  (43)

where X is a two-dimensional matrix composed of n data values (rows) and k input variables (columns) and the superscript t denotes the matrix transpose.

In the Williams plot, there is a rectangular valid region (0 ≤ H ≤ H* and −3 ≤ SR ≤ 3) determined by the standardized residual (SR) and the leverage threshold H* (generally taken as H* = 3(k + 1)/n).50 If most of the data points fall in this region, there are few abnormal data points, which also proves the statistical validity of the model. The Williams plots output by the two new models after outlier detection are shown in Figure 10. Most of the sulfur solubility data predicted by the two models lie within the valid ranges of [−3, 3] and [0, H*], proving that the two models proposed in this study pass the statistical test. The WOA-WLSSVM model has fewer outliers, so it is more effective and reliable than GWO-WLSSVM.
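Under the usual applicability-domain conventions (hat matrix H = X(X^t X)^(−1) X^t, warning leverage H* = 3(k + 1)/n), the Williams-plot quantities can be computed as follows. The residual standardization shown (subtract the mean, divide by the SD) is an assumed convention, and the inputs are synthetic:

```python
import numpy as np

def leverage_diagnostics(X, residuals):
    """Leverages and Williams-plot quantities for an (n, k) input matrix X."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix H = X (X^t X)^-1 X^t
    h = np.diag(H)                                # leverage of each data point
    h_star = 3.0 * (k + 1) / n                    # warning leverage threshold H*
    sr = (residuals - residuals.mean()) / residuals.std()  # standardized residuals
    inside = (h <= h_star) & (np.abs(sr) <= 3.0)  # points in the valid region
    return h, h_star, sr, inside

# Synthetic inputs and residuals purely for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
res = 0.05 * rng.standard_normal(50)
h, h_star, sr, inside = leverage_diagnostics(X, res)
print(h_star, inside.mean())  # threshold and fraction of points in [0, H*] x [-3, 3]
```

A useful invariant for checking the implementation: the leverages sum to the rank of X (here k = 3), since the trace of a projection matrix equals its rank.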

Figure 10. (a,b) Diagnosis of outliers.

5. Conclusions

  • (1)

    The main factors affecting the solubility of sulfur in sour gases were screened by an improved CGRA method, and the input variables of the WLSSVM model were determined. As an improvement of the traditional SVM, the use of WLSSVM improves the rate of convergence and saves computational cost.

  • (2)

    The WOA and GWO swarm intelligence algorithms were used to find the optimal parameters, establishing the WOA-WLSSVM and GWO-WLSSVM sulfur solubility prediction models for sour gas. Statistical analysis (AARD, RMSE, SD, and R2) and statistical testing (outlier diagnosis) show that both models have good accuracy, robustness, generalization, validity, and reliability.

  • (3)

    The WOA-WLSSVM model is superior to the other predictive models, including empirical models and ML models: its AARD is as low as 3.45%, its R2 is as high as 0.9987, and its prediction accuracy is much higher than that of the other models. This indicates that the improved SVM model is an efficient method for predicting sulfur solubility in sour gas mixtures.

  • (4)

    The sulfur solubility data set used in this study (245 data points) covers temperatures of 303.2–486 K, pressures of 6.7–155 MPa, and H2S contents of 1–26.62%. Compared with the data sets in previous studies (those of Bian, Fu, and Amar), it is more extensive. It should be noted that predicting sulfur solubility with the ML method within this range performs better than other models; for predictions outside this range, the validity of the model needs to be tested against reliable experimental data.

Acknowledgments

We acknowledge the National Natural Science Foundation of China "Research on Risk Assessment and Management of In-service Submarine Oil and Gas Transmission Pipelines" (41877527) and the Shaanxi Provincial Social Science Fund "Research on Urban Natural Gas Pipeline Risk Assessment and Management under the Background of Big Data" (2018S34).

The authors declare no competing financial interest.

References

  1. Zeng D.; Zhang J.; Zhang G.; Mi L. Research Progress of Sinopec’s Key Underground Gas Storage Construction Technologies. Nat. Gas Ind. 2020, 40, 115–123. 10.3787/j.issn.1000-0976.2020.06.012. [DOI] [Google Scholar]
  2. Gu S.; Shi Z.; Hu X.; Shi Y.; Qin S.; Guo X. An Experimental Study on Gas-Liquid Sulfur Two-Phase Flow in Ultradeep High-Sulfur Gas Reservoirs. Nat. Gas Ind. 2018, 38, 70–75. 10.3787/j.issn.1000-0976.2018.10.010. [DOI] [Google Scholar]
  3. Burgers W. F. J.; Northrop P. S.; Kheshgi H. S.; Valencia J. A. Worldwide Development Potential for Sour Gas. Energy Procedia 2011, 4, 2178–2184. 10.1016/j.egypro.2011.02.104. [DOI] [Google Scholar]
  4. Goodwin M.; Musa O.; Steed J. Problems Associated with Sour Gas in the Oilfield Industry and Their Solutions. Energy Fuels 2015, 29, 4667–4682. 10.1021/acs.energyfuels.5b00952. [DOI] [Google Scholar]
  5. Ru Z.; Hu J.; Fan L.; Qin J.; Wang S. Impact of Sulfur Precipitation in Production of Sour Gas Wells. J. Liaoning Tech. Univ., Nat. Sci. Ed. 2017, 36, 1143–1148. 10.11956/j.issn.1008-0562.2017.11.005. [DOI] [Google Scholar]
  6. Brunner E.; Woll W. Solubility of Sulfur in Hydrogen Sulfide and Sour Gases. Soc. Pet. Eng. J. 1980, 20, 377–384. 10.2118/8778-PA. [DOI] [Google Scholar]
  7. Brunner E.; Place M. C. Jr.; Woll W. H. Sulfur Solubility in Sour Gas. J. Pet. Technol. 1988, 40, 1587–1592. 10.2118/14264-PA. [DOI] [Google Scholar]
  8. Fu L.; Hu J.; Zhang Y.; Li Q. Investigation on Sulfur Solubility in Sour Gas at Elevated Temperatures and Pressures with an Artificial Neural Network Algorithm. Fuel 2020, 262, 116541 10.1016/j.fuel.2019.116541. [DOI] [Google Scholar]
  9. Mohammadi A. H.; Richon D. Estimating Sulfur Content of Hydrogen Sulfide at Elevated Temperatures and Pressures Using an Artificial Neural Network Algorithm. Ind. Eng. Chem. Res. 2008, 47, 8499–8504. 10.1021/ie8004463. [DOI] [Google Scholar]
  10. Chen L.; Li C. J.; Leng M.; Ren S.; Liu G.; Ren Q. Genetic BP neural network-based prediction of sulfur solubility in high sulfur-containing gases. Mod. Chem. 2014, 034, 142–147. [Google Scholar]
  11. Nait Amar M. Modeling Solubility of Sulfur in Pure Hydrogen Sulfide and Sour Gas Mixtures Using Rigorous Machine Learning Methods. Int. J. Hydrogen Energy 2020, 45, 33274–33287. 10.1016/j.ijhydene.2020.09.145. [DOI] [Google Scholar]
  12. Bian X. Q.; Song Y. L.; Mwamukonda M. K.; Fu Y. Prediction of the Sulfur Solubility in Pure H2S and Sour Gas by Intelligent Models. J. Mol. Liq. 2020, 299, 112242 10.1016/j.molliq.2019.112242. [DOI] [Google Scholar]
  13. Bian X. Q.; Zhang L.; Du Z. M.; Chen J.; Zhang J. Y. Prediction of Sulfur Solubility in Supercritical Sour Gases Using Grey Wolf Optimizer-Based Support Vector Machine. J. Mol. Liq. 2018, 261, 431–438. 10.1016/j.molliq.2018.04.070. [DOI] [Google Scholar]
  14. Liu Y.; Hong W.; Cao B. Machine Learning for Predicting Thermodynamic Properties of Pure Fluids and Their Mixtures. Energy 2019, 188, 116091 10.1016/j.energy.2019.116091. [DOI] [Google Scholar]
  15. Wu Y.; Lei J. W.; Bao L. S.; Li C. Z. Short-term load forecasting based on improved grey correlation analysis with bat-optimized neural networks optimized by bat algorithm. Autom. Electr. Power Syst. 2018, 42, 73–78. 10.7500/AEPS20180125004. [DOI] [Google Scholar]
  16. Zhang Q. S.; Deng J. L. Gray Relation Entropy Analysis Method. Systems Engineering - Theory & Practice 1996, 16, 7–11. [Google Scholar]
  17. Guoqing Y.; Zili W.; Baosen Z.; Zhigang X. The Grey Relational Analysis of Sluice Monitoring Data. Procedia Eng. 2011, 15, 5192–5196. 10.1016/j.proeng.2011.08.962. [DOI] [Google Scholar]
  18. Cawley G. C.; Talbot N. L. C. Improved Sparse Least-Squares Support Vector Machines. Neurocomputing 2002, 48, 1025–1031. 10.1016/S0925-2312(02)00606-9. [DOI] [Google Scholar]
  19. Suykens J. A. K.; de Brabanter J.; Lukas L.; Vandewalle J. Weighted Least Squares Support Vector Machines: Robustness and Sparse Approximation. Neurocomputing 2002, 48, 85–105. 10.1016/S0925-2312(01)00644-0. [DOI] [Google Scholar]
  20. Liu X.; Gu Y.; He S.; Xu Z.; Zhang Z. A Robust Reliability Prediction Method Using Weighted Least Square Support Vector Machine Equipped with Chaos Modified Particle Swarm Optimization and Online Correcting Strategy. Appl. Soft Comput. 2019, 85, 105873 10.1016/j.asoc.2019.105873. [DOI] [Google Scholar]
  21. Long W.; Jiao J.; Liang X.; Tang M. An Exploration-Enhanced Grey Wolf Optimizer to Solve High-Dimensional Numerical Optimization. Eng. Appl. Artif. Intell. 2018, 68, 63–80. 10.1016/j.engappai.2017.10.024. [DOI] [Google Scholar]
  22. Luo Z. S.; Qin Y.; Zhang X. S.; Bi A. R. Prediction of External Corrosion Rate of Marine Pipelines Based on LASSO-WOA-LSSVM. Surf. Technol. 2021, 50, 245–252. [Google Scholar]
  23. Mirjalili S.; Mirjalili S. M.; Lewis A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61. 10.1016/j.advengsoft.2013.12.007. [DOI] [Google Scholar]
  24. Bian X. Q.; Huang J. H.; Wang Y.; Liu Y. B.; Kasthuriarachchi D.; Huang L. J. Prediction of wax disappearance temperature by intelligent models. Energy Fuels 2019, 33, 2934–2949. 10.1021/acs.energyfuels.8b04286. [DOI] [Google Scholar]
  25. Bian X. Q.; Zhang Q.; Zhang L.; Chen J. A Grey Wolf Optimizer-Based Support Vector Machine for the Solubility of Aromatic Compounds in Supercritical Carbon Dioxide. Chem. Eng. Res. Des. 2017, 123, 284–294. 10.1016/j.cherd.2017.05.008. [DOI] [Google Scholar]
  26. Meng X.; Jiang J.; Wang H. AGWO: Advanced GWO in Multi-Layer Perception Optimization. Expert Syst. Appl. 2021, 173, 114676 10.1016/j.eswa.2021.114676. [DOI] [Google Scholar]
  27. Mirjalili S.; Lewis A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67. 10.1016/j.advengsoft.2016.01.008. [DOI] [Google Scholar]
  28. Chakraborty S.; Saha A. K.; Chakraborty R.; Saha M. An Enhanced Whale Optimization Algorithm for Large Scale Optimization Problems. Knowledge-Based Systems 2021, 233, 107543 10.1016/j.knosys.2021.107543. [DOI] [Google Scholar]
  29. Yan Z.; Zhang J.; Zeng J.; Tang J. Nature-Inspired Approach: An Enhanced Whale Optimization Algorithm for Global Optimization. Math. Comput. Simul. 2021, 185, 17–46. 10.1016/j.matcom.2020.12.008. [DOI] [Google Scholar]
  30. Amini S.; Taki M.; Rohani A. Applied Improved RBF Neural Network Model for Predicting the Broiler Output Energies. Appl. Soft Comput. 2020, 87, 106006 10.1016/j.asoc.2019.106006. [DOI] [Google Scholar]
  31. Yang X. F.; Huang X. P.; Zhong B. Experimental Test and Calculation Methods of Elemental Sulfur Solubility in High Sulfur Content Gas. Nat. Gas. Geosci. 2009, 20, 416–419. [Google Scholar]
  32. Zhang G. D.Study on features of phase behaviors and seepage mechanism in high sour gas reservoir-take samples from changxing gas reservoir in yuanba as study objects; Chengdu University of Technology, 2014; pp 22–33. [Google Scholar]
  33. Bian X. Q.; Du Z. M.; Guo X. Measurement of the Solubility of Sulfur in Natural Gas with a High H2S Content. Nat. Gas Ind. 2010, 30, 57–58. 10.3787/j.issn.1000-0976.2010.12.014. [DOI] [Google Scholar]
  34. Gu M. X.; Li Q.; Zhou S. Y.; Chen W. D.; Guo T. M. Experimental and Modeling Studies on the Phase Behavior of High H2S-Content Natural Gas Mixtures. Fluid Phase Equilib. 1993, 82, 173–182. 10.1016/0378-3812(93)87141-M. [DOI] [Google Scholar]
  35. Sun C. Y.; Chen G. J. Experimental and Modeling Studies on Sulfur Solubility in Sour Gas. Fluid Phase Equilib. 2003, 214, 187–195. 10.1016/S0378-3812(03)00351-0. [DOI] [Google Scholar]
  36. Zhu W. G.; Li Y. X.; Yang W. Q.; Liu X. C.; Xiong N.; Zhou C.; Wang L. Short-term load forecasting based on K-fold cross-validation and stacking ensemble. J. Electr. Power Sci. Technol. 2021, 36, 87–95. 10.1109/ACCESS.2020.3041779. [DOI] [Google Scholar]
  37. Deepanraj B.; Sivasubramanian V.; Jayaraj S. Multi-Response Optimization of Process Parameters in Biogas Production from Food Waste Using Taguchi – Grey Relational Analysis. Energy Convers. Manage. 2017, 141, 429–438. 10.1016/j.enconman.2016.12.013. [DOI] [Google Scholar]
  38. Nelabhotla D. M.; Jayaraman T.; Asghar K.; Das D. The Optimization of Chemical Mechanical Planarization Process-Parameters of c-Plane Gallium-Nitride Using Taguchi Method and Grey Relational Analysis. Mater. Des. 2016, 104, 392–403. 10.1016/j.matdes.2016.05.031. [DOI] [Google Scholar]
  39. Jena M.; Chjeelenahalii M.; Wali S.; Ganganagappa N.; Siddaramanna A.; Mankunipoyil S. A. Optimization of Parameters for Maximizing Photocatalytic Behaviour of Zn1−xFexO Nanoparticles for Methyl Orange Degradation Using Taguchi and Grey Relational Analysis Approach. Mater. Today Chem. 2019, 12, 187–199. 10.1016/j.mtchem.2019.01.004. [DOI] [Google Scholar]
  40. Zhang X.; Zhang Q. Research on Prediction of Corrosion Depth of Long Oil Pipelines Based on Improved RFFS and GSA-SVR. Xitong Gongcheng Lilun yu Shijian/System Engineering Theory and Practice 2021, 41, 1598–1610. 10.12011/SETP2020-0066. [DOI] [Google Scholar]
  41. Okwuashi O.; Ndehedehe C. Deep Support Vector Machine for Hyperspectral Image Classification. Pattern Recognit. 2020, 103, 107298 10.1016/j.patcog.2020.107298. [DOI] [Google Scholar]
  42. Sun W.; Xu C. Carbon Price Prediction Based on Modified Wavelet Least Square Support Vector Machine. Sci. Total Environ. 2021, 754, 142052 10.1016/j.scitotenv.2020.142052. [DOI] [PubMed] [Google Scholar]
  43. Roberts B. The Effect of Sulfur Deposition on Gaswell Inflow Performance. SPE Reservoir Eng. 1997, 12, 118–123. 10.2118/36707-PA. [DOI] [Google Scholar]
  44. Guo X.; Wang Q. A New Prediction Model of Elemental Sulfur Solubility in Sour Gas Mixtures. J. Nat. Gas Sci. Eng. 2016, 31, 98–107. 10.1016/j.jngse.2016.02.059. [DOI] [Google Scholar]
  45. Hu J.-H.; Zhao J.-Z.; Wang L.; Meng L.-Y.; Li Y.-M. Prediction Model of Elemental Sulfur Solubility in Sour Gas Mixtures. J. Nat. Gas Sci. Eng. 2014, 18, 31–38. 10.1016/j.jngse.2014.01.011. [DOI] [Google Scholar]
  46. Baumgartl H.; Tomas J.; Buettner R.; Merkel M. A Deep Learning-Based Model for Defect Detection in Laser-Powder Bed Fusion Using in-Situ Thermographic Monitoring. Prog Addit Manuf. 2020, 5, 277–285. 10.1007/s40964-019-00108-3. [DOI] [Google Scholar]
  47. Valderrama P.; Braga J. W. B.; Poppi R. J. Variable Selection, Outlier Detection, and Figures of Merit Estimation in a Partial Least-Squares Regression Multivariate Calibration Model. A Case Study for the Determination of Quality Parameters in the Alcohol Industry by Near-Infrared Spectroscopy. J. Agric. Food Chem. 2007, 55, 8331–8338. 10.1021/jf071538s. [DOI] [PubMed] [Google Scholar]
  48. Eslamimanesh A.; Gharagheizi F.; Mohammadi A. H.; Richon D. Assessment Test of Sulfur Content of Gases. Fuel Process. Technol. 2013, 110, 133–140. 10.1016/j.fuproc.2012.12.005. [DOI] [Google Scholar]
  49. El-Melegy M. Model-Wise and Point-Wise Random Sample Consensus for Robust Regression and Outlier Detection. Neural Netw. 2014, 59, 23–35. 10.1016/j.neunet.2014.06.010. [DOI] [PubMed] [Google Scholar]
  50. Mokarizadeh H.; Atashrouz S.; Mirshekar H.; Mohaddespour A.; Hemmati-Sarapardeh A. Comparison of LSSVM Model Results with Artificial Neural Network Model for Determination of the Solubility of SO2 in Ionic Liquids. J. Mol. Liq. 2020, 304, 112771 10.1016/j.molliq.2020.112771. [DOI] [Google Scholar]
