Scientific Reports. 2025 Oct 8;15:35097. doi: 10.1038/s41598-025-18980-1

Kernel principal component analysis-based water quality index modelling for coastal aquifers in Saudi Arabia

Ali Aldrees 1, Abdulhayat M Jibrin 2, Salisu Dan’azumi 1,3, Mohammad Al-Suwaiyan 2, Sani I Abba 4, Zaher Mundher Yaseen 2
PMCID: PMC12508030  PMID: 41062553

Abstract

This study developed a novel Water Quality Index (WQI) using Kernel Principal Component Analysis (PCA) to assess groundwater quality (GWQ) in the coastal aquifers of Al-Qatif, Saudi Arabia. A total of 39 groundwater samples were collected from shallow and deep wells and analyzed for key physicochemical parameters. Six kernel types were tested, and the polynomial kernel was found to be most effective in preserving variance and reducing dimensionality. The Kernel PCA-based WQI classified wells into ‘Very Bad,’ ‘Bad,’ and ‘Medium’ categories, with scores such as W3 (WQI = 25.51, “Very Bad”), W31 (WQI = 46.7, “Bad”), and W38 (WQI = 56.75, “Medium”). Salinity and EC presented poor Sub-Index (SI) scores, reflecting the impact of seawater intrusion and over-extraction, while pH consistently showed high SI values (100), indicating natural buffering. By integrating non-linear dimensionality reduction, the proposed framework enhances traditional WQIs and facilitates more targeted and transparent groundwater decision-making. This includes identifying priority wells for remediation and supporting sustainable abstraction policies. The findings offer insight into sustainable water management in arid and semi-arid regions that are confronting groundwater degradation.

Keywords: Groundwater quality assessment, Coastal aquifers, Seawater intrusion, Water quality index, Kernels, Principal component analysis

Subject terms: Hydrology, Civil engineering

Introduction

Groundwater is a vital natural resource, replenished through various natural and artificial recharge processes such as rainfall infiltration, river seepage, lateral subsurface flow, and managed aquifer recharge1–3. Its availability is especially critical in arid and semi-arid regions, where surface water is scarce and rainfall is intermittent4,5. It constitutes nearly 99% of the world’s freshwater, excluding snow and ice6, and supports critical uses such as domestic supply, agriculture, and industry. Coastal areas, which host a significant share of the global population, depend heavily on groundwater for these purposes7,8. However, coastal aquifers face increasing environmental pressures, including seawater intrusion, salinization, and pollution from agricultural and urban activities, which severely affect groundwater quality (GWQ)3,9,10.

In Saudi Arabia, where more than 80% of the water supply is sourced from groundwater11, these pressures are especially acute. With annual extraction exceeding 17 billion m³12, regions such as Al-Qatif experience intense stress from over-abstraction, high evapotranspiration, and rapid urban expansion13,14. These dynamics contribute to declining water quality, especially in coastal zones where geological and hydrological conditions favor saltwater intrusion. As demands grow, there is an urgent need for advanced tools to monitor and evaluate GWQ15. A combination of natural processes and anthropogenic activities, including mineral dissolution, seawater intrusion, agricultural runoff, and industrial discharge, influences the GWQ16. Assessment methods vary widely and include direct comparison to regulatory limits, the use of integrated indicators such as the Water Quality Index (WQI)2, geospatial mapping17, machine learning18,19, and statistical approaches20.

The WQI is widely used to simplify complex hydrochemical datasets into a single interpretive value, guiding decisions on water usability for drinking, irrigation, and other applications21–23. While effective, the traditional WQI relies on fixed weighting schemes and linear aggregations, which may overlook non-linear interactions and fail to capture complex trends. To enhance the robustness of WQI construction, statistical methods such as Principal Component Analysis (PCA) have been used to determine data-driven weights and reduce redundancy24–26.

PCA, however, assumes linearity and cannot effectively account for the complex, non-linear relationships that often characterize groundwater systems, particularly where salinity, ion mobility, and anthropogenic influence intersect27. In such cases, PCA may fail to resolve overlapping or intertwined gradients in water quality parameters, leading to oversimplified or misleading interpretations. This is especially likely in arid coastal environments, such as Al-Qatif, where interacting processes, including seawater intrusion, reverse ion exchange, and variable abstraction rates across aquifer depths, drive hydrochemical variability. These processes produce multi-scale, non-linear behavior in physicochemical parameters (particularly salinity, chloride, nitrate, bromide, and EC), making traditional linear projections insufficient for accurate representation of water quality patterns. In contrast, Kernel PCA offers a powerful extension. By mapping input data into a higher-dimensional feature space using non-linear kernel functions, Kernel PCA can uncover latent structures that classical PCA cannot detect28,29. This enables more flexible modeling of GWQ patterns in regions where both natural processes and localized contamination contribute to variability.

Despite its potential, Kernel PCA has rarely been applied in the context of groundwater for WQI development. Studies have not evaluated the comparative performance of different kernel types in preserving data variance or capturing key hydrochemical interactions. This study addresses this gap by developing a Kernel PCA-based WQI for groundwater assessment in the coastal aquifers of Al-Qatif, Saudi Arabia. Six kernel types—linear, polynomial, radial basis function (RBF), sigmoid, cosine, and Laplacian—were evaluated for their ability to retain variance and reduce dimensionality. The proposed approach integrates kernel-based dimensionality reduction with physicochemical analysis to construct a data-driven WQI, offering a novel framework for GWQ assessment under complex, non-linear conditions.

The current study develops an enhanced version of WQI that integrates with Kernel PCA to overcome the various limitations presented by the conventional and custom models of WQI. The objectives of the study are:

  • i. To develop a Kernel PCA-WQI that provides a robust and reliable framework for evaluating GWQ under extreme variability and complex conditions.

  • ii. To determine the overall water quality of groundwater from each well and classify it into predefined WQI categories (Very bad, bad, medium, good, and excellent) based on PCA-derived WQI scores.

Study area and description

Description of the study area

The study area is located in the Eastern Province of Saudi Arabia, along the western coastline of the Arabian Gulf. It includes the Al-Qatif governorate, comprising the Al-Qatif region and Tarout Island. This area is characterized by an arid climate with low annual rainfall and high evaporation, contributing to elevated groundwater salinity14. Evaporative concentration and seawater intrusion are key drivers of salinization in the region’s coastal aquifers30. Groundwater is the principal water source for domestic, agricultural, and industrial use. The subsurface lithology of the Al-Qatif region comprises several distinct formations that control groundwater occurrence and quality. The dominant geologic units include the Dammam Formation, composed mainly of dolomitized limestone and marl; the Hadrukh Formation, characterized by interbedded shales and sandstone; and the Dam Formation, which contains marine clays and marls3,31. These formations serve as host strata for three major aquifers: the shallow Neogene aquifer and the deeper Alat and Al-Khobar aquifers. The Al-Khobar aquifer, developed in karstified limestone of the Dammam Formation, is especially productive. Overlying Quaternary deposits, such as sabkha and eolian sand, also influence recharge and salinity through land–sea interactions and evapotranspiration.

The hydrochemical characteristics and water quality of shallow and deep aquifers in the study area reveal distinct patterns driven by depth-dependent recharge and salinization processes. The shallow aquifers, primarily hosted within unconsolidated sediments and sabkha zones, exhibited elevated levels of salinity, EC, Na⁺, and Cl⁻, which are indicative of seawater intrusion and increased vulnerability to surface contamination32. In contrast, deep aquifers associated with the Dammam and Hadrukh formations showed comparatively lower concentrations of these ions and greater buffering capacity due to lithological confinement and reduced direct interaction with surface activities3. Similar trends have been reported elsewhere, with salinization and ionic enrichment more pronounced in shallow wells, particularly those located near the coastline and in faulted zones with high transmissivity31. The terrain consists of a flat coastal plain, underlain by both shallow and deep aquifers. The shallow Neogene aquifer reaches depths of up to 12 m, while the deeper Alat and Al-Khobar aquifers extend to approximately 130 m. Groundwater exploitation is more intensive in the deeper aquifers due to salinity issues in the upper layers. Water levels range from less than 1 m to 11 m, with lower levels near the coast. The central part of Al-Qatif shows relatively higher groundwater levels, attributed to a reduction in abstraction and partial recharge through treated wastewater33.

Data collection

A systematic collection of water samples from both shallow and deep aquifers in Al-Qatif was conducted to facilitate a comprehensive laboratory analysis of physicochemical properties. Groundwater sampling was conducted between March and April 2022, following standard procedures recommended by the United States Environmental Protection Agency (EPA). The samples were obtained using submersible pumps after approximately 15 min of pumping to ensure that stale water was adequately removed34. To ensure the groundwater data’s representativeness and accuracy, in-situ measurements of key physicochemical parameters, including pH, turbidity, and electrical conductivity (EC), were performed using a multimeter. At the laboratory, samples were filtered using a 0.45 μm membrane to remove suspended particles and then stored at a constant temperature of 4 °C until the beginning of the analysis. Each well was analyzed for different physical and chemical parameters, including pH, electrical conductivity (EC), calcium carbonate (CaCO₃), turbidity, and concentrations of sodium (Na⁺), potassium (K⁺), magnesium (Mg²⁺), calcium (Ca²⁺), fluoride (F⁻), chloride (Cl⁻), bromide (Br⁻), nitrate (NO₃⁻), sulfate (SO₄²⁻), and salinity. Concentrations are reported in mg/L, turbidity in NTU, and EC in µS/cm, while pH is unitless. Figure 1 illustrates the study area of the Al-Qatif region and the locations of the wells, while Table 1 provides a statistical overview of the GWQ parameters. The high standard deviations observed in several hydrochemical parameters (salinity, EC, Na⁺, and Br⁻) reflect the strong spatial heterogeneity across the study area. This variation arises from localized seawater intrusion and site-specific anthropogenic influences.

Fig. 1.

Study area with sample locations. The boundary map data was derived from DIVA-GIS (https://www.diva-gis.org/), and the map was prepared in QGIS software (https://qgis.org/).

Table 1.

Descriptive statistics of GWQ parameters.

Parameter Mean Standard deviation Skewness Coefficient of variation (%)
pH 7.1213 0.4568 0.7567 6.4149
EC 7802 7284 3.5717 93.3506
CaCO3 252 168.9107 1.3195 67.0220
Turbidity 6.6621 11 2.9523 169.6646
Na+ 601 722 4.1144 120.0879
K+ 31 38 3.6023 120.4794
Mg2+ 144 201 4.7064 138.9529
Ca2+ 280 167 1.3139 59.4527
F⁻ 1.2457 0.9370 1.1984 75.2194
Cl⁻ 1607 1712 3.7099 106.5261
Br⁻ 6.8015 6.6940 3.2529 98.4195
NO₃⁻ 8.2285 5.5813 −0.6282 67.8281
SO₄²⁻ 876 861 3.4149 98.2895
Salinity 5446 5667 3.6366 104.0679
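The last column of Table 1 can be cross-checked from the first two. The following is a minimal sketch, assuming the variation coefficient is the standard deviation expressed as a percentage of the mean; the function name is illustrative and the inputs are the rounded values printed in Table 1, so small rounding differences are expected.

```python
def coefficient_of_variation(mean: float, std: float) -> float:
    """Return the standard deviation as a percentage of the mean."""
    return 100.0 * std / mean


# (mean, standard deviation) pairs copied from Table 1.
table1 = {
    "pH": (7.1213, 0.4568),
    "EC": (7802, 7284),
    "Cl-": (1607, 1712),
}

for name, (mean, std) in table1.items():
    print(f"{name}: CV = {coefficient_of_variation(mean, std):.2f}%")
```

Values above 100% (for example Cl⁻) mean the standard deviation exceeds the mean, which is consistent with the strong spatial heterogeneity described in the text.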

Methodology

Kernel principal component analysis (Kernel PCA)

Kernel Principal Component Analysis (Kernel PCA) identifies principal components (PCs) from high-dimensional data and converts it into a lower-dimensional representation that simplifies the model35,36. Kernel PCA uses a kernel trick to map the original dataset into a high-dimensional feature space where linear separability can be achieved37. Projected PCs can be mapped back to the original space for dimensionality reduction while preserving data structure. Unlike classical PCA, which is based on a linear approach, Kernel PCA uses kernel functions to capture nonlinear relationships in data. Kernel PCA is effective in situations where data lie on nonlinear manifolds due to its unique characteristics. Therefore, it is an essential approach in multivariate statistical analysis and finds a broad range of applications in several scientific and engineering fields38,39.

Outlier assessment was performed using box-and-whisker plots to visualize the spread and skewness of each parameter (see Fig. 4, presented later). Although several values appeared extreme, they were not removed to preserve the natural variability in the dataset. Instead, min-max normalization was applied to scale all features to the range [0, 1], ensuring uniform contribution to Kernel PCA without discarding relevant hydrochemical signatures. The procedure for conducting the Kernel PCA is summarized in three main steps as follows:

Fig. 4.

Box-and-whisker plot for the normalized dataset.

Step 1 Min-max normalization.

Min-max normalization scales data to a specific range, typically [0, 1], ensuring uniformity across variables for analysis (Eq. 1)40.

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

where $x$ is the original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature, and $x'$ is the normalized value.
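Min-max scaling of a single feature can be sketched in a few lines. This is an illustrative helper, not the authors' code (the study reports using scikit-learn for normalization); the function name and the sample values are hypothetical, and the handling of a constant feature is an assumption added to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical EC readings in µS/cm, for illustration only.
ec_readings = [2000.0, 5000.0, 8000.0]
print(min_max_normalize(ec_readings))
```

Because the scaling is driven by the observed minimum and maximum, extreme values that are retained in the dataset compress the remaining observations toward 0.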

Step 2 Testing suitability for Kernel PCA.

Before applying Kernel PCA, the dataset’s suitability must be verified to ensure meaningful results. This involves two key statistical tests:

  • i. Kaiser-Meyer-Olkin (KMO) Test.

The KMO measure evaluates the adequacy of the dataset for PCA by assessing the proportion of variance among variables that can be attributed to common factors41. A KMO value closer to 1 indicates high suitability, while a value below 0.5 indicates that the data are unsuitable for PCA. The overall KMO statistic is computed using Eq. 2 (ref. 42).

$$\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}} \tag{2}$$

where $r_{ij}^{2}$ and $p_{ij}^{2}$ are the squared correlation coefficient and the squared partial correlation coefficient between variables $i$ and $j$, respectively.

  • ii. Bartlett’s Test of Sphericity (BTS).

This test indicates whether the dataset’s correlation matrix is significantly different from an identity matrix, that is, whether the variables are sufficiently interrelated for PCA43,44. The null hypothesis $H_0$ assumes that the correlation matrix is an identity matrix. A significant p-value (< 0.05) rejects $H_0$, confirming suitability for PCA. Bartlett’s test statistic is calculated using Eq. 3 (ref. 45).

$$\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert \tag{3}$$

where $n$ is the sample size, $p$ represents the number of variables, and $R$ is the correlation matrix.
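Both suitability statistics can be computed directly from a correlation matrix. The sketch below uses the standard textbook definitions of the overall KMO and of Bartlett's chi-square statistic, written with NumPy only; it is not the authors' implementation (the study reports using factor_analyzer and scipy.stats), the function names are illustrative, and the demo data are synthetic.

```python
import numpy as np


def kmo_overall(data: np.ndarray) -> float:
    """Overall KMO from squared correlations and squared partial correlations."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations follow from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_statistic(data: np.ndarray):
    """Bartlett's sphericity chi-square statistic and its degrees of freedom."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, log_det = np.linalg.slogdet(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return float(chi2), dof


# Synthetic demo: 39 samples of 4 variables sharing one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(39, 1))
demo = common + 0.5 * rng.normal(size=(39, 4))

kmo = kmo_overall(demo)
chi2, dof = bartlett_statistic(demo)
print(f"KMO = {kmo:.3f}, chi2 = {chi2:.1f}, dof = {dof}")
```

The p-value would then be read from the upper tail of a chi-square distribution with the returned degrees of freedom.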

Step 3 Applying Kernel PCA.

Kernel PCA was applied to the normalized dataset to identify key patterns and reduce dimensionality. PCs were derived, with the number of retained components determined by the cumulative variance they explained (at least 90%)46. Table 2 compares the kernel types and their properties. The kernel that reaches the highest cumulative variance with the fewest components is preferred.

Table 2.

Overview of kernel types, formulae, and their functions47–49.

Kernel Type | Mathematical Formula | Characteristics | Hyperparameters
Linear | $K(x, y) = x^{T} y$ | Preserves the original data structure and is equivalent to standard PCA. | None
Polynomial | $K(x, y) = (x^{T} y + c)^{d}$ | Maps data into a higher-degree polynomial space and captures feature interactions. | $c$ (coefficient), $d$ (degree)
RBF | $K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right)$ | Transforms data into an infinite-dimensional space, effectively capturing complex, nonlinear structures. | $\sigma$ (spread of Gaussian)
Sigmoid | $K(x, y) = \tanh(\alpha\, x^{T} y + c)$ | Inspired by neural networks and models similarity in a smooth way. | $\alpha$ (scale), $c$ (shift)
Cosine | $K(x, y) = \frac{x^{T} y}{\lVert x \rVert\, \lVert y \rVert}$ | Measures the angle between vectors. | None
Laplacian | $K(x, y) = \exp\left(-\gamma \lVert x - y \rVert_{1}\right)$ | Similar to RBF but uses the L1 norm to capture local structures. | $\gamma$ (scale factor)

*$K(x, y)$ is the kernel function for vectors ($x$ and $y$), and $T$ represents the transpose operator.
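The six kernel families in Table 2 can be written as small NumPy functions. This is a sketch that assumes the parameterization used by scikit-learn's pairwise kernels (`gamma`, `coef0`, `degree`), which may differ in notation from the table; for the RBF kernel, `gamma` corresponds to 1/(2σ²) for a Gaussian of spread σ. The default values shown are illustrative.

```python
import numpy as np


def linear(x, y):
    return float(x @ y)


def polynomial(x, y, gamma=0.1, coef0=1.0, degree=3):
    return float((gamma * (x @ y) + coef0) ** degree)


def rbf(x, y, gamma=1.0):
    # Squared Euclidean (L2) distance inside the exponential.
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))


def sigmoid(x, y, gamma=1.0, coef0=0.0):
    return float(np.tanh(gamma * (x @ y) + coef0))


def cosine(x, y):
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def laplacian(x, y, gamma=1.0):
    # Manhattan (L1) distance inside the exponential.
    return float(np.exp(-gamma * np.sum(np.abs(x - y))))


# Two orthogonal unit vectors as a simple check.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
for kernel in (linear, polynomial, rbf, sigmoid, cosine, laplacian):
    print(f"{kernel.__name__}: {kernel(a, b):.4f}")
```

For orthogonal inputs the linear, cosine, and sigmoid kernels return 0, whereas the distance-based RBF and Laplacian kernels still return a positive similarity.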

Kernel parameter tuning was conducted via grid search. For the polynomial kernel, combinations of degree = [2, 3, 4, 5] and gamma = [0.01, 0.1, 1] were tested. The optimal configuration (degree = 3, gamma = 0.1) was selected because it required the fewest PCs to exceed 95% cumulative variance. Similar parameter tuning was applied to RBF and sigmoid kernels, while linear and cosine kernels required no tuning.

All analyses were performed using Python (version 3.13). The scikit-learn library was used for Kernel PCA and data normalization, including the implementation of six kernel types. Scipy.stats was used to conduct Bartlett’s test of sphericity, and the factor_analyzer module was used for calculating the Kaiser-Meyer-Olkin (KMO) statistic. Visualization and plotting were performed using the matplotlib and seaborn libraries. The pseudocode in Algorithm 1 outlines the key computational steps for implementing Kernel PCA, including normalization, kernel matrix computation, centering, eigen-decomposition, and projection into PC space. This method enables the transformation of original groundwater parameters into a lower-dimensional feature space for the development of the WQI.

Algorithm 1

Pseudocode for Kernel PCA used in groundwater data transformation.
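The computational steps named in the text (normalization, kernel matrix computation, centering, eigen-decomposition, and projection) can be sketched with NumPy as follows. This is not the published Algorithm 1 or the scikit-learn implementation the study used; the function name, the `coef0=1.0` offset, and the random demo matrix are assumptions made for illustration, while `degree=3` and `gamma=0.1` follow the configuration reported in the text.

```python
import numpy as np


def kernel_pca_poly(X, degree=3, gamma=0.1, coef0=1.0):
    """Polynomial-kernel PCA returning sample scores and cumulative variance."""
    # 1. Min-max normalization of every column to [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # guard against constant columns
    Xn = (X - X.min(axis=0)) / span

    # 2. Polynomial kernel matrix between all pairs of samples.
    K = (gamma * (Xn @ Xn.T) + coef0) ** degree

    # 3. Centering of the kernel matrix in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 4. Eigen-decomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10 * eigvals.max()  # drop numerically zero components
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    # 5. Projection of the training samples onto the retained components.
    scores = eigvecs * np.sqrt(eigvals)
    cumvar = np.cumsum(eigvals) / eigvals.sum()
    return scores, cumvar


# Synthetic stand-in for a 39-well by 14-parameter matrix.
rng = np.random.default_rng(42)
X_demo = rng.random((39, 14))
scores, cumvar = kernel_pca_poly(X_demo)
print(scores.shape, np.round(cumvar[:3], 4))
```

The cumulative-variance vector returned here is the quantity compared across kernels when choosing how many PCs to retain.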

Water quality index (WQI)

The WQI has evolved considerably over time, building on the foundational work of ref. 50, which introduced the most widely accepted method: a quality rating is computed for each water quality parameter and multiplied by the weight of that parameter. These weights are inversely proportional to the recommended standards. In this work, the WQI weights are instead derived from Kernel PCA, in line with standards and practices concerning GWQ. Figure 2 illustrates the overall research methodology for developing the Kernel PCA-based WQI. PCA has also been implemented to further develop and interpret the dataset. The final WQI model provides a numerical index ranging from 0 to 100, which is used to categorize water quality according to the classification scheme. Table 3 presents the scores and classification based on the proposed ranking criteria.

Fig. 2.

Research methodology flowchart for the development of Kernel PCA-Based WQI.

Table 3.

WQI ranking criteria 51.

WQI Category
0–25 Very bad
26–50 Bad
51–70 Medium
71–90 Good
91–100 Excellent
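The ranking in Table 3 can be applied to a numeric score with a simple lookup. The table lists integer ranges, so the treatment of fractional scores is an assumption here: values are assigned to the class whose upper range they have not yet passed (for example, anything below 26 is "Very bad"), which matches how scores such as 25.51 and 50.89 are labelled in the results. The function name is illustrative.

```python
def classify_wqi(score: float) -> str:
    """Map a WQI score (0 to 100) to a Table 3 category."""
    if not 0 <= score <= 100:
        raise ValueError("WQI score must lie between 0 and 100")
    # Assumed cut-offs for fractional scores: 26, 51, 71, 91.
    if score < 26:
        return "Very bad"
    if score < 51:
        return "Bad"
    if score < 71:
        return "Medium"
    if score < 91:
        return "Good"
    return "Excellent"


for value in (25.51, 46.7, 56.75):
    print(value, classify_wqi(value))
```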

The permissible limit refers to the maximum concentration that is considered safe for use or consumption. Table 4 provides the standard permissible limits for GWQ parameters. The permissible levels were compiled from the World Health Organization (WHO) and the Ministry of Environment, Water and Agriculture (MEWA)52,53. The procedure for the WQI is categorized into five steps after performing the PCA. The steps are as follows:

Table 4.

Permissible limits of groundwater parameters 52,53.

Parameter Standard
pH 7
EC 1000 µS/cm
CaCO₃ 500 mg/L
Turbidity 5 NTU
Na+ 200 mg/L
K+ 12 mg/L
Mg2+ 50 mg/L
Ca2+ 75 mg/L
F⁻ 1.5 mg/L
Cl⁻ 200 mg/L
Br⁻ 2 mg/L
NO₃⁻ 50 mg/L
SO₄²⁻ 250 mg/L
Salinity 1000 mg/L

Step 1 Sub-index ($SI_i$).

The sub-index ($SI_i$) is calculated using either Eq. 4 or Eq. 5, depending on the observed value of the parameter.

$$SI_i = \max\left(0,\; 100 \times \frac{S_i - C_i}{S_i - I_i}\right) \tag{4}$$

where $C_i$ is the observed value, $I_i$ denotes the ideal value, and $S_i$ is the permissible limit.

Purpose Eq. 4 is used when the observed value $C_i$ is within the permissible range (i.e., $C_i \le S_i$).

Behavior:

  • It linearly scales the deviation of $C_i$ between the ideal value $I_i$ and the permissible limit $S_i$.

  • If $C_i$ approaches $S_i$, the score $SI_i$ approaches 0.

  • If $C_i$ is closer to $I_i$, the score approaches 100.

  • The use of max(0, …) ensures that the sub-index cannot drop below 0.
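The behaviour listed above (a linear score that equals 100 at the ideal value, reaches 0 at the permissible limit, and is floored at 0) can be sketched as follows. This is one function consistent with that description rather than a verified copy of Eq. 4; the function name and the example values are hypothetical, and no upper cap at 100 is applied because the text does not describe one.

```python
def sub_index_linear(observed: float, ideal: float, limit: float) -> float:
    """Linear sub-index: 100 at the ideal value, 0 at the permissible limit."""
    if limit == ideal:
        raise ValueError("permissible limit must differ from the ideal value")
    score = 100.0 * (limit - observed) / (limit - ideal)
    return max(0.0, score)  # floor at 0, as stated in the text


# Hypothetical parameter with ideal value 0 and permissible limit 5.
for value in (0.0, 2.5, 5.0):
    print(value, sub_index_linear(value, ideal=0.0, limit=5.0))
```

Observed values beyond the permissible limit would be clipped to 0 by this function; the text handles that range with a separate logarithmic expression.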

If the observed value $C_i$ exceeds the permissible limit $S_i$, a log transformation is applied:

graphic file with name d33e1382.gif 5

Purpose Eq. 5 is used when the observed value $C_i$ exceeds the permissible limit $S_i$.

Behavior:

  • A logarithmic transformation is applied to dampen the penalty for values of $C_i$ that exceed $S_i$.

  • This prevents extreme deviations from having disproportionately large penalties compared to a linear scale.

  • The inclusion of a scaling factor ensures the score decreases more gradually.

  • Unlike Eq. 4, there is no explicit floor of 0, meaning $SI_i$ can technically drop below zero for very high $C_i$.

Step 2 Weight of a parameter ($w_i$).

The initial weight is determined based on the PCA loading ($L_{ij}$) of parameter $i$ on the selected PCs, as shown in Eq. 6 (ref. 54):

graphic file with name d33e1495.gif 6

where $k$ is the number of retained PCs, $\lambda_j$ represents the eigenvalue (explained variance) of the $j$-th PC, and $L_{ij}$ is the loading of the parameter $i$ on the $j$-th PC.

Step 4 Normalized weight ($W_i$).

The weights are normalized to ensure their sum equals 1 (Eq. 7)55:

$$W_i = \frac{w_i}{\sum_{m=1}^{n} w_m} \tag{7}$$

where $w_i$ is the initial weight of parameter $i$ and $n$ is the number of parameters. If weights are negative, absolute values are applied to adjust relative contributions.

Step 5 Final score.

The final WQI is calculated by multiplying the normalized weights ($W_i$) with the sub-indices ($SI_i$) and summing over all parameters, as shown in Eq. 8:

$$\mathrm{WQI} = \sum_{i=1}^{n} W_i \times SI_i \tag{8}$$

where $W_i$ is the adjusted relative weight of the parameter $i$.
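The normalization and aggregation steps described above (absolute values for negative weights, weights rescaled to sum to 1, then a weighted sum of sub-indices) can be sketched as follows. The function names and the numbers in the example are hypothetical; the raw weights would in practice come from the Kernel PCA loadings.

```python
def normalize_weights(raw_weights):
    """Take absolute values and rescale so the weights sum to 1."""
    magnitudes = [abs(w) for w in raw_weights]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return [m / total for m in magnitudes]


def water_quality_index(sub_indices, raw_weights):
    """Weighted sum of sub-indices using normalized weights."""
    if len(sub_indices) != len(raw_weights):
        raise ValueError("one weight is required per sub-index")
    weights = normalize_weights(raw_weights)
    return sum(w * s for w, s in zip(weights, sub_indices))


# Hypothetical three-parameter example.
demo_sub_indices = [100.0, 0.0, 50.0]
demo_raw_weights = [1.0, -1.0, 2.0]
print(normalize_weights(demo_raw_weights))
print(water_quality_index(demo_sub_indices, demo_raw_weights))
```

Because the normalized weights sum to 1, the index stays within the range spanned by the sub-indices.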

Combining all steps, the general equation becomes (Eq. 9):

graphic file with name d33e1621.gif 9

Results and discussion

Kaiser–Meyer–Olkin (KMO) and Bartlett’s tests of sphericity (BTS)

The appropriateness of the dataset for Kernel PCA was evaluated through the KMO test and BTS. The KMO test measures sampling adequacy by quantifying the proportion of variance accounted for by underlying PCs. A KMO value nearing 1 indicates a dataset suitable for PCA, while a KMO value below 0.5 indicates that the dataset is unsuitable for PCA. In the current study, the observed KMO value is 0.79, indicating adequate sampling. Bartlett’s test of sphericity, in turn, assesses whether the correlation matrix resembles an identity matrix, which would imply that the variables are uncorrelated and thus not conducive to PCA46. In this analysis, a p-value reported as 0.000 (i.e., p < 0.001, below the 0.05 threshold) justifies rejecting the null hypothesis, confirming the existence of significant relationships among the variables. Both tests confirmed that the dataset is suitable for PCA.

Figure 3 presents the results of PCA suitability and the correlation heatmap illustrating the relationships between the parameters. The correlation matrix highlights essential trends in GWQ parameters, revealing strong positive correlations between EC, salinity, Na⁺, and Cl⁻, which significantly contribute to the ionic strength of groundwater. K⁺, Mg²⁺, and Ca²⁺ also correlate highly with EC, further emphasizing their influence on water chemistry. In contrast, pH exhibits negative correlations with CaCO₃ and SO₄²⁻, suggesting that lower pH levels enhance processes like carbonate dissolution. Turbidity shows moderate correlations with both EC and SO₄²⁻, suggesting partial influences on ionic variations. On the other hand, nitrate generally exhibits a weak correlation with most parameters, indicating localized human activities, such as agricultural runoff.

Fig. 3.

Correlation matrix heatmap and PCA suitability tests.

Kernel PCA-based WQI

Normalization of raw data represents the first step in developing the Kernel PCA-based WQI. It is one of the essential steps that makes the range of the data uniform, providing all variables with an opportunity to contribute to the analysis56. Figure 4 shows a box-and-whisker plot that visually summarizes the distribution, variability, and outliers of each physicochemical parameter. NO₃⁻ and pH exhibit high variability, as evidenced by their more extensive interquartile ranges (IQR) and multiple outliers. In contrast, EC, CaCO₃, turbidity, and salinity display relatively minor IQRs, indicating more consistent values across the dataset. The outliers observed in Cl⁻, F⁻, and Ca2+ parameters suggest localized anomalies or unique physicochemical characteristics in specific wells.

The comparison of cumulative variance (CV) across six Kernel PCA variants (linear, polynomial, radial basis function (RBF), sigmoid, cosine, and Laplacian) reveals significant differences in their ability to retain data variability, as shown in Table 5. The polynomial kernel performed exceptionally well, capturing 98.33% of the variance in the first PC. By PC5, it captures 99.77% of the variance, and by PC7 it reaches 99.87%. The strong performance of the polynomial kernel PCA can be attributed to its ability to preserve maximum variability while capturing essential feature directions and efficiently reducing dimensionality57. The polynomial kernel helps reduce intrinsic dimensionality while preserving the dataset’s statistics58. The cosine kernel also performs well, capturing 96.13% of the variance by PC5 and 99.90% by PC9. Both polynomial and cosine kernels are effective at reducing dimensionality while preserving significant variability, making them suitable for non-linear data applications.

Table 5.

Comparison of CV for kernel types.

PC Linear (CV) Polynomial (CV) RBF (CV) Sigmoid (CV) Cosine (CV) Laplacian (CV)
PC1 0.73341 0.98329 0.34067 0.75024 0.59494 0.33498
PC2 0.88383 0.99344 0.57161 0.90435 0.86133 0.51469
PC3 0.93498 0.99543 0.66110 0.93824 0.91480 0.61207
PC4 0.95852 0.99673 0.72179 0.96203 0.94356 0.67532
PC5 0.97478 0.99772 0.77394 0.97729 0.96128 0.73530
PC6 0.98502 0.99831 0.81349 0.98795 0.97709 0.78262
PC7 0.99222 0.99874 0.85003 0.99226 0.99029 0.82351
PC8 0.99688 0.99914 0.88323 0.99559 0.99557 0.85417
PC9 0.99934 0.99941 0.91533 0.99874 0.99904 0.88390
PC10 0.99969 0.99962 0.94184 0.99939 0.99950 0.91060
PC11 0.99987 0.99978 0.96049 0.99985 0.99979 0.93535
PC12 0.99993 0.99989 0.97702 0.99996 0.99990 0.95873
PC13 0.99997 0.99996 0.99203 0.99999 0.99995 0.98017
PC14 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000

The RBF kernel is among the weakest performers in retaining CV. At PC1, it accounts for only 34.07% of the variance (CV = 0.34067), and by PC5, it captures just 77.39% of the CV, which is lower than every kernel except the Laplacian. Although the RBF kernel eventually reaches 96.05% variance at PC11, its slower accumulation suggests inefficiency in capturing variability in the earlier components. While the RBF kernel is known for its effectiveness with non-linear data, it lacks a systematic approach for selecting and integrating optimal parameters to enhance performance59. The poor performance of the RBF kernel makes it unsuitable for this analysis, as it risks losing critical data variability essential for clustering and interpreting physicochemical patterns. The Laplacian kernel has the weakest performance of all the kernels. At PC1, it captures only 33.50% of the variance (CV = 0.33498), which is even lower than that of the RBF kernel. By PC5, it retains 73.53%, trailing behind all the other kernels. Even at PC12, where most kernels approach full variance retention, the Laplacian kernel reaches only 95.87% CV. In contrast, the linear and sigmoid kernels demonstrate intermediate performance, retaining 95.85% and 96.20% variance by PC4, respectively, and reaching near-total variability by PC8.

Figure 5 shows the CV results of different kernels in Kernel PCA with a 95% threshold. The polynomial kernel achieves this threshold in the first PC, effectively capturing most of the data’s variability early. However, according to Table 5, the linear and sigmoid kernels require four PCs to reach the 95% threshold, whereas the cosine kernel requires five. The RBF kernel lags, crossing the 95% mark only at the eleventh PC. The Laplacian kernel performs the weakest, reaching the threshold at the twelfth PC. Testing various kernels in PCA offers valuable insights into their suitability for specific datasets. The polynomial kernel is selected for this study due to its ability to retain the highest CV with fewer components.
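The number of components each kernel needs to reach the 95% threshold can be read off the cumulative-variance columns of Table 5. The sketch below copies those columns and counts the first PC at or above the threshold; the helper name is illustrative.

```python
# Cumulative variance per kernel, PC1 to PC14, copied from Table 5.
cumulative_variance = {
    "Linear": [0.73341, 0.88383, 0.93498, 0.95852, 0.97478, 0.98502, 0.99222,
               0.99688, 0.99934, 0.99969, 0.99987, 0.99993, 0.99997, 1.00000],
    "Polynomial": [0.98329, 0.99344, 0.99543, 0.99673, 0.99772, 0.99831, 0.99874,
                   0.99914, 0.99941, 0.99962, 0.99978, 0.99989, 0.99996, 1.00000],
    "RBF": [0.34067, 0.57161, 0.66110, 0.72179, 0.77394, 0.81349, 0.85003,
            0.88323, 0.91533, 0.94184, 0.96049, 0.97702, 0.99203, 1.00000],
    "Sigmoid": [0.75024, 0.90435, 0.93824, 0.96203, 0.97729, 0.98795, 0.99226,
                0.99559, 0.99874, 0.99939, 0.99985, 0.99996, 0.99999, 1.00000],
    "Cosine": [0.59494, 0.86133, 0.91480, 0.94356, 0.96128, 0.97709, 0.99029,
               0.99557, 0.99904, 0.99950, 0.99979, 0.99990, 0.99995, 1.00000],
    "Laplacian": [0.33498, 0.51469, 0.61207, 0.67532, 0.73530, 0.78262, 0.82351,
                  0.85417, 0.88390, 0.91060, 0.93535, 0.95873, 0.98017, 1.00000],
}


def components_needed(cv_values, threshold=0.95):
    """Return the 1-based index of the first PC at or above the threshold."""
    for index, value in enumerate(cv_values, start=1):
        if value >= threshold:
            return index
    return None


needed = {name: components_needed(cv) for name, cv in cumulative_variance.items()}
for name, count in sorted(needed.items(), key=lambda item: item[1]):
    print(f"{name}: {count} PC(s)")
```

Sorting by the count reproduces the ranking used to select the polynomial kernel.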

Fig. 5.

Comparison of cumulative variance of kernels with a 95% threshold.

Table 6 presents the SI and WQI scores for groundwater parameters, offering a detailed assessment of water conditions in the coastal aquifers of the study area. The WQI, derived as a weighted sum of the sub-indices ($SI_i$) using adjusted relative weights, categorizes GWQ across wells into classes based on the WQI ranking criteria of ref. 51, as shown earlier in Table 3. Notably, W2, W4, and W9, with WQI values of 31.27, 30.26, and 28.82, respectively, fall into the “Bad” category, while W11 exhibits the worst water quality, with a WQI of 10.64, classified as “Very Bad.” Conversely, wells such as W16, W18, and W19, with WQI values of 63.94, 61.84, and 64.63, are classified as “Medium,” indicating moderate GWQ in certain parts of the coastal aquifer.

Table 6.

Results of the SI and WQI scores for the water quality parameters.

Well SI (pH) SI (EC) SI (CaCO3) SI (Turbidity) SI (Na+) SI (K+) SI (Mg2+) SI (Ca2+) SI (F⁻) SI (Cl⁻) SI (Br⁻) SI (NO3⁻) SI (SO42⁻) SI (Salinity) WQI Class
W1 100 0 90.05 76.19 18.02 0 64.81 37.77 53.84 0 0 100 0 0 34.92 Bad
W2 100 0 8.37 51.89 35.01 0 8.39 66.34 75.65 0 0 90.23 12.47 0 31.27 Bad
W3 100 0 77.34 37.11 0 0 41.57 19.2 35.76 0 0 100 0 0 25.51 Very bad
W4 100 0 94.59 31.75 0 0 46.18 28.28 74.4 0 0 100 0 0 30.26 Bad
W5 100 0 98.37 63.68 30.5 0 89.04 60.48 82.38 0 0 100 10.3 0 42.31 Bad
W6 100 0 89.02 57.01 20.85 0 76.79 47.77 48.5 0 0 100 0 0 35.14 Bad
W7 100 0 27.06 45.91 55.26 7.78 13.76 66.4 100 0 0 100 20.72 0 37.53 Bad
W8 100 0 32.32 0 29.18 0 83.23 35.31 88.74 0 0 100 0 0 31.6 Bad
W9 100 0 32.46 8.77 43.62 0 0.5 62.04 61.7 0 0 100 12.59 0 28.82 Bad
W10 100 0 29.64 34.92 37.55 0 99.78 60.59 68.04 0 0 100 9.07 0 37.06 Bad
W11 100 0 73.21 26.41 15.82 0 75.23 40.59 42.21 0 0 100 0 0 30.64 Bad
W12 100 0 49.7 0 0 0 0 16.44 37.61 0 0 100 0 0 18.63 Very bad
W13 100 5.64 100 8.75 76.56 66.65 88.33 26.63 100 23.76 0 93.67 58.62 26.26 52.5 Medium
W14 100 9.4 100 96.25 80.64 73.99 89.92 31.24 100 27.45 0 92.47 60.84 28.01 60.59 Medium
W15 100 8.56 100 86.75 79.17 68.03 86.52 27.29 90.25 26.37 0 93.31 56.85 26.61 57.51 Medium
W16 100 11.49 100 60.5 84.36 75.82 86.12 31.91 100 32.36 0 93.47 53.29 28.13 58.4 Medium
W17 100 24.35 100 79.75 97.87 45.1 95.44 46.46 100 43.63 0 91.05 63.57 43.27 63.94 Medium
W18 100 28.91 100 89 4.46 100 98.71 56.23 100 47.22 0 94.3 68.13 46.59 64.55 Medium
W19 100 17.71 100 84.75 90.96 93.25 91.43 39.54 100 37.38 0 92.84 59.91 36.23 64.83 Medium
W20 99.33 15.74 100 83.25 88.55 97.85 91.94 37.87 100 34.82 0 95.14 61.09 34.99 64.59 Medium
W21 100 11.49 100 94 84.56 86.98 89.76 33.1 98 30.68 0 95.46 59.86 30.06 62.41 Medium
W22 100 8.06 100 91 79.75 72.23 87.4 29.59 100 26.55 0 95.22 58.66 29.09 59.66 Medium
W23 100 12.38 100 92 84.38 82.89 88.04 31.65 100 30.93 0 94.9 57.46 31.19 61.8 Medium
W24 100 2.73 100 96.75 72.47 56.25 86.19 22.17 93 19.71 0 98.01 59.6 21.26 55.77 Medium
W25 100 21.89 100 100 95.07 12.1 93.48 44.38 100 41.41 0 92.51 61.5 38.23 61.53 Medium
W26 100 13.47 100 91.75 85.96 76.06 89.14 32.7 100 32.55 0 94.39 57.91 31.57 61.78 Medium
W27 100 11.13 100 73.5 78.99 50.19 78.5 24.57 87.37 29.71 0 99.13 41.88 28.48 54.09 Medium
W28 50.67 0 100 95 56.37 47.67 70.16 95.79 73.62 1.43 0 100 51.71 4.28 52.94 Medium
W29 52 0 100 98 42.36 25.98 55.7 84.9 93.25 0 0 100 45.45 0 49.16 Bad
W30 52.67 0 100 97.75 41.11 24.04 54.31 83.95 67.62 0 0 100 45.45 0 46.7 Bad
W31 52.67 0 100 98.25 38.44 18.46 49.84 82.14 58.13 0 0 100 42.76 0 44.67 Bad
W32 50.67 0 100 96.25 36.15 16.32 44.37 79.24 94.25 0 0 100 37.76 0 45.97 Bad
W33 56 0 100 98.25 3.49 0 77.89 33.41 93.12 0 0 90.1 0 0 36.96 Bad
W34 50.67 0 100 98.75 31.73 5.38 38.23 75.76 34.75 0 0 100 35.24 0 39.29 Bad
W35 31.33 0 100 97 63.3 41.61 81.64 17.46 46 11.1 0 96.58 54.85 13.71 44.87 Bad
W36 38 1.12 100 97.75 73.57 68.04 84.66 25.48 92.12 36.33 0 88.01 58.01 20.32 54.6 Medium
W37 68 0 100 96.75 65.2 48.72 80.96 13.27 100 12.95 0 93.8 55.67 14.29 50.89 Bad
W38 43.33 17.5 100 97.5 90.73 25.35 90.12 38.5 100 20.3 0 91.14 57.33 39.54 56.75 Medium
W39 38 17.71 100 100 67.22 54.02 82.5 16.42 100 14.43 0 94.84 57.51 39.79 54.42 Medium

The SI calculations reveal that salinity is one of the significant parameters affecting GWQ in the study area. All wells recorded salinity SI values between 0 and 50, reflecting highly saline water. This points to intense salinization caused by over-extraction of groundwater, which decreases hydraulic pressure and allows seawater to intrude into the aquifers60. Similarly, very low SI values were recorded for EC in all wells, where extreme EC levels exceeded the permissible limits. Salinity strongly influences EC, an essential indicator of TDS in water that reflects the intensity of ionic concentration and possible seawater intrusion and groundwater contamination61,62. Thus, the consistently poor SI scores for salinity and EC indicate that over-extraction has intensified seawater intrusion in the study area. Br⁻ and Cl⁻ also exhibit extremely low SI values, with Br⁻ scoring zero in all wells. Br⁻ is representative of contamination from anthropogenic activities, and it is further increased in coastal aquifers by mixing with seawater63.

On the other hand, pH maintains a consistently high SI value of 100 in most of the wells. This suggests that the pH remains stable within the acceptable range, usually buffered through the intrinsic buffering capacity of the aquifer64. Other parameters, such as Ca2+ and SO42⁻, show moderate SI scores, with variability attributed to geological sources and human activities. Alfarrah et al.65 investigated high sulfate concentrations in a coastal aquifer in Libya and related them to seawater intrusion, gypsum dissolution, and deep saline waters. Similarly, Awaleh et al.66 examined sulfate pollution in aquifers in Djibouti and found that anthropogenic inputs, such as fertilizers and animal manure, contribute significantly to high sulfate levels. Further, Cheng et al.67 attributed sulfate contamination in coal mining areas to industrial sources, namely sulfide mineral oxidation and the discharge of sewage waters.

These findings offer critical implications for GWQ management in the Al-Qatif region. The consistent classification of several wells (e.g., W11, W12) as “Very Bad” suggests zones of acute salinization requiring targeted remediation. Salinity, EC, and Br⁻ emerge as dominant stressors, pointing to the ongoing impact of seawater intrusion exacerbated by over-extraction. Conversely, stable pH values across wells indicate natural buffering potential that may support managed aquifer recharge strategies. The WQI classifications developed here offer a practical framework for prioritizing well rehabilitation, informing pumping restrictions, and guiding the integration of treated wastewater for irrigation, in alignment with Saudi Vision 2030 objectives for sustainable water use.

Conclusion and recommendation

This research was motivated by the need to enhance GWQ analysis in arid and semi-arid coastal regions where water scarcity, seawater intrusion, and anthropogenic contamination present growing challenges. It proposed a novel WQI based on Kernel PCA, representing a methodological advancement in GWQ assessment. By using Kernel PCA for parameter weighting and dimensionality reduction, the framework mitigated the subjectivity inherent in traditional WQI models. Among the kernel types evaluated, the polynomial kernel demonstrated the strongest performance, capturing over 98% of the variance in the first principal component. The associated SI framework enabled a nuanced assessment of deviations from permissible limits, offering a more precise classification of water quality conditions.
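To make the variance figure more concrete, the following is a minimal sketch of how the share of variance captured by each kernel principal component can be computed for a polynomial kernel. It is not the authors' implementation: the helper names, the kernel hyperparameters (`degree`, `gamma`, `coef0`), and the synthetic 39 × 14 input matrix are all assumptions, and the step that converts component loadings into parameter weights is not shown. On random data the first-component share will differ from the value reported in the study.

```python
# Hypothetical sketch of polynomial-kernel PCA on synthetic data (NumPy only).
# Kernel hyperparameters and the input data are illustrative assumptions.
import numpy as np


def polynomial_kernel(X, degree=3, gamma=None, coef0=1.0):
    """Gram matrix with K[i, j] = (gamma * <x_i, x_j> + coef0) ** degree."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    return (gamma * (X @ X.T) + coef0) ** degree


def kernel_pca_variance_ratio(K):
    """Share of feature-space variance captured by each kernel principal component."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the kernel matrix, i.e. center the data in the implicit feature space.
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals = np.linalg.eigvalsh(K_centered)[::-1]  # sort in descending order
    eigvals = np.clip(eigvals, 0.0, None)           # remove tiny negative round-off
    return eigvals / eigvals.sum()


rng = np.random.default_rng(0)
# Synthetic stand-in for a standardized 39-well x 14-parameter matrix.
X = rng.standard_normal((39, 14))

ratio = kernel_pca_variance_ratio(polynomial_kernel(X))
print("Variance share of first component:", round(float(ratio[0]), 3))
```

Comparing `ratio[0]` across kernel choices is one simple way to rank kernels by how much variance the first component retains.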

The findings revealed substantial spatial and qualitative variability across the 39 sampled wells. Sixteen wells were classified as “Medium,” while another 16 were identified as “Bad,” and seven wells, including W11 (WQI = 10.64), fell into the “Very Bad” category. Salinity, electrical conductivity (EC), and bromide (Br⁻) were identified as the dominant stressors, whereas pH remained consistently within acceptable limits, reflecting the buffering capacity of the aquifer system. This emphasizes the suitability of the Kernel PCA-based WQI for capturing both extreme and stable groundwater characteristics. The approach aligns with Saudi Vision 2030 and SDG targets by offering a scalable and reproducible tool for water quality assessment.

The study is subject to several limitations. The analysis was based on a relatively small sample size and reflects only a single-season event, limiting temporal generalizability. Spatial gaps in monitoring coverage may also obscure transitions between freshwater and saline zones. Future studies can incorporate seasonal monitoring campaigns and higher-resolution spatial sampling to capture variability more accurately. In terms of actionable recommendations, wells with “Very Bad” classifications should be prioritized for targeted remediation, particularly those with critical salinity and EC levels. The developed WQI framework can also support the design of decision-support tools for groundwater abstraction regulation, recharge zone identification, and integration of treated wastewater in non-potable applications. Finally, incorporating satellite-derived environmental data or machine learning extensions may further enhance the predictive capacity and regional scalability of the approach.

Acknowledgements

The authors thank Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia, for their support and funding. In addition, the authors would like to thank the reviewers and editors for their comprehensive and constructive comments for improving the manuscript. Further, Zaher Mundher Yaseen would like to thank the Civil and Environmental Engineering Department, King Fahd University of Petroleum & Minerals, Saudi Arabia.

Author contributions

A.A.: Funding Acquisition, Conceptualization, Methodology, Resource, Software, Writing- Reviewing and Editing. A.M.J.: Data Curation, Software, Visualization, Investigation, Validation, Writing- Original draft. S.D.: Supervision, Software, Validation, Formal Analysis, Resource. M.A.: Supervision, Validation, Software, Formal Analysis, Writing- Reviewing and Editing. S.I.A.: Data Curation, Supervision, Validation, Formal analysis, Writing- Reviewing and Editing. Z.M.Y.: Supervision, Validation, Software, Formal Analysis, Writing- Reviewing and Editing.

Funding

This research was supported via funding from Prince Sattam bin Abdulaziz University under project number (PSAU/2025/R/1446).

Data availability

The data is available upon request from the corresponding author (A.M. Jibrin, abdulhayatjm@gmail.com).

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Xu, F., Li, P., Wang, Y. & Du, Q. Integration of Hydrochemistry and Stable Isotopes for Assessing Groundwater Recharge and Evaporation in Pre- and Post-Rainy Seasons in Hua County, China. Nat Resour Res [Internet]. 32(5):1959–73 (2023). Available from: 10.1007/s11053-023-10235-y
  • 2. Ahmed, A. K. A. & El-Rawy, M. The impact of aquifer recharge on groundwater quality. In: Managed Aquifer Recharge in MENA Countries: Developments, Applications, Challenges, Strategies, and Sustainability. Springer; 207–222 (2024).
  • 3. Jibrin, A. M., Al-Suwaiyan, M., Yaseen, Z. M. & Abba, S. I. New perspective on density-based spatial clustering of applications with noise for groundwater assessment. J Hydrol [Internet]. 661(PA):133566 (2025). Available from: 10.1016/j.jhydrol.2025.133566
  • 4. Doost, Z. H. & Yaseen, Z. M. The impact of land use and land cover on groundwater fluctuations using remote sensing and geographical information system: representative case study in Afghanistan. Environ. Dev. Sustain. 1–24 (2023).
  • 5. Aryafar, A., Khosravi, V., Zarepourfard, H. & Rooki, R. Evolving genetic programming and other AI-based models for estimating groundwater quality parameters of the Khezri plain, Eastern Iran. Environ. Earth Sci. 78, 1–13 (2019).
  • 6. Herschy, R. W. & Fairbridge, R. W. Encyclopedia of hydrology and water resources. In. (1998). Available from: https://api.semanticscholar.org/CorpusID:129205818
  • 7. Lin, L. & Pussella, P. Assessment of vulnerability for coastal erosion with GIS and AHP techniques case study: Southern coastline of Sri Lanka. Nat Resour Model [Internet]. 30(4):e12146 (2017). Available from: 10.1111/nrm.12146
  • 8. SEDAC. Percentage of Total Population Living in Coastal Areas. United Nations [Internet]. 170–5 (2008). Available from:
  • 9. Abba, S. I. et al. Nitrate concentrations tracking from multi-aquifer groundwater vulnerability zones: Insight from machine learning and spatial mapping. Process Saf Environ Prot [Internet]. 184(February):1143–57 (2024). Available from: 10.1016/j.psep.2024.02.041
  • 10. Gao, M. S. & Luo, Y. M. Change of groundwater resource and prevention and control of seawater intrusion in coastal zone. Bull. Chin. Acad. Sci. 31 (10), 1197–1203 (2016).
  • 11. Ali, I., Hasan, M. A. & Alharbi, O. M. L. Toxic metal ions contamination in the groundwater, Kingdom of Saudi Arabia. J. Taibah Univ. Sci. 14 (1), 1571–1579 (2020).
  • 12. Al-Omran, A. M., Aly, A. A., Al-Wabel, M. I., Sallam, A. S. & Al-Shayaa, M. S. Hydrochemical characterization of groundwater under agricultural land in arid environment: a case study of Al-Kharj, Saudi Arabia. Arab. J. Geosci. 9, 1–17 (2016).
  • 13. Ministry of Municipal & Rural Affairs. Qatif City Profile [Internet]. 2019 [cited 2024 Dec 24]. Available from: https://unhabitat.org/sites/default/files/2020/03/qatif.pdf
  • 14. Al-Shaibani, A. Economic potential of Brines of Sabkha Jayb uwayyid, Eastern Saudi Arabia. Arab. J. Geosci. 6 (2012).
  • 15. Awadh, S. M., Al-Mimar, H. & Yaseen, Z. M. Groundwater availability and water demand sustainability over the upper mega aquifers of Arabian Peninsula and west region of Iraq. Environ Dev Sustain [Internet]. 23(1):1–21 (2021). Available from: 10.1007/s10668-019-00578-z
  • 16. Jamei, M. et al. Computational assessment of groundwater salinity distribution within coastal multi–aquifers of Bangladesh. Sci. Rep. 12 (2022).
  • 17. Maluventhi, M. SK, Kulandaisamy, P., Rajendran, BalagurumoorthiDR & Veeramalai, G. Machine learning-powered Geospatial mapping of groundwater quality and salinity: towards sustainable water management in Southern India. Stoch. Environ. Res. Risk Assess. 1, 20 (2025).
  • 18. Abba, S. I. et al. Mapping of groundwater salinization and modelling using meta-heuristic algorithms for the coastal aquifer of Eastern Saudi Arabia. Sci. Total Environ. 858 (2023). (November 2022).
  • 19. Jibrin, A. M. et al. Tracking the impact of heavy metals on human health and ecological environments in complex coastal aquifers using improved machine learning optimization. Environ Sci Pollut Res [Internet]. (2024). Available from: 10.1007/s11356-024-34716-6
  • 20. Alshahrani, A., Ahmad, M., Laiq, M. & Nabi, M. Geostatistical analysis and multivariate assessment of groundwater quality. Sci. Rep. 15 (1), 7435 (2025).
  • 21. Uddin, M. G., Nash, S., Rahman, A. & Olbert, A. I. A novel approach for estimating and predicting uncertainty in water quality index model using machine learning approaches. Water Res [Internet]. 229(November 2022):119422 (2023). Available from: 10.1016/j.watres.2022.119422
  • 22. Jibrin, A. M. et al. Machine learning predictive insight of water pollution and groundwater quality in the Eastern Province of Saudi Arabia. Sci Rep [Internet]. 14(1):1–17 (2024). Available from: 10.1038/s41598-024-70610-4
  • 23. Das, A. Prediction of Urban Surface Water Quality Scenarios Using Water Quality Index (WQI), Multivariate Techniques, and Machine Learning (ML) Models in Water Resources, in Baitarani River Basin, Odisha: Potential Benefits and Associated Challenges. Earth Syst Environ [Internet]. (2025). Available from: 10.1007/s41748-025-00623-0
  • 24. Singh, G., Chaudhary, S., Giri, B. S. & Mishra, V. K. Assessment of geochemistry and irrigation suitability of the river ganga, varanasi, india: PCA reduction for water quality index and health risk evaluation. Environ. Sci. Pollut Res. 1–20 (2025).
  • 25. Pande, C. B. et al. Implications of seasonal variations of hydrogeochemical analysis using GIS, WQI, and statistical analysis method for the semi-arid region. Appl. Water Sci. 15 (4), 80 (2025).
  • 26. Do, D. D., Le, A. H., Le, D. A. N. & Bui, H. M. Evaluation of water quality and key factors influencing water quality in intensive shrimp farming systems using principal component analysis-fuzzy approach. Desalin. Water Treat. 321, 101002 (2025).
  • 27. Anowar, F., Sadaoui, S. & Selim, B. Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Comput Sci Rev [Internet]. 40:100378 (2021). Available from: 10.1016/j.cosrev.2021.100378
  • 28. Briscik, M., Dillies, M. A. & Déjean, S. Improvement of variables interpretability in kernel PCA. BMC Bioinform. 24 (1), 1–21 (2023).
  • 29. Kwak, N. Kernel discriminant analysis for regression problems. Pattern Recognit [Internet]. 45(5):2019–31 (2012). Available from: 10.1016/j.patcog.2011.11.006
  • 30. Armanuos, A. M., Al-Ansari, N. & Yaseen, Z. M. Assessing the effectiveness of using recharge wells for controlling the saltwater intrusion in unconfined coastal aquifers with sloping beds: numerical study. Sustainability 12 (7), 2685 (2020).
  • 31. Abba, S. I. et al. Fluoride and nitrate enrichment in coastal aquifers of the Eastern Province, Saudi Arabia: The influencing factors, toxicity, and human health risks. Chemosphere [Internet]. 336(June):139083 (2023). Available from: 10.1016/j.chemosphere.2023.139083
  • 32. Benaafi, M. et al. Integrated Hydrogeological, Hydrochemical, and Isotopic Assessment of Seawater Intrusion into Coastal Aquifers in Al-Qatif Area, Eastern Saudi Arabia. Vol. 27, Molecules (2022).
  • 33. Manzar, M. S. et al. New generation neurocomputing learning coupled with a hybrid neuro-fuzzy model for quantifying water quality index variable: A case study from Saudi Arabia. Ecol Inform [Internet]. 70:101696 (2022). Available from: https://www.sciencedirect.com/science/article/pii/S1574954122001467
  • 34. Roll, I. B., Driver, E. M. & Halden, R. U. Apparatus and method for time-integrated, active sampling of contaminants in fluids demonstrated by monitoring of hexavalent chromium in groundwater. Sci Total Environ [Internet]. 556:45–52 (2016). Available from: https://www.sciencedirect.com/science/article/pii/S0048969716304363
  • 35. Xiao, Y., Zou, C., Chi, H. & Fang, R. Boosted GRU model for short-term forecasting of wind power with feature-weighted principal component analysis. Energy [Internet]. 267:126503 (2023). Available from: 10.1016/j.energy.2022.126503
  • 36. Zhang, C., Tian, Y. X. & Fan, Z. P. Forecasting sales using online review and search engine data: A method based on PCA–DSFOA–BPNN. Int. J. Forecast. 38 (3), 1005–1024 (2022).
  • 37. Hoffmann, H. Kernel PCA for novelty detection. Pattern Recognit. 40 (3), 863–874 (2007).
  • 38. Costa, A. P. et al. Integrating multicriteria decision making and principal component analysis: a systematic literature review. Cogent Eng [Internet]. 11(1):2374944 (2024). Available from: 10.1080/23311916.2024.2374944
  • 39. Granato, D., Santos, J. S., Escher, G. B., Ferreira, B. L. & Maggio, R. M. Use of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for multivariate association between bioactive compounds and functional properties in foods: A critical perspective. Trends Food Sci Technol [Internet]. 72(December 2017):83–90 (2018). Available from: 10.1016/j.tifs.2017.12.006
  • 40. Mosteller, F. & Tukey, J. W. Data analysis and regression. A second course in statistics. Addison-Wesley Ser. Behav. Sci. Quant. Methods (1977).
  • 41. Schreiber, J. B. Issues and recommendations for exploratory factor analysis and principal component analysis. Res Soc Adm Pharm [Internet]. 17(5):1004–11 (2021). Available from: 10.1016/j.sapharm.2020.07.027
  • 42. Kaiser, H. F. & Rice, J. Little Jiffy, Mark IV. Educational and Psychological Measurement 34 (1), 111–117 (1974).
  • 43. Yuan, S., Zhou, J., Pan, J. & Shen, J. Sphericity and identity test for High-dimensional covariance matrix using random matrix theory. Acta Math. Appl. Sin Engl. Ser. 37 (2), 214–231 (2021).
  • 44. Wang, Y. Financial Crisis Prediction Model of Listed Companies Based on Statistics and AI. Sci Program. 2022 (2022).
  • 45. Bartlett, M. S. Tests of significance in factor analysis. Brit J. Stat. Psycho (1950).
  • 46. Tripathi, M. & Singal, S. K. Use of Principal Component Analysis for parameter selection for development of a novel Water Quality Index: A case study of river Ganga India. Ecol Indic [Internet]. 96:430–6 (2019). Available from: https://www.sciencedirect.com/science/article/pii/S1470160X18307003
  • 47. Hossain, M. M. & Hossain, M. A. Feature Reduction and Classification of Hyperspectral Image Based on Multiple Kernel PCA and Deep Learning. In: 2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON). pp. 141–4 (2019).
  • 48. Hou, G., Wang, J. & Fan, Y. Multistep short-term wind power forecasting model based on secondary decomposition, the kernel principal component analysis, an enhanced arithmetic optimization algorithm, and error correction. Energy [Internet]. 286(November 2023):129640 (2024). Available from: 10.1016/j.energy.2023.129640
  • 49. Lu, H., Meng, Y., Yan, K. & Gao, Z. Kernel principal component analysis combining rotation forest method for linearly inseparable data. Cogn Syst Res [Internet]. 53:111–22 (2019). Available from: https://www.sciencedirect.com/science/article/pii/S1389041717302887
  • 50. Tiwari, T. N. & Mishra, M. A. A preliminary assignment of water quality index of major Indian rivers. Indian J. Env Prot. 5 (4), 276–279 (1985).
  • 51. Gupta, A. K., Gupta, S. K. & Patil, R. S. A comparison of water quality indices for coastal water. J. Environ. Sci. Heal - Part. Toxic/Hazardous Subst. Environ. Eng. 38 (11), 2711–2725 (2003).
  • 52. MEWA. Executive Regulations for the Protection of Aqueous Media from Pollution. 1–34 (2020). Available from: https://www.mewa.gov.sa/en/InformationCenter/DocsCenter/RulesLibrary/Docs/
  • 53. WHO. Guidelines for Drinking Water Quality, vol. 1: Recommendations (World Health Organization, 2004).
  • 54. Mahanty, B., Lhamo, P. & Sahoo, N. K. Inconsistency of PCA-based water quality index – Does it reflect the quality? Sci Total Environ [Internet]. 866:161353 (2023). Available from: https://www.sciencedirect.com/science/article/pii/S0048969722084571
  • 55. Shrestha, S. & Kazama, F. Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan. Environ. Model. Softw. 22 (4), 464–475 (2007).
  • 56. Jibrin, A. M. et al. Influence of membrane characteristics and operational parameters on predictive control of permeance and rejection rate using explainable artificial intelligence (XAI). Next Res [Internet]. 2(1):100100 (2025). Available from: https://www.sciencedirect.com/science/article/pii/S305047592400099X
  • 57. Zhao, H. et al. Supervised kernel principal component analysis-polynomial chaos-Kriging for high-dimensional surrogate modelling and optimization. Knowledge-Based Syst [Internet]. 305(October):112617 (2024). Available from: 10.1016/j.knosys.2024.112617
  • 58. Ma, X. & Zabaras, N. Kernel principal component analysis for stochastic input model generation. J Comput Phys [Internet]. 230(19):7311–31 (2011). Available from: 10.1016/j.jcp.2011.05.037
  • 59. Anyanwu, G. O., Nwakanma, C. I., Lee, J. M. & Kim, D. S. RBF-SVM kernel-based model for detecting DDoS attacks in SDN integrated vehicular network. Ad Hoc Networks [Internet]. 140(October 2022):103026 (2023). Available from: 10.1016/j.adhoc.2022.103026
  • 60. Han, D., Post, V. E. A. & Song, X. Groundwater salinization processes and reversibility of seawater intrusion in coastal carbonate aquifers. J Hydrol [Internet]. 531:1067–80 (2015). Available from: https://www.sciencedirect.com/science/article/pii/S0022169415008896
  • 61. Ghezelsofloo, E., Raghimi, M., Mahmoodlu, M. G., Rahimi-Chakdel, A. & Khademi, S. M. S. Saltwater intrusion in drinking water wells of Kordkuy, Iran: an integrated quantitative and graphical study. Environ Earth Sci [Internet]. 80(16):1–15 (2021). Available from: 10.1007/s12665-021-09843-9
  • 62. Najib, S., Fadili, A., Mehdi, K., Riss, J. & Makan, A. Contribution of hydrochemical and geoelectrical approaches to investigate salinization process and seawater intrusion in the coastal aquifers of Chaouia, Morocco. J Contam Hydrol [Internet]. 198:24–36 (2017). Available from: 10.1016/j.jconhyd.2017.01.003
  • 63. Park, S. C. et al. Regional hydrochemical study on salinization of coastal aquifers, western coastal area of South Korea. J Hydrol [Internet]. 313(3):182–94 (2005). Available from: https://www.sciencedirect.com/science/article/pii/S0022169405001320
  • 64. Schafer, D. et al. Fluoride release from carbonate-rich fluorapatite during managed aquifer recharge: Model-based development of mitigation strategies. Water Res [Internet]. 193:116880 (2021). Available from: https://www.sciencedirect.com/science/article/pii/S0043135421000786
  • 65. Alfarrah, N., Berhane, G., Mjemah, I. C., Van Camp, M. & Walraevens, K. The origin of high sulfate concentrations and hydrochemistry of the upper Miocene–Pliocene–Quaternary aquifer complex of Jifarah plain, NW Libya. Environ. Earth Sci. 75 (20), 1–18 (2016).
  • 66. Awaleh, M. O. et al. Origin of nitrate and sulfate sources in volcano-sedimentary aquifers of the East Africa Rift System: An example of the Ali-Sabieh groundwater (Republic of Djibouti). Sci Total Environ [Internet]. 804:150072 (2022). Available from: https://www.sciencedirect.com/science/article/pii/S0048969721051470
  • 67. Cheng, L., Jiang, C., Li, C. & Zheng, L. Tracing sulfate source and transformation in the groundwater of the Linhuan coal mining area, Huaibei coalfield, China. Int. J. Environ. Res. Public. Health 19(21) (2022).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
