Abstract
Background/Objectives: In Saudi Arabia, noncommunicable diseases (NCDs) are an increasing public health concern, with almost 70% of deaths related to chronic diseases. This study aimed to analyze 33-year trends in NCD risk factors, apply machine learning (ML) models to identify ecological associates of premature NCD-related mortality, conduct sex-specific analyses, and project trajectories to 2030. Methods: A longitudinal ecological time-series design used WHO Global Health Observatory (GHO) NCD indicators (1990–2022; select lipid indicators from 1980). Five supervised regression ML models (OLS, LASSO, Ridge, Random Forest, and Gradient Boosting) were trained with TimeSeriesSplit cross-validation (five folds) to preserve temporal order and prevent data leakage. A formal PELT changepoint algorithm confirmed trend breakpoints. Linear projections to 2030 were estimated with 95% prediction intervals. Results: Adult obesity increased by 20.6 percentage points (pp) over 33 years. Under a no-policy-change scenario, female obesity is projected at 50.3% by 2030 (95% PI: 50.0–50.5%). Premature NCD mortality declined by 5.9 pp. Under TimeSeriesSplit CV, all models yielded negative R2, indicating that the leave-one-out cross-validation (LOOCV) R2 of 0.98 reflected shared time-trend artefacts; the ML component is therefore reframed as descriptive feature-importance analysis. The obesity sex gap (female minus male) was the strongest ecological associate of premature NCD mortality. Diabetes treatment coverage showed a strong inverse ecological association (r = −0.913). Conclusions: NCD risk factors in Saudi Arabia are evolving in complex ways. Targeted interventions addressing sex-specific disparities and healthcare system performance are urgently needed to meet national and global NCD targets.
Keywords: noncommunicable diseases, Saudi Arabia, machine learning, obesity, premature mortality, ecological study, longitudinal trends, diabetes, physical inactivity, public health
1. Introduction
1.1. Background
Worldwide, noncommunicable diseases (NCDs) have been drastically rising, making them a major public health challenge. The burden of NCDs has increased in the Kingdom of Saudi Arabia (KSA) in recent decades. Epidemiological trends indicate a dramatic change in the disease burden profile in the Kingdom, with an increase in chronic conditions such as diabetes, cardiovascular diseases, obesity and hypertension, which have been collectively associated with an estimated 70% of all deaths, impacting significantly on morbidity and premature mortality [1,2,3].
Amid rapid socioeconomic development, urbanization, and lifestyle change, risk behaviours such as physical inactivity and poor dietary habits affect over 85% of Saudi adults. WHO GHO age-standardized data indicate that adult obesity (BMI ≥ 30) reached 40.6% in 2022 (Table 1); the commonly cited ~60% figure refers to overweight (BMI ≥ 25) prevalence from national surveys [2,3]. Hypertension prevalence is close to 40% [2,3].
Table 1.
Descriptive statistics for NCD risk factor indicators, Saudi Arabia, 1990–2022 (WHO GHO).
| Indicator | Period | n | Mean ± SD | Min → Max | Abs. Δ (pp) | Direction |
|---|---|---|---|---|---|---|
| Adult Obesity—Both (%) | 1990–2022 | 33 | 30.7 ± 6.5 | 20.0 → 40.6 | +20.6 | ↑ |
| Adult Obesity—Female (%) | 1990–2022 | 33 | 38.3 ± 5.7 | 28.1 → 46.6 | +18.5 | ↑ |
| Adult Obesity—Male (%) | 1990–2022 | 33 | 25.9 ± 7.1 | 14.9 → 37.3 | +22.4 | ↑ |
| Adult Overweight ≥ 25 BMI (%) | 1990–2022 | 33 | 65.7 ± 4.3 | 58.2 → 71.8 | +13.7 | ↑ |
| Child Obesity > +2 SD (%) | 1990–2022 | 33 | 10.3 ± 4.1 | 4.2 → 17.5 | +13.3 | ↑ |
| Child Overweight > +1 SD (%) | 1990–2022 | 33 | 24.7 ± 8.3 | 14.2 → 40.1 | +25.9 | ↑ |
| Diabetes Prevalence (%) | 1990–2022 | 33 | 26.9 ± 2.1 | 21.8 → 29.3 | +2.4 | ↑/↓ |
| Diabetes Tx Coverage (%) | 2000–2021 | 22 | 43.9 ± 5.2 | 35.6 → 51.4 | +15.8 | ↑ |
| Physical Inactivity—Both (%) | 2000–2022 | 22 | 55.1 ± 2.0 | 51.5 → 57.8 | −6.3 | ↓ |
| Physical Inactivity—Female (%) | 2000–2022 | 22 | 63.7 ± 3.4 | 57.8 → 68.8 | −11.0 | ↓ |
| Physical Inactivity—Male (%) | 2000–2022 | 22 | 49.0 ± 1.2 | 46.7 → 50.7 | −4.0 | ↓ |
| Total Cholesterol (mmol/L) | 1980–2018 | 39 | 4.6 ± 0.0 | 4.6 → 4.6 | 0.0 | → |
| HDL Cholesterol (mmol/L) | 1980–2018 | 39 | 1.1 ± 0.1 | 1.1 → 1.2 | +0.1 | ↑ |
| Cigarette Smoking—Male (%) * | 2000–2022 | 8 | 16.6 ± 0.7 | 15.5 → 17.6 | +2.1 | ↑ |
| NCD Premature Mortality 30–70 (%) | 1990–2022 | 33 | 15.9 ± 2.7 | 12.5 → 19.5 | −5.9 | ↓ |
* Cigarette smoking (male) excluded from ML modelling due to n = 8 data points (not Dataset A). Δ = absolute change (last − first observation). ↑ = increasing; ↓ = decreasing; → = stable; ↑/↓ = non-linear (rises then falls). pp = percentage points. Data source: WHO Global Health Observatory.
Socioeconomic factors are among the main determinants of the prevalence and distribution of NCD risk factors in Saudi Arabia. Research indicates that the likelihood of individuals presenting with these risk factors is significantly influenced by educational attainment, income level, employment status and gender. For instance, women and individuals with lower levels of education have higher rates of hypertension and obesity, whereas tobacco use appears to be more prevalent in some occupational groups [2,3]. Furthermore, the prevalence of diabetes is associated with socioeconomic inequalities, with higher rates seen in groups with lower education and income levels, emphasizing the complex relationship between social factors and chronic disease risk [4]. Addressing these disparities is crucial to reducing NCD-related health inequalities and improving overall population health outcomes. In response, Saudi Arabia has initiated extensive healthcare reforms aligned with Vision 2030 and the SDGs [5,6].
1.2. Prior Work
Recent developments in ML and big data analytics present significant opportunities for chronic disease management [7,8,9,10]. The six NCD risk factor domains—anthropometry, diabetes, physical activity, lipids, tobacco, and mortality—were selected based on their established relevance to the Saudi epidemiological profile and alignment with WHO NCD surveillance priorities [7,11,12]. The selection is tailored to the specific lifestyle characteristics of Saudi Arabia, marked by high levels of physical inactivity, rapidly increasing obesity rates, and a growing diabetes treatment infrastructure.
1.3. Study Objectives
The objectives were to (i) evaluate 33-year longitudinal trends in six NCD risk factor domains; (ii) implement and compare five machine learning models to describe ecological temporal associations with premature NCD mortality; (iii) determine key population-level ecological associates using four feature-importance techniques; and (iv) project risk factor trajectories to 2030.
2. Methods
2.1. Study Design
A longitudinal ecological time-series design was used. Saudi Arabia is the only study unit, observed over 1990–2022 (depending on the indicator). The unit of analysis is the country-year (n = 22 for the ML dataset). All of these associations are at the population level and should not be interpreted as evidence of causality at the individual level (ecological fallacy).
2.2. Data Source
Data were obtained from the WHO GHO NCD Indicators (https://data.humdata.org/dataset/who-data-for-sau accessed on 2 February 2026). The WHO NCD Risk Factor Collaboration (NCD-RisC) [11,12] produces these estimates using Bayesian hierarchical models. Age-standardized estimates enable meaningful comparisons over time. A note on WHO modelled estimate uncertainty: these estimates carry inherent uncertainty beyond reported confidence intervals, including measurement error in source surveys, model extrapolation for sparse years, and Bayesian structural assumptions—a key limitation acknowledged throughout.
2.3. Indicator Selection and Data Preparation
Eighteen indicators were extracted and reshaped from long to wide format (one row per year). Two derived variables were computed: (1) the obesity sex gap (female minus male, pp), which captures sex-specific divergence not fully represented by the individual male and female obesity variables; its distinct signal is supported by its consistent #1 ranking across all feature-importance methods; and (2) the total cholesterol to HDL ratio. Cigarette smoking (male) was excluded from Dataset A and all ML modelling due to limited temporal coverage (n = 8 non-consecutive data points). It appears in Table 1 for descriptive completeness only.
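The reshaping and the first derived variable can be illustrated with a minimal sketch. The indicator keys below are hypothetical, and the four values are the first and last observations from Table 1, assuming the reported Min and Max correspond to 1990 and 2022 (which is consistent with the reported Abs. Δ). This is not the authors' pipeline, only an illustration of the long-to-wide step.

```python
from collections import defaultdict

# Long-format records: (year, indicator, value).
# Values are taken from Table 1; the indicator keys are hypothetical names.
long_rows = [
    (1990, "obesity_female", 28.1),
    (1990, "obesity_male", 14.9),
    (2022, "obesity_female", 46.6),
    (2022, "obesity_male", 37.3),
]


def to_wide(rows):
    """Pivot long records into one dict per year (wide format)."""
    wide = defaultdict(dict)
    for year, indicator, value in rows:
        wide[year][indicator] = value
    return dict(wide)


def add_sex_gap(wide):
    """Add the obesity sex gap (female minus male, in percentage points)."""
    for row in wide.values():
        if "obesity_female" in row and "obesity_male" in row:
            gap = row["obesity_female"] - row["obesity_male"]
            row["obesity_sex_gap"] = round(gap, 1)
    return wide


wide = add_sex_gap(to_wide(long_rows))
for year in sorted(wide):
    print(year, wide[year]["obesity_sex_gap"])
```

Under these values the gap narrows between 1990 and 2022, because male obesity rose faster than female obesity.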
2.4. Statistical Analysis
2.4.1. Descriptive and Trend Analysis
OLS linear regression against calendar year estimated annual rates of change (β, 95% CI, R2). A note on data periods: trend analysis covers 1990–2022 for primary obesity/mortality indicators; lipid data extend to 1980.
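A minimal sketch of such a trend regression is given below, using only the Python standard library. The input series is synthetic (not WHO data), the function name is hypothetical, and the t critical value is passed in by the caller; 2.04 is approximately the 0.975 quantile of Student's t with 31 degrees of freedom, which applies to n = 33 annual observations.

```python
import math


def ols_trend(years, values, t_crit):
    """Simple OLS of values on calendar year.

    Returns the annual slope, its confidence interval (slope ± t_crit × SE),
    and R2. t_crit must match n - 2 degrees of freedom.
    """
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    ss_res = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(years, values))
    ss_tot = sum((y - y_bar) ** 2 for y in values)
    se_beta = math.sqrt(ss_res / (n - 2) / sxx)
    return {
        "beta": beta,
        "ci": (beta - t_crit * se_beta, beta + t_crit * se_beta),
        "r2": 1 - ss_res / ss_tot,
    }


# Synthetic series: a 0.6 pp/yr trend plus a small alternating wiggle.
years = list(range(1990, 2023))
values = [20.0 + 0.6 * i + 0.2 * (-1) ** i for i in range(len(years))]
trend = ols_trend(years, values, t_crit=2.04)
print(trend)
```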
2.4.2. Correlation Analysis
Pearson correlation coefficients were computed for ecological associations between risk factors and premature NCD mortality (2000–2021; n = 22). Strength was classified following Cohen [13] and Evans [14]: very strong |r| > 0.90, strong |r| > 0.70, moderate |r| > 0.40. Variance inflation factors (VIF) were estimated to assess multicollinearity. All correlations are ecological: they describe co-occurring population-level trends and cannot be used to infer individual-level causal pathways (ecological fallacy).
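The classification rule can be expressed as a short sketch. The function names are hypothetical, and the label "weak" for |r| ≤ 0.40 is an assumption, since the text names only the three upper bands.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_strength(r):
    """Map |r| to the strength bands stated in the text."""
    a = abs(r)
    if a > 0.90:
        return "very strong"
    if a > 0.70:
        return "strong"
    if a > 0.40:
        return "moderate"
    return "weak"  # assumed label; not named in the source


print(classify_strength(-0.913))  # diabetes treatment coverage
print(classify_strength(0.893))   # obesity sex gap
```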
2.4.3. Formal Changepoint Detection
The Pruned Exact Linear Time (PELT) algorithm (L2 cost model; Python ruptures v1.1.9) was applied to adult obesity for 1990–2022 (n = 33). Sensitivity analysis at candidate breakpoints 2003–2007 confirmed robustness (all R2 > 0.996, p < 0.001).
2.5. Machine Learning Models
Five supervised regression models were fitted: OLS (baseline), LASSO, Ridge, Random Forest (500 trees), and Gradient Boosting (200 estimators). All features were z-score standardized.
Validation Strategy
LOOCV does not respect temporal order: predicting year t may train on years t + 1, …, t + k (data leakage), artificially inflating R2. LOOCV is replaced with TimeSeriesSplit CV (five folds, scikit-learn v1.8.0), which strictly enforces temporal order. Folds: (1) train 2000–2006, test 2007–2009; (2) train 2000–2009, test 2010–2012; (3) train 2000–2012, test 2013–2015; (4) train 2000–2015, test 2016–2018; (5) train 2000–2018, test 2019–2021.
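The fold boundaries listed above follow from an expanding-window scheme. The sketch below is a pure-Python approximation that mirrors the default behaviour of scikit-learn's TimeSeriesSplit (test size equal to n_samples // (n_splits + 1), no gap); the function name is hypothetical and the code is illustrative rather than the authors' own.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Return (train_indices, test_indices) pairs in temporal order.

    Each test block follows its training window, so no future
    observation is ever used to predict an earlier one.
    """
    test_size = n_samples // (n_splits + 1)
    splits = []
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        splits.append((train_idx, test_idx))
    return splits


years = list(range(2000, 2022))  # 22 country-years, 2000-2021
folds = [
    (years[train[0]], years[train[-1]], years[test[0]], years[test[-1]])
    for train, test in expanding_window_splits(len(years))
]
for train_start, train_end, test_start, test_end in folds:
    print(f"train {train_start}-{train_end}, test {test_start}-{test_end}")
```

With 22 observations and five splits, each test block holds three years and the first training window holds seven.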
Under TimeSeriesSplit CV, all models yielded negative mean R2 (range −316 OLS to −2.5 GB), confirming LOOCV R2 = 0.98 was artefactual (shared temporal trends, not structural predictive relationships). The ML component is reframed as descriptive feature-importance analysis of ecological co-variation.
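Part of the reason R2 can fall far below zero on short test folds is arithmetic: R2 = 1 − SS_res/SS_tot, and SS_tot is small when a three-year test block varies little around its own mean. The sketch below uses hypothetical values (not study data) to illustrate this; it is one plausible contributor, not a full account of the results in Table 5.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical three-year test fold: observed mortality varies by only 0.4 pp,
# while predictions extrapolated from the training years sit about 1 pp too high.
y_true = [13.0, 12.8, 12.6]
y_pred = [13.9, 13.8, 13.7]

fold_mean = sum(y_true) / len(y_true)
r2_model = r2_score(y_true, y_pred)
r2_mean_only = r2_score(y_true, [fold_mean] * len(y_true))
print(r2_model, r2_mean_only)
```

A modest absolute error of about 1 pp therefore produces a strongly negative R2, whereas a mean-only prediction scores exactly zero.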
Detrended Analysis: all variables were linearly detrended before ML modelling. Associations were partially attenuated after detrending, confirming that shared temporal trends contribute to the observed co-variation; this is disclosed transparently as a limitation.
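Linear detrending amounts to replacing each series by its residuals from an OLS fit on calendar year. A minimal sketch follows, with a hypothetical function name and a synthetic series; the exact detrending routine used in the study is not specified in the text.

```python
def linear_detrend(years, values):
    """Return residuals of values after removing the OLS linear trend on year."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    beta = sxy / sxx
    return [y - (y_bar + beta * (x - x_bar)) for x, y in zip(years, values)]


years = list(range(2000, 2022))

# Synthetic series: a falling trend plus a small alternating deviation.
series = [57.0 - 0.3 * i + 0.2 * (-1) ** i for i in range(len(years))]
residuals = linear_detrend(years, series)

# A perfectly linear series leaves no residual signal after detrending.
flat = linear_detrend(years, [10.0 + 0.5 * i for i in range(len(years))])
print(max(abs(r) for r in flat))
```

Correlations or feature importances computed on such residuals reflect year-to-year deviations rather than the shared long-run trend.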
2.6. Feature Importance
Four methods: (i) RF impurity importance; (ii) GB split-gain importance; (iii) permutation importance (100 repeats); (iv) absolute LASSO coefficients. Rankings are reported as descriptive ecological co-variation indicators, not causal effect estimates.
2.7. Forecasting to 2030
Linear trend projections to 2030 were fitted to the most recent 10-year trajectory (2013–2022) for each indicator. All projections include 95% prediction intervals (PI). These are illustrative no-policy-change scenarios, not forecasts. For non-linear indicators (e.g., diabetes prevalence), projections should be treated with particular caution.
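The projection step can be sketched with the standard OLS prediction interval for a new observation, ŷ ± t × s × sqrt(1 + 1/n + (x0 − x̄)²/Sxx). The code below is an illustration under stated assumptions: the input series is synthetic, the function name is hypothetical, and t = 2.306 is the 0.975 quantile of Student's t with 8 degrees of freedom (n = 10 base years).

```python
import math


def project_with_pi(years, values, target_year, t_crit):
    """Extrapolate a fitted linear trend to target_year with a prediction interval."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    ss_res = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(years, values))
    s = math.sqrt(ss_res / (n - 2))  # residual standard error
    estimate = intercept + beta * target_year
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (target_year - x_bar) ** 2 / sxx)
    return estimate, estimate - half_width, estimate + half_width


# Synthetic base period 2013-2022: roughly +0.5 pp/yr with a small wiggle.
base_years = list(range(2013, 2023))
base_values = [40.0 + 0.5 * i + 0.05 * (-1) ** i for i in range(len(base_years))]

estimate, lower, upper = project_with_pi(base_years, base_values, 2030, t_crit=2.306)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

The interval widens as the target year moves away from the centre of the base period, and it is narrow whenever the base-period residuals are small, regardless of whether the linear assumption holds beyond 2022.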
2.8. Ethical Considerations
This study used publicly available, de-identified, aggregated national data from the WHO GHO. No individual participant data were collected or processed. No ethical approval was required. The study adheres to the STROBE reporting guidelines for observational ecological studies.
3. Results
3.1. Descriptive Statistics and Data Coverage
Table 1 presents national-level descriptive statistics for 15 NCD risk factor and outcome indicators (1990–2022). Note: Cigarette smoking (male) is included for descriptive completeness only; it was excluded from ML modelling (n = 8 data points). The most notable finding is a +20.6 pp increase in adult obesity alongside a −5.9 pp decline in premature NCD mortality.
3.2. Time-Trend Analysis (Table 2)
Table 2 presents formal ordinary least squares (OLS) regression statistics for all major indicators, confirming that the changes documented in Table 1 represent statistically significant time trends rather than random variation. Annual slope (β), 95% confidence interval, p-value, and R2 are reported. Adult obesity (both sexes) is associated with an increase of +0.62 percentage points per year (95% CI: +0.58 to +0.67; p < 0.001; R2 = 0.99). Child overweight shows the steepest trajectory at +1.05 pp/yr. Premature NCD mortality is associated with a significant decline of −0.29 pp/yr (p < 0.001; R2 = 0.97). All reported R2 values are rounded to two decimal places in accordance with the precision appropriate for national-level epidemiological time-series data.
Table 2.
OLS linear trend regression results for NCD risk factor indicators, Saudi Arabia.
| Indicator | Period | β (pp/yr) | 95% CI | R2 | p-Value | Trend |
|---|---|---|---|---|---|---|
| Adult Obesity—Both (%) | 1990–2022 | +0.670 | (+0.649, +0.691) | 0.997 | <0.001 | ↑ Increasing |
| Adult Obesity—Female (%) | 1990–2022 | +0.588 | (+0.560, +0.616) | 0.990 | <0.001 | ↑ Increasing |
| Adult Obesity—Male (%) | 1990–2022 | +0.730 | (+0.714, +0.746) | 0.998 | <0.001 | ↑ Increasing |
| Adult Overweight ≥ 25 BMI (%) | 1990–2022 | +0.446 | (+0.424, +0.468) | 0.990 | <0.001 | ↑ Increasing |
| Child Obesity > +2 SD (%) | 1990–2022 | +0.422 | (+0.409, +0.435) | 0.996 | <0.001 | ↑ Increasing |
| Child Overweight > +1 SD (%) | 1990–2022 | +0.852 | (+0.793, +0.911) | 0.981 | <0.001 | ↑ Increasing |
| Diabetes Prevalence (%) | 1990–2022 | +0.029 | (−0.051, +0.109) | 0.018 | 0.462 | → Stable (non-linear) |
| Diabetes Tx Coverage (%) | 2000–2021 | +0.536 | (+0.495, +0.577) | 0.993 | <0.001 | ↑ Increasing |
| Physical Inactivity—Both (%) | 2000–2022 | −0.291 | (−0.312, −0.270) | 0.987 | <0.001 | ↓ Decreasing |
| Physical Inactivity—Female (%) | 2000–2022 | −0.505 | (−0.531, −0.479) | 0.996 | <0.001 | ↓ Decreasing |
| Physical Inactivity—Male (%) | 2000–2022 | −0.180 | (−0.197, −0.163) | 0.976 | <0.001 | ↓ Decreasing |
| HDL Cholesterol (mmol/L) | 1980–2018 | +0.004 | (+0.002, +0.006) | 0.750 | <0.001 | ↑ Increasing |
| NCD Premature Mortality 30–70 (%) | 1990–2022 | −0.359 | (−0.462, −0.256) | 0.770 | <0.001 | ↓ Decreasing |
β = annual rate of change (pp = percentage points; mmol/L for cholesterol). CI of 95% computed as β ± t0.975 × SE. OLS = ordinary least squares. Data source: WHO GHO.
3.3. Formal Changepoint Detection
Figure 1 presents the formal PELT changepoint detection results. The primary breakpoint of 2004 was confirmed across all penalty values (pen = 1–10). Phase 1 (1990–2004): β1 = +0.709 pp/yr (R2 = 0.996, p < 0.001); Phase 2 (2004–2022): β2 = +0.603 pp/yr (R2 = 0.998, p < 0.001). The original visual 2005 selection is validated within ±1 year.
Figure 1.
Formal changepoint detection—Adult Obesity Trend, Saudi Arabia 1990–2022. PELT algorithm (L2 model) confirms primary breakpoint of 2004. Phase 1 (1990–2004): β1 = +0.709 pp/yr (R2 = 0.996). Phase 2 (2004–2022): β2 = +0.603 pp/yr (R2 = 0.998). Orange dotted line = confirmed breakpoint. Data: WHO GHO.
3.4. Phase Analysis (Table 3)
Table 3 shows the segmented (piecewise) linear regression results for the five obesity and overweight indicators for the two phases based on the 2004 breakpoint. All slopes in both phases were positive and statistically significant (p < 0.001) with high explanatory power (R2 ≥ 0.948). The slope change (Δβ) indicated different trends in adults and children: adult obesity overall and in females slowed down after 2004 (Δβ = −0.106 and −0.224 pp/yr, respectively), while both childhood indicators accelerated, with child overweight (>+1 SD) showing the largest change after the breakpoint (Δβ = +0.448 pp/yr), nearly doubling its annual rate.
Table 3.
Two-phase segmented trend analysis for obesity indicators, Saudi Arabia. Breakpoint confirmed at 2004.
| Indicator | Phase 1 (1990–2004) β (pp/yr) | Phase 1 R2 | Phase 2 (2004–2022) β (pp/yr) | Phase 2 R2 | Δβ (pp/yr) | Interpretation |
|---|---|---|---|---|---|---|
| Adult Obesity—Both (%) | +0.709 *** | 0.996 | +0.603 *** | 0.998 | −0.106 | Deceleration |
| Adult Obesity—Female (%) | +0.709 *** | 0.999 | +0.485 *** | 0.998 | −0.224 | Deceleration |
| Adult Obesity—Male (%) | +0.668 *** | 0.996 | +0.727 *** | 0.998 | +0.059 | Slight acceleration |
| Child Obesity > +2 SD (%) | +0.351 *** | 0.996 | +0.458 *** | 0.999 | +0.107 | Acceleration |
| Child Overweight > +1 SD (%) | +0.555 *** | 0.948 | +1.003 *** | 0.998 | +0.448 | Strong acceleration |
*** p < 0.001. Δβ = Phase 2 β − Phase 1 β; negative = deceleration, positive = acceleration. Breakpoint = 2004 (PELT algorithm, L2 model; within ±1 year of the original visual 2005 selection).
3.5. Correlation Analysis and Multicollinearity Assessment (Table 4)
All correlations are ecological: they describe co-occurring national-level trends and cannot be used to infer individual-level causal associations (ecological fallacy). Near-perfect intercorrelations (obesity vs. inactivity: r = −0.985; obesity vs. diabetes tx coverage: r = +0.996) justified the use of tree-based ML models, which are robust to multicollinearity (Figure 2).
Figure 2.
Pearson correlation heatmap of NCD risk factors and premature NCD mortality, Saudi Arabia 2000–2021 (n = 22 ecological observations). Green = positive association; Red = negative association. Near-perfect correlations (|r| > 0.98) between obesity, physical inactivity, and diabetes treatment indicate a tightly coupled metabolic risk cluster. Data: WHO Global Health Observatory.
Table 4.
Pearson correlation coefficients between NCD risk factors and premature NCD mortality, Saudi Arabia 2000–2021 (n = 22 ecological observations).
| Feature | r (Pearson) | p-Value | Strength | Direction | Interpretation |
|---|---|---|---|---|---|
| Diabetes Tx Coverage | −0.913 | <0.001 | Very Strong | Inverse | Higher coverage → lower mortality (ecological) |
| Obesity Sex Gap (F−M) | +0.893 | <0.001 | Strong | Positive | Widening sex gap → higher mortality (ecological) |
| Physical Inactivity—Female | +0.868 | <0.001 | Strong | Positive | Higher inactivity → higher mortality (ecological) |
| Physical Inactivity—Both | +0.848 | <0.001 | Strong | Positive | Higher inactivity → higher mortality (ecological) |
| Physical Inactivity—Male | +0.834 | <0.001 | Strong | Positive | |
| Diabetes Prevalence | +0.776 | <0.001 | Strong | Positive | |
| Obesity—Female | −0.882 | <0.001 | Strong | Inverse | Obesity rises as mortality falls (shared time trend) |
| Obesity—Both | −0.885 | <0.001 | Strong | Inverse | |
| Obesity—Male | −0.888 | <0.001 | Strong | Inverse | |
| Overweight—Both | −0.888 | <0.001 | Strong | Inverse | |
3.6. Machine Learning Model Performance (Table 5)
Table 5 gives a side-by-side comparison of LOOCV (original, inflated) and TimeSeriesSplit CV. Under honest temporal validation, all models yield negative R2, confirming that LOOCV R2 = 0.98 was artefactual. Gradient Boosting (1.167% ± 0.988) achieved the best TimeSeriesSplit RMSE (Figure 3).
Figure 3.
Influence of validation procedure on reported performance. Left: comparison of RMSE. Right: comparison of R2. LOOCV artificially inflates R2 because it allows future observations to train past predictions. TimeSeriesSplit CV provides honest evaluation in time without data leakage.
Table 5.
Performance of machine learning models: original LOOCV vs. revised TimeSeriesSplit CV, Saudi Arabia 2000–2021 (N = 22).
| Model | LOOCV RMSE | LOOCV MAE | LOOCV R2 | TS-CV RMSE ± SD | TS-CV R2 ± SD | R2 Inflation | Verdict |
|---|---|---|---|---|---|---|---|
| OLS (Baseline) | 0.547 | 0.488 | 0.96 | 8.27 ± 5.76 | −316 ± 194 | +316 | Severely inflated |
| LASSO | 0.780 | 0.610 | 0.91 | 2.36 ± 1.34 | −32.5 ± 30.7 | +33.4 | Severely inflated |
| Ridge | 0.816 | 0.633 | 0.90 | 2.22 ± 1.10 | −56.1 ± 79.3 | +57.0 | Severely inflated |
| Random Forest | 0.469 | 0.393 | 0.97 | 1.34 ± 1.08 | −7.6 ± 9.5 | +8.6 | Inflated |
| ★ Gradient Boosting | 0.313 | 0.224 | 0.98 | 1.17 ± 0.99 | −2.5 ± 1.5 | +3.5 | Least inflated |
LOOCV = leave-one-out cross-validation (original submission; data leakage from using future years to train past predictions). TS-CV = TimeSeriesSplit cross-validation (updated; temporal order strictly maintained). Negative R2 = worse than a mean-only baseline, the expected result for n = 22 autocorrelated time-series observations with 10 correlated features. ★ = best model by TS-CV RMSE. R2 Inflation = LOOCV R2 − TS-CV R2. RMSE in % units. Tree-based models used the numbers of trees or boosting iterations defined in Section 2.5; random_state = 42.
3.7. Feature Importance (Table 6)
Rankings describe ecological covariation; they do not estimate causal effects. The obesity sex gap (female minus male, pp) is consistently ranked as the top ecological covariate of premature NCD mortality across all four methods and all five cross-validation folds, which supports the robustness of this association.
Table 6.
Ranks of feature importance from our four complementary methods, Saudi Arabia 2000–2021 (n = 22 ecological observations).
| Feature | Overall Rank | RF Impurity | GB Split Gain | Permutation | LASSO \|β\| | Interpretation |
|---|---|---|---|---|---|---|
| Inactivity (Both) | 1 | 0.104 | 0.078 | 0.024 | 11.716 | Highest LASSO weight |
| Obesity Sex Gap (F−M) | 2 | 0.137 | 0.068 | 0.042 | 0.000 | Highest permutation; #1 in the prior literature |
| Inactivity (Female) | 3 | 0.092 | 0.286 | 0.025 | 0.000 | Highest GB split gain |
| Obesity (Female) | 4 | 0.113 | 0.035 | 0.026 | 0.000 | |
| Diabetes Prevalence | 5 | 0.100 | 0.068 | 0.024 | 5.371 | 2nd LASSO weight |
| Obesity (Male) | 6 | 0.093 | 0.121 | 0.020 | 0.000 | |
| Diabetes Tx Coverage | 7 | 0.090 | 0.079 | 0.017 | 8.996 | 3rd LASSO weight |
| Overweight (Both) | 8 | 0.090 | 0.230 | 0.019 | 0.000 | |
| Obesity (Both) | 9 | 0.101 | 0.010 | 0.023 | 0.000 | |
| Inactivity (Male) | 10 | 0.081 | 0.025 | 0.020 | 0.000 | |
RF = Random Forest impurity importance; GB = Gradient Boosting split-gain importance; Permutation = model-agnostic permutation importance (100 repeats, mean decrease in R2); LASSO |β| = absolute standardized coefficient (LassoCV). Overall rank = average rank over the four methods. The importance values are descriptive indicators of ecological covariation, not causal estimates. All analyses: Python 3.10, scikit-learn v1.8.0.
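The average-rank aggregation described in the table note can be sketched as follows. The feature names and scores are hypothetical toy values, not those of Table 6, and the treatment of ties (tied scores share the mean of the ranks they span) is an assumption, since the source does not state how ties such as zero LASSO coefficients were handled.

```python
def rank_descending(scores):
    """Rank features by descending score; tied scores share their average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of features tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for name in ordered[i : j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks


def mean_ranks(method_scores):
    """Average each feature's rank over all importance methods."""
    per_method = [rank_descending(s) for s in method_scores.values()]
    features = list(per_method[0])
    return {f: sum(r[f] for r in per_method) / len(per_method) for f in features}


# Hypothetical importance scores for three features under four methods.
toy_scores = {
    "rf_impurity": {"a": 0.5, "b": 0.3, "c": 0.2},
    "gb_split_gain": {"a": 0.2, "b": 0.6, "c": 0.2},
    "permutation": {"a": 0.4, "b": 0.3, "c": 0.1},
    "lasso_abs_coef": {"a": 0.0, "b": 1.2, "c": 0.0},
}

averaged = mean_ranks(toy_scores)
overall_order = sorted(averaged, key=averaged.get)
print(averaged, overall_order)
```

In this toy case feature "b" ranks first overall even though "a" leads under two methods, which shows how a single method with many tied zeros can shift an average-rank ordering.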
3.8. Projections to 2030 (Table 7)
Table 7 presents updated projections with explicit 95% prediction intervals (PI). These are no-policy-change illustrative scenarios, not forecasts. For indicators with documented non-linear trajectories (e.g., diabetes prevalence—rising to ~2010 then declining), projections are particularly unreliable and are flagged accordingly (Figure 4).
Figure 4.
Scenario-based projections to 2030 with 95% prediction intervals. No-policy-change scenario; linearity assumed; 2013–2022 trend basis. Wide bands for mortality reflect longer extrapolation horizon.
Table 7.
Scenario-based projections to 2030 with 95% prediction intervals, Saudi Arabia (base period: 2013–2022).
| Indicator | 2022 Value | 2030 Estimate | 95% PI Lower | 95% PI Upper | Trajectory | Note |
|---|---|---|---|---|---|---|
| Adult Obesity—Both (%) | 40.6% | 45.1% | 44.8% | 45.3% | ↑ | Tight recent trend (2013–2022) |
| Adult Obesity—Female (%) | 46.6% | 50.3% | 50.0% | 50.5% | ↑ | Narrow PI; 50% central estimate—not definitive forecast |
| Adult Obesity—Male (%) | 37.3% | 42.7% | 42.4% | 43.0% | ↑ | Tight recent trend |
| Physical Inactivity—Both (%) | 51.5% | 48.6% | 48.5% | 48.7% | ↓ | Only improving indicator |
| Diabetes Prevalence (%) ⚠️ | 24.2% | 21.2% | 20.8% | 21.7% | ↓ | ⚠️ Non-linear trajectory—extrapolation unreliable |
| NCD Premature Mortality 30–70 (%) | ~12.5% | ~11.4% | ~10.6% | ~12.2% | ↓ | Trajectory nearing SDG 3.4 target (<15%) |
95% PI = 95% prediction interval (OLS prediction interval formula; base period 2013–2022, n = 10 for each indicator). These are illustrative no-policy-change scenarios. Linearity is assumed, so indicators with non-linear trajectories will yield unreliable projections. ⚠️ Diabetes prevalence showed a non-linear trend (increasing until about 2010 and then decreasing), so its projection is particularly uncertain. SDG 3.4 target = one-third reduction in premature NCD mortality by 2030 against the 2015 baseline.
4. Discussion
4.1. Principal Findings
This study provides a comprehensive 33-year longitudinal analysis of NCD risk factors and their ecological associations with premature mortality in Saudi Arabia. The ML component is reframed as descriptive feature-importance analysis of ecological co-variation: under TimeSeriesSplit CV, all models yielded negative R2, confirming 22 autocorrelated country-year observations with 10 correlated features cannot support out-of-sample prediction. The ecological feature-importance rankings remain valid as descriptors of which trends co-vary most strongly with mortality.
The sustained increase in obesity prevalence across all groups is the most significant trend. Regarding the divergence between rising obesity and declining mortality, multiple concurrent mechanisms are plausible: (1) improving diabetes treatment coverage (supported by r = −0.913 ecological association); (2) demographic changes (younger age structure diluting mortality rates); (3) measurement artefacts in WHO modelled estimates; and (4) lag effects from earlier-decade risk factor accumulation. Our ecological design cannot directly test any single mechanism.
4.2. Comparison with Prior Work
The present study complements global studies GBD 2021 [11] and NCD-RisC [15] and prior Saudi cross-sectional research by integrating ML-based feature importance across 33 years of longitudinal national data [2]—an approach not employed in prior Saudi ecological studies [16,17].
4.3. Public Health and Policy Implications
The findings have several important implications for public health policy. First, the rapid increase in obesity, particularly among women, indicates an urgent need for targeted, culturally appropriate interventions addressing diet, physical activity, and behavioural risk factors. Second, the strong inverse ecological association between diabetes treatment coverage and premature mortality highlights the importance of strengthening healthcare systems, improving treatment adherence, and closing care gaps. Third, the ML-based identification of key ecological associates highlights the value of integrating advanced analytics into national surveillance systems. Such approaches can support data-driven decision making by showing policymakers which population-level trends co-vary most strongly with mortality, although the models here do not support out-of-sample prediction.
Finally, the projected trajectories to 2030 indicate that Saudi Arabia may not achieve global NCD targets without intensified prevention strategies. National policies should be aligned with SDG 3.4 by coordinating efforts across health, education and social sectors [18].
4.4. Strengths and Limitations
Strengths: 33 years of standardized WHO data, multi-domain integration, and transparent validation comparison. Limitations: (1) Ecological design: all associations are at the population level only, and the ecological fallacy applies. (2) n = 22 ML sample: negative TS-CV R2 confirms that prediction is not possible. (3) High feature-to-sample ratio (10 features, 22 observations): overfitting is inevitable. (4) LOOCV data leakage was corrected. (5) The PELT breakpoint is sensitive to the penalization parameter. (6) Linear projections assume no structural breaks. (7) WHO modelled estimates carry uncertainty beyond reported CIs. (8) The single-country design limits generalizability. (9) Cigarette smoking was excluded from ML (n = 8 data points).
5. Conclusions
NCD risk factors in Saudi Arabia are evolving in complex ways, with obesity, physical inactivity, and diabetes treatment coverage as key ecological associates of premature mortality. The ML component serves as a descriptive feature-importance analysis of temporal ecological co-variation, rather than being predictive. Comprehensive, data-driven interventions addressing behavioural risk factors and healthcare system performance are urgently needed to meet national and global NCD targets.
Acknowledgments
The authors acknowledge the WHO Global Health Observatory for providing publicly available, de-identified, aggregated national health data used in this study. Generative AI Disclosure: The Grammarly software (https://www.grammarly.com/ accessed on 10 March 2026) was used for editing and enhancing clarity.
Abbreviations
| CI | Confidence interval |
| GB | Gradient Boosting |
| GBD | Global Burden of Disease |
| GHO | Global Health Observatory |
| HDL | High-density lipoprotein |
| IoT | Internet of Things |
| KSA | Kingdom of Saudi Arabia |
| LASSO | Least absolute shrinkage and selection operator |
| LOOCV | Leave-one-out cross-validation |
| MAE | Mean absolute error |
| ML | Machine learning |
| NCD | Noncommunicable disease |
| NCD-RisC | NCD Risk Factor Collaboration |
| OLS | Ordinary least squares |
| pp | Percentage points |
| RF | Random Forest |
| RMSE | Root mean square error |
| SDG | Sustainable Development Goal |
| SD | Standard deviation |
| STROBE | Strengthening the Reporting of Observational Studies in Epidemiology |
| VIF | Variance inflation factor |
| WHO | World Health Organization |
Author Contributions
N.A. and S.B.B.: conceptualization, investigation, writing (original draft), and writing (review and editing). H.A. and S.K.M.M.: investigation and validation. E.A.: data management, analysis, and visualization. S.R.: writing (review and editing), investigation, and validation. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Ethical review and approval were waived for this study because the study used publicly available, de-identified, aggregated national data from the WHO Global Health Observatory. No individual participant data were collected.
Informed Consent Statement
Patient consent has been waived. This study used publicly available, de-identified, aggregated national data from the WHO Global Health Observatory.
Data Availability Statement
Data are available from the WHO Global Health Observatory at https://data.humdata.org/dataset/who-data-for-sau (accessed on 2 February 2026).
Conflicts of Interest
The authors declare no conflicts of interest.
Funding Statement
The research has received no external funding.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
- 1. Memish Z.A., Jaber S., Mokdad A.H., AlMazroa M.A., Murray C.J., Al Rabeeah A.A., Saudi Burden of Disease Collaborators Burden of disease, injuries, and risk factors in the Kingdom of Saudi Arabia, 1990–2010. Prev. Chronic Dis. 2014;11:E169. doi: 10.5888/pcd11.140176.
- 2. Al-Hanawi M.K., Keetile M. Socio-economic and demographic correlates of non-communicable disease risk factors among adults in Saudi Arabia. Front. Med. 2021;8:605912. doi: 10.3389/fmed.2021.605912.
- 3. Al-Hanawi M.K. Socioeconomic determinants and inequalities in the prevalence of non-communicable diseases in Saudi Arabia. Int. J. Equity Health. 2021;20:174. doi: 10.1186/s12939-021-01510-6.
- 4. Alzeidan R., Rabiee F., Mandil A., Hersi A., Fayed A. Non-communicable disease risk factors among employees and their families of a Saudi university: An epidemiological study. PLoS ONE. 2016;11:e0165036. doi: 10.1371/journal.pone.0165036.
- 5. Mani Z.A., Goniewicz K. Transforming Healthcare in Saudi Arabia: A Comprehensive Evaluation of Vision 2030’s Impact. Sustainability. 2024;16:3277. doi: 10.3390/su16083277.
- 6. Suleiman A.K., Ming L.C. Transforming healthcare: Saudi Arabia’s vision 2030 healthcare model. J. Pharm. Policy Pract. 2025;18:2449051. doi: 10.1080/20523211.2024.2449051.
- 7. Hazazi A., Wilson A. Noncommunicable diseases and health system responses in Saudi Arabia: Focus on policies and strategies. Health Res. Policy Syst. 2022;20:63. doi: 10.1186/s12961-022-00872-9.
- 8. Dong Z., Li P., Jiang Y., Wang Z., Fu S., Che H., Liu M., Zhao X., Liu C., Zhao C., et al. Integrative multi-omics and routine blood analysis using deep learning: Cost-effective early prediction of chronic disease risks. Adv. Sci. 2025;12:2412775. doi: 10.1002/advs.202412775.
- 9. Eloranta S., Boman M. Predictive models for clinical decision making: Deep dives in practical machine learning. J. Intern. Med. 2022;292:278–295. doi: 10.1111/joim.13483.
- 10. Qaffas A.A., Hoque R., Almazmomi N. The Internet of Things and big data analytics for chronic disease monitoring in Saudi Arabia. Telemed. e-Health. 2021;27:74–81. doi: 10.1089/tmj.2019.0289.
- 11. NCD Risk Factor Collaboration (NCD-RisC) Worldwide trends in body-mass index, underweight, overweight, and obesity from 1975 to 2016: A pooled analysis of 2416 population-based measurement studies in 128.9 million children, adolescents, and adults. Lancet. 2017;390:2627–2642. doi: 10.1016/S0140-6736(17)32129-3.
- 12. Zhou B., Lu Y., Hajifathalian K., Bentham J., Di Cesare M., Danaei G., Bixby H., Cowan M., Ali M., Taddei C., et al. Worldwide trends in diabetes since 1980: A pooled analysis of 751 population-based studies with 4.4 million participants. Lancet. 2016;387:1513–1530. doi: 10.1016/S0140-6736(16)00618-8.
- 13. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; Hillsdale, NJ, USA: 1988.
- 14. Evans J.D. Straightforward Statistics for the Behavioral Sciences. Brooks/Cole Publishing; Pacific Grove, CA, USA: 1996.
- 15. Guthold R., Stevens G.A., Riley L.M., Bull F.C. Worldwide trends in insufficient physical activity from 2001 to 2016: A pooled analysis of 358 population-based surveys with 1.9 million participants. Lancet Glob. Health. 2018;6:e1077–e1086. doi: 10.1016/S2214-109X(18)30357-7.
- 16. Liang P., Zhang S., Yan M., Huang H., Kang J., Li Y., Li G., Wu H. Trends and predictive research on the global burden of ischemic heart disease from 1990 to 2021: An analysis of the Global Burden of Disease study 2021. Front. Public Health. 2025;13:1569179. doi: 10.3389/fpubh.2025.1569179.
- 17. Ärnlöv J. Global burden of disease 2019 and the need for data. Glob. Heart. 2020;15:61. doi: 10.5334/gh.891.
- 18. United Nations, Department of Economic and Social Affairs, Population Division Data Portal. Available online: https://population.un.org/dataportal/home (accessed on 2 March 2026).