Scientific Reports. 2024 Sep 10;14:21141. doi: 10.1038/s41598-024-72047-1

Pollutants-mediated viral hepatitis in different types: assessment of different algorithms and time series models

Shengfei Pei 1, Li Yang 2, Huixia Gao 2, Yuzhen Liu 2, Erhei Dai 2, Fumin Feng 1, Jianhua Lu 2
PMCID: PMC11387817  PMID: 39256598

Abstract

The escalating frequency of environmental pollution incidents has raised significant concerns regarding the potential health impacts of pollutant fluctuations. Consequently, a comprehensive study on the role of pollutants in the prevalence of viral hepatitis is indispensable for the advancement of innovative prevention strategies. Monthly incidence rates of viral hepatitis from 2005 to 2020 were sourced from the Chinese Center for Disease Control and Prevention Infectious Disease Surveillance Information System. Pollution data spanning 2014–2020 were obtained from the National Oceanic and Atmospheric Administration (NOAA), encompassing pollutants such as CO, NO2, and O3. Time series analysis models, including seasonal auto-regressive integrated moving average (SARIMA), Holt-Winters model, and Generalized Additive Model (GAM), were employed to explore prediction and synergistic effects related to viral hepatitis. Spearman correlation analysis was utilized to identify pollutants suitable for inclusion in these models. Concurrently, machine learning (ML) algorithms were leveraged to refine the prediction of environmental pollutant levels. Finally, a weighted quantile sum (WQS) regression framework was developed to evaluate the singular and combined impacts of pollutants on viral hepatitis cases across different demographics, age groups, and environmental strata. The incidence of viral hepatitis in Beijing exhibited a declining trend, primarily characterized by HBV and HCV types. In predicting hepatitis prevalence trends, the Holt-Winters additive seasonal model outperformed the SARIMA multiplicative model ((1,1,0) (2,1,0) [12]). In the prediction of environmental pollutants, the SVM model demonstrated superior performance over the GPR model, particularly with Polynomial and Besseldot kernel functions. The combined pollutant risk effect on viral hepatitis was quantified as βWQS (95% CI) = 0.066 (0.018, 0.114). 
Among different groups, PM2.5 emerged as the most sensitive risk factor, notably impacting patients with HCV and HEV, as well as individuals aged 35–64. CO predominantly affected HAV patients, showing a risk effect of βWQS (95% CI) = − 0.0355 (− 0.0695, − 0.0016). Lower levels of PM2.5 and PM10 were associated with heightened risk of viral hepatitis incidence with a lag of five months, whereas elevated levels of PM2.5 (100–120 μg/m3) and CO correlated with increased hepatitis incidence risk with a lag of six months. The Holt-Winters model outperformed the SARIMA model in predicting the incidence of viral hepatitis. Among machine learning algorithms, SVM and GPR models demonstrated superior performance for analyzing pollutant data. Patients infected with HAV and HEV were primarily influenced by PM10 and CO, whereas SO2 and PM2.5 significantly impacted others. Individuals aged 35–64 years appeared particularly susceptible to these pollutants. Mixed pollutant exposures were found to affect the development of viral hepatitis with a notable lag of 5–6 months. These findings underscore the importance of long-term monitoring of pollutants in relation to viral hepatitis incidence.

Keywords: Viral hepatitis, Time series, Pollutants, Machine learning, Weighted quantile sum

Subject terms: Viral hepatitis, Risk factors

Introduction

Viral hepatitis refers to liver inflammation caused by infection with one of five known viruses: hepatitis A, B, C, D, and E 1,2. This condition poses a significant global public health challenge, affecting billions worldwide and contributing to high rates of morbidity and mortality. Hepatitis A and E typically follow a self-limiting course with full recovery, whereas hepatitis B and C often progress to chronic infection and are associated with severe health outcomes. Historical records trace hepatitis back to ancient times, with outbreaks documented 5000 years ago in China and descriptions of jaundice recorded by Hippocrates on the island of Sássos in the fifth century BC3. Viral hepatitis causes over 1.4 million deaths annually4. In a multicenter international study across 161 countries, the prevalence of hepatitis B virus (HBV) surface antigen (HBsAg) was reported at 3.61%5. Despite declines in the global disease burden of HBV and HCV infections over the past three decades, HBV remains prevalent in China6. Consequently, viral hepatitis has emerged as a top global health priority, prompting the implementation of extensive public health policies.

To effectively inform health policies aimed at preventing viral hepatitis, accurate prediction of its trends is paramount. Research in Iran has identified the Holt Exponential Smoothing (HES) model as highly accurate in forecasting HBV incidence7. However, comprehensive predictive studies for viral hepatitis remain limited. Existing literature predominantly focuses on clinical and virological factors, often overlooking environmental influences. For instance, a study in Spain demonstrated that each additional rainy day increased the risk of contracting hepatitis A two weeks later (IRR = 1.03, 95% CI = 1.01–1.05)8. Additionally, Chen et al.9 found a correlation between PM2.5 exposure and hepatitis progression to hepatocellular carcinoma, though research on the synergistic effects of pollutants with hepatitis infection remains scarce.

This study aims to investigate the epidemiological characteristics of viral hepatitis, develop predictive models using various methods, and explore the single, multiple, and interactive effects of pollutants. Specifically, our objectives are to: (a) construct and evaluate prediction models using diverse methodologies; (b) explore the single and multiple effects of pollutants across different groups; (c) analyze pollutant interactions over lagged timeframes.

Patients and methods

Overview of the study area

Beijing, situated in northern China, covers a land area of 16,410.54 square kilometers. It is centrally located at approximately 116°20′ east longitude and 39°56′ north latitude. Beijing experiences a warm temperate semi-humid and semi-arid monsoon climate, characterized by hot and rainy summers and cold and dry winters. Administratively, the city comprises 16 districts and serves as the capital of the People’s Republic of China.

Data source

Data on all reported cases of viral hepatitis in Beijing from 2005 to 2020 were sourced from the public health science data center website (https://www.phsciencedata.cn/). This dataset includes information on the incidence and morbidity of various types of viral hepatitis such as HAV, HBV, HCV, HDV, HEV, and unclassified hepatitis. Diagnosis of all patients followed the criteria outlined in the viral hepatitis management guidelines issued by the Ministry of Health of the People's Republic of China. Ethical approval for this study was obtained from the China Center for Disease Control and Prevention. To ensure confidentiality, viral hepatitis data were analyzed anonymously. Given that viral hepatitis is classified as a statutory infectious disease under national mandatory surveillance, informed consent was not required. Monthly pollutant data (2014–2020) were sourced from the National Oceanic and Atmospheric Administration (NOAA) (https://www.noaa.gov/), encompassing AQI, PM2.5, PM10, SO2, CO, NO2 and O3.

Time series analysis of single and multiple interaction

This study employed three models for time series analysis. The SARIMA and Holt-Winters models were primarily used for predicting the incidence trends of viral hepatitis. The Holt-Winters exponential smoothing model is effective in smoothing out random fluctuations and assigns varying weights to data across cycles, thereby enhancing the accuracy of future trend predictions10. Holt-Winters' additive model has the following expression:

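$$
\begin{aligned}
\hat{y}_{t+h|t} &= l_t + h b_t + s_{t-m+h},\\
l_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1}),\\
b_t &= \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1},\\
s_t &= \gamma (y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}.
\end{aligned}
$$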

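where 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, and 0 ≤ γ ≤ 1 − α. Here l_t, b_t and s_t are the level, trend and seasonal components at time t, and s_{t−m+h} is the seasonal term used in the forecast. α, β, and γ are the smoothing parameters, m is the seasonal period (12 for the monthly data used here), and h is the forecast horizon.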
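The additive recursions can be sketched in a few lines of Python. This is only an illustration of the update equations, not the software used in this study: the function name `holt_winters_additive` and the simple initialisation (first-season mean for the level, average change between the first two seasons for the trend) are assumptions, since the initial states are not reported here.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Run the additive Holt-Winters recursions over y and forecast h steps ahead."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")
    # Assumed initialisation (not taken from the paper):
    # level = mean of the first season, trend = mean per-step change between
    # the first two seasons, seasonal terms = deviations from the first-season mean.
    first, second = y[:m], y[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    season = [v - level for v in first]

    for t, obs in enumerate(y):
        s_prev = season[t % m]                      # s_{t-m}
        prev_level, prev_trend = level, trend       # l_{t-1}, b_{t-1}
        level = alpha * (obs - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = gamma * (obs - prev_level - prev_trend) + (1 - gamma) * s_prev

    n = len(y)
    # Forecast k steps ahead: last level + k * last trend + the most recent
    # seasonal estimate for the corresponding position in the cycle.
    return [level + k * trend + season[(n - 1 + k) % m] for k in range(1, h + 1)]
```

For a series with a stable seasonal pattern and no trend, the forecasts simply repeat the seasonal pattern; for horizons longer than one season the sketch reuses the latest seasonal estimate for each position in the cycle.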

The Seasonal Autoregressive Integrated Moving Average (SARIMA) model decomposes the observed values into three parts: residuals, seasonal features, and true trends11. The SARIMA (p, d, q) (P, D, Q) s model can be expressed as follows:

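$$
\begin{aligned}
\Phi_p(L)\,A_P(L^s)\,\Delta^d \Delta_s^D\, y_t &= \Theta_q(L)\,B_Q(L^s)\,\varepsilon_t,\\
\Phi_p(L) &= 1 - \varphi_1 L - \varphi_2 L^2 - \cdots - \varphi_p L^p,\\
A_P(L^s) &= 1 - \alpha_1 L^s - \alpha_2 L^{2s} - \cdots - \alpha_P L^{Ps},\\
\Theta_q(L) &= 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q,\\
B_Q(L^s) &= 1 + \beta_1 L^s + \beta_2 L^{2s} + \cdots + \beta_Q L^{Qs},\\
\Delta_s y_t &= (1 - L^s)\, y_t = y_t - y_{t-s},\\
\Delta_s &= 1 - L^s, \qquad \varepsilon_t \sim WN(0, \sigma^2).
\end{aligned}
$$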

where Δ and Δs denote the non-seasonal and seasonal difference operators, respectively, and L is the lag operator. φ, α, θ and β are the coefficients of the polynomials Φ, A, Θ and B, and εt is white noise with independent and identical distribution12.
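The differencing part of the model can be illustrated with a short Python sketch. It applies the operators Δs = 1 − L^s and Δ = 1 − L to a series, with d = 1, D = 1 and s = 12 as defaults; the function names are hypothetical, and the sketch covers only the differencing step, not the estimation of the AR and MA coefficients.

```python
def difference(series, lag=1):
    """Apply (1 - L^lag) once: y_t - y_{t-lag} for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=1, D=1, s=12):
    """Apply the seasonal difference D times, then the ordinary difference d times."""
    out = list(series)
    for _ in range(D):
        out = difference(out, s)   # removes a stable period-s pattern
    for _ in range(d):
        out = difference(out, 1)   # removes a linear trend
    return out
```

A monthly series made of a linear trend plus a fixed 12-month pattern is reduced to zeros by this pair of differences, which is the situation in which an ARMA model on the differenced series becomes appropriate.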

Following this, Spearman correlation analysis was used to identify relevant pollutants. Subsequently, a generalized additive model (GAM) was used to explore the interaction of pollutant factors on the prevalence of viral hepatitis13. The model formula is as follows:

$$
\log[E(Y_t)] = \alpha_1 + s(X_1, X_2) + \sum s(X_t)
$$

where α1 is the intercept; X1 and X2 are the two interacting pollutants; s(·) is a penalized spline function; s(X1, X2) is a spline function of the interaction between X1 and X2 (both entered as 5–6 month lagged variables); and Σs(Xt) collects the spline terms for the non-interacting pollutants.
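The screening and lagging steps described above can be sketched as follows. This is a minimal pure-Python illustration, assuming that a lag of k months pairs the pollutant value in month t − k with the hepatitis count in month t; the helper names `average_ranks`, `spearman` and `lagged_pairs` are hypothetical and do not correspond to the software used in this study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_pairs(exposure, outcome, lag):
    """Pair exposure in month t - lag with outcome in month t."""
    if lag == 0:
        return list(exposure), list(outcome)
    return list(exposure[:-lag]), list(outcome[lag:])
```

Calling `spearman(*lagged_pairs(pollutant, cases, 5))` would then give the rank correlation at a five-month lag for two equally long monthly series.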

Machine learning training process

To predict viral hepatitis across different age groups and subtypes, various machine learning (ML) algorithms were employed and their results compared. Data from 2014 to 2018 formed the training set and data from 2019 to 2020 formed the test set, with both sets undergoing ten-fold cross-validation. The Gaussian Process Regression (GPR) model operates by defining a Gaussian process to model the distribution of functions, followed by Bayesian inference in function space14. Four kernel functions (Rbf, Polynomial, Laplace, and Bessel) were compared in the GPR model. Support vector regression (SVR) algorithms were also utilized, which map input features to a higher-dimensional space and apply the maximum-margin principle there15. The SVR model compared four kernel functions: Linear, Polynomial, Radial and Sigmoid. This study used R 4.3.1 with the packages e1071 and kernlab to construct the SVR and GPR models, respectively. Pollutants served as predictor variables in the ML models. The overall incidence in the population, the incidence among different age groups, and the incidence among different types of viral hepatitis served as outcome variables. This allows us to investigate the sensitivity of different populations to air pollutants in terms of disease incidence.
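The chronological split and the three evaluation metrics reported later (RMSE, R2 and MAE) can be sketched in Python as below. The models themselves were built in R, so this is only an illustration of the evaluation logic. The record layout (a dict with a `year` key) is hypothetical, and computing R2 as the squared Pearson correlation between observed and predicted values is an assumption, because the exact definition used is not stated here.

```python
import math


def split_by_year(records, last_train_year=2018):
    """Chronological split: years up to last_train_year train, later years test."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Assumed definition: squared Pearson correlation of observed and predicted."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)
```

Under this definition R2 measures only how well predictions track the ordering and spread of the observations, so a model with a constant bias can still reach R2 = 1 while RMSE and MAE remain large; the three metrics are therefore best read together.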

Single pollution and weighted quantile sum (WQS) statistical analyses

The WQS regression model serves to evaluate the combined effects of multiple exposure variables on a specified outcome. Each exposure variable is assigned a weight within the model to quantify its influence on the outcome variable16. Initially, this study employed the WQS model to identify pollutants significantly impacting the incidence rate of viral hepatitis across various age groups and subtypes. To assess the cumulative impact of simultaneous exposure to multiple pollutants and discern the individual contribution of each pollutant, a “mixtures” approach via WQS regression analysis was utilized. Concurrently, epidemiological data were stratified into air quality categories based on Beijing's AQI, distinguishing between polluted and good air quality levels. Within each environmental quality state, the WQS regression model was applied to analyze how different pollutants influence the incidence and mortality of viral hepatitis.
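The core of a WQS analysis is the index itself: each exposure is converted to a quantile score and the scores are combined with non-negative weights that sum to one. The Python sketch below shows that construction only. The rank-based binning, the function names and the example weights are illustrative assumptions; in an actual WQS analysis the weights are estimated from the data (typically over bootstrap samples) and the outcome is then regressed on the resulting index to obtain βWQS.

```python
def quantile_scores(values, q=4):
    """Score each observation 0..q-1 by its rank-based quantile bin (q=4: quartiles)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for pos, i in enumerate(order):
        scores[i] = pos * q // n
    return scores


def wqs_index(exposures, weights, q=4):
    """Weighted quantile sum: sum over pollutants of weight * quantile score."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scored = {name: quantile_scores(vals, q) for name, vals in exposures.items()}
    n = len(next(iter(exposures.values())))
    return [sum(weights[name] * scored[name][t] for name in exposures) for t in range(n)]
```

A pollutant with a larger weight contributes more to the index, which is how the components with the highest weights are identified as the main drivers of a significant mixture effect.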

Results

Demographic characteristics

From Table 1, the incidence of viral hepatitis in Beijing between 2005 and 2020 exhibited a general declining trend, with a notable short-term surge observed from 2016 to 2018. Conversely, the mortality rate displayed an increasing trend, peaking at 0.77 per 100,000 in 2011. Predominantly, HBV and HCV subtypes accounted for approximately 86.25% of cases, while HDV cases were rare, totaling only three. The seasonal distribution indicated spring and summer epidemics. Among age groups, individuals aged 35–64 years constituted the majority at 51.23%, followed by those aged 15–34 years at 31.38%.

Table 1.

Distribution of viral hepatitis cases by age, types and season groups in Beijing, China, 2005–2020.

Characteristic 0–14 15–34 35–64 ≥ 65 Total Incidence (per 100,000) Mortality (per 100,000)
Age-group columns (0–14 to ≥ 65) give No. of hepatitis cases (%)
Year 2005 128 (1.31%) 3793 (38.85%) 4369 (44.75%) 1473 (15.09%) 9763 45.33 0.18
2006 154 (1.20%) 4925 (38.38%) 5817 (45.34%) 1935 (15.08%) 12,831 59.58 0.59
2007 103 (1.09%) 3492 (36.93%) 4482 (47.40%) 1378 (14.57%) 9455 43.90 0.38
2008 61 (0.86%) 2404 (34.00%) 3423 (48.41%) 1183 (16.73%) 7071 32.83 0.32
2009 43 (0.71%) 1800 (29.74%) 3097 (51.17%) 1112 (18.37%) 6052 28.10 0.67
2010 46 (0.86%) 1374 (25.56%) 2884 (53.65%) 1072 (19.94%) 5376 24.96 0.69
2011 30 (0.59%) 1295 (25.61%) 2847 (56.31%) 884 (17.48%) 5056 23.48 0.77
2012 17 (0.41%) 1097 (26.41%) 2352 (56.63%) 687 (16.54%) 4153 19.28 0.50
2013 16 (0.47%) 902 (26.25%) 1942 (56.52%) 576 (16.76%) 3436 15.95 0.72
2014 8 (0.26%) 763 (24.94%) 1779 (58.16%) 509 (16.64%) 3059 14.20 0.40
2015 14 (0.47%) 737 (24.78%) 1712 (57.57%) 511 (17.18%) 2974 13.81 0.42
2016 9 (0.31%) 727 (25.20%) 1635 (56.67%) 514 (17.82%) 2885 13.40 0.48
2017 9 (0.28%) 948 (29.07%) 1771 (54.31%) 533 (16.34%) 3261 15.14 0.48
2018 7 (0.20%) 952 (26.70%) 1983 (55.61%) 624 (17.50%) 3566 16.56 0.38
2019 9 (0.30%) 668 (22.27%) 1728 (57.62%) 594 (19.81%) 2999 13.93 0.39
2020 5 (0.24%) 479 (23.23%) 1209 (58.63%) 369 (17.90%) 2062 9.57 0.39
Classifications HAV 113 (4.25%) 839 (31.55%) 1288 (48.44%) 419 (15.76%) 2659
HBV 343 (0.63%) 20,487 (37.40%) 27,194 (49.64%) 6754 (12.33%) 54,778
HCV 112 (0.63%) 3203 (18.13%) 9216 (52.15%) 5140 (29.09%) 17,671
HDV 0 (0.00%) 0 (0.00%) 2 (66.67%) 1 (33.33%) 3
HEV 19 (0.32%) 851 (14.24%) 3766 (63.03%) 1339 (22.41%) 5975
Unclassified hepatitis 72 (2.47%) 976 (33.50%) 1564 (53.69%) 301 (10.33%) 2913
Seasons Spring (Mar–May) 176 (0.75%) 7486 (31.84%) 11,928 (50.74%) 3919 (16.67%) 23,509
Summer (Jun–Aug) 219 (1.06%) 6668 (32.38%) 10,420 (50.61%) 3283 (15.94%) 20,590
Autumn (Sep–Nov) 134 (0.70%) 6074 (31.65%) 9749 (50.79%) 3236 (16.86%) 19,193
Winter (Dec–Feb) 130 (0.63%) 6128 (29.59%) 10,933 (52.80%) 3516 (16.98%) 20,707
Total 659 (0.78%) 26,356 (31.38%) 43,030 (51.23%) 13,954 (16.61%) 83,999
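The proportions quoted in the text can be re-derived from the totals in Table 1. The short Python check below uses only the case counts printed in the table; the variable and function names are illustrative.

```python
# Case counts by hepatitis type and by age group, copied from Table 1.
cases_by_type = {
    "HAV": 2659, "HBV": 54778, "HCV": 17671,
    "HDV": 3, "HEV": 5975, "Unclassified": 2913,
}
cases_by_age = {"0-14": 659, "15-34": 26356, "35-64": 43030, "65+": 13954}

total = sum(cases_by_type.values())


def share(counts, *keys):
    """Percentage of all cases falling in the given categories, to 2 decimals."""
    return round(100 * sum(counts[k] for k in keys) / total, 2)


hbv_hcv_share = share(cases_by_type, "HBV", "HCV")   # HBV and HCV combined
age_35_64_share = share(cases_by_age, "35-64")
age_15_34_share = share(cases_by_age, "15-34")
```

Both breakdowns sum to the same total of 83,999 cases, and the combined HBV and HCV share and the two largest age-group shares reproduce the 86.25%, 51.23% and 31.38% reported in the text.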

The analysis of time series model results

Comparing the forecast plots in Fig. 1A and B, the Holt-Winters model outperforms the SARIMA model over the forecast period. In Table S1, the Deviation indicator shows that the two models differ only slightly in their predictions for 2019, whereas the Holt-Winters model has a notable advantage in its predictions for 2020. In Table S2, the parameters of the Holt-Winters additive model are α = 0.44, β = 0.09 and γ = 1, and the SARIMA multiplicative model is specified as SARIMA (1,1,0) (2,1,0) [12]. On summary metrics such as RMSE, however, there is little discernible difference between the two models.

Fig. 1.

Forecast plots for Holt-Winters (A) and SARIMA (B) models. The dark shaded regions indicate 80% confidence intervals and the light shaded regions indicate 95% confidence intervals.

Model prediction comparisons

Figure S1 illustrates the results of Spearman's correlation analysis, revealing positive associations between five pollutants (PM2.5, PM10, SO2, CO and NO2) and the prevalence of viral hepatitis. Notably, PM2.5 shows a significant cross-correlation with both PM10 and CO (r = 0.84, P < 0.001). Table 2 compares four kernel algorithms of GPR, indicating relatively better predictive performance for HCV among the hepatitis types (R2test ∈ [0.087, 0.202]). Similarly, among age groups, individuals aged 35 and above exhibit more accurate predictions (R2test ∈ [0.024, 0.150]). The Besseldot kernel function within the GPR model demonstrates superior predictive capability. Table 3 evaluates four kernel algorithms of SVM, highlighting HBV as having better predictive outcomes among the hepatitis types (R2test ∈ [0.215, 0.303]). Additionally, individuals aged 35 and above show enhanced prediction accuracy (R2test ∈ [0.010, 0.132]). The Polynomial kernel function proves advantageous within the SVM framework. Overall, SVM demonstrates superior predictive performance compared to GPR across the evaluated metrics, underscoring its efficacy in modeling the relationships between pollutants, hepatitis types, age groups, and viral hepatitis development.

Table 2.

Comparison of the prediction results with different kernels of Gaussian process regression (GPR) models.

Model Series Parameters Training RMSE Training R2 Training MAE Test RMSE Test R2 Test MAE

GPR (rbfdot)

Total cases sigma = 0.476 0.163 0.582 0.120 0.383 0.002 0.325
Classification HAV sigma = 0.476 0.022 0.574 0.016 0.029 0.001 0.025
HBV sigma = 0.476 0.117 0.517 0.082 0.226 0.033 0.195
HCV sigma = 0.476 0.048 0.647 0.034 0.122 0.087 0.106
HEV sigma = 0.476 0.031 0.400 0.024 0.050 0.080 0.042
Unclassified hepatitis sigma = 0.476 0.005 0.638 0.004 0.007 0.089 0.007
Age 0–14 years sigma = 0.476 0.003 0.412 0.003 0.004 0.000 0.004
15–34 years sigma = 0.476 0.055 0.518 0.042 0.119 0.000 0.105
35–64 years sigma = 0.476 0.097 0.608 0.071 0.217 0.000 0.185
65- years sigma = 0.476 0.034 0.564 0.027 0.067 0.024 0.056

GPR (polydot)

Total cases degree = 1, scale = 1, offset = 1 0.193 0.223 0.142 0.373 0.060 0.321
Classification HAV degree = 1, scale = 1, offset = 1 0.030 0.044 0.020 0.030 0.021 0.026
HBV degree = 1, scale = 1, offset = 1 0.137 0.174 0.097 0.222 0.003 0.194
HCV degree = 1, scale = 1, offset = 1 0.062 0.218 0.045 0.118 0.174 0.101
HEV degree = 1, scale = 1, offset = 1 0.033 0.201 0.027 0.052 0.043 0.043
Unclassified hepatitis degree = 1, scale = 1, offset = 1 0.007 0.074 0.005 0.008 0.000 0.007
Age 0–14 years degree = 1, scale = 1, offset = 1 0.004 0.115 0.003 0.004 0.009 0.004
15–34 years degree = 1, scale = 1, offset = 1 0.065 0.132 0.051 0.117 0.039 0.102
35–64 years degree = 1, scale = 1, offset = 1 0.115 0.255 0.085 0.210 0.072 0.180
65- years degree = 1, scale = 1, offset = 1 0.043 0.133 0.035 0.065 0.043 0.054

GPR (laplacedot)

Total cases sigma = 0.476 0.154 0.765 0.114 0.374 0.037 0.319
Classification HAV sigma = 0.476 0.021 0.722 0.014 0.029 0.001 0.025
HBV sigma = 0.476 0.109 0.763 0.077 0.218 0.014 0.188
HCV sigma = 0.476 0.047 0.741 0.033 0.121 0.148 0.106
HEV sigma = 0.476 0.028 0.651 0.023 0.050 0.067 0.042
Unclassified hepatitis sigma = 0.476 0.005 0.794 0.004 0.007 0.052 0.007
Age 0–14 years sigma = 0.476 0.003 0.679 0.002 0.004 0.002 0.004
15–34 years sigma = 0.476 0.051 0.769 0.041 0.117 0.000 0.105
35–64 years sigma = 0.476 0.092 0.770 0.068 0.210 0.045 0.180
65- years sigma = 0.476 0.033 0.729 0.026 0.065 0.069 0.055

GPR (besseldot)

Total cases sigma = 1, order = 1, degree = 1 0.192 0.276 0.142 0.366 0.151 0.307
Classification HAV sigma = 1, order = 1, degree = 1 0.028 0.197 0.019 0.031 0.000 0.027
HBV sigma = 1, order = 1, degree = 1 0.135 0.248 0.097 0.211 0.022 0.184
HCV sigma = 1, order = 1, degree = 1 0.058 0.338 0.041 0.120 0.202 0.106
HEV sigma = 1, order = 1, degree = 1 0.033 0.235 0.027 0.049 0.085 0.042
Unclassified hepatitis sigma = 1, order = 1, degree = 1 0.006 0.361 0.005 0.007 0.010 0.007
Age 0–14 years sigma = 1, order = 1, degree = 1 0.004 0.286 0.003 0.005 0.000 0.004
15–34 years sigma = 1, order = 1, degree = 1 0.065 0.213 0.051 0.117 0.046 0.105
35–64 years sigma = 1, order = 1, degree = 1 0.115 0.306 0.084 0.204 0.150 0.171
65- years sigma = 1, order = 1, degree = 1 0.041 0.282 0.032 0.063 0.133 0.053

Table 3.

Comparison of the prediction results with different kernels of support vector machine (SVM) models.

Model Series Parameters Training RMSE Training R2 Training MAE Test RMSE Test R2 Test MAE

SVM (Linear)

Total cases cost = 0.001, gamma = 0.2 0.221 0.125 0.167 0.354 0.196 0.306
Classification HAV cost = 0.1, gamma = 0.2 0.030 0.148 0.019 0.022 0.018 0.019
HBV cost = 0.001, gamma = 0.2 0.154 0.168 0.108 0.200 0.215 0.168
HCV cost = 1, gamma = 0.2 0.056 0.364 0.037 0.107 0.004 0.090
HEV cost = 0.001, gamma = 0.2 0.037 0.089 0.031 0.052 0.175 0.043
Unclassified hepatitis cost = 0.001, gamma = 0.2 0.007 0.120 0.006 0.007 0.001 0.007
Age 0–14 years cost = 5, gamma = 0.2 0.003 0.366 0.002 0.004 0.001 0.003
15–34 years cost = 0.001, gamma = 0.2 0.070 0.173 0.056 0.110 0.065 0.098
35–64 years cost = 1, gamma = 0.2 0.113 0.313 0.073 0.214 0.061 0.183
65- years cost = 0.01, gamma = 0.2 0.047 0.099 0.037 0.064 0.132 0.055

SVM (Polynomial)

Total cases degree = 3, cost = 0.5, gamma = 0.2 0.200 0.231 0.135 0.374 0.182 0.325
Classification HAV degree = 3, cost = 0.1, gamma = 0.2 0.030 0.148 0.019 0.022 0.018 0.019
HBV degree = 3, cost = 0.5, gamma = 0.2 0.141 0.221 0.090 0.232 0.303 0.201
HCV degree = 3, cost = 1, gamma = 0.2 0.056 0.364 0.037 0.107 0.004 0.090
HEV degree = 3, cost = 0.1, gamma = 0.2 0.036 0.164 0.029 0.050 0.065 0.041
Unclassified hepatitis degree = 3, cost = 0.1, gamma = 0.2 0.007 0.143 0.005 0.007 0.001 0.006
Age 0–14 years degree = 3, cost = 3, gamma = 0.2 0.003 0.368 0.002 0.004 0.001 0.003
15–34 years degree = 3, cost = 0.1, gamma = 0.2 0.068 0.186 0.054 0.112 0.023 0.100
35–64 years degree = 3, cost = 1, gamma = 0.2 0.113 0.313 0.073 0.214 0.061 0.183
65- years degree = 3, cost = 0.1, gamma = 0.2 0.045 0.175 0.035 0.065 0.058 0.055

SVM (Radial)

Total cases cost = 1, gamma = 1 0.150 0.594 0.085 0.393 0.001 0.349
Classification HAV cost = 1, gamma = 4 0.020 0.680 0.007 0.025 0.012 0.022
HBV cost = 1, gamma = 0.1 0.141 0.191 0.089 0.234 0.264 0.203
HCV cost = 1, gamma = 0.5 0.048 0.581 0.030 0.105 0.036 0.090
HEV cost = 1, gamma = 0.1 0.033 0.237 0.026 0.048 0.023 0.039
Unclassified hepatitis cost = 1, gamma = 4 0.003 0.882 0.002 0.007 0.000 0.007
Age 0–14 years cost = 1, gamma = 0.5 0.004 0.358 0.002 0.004 0.009 0.003
15–34 years cost = 1, gamma = 0.1 0.065 0.173 0.048 0.126 0.001 0.114
35–64 years cost = 1, gamma = 1 0.089 0.645 0.052 0.217 0.001 0.188
65- years cost = 1, gamma = 0.1 0.042 0.232 0.032 0.066 0.010 0.056

SVM (Sigmoid)

Total cases coef0 = 0.1, gamma = 1 0.150 0.594 0.085 0.393 0.001 0.349
Classification HAV coef0 = 0.1, gamma = 4 0.020 0.680 0.007 0.025 0.012 0.022
HBV coef0 = 0.1, gamma = 0.1 0.141 0.191 0.089 0.234 0.264 0.203
HCV coef0 = 0.1, gamma = 0.5 0.048 0.581 0.030 0.105 0.036 0.090
HEV coef0 = 0.1, gamma = 0.1 0.033 0.237 0.026 0.048 0.023 0.039
Unclassified hepatitis coef0 = 0.1, gamma = 4 0.003 0.882 0.002 0.007 0.000 0.007
Age 0–14 years coef0 = 0.1, gamma = 0.5 0.004 0.358 0.002 0.004 0.009 0.003
15–34 years coef0 = 0.1, gamma = 0.1 0.065 0.173 0.048 0.126 0.001 0.114
35–64 years coef0 = 0.1, gamma = 1 0.089 0.645 0.052 0.217 0.001 0.188
65- years coef0 = 0.1, gamma = 0.1 0.042 0.232 0.032 0.066 0.010 0.056

Combined association between multiple pollutant exposures and viral hepatitis

Table S3 presents the comprehensive sensitivity analysis, indicating that the combined effect of the five pollutants on viral hepatitis is βWQS (95% CI) = 0.066 (0.018, 0.114). Among different subtypes, pollutants demonstrate significant adverse effects on HAV, HCV, and HEV. Across different age groups, except for the 0–14 age group, pollutants show notable adverse effects. Subsequently, based on the results of the overall sensitivity analyses, the relevant key factors were initially screened. From Table 4, focusing on individual pollutant effects, PM2.5 emerges as the primary risk factor for viral hepatitis overall, with a risk effect of βWQS (95% CI) =  − 0.0050 (− 0.0089, − 0.0013). Among different subgroups, PM2.5 stands out as the most sensitive risk factor, particularly impacting HCV and HEV patients and individuals aged 35–64. SO2 primarily affects HCV patients and individuals aged 35–64, with risk effects of βWQS (95% CI) = 0.0022 (0.0004, 0.0040) and βWQS (95% CI) = 0.0043 (0.0005, 0.0080), respectively. CO mainly impacts HAV patients, with a risk effect of βWQS (95% CI) =  − 0.0355 (− 0.0695, − 0.0016). NO2 primarily affects individuals aged 0–14, while PM10 influences HEV patients. In terms of combined pollutant effects, pollutants mainly affect HCV patients and individuals aged 35–64 (with risk effects of βWQS (95% CI) = 0.0342 (0.0210, 0.0474) and βWQS (95% CI) = 0.0453 (0.0153, 0.1556), respectively).

Table 4.

Comparison of results from the survey-weighted single-pollutant analyses and WQS regression of the pollutant mixtures for viral hepatitis.

Series Mixtures Single pollution regression (survey-weighted): βWQS (95% CI), p-Value Multiple pollution regression (survey-weighted): βWQS (95% CI), p-Value
Total SO2 0.0074 (− 0.0091, 0.0239) 0.3829 0.0887 (0.0118, 0.1657) 0.0284*
PM2.5 − 0.0050 (− 0.0089, − 0.0013) 0.0116*
HAV SO2 − 0.0004 (− 0.0016, 0.0007) 0.4625 0.0099 (0.0021, 0.0177) 0.016*
CO − 0.0355 (− 0.0695, − 0.0016) 0.0461*
NO2 − 0.0002 (− 0.0021, 0.0016) 0.8017
HBV SO2 0.0013 (− 0.0038, 0.0065) 0.617 0.0112 (− 0.0197, 0.0421) 0.48
PM2.5 − 0.0021 (− 0.0049, 0.0005) 0.1222
HCV SO2 0.0022 (0.0004, 0.0040) 0.02197* 0.0342 (0.0210, 0.0474) 6.34E-06***
PM2.5 − 0.0013 (− 0.0024, − 0.0002) 0.02201*
HEV CO − 0.0028 (− 0.0515, 0.0460) 0.9117 0.0115 (0.0015, 0.1556) 0.0286*
SO2 0.0005 (− 0.0008, 0.0019) 0.425
PM2.5 − 0.0014 (− 0.0026, − 0.0002) 0.0229*
PM10 0.0009 (0.0002, 0.0016) 0.0117*

Unclassified hepatitis PM2.5 − 0.0002 (− 0.0004, 0.0001) 0.156 0.0017 (− 0.0001, 0.0035) 0.064561
PM10 0.0001 (− 2.47E−05, 0.0003) 0.111
CO − 0.0002 (− 0.0103, 0.0097) 0.9565
NO2 0.0001 (− 0.0004, 0.0006) 0.6488
0–14 years SO2 − 0.0001 (− 0.0002, 0.0001) 0.3878 0.0001 (− 0.0010, 0.0013) 0.830809
PM10 − 3.84E−06 (− 0.0001, 0.0001) 0.9379
NO2 − 0.0002 (− 0.0004, − 5.14E−06) 0.0499*
15–34 years SO2 0.0001 (− 0.0025, 0.0027) 0.96 0.0232 (0.0066, 0.1556) 0.0086**
PM2.5 − 0.0010 (− 0.0024, 0.0003) 0.1297
35–64 years SO2 0.0043 (0.0005, 0.0080) 0.03024* 0.0453 (0.0153, 0.1556) 0.00473**
PM2.5 − 0.0032 (− 0.0054, − 0.0009) 0.00793**
65- years SO2 0.0001 (− 0.0017, 0.0019) 0.897 0.0127 (0.0009, 0.1556) 0.0408*
PM2.5 − 0.0002 (− 0.0009, 0.0006) 0.652

The parameter estimate (β) is reported in bold for significant single pollution or WQS mixture effects. The components with the highest weights are reported for mixtures with significant effects.

Bold font indicates statistical significance at the 0.05 level.

*** P < 0.001, ** P < 0.01, * P < 0.05.

Regarding environmental pollution periods, as illustrated by Fig. S2, SO2 and CO are key pollutants influencing the onset and mortality of viral hepatitis. SO2 and PM2.5 are the primary factors affecting onset both during polluted periods (Fig. S2C) and during periods of good environmental conditions (Fig. S2A). For mortality, CO and SO2 play critical roles during polluted periods (Fig. S2D), while CO and PM2.5 are the significant influencers during good environmental periods (Fig. S2B).

Non-linear interaction of pollutants

From Table S4, significant interaction effects are observed for PM2.5-PM10 and PM2.5-CO at lags of 5 and 6 months, respectively. Specifically, the interaction effect of PM2.5-PM10 is better fitted at a lag of 5 months, while the interaction effect of PM2.5-CO shows a better fit at a lag of 6 months. Figure 2 illustrates the fitted effect plots, revealing that the risk of viral hepatitis onset is elevated at lower levels of PM2.5 and PM10 (Fig. 2A and B), while high levels of PM2.5 (100–120 μg/m3) and CO (Fig. 2C and D) correspond to increased onset risk. Additionally, as depicted in the fitted curves of Fig. S3, the dose–response relationships of SO2 and NO2 with viral hepatitis onset become progressively clearer with increasing lag months. At a lag of 6 months, NO2 achieves its maximum risk effect at a level of 30–40 μg/m3.

Fig. 2.

The fitted interactions of the association between pollutants and viral hepatitis cases in Beijing, 2014–2020, based on the generalized additive model (GAM), with lags of 5 (A, B) and 6 (C, D) months.

Discussion

The incidence of viral hepatitis in Beijing Municipality exhibited an overall decreasing trend from 2005 to 2020, primarily attributed to widespread hepatitis vaccination and standardized antiviral treatments in China. These advancements have significantly reduced new cases17. However, despite these preventive measures, factors such as improved quality of life and various environmental influences have exacerbated the progression of hepatitis, leading to increased incidences of cirrhosis and liver cancer. Furthermore, the chronic nature of viral hepatitis, combined with limited effective prevention and treatment options, has contributed to a slight rise in long-term mortality rates. The primary types of hepatitis in this region are HBV (Hepatitis B Virus) and HCV (Hepatitis C Virus). HBV transmission, particularly from mother to child, has historically been prevalent in China due to inadequate medical hygiene practices in the past. In contrast, HCV infection often presents with subtle symptoms and is not typically part of routine health screenings, which has contributed to its spread. Our study identified distinct seasonal patterns, with spring and summer showing higher incidence rates. The age group most susceptible to infection was 35–64 years old, consistent with findings from previous research18. This age distribution reflects the prolonged duration of hepatitis infections, with older individuals typically experiencing longer periods of infection.

Establishing robust statistical models is essential for predicting the occurrence trends of infectious diseases. Commonly utilized in time series analysis are models like Holt-Winters and ARIMA, each offering distinct advantages for predictive accuracy and practical application. In the context of viral hepatitis prediction, this study compared the Holt-Winters model with SARIMA and found that the former generally outperformed the latter. This superiority can be attributed to challenges in determining SARIMA parameters and the potential for overfitting due to complex calculations, leading to less stable predictions. The Holt-Winters model proves effective in capturing epidemiological patterns of hepatitis onset due to its computational simplicity and high predictive accuracy19. Furthermore, this study employs machine learning-based methods to predict hepatitis onset risks associated with pollutant levels. Evaluation across different hepatitis types and age groups consistently shows superior predictive performance for primary hepatitis types and highly susceptible populations, aligning with epidemiological insights. This underscores that individuals in sensitive demographics are more vulnerable to environmental pollutants, influencing hepatitis susceptibility.

Different types of viral hepatitis primarily spread through gastrointestinal and bloodborne routes. HAV and HEV, for instance, mainly transmit through the gastrointestinal tract, with transmission influenced by pollutants such as PM10 and CO. This can be linked to increasing industrialization and declining environmental awareness. Higher levels of airborne particulate matter and vehicle emissions exacerbate environmental pollution, thereby enhancing transmission through the gastrointestinal route. Other types of viral hepatitis primarily transmit through blood and bodily fluids, affected notably by pollutants like SO2 and PM2.5. Epidemiological studies have shown an association between PM2.5 levels and liver fibrosis20. Animal research indicates that air pollution can activate Kupffer cells, trigger endoplasmic reticulum stress responses, induce cytokine production, and promote collagen deposition, thereby exacerbating fibrosis progression21. This suggests environmental pollutants can impact hepatic metabolism through the bloodstream route. Furthermore, this study identifies SO2 and CO as significant pollutants influencing the onset and mortality of viral hepatitis. CO, due to its high affinity for hemoglobin binding in the bloodstream, poses a notable threat to the progression and mortality of hepatitis. These findings underscore the importance highlighted in China's infectious disease planning of addressing hepatitis transmitted through the bloodstream route.

Current literature on infectious disease prediction and pollutant impacts often focuses on a single methodology and on specific effects. This study, in contrast, employed several time-series methods to forecast viral hepatitis and to analyze the interactive effects of pollutants on it, and found that month-to-month prediction intervals fluctuated considerably. These findings underscore the difficulty of capturing the inherent volatility of viral hepatitis data with conventional models. Moreover, the study was restricted to a single region, which limits the generalizability of the findings across the different types of hepatitis affected by pollutants. Future research is encouraged to validate these macroscopic epidemiological insights at the microscopic level, using animal models to elucidate the underlying physiological mechanisms.

Conclusion

The Holt-Winters model outperformed SARIMA in predicting viral hepatitis incidence. SVM and GPR models utilizing pollutant data showed potential for enhanced prediction accuracy. Patients with HAV and HEV were primarily impacted by PM10 and CO, while SO2 and PM2.5 affected other types. The 35–64 age group exhibited higher susceptibility. Long-term exposure to mixed pollutants influenced hepatitis development with a lag of 5–6 months, emphasizing the need for sustained pollutant monitoring for effective public health strategies.
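One simple way to look for a delayed association such as the 5–6 month lag noted above is to compute a Spearman rank correlation between a pollutant series and the incidence series at a range of monthly lags. The sketch below illustrates that idea only; it is not the lag analysis performed in this study, and the function names and the synthetic data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_spearman(
    exposure: Sequence[float], outcome: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlate exposure in month t with outcome in month t + lag."""
    if len(exposure) != len(outcome):
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = exposure[: len(exposure) - lag]
        y = outcome[lag:]
        result[lag] = spearman(x, y)
    return result


if __name__ == "__main__":
    # Synthetic illustration (NOT the study data): the outcome copies the
    # exposure with a 5-month delay, so the correlation peaks at lag 5.
    exposure = [float((i * 37) % 101) for i in range(40)]
    outcome = [50.5, 12.5, 80.5, 33.5, 61.5] + exposure[:-5]
    by_lag = lagged_spearman(exposure, outcome, max_lag=8)
    print(max(by_lag, key=by_lag.get))
```

A lagged correlation of this kind is descriptive only; it does not adjust for seasonality or confounders, which the regression models described in this study are designed to handle.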

Acknowledgements

The disease data were publicly available from the National Public Health Data Centre of China (https://www.phsciencedata.cn/). Pollution data, including CO, NO2, and O3, were publicly available from the National Oceanic and Atmospheric Administration (NOAA) (https://www.noaa.gov/).

Author contributions

SF P: Software, Conceptualization, Methodology, Formal analysis, Investigation, Resources, Writing-original draft, Writing-review & editing. L Y: Software, Conceptualization, Methodology, Formal analysis, Investigation, Writing-original draft, Writing-review & editing. HX G: Conceptualization, Methodology, Formal analysis, Writing-original draft, Writing-review & editing, Funding acquisition, Supervision. YZ L: Methodology, Software, Writing-original draft, Visualization. EH D: Methodology, Software, Writing-original draft, Visualization. FM F: Conceptualization, Methodology, Formal analysis, Writing-review & editing, Funding acquisition, Supervision. JH L: Conceptualization, Methodology, Formal analysis, Writing-review & editing, Funding acquisition, Supervision. All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity. All authors read and approved the final manuscript.

Funding

This work was supported by the Medical Research Program of Hebei Province (20231696) and the National Natural Science Foundation of China (81670525).

Data availability

The data that support the findings of this study are available on request from the National Public Health Data Centre of China (https://www.phsciencedata.cn/) and the National Oceanic and Atmospheric Administration (NOAA) (https://www.noaa.gov/).

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Viral hepatitis is a Class B infectious disease under China’s Infectious Disease Prevention and Control Law, and each case reported by a medical institution is reported through the direct reporting system of the infectious disease network and requires epidemiological investigation and surveillance testing to further clarify the source of the virus and infection. Specimens are first tested by the laboratories of medical institutions, of which positive specimens are reviewed by disease prevention and control institutions and the results are fed back to the sending units. Therefore, this study received ethical approval from the China Center for Disease Control and Prevention. Since the disease under investigation, viral hepatitis, is a statutory infectious disease subject to national statutory monitoring each year, informed consent is not required. For confidentiality reasons, all viral hepatitis data were analyzed anonymously.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Fumin Feng, Email: fm_feng@sina.com.

Jianhua Lu, Email: 13323219965@163.com.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-72047-1.

References

  • 1. Liou, J. W., Mani, H. & Yen, J. H. Viral hepatitis, cholesterol metabolism, and cholesterol-lowering natural compounds. Int. J. Mol. Sci. 23, 7 (2022). doi: 10.3390/ijms23073897
  • 2. Pisano, M. B. et al. Viral hepatitis update: Progress and perspectives. World J. Gastroenterol. 27(26), 4018–4044 (2021). doi: 10.3748/wjg.v27.i26.4018
  • 3. Martin, N. A. The discovery of viral hepatitis: A military perspective. J. R. Army Med. Corps 149(2), 121–124 (2003). doi: 10.1136/jramc-149-02-04
  • 4. Stanaway, J. D. et al. The global burden of viral hepatitis from 1990 to 2013: Findings from the Global Burden of Disease Study 2013. Lancet 388(10049), 1081–1088 (2016). doi: 10.1016/S0140-6736(16)30579-7
  • 5. Schweitzer, A., Horn, J., Mikolajczyk, R. T., Krause, G. & Ott, J. J. Estimations of worldwide prevalence of chronic hepatitis B virus infection: A systematic review of data published between 1965 and 2013. Lancet 386(10003), 1546–1555 (2015). doi: 10.1016/S0140-6736(15)61412-X
  • 6. Yue, T. et al. Trends in the disease burden of HBV and HCV infection in China from 1990–2019. Int. J. Infect. Dis. 122, 476–485 (2022). doi: 10.1016/j.ijid.2022.06.017
  • 7. Shahdoust, M., Sadeghifar, M., Poorolajal, J., Javanrooh, N. & Amini, P. Predicting hepatitis B monthly incidence rates using weighted Markov chains and time series methods. J. Res. Health Sci. 15(1), 28–31 (2015).
  • 8. Gullón, P., Varela, C., Martínez, E. V. & Gómez-Barroso, D. Association between meteorological factors and hepatitis A in Spain 2010–2014. Environ. Int. 102, 230–235 (2017). doi: 10.1016/j.envint.2017.03.008
  • 9. Jang, T. Y., Ho, C. C., Wu, C. D., Dai, C. Y. & Chen, P. C. Air pollution as a potential risk factor for hepatocellular carcinoma in Taiwanese patients after adjusting for chronic viral hepatitis. J. Chin. Med. Assoc. 87(3), 287–291 (2024).
  • 10. Wang, S., Wei, F., Li, H., Wang, Z. & Wei, P. Comparison of SARIMA model and Holt-Winters model in predicting the incidence of Sjögren’s syndrome. Int. J. Rheum. Dis. 25(11), 1263–1269 (2022). doi: 10.1111/1756-185X.14417
  • 11. Nath, P., Saha, P., Middya, A. I. & Roy, S. Long-term time-series pollution forecast using statistical and deep learning methods. Neural Comput. Appl. 33(19), 12551–12570 (2021). doi: 10.1007/s00521-021-05901-2
  • 12. Zhu, X. et al. Prediction study of electric energy production in important power production base, China. Sci. Rep. 12(1), 21472 (2022). doi: 10.1038/s41598-022-25885-w
  • 13. Chen, Y., Hou, W. & Dong, J. Time series analyses based on the joint lagged effect analysis of pollution and meteorological factors of hemorrhagic fever with renal syndrome and the construction of prediction model. PLoS Negl. Trop. Dis. 17(7), e0010806 (2023). doi: 10.1371/journal.pntd.0010806
  • 14. Cole, J. H. et al. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. NeuroImage 163, 115–124 (2017). doi: 10.1016/j.neuroimage.2017.07.059
  • 15. Pisner, D. A. & Schnyer, D. M. J. M. L. Support vector machine - ScienceDirect 101–121 (Elsevier, 2020).
  • 16. Xu, J. et al. Associations of metal exposure with hyperuricemia and gout in general adults. Front. Endocrinol. 13, 1052784 (2022). doi: 10.3389/fendo.2022.1052784
  • 17. Liang, X. et al. Epidemiological serosurvey of hepatitis B in China–declining HBV prevalence due to hepatitis B vaccination. Vaccine 27(47), 6550–6557 (2009). doi: 10.1016/j.vaccine.2009.08.048
  • 18. Bai, H., Liu, H., Chen, X., Xu, C. & Dou, X. Influence of age and HBeAg status on the correlation between HBV DNA and hepatic inflammation and fibrosis in chronic hepatitis B patients. Digest. Dis. Sci. 58(5), 1355–1362 (2013). doi: 10.1007/s10620-012-2479-7
  • 19. Zhou, Y. et al. Trend of the tuberculous pleurisy notification rate in Eastern China During 2017–2021: Spatiotemporal analysis. JMIR Public Health Surveill. 9, e49859 (2023). doi: 10.2196/49859
  • 20. Jang, T. Y. et al. Air pollution associate with advanced hepatic fibrosis among patients with chronic liver disease. Kaohsiung J. Med. Sci. 40(3), 304–314 (2024). doi: 10.1002/kjm2.12781
  • 21. Zheng, Z. et al. Exposure to fine airborne particulate matters induces hepatic fibrosis in murine models. J. Hepatol. 63(6), 1397–1404 (2015). doi: 10.1016/j.jhep.2015.07.020

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
