Abstract
Background
The tuberculosis (TB) burden differs significantly across regions of China, and these differences affect nationwide efforts to eradicate TB. The main factors underlying regional variations in TB incidence remain unclear. Therefore, this study aimed to analyze the factors influencing TB incidence in different economic regions of China, to estimate the actual TB incidence during the COVID-19 pandemic, and to project incidence rates to 2025.
Methods
This study was based on TB incidence surveillance data from the Chinese Center for Disease Control and Prevention. Joinpoint regression analysis was used to analyze temporal trends in the TB incidence rate, and a generalized additive model was used to analyze the influencing factors and how their distribution differed across China and its economic zones. Machine learning models were used to estimate the actual incidence of TB in China during the COVID-19 pandemic and to forecast the incidence rate up to 2025.
Results
From 2004 to 2020, the incidence rate of TB increased in Xizang and Qinghai, whereas the other provinces in China showed a downward trend, with the inflection point of the decline appearing around 2008. Western China had a notably higher incidence rate than the other regions. The number of medical and health institutions, the number of health personnel, and gross domestic product per capita were negatively correlated with the incidence rate, especially in the western region. The seasonal autoregressive integrated moving average (SARIMA) model achieved the best fit. Based on this model, the incidence of TB in 2025 is projected to be 52.460/100,000 in China overall and 81.438/100,000, 59.152/100,000, and 52.401/100,000 in the western, central, and northeastern regions, respectively, all higher than the corresponding incidence rates reported during the COVID-19 pandemic in 2020.
Conclusion
Except in the eastern region, China is unlikely to achieve its 2025 goals. Regional economic disparities coupled with strained medical resources during the COVID-19 crisis have hindered TB control efforts. To address this issue, it is recommended that the central and western regions prioritize optimizing health resource allocation and strengthening the management of patients with TB.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12889-025-24575-2.
Keywords: Tuberculosis, China, Incidence, Factors, Prediction
Introduction
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is one of the oldest diseases known to affect humans and a major cause of death worldwide [1]. According to the World Health Organization (WHO) Global Tuberculosis Report 2024, TB is the world's leading infectious cause of death and resulted in 1.25 million deaths worldwide [2]. Moreover, TB causes almost twice as many deaths globally as human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS).
China has made significant efforts to reduce TB cases and deaths over the past three decades. The health administrative department has issued policies such as the National Tuberculosis Prevention and Control Plan 1991–2000, the National Tuberculosis Prevention and Control Plan 2001–2010, the National Tuberculosis Prevention and Control Plan 2011–2015, the National Tuberculosis Prevention and Control Plan during the 13th Five Year Plan Period (2016–2020), and the Action Plan to Stop Tuberculosis (2019–2022). China has also implemented various TB prevention and control measures and made significant progress. Consequently, the reported incidence rate decreased from 70.6/100,000 in 2012 to 59.3/100,000 in 2018, and the treatment success rate has remained above 90% [3, 4]. Despite this progress, the TB epidemic in China remains serious. China is still one of the 30 countries with a high TB burden, ranking third in the world. Approximately 800,000 new TB cases are reported every year, ranking second among Class A and B infectious diseases in China [3]. The situation in some regions remains severe, with occasional school-based TB outbreaks. Drug resistance is becoming increasingly prominent, and patients bear a heavy burden of medical expenses, which makes TB a key focus of disease control in China now and in the future.
In China, TB incidence tends to be low in the eastern region and high in the western region, with relatively lower incidence in developed regions and relatively higher incidence in underdeveloped regions [5]. Identifying the key regions and factors for TB prevention and control is therefore crucial. A substantial body of research has examined the spatiotemporal distribution of TB and the factors influencing it. These studies have found that age and sex affect the incidence and prevalence of TB, and that malnutrition, excessive workload, meteorological factors, socioeconomic conditions, and population mobility may also trigger TB [6, 7]. However, the main influencing factors, and how they vary between regions, remain unclear.
The coronavirus disease 2019 (COVID-19) outbreak, which began in December 2019, had a significant impact on TB prevention and control. Consequently, the TB incidence monitored and reported during the pandemic may not reflect the true incidence of TB in China. This study was therefore based on TB incidence surveillance data from the Chinese Center for Disease Control and Prevention (CCDC) from 2004 to 2020. Social development, meteorological, and other related data were analyzed as potential influencing factors, and differences between economic zones in China were also investigated. The aims of this research were to build machine learning prediction models based on long-term data to estimate the actual incidence rate of TB in China during the COVID-19 pandemic, to evaluate the factors influencing TB in China, and to provide up-to-date predictions of future TB incidence that can help the health sector develop targeted public health strategies.
Materials and methods
Data source
For the current study, we obtained TB data from the Data Center of China Public Health Science (http://www.phsciencedata.cn/Share/edtShareNew.jsp?id=39208), which was established and is operated by the CCDC. The database contains all TB data reported since direct reporting through the infectious disease network was implemented in 2004. It includes the number of cases, incidence, deaths, mortality rates, and other statistical summary data by region, age, sex, occupation, and disease type, as well as related original case data, from 31 mainland provinces/cities (excluding Taiwan province, Hong Kong, and Macao). This database has provided data for many published TB studies [5, 8]. From this database, we extracted the monthly incidence rates of tuberculosis in 31 provinces of mainland China from 2004 to 2020. We also analyzed social environment, economic, population, healthcare, meteorological, and other related data associated with the incidence rate. Meteorological factors were obtained from the China Meteorological Administration (https://data.cma.cn/), and social development factors were obtained from the National Bureau of Statistics (https://www.stats.gov.cn/sj/ndsj/), specifically from the China Statistical Yearbook section (https://www.stats.gov.cn/sj/ndsj/2021/indexch.htm). Following the classification of the National Bureau of Statistics of China [9], we divided the 31 provinces and cities into four regions based on economic level (eastern, central, northeastern, and western) to investigate TB incidence in different economic regions (Figure S1).
Temporal trend analysis
We employed the Joinpoint Regression Program developed by the National Cancer Institute to conduct the Joinpoint regression analysis [10]. The specific steps are as follows:
Data preparation
We collected the incidence data of tuberculosis in mainland China from 2004 to 2020 and ensured the accuracy and completeness of the data. The data were arranged in chronological order, with the year serving as the time variable.
Model fitting
In the Joinpoint software, the maximum allowable number of Joinpoints was set. Based on the maximum likelihood estimation method, the software searched for different combinations of Joinpoint positions to identify the Joinpoint locations that yielded the best model fit. During this process, the software automatically compared the fitting effects of models with different numbers and positions of Joinpoints and selected the model with the minimum Akaike Information Criterion value as the optimal model.
Parameter estimation
For each sub-interval, the software estimated the parameters of the corresponding linear regression equation, including the slope and intercept. The slope represents the annual rate of change in disease incidence within that sub-interval.
Significance testing
Significance tests were performed on the estimated Joinpoints to determine whether they represented real changes in the data trend. A commonly used method is the permutation test: by randomly rearranging the data many times, the distribution under the null hypothesis was constructed, and the p-value of each Joinpoint was calculated. If the p-value was less than the pre-set significance level (usually 0.05), the Joinpoint was considered significant.
The average annual percent change (AAPC) was used to analyze incidence trends. A positive AAPC estimate with the lower bound of its 95% confidence interval (CI) above zero indicated an upward trend; conversely, if the AAPC estimate and the upper bound of its 95% CI were both below zero, the trend was downward; if neither condition was met, the trend was considered stable [11]. The AAPC is the weighted average of the annual percent changes (APCs) across the intervals, where the weights are the proportions of the number of years in each interval to the total number of years:

$$
\mathrm{AAPC} = \sum_{i=1}^{n} \frac{w_i}{N}\,\mathrm{APC}_i, \qquad \mathrm{APC}_i = \left(e^{b_i} - 1\right) \times 100\%
$$

where $n$ is the number of intervals, $\mathrm{APC}_i$ is the annual percent change of the $i$-th interval, $w_i$ is the number of years within the $i$-th interval, $N$ is the total number of years, and $b_i$ is the regression coefficient (slope) of the $i$-th interval.
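The AAPC calculation described above can be illustrated with a short Python sketch. It assumes that each Joinpoint segment is log-linear, so that a segment slope $b_i$ corresponds to $\mathrm{APC}_i = (e^{b_i} - 1) \times 100$; the function names, slopes, and segment lengths are hypothetical. As far as we understand, the Joinpoint software itself averages the slopes on the log scale before converting them to a percentage, so its AAPC may differ slightly from this direct weighted average of APCs.

```python
import math

def apc_from_slope(b):
    """APC (%) implied by the slope b of a log-linear segment: (e^b - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0

def aapc(segment_slopes, segment_years):
    """Weighted average of segment APCs; weights are each segment's share of the total years."""
    if len(segment_slopes) != len(segment_years):
        raise ValueError("need one length (in years) per segment slope")
    total_years = sum(segment_years)
    return sum(
        (years / total_years) * apc_from_slope(b)
        for b, years in zip(segment_slopes, segment_years)
    )

# Hypothetical example: 4 years rising (b = 0.02), then 12 years falling (b = -0.05).
print(round(aapc([0.02, -0.05], [4, 12]), 2))  # -3.15
```

Because the weights are proportional to segment length, a long declining segment dominates a short rising one, which is why the example yields a negative AAPC.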
In this study, the Joinpoint modeling method was “Grid Search”. The parameters related to join points were selected as follows: Min. number of observations from a joinpoint to either end of the data (excluding first or last): 2; Min. number of observations between two joinpoints (excluding any joinpoint if it falls): 2; Number of points to place between adjacent observed x values in the grid search: 0. The minimum number of join points was set to 0 and the maximum was set to 5. The model estimation method was Permutation Test, with Overall significance at 0.05 and Number of permutations at 4499.
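A simplified sketch of the grid search may help clarify how a joinpoint location is chosen. It is not the Joinpoint program's algorithm: it assumes a single joinpoint, ordinary least squares on log-transformed rates, candidate joinpoints restricted to observed years (consistent with placing 0 points between adjacent observed x values), and at least two observations on each side of the joinpoint; the permutation test and the search over up to five joinpoints are omitted. All function names are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve the small system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_segments(years, log_rates, tau):
    """Least squares for log(rate) = a + b1*x + b2*max(0, t - tau), with x = years since start."""
    x0 = years[0]
    rows = [[1.0, t - x0, max(0.0, t - tau)] for t in years]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_rates)) for i in range(3)]
    beta = solve_linear(xtx, xty)
    sse = sum((y - sum(c * v for c, v in zip(beta, r))) ** 2 for r, y in zip(rows, log_rates))
    return beta, sse

def grid_search_one_joinpoint(years, rates, min_obs=2):
    """Try each admissible observed year as the joinpoint and keep the smallest SSE."""
    log_rates = [math.log(r) for r in rates]
    best = None
    for tau in years[min_obs:len(years) - min_obs]:
        beta, sse = fit_segments(years, log_rates, tau)
        if best is None or sse < best["sse"]:
            best = {"joinpoint": tau, "beta": beta, "sse": sse}
    # Segment APCs from the log-scale slopes before and after the joinpoint.
    b1, b2 = best["beta"][1], best["beta"][2]
    best["apc"] = [(math.exp(b1) - 1) * 100, (math.exp(b1 + b2) - 1) * 100]
    return best
```

Applied to annual rates from 2004 to 2020, the returned dictionary gives the selected joinpoint year and the APCs of the segments before and after it.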
Analysis of influencing factors
Correlation analysis of TB incidence rate and influencing factors
In this study, Spearman correlation analysis was used to examine the association between relevant factors and TB incidence rate. These factors included meteorological factors such as total precipitation (TP), total sunshine hours (TSH), average relative humidity (ARH), average wind velocity (AWV), minimum temperature (MINT), maximum temperature (MAXT), average temperature (AVET), and social development factors such as dependent rate (DR), birth rate, death rate, natural increase rate (NIR), number of medical and health institutions (NMHI), number of health personnel (NHP), number of beds in medical and health institutions (NBMHI), gross domestic product per capita (GDP), and urbanization rate (UR).
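The lagged correlation analysis can be sketched in plain Python as follows. Spearman's rho is computed as the Pearson correlation of average ranks, and a lag of k months pairs the factor in month t − k with the incidence in month t. The helper names are hypothetical; in practice, `scipy.stats.spearmanr` returns the same coefficient together with a P-value.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def lagged_spearman(factor, incidence, lag):
    """Correlate the factor in month t - lag with incidence in month t (equal-length monthly series)."""
    if lag == 0:
        return spearman(factor, incidence)
    return spearman(factor[:-lag], incidence[lag:])
```

Looping `lag` over 0 to 3 for each meteorological series gives one coefficient per lag, mirroring the lag structure used in this analysis.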
Analysis of the generalized additive model (GAM)
The GAM is a semiparametric regression model based on generalized linear and additive models, with a greater variety of choices for assumptions about the distribution of the response variable compared with multiple linear regression [12]. A model with a quasi-Poisson distribution family (link function identity) was used to assess the effect of influential factors on TB incidence rate. The model formula is given by [13]:
$$
g(\mu) = \alpha + \sum_{i} s_i(X_i), \qquad \mu = E(Y)
$$

In this formula, $X_i$ represents the explanatory variables, $Y$ is the response variable, $\mu$ is the expected value of the response variable, $g(\cdot)$ is the link function, $s_i(\cdot)$ is the smoothing function, and $\alpha$ is the intercept. The dimension $k$ of each spline basis and its degrees of freedom satisfy df = k − 1. In this study, Y represents the incidence of TB, X represents the various factors, k = 4, and df = 3.
We used the F statistic to evaluate the contribution of each factor to TB incidence and the effective degrees of freedom (df) to determine whether each factor was linearly related to the incidence rate [12, 14]. The larger the F value, the greater the impact of the factor on the TB incidence rate. An effective df close to 1 indicates an approximately linear relationship between the factor and the incidence rate, whereas an effective df greater than 1 indicates a non-linear relationship; the larger the effective df, the more pronounced the non-linearity.
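To illustrate the additive structure $g(\mu) = \alpha + \sum_i s_i(X_i)$, the sketch below fits an additive model by backfitting. This is a swapped-in technique for illustration only: the "mgcv" package estimates penalized regression splines (with the quasi-Poisson family used here), whereas this sketch uses the classic backfitting algorithm with a simple running-mean smoother, an identity link, and least-squares residuals, and it does not compute F values or effective df. The function names and the smoother span are hypothetical.

```python
def running_mean_smoother(x, r, span=5):
    """Smooth residuals r against x with a running mean over `span` neighbouring positions in x-order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = span // 2
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half), min(len(order), pos + half + 1)
        window = [r[order[k]] for k in range(lo, hi)]
        fitted[i] = sum(window) / len(window)
    return fitted

def backfit_additive_model(columns, y, n_iter=30, span=5):
    """Fit y ~ alpha + sum_j s_j(x_j) (identity link) by backfitting."""
    n = len(y)
    alpha = sum(y) / n
    smooths = [[0.0] * n for _ in columns]
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            # Partial residuals: y minus the intercept and all other smooth terms.
            partial = [
                y[i] - alpha - sum(s[i] for k, s in enumerate(smooths) if k != j)
                for i in range(n)
            ]
            s_j = running_mean_smoother(x, partial, span)
            centre = sum(s_j) / n
            smooths[j] = [v - centre for v in s_j]  # centred for identifiability
    fitted = [alpha + sum(s[i] for s in smooths) for i in range(n)]
    return alpha, smooths, fitted
```

Each smooth is centred so that the intercept carries the overall mean, which mirrors the identifiability constraint used by GAM software.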
Prediction model theory
In this study, the Exponential Smoothing Model (ESM), Long Short-Term Memory (LSTM) Model, and Seasonal Autoregressive Integrated Moving Average model (SARIMA) were selected to fit the incidence rate. The fitting performance of the models was evaluated based on the Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) metrics. The model with the best fitting performance was chosen to predict the tuberculosis incidence rate in mainland China from 2020 to 2025. The construction steps of the three models are as follows:
The ESM theory
Exponential Smoothing is a commonly used time-series forecasting method. Its core principle is to perform a weighted average of past observations, with the weights decaying exponentially over time. That is, recent data has larger weights, while older data has relatively smaller weights [15, 16].
Basic Idea: The Exponential Smoothing prediction model is based on the concept that the future values of a time series are more closely related to recent actual observations. The closer the data is to the forecast period, the greater its impact on the forecast value. Therefore, when calculating the forecast value, it assigns higher weights to recent data and relatively lower weights to older data. The calculation formulas are as follows:
Single exponential smoothing
This is applicable to stationary time series without obvious trends or seasonality. The formula is

$$
S_t = \alpha y_t + (1 - \alpha) S_{t-1}
$$

where $S_t$ is the smoothed value at time $t$, $y_t$ is the actual observed value at time $t$, $\alpha$ is the smoothing constant ($0 < \alpha < 1$), and $S_{t-1}$ is the smoothed value at time $t-1$. Thus, the smoothed value at time $t$ is a weighted average of the actual observed value at time $t$ and the smoothed value at time $t-1$.
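A minimal Python sketch of the single exponential smoothing recursion follows, assuming that the first observation is used as the initial smoothed value (other initializations are common) and that α is fixed. The function name is hypothetical; the statistical analysis section names `SimpleExpSmoothing`, which we assume refers to the class of that name in `statsmodels.tsa.holtwinters`, and that class can also estimate α from the data.

```python
def simple_exponential_smoothing(y, alpha, s0=None):
    """Return S_t = alpha * y_t + (1 - alpha) * S_{t-1} for every t; S_0 defaults to y[0]."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    s = y[0] if s0 is None else s0
    smoothed = []
    for obs in y:
        s = alpha * obs + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

# The last smoothed value serves as the flat one-step-ahead forecast.
print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))  # [10.0, 15.0, 22.5]
```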
Double exponential smoothing
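It is used for time series with a linear trend and builds on single exponential smoothing. First, the single exponential smoothing value $S_t^{(1)} = \alpha y_t + (1 - \alpha) S_{t-1}^{(1)}$ is calculated, and then the double exponential smoothing value $S_t^{(2)} = \alpha S_t^{(1)} + (1 - \alpha) S_{t-1}^{(2)}$. The prediction model is

$$
\hat{y}_{t+T} = a_t + b_t T
$$

where $a_t = 2S_t^{(1)} - S_t^{(2)}$, $b_t = \frac{\alpha}{1 - \alpha}\left(S_t^{(1)} - S_t^{(2)}\right)$, and $T$ is the number of forecast lead periods.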
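The double exponential smoothing formulas above (Brown's linear method) can be sketched as follows, assuming that both smoothed series are initialized at the first observation and that α is fixed rather than estimated. The function name is hypothetical.

```python
def brown_double_smoothing_forecast(y, alpha, horizon):
    """Forecast T = 1..horizon steps ahead with Brown's double exponential smoothing."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    s1 = s2 = y[0]  # initialise both smoothed series at the first observation
    for obs in y:
        s1 = alpha * obs + (1 - alpha) * s1   # single smoothing S_t^(1)
        s2 = alpha * s1 + (1 - alpha) * s2    # double smoothing S_t^(2)
    a = 2 * s1 - s2                           # level a_t
    b = alpha / (1 - alpha) * (s1 - s2)       # trend b_t
    return [a + b * T for T in range(1, horizon + 1)]
```

On a noise-free linear series, $a_t$ converges to the latest observation and $b_t$ to the true slope, so the forecasts simply continue the line; on a constant series, $b_t = 0$ and the forecasts are flat.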
Triple exponential smoothing
This is suitable for time series with non-linear trends or seasonality. Building on double exponential smoothing, the triple exponential smoothing value $S_t^{(3)} = \alpha S_t^{(2)} + (1 - \alpha) S_{t-1}^{(3)}$ is calculated. The prediction model is more complex and involves more parameters to describe different characteristics of the time series.
The LSTM model theory
Long Short-Term Memory (LSTM) is a special type of recurrent neural network (RNN) designed to address the vanishing and exploding gradient problems that traditional RNNs encounter with long sequences. It can effectively capture long-term dependencies in sequences [17]. Its principle is based on the following key components and mechanisms:
Cell state
Core Structure: LSTM introduces the key concept of the cell state, which resembles a conveyor belt running through the entire LSTM chain, along which information can be transferred relatively easily. The cell state can store information for a long time during sequence processing, much like a memory, enabling LSTM to handle long-distance dependencies.
Information Flow: The update and transfer of the cell state are rather special. Unlike traditional RNNs that simply update the hidden state at each time step, LSTM controls the addition, deletion, and retention of information through specific gating mechanisms, thus achieving effective management of long - term information.
Gating mechanisms
Input gate
Function: It determines which new information will be added to the cell state.
Calculation Process: At each time step $t$, the input gate receives the current input $x_t$ and the previous hidden state $h_{t-1}$. Through a sigmoid function, it outputs a vector $i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)$ with elements between 0 and 1, where $W_i$ and $b_i$ are the gate's weight matrix and bias vector and $[h_{t-1}, x_t]$ denotes the concatenation of the two vectors. Each element of this vector indicates how much of the corresponding feature is allowed to enter the cell state at the current time step. At the same time, a tanh function generates a candidate value vector $\tilde{C}_t = \tanh(W_C[h_{t-1}, x_t] + b_C)$ containing new information that may be added to the cell state. Finally, the new information is selectively added to the cell state through $i_t \odot \tilde{C}_t$ (where $\odot$ represents element-wise multiplication).
Forget gate
Function: It decides which information to discard from the cell state.
Calculation Process: The forget gate also receives the current input $x_t$ and the previous hidden state $h_{t-1}$. Through the sigmoid function, it outputs a forget-gate vector $f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)$, where $W_f$ and $b_f$ are the gate's weights and bias, with element values also between 0 and 1. This vector is multiplied element-wise with the previous cell state $C_{t-1}$. A value close to 0 means that the information at the corresponding position will be forgotten, and a value close to 1 means that it will be retained; together with the input gate, the cell state is updated as $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$.
Output gate
Function: It determines which parts of the cell state will be output at the current time step.
Calculation Process: The output gate receives the current input $x_t$ and the previous hidden state $h_{t-1}$. Through the sigmoid function, it generates an output-gate vector $o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$, where $W_o$ and $b_o$ are the gate's weights and bias. At the same time, the cell state $C_t$ is transformed by the tanh function to give $\tanh(C_t)$. Finally, the output $h_t = o_t \odot \tanh(C_t)$ is obtained as the hidden state of the current time step. This output can be used as the input for the next time step or as the final prediction output of the model, depending on the application.
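To make the gate equations concrete, the following sketch runs a single LSTM cell with a hidden size of 1, so every weight matrix reduces to a scalar. The weights and function names are hypothetical and untrained; a practical forecasting model uses weight matrices learned by backpropagation through time and adds an output layer to map the hidden state to a predicted incidence rate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, weights):
    """One time step of a single-unit LSTM; weights[gate] = (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x_t + w_h * h_prev + bias)

    i_t = gate("input", sigmoid)             # input gate
    f_t = gate("forget", sigmoid)            # forget gate
    o_t = gate("output", sigmoid)            # output gate
    c_tilde = gate("candidate", math.tanh)   # candidate values
    c_t = f_t * c_prev + i_t * c_tilde       # forget old information, add new information
    h_t = o_t * math.tanh(c_t)               # hidden state (output) at step t
    return h_t, c_t

def run_lstm(sequence, weights):
    """Feed a sequence through the cell, starting from zero hidden and cell states."""
    h, c = 0.0, 0.0
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, weights)
    return h, c
```

With all weights set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step; this is a quick way to check the update rule by hand.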
The SARIMA model theory
Box and Jenkins proposed the SARIMA model, which extends the autoregressive integrated moving average (ARIMA) model with seasonal components to predict future values of time series data [18]. The general form of the model is SARIMA(p,d,q)(P,D,Q)s, where d and D are the orders of non-seasonal and seasonal differencing, respectively; p and P are the orders of the AR and SAR parts, respectively, which capture the dependence of the current value on past values; q and Q are the orders of the MA and SMA parts, respectively, which capture the dependence of the current value on past errors; and s is the seasonal period [19]. The model formula is given by [18]:
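$$
\phi_p(B)\,\Phi_P(B^s)\,\nabla^d \nabla_s^D Y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
$$

In this formula, $B$ is the backshift operator ($BY_t = Y_{t-1}$), $\nabla = 1 - B$, and $\nabla_s = 1 - B^s$; $Y_t$ represents the incidence rate of TB at time $t$; $\varepsilon_t$ is the estimated residual; $\phi_p(B)$ and $\Phi_P(B^s)$ are the polynomials of the p-order AR and P-order SAR coefficients, respectively; and $\theta_q(B)$ and $\Theta_Q(B^s)$ are the polynomials of the q-order MA and Q-order SMA coefficients, respectively.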
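The differencing part of the SARIMA formula, $\nabla^d \nabla_s^D Y_t$, can be illustrated with a short sketch; the function names are hypothetical. In `statsmodels`, the `SARIMAX` class applies the same operators internally when given `order=(p, d, q)` and `seasonal_order=(P, D, Q, s)`; for example, SARIMA(3,0,4)(2,1,1)12 corresponds to d = 0, D = 1, and s = 12.

```python
def difference(y, lag=1):
    """Apply (1 - B^lag): y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

def apply_differencing(y, d=0, D=0, s=12):
    """Apply nabla^d nabla_s^D = (1 - B)^d (1 - B^s)^D to a series (the operators commute)."""
    for _ in range(D):
        y = difference(y, lag=s)   # seasonal differencing
    for _ in range(d):
        y = difference(y, lag=1)   # ordinary differencing
    return y
```

A series made of a linear trend plus a fixed 12-month pattern becomes constant after one seasonal difference and zero after a further ordinary difference; each difference shortens the series by its lag.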
SARIMA model process:
Augmented Dickey–Fuller (ADF) test
Based on the ADF results, we determined whether the time series was stationary (P < 0.05 indicated stationarity). If the original series was not stationary, we applied differencing or seasonal differencing until it became stationary, which determined the values of d and D.
Model identification and parameter estimation
We determined the orders p, P, q, and Q based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, together with the characteristics of the series.
Model testing and optimization
Based on the parameter values determined in the previous steps, the model was constructed and diagnosed using the Ljung–Box (LB) test. If the P-value of the LB test was greater than 0.05, the residuals were considered white noise, indicating that the information in the data had been fully extracted [20].
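The Ljung–Box statistic used in this diagnostic step can be computed from the sample autocorrelations of the residuals, as sketched below; the helper names are hypothetical. Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags h, often reduced by the number of estimated ARMA parameters; a large Q (small P-value) indicates remaining autocorrelation. In `statsmodels`, `acorr_ljungbox` (in `statsmodels.stats.diagnostic`) reports both Q and its P-value.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (deviations from the mean, lag-0 denominator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom for k in range(1, max_lag + 1)]

def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n(n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    r = sample_acf(residuals, h)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))

# 95th percentile of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307
```

For example, strongly alternating residuals give a Q far above this critical value, so the white-noise hypothesis would be rejected and the model order revisited.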
Statistical analysis
For temporal trend analysis, we used Joinpoint regression software (version 4.9.1.0; Statistical Methodology and Applications Branch, Surveillance Research Program, National Cancer Institute, Bethesda, MD, USA). The R package “mgcv” was used to analyze the influencing factors. The “SimpleExpSmoothing”, “LSTM”, and “SARIMAX” functions in Python (version 3.11.4; Python Software Foundation, Wilmington, Delaware, USA) were used to predict the incidence of TB. All visualizations were produced using R software (version 4.2.1). P < 0.05 was considered statistically significant. The formulae for MAPE, RMSE, and MAE are provided below [20].
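$$
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_i - P_i}{Y_i}\right| \times 100\%
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - P_i\right)^2}
$$

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - P_i\right|
$$

In these formulae, $n$ represents the sample size, $Y_i$ represents the actual monthly incidence rate of TB, and $P_i$ represents the predicted incidence rate.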
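The three error metrics can be computed directly from paired actual and fitted monthly rates, as in this sketch; the function names, model labels, and example values are hypothetical, and MAPE assumes that no actual value is zero.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

# Hypothetical fitted values from two candidate models; the lower error wins.
observed = [5.2, 4.6, 5.1]
candidates = {"model_a": [5.0, 4.8, 5.0], "model_b": [5.6, 4.1, 5.5]}
best = min(candidates, key=lambda name: rmse(observed, candidates[name]))
```

In practice the three metrics usually agree on the ranking; when they do not, RMSE penalizes occasional large errors more heavily than MAE or MAPE.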
Results
Regional and temporal distribution of TB in China
A comprehensive analysis of incidence rates was conducted across all 31 provinces and cities in China. In 2020, there were 670,538 TB cases, corresponding to an incidence rate of 47.76 per 100,000 (Table 1). Figure 1A shows that the incidence rate of TB in China decreased between 2004 and 2020 (AAPC = −2.81%, P = 0.002); among the four economic zones, the steepest decline was observed in the northeastern region (AAPC = −4.33%, P < 0.001). The incidence rate was highest in the western region, followed by the central, northeastern, and eastern regions (Fig. 1B; Table 1). At the provincial level, incidence rates were high in Xinjiang, Xizang, Guizhou, and Qinghai and relatively low in Tianjin, Shanghai, Shandong, and Beijing (Fig. 1C and Figure S2).
Table 1.
The distribution of tuberculosis in 4 regions and 31 provinces/cities of China in 2004 and 2020
| Region | 2004 Number | 2004 Rate (/100,000) | 2020 Number | 2020 Rate (/100,000) |
|---|---|---|---|---|
| China | 970,279 | 74.64 | 670,538 | 47.76 |
| Eastern | 275,968 | 62.23 | 194,776 | 35.96 |
| Guangdong | 60,268 | 75.77 | 58,065 | 50.40 |
| Hebei | 37,923 | 55.78 | 24,997 | 32.93 |
| Shandong | 30,643 | 33.42 | 24,917 | 24.74 |
| Zhejiang | 42,927 | 91.34 | 24,521 | 41.92 |
| Jiangsu | 48,235 | 63.73 | 22,922 | 28.40 |
| Fujian | 29,086 | 80.89 | 16,217 | 40.82 |
| Hainan | 10,032 | 123.21 | 7,875 | 83.38 |
| Beijing | 7,164 | 48.36 | 6,150 | 28.56 |
| Shanghai | 6,784 | 50.70 | 5,908 | 24.33 |
| Tianjin | 2,906 | 31.27 | 3,204 | 20.51 |
| Central | 259,715 | 71.31 | 189,468 | 50.87 |
| Hunan | 42,883 | 64.20 | 52,539 | 75.94 |
| Henan | 66,887 | 69.01 | 41,712 | 43.27 |
| Hubei | 59,744 | 99.44 | 31,329 | 52.86 |
| Anhui | 34,414 | 53.41 | 26,656 | 41.87 |
| Jiangxi | 37,515 | 87.58 | 26,141 | 56.02 |
| Shanxi | 18,272 | 55.13 | 11,091 | 29.74 |
| Northeastern | 84,396 | 79.77 | 42,960 | 39.80 |
| Liaoning | 22,965 | 55.16 | 19,098 | 43.89 |
| Heilongjiang | 38,612 | 102.93 | 15,316 | 40.83 |
| Jilin | 22,819 | 85.62 | 8,546 | 31.76 |
| Western | 350,200 | 95.22 | 243,334 | 63.73 |
| Sichuan | 74,103 | 85.04 | 46,218 | 55.19 |
| Guizhou | 46,728 | 122.23 | 34,976 | 96.54 |
| Guangxi | 58,087 | 119.62 | 34,913 | 70.39 |
| Yunnan | 23,369 | 53.67 | 29,182 | 60.07 |
| Xinjiang | 26,421 | 142.02 | 27,728 | 109.89 |
| Chongqing | 39,906 | 127.49 | 20,836 | 66.69 |
| Shaanxi | 33,459 | 91.47 | 18,319 | 47.26 |
| Inner Mongolia | 18,521 | 77.68 | 10,008 | 39.41 |
| Gansu | 19,933 | 76.26 | 8,119 | 30.67 |
| Qinghai | 3,873 | 71.78 | 5,705 | 93.85 |
| Xizang | 1,941 | 74.32 | 5,264 | 150.13 |
| Ningxia | 3,859 | 66.08 | 2,066 | 29.74 |
Fig. 1.
Analysis of trends in tuberculosis incidence rates among various economic zones (A and B) and provinces/cities (C) in China from 2004 to 2020
All regions except Xizang and Qinghai showed a downward trend in TB incidence. In Xizang, the incidence of TB increased from 74.32 per 100,000 in 2004 to 150.13 per 100,000 in 2020 (AAPC = 5.15%, P < 0.001), and in Qinghai it increased from 71.78 per 100,000 in 2004 to 93.85 per 100,000 in 2020 (AAPC = 2.28%, P < 0.001). In addition, most regions (such as Beijing, Guizhou, Hebei, and Shanxi) showed a significant inflection point around 2008 (between 2007 and 2009), with incidence increasing before this point and decreasing afterwards (Table 2).
Table 2.
The trends of tuberculosis incidence from 2004 to 2020 in 31 provinces/regions
| Region | AAPC (%) | 95% CI lower (%) | 95% CI upper (%) | P |
|---|---|---|---|---|
| China | −2.81 | −4.58 | −1.01 | 0.002 |
| Eastern | −3.47 | −4.79 | −2.14 | < 0.001 |
| Central | −2.42 | −3.71 | −1.12 | < 0.001 |
| Northeastern | −4.33 | −6.14 | −2.48 | < 0.001 |
| Western | −2.43 | −4.63 | −0.17 | 0.035 |
| Anhui | −3.55 | −5.07 | −2.00 | < 0.001 |
| Beijing | −3.31 | −5.16 | −1.41 | 0.001 |
| Fujian | −4.61 | −7.79 | −1.32 | 0.006 |
| Gansu | −4.87 | −7.10 | −2.59 | < 0.001 |
| Guangdong | −3.00 | −6.43 | 0.56 | 0.097 |
| Guangxi | −3.53 | −4.31 | −2.75 | < 0.001 |
| Guizhou | −1.90 | −5.49 | 1.83 | 0.314 |
| Hainan | −2.65 | −5.38 | 0.17 | 0.065 |
| Hebei | −2.99 | −3.82 | −2.16 | < 0.001 |
| Henan | −4.30 | −5.52 | −3.07 | < 0.001 |
| Heilongjiang | −5.84 | −7.52 | −4.12 | < 0.001 |
| Hubei | −4.19 | −4.93 | −3.46 | < 0.001 |
| Hunan | −1.05 | −2.27 | 0.19 | 0.091 |
| Jilin | −5.47 | −7.20 | −3.71 | < 0.001 |
| Jiangsu | −5.79 | −6.32 | −5.26 | < 0.001 |
| Jiangxi | −3.28 | −4.32 | −2.22 | < 0.001 |
| Liaoning | −1.07 | −4.19 | 2.16 | 0.513 |
| Inner Mongolia | −5.24 | −6.38 | −4.08 | < 0.001 |
| Ningxia | −4.81 | −5.64 | −3.97 | < 0.001 |
| Qinghai | 3.28 | 1.91 | 4.67 | < 0.001 |
| Shandong | −3.20 | −4.33 | −2.06 | < 0.001 |
| Shanxi | −4.85 | −7.14 | −2.51 | < 0.001 |
| Shaanxi | −4.18 | −4.91 | −3.43 | < 0.001 |
| Shanghai | −3.21 | −4.12 | −2.29 | < 0.001 |
| Sichuan | −3.88 | −4.60 | −3.16 | < 0.001 |
| Tianjin | −3.26 | −5.34 | −1.13 | 0.003 |
| Xizang | 5.15 | 3.61 | 6.71 | < 0.001 |
| Xinjiang | −3.21 | −10.32 | 4.47 | 0.403 |
| Yunnan | −0.37 | −1.14 | 0.40 | 0.32 |
| Zhejiang | −4.74 | −5.78 | −3.69 | < 0.001 |
| Chongqing | −4.63 | −6.17 | −3.06 | < 0.001 |
AAPC average annual percent change, CI confidence interval
Assessment of the impact of influencing factors on TB incidence rate
Owing to the significant impact of the COVID-19 pandemic on TB case notifications in 2020, this study focused on analyzing notified cases and influencing factors from 2004 to 2019. Spearman correlation analysis demonstrated that DR (r = 0.148, P < 0.001), birth rate (r = 0.458, P < 0.001), death rate (r = 0.090, P < 0.001), NIR (r = 0.358, P < 0.001), NMHI (r = −0.198, P < 0.001), NHP (r = −0.338, P < 0.001), NBMHI (r = −0.079, P < 0.001), GDP (r = −0.570, P < 0.001), and UR (r = −0.403, P < 0.001) were significantly correlated with the TB incidence rate (Fig. 2). At lags of 0 and 1 month, AWV was the only meteorological factor significantly correlated with the incidence rate (Fig. 2A and B). TSH and ARH showed their strongest correlations with the incidence rate at a lag of 2 months (Fig. 2C), whereas AWV (r = −0.340, P < 0.001), MINT (r = −0.200, P < 0.001), MAXT (r = −0.240, P < 0.001), and AVET (r = −0.230, P < 0.001) showed their strongest correlations at a lag of 3 months (Fig. 2D).
Fig. 2.
Spearman correlation analysis between influencing factors and tuberculosis incidence rates, with lags of 0 months (A), 1 month (B), 2 months (C), and 3 months (D) Notes: TP: total precipitation; TSH: total sunshine hours; ARH: average relative humidity; AWV: average wind velocity; MINT: minimum temperature; MAXT: maximum temperature; AVET: average temperature; DR: dependent rate; NIR: natural increase rate; NMHI: number of medical and health institutions; NHP: number of health personnel; NBMHI: number of beds in medical and health institutions; GDP: gross domestic product per capita; UR: urbanization rate
Based on the Spearman correlation results, we included TSH (lag = 2 months), ARH (lag = 2 months), AWV (lag = 3 months), MINT (lag = 3 months), MAXT (lag = 3 months), AVET (lag = 3 months), and all the social development factors in the GAMs for further analysis. All indicators had P-values less than 0.001, indicating a significant impact on the incidence rate. The effective df of every cubic spline function was greater than 1, indicating nonlinear relationships between each indicator and the incidence rate. Figure 3 and Tables S1–S6 show the relationships between meteorological factors and TB incidence. When TSH was below 182 h, the incidence rate decreased as TSH increased, whereas the opposite trend was observed above 182 h. ARH and AWV showed nonlinear positive and negative associations with the incidence rate, respectively. MINT and MAXT showed opposite patterns: MINT was negatively associated with the incidence rate below 13.42 °C and positively associated above it, whereas MAXT was positively associated with the incidence rate below 20.30 °C and showed a slight negative association above it. AVET showed a nonlinear negative association with the incidence rate. As shown in Fig. 4 and Tables S7–S14, DR and birth rate were positively associated with the TB incidence rate, whereas the other factors (death rate, NIR, NMHI, NHP, NBMHI, GDP, and UR) were negatively associated with it.
Fig. 3.
Relationship between meteorological factors and the incidence rate of tuberculosis according to the generalized additive model
Fig. 4.
Relationship between social development factors and the incidence rate of tuberculosis according to the generalized additive model
Based on the F values, NBMHI (F = 706.379), death rate (F = 448.845), and UR (F = 329.863) among the social development factors, and AWV (F = 221.910), TSH (F = 105.880), and MAXT (F = 53.600) among the meteorological factors, were the main contributors to the TB incidence rate in China. At the regional level, health resources (NMHI, NHP, and NBMHI) contributed substantially across the regions, while GDP had a considerable impact in the western region. Among meteorological factors, MINT and TSH contributed significantly to the TB incidence rate in all four regions. The specific contributions of these factors are shown in Fig. 5.
Fig. 5.
Contribution of various factors to the incidence rate of tuberculosis in China (A), the eastern (B), central (C), northeastern (D), and western (E) regions
The prediction results
Among the three models, SARIMA exhibited the best performance; the specific fitting results are presented in Table S15. Therefore, SARIMA was used to predict the incidence rate of tuberculosis. Figure S3 shows the time series and trends, seasonal decomposition, and residual decomposition of the incidence rates of pulmonary TB in China and the four regions from January 2004 to December 2019. After the series were made stationary (by differencing where required), the P-value of the ADF test was less than 0.05, indicating that the series were stationary. The optimal model combinations were then determined based on the ACF and PACF plots (Figure S4). The optimal models were SARIMA(3,0,4)(2,1,1)12, SARIMA(4,1,0)(2,1,1)12, SARIMA(3,0,0)(2,1,1)12, SARIMA(2,0,0)(2,1,2)12, and SARIMA(2,0,0)(2,0,1)12. The corresponding Akaike information criterion values were 359.46, 421.33, 332.31, and 545.41, and the Bayesian information criterion values were 397.78, 234.91, 446.87, 357.85, and 568.21. The P-values of the Ljung–Box test on the residuals were all greater than 0.05, indicating that the information in the data had been fully extracted.
Figure 6 and Table S16 show that the observed incidence rates for China and the four regions between 2004 and 2019 closely followed the fitted values, indicating a good model fit (Table S17 lists the MAPE, RMSE, and MAE; Figure S5 shows that the residuals were approximately normally distributed). The model was therefore considered suitable for extrapolation and prediction. In 2020, the predicted and observed values differed significantly for China and all regions except the eastern region.
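The three fit metrics reported in Table S17 have standard definitions, sketched below in plain Python. The observed and fitted values in the usage lines are hypothetical numbers chosen only for illustration, not data from this study.

```python
import math

def _check(actual, fitted):
    if len(actual) != len(fitted) or not actual:
        raise ValueError("actual and fitted must be non-empty and the same length")

def mae(actual, fitted):
    """Mean absolute error."""
    _check(actual, fitted)
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

def rmse(actual, fitted):
    """Root mean square error."""
    _check(actual, fitted)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))

def mape(actual, fitted):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    _check(actual, fitted)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)

# Hypothetical monthly incidence rates (per 100,000), for illustration only
observed = [5.0, 4.0, 2.0]
fitted = [4.0, 4.0, 3.0]
print(mae(observed, fitted), rmse(observed, fitted), mape(observed, fitted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free and therefore easier to compare across regions with different incidence levels.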
Fig. 6.
Results of predicted incidence rate of tuberculosis in China (A), the eastern (B), central (C), northeastern (D), and western (E) regions
In 2025, TB incidence in China is projected to be 52.460/100,000, with the lowest rate in the eastern region (30.781/100,000) and the highest in the western region (81.438/100,000). The projected incidence rates in the central and northeastern regions are 59.152/100,000 and 52.401/100,000, respectively (Fig. 6; Table 3).
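The annual 2025 projections correspond to the sums of the twelve monthly point predictions for 2025 in Table 3, which can be checked with the short sketch below (values copied from Table 3; the variable names are illustrative). Only the point estimates are summed: the monthly 95% CI bounds cannot simply be added to obtain an annual interval, because forecast errors in adjacent months are correlated.

```python
# Monthly SARIMA point predictions for 2025 (per 100,000), copied from Table 3
monthly_2025 = {
    "China":        [5.113, 4.362, 4.987, 4.769, 4.844, 4.573, 4.436, 4.298, 4.002, 3.954, 3.639, 3.483],
    "Eastern":      [2.993, 2.364, 3.029, 2.916, 2.962, 2.702, 2.712, 2.622, 2.375, 2.238, 2.010, 1.858],
    "Central":      [5.625, 5.158, 5.747, 5.410, 5.320, 4.928, 4.881, 4.789, 4.540, 4.576, 4.216, 3.962],
    "Northeastern": [5.318, 4.805, 5.007, 4.817, 4.881, 4.401, 4.314, 4.118, 3.772, 3.942, 3.666, 3.360],
    "Western":      [7.752, 6.750, 7.306, 7.088, 7.329, 7.222, 6.870, 6.642, 6.340, 6.301, 5.938, 5.900],
}

# Annual incidence = sum of the monthly incidence rates, rounded to 3 decimals
annual = {region: round(sum(values), 3) for region, values in monthly_2025.items()}

for region, rate in annual.items():
    print(f"{region:<13} {rate:7.3f}/100,000")
```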
Table 3.
Monthly TB incidence rates (per 100,000) predicted by the SARIMA model, 2020–2025
| Year | Month | China (95% CI) | Eastern (95% CI) | Central (95% CI) | Northeastern (95% CI) | Western (95% CI) |
|---|---|---|---|---|---|---|
| 2020 | 01 | 5.203(3.991, 6.415) | 3.605(2.792, 4.419) | 5.408(3.950, 6.865) | 5.276(4.158, 6.394) | 7.331(5.499, 9.162) |
|  | 02 | 4.565(3.284, 5.846) | 3.069(2.216, 3.921) | 5.005(3.376, 6.635) | 4.813(3.625, 6.001) | 6.247(4.270, 8.223) |
|  | 03 | 5.178(3.854, 6.503) | 3.722(2.836, 4.608) | 5.606(3.860, 7.352) | 5.079(3.817, 6.341) | 7.037(4.937, 9.137) |
|  | 04 | 4.922(3.534, 6.311) | 3.584(2.629, 4.540) | 5.286(3.418, 7.154) | 4.801(3.514, 6.088) | 6.813(4.664, 8.962) |
|  | 05 | 5.014(3.605, 6.424) | 3.664(2.665, 4.663) | 5.267(3.318, 7.215) | 4.877(3.574, 6.179) | 7.116(4.939, 9.292) |
|  | 06 | 4.727(3.299, 6.155) | 3.415(2.355, 4.476) | 4.890(2.881, 6.900) | 4.617(3.307, 5.927) | 7.011(4.821, 9.201) |
|  | 07 | 4.607(3.161, 6.053) | 3.414(2.305, 4.522) | 4.813(2.754, 6.871) | 4.380(3.066, 5.693) | 6.837(4.640, 9.033) |
|  | 08 | 4.538(3.076, 6.001) | 3.356(2.204, 4.508) | 4.842(2.746, 6.938) | 4.140(2.825, 5.456) | 6.603(4.403, 8.804) |
|  | 09 | 4.170(2.693, 5.648) | 3.078(1.880, 4.277) | 4.519(2.394, 6.644) | 3.849(2.533, 5.166) | 6.216(4.014, 8.418) |
|  | 10 | 4.156(2.663, 5.648) | 2.950(1.709, 4.191) | 4.611(2.463, 6.760) | 4.006(2.689, 5.323) | 6.224(4.021, 8.427) |
|  | 11 | 3.872(2.366, 5.378) | 2.728(1.445, 4.011) | 4.290(2.124, 6.457) | 3.739(2.422, 5.056) | 5.814(3.610, 8.017) |
|  | 12 | 3.706(2.187, 5.225) | 2.568(1.244, 3.891) | 3.997(1.816, 6.177) | 3.299(1.982, 4.616) | 5.747(3.543, 7.951) |
| 2021 | 01 | 5.303(3.722, 6.885) | 3.559(2.113, 5.006) | 5.669(3.436, 7.902) | 5.477(4.133, 6.820) | 7.843(5.605, 10.082) |
|  | 02 | 4.491(2.886, 6.095) | 2.900(1.398, 4.403) | 5.178(2.922, 7.435) | 4.949(3.602, 6.296) | 6.687(4.442, 8.932) |
|  | 03 | 5.138(3.514, 6.763) | 3.583(2.028, 5.139) | 5.772(3.497, 8.046) | 5.063(3.712, 6.414) | 7.290(5.040, 9.540) |
|  | 04 | 4.908(3.262, 6.553) | 3.463(1.847, 5.079) | 5.437(3.146, 7.728) | 4.765(3.413, 6.118) | 7.037(4.784, 9.289) |
|  | 05 | 4.980(3.320, 6.641) | 3.515(1.846, 5.184) | 5.336(3.032, 7.640) | 4.956(3.603, 6.309) | 7.307(5.054, 9.561) |
|  | 06 | 4.697(3.023, 6.372) | 3.244(1.518, 4.969) | 4.935(2.622, 7.248) | 4.451(3.097, 5.805) | 7.182(4.928, 9.436) |
|  | 07 | 4.610(2.922, 6.298) | 3.249(1.472, 5.027) | 4.891(2.569, 7.212) | 4.288(2.934, 5.642) | 6.730(4.476, 8.985) |
|  | 08 | 4.481(2.780, 6.181) | 3.176(1.348, 5.004) | 4.788(2.461, 7.116) | 4.141(2.787, 5.495) | 6.463(4.208, 8.718) |
|  | 09 | 4.159(2.447, 5.871) | 2.911(1.033, 4.788) | 4.546(2.214, 6.879) | 3.779(2.425, 5.133) | 6.129(3.874, 8.383) |
|  | 10 | 4.129(2.406, 5.852) | 2.784(0.858, 4.710) | 4.587(2.251, 6.923) | 4.025(2.671, 5.379) | 6.073(3.818, 8.328) |
|  | 11 | 3.813(2.080, 5.547) | 2.555(0.582, 4.528) | 4.222(1.883, 6.561) | 3.894(2.540, 5.247) | 5.657(3.402, 7.912) |
|  | 12 | 3.628(1.885, 5.371) | 2.391(0.372, 4.410) | 3.960(1.619, 6.302) | 3.568(2.214, 4.921) | 5.618(3.363, 7.873) |
| 2022 | 01 | 5.227(3.409, 7.044) | 3.412(1.279, 5.545) | 5.636(3.205, 8.068) | 5.437(4.075, 6.800) | 7.737(5.384, 10.090) |
|  | 02 | 4.470(2.630, 6.311) | 2.775(0.582, 4.968) | 5.170(2.711, 7.628) | 4.872(3.508, 6.236) | 6.638(4.270, 9.007) |
|  | 03 | 5.085(3.226, 6.944) | 3.438(1.187, 5.688) | 5.755(3.276, 8.234) | 4.974(3.609, 6.339) | 7.256(4.873, 9.640) |
|  | 04 | 4.873(2.993, 6.754) | 3.331(1.016, 5.646) | 5.415(2.915, 7.915) | 4.806(3.441, 6.172) | 7.018(4.629, 9.407) |
|  | 05 | 4.941(3.047, 6.836) | 3.368(0.995, 5.741) | 5.319(2.804, 7.833) | 4.924(3.558, 6.290) | 7.284(4.892, 9.676) |
|  | 06 | 4.674(2.766, 6.581) | 3.110(0.676, 5.544) | 4.928(2.402, 7.455) | 4.238(2.872, 5.605) | 7.167(4.773, 9.561) |
|  | 07 | 4.511(2.591, 6.430) | 3.124(0.633, 5.616) | 4.885(2.349, 7.420) | 4.244(2.878, 5.611) | 6.792(4.397, 9.186) |
|  | 08 | 4.350(2.419, 6.281) | 3.023(0.476, 5.569) | 4.780(2.237, 7.323) | 4.110(2.744, 5.477) | 6.542(4.146, 8.937) |
|  | 09 | 4.074(2.132, 6.016) | 2.787(0.184, 5.389) | 4.54(1.991, 7.089) | 3.708(2.342, 5.075) | 6.208(3.813, 8.604) |
|  | 10 | 4.009(2.057, 5.961) | 2.645(−0.011, 5.301) | 4.565(2.012, 7.119) | 3.928(2.561, 5.294) | 6.168(3.772, 8.563) |
|  | 11 | 3.684(1.722, 5.646) | 2.416(−0.293, 5.124) | 4.201(1.643, 6.758) | 3.716(2.350, 5.082) | 5.769(3.373, 8.165) |
|  | 12 | 3.537(1.566, 5.508) | 2.269(−0.491, 5.03) | 3.956(1.396, 6.517) | 3.515(2.150, 4.881) | 5.726(3.330, 8.122) |
| 2023 | 01 | 5.143(3.037, 7.249) | 3.258(0.349, 6.167) | 5.611(2.949, 8.274) | 5.305(3.812, 6.798) | 7.759(5.292, 10.225) |
|  | 02 | 4.412(2.273, 6.550) | 2.641(−0.340, 5.623) | 5.148(2.455, 7.841) | 4.759(3.250, 6.268) | 6.690(4.212, 9.168) |
|  | 03 | 5.034(2.870, 7.198) | 3.304(0.253, 6.355) | 5.738(3.021, 8.454) | 4.939(3.412, 6.467) | 7.281(4.792, 9.769) |
|  | 04 | 4.809(2.615, 7.003) | 3.189(0.058, 6.319) | 5.403(2.662, 8.143) | 4.844(3.311, 6.378) | 7.048(4.556, 9.541) |
|  | 05 | 4.886(2.675, 7.098) | 3.237(0.036, 6.439) | 5.317(2.560, 8.074) | 4.864(3.326, 6.401) | 7.305(4.810, 9.800) |
|  | 06 | 4.613(2.385, 6.841) | 2.980(−0.298, 6.257) | 4.926(2.156, 7.696) | 4.229(2.690, 5.768) | 7.191(4.695, 9.687) |
|  | 07 | 4.476(2.233, 6.719) | 2.989(−0.359, 6.338) | 4.876(2.095, 7.657) | 4.274(2.734, 5.814) | 6.814(4.317, 9.311) |
|  | 08 | 4.347(2.09, 6.605) | 2.901(−0.516, 6.317) | 4.794(2.004, 7.584) | 4.095(2.554, 5.635) | 6.570(4.072, 9.067) |
|  | 09 | 4.041(1.770, 6.312) | 2.652(−0.833, 6.138) | 4.540(1.743, 7.336) | 3.714(2.173, 5.254) | 6.249(3.751, 8.747) |
|  | 10 | 3.997(1.713, 6.281) | 2.515(−1.037, 6.067) | 4.580(1.779, 7.382) | 3.871(2.331, 5.412) | 6.207(3.709, 8.705) |
|  | 11 | 3.686(1.39, 5.982) | 2.288(−1.330, 5.905) | 4.223(1.417, 7.028) | 3.546(2.005, 5.087) | 5.820(3.322, 8.317) |
|  | 12 | 3.530(1.222, 5.837) | 2.136(−1.546, 5.818) | 3.965(1.156, 6.774) | 3.342(1.802, 4.883) | 5.779(3.282, 8.277) |
| 2024 | 01 | 5.134(2.734, 7.535) | 3.129(−0.695, 6.952) | 5.629(2.742, 8.516) | 5.264(3.497, 7.032) | 7.752(5.187, 10.317) |
|  | 02 | 4.379(1.950, 6.807) | 2.501(−1.405, 6.406) | 5.160(2.248, 8.072) | 4.748(2.953, 6.543) | 6.718(4.142, 9.294) |
|  | 03 | 5.008(2.556, 7.459) | 3.168(−0.816, 7.151) | 5.749(2.818, 8.679) | 4.972(3.147, 6.798) | 7.292(4.707, 9.878) |
|  | 04 | 4.785(2.308, 7.263) | 3.052(−1.019, 7.123) | 5.412(2.463, 8.362) | 4.840(3.003, 6.676) | 7.067(4.478, 9.657) |
|  | 05 | 4.862(2.367, 7.356) | 3.100(−1.050, 7.251) | 5.321(2.358, 8.284) | 4.853(3.010, 6.696) | 7.316(4.724, 9.908) |
|  | 06 | 4.588(2.078, 7.098) | 2.840(−1.394, 7.074) | 4.929(1.955, 7.902) | 4.342(2.496, 6.188) | 7.206(4.613, 9.799) |
|  | 07 | 4.462(1.937, 6.986) | 2.849(−1.464, 7.162) | 4.881(1.899, 7.864) | 4.312(2.464, 6.160) | 6.843(4.25, 9.437) |
|  | 08 | 4.329(1.791, 6.868) | 2.762(−1.628, 7.152) | 4.789(1.800, 7.779) | 4.104(2.256, 5.953) | 6.607(4.013, 9.201) |
|  | 09 | 4.025(1.473, 6.576) | 2.511(−1.956, 6.978) | 4.541(1.546, 7.536) | 3.755(1.906, 5.604) | 6.296(3.702, 8.891) |
|  | 10 | 3.983(1.419, 6.547) | 2.376(−2.166, 6.918) | 4.577(1.578, 7.576) | 3.898(2.049, 5.748) | 6.256(3.662, 8.851) |
|  | 11 | 3.670(1.094, 6.246) | 2.148(−2.468, 6.764) | 4.216(1.214, 7.219) | 3.564(1.715, 5.413) | 5.881(3.287, 8.476) |
|  | 12 | 3.508(0.921, 6.095) | 1.994(−2.695, 6.683) | 3.962(0.957, 6.967) | 3.289(1.440, 5.139) | 5.842(3.248, 8.436) |
| 2025 | 01 | 5.113(2.442, 7.783) | 2.993(−1.836, 7.821) | 5.625(2.544, 8.707) | 5.318(3.323, 7.312) | 7.752(5.097, 10.406) |
|  | 02 | 4.362(1.666, 7.058) | 2.364(−2.553, 7.281) | 5.158(2.053, 8.264) | 4.805(2.792, 6.818) | 6.750(4.086, 9.414) |
|  | 03 | 4.987(2.270, 7.705) | 3.029(−1.973, 8.032) | 5.747(2.624, 8.87) | 5.007(2.973, 7.040) | 7.306(4.633, 9.979) |
|  | 04 | 4.769(2.027, 7.511) | 2.916(−2.179, 8.011) | 5.410(2.268, 8.552) | 4.817(2.776, 6.857) | 7.088(4.411, 9.764) |
|  | 05 | 4.844(2.086, 7.602) | 2.962(−2.219, 8.143) | 5.320(2.165, 8.475) | 4.881(2.836, 6.927) | 7.329(4.650, 10.008) |
|  | 06 | 4.573(1.801, 7.346) | 2.702(−2.569, 7.973) | 4.928(1.763, 8.093) | 4.401(2.354, 6.448) | 7.222(4.542, 9.902) |
|  | 07 | 4.436(1.650, 7.223) | 2.712(−2.644, 8.069) | 4.881(1.707, 8.054) | 4.314(2.265, 6.362) | 6.870(4.190, 9.550) |
|  | 08 | 4.298(1.498, 7.098) | 2.622(−2.818, 8.062) | 4.789(1.609, 7.970) | 4.118(2.069, 6.166) | 6.642(3.961, 9.322) |
|  | 09 | 4.002(1.189, 6.814) | 2.375(−3.149, 7.898) | 4.540(1.355, 7.726) | 3.772(1.723, 5.821) | 6.340(3.660, 9.021) |
|  | 10 | 3.954(1.130, 6.778) | 2.238(−3.367, 7.843) | 4.576(1.387, 7.766) | 3.942(1.893, 5.991) | 6.301(3.621, 8.982) |
|  | 11 | 3.639(0.804, 6.475) | 2.010(−3.676, 7.696) | 4.216(1.023, 7.409) | 3.666(1.617, 5.715) | 5.938(3.258, 8.619) |
|  | 12 | 3.483(0.638, 6.329) | 1.858(−3.908, 7.623) | 3.962(0.766, 7.158) | 3.360(1.311, 5.409) | 5.900(3.220, 8.581) |
SARIMA Seasonal AutoRegressive Integrated Moving Average, CI confidence interval
Discussion
This study analyzed the trends, influencing factors, and future projections of TB incidence in different economic regions of mainland China. Compared with previous studies [21, 22], our study has several strengths. First, the long-term trend analysis of TB incidence provides a more comprehensive picture of the changing TB burden in China. Second, we systematically investigated the effects of meteorological and socioeconomic factors on TB incidence across economic regions, providing insights of greater practical relevance. Third, we compared several forecasting models and used the best-fitting one to project future TB incidence, which strengthens the reliability of our projections.
Our results show that TB incidence rates were highest in the west, where socioeconomic status is low, and lowest in the east, where socioeconomic status is higher. The WHO has highlighted that TB incidence is closely related to economic conditions and nutrition [2]. Poor living conditions, inadequate health infrastructure and medical resources, a high prevalence of nutritional deficiency, and low levels of nutritional knowledge in western China may partly explain its high TB incidence [23]. Moreover, we observed increasing incidence rates in two western provinces, Xizang and Qinghai. Most cases in these provinces occurred among farmers, herders, and Tibetans who make their living by raising cattle and sheep [16, 24], and the highest prevalence of bovine TB in China has been observed in Xizang, Xinjiang, and Qinghai [25]. Without scientific animal husbandry management, Mtb can easily be transmitted from cattle and sheep to humans [26], which poses a particular risk for Tibetans living in remote, underdeveloped rural areas with poor sanitation and low health awareness. In such settings, infected individuals may present late for care, allowing further transmission of Mtb within households and compounding the problem.
Notably, TB incidence has been decreasing in most provinces since 2008. Both this decline and the preceding upward trend can be linked to the expansion of China's free treatment coverage: building on the policy, implemented since 2005, of providing free anti-TB treatment to patients with infectious pulmonary TB, coverage was extended to include the free management of patients with smear-negative pulmonary TB. In recent years, the Chinese government has continuously adjusted and optimized TB-related policies, for example by strengthening laboratory capacity, expanding drug susceptibility testing, and adjusting medical insurance to reduce patients' financial burden, all of which have contributed to the reduction in TB incidence. For future TB prevention and control, however, attention must be paid to the rational allocation of health resources, health assistance to the western regions should continue to expand, and measures should be adapted to regions where ethnic minority populations reside. Nutritional interventions, which have been associated with a 39–48% reduction in TB incidence among household contacts [27], may accelerate reductions in TB incidence and nutrition-related comorbidities in high-burden countries or communities. Moreover, electronic medication monitors could play an important role in supervising patients' medication intake and could benefit TB programs in high-burden, low-resource settings [28].
This study showed that overexposure, ARH, and AWV were positively associated with the TB incidence rate, whereas lower MINT, higher MAXT, and lower AVET were associated with an increased risk of TB. Martineau [28] proposed that sunlight is crucial for vitamin D synthesis in humans and that vitamin D plays a key role in the host immune response to TB. An increase in AWV promotes the dispersion of air pollutants, reducing pollutant concentrations and the associated health risks for people susceptible to TB. Therefore, increased sunlight exposure in winter is recommended for groups at high risk of TB, together with preventive measures to reduce exposure to adverse weather conditions.
Among indicators related to household economic burden, DR, birth rate, and death rate were correlated with TB incidence, suggesting that a greater household economic burden is associated with higher TB incidence. This burden is particularly pronounced in underdeveloped areas, where some older adults lack pension insurance, further increasing families' financial strain [29, 30]. Notably, China was identified as an aging society in its fifth national population census in 2000, and the proportion of older people has continued to increase in recent years [31, 32].
Another finding of this study was that health resources (NMHI, NHP, and NBMHI) were negatively correlated with TB incidence. A study conducted in Pakistan found that the limited availability of medical facilities and inconvenient transportation contributed to delayed healthcare seeking in rural areas, leading to disease progression and further transmission of Mtb within households and communities [33]. We also found that several economic factors, such as NIR, GDP, and UR, were negatively correlated with TB incidence. The impact of economic factors on TB is multifaceted. On the one hand, regions with higher socioeconomic status can better meet the demand for healthcare resources, such as Bacillus Calmette–Guérin (BCG) vaccination of newborns [34], regular TB screening in communities and schools, and free medical policies for patients with TB [35, 36]. On the other hand, residents of developed areas usually have higher levels of education, healthier lifestyles, greater awareness of TB, and more concern for their own health [37].
Based on these findings, we propose the following recommendations. Because economic factors influence TB incidence through several pathways, including access to health resources, household economic burden, and health awareness, governments and health departments should ensure the equitable distribution of healthcare resources, especially in economically disadvantaged areas, so that the distribution of healthcare institutions meets residents' healthcare needs. Special attention should be paid to impoverished families and those with a high DR, particularly older persons without adequate old-age insurance. Reimbursement policies for the medical expenses of patients with TB should also be optimized as a priority, and TB screening and health education in communities and schools should be promoted continuously to raise public health awareness.
In recent years, machine learning models have been widely applied to disease early warning and prediction, and models such as ESM, LSTM, SARIMA, and Prophet have contributed substantially to disease forecasting. Chen et al. analyzed temporal changes in 24 notifiable infectious diseases in China before and after the COVID-19 pandemic using multiple machine learning models, demonstrating their strong performance in disease time-series analysis and their value for predicting epidemic trends [38]. This study fitted three widely used models (ESM, LSTM, and SARIMA) and found that SARIMA provided the best fit. Guo et al. similarly identified ARIMA as the optimal model in a TB prediction study [21]. The strength of this model lies in its ability to capture trend, seasonality, and randomness in time-series data simultaneously by integrating autoregressive, differencing, moving average, and seasonal differencing components, which makes it flexible for data with periodic fluctuations [18]. Specifically, it does not require the trend and seasonal forms to be specified subjectively in advance, because complex patterns are identified through model order selection; differencing removes non-stationarity and thereby improves prediction accuracy; and it is suitable for multi-period seasonal data, with mature parameter estimation methods that support statistically grounded predictions. These characteristics make it practical and effective for time-series forecasting in economics, disease epidemiology, meteorology, and other fields [20]. However, the model has several drawbacks. It relies on assumptions of linearity and stationarity, making it poorly suited to real-world data with nonlinearity or nonstationarity, such as complex time series of disease epidemics [18–20].
Its long-term forecasting capability is also limited: as the prediction horizon extends, accumulated errors cause accuracy to decline sharply, making it unreliable for predicting trends several years ahead. Additionally, it depends heavily on accurate and complete seasonal information and on a large amount of high-quality historical data; poor data quality, such as noise, outliers, or disrupted seasonality, degrades its predictive performance [20].
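For reference, the components described above correspond to the standard multiplicative SARIMA$(p,d,q)(P,D,Q)_s$ formulation. Writing $B$ for the backshift operator ($B\,y_t = y_{t-1}$) and $s$ for the seasonal period ($s = 12$ for monthly series), the model is

$$\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t=\theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t,$$

where $\phi_p$ and $\Phi_P$ are the non-seasonal and seasonal autoregressive polynomials of orders $p$ and $P$, $\theta_q$ and $\Theta_Q$ are the corresponding moving average polynomials of orders $q$ and $Q$, $(1-B)^{d}$ and $(1-B^{s})^{D}$ are the non-seasonal and seasonal differencing operators, and $\varepsilon_t$ is white noise with variance $\sigma^2$. The loss of accuracy at long horizons can be seen from the $h$-step-ahead forecast error variance, treating the parameters as known:

$$\operatorname{Var}\!\left[e_t(h)\right]=\sigma^{2}\sum_{j=0}^{h-1}\psi_j^{2},\qquad \psi(B)=\sum_{j\ge 0}\psi_j B^{j}=\frac{\theta_q(B)\,\Theta_Q(B^{s})}{\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}(1-B^{s})^{D}},\quad \psi_0=1.$$

This variance cannot decrease as $h$ grows and, when $d + D \ge 1$, grows without bound, which is consistent with the 95% CIs in Table 3 widening over the forecast horizon.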
The WHO has consistently emphasized TB elimination in its annual global TB reports [2], with the milestone of reducing TB incidence by 50% between 2015 and 2025. For China, this means reducing the TB incidence rate from 63.416 per 100,000 in 2015 to 31.708 per 100,000 by 2025. This study assessed whether China and its regions could achieve this target by 2025. After the impact of the COVID-19 pandemic on TB was removed, the results showed that, apart from the eastern region, the other three regions are not projected to meet the target. In particular, the western region had the largest gap, and its predicted incidence rate was far higher than the national average. In recent years, China has issued multiple policies on the referral, examination, diagnosis, treatment, and management of patients with TB. However, the timing of implementation, effectiveness of enforcement, and funding allocation differed substantially across regions. For example, in developed southern cities such as Shenzhen [39], the TB control program received increased financial support for molecular testing, which was quickly implemented in primary care for TB diagnosis; the proportion of patients undergoing molecular testing increased from 36.5% in 2018 to 86.9% in 2020. The largest increase in the number of medical institutions occurred in Hebei Province, whereas the annual change in the number of medical institutions in the western and central regions was small from 2008 to 2018. In addition, a previous survey of childhood vaccination in western China found very low BCG vaccination compliance [40]. An important factor contributing to these differences is that less-developed provinces often face financial difficulties, leading their governments to invest less in social security programs [41].
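The target arithmetic and the comparison with the 2025 projections can be made explicit with a minimal sketch. The 2015 baseline, the 50% reduction, and the projected rates are the values reported in this paper; the helper names are illustrative. The comparison uses point estimates only and ignores the width of the prediction intervals.

```python
BASELINE_2015 = 63.416  # national TB incidence per 100,000 in 2015, as reported
REDUCTION = 0.50        # WHO milestone: 50% reduction in incidence, 2015 to 2025

target_2025 = round(BASELINE_2015 * (1 - REDUCTION), 3)

# SARIMA point projections for 2025 (per 100,000), as reported in the Results
projected_2025 = {
    "China": 52.460,
    "Eastern": 30.781,
    "Central": 59.152,
    "Northeastern": 52.401,
    "Western": 81.438,
}

def meets_target(rate, target=target_2025):
    """A region meets the milestone if its projected rate is at or below the target."""
    return rate <= target

# Positive gap = projected rate still above the target
gaps = {region: round(rate - target_2025, 3) for region, rate in projected_2025.items()}

for region, rate in projected_2025.items():
    status = "meets" if meets_target(rate) else "misses"
    print(f"{region:<13} {rate:7.3f}  gap {gaps[region]:+8.3f}  ({status} target {target_2025})")
```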
Therefore, to narrow the gap towards achieving TB control goals, it is particularly important to integrate various social resources (medical assistance, poverty alleviation programs, charitable funds) to combat TB in the western region, while strengthening cooperation and communication with the eastern region to learn from successful prevention and control experiences. It is also necessary to monitor the schedule and sustainability of policy implementation in the western region.
This study has several limitations. First, the TB incidence data used in this research may be affected by underreporting. Second, the lack of comprehensive data on atmospheric pollutants in China before 2015 prevented us from investigating the association between air pollution and TB incidence from 2004 to 2014. Third, the conclusions regarding influencing factors are based on observed correlations and do not establish causal relationships. Fourth, SARIMA forecasts can be unstable over long horizons, although the model still provides useful reference values for this study. Finally, the COVID-19 pandemic had a significant impact on the reported number of TB cases.
Conclusion
In conclusion, the TB incidence rate in the western region of China, particularly in Xizang and Qinghai, consistently exceeded that of other regions from 2004 to 2020. Factors such as health resource distribution and economic conditions showed a negative correlation with TB incidence, significantly influencing the overall disease burden in the country. Projections indicate that by 2025, regions outside the eastern provinces are unlikely to meet the WHO targets. To address these challenges effectively, it is crucial for health authorities to not only allocate resources judiciously, but also ensure rigorous enforcement of health policies. Tailored interventions are essential, particularly in western regions where a significant number of minority populations reside. Strategic measures should focus on enhancing nutritional support, increasing vaccination compliance rates, and implementing comprehensive strategies to promote timely medication adherence among patients with TB.
Supplementary Information
Acknowledgements
The authors express their gratitude to the Data-centre of China Public Health Science for sharing valuable disease data.
Abbreviations
- TB
tuberculosis
- Mtb
Mycobacterium tuberculosis
- WHO
World Health Organization
- COVID-19
coronavirus disease 2019
- CCDC
Chinese Center for Disease Control and Prevention
- APC
annual percentage change
- AAPC
average annual percentage change
- CI
confidence interval
- TP
total precipitation
- TSH
total sunshine hours
- ARH
average relative humidity
- AWV
average wind velocity
- MINT
minimum temperature
- MAXT
maximum temperature
- AVET
average temperature
- DR
dependency ratio
- NIR
natural increase rate
- NMHI
number of medical and health institutions
- NHP
number of health personnel
- NBMHI
number of beds in medical and health institutions
- GDP
gross domestic product per capita
- UR
urbanization rate
- GAM
generalized additive model
- DF
degrees of freedom
- SARIMA
seasonal autoregressive integrated moving average
- ESM
exponential smoothing model
- LSTM
long short-term memory
- RNN
recurrent neural network
- ADF
augmented Dickey-Fuller
- ACF
autocorrelation function
- PACF
partial autocorrelation function
- LB
Ljung-Box
- MAPE
mean absolute percentage error
- RMSE
root mean square error
- MAE
mean absolute error
Authors’ contributions
HL, HC and LXZ: conceptualization, formal analysis, and writing—original draft preparation. XL and LL: software and writing—review and editing. CD: methodology. CZ and XZ: validation and visualization. JB and SY: software and funding acquisition. WZ and YX: investigation, resources, and supervision. All authors contributed to the article and approved the submitted version.
Funding
This study was funded by grants from the National Natural Science Foundation of China (12031010) and the Special Grant for the Prevention and Control of Infectious Diseases (2018ZX10713003). The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.
Data availability
Data are provided within the manuscript or supplementary information files. The data used in this study are publicly available at http://www.phsciencedata.cn/.
Declarations
Ethics approval and consent to participate
This study used population-level disease surveillance data obtained from publicly accessible databases and therefore raises no concerns regarding conflicts of interest, societal sensitivity, or personal privacy. According to the Notice of China's Pilot Ethical Review Guidelines for Science and Technology, research of this kind is exempt from ethics review. http://www.nhc.gov.cn/qjjys/s7946/202302/c3374c180dc5489d85f95df5b46afaf5.shtml; https://www.gov.cn/zhengce/zhengceku/202310/content_6908045.htm.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Hengliang Lv, Hui Chen and Xueli Zhang contributed equally to this work.
Contributor Information
Wenyi Zhang, Email: zwy0419@126.com.
Yuanyong Xu, Email: xyy_827@sina.com.
References
- 1.Hershkovitz I, Donoghue HD, Minnikin DE, May H, Lee OY, Feldman M, et al. Tuberculosis origin: the neolithic scenario. Tuberculosis. 2015;95(Suppl 1):S122-6. [DOI] [PubMed] [Google Scholar]
- 2.World Health organization. Global Tuberculosis Report 2024. Available: https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2024.(cited 18 Feb 2025).
- 3.Chinese Government Website. Notice on Printing and Issuing the Action Plan for Controlling Tuberculosis (2019–2022). Available: https://www.gov.cn/gongbao/content/2019/content_5437149.htm.(cited 18 Feb 2025).
- 4.Dong Z, Yao HY, Yu SC, Huang F, Liu JJ, Zhao YL, et al. Changes in notified incidence of pulmonary tuberculosis in china, 2005–2020. Biomed Environ Sci. 2023;36(2):117–26. [DOI] [PubMed] [Google Scholar]
- 5.Jiang H, Liu M, Zhang Y, Yin J, Li Z, Zhu C, et al. Changes in incidence and epidemiological characteristics of pulmonary tuberculosis in mainland China, 2005–2016. JAMA Netw Open. 2021;4(4):e215302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.GBD 2021 Tuberculosis collaborators. Global, regional, and national age-specific progress towards the 2020 milestones of the WHO end TB strategy: a systematic analysis for the global burden of disease study 2021. Lancet Infect Dis. 2024. 10.1016/S1473-3099(24)00007-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Lv HL, Wang LH, Zhang XL, Dang CX, Liu F, Zhang X, et al. Further analysis of tuberculosis in eight high-burden countries based on the global burden of disease study 2021 data. Infect Dis Poverty. 2024;13(1):70. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Yang GJ, Ouyang HQ, Zhao ZY, Li WH, Fall IS, Djirmay AG, et al. Discrepancies in neglected tropical diseases burden estimates in China: comparative study of real-world data and global burden of disease 2021 data (2004–2020). BMJ. 2025;388:e080969. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.National Bureau of Statistics of China. The Division of Eastern, Central, Western, and Northeastern Regions of China. 2011. Available: http://www.stats.gov.cn/ztjc/zthd/sjtjr/dejtjkfr/tjkp/201106/t20110613_71947.htm. (cited 18 Feb 2025).
- 10.Kim HJ, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med. 2000;19:335–51. [DOI] [PubMed] [Google Scholar]
- 11.Lv HL, Zhang X, Zhang XL, Bai JZ, You SM, Li X, et al. Global prevalence and burden of multidrug-resistant tuberculosis from 1990 to 2019. BMC Infect Dis. 2024;24(1):243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Ravindra K, Rattan P, Mor S, Aggarwal AN. Generalized additive models: building evidence of air pollution, climate change and human health. Environ Int. 2019;132:104987. [DOI] [PubMed] [Google Scholar]
- 13.Verbeke T. Generalized Additive Models: an Introduction with R by S. N. Wood. J R Stat Soc Ser A Stat Soc. 2007;170:262–262.
- 14.Li Z, Liu Q, Zhan M, Tao B, Wang J, Lu W. Meteorological factors contribute to the risk of pulmonary tuberculosis: a multicenter study in Eastern China. Sci Total Environ. 2021;793:148621. [DOI] [PubMed] [Google Scholar]
- 15. Yang W, Su A, Ding L. Application of exponential smoothing method and SARIMA model in predicting the number of admissions in a third-class hospital in Zhejiang Province. BMC Public Health. 2023;23:2309.
- 16. Shang Y, Zhang TT, Wang ZF, Ma BZ, Yang N, Qiu YT, et al. Spatial epidemiological characteristics and exponential smoothing model application of tuberculosis in Qinghai plateau, China. Epidemiol Infect. 2022. doi:10.1017/S0950268822000036.
- 17. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80.
- 18. He Z, Tao H. Epidemiology and ARIMA model of positive-rate of influenza viruses among children in Wuhan, China: a nine-year retrospective study. Int J Infect Dis. 2018;74:61–70.
- 19. Lu S. Research on GDP forecast analysis combining BP neural network and ARIMA model. Comput Intell Neurosci. 2021;2021:1026978.
- 20. ArunKumar KE, Kalaga DV, Sai Kumar CM, Chilkoor G, Kawaji M, Brenza TM. Forecasting the dynamics of cumulative COVID-19 cases (confirmed, recovered and deaths) for top-16 countries using statistical machine learning models: Auto-Regressive integrated moving average (ARIMA) and seasonal Auto-Regressive integrated moving average (SARIMA). Appl Soft Comput. 2021;103:107161.
- 21. Guo J, Liu C, Liu F, Zhou E, Ma R, Zhang L, et al. Tuberculosis disease burden in China: a spatio-temporal clustering and prediction study. Front Public Health. 2025;12:1436515.
- 22. Li Z, Zhang L, Liu Y. Analysis of the epidemiological trends of tuberculosis in China from 2000 to 2021 based on the joinpoint regression model. BMC Infect Dis. 2024;24:1223.
- 23. Qiu Y, Ding C, Zhang Y, Yuan F, Zhao B, Hao L, et al. Geographical distribution differences of nutrition and health knowledge among Chinese adults in 2021. Wei Sheng Yan Jiu. 2022;51:881–5.
- 24. Hu M, Feng Y, Li T, Zhao Y, Wang J, Xu C, et al. Unbalanced risk of pulmonary tuberculosis in China at the subnational scale: spatiotemporal analysis. JMIR Public Health Surveill. 2022;8:e36242.
- 25. Song YH, Li D, Zhou Y, Zhao B, Li JM, Shi K, et al. Prevalence of bovine tuberculosis in yaks between 1982 and 2020 in mainland China: a systematic review and meta-analysis. Vector Borne Zoonotic Dis. 2021;21(6):397–405.
- 26. Olea-Popelka F, Muwonge A, Perera A, Dean AS, Mumford E, Erlacher-Vindel E, et al. Zoonotic tuberculosis in human beings caused by Mycobacterium bovis-a call for action. Lancet Infect Dis. 2017;17(1):e21–5.
- 27. Bhargava A, Bhargava M, Meher A, Benedetti A, Velayutham B, Sai Teja G, et al. Nutritional supplementation to prevent tuberculosis incidence in household contacts of patients with pulmonary tuberculosis in India (RATIONS): a field-based, open-label, cluster-randomised, controlled trial. Lancet. 2023;402(10402):627–40.
- 28. Wei X, Hicks JP, Zhang Z, Haldane V, Pasang P, Li L, et al. Effectiveness of a comprehensive package based on electronic medication monitors at improving treatment outcomes among tuberculosis patients in Tibet: a multicentre randomised controlled trial. Lancet. 2024;403(10430):913–23.
- 29. Liu L, Sun R, Gu Y, Ho KC. The effect of China's health insurance on the labor supply of middle-aged and elderly farmers. Int J Environ Res Public Health. 2020;17:6689.
- 30. Chen X, Giles J, Yao Y, Yip W, Meng Q, Berkman L, et al. The path to healthy ageing in China: a Peking University-Lancet commission. Lancet. 2022;400(10367):1967–2006.
- 31. Fang EF, Xie C, Schenkel JA, Wu C, Long Q, Cui H, et al. A research agenda for ageing in China in the 21st century (2nd edition): focusing on basic and translational research, long-term care, policy and social networks. Ageing Res Rev. 2020;64:101174.
- 32. Cheng J, Sun YN, Zhang CY, Yu YL, Tang LH, Peng H, et al. Incidence and risk factors of tuberculosis among the elderly population in China: a prospective cohort study. Infect Dis Poverty. 2020;9(1):13.
- 33. Rabbani U, Sahito A, Nafees AA, Kazi A, Fatmi Z. Pulmonary tuberculosis is associated with biomass fuel use among rural women in Pakistan: an age- and residence-matched case-control study. Asia Pac J Public Health. 2017;29:211–8.
- 34. Portnoy A, Clark RA, Quaife M, Weerasuriya CK, Mukandavire C, Bakker R, et al. The cost and cost-effectiveness of novel tuberculosis vaccines in low- and middle-income countries: a modeling study. PLoS Med. 2023;20(1):e1004155.
- 35. Zhang QY, Yang DM, Cao LQ, Liu JY, Tao NN, Li YF, et al. Association between economic development level and tuberculosis registered incidence in Shandong, China. BMC Public Health. 2020;20(1):1557.
- 36. Abou Jaoude GJ, Garcia Baena I, Nguhiu P, Siroka A, Palmer T, Goscé L, et al. National tuberculosis spending efficiency and its associated factors in 121 low-income and middle-income countries, 2010-19: a data envelopment and stochastic frontier analysis. Lancet Glob Health. 2022;10(5):e649–60.
- 37. Dong Z, Wang QQ, Yu SC, Huang F, Liu JJ, Yao HY, et al. Age-period-cohort analysis of pulmonary tuberculosis reported incidence, China, 2006–2020. Infect Dis Poverty. 2022;11(1):85.
- 38. Li K, Rui J, Song W, Luo L, Zhao Y, Qu H, et al. Temporal shifts in 24 notifiable infectious diseases in China before and during the COVID-19 pandemic. Nat Commun. 2024;15(1):3891.
- 39. Hong CY, Wang FL, Zhang YT, Tao FX, Ji LC, Lai PX, et al. Time-trend analysis of tuberculosis diagnosis in Shenzhen, China between 2011 and 2020. Front Public Health. 2023;11:1059433.
- 40. Zhang Y, Zhou H, Wang Y. Study on immunization status among children aged 12–23 months in 14 countries of Western China. CJCHC. 2013;21(08):796–8.
- 41. Long Q, Jiang WX, Zhang H, Cheng J, Tang SL, Wang WB. Multi-source financing for tuberculosis treatment in China: key issues and challenges. Infect Dis Poverty. 2021;10.
Data Availability Statement
Data are provided within the manuscript or supplementary information files. The data used in this study are publicly available at http://www.phsciencedata.cn/.