Abstract
Background
Communicable diseases remain a significant public health challenge in Asia, driven by diverse climatic, socioeconomic, and healthcare-related factors. Despite reductions in diseases such as tuberculosis and malaria, persistent hotspots highlight the need for deeper investigation. This study applies machine learning and spatial analysis techniques to examine patterns and determinants of communicable diseases across 41 countries from 2000 to 2022.
Methods
Data were sourced from global repositories, including WHO, CRU TS, WDI, and UNICEF, covering disease cases (e.g., tuberculosis, dengue, malaria), climatic variables (e.g., precipitation, humidity), and healthcare metrics (e.g., hospital bed density). Missing values were imputed using random forest methods. Outlier detection was conducted using Mahalanobis distances, identifying and addressing significant deviations to ensure data consistency. Models such as XGBoost and Random Forest were assessed using RMSE, MAE, and R². SHAP and XAI frameworks improved interpretability, while Gi* spatial statistics revealed disease hotspots and disparities.
Results
Tuberculosis cases declined from 8.01 million (2000) to 7.54 million (2022), with hotspots in India (Gi* = 3.07) and Nepal (Gi* = 4.67). Malaria cases dropped from 27.00 million (2000) to 7.96 million (2022), yet Bangladesh (Gi* = 4.13) and Pakistan (Gi* = 4.17) exhibited sustained risk. Dengue peaked at 2.71 million cases in 2019, with current hotspots in Malaysia (Gi* = 2.4) and Myanmar (Gi* = 0.79). Spatial disparities underscore the influence of precipitation, relative humidity, and healthcare gaps. XGBoost achieved remarkable accuracy (e.g., tuberculosis: RMSE = 0.94, R² = 0.91), and SHAP analysis revealed critical predictors such as climatic factors.
Conclusion
This study demonstrates the effectiveness of integrating machine learning, spatial analysis, and XAI to uncover disease determinants and guide targeted interventions. The findings offer actionable insights for improving disease surveillance, resource allocation, and public health strategies across Asia.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12942-025-00433-7.
Keywords: Communicable diseases, Explainable AI (XAI), Machine learning models, Hotspot identification, SHAP (Shapley additive explanations)
Introduction
Communicable diseases (CODs) remain an enduring threat to public health across the globe, presenting complex challenges that demand attention, particularly in Asia [1]. This vast and diverse region serves as a striking example of how climatic, socioeconomic, and healthcare-related factors intertwine to shape the dynamics of disease spread. Conditions such as tuberculosis, malaria, and dengue fever continue to affect millions, marking themselves not only as significant health burdens but also as major disruptors of societal and economic progress [2, 3]. The sheer scale of their prevalence and the severity of their impact highlight the urgency for effective action.
In Asia, climatic conditions, such as the seasonal patterns of monsoon rains and temperature fluctuations, can create favorable environments for the proliferation of pathogens and disease vectors [4–6]. At the same time, socioeconomic factors like income disparities and population density amplify vulnerabilities, exposing entire communities to these illnesses [7]. Limited access to healthcare services compounds the problem, particularly in regions where resources are scarce, and systems are stretched thin. The consequences of communicable diseases ripple outward, affecting families, communities, and economies with financial strain and productivity losses.
To tackle these challenges head-on, it is imperative to delve deeper into the underlying determinants and spatial distributions of communicable diseases. Understanding their complex interplay is key to devising targeted prevention strategies and resource allocation plans. Advances in analytical methods and technologies offer hope, enabling researchers and policymakers to decipher intricate patterns and identify solutions that can save lives and alleviate the burden of disease. With innovation and collaboration, it is possible to turn the tide against these persistent threats to public health.
A critical gap in existing research lies in the limited application of integrated methods that combine machine learning, spatial analysis, and explainable artificial intelligence (XAI) in the context of communicable diseases [8]. While machine learning algorithms have gained traction for their accuracy and predictive capabilities, their use in conjunction with spatial analysis techniques like Gi* statistics remains underexplored. Moreover, the interpretability of machine learning models is often a challenge, emphasizing the need for XAI to provide actionable insights for public health stakeholders [9].
This study aims to address these gaps by leveraging artificial intelligence and spatial analysis to explore the interplay of health, climate, and socioeconomic factors in driving communicable diseases in Asia from 2000 to 2022. Specifically, the study uses state-of-the-art machine learning algorithms, including random forest and XGBoost, to identify key predictors and forecast disease prevalence. Spatial statistical techniques, including Gi* statistics, were employed to evaluate spatial patterns and identify disease hotspots [10]. The overarching objective of this research is to provide a holistic understanding of the determinants of communicable diseases by integrating machine learning, spatial analysis, and explainable artificial intelligence (XAI). Incorporating XAI techniques, such as SHAP (Shapley Additive Explanation), allowed for quantifying individual predictor contributions and enhancing the interpretability of the models [8, 11, 12]. By doing so, the study aims to enhance predictive accuracy, identify geographic disparities, and offer actionable, interpretable insights that can guide evidence-based public health interventions. The findings are expected to bridge existing knowledge gaps and contribute to the development of targeted strategies to mitigate the burden of communicable diseases across Asia.
Materials and methods
Data collection and study site
This study analyzes communicable diseases in Asia from 2000 to 2022, focusing on 41 countries within the region as the study site. Data were collected from reputable sources, including the World Health Organization (WHO) [13], the Climate Research Unit (CRU TS) [14], the World Development Indicators (WDI) [15], and the United Nations International Children’s Emergency Fund (UNICEF) [16]. Variables include disease incidence rates (e.g., tuberculosis, dengue, malaria), climate factors (e.g., temperature, precipitation), socioeconomic indicators (e.g., GDP growth, education levels), healthcare infrastructure metrics (e.g., hospital bed density), and environmental land-use statistics. The study site encompasses diverse geographical, climatic, and socioeconomic contexts, enabling a comprehensive analysis of spatial patterns and determinants of disease prevalence using machine learning and spatial analysis techniques (for details, see the Supplementary Information section: Spatio-temporal data of communicable diseases in Asia and socioeconomic, climatic, and healthcare drivers of communicable diseases) (Fig. 1 and Table S1).
Fig. 1.
Machine learning and XAI framework. CV: Cross-Validation; RF: Random Forest; XGBoost: Extreme Gradient Boosting; LightGBM: Light Gradient Boosting Machine; DT: Decision Tree; SVM: Support Vector Machine; LR: Logistic Regression; BR: Bayesian Regression; XAI: Explainable Artificial Intelligence; SHAP: Shapley Additive Explanation
Data preprocessing and analysis
The dataset was preprocessed to ensure accuracy and reliability for subsequent analysis. All variables were harmonized through country-level aggregation and standardized across years (2000–2022) to account for temporal heterogeneity. Missing values were imputed using the missRanger library [17], which uses random forest-based methods to estimate and fill in missing entries efficiently. This approach preserves the integrity of the dataset by leveraging correlations among variables. Outlier detection was conducted by computing Mahalanobis distances, which measure the multivariate distance of data points from the distribution center. Any data points that deviated significantly were either corrected or excluded to minimize their influence on the analysis. To address multicollinearity among predictors, variance inflation factor (VIF) analysis was performed to identify and mitigate highly correlated variables. Standardization was applied to all factors by setting their mean to zero and variance to one, ensuring uniform scaling across variables [18]. This standardization enhances the comparability of features and optimizes the performance of machine learning algorithms. Potential geographic biases between the training and testing sets were explored by assessing unequal regional representation, to check that the generalizability of the models was not compromised (Fig. S1). A full inventory of variables, domains, sources, and codes is provided in Table S1. Spatio-temporal patterns of communicable diseases and associated drivers are described in detail in the Supplementary Information section.
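The standardization and Mahalanobis steps described above can be illustrated with a minimal sketch. It is not the study's R implementation: it uses only the Python standard library, is restricted to two variables so the covariance matrix can be inverted by hand, and the function names standardize and mahalanobis_2d are hypothetical. The cutoff used to flag a point as an outlier is not stated in the text, so none is applied here.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of numbers to mean 0 and (population) variance 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample centroid.

    Illustrative two-variable case only; the study's data has many more
    variables, which would need a general matrix inverse.
    """
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Closed-form inverse of the symmetric 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = dx * (ixx * dx + ixy * dy) + dy * (ixy * dx + iyy * dy)
        distances.append(math.sqrt(d2))
    return distances
```

A point far from the joint distribution of the two variables receives the largest distance even when neither coordinate is extreme on its own, which is why the multivariate distance is preferred to per-variable z-scores for this purpose.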
Disease classification by transmission pathway
To facilitate targeted analysis, communicable diseases were systematically classified based on their primary modes of transmission. This categorization was informed by established epidemiological frameworks from the World Health Organization (WHO) [13] and previous literature on global and regional disease transmission pathways. Four broad categories were defined (Fig. 1):
Vector-borne diseases: Diseases transmitted through arthropod vectors such as mosquitoes or sandflies, including dengue, malaria, leishmaniasis, and Japanese encephalitis.
Airborne or respiratory diseases: Diseases transmitted via respiratory droplets or aerosols, encompassing tuberculosis, measles, diphtheria, and pertussis.
Sexually or bloodborne diseases: Diseases transmitted through sexual contact, blood transfusion, or vertical transmission, including HIV, congenital syphilis, and rubella.
Contact, fecal–oral, or direct transmission diseases: Diseases spread through physical contact or contaminated sources, comprising poliomyelitis, leprosy, and yaws.
Each disease was assigned to a single dominant transmission pathway based on its primary mode of spread, though secondary pathways were acknowledged where relevant. This classification provided a structured framework for subsequent spatial and temporal analyses, enabling comparison of transmission-specific patterns, environmental determinants, and public health vulnerabilities across the Asian region.
The classification scheme ensured analytical consistency across the dataset and facilitated interpretation of spatio-temporal clustering results by transmission mechanism rather than by disease alone.
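As a small illustration, the single-pathway assignment described above can be held in a lookup table and inverted to group diseases by transmission mechanism. The dictionary contents follow the four categories listed in the text; the names PATHWAY and group_by_pathway, and the short category labels, are hypothetical.

```python
from collections import defaultdict

# Dominant transmission pathway per disease, following the four categories in the text.
PATHWAY = {
    "dengue": "vector-borne",
    "malaria": "vector-borne",
    "leishmaniasis": "vector-borne",
    "Japanese encephalitis": "vector-borne",
    "tuberculosis": "airborne/respiratory",
    "measles": "airborne/respiratory",
    "diphtheria": "airborne/respiratory",
    "pertussis": "airborne/respiratory",
    "HIV": "sexually/bloodborne",
    "congenital syphilis": "sexually/bloodborne",
    "rubella": "sexually/bloodborne",
    "poliomyelitis": "contact/fecal-oral/direct",
    "leprosy": "contact/fecal-oral/direct",
    "yaws": "contact/fecal-oral/direct",
}


def group_by_pathway(mapping):
    """Invert a disease -> pathway mapping into pathway -> list of diseases."""
    groups = defaultdict(list)
    for disease, pathway in mapping.items():
        groups[pathway].append(disease)
    return dict(groups)
```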
Model selection and evaluation
The study utilized a range of machine learning algorithms, including Random Forest, XGBoost, LightGBM, Decision Tree, SVM, Logistic Regression, and Bayesian Regression, all implemented using RStudio (version 4.4.2) [19]. To ensure temporal integrity and robust evaluation, the dataset was partitioned into a training set (2000–2018) and a testing set (2019–2022). To determine the most effective models, key performance metrics were calculated, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and R-squared (R²) values. These metrics ensured comprehensive evaluation by assessing both prediction accuracy and the models’ ability to generalize to unseen data. Based on the evaluation, the best-performing models were chosen for further investigation. The Random Forest method was chosen because of its capacity to handle complicated interactions between variables and its interpretability [20]. Similarly, XGBoost was chosen for its high computational efficiency and ability to attain high accuracy even on large datasets [21]. Grid search and cross-validation were used to tune these models’ hyperparameters, such as the number of trees, learning rate, and maximum depth, to improve predictive accuracy [22] (for details, see the Supplementary Information sections: Statistical analysis; Model implementation and interpretation). Spatial statistical analysis using Gi* statistics identified significant disease hotspots and cold spots over the study period (2000–2022). This approach provided critical geographic insights, highlighting regional disparities and informing targeted public health interventions. Furthermore, the study incorporated explainable artificial intelligence (XAI) techniques using the DALEX library to enhance interpretability and transparency of the machine learning models.
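The year-based partition and the evaluation metrics named above can be sketched as follows. This is an illustrative standard-library Python version, not the study's R code. The record layout (a dict with a "year" key) and the function names are assumptions, and R² is taken here to be the coefficient of determination 1 − SS_res/SS_tot, which the text does not spell out.

```python
import math


def temporal_split(records, last_train_year=2018):
    """Split records by year: training up to last_train_year, testing afterwards."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R-squared for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when any true value is zero; this sketch assumes none are.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MAPE": mape,
        "R2": 1 - ss_res / ss_tot,
    }
```

Splitting by year rather than at random keeps every test observation later than every training observation, so the reported test metrics reflect forecasting onto unseen years.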
The integration of SHAP (Shapley Additive Explanation) techniques further enhanced the analysis by quantifying individual predictor contributions to the models’ predictions. This SHAP contribution analysis offered deeper insights into the importance of specific features, providing an interpretable framework to understand how variables influenced disease outcomes [21]. Through these methods, the study ensured the selection of models that excel in both accuracy and interpretability while integrating spatial analysis, XAI, and SHAP contributions to address the geographic and explanatory dimensions of communicable diseases (for details, see the Supplementary Information sections: Statistical analysis; Model implementation and interpretation) (Fig. 1 and Table S2).
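To make the idea of per-predictor contributions concrete, the sketch below computes exact Shapley values for a toy model by enumerating all feature subsets. It is an illustration of the underlying definition only: the study used library implementations in R, which approximate these values for real models. The function name shapley_values, the toy model, and the convention of replacing absent features with a fixed baseline are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    predict  : function taking a list of feature values and returning a number
    x        : observed feature values for the instance being explained
    baseline : reference values used for features treated as absent (assumption)
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed value; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        contributions.append(total)
    return contributions
```

For a toy model such as 2·z0 + z1·z2, the additive term is attributed entirely to the first feature and the interaction term is shared equally between the other two. The contributions always sum to the difference between the prediction for the instance and the prediction at the baseline. Enumeration is exponential in the number of features, so it is practical only for very small examples.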
Cross-validation
K-fold cross-validation (k = 10) was performed to achieve a reliable and unbiased assessment. The dataset was divided into ten equal-sized folds, with the model trained on nine of them and validated on the remaining fold in each iteration. This process was repeated ten times, so that each fold served as the validation set once. To assess model accuracy and reliability, average performance measures (MSE, RMSE, MAE, and R²) were computed over all folds. This procedure lowered the risk of overfitting while also ensuring consistent model performance across diverse subsets of data. Cross-validation was implemented in RStudio (version 4.4.2) using the caret and xgboost libraries. This provided a robust framework for evaluating the predictive models and optimizing them for communicable disease modeling (for details, see the Supplementary Information section: Statistical analysis).
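The fold rotation described above can be sketched in a few lines of standard-library Python. This is not the caret implementation used in the study. The function name k_fold_indices is hypothetical, and shuffling the rows before splitting is an assumption, since the text does not say whether folds were formed at random or how they relate to the year-based train/test partition.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds.

    Rows are shuffled once with a fixed seed (an assumption of this sketch),
    then dealt into k folds whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out
```

In use, a model would be fitted on each train set and scored on the matching held-out set, and the k scores would then be averaged.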
Results
Trends and spatial analysis of communicable diseases in Asia (2000–2022)
Vector-borne diseases
Dengue and malaria exhibited prominent hotspots concentrated in South and Southeast Asia, with strong spatial clustering identified in Bangladesh (Gi* = 3.58 for dengue; 4.13 for malaria), Nepal (4.66; 3.85), Pakistan (3.44; 3.22), and Malaysia (5.14; 0.02). Leishmaniasis hotspots were primarily found in Pakistan (Gi* = 4.17) and Bangladesh (1.31), whereas Japanese encephalitis showed localized clustering in Mongolia (4.69) and China (1.81). These geographic patterns are supported by the case incidence trends and descriptive statistics over time (Fig. 2, S2-S4; Tables S3-S5).
Fig. 2.
(A) Spatial distribution of total communicable disease cases in Asia (2000–2022), showing geographic variation and prevalence patterns; (B) Country-wise boxplot depicting the spread of disease cases across nations, highlighting disparities; (C) Yearly trends of total communicable disease (COD) cases and affected countries in Asia (2000–2022), where the x-axis represents the years, the left y-axis scales disease cases, and the right y-axis (secondary axis) reflects the number of countries reporting cases, visualized through bars (cases) and a red line (countries); (D) Spatial analysis using Getis-Ord Gi* statistics identifying significant hotspots and cold spots of disease cases in Asia during the analyzed period
Airborne/respiratory diseases
Significant clustering of tuberculosis, measles, diphtheria, and pertussis was observed across South Asia, notably in Bangladesh (tuberculosis: 3.71; measles: 2.00; diphtheria: 3.91; pertussis: 3.64), Nepal (4.67; 3.28; 3.89; 4.66), Pakistan (3.77; 3.32; 3.14; 3.75), and India (3.07; 2.92; 1.18; 2.01). These findings align with consistently high case counts reported annually, indicating enduring transmission in these regions (Fig. 2, S2-S4; Tables S3-S5).
Sexually/bloodborne diseases
HIV hotspots were comparatively limited, with a notable cluster identified in Malaysia (Gi* = 5.14). Localized outbreaks of congenital syphilis and rubella were evident in Mongolia (5.65; 4.69), Pakistan (3.14; 1.91), and Malaysia (0.02; 2.20), mirroring spatial and temporal case patterns captured in surveillance data (Fig. 2, S2-S4; Tables S3-S5).
Contact/fecal-oral/direct transmission diseases
Hotspots for poliomyelitis and leprosy were primarily situated in South Asia, with significant clustering in Bangladesh (leprosy: 3.96; poliomyelitis: −0.25), Nepal (3.90; −0.25), and Pakistan (3.10; −0.32). Yaws presented minor clusters in select locations. These spatial distributions correspond with endemic trends and yearly incidence reports (Fig. 2, S2-S4; Tables S3-S5).
Overall communicable disease burden
The aggregated spatial analysis revealed persistent high-burden clusters in South Asia, particularly Bangladesh (Gi* = 4.09), Nepal (4.05), Pakistan (3.36), and India (2.54), marking these countries as consistent hotspots over the study period. This concentration of disease burden reflects patterns observed in the compiled case data and incidence statistics (Fig. 2, S2-S4; Tables S3-S5).
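The Gi* values reported in this section are z-scores: large positive values indicate that a country and its neighbours jointly have high case counts, and large negative values indicate a cold spot. A minimal sketch of the standard Getis-Ord Gi* formula is given below in standard-library Python. The spatial weights matrix used in the study is not described in the text, so the binary neighbour weights in the usage note are purely an assumption, as is the function name.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for every location.

    values  : list of n observations (e.g. case counts per country)
    weights : n x n list of lists; weights[i][j] is the spatial weight between
              i and j, with weights[i][i] > 0 because Gi* includes the
              location itself in its own neighbourhood
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        # Weighted local sum minus its expectation under spatial randomness
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - x_bar * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores
```

For example, with five hypothetical regions arranged in a line, values [10, 9, 1, 1, 1], and weights of 1 for each region and its immediate neighbours, the first region receives the highest score and the fourth the lowest, reflecting the high-value cluster at one end of the line.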
Relationships between communicable diseases and influencing factors
The descriptive statistics highlight variability in climate and health risk factors across Asia from 2000 to 2022. Maximum temperature was 25.42 ± 7.27 °C and precipitation was 80.81 ± 69.66 mm, reflecting the region’s diverse climatic conditions. Relative humidity was 17.24 ± 8.06%, and the number of wet days was 7.46 ± 4.37 days. Regarding health risk factors, tobacco use prevalence was 21.94 ± 15.3%, access to basic drinking water services was 63.55 ± 39.78%, and access to basic sanitation services was 57.39 ± 38.22% (Table S6).
Tuberculosis is positively correlated with air pollution (0.28), agricultural land (0.18), and relative humidity (0.19), while negatively associated with hospital bed density (−0.14). Malaria shows positive relationships with precipitation (0.22) and maximum temperature (0.17), and a weak negative correlation with population growth (−0.03). Dengue is associated with relative humidity (0.38), minimum temperature (0.30), and number of wet days (0.32). Japanese encephalitis is positively correlated with agricultural land (0.11). Leishmaniasis displays a positive link with forest area (0.24) and a negative correlation with precipitation (−0.14). Rubella has weak associations, such as with relative humidity (0.07) and access to electricity (0.06). Pertussis shows a positive relationship with air pollution (0.21) and a negative correlation with population growth (−0.07). Congenital syphilis has limited correlations, showing minimal external influence. Total communicable diseases (Total COD) are positively associated with air pollution (0.23) and negatively correlated with government health expenditure (−0.23) (Fig. S5 and Tables S7–S11).
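The coefficients above lie between −1 and 1, with values near zero indicating little linear association. The text does not say which correlation coefficient was used; the sketch below assumes Pearson's r and uses only the Python standard library, with a hypothetical function name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Under this assumption, a coefficient such as 0.28 corresponds to roughly 8% of shared linear variance (0.28² ≈ 0.08), so the associations reported here are modest in size even where they are the largest in the table.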
Selecting optimal predictive models
XGBoost consistently delivered the highest accuracy for complex diseases like tuberculosis (RMSE = 0.94, MAE = 0.58, R² = 0.91) and rubella (RMSE = 2.18, MAE = 1.03). Random Forest performed better for measles (RMSE = 3.30, MAE = 2.62) and diphtheria (RMSE = 1.82, MAE = 1.27), showing its strength in more linear disease patterns (Fig. 3, S6, S7 and Tables S12–S15). These results highlight the importance of selecting models based on disease complexity and optimizing hyperparameters for precision (for details, see the Supplementary Information section: Selecting optimal predictive models and their evaluation).
Fig. 3.
Performance metric comparison across diseases (test set: 2019–2022); MSE: Mean Squared Error; RMSE: Root Mean Squared Error; MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error
Feature contributions in predicting communicable disease cases
The analysis of predicting cases for communicable diseases highlights key insights into the impact of various factors, measured through Random Forest (RF) and XGBoost models.
Vector-borne diseases
Vector-borne disease predictions were shaped primarily by climatic and environmental variables.
Predictions for dengue reflected climatic influences contributing positively (RF: 5.49%, XGBoost: 7.09%), with demographic data contributing negative values (RF: −15.66%, XGBoost: −22.21%). Malaria cases were shaped by positive contributions from urban infrastructure and climatic factors (RF: 19.79%, XGBoost: 13.64%), while demographic metrics yielded significant negative impacts (RF: −12.96%, XGBoost: −35.05%). Leishmaniasis predictions were driven by environmental factors such as temperature range, contributing (RF: 24.14%, XGBoost: 25.66%) positively, while negative impacts (RF: −6.13%, XGBoost: −9.73%) were relatively minor. Japanese encephalitis cases followed similar trends, with positive contributions (RF: 11.37%, XGBoost: 21.91%) and notable negative impacts (RF: −12.50%, XGBoost: −35.36%) (Fig. S8 and Table S16).
Airborne/respiratory diseases
The prediction of airborne diseases demonstrated the significant role of healthcare and environmental predictors. For tuberculosis, positive contributions (RF: 9.42%, XGBoost: 7.89%) were shaped by environmental and healthcare variables, while negative values (RF: −21.06%, XGBoost: −68.58%) reflected demographic challenges. Measles cases saw strong positive influences (RF: 14.37%, XGBoost: 9.56%), with negative contributions (RF: −13.65%, XGBoost: −27.27%) linked to demographic trends. Diphtheria cases followed similar patterns, with positive impacts (RF: 14.58%, XGBoost: 8.84%) driven by healthcare access and immunization indicators, while negative contributions (RF: −12.24%, XGBoost: −44.77%) were attributed to demographic constraints. Pertussis cases were driven by healthcare-related positive impacts (RF: 12.89%, XGBoost: 12.91%), while negative demographic effects were significant (RF: −6.98%, XGBoost: −24.09%) (Fig. S8 and Table S16).
Sexually transmitted and bloodborne diseases
Predictions for sexually and bloodborne diseases were strongly influenced by healthcare access indicators. HIV cases relied heavily on healthcare density and resources, contributing positively (RF: 19.14%, XGBoost: 29.73%) to predictions, while negative values (RF: −4.71%, XGBoost: −6.74%) reflected demographic pressures. For congenital syphilis, predictions emphasized healthcare infrastructure with positive contributions (RF: 25.63%, XGBoost: 13.15%), and rural pressures negatively influenced outcomes (RF: −14.09%, XGBoost: −65.03%). Similarly, rubella predictions showcased healthcare infrastructure’s positive contributions (RF: 18.18%, XGBoost: 26.09%), while mild negative impacts (RF: −2.11%, XGBoost: −1.70%) arose from demographic factors (Fig. S8 and Table S16).
Contact, fecal-oral, or direct transmission diseases
Diseases transmitted via contact or fecal-oral routes revealed a mix of influences. Poliomyelitis predictions were dominated by high positive contributions (RF: 29.44%, XGBoost: 27.75%), with negative impacts (RF: −8.69%, XGBoost: −17.98%) revealing gaps tied to mortality-related metrics (Fig. S8 and Table S16). Yaws cases were positively influenced by insecticide-treated net ownership (RF: 8.09%, XGBoost: 19.89%), while demographic metrics led to strong negative impacts (RF: −20.13%, XGBoost: −71.02%). Similarly, leprosy cases highlighted healthcare resources (RF: 22.06%, XGBoost: 7.28%) as key contributors, with strong negative influences from population dynamics (RF: −25.34%, XGBoost: −65.73%).
Total communicable disease
The prediction of total communicable diseases (COD) emphasized the significant role of healthcare infrastructure and environmental factors. Notably, urban insecticide-treated net ownership (RF: 15.63%) and government health expenditure (RF: 14.81%, XGBoost: 7.76%) showed strong positive contributions, underscoring the impact of preventive interventions and health funding. Population size exerted a major negative influence, particularly in XGBoost predictions (RF: −13.52%, XGBoost: −47.08%), reflecting demographic pressures that hinder disease control efforts. Additionally, hospital bed density positively influenced XGBoost results (10.92%), highlighting healthcare capacity as a key factor. Overall, these findings illustrate the complex interplay of healthcare resources, environmental protection, and population dynamics in shaping total COD predictions (Fig. 4 and Table S16).
Fig. 4.
Feature contribution for predicting total CODs (Random Forest and XGBoost); EL8: Insecticide-treated net ownership by household (Urban); DE4: Domestic general government health expenditure (%); EL6: Insecticide-treated net ownership by household (National); CL4: Potential evapotranspiration (mm); CL2: Temperature range (°C); EL7: Insecticide-treated net ownership by household (Rural); DE1: Population, total; EL3: Agricultural land (% of land area); HR1: Prevalence of tobacco use (%); EL4: Forest area (% of land area); HI1: Hospital bed density (per 10,000 population); CL1: Cloud cover (%); DE3: Population growth (annual %); HR7: Population using at least basic sanitation services (%); CL9: Relative humidity (%); CL5: Precipitation (mm)
Discussion
The study of communicable diseases in Asia from 2000 to 2022 reveals a complex landscape of progress and persistent challenges. While diseases, such as tuberculosis and malaria, have declined over time, others, such as dengue fever, continue to resurface in specific regions. These patterns illustrate the complexities of illness dynamics, which are influenced by the interaction of climatic variables, healthcare accessibility, and socioeconomic status. The results align with previous studies on climate-sensitive diseases, while also highlighting regional vulnerabilities that require tailored responses [23, 24].
Tuberculosis, despite its declining burden, remains a significant public health concern. Persistent hotspots in India and Nepal suggest that environmental exposures (e.g., air pollution, humidity), healthcare gaps, and pathogen-specific factors such as strain virulence and drug resistance sustain transmission [25]. Addressing these factors through improved living conditions, air quality regulation, and consistent access to treatment is essential [26].
Malaria’s decline reflects successful interventions, yet continued risk in Bangladesh and Pakistan underscores the influence of precipitation and temperature [27–30]. Periodic surges in malaria are likely modulated by vector ecology, including mosquito breeding cycles, larval development, and biting patterns, which interact with seasonal climatic fluctuations to drive transmission intensity. Strengthening vector control and climate-adaptive policies remains key to sustaining progress.
Dengue presents a more dynamic and unpredictable challenge. The periodic surges observed in regions like Malaysia and Myanmar emphasize the role of environmental conditions such as high humidity and frequent wet days in driving outbreaks [31]. These conditions likely influence Aedes mosquito breeding cycles, larval survival, and biting behavior, thereby shaping the timing and magnitude of dengue epidemics. This demonstrates the urgent need for integrated urban planning and infrastructure improvements to reduce mosquito breeding grounds. Furthermore, public health campaigns must focus on equipping communities with knowledge and tools for effective vector control [32]. Adaptive surveillance systems that incorporate real-time climatic data can also help predict and prevent outbreaks, mitigating their impact on health systems [33].
Poliomyelitis’s near eradication is a beacon of hope, showcasing the power of coordinated global health initiatives and vaccination campaigns [34]. However, diseases like rubella and congenital syphilis remind us of lingering gaps in healthcare access that disproportionately affect certain populations [35]. Transmission dynamics, including maternal-infant contact for congenital syphilis and host immunity patterns for rubella, are critical determinants of local outbreaks. Expanding immunization programs, improving prenatal care, and fostering health equity will be vital to addressing these gaps and eliminating these diseases entirely [36].
Machine learning models, including XGBoost and Random Forest, show potential to transform how we approach communicable diseases. XGBoost stood out for its ability to accurately capture the complex, nonlinear interactions among climatic, healthcare, and demographic factors [37]. For diseases with simpler dynamics, such as measles and diphtheria, Random Forest proved reliable, demonstrating the value of tailoring model selection to specific disease characteristics [38]. These models offer actionable insights for public health planning.
Perhaps most transformative is the role of SHAP and XAI in making these predictive models transparent and interpretable [39]. SHAP analyses in this study revealed key determinants of disease dynamics: climatic factors such as humidity and temperature for dengue and leishmaniasis, vector-related factors influencing mosquito population dynamics, or healthcare infrastructure and resources for HIV. By quantifying the contribution of these factors, SHAP allows decision-makers to identify and address the root causes of disease vulnerability with precision. For instance, understanding the influence of climate variables on malaria and dengue can guide early warning systems and targeted vector control measures, reducing the risk of outbreaks [40]. Similarly, insights into healthcare-related factors can help prioritize investments in infrastructure, workforce, and outreach programs to bridge gaps in disease control.
XAI further enhances the utility of these models by demystifying their predictions. In the past, traditional epidemiological models often struggled with explaining their results, limiting their use in policy and practice. XAI, however, enables policymakers to understand and trust predictions, ensuring they align with ethical and equitable health objectives [41]. For example, XAI-based models can pinpoint areas where healthcare deficits exacerbate disease vulnerability, guiding resource allocation to where it is needed most [42].
The findings also emphasize the critical need to address the broader context of climate change. Shifting precipitation patterns, rising temperatures, and changing habitats for disease vectors are reshaping the epidemiological landscape, making it more volatile and challenging to predict. Integrating vector activity proxies, seasonal patterns, and pathogen-specific characteristics with real-time environmental monitoring can further improve predictive accuracy and outbreak preparedness. Advanced systems driven by SHAP and XAI can provide early warnings for outbreaks, enabling proactive resource allocation and swift responses [43]. This proactive approach is crucial for diseases like dengue and malaria, whose dynamics are deeply intertwined with environmental factors.
Despite the improvements documented in this study, persisting hotspots in Bangladesh, Myanmar, and Pakistan reveal disparities in healthcare access and the need for targeted interventions. These findings call for joint efforts among policymakers, healthcare professionals, and communities to eliminate structural barriers and expand the reach of public health initiatives. Educational campaigns and local partnerships may give communities an active role in disease prevention, ensuring that programs are culturally relevant and durable.
For example, in Bangladesh, SHAP analysis revealed humidity and precipitation as key drivers of dengue outbreaks. Integrating these insights into early warning systems could enable health authorities to deploy vector control measures ahead of peak transmission seasons. In Nepal, low hospital bed density was linked to tuberculosis hotspots, suggesting that targeted infrastructure investment could reduce disease burden.
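The hotspots discussed above are identified with the Getis-Ord Gi* statistic, which compares the weighted sum of values around a location with what the global mean would predict and reports the result as a z-score. The sketch below implements the standard Gi* formula in plain Python. The five case counts and the binary neighbor weights (each unit, itself, and its immediate neighbors along a line) are hypothetical and do not reproduce the spatial weights or data used in this study.

```python
from math import sqrt


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for each location.

    values  : list of observations x_j
    weights : n x n matrix w_ij, with w_ii > 0 (the location counts itself)
    """
    n = len(values)
    mean = sum(values) / n
    # Global (population) standard deviation of the observations
    s = sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical case counts for five units arranged in a line
cases = [10, 12, 11, 50, 55]
# Binary weights: 1 for the unit itself and its adjacent units, else 0
neighbors = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
gi_star = getis_ord_gi_star(cases, neighbors)
for unit, z in enumerate(gi_star):
    print(f"unit {unit}: Gi* = {z:.2f}")
```

Large positive scores mark clusters of high values (hotspots) and large negative scores mark clusters of low values (coldspots). In this toy example only the last unit exceeds the conventional 1.96 threshold for significance at the 5% level.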
In conclusion, the analysis of communicable diseases in Asia underscores the intertwined nature of climate, healthcare, and socioeconomic factors in shaping disease dynamics. While advanced tools like SHAP and XAI bring unprecedented clarity and precision to disease prediction and prevention, their potential can only be fully realized when coupled with robust policy action and community engagement. By addressing systemic inequities, leveraging cutting-edge technology, and fostering collaboration across sectors, we can move closer to a future where communicable diseases no longer exert such a profound toll on populations. The lessons from this journey serve as a blueprint for tackling global health challenges, inspiring hope for a healthier, more equitable world.
Limitations
This study provides valuable insights into communicable disease dynamics across Asia, but several limitations should be considered. The analysis relies on secondary data, which may introduce biases such as underreporting, missing records, and discrepancies in data collection methodologies across countries and time periods. The modeling framework applied a uniform approach across diseases without fully distinguishing transmission mechanisms, pathogen biology, or vector ecology, which limits specificity for vector-borne diseases such as dengue and malaria. Seasonal and monthly variations, critical for time-sensitive diseases, were not fully modeled, and spatial analysis at the country level may obscure sub-national and urban–rural differences. Additionally, causal relationships and confounding factors such as education, income, and gender disparities were not fully examined. Future studies integrating pathogen-level data, vector ecology, and finer-scale spatiotemporal information would enhance predictive accuracy and public health relevance.
Conclusion
This study underscores both achievements and challenges in addressing communicable diseases across Asia from 2000 to 2022. While declines in diseases such as tuberculosis, malaria, and poliomyelitis reflect the success of sustained public health interventions, recurring hotspots for dengue and other climate-sensitive diseases underscore the ongoing influence of environmental conditions, healthcare access, and socioeconomic disparities.
By integrating machine learning, spatial analysis, and Explainable AI (XAI), the study offers interpretable insights into disease drivers and geographic risk patterns. These tools enhance understanding of complex interactions and support data-informed decision-making for targeted interventions. While not a replacement for traditional epidemiological approaches, predictive models can complement existing strategies by identifying emerging risks and guiding resource allocation.
Looking forward, future research should build on this framework by developing disease-specific models at finer spatial and temporal scales. Incorporating real-time surveillance data and advanced techniques such as spatiotemporal deep learning will further improve responsiveness and precision. Strengthening the integration of predictive analytics into public health systems can support more localized, equitable, and proactive disease control across diverse settings.
Supplementary Information
Supplementary Material 1. Table S1. Key variables and data sources for analysis (2000-22).
Table S2. Model Parameters Summary.
Table S3. Descriptive Statistics of Communicable Disease Cases in Asia (2020–2022).
Table S4. Yearly Trends in Communicable Disease Cases and Affected Countries in Asia (2000–2022).
Table S5. Getis-Ord Gi Statistics for Communicable Disease Hotspots and Coldspots in Asia.
Table S6. Descriptive statistics for the factors in Asia (2000–2022).
Table S7. Correlation analysis between different communicable diseases and climate factors.
Table S8. Correlation analysis between different communicable diseases and health and risk factors.
Table S9. Correlation analysis between different communicable diseases and demographic and economic factors.
Table S10. Correlation analysis between different communicable diseases and healthcare system and infrastructure.
Table S11. Correlation analysis between different communicable diseases and environmental and land use factors.
Table S12. Random forest tuning results for different communicable diseases.
Table S13. XGBoost tuning results for different communicable diseases.
Table S14. Model performance comparison across communicable diseases (Train Data: 2000–2018).
Table S15. Model performance comparison across communicable diseases (Test Data: 2019–2022, Best Models).
Table S16. Feature Contribution for predicting different CODs (Random Forest and XGBoost).
Acknowledgements
This study was made possible through the utilization of data from esteemed global repositories, including the World Health Organization (WHO), Climate Research Unit (CRU TS), World Development Indicators (WDI), and United Nations International Children’s Emergency Fund (UNICEF). We express our deep gratitude to these organizations for their dedication to providing reliable, accessible datasets that underpin advancements in global health research. Their contributions have enabled an in-depth examination of disease dynamics and determinants across Asia. Furthermore, we extend appreciation to all stakeholders who continue to work tirelessly in the fight against communicable diseases, including policymakers, public health professionals, and community leaders whose efforts drive impactful change. This work aims to complement their ongoing initiatives and inspire continued collaboration for a healthier future.
Author contributions
MSR: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Data curation, Formal analysis, Visualization, Methodology, Writing - original draft. ABS: Conceptualization, Data curation, Methodology, Visualization, Formal analysis, Writing - review and editing.
Funding
This work was not externally funded.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Supporting information
1. Spatio-temporal data of communicable diseases in Asia. 2. Socioeconomic, climatic, and healthcare drivers of communicable diseases. 3. Statistical analysis. 4. Selecting optimal predictive models and their evaluation. 5. Model implementation and interpretation. 5.1. Random forest modeling pipeline. 5.2. XGBoost modeling pipeline. 5.3. Explainable AI (XAI) and SHAP analysis.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Communicable diseases in the South-East Asia Region of the World Health Organization: towards a more effective response. [cited 1 May 2025]. Available: https://iris.who.int/handle/10665/270631
- 2. Dye C. After 2015: infectious diseases in a new era of health and development. Philosophical Transactions of the Royal Society B: Biological Sciences. 2014;369. 10.1098/RSTB.2013.0426.
- 3. Vector-borne diseases. [cited 31 Aug 2025]. Available: https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases
- 4. Liang L, Gong P. Climate change and human infectious diseases: a synthesis of research findings from global and spatio-temporal perspectives. Environ Int. 2017;103:99–108. 10.1016/J.ENVINT.2017.03.011.
- 5. Wu X, Lu Y, Zhou S, Chen L, Xu B. Impact of climate change on human infectious diseases: empirical evidence and human adaptation. Environ Int. 2016;86:14–23. 10.1016/J.ENVINT.2015.09.007.
- 6. Rahman MS, Anika AA, Raka RA, Muratovic AK. Impact of climate change on emerging infectious diseases and human physical and mental health in Bangladesh. Health Care Sci. 2025;4:62–5. 10.1002/HCS2.129.
- 7. Philippe P, Mansi O. Nonlinearity in the epidemiology of complex health and disease processes. Theor Med Bioeth. 1998;19:591–607. 10.1023/A:1009979306346.
- 8. Rahman MS, Shiddik MAB. Explainable artificial intelligence for predicting dengue outbreaks in Bangladesh using eco-climatic triggers. Glob Epidemiol. 2025;10:100210. 10.1016/J.GLOEPI.2025.100210.
- 9. Imrie F, Davis R, van der Schaar M. Multiple stakeholders drive diverse interpretability requirements for machine learning in healthcare. Nat Mach Intell. 2023;2023(5):8. 10.1038/s42256-023-00698-2.
- 10. Ijumulana J, Ligate F, Bhattacharya P, Mtalo F, Zhang C. Spatial analysis and GIS mapping of regional hotspots and potential health risk of fluoride concentrations in groundwater of Northern Tanzania. Sci Total Environ. 2020;735:139584. 10.1016/J.SCITOTENV.2020.139584.
- 11. Rahman MS, Shiddik AB. Utilizing artificial intelligence to predict and analyze socioeconomic, environmental, and healthcare factors driving tuberculosis globally. Sci Rep. 2025;15:1–14. 10.1038/S41598-025-96973-W.
- 12. Rahman MdS, Shiddik MdAB. Unraveling global malaria incidence and mortality using machine learning and artificial intelligence–driven spatial analysis. Sci Rep. 2025;15. 10.1038/S41598-025-12872-0.
- 13. Indicators. [cited 31 Aug 2025]. Available: https://www.who.int/data/gho/data/indicators
- 14. High-resolution gridded datasets. [cited 31 Aug 2025]. Available: https://crudata.uea.ac.uk/cru/data/hrg/
- 15. World Development Indicators | DataBank. [cited 31 Aug 2025]. Available: https://databank.worldbank.org/reports.aspx?source=world-development-indicators
- 16. UNICEF DATA - Child Statistics. [cited 31 Aug 2025]. Available: https://data.unicef.org/
- 17. Stekhoven DJ, Bühlmann P. Missforest-non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112–8. 10.1093/BIOINFORMATICS/BTR597.
- 18. Milligan GW, Cooper MC. A study of standardization of variables in cluster analysis. J Classif. 1988;5:181–204. 10.1007/BF01897163.
- 19. https://cran.r-project.org/bin/windows/base/. [cited 10 Jul 2025]. Available: https://cran.r-project.org/bin/windows/base/
- 20. Heesterbeek H, Anderson RM, Andreasen V, Bansal S, DeAngelis D, Dye C, et al. Modeling infectious disease dynamics in the complex landscape of global health. Science (1979). 2015;347. 10.1126/SCIENCE.AAA4339.
- 21. Rahman MdS, Shiddik MdAB. Unraveling global malaria incidence and mortality using machine learning and artificial intelligence–driven spatial analysis. Sci Rep. 2025;15:1–11. 10.1038/S41598-025-12872-0.
- 22. Belete DM, Huchaiah MD. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. Int J Comput Appl. 2022;44:875–86. 10.1080/1206212X.2021.1974663.
- 23. Heesterbeek H, Anderson RM, Andreasen V, Bansal S, DeAngelis D, Dye C, et al. Modeling infectious disease dynamics in the complex landscape of global health. Science (1979). 2015;347. 10.1126/SCIENCE.AAA4339.
- 24. Eisenberg JNS, Desai MA, Levy K, Bates SJ, Liang S, Naumoff K, et al. Environmental determinants of infectious disease: A framework for tracking causal links and guiding public health research. Environ Health Perspect. 2007;115:1216–23. 10.1289/EHP.9806.
- 25. Izah SC, Ogwu MC, Shahsavani A, editors. Air Pollutants in the Context of One Health. 2024;134. 10.1007/978-3-031-74165-4.
- 26. Lönnroth K, Jaramillo E, Williams BG, Dye C, Raviglione M. Drivers of tuberculosis epidemics: the role of risk factors and social determinants. Soc Sci Med. 2009;68:2240–6. 10.1016/J.SOCSCIMED.2009.03.041.
- 27. Haque U, Hashizume M, Glass GE, Dewan AM, Overgaard HJ, Yamamoto T. The role of climate variability in the spread of malaria in Bangladeshi highlands. PLoS One. 2010;5:e14341. 10.1371/JOURNAL.PONE.0014341.
- 28. Noureen A, Aziz R, Ismail A, Trzcinski AP. The impact of climate change on waterborne diseases in Pakistan. Sustainability and Climate Change. 2022;15(2):138–52. 10.1089/SCC.2021.0070.
- 29. Chowdhury AH, Rahman MS. Spatio-temporal pattern and associate meteorological factors of airborne diseases in Bangladesh using geospatial mapping and spatial regression model. Health Sci Rep. 2024;7:e2176. 10.1002/HSR2.2176.
- 30. Hossain Chowdhury AI, Siddikur Rahman MI. Machine learning and spatio-temporal analysis of meteorological factors on waterborne diseases in Bangladesh. PLoS Negl Trop Dis. 2025;19:e0012800. 10.1371/JOURNAL.PNTD.0012800.
- 31. Olivia LW, Obanda V, Bucht G, Mosomtai G, Otieno V, Ahlm C, et al. Global emergence of alphaviruses that cause arthritis in humans. Infect Ecol Epidemiol. 2015. 10.3402/IEE.V5.29853.
- 32. Samsudin NA, Othman H, Siau CS, Zaini Z, ‘Izzat I. Exploring community needs in combating Aedes mosquitoes and dengue fever: a study with urban community in the recurrent hotspot area. BMC Public Health. 2024;24:1–12. 10.1186/S12889-024-18965-1.
- 33. Pley C, Evans M, Lowe R, Montgomery H, Yacoub S. Digital and technological innovation in vector-borne disease surveillance to predict, detect, and control climate-driven outbreaks. Lancet Planet Health. 2021;5:e739–45. 10.1016/S2542-5196(21)00141-8.
- 34. Aylward RB, Acharya A, England S, Agocs M, Linkins J. Global health goals: lessons from the worldwide effort to eradicate poliomyelitis. Lancet. 2003;362:909–14. 10.1016/S0140-6736(03)14337-1.
- 35. Francis JG, Francis LP. Sustaining surveillance: the importance of information for public health. 2021;6. 10.1007/978-3-030-63928-0.
- 36. Black RE, Taylor CE, Arole S, Bang A, Bhutta ZA, Chowdhury AMR, et al. Comprehensive review of the evidence regarding the effectiveness of community–based primary health care in improving maternal, neonatal and child health: 8. Summary and recommendations of the expert panel. J Glob Health. 2017;7:010908. 10.7189/JOGH.07.010908.
- 37. Chen Y, Zhang X, Grekousis G, Huang Y, Hua F, Pan Z, et al. Examining the importance of built and natural environment factors in predicting self-rated health in older adults: an extreme gradient boosting (XGBoost) approach. J Clean Prod. 2023;413:137432. 10.1016/J.JCLEPRO.2023.137432.
- 38. Hasan MK, Jawad MT, Dutta A, Awal MA, Islam MA, Masud M, et al. Associating measles vaccine uptake classification and its underlying factors using an ensemble of machine learning models. IEEE Access. 2021;9:119613–28. 10.1109/ACCESS.2021.3108551.
- 39. Hassija V, Chamola V, Mahapatra A, Singal A, Goel D, Huang K, et al. Interpreting Black-Box models: A review on explainable artificial intelligence. Cognit Comput. 2024;16:45–74. 10.1007/S12559-023-10179-8.
- 40. Hussain-Alkhateeb L, Ramírez TR, Kroeger A, Gozzer E, Runge-Ranzinger S. Early warning systems (EWSs) for chikungunya, dengue, malaria, yellow fever, and Zika outbreaks: what is the evidence? A scoping review. PLoS Negl Trop Dis. 2021;15:e0009686. 10.1371/JOURNAL.PNTD.0009686.
- 41. Albahri AS, Duhaim AM, Fadhel MA, Alnoor A, Baqer NS, Alzubaidi L, et al. A systematic review of trustworthy and explainable artificial intelligence in healthcare: assessment of quality, bias risk, and data fusion. Inf Fusion. 2023;96:156–91. 10.1016/J.INFFUS.2023.03.008.
- 42. Alizadehsani R, Oyelere SS, Hussain S, Jagatheesaperumal SK, Calixto RR, Rahouti M, et al. Explainable artificial intelligence for drug discovery and development: a comprehensive survey. IEEE Access. 2024;12:35796–812. 10.1109/ACCESS.2024.3373195.
- 43. Application of Artificial Intelligence in Wastewater Treatment. 2024. 10.1007/978-3-031-69433-2.