Machine learning-based prediction of mortality risk from air pollution-induced acute coronary syndrome in the Western Pacific region

Sazzli Kasim; Sorayya Malek; Song Cheen; Putri Nur Fatin; Kiew Xue Ning; Hanis Hamidi; Wan Azman Wan Ahmad; Khairul Shafiq Ibrahim; Kazuaki Negishi; Meriam Nik Sulaiman; Alan Fong

doi:10.1038/s41598-025-15410-0

2026 Jan 27;16:3486. doi: 10.1038/s41598-025-15410-0


PMCID: PMC12847821 PMID: 41588015

Abstract

Air pollution is a growing cardiovascular risk in Southeast Asia, particularly in the Western Pacific region where transboundary haze and urban emissions are prevalent. Despite its relevance, traditional risk scores for Acute Coronary Syndrome (ACS) often overlook environmental factors. This study aims to assess the predictive value of air pollution exposure on ACS mortality using machine learning (ML) techniques, thereby addressing this clinical-environmental data gap. We combined clinical data from the National Cardiovascular Disease Database (NCVD) of Malaysia and daily air quality data (NOx, SO₂, O₃, PM₁₀) from the Department of Environment (2006–2017). ML algorithms including logistic regression, random forest (RF), XGBoost, and ensemble learning were developed to predict in-hospital mortality. SHapley Additive exPlanations (SHAP) were applied to enhance model interpretability. Model performance was compared against the conventional TIMI risk scores for STEMI and NSTEMI patients. From 14,145 ACS cases, the RF model achieved the highest AUC (0.843), outperforming TIMI scores (0.791 for STEMI, 0.565 for NSTEMI). Net Reclassification Index improvements were 8.71% (STEMI) and 86.94% (NSTEMI), both statistically significant (p < 0.001). SHAP analysis identified NOx and O₃, along with clinical factors like Killip class and fasting blood glucose, as top contributors to mortality prediction. Our results highlight the feasibility of combining environmental and clinical features to improve ACS mortality prediction using ML models. While the model shows strong potential for Malaysia, external validation in other Western Pacific populations is necessary before broader generalization. This framework can inform future region-specific public health interventions targeting pollution-related cardiovascular risk.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-15410-0.

Subject terms: Computational models, Machine learning, Computational biology and bioinformatics, Cardiology, Risk factors

Introduction

Air pollution is the leading environmental risk factor for global health and the fourth leading cause of mortality worldwide. Although it is a well-established contributor to cardiovascular morbidity and mortality, its specific impact on acute coronary syndrome (ACS), particularly when explored using advanced machine learning (ML) algorithms, remains poorly characterized¹. This underscores a critical need for more nuanced, data-driven approaches to better understand how environmental factors influence ACS outcomes.

In the Western Pacific region, the impact of air pollution on health, particularly its association with ACS, mirrors the challenges faced by Malaysia. Urbanization and industrialization contribute significantly to deteriorating air quality in these countries, exacerbating health issues.

Air quality concerns in Southeast Asia, a region characterized by diverse environmental challenges, significantly impact public health². Indonesia, which shares similar environmental challenges with Malaysia, sees a direct link between high vehicular and industrial emissions and increased ACS cases, especially in densely populated cities. Singapore, despite its urban density, maintains better air quality through strict controls, yet is not immune to health risks during periods of transboundary haze. In Thailand, urban centers like Bangkok suffer from severe pollution, raising ACS risks. The Philippines faces a similar plight, where air pollution is a pressing public health concern. In contrast, Brunei, with lower industrial and vehicular emissions, experiences better air quality and potentially fewer pollution-related ACS cases compared to Malaysia.

Key pollutants such as nitrogen oxides (NOx), sulfur dioxide (SO₂), ozone (O₃), and particulate matter 10 (PM10) are common culprits, triggering health problems. These countries face the dual challenge of fostering economic growth while mitigating environmental health risks. Regional cooperation, especially in managing transboundary haze, and effective national policies are crucial for improving air quality and reducing the associated health burdens^3,4.

In the Western Pacific region, research on the association between air pollution and ACS using either conventional statistical or machine learning (ML) methods is, to the best of our knowledge, limited or absent. The burden of ACS is high in Malaysia, with 20–25% of all deaths in public hospitals attributed to coronary artery disease and a higher rate reported for 30-day mortality following myocardial infarction⁵. According to the World Health Organization (WHO), air pollution is also a significant health concern in the Western Pacific Region, where about 2.2 million people die each year due to air pollution-related causes⁶.

Prior studies have shown that machine learning (ML) and stacked ensemble learning (EL) algorithms outperform conventional models in predicting acute coronary syndrome (ACS) mortality, including in population-specific registries across the Western Pacific region⁷. These studies consistently report high predictive accuracy using various ML algorithms. For example, a systematic review and meta-analysis encompassing 12 studies with over 250,000 ACS patients confirmed the superior performance of ML models over conventional methods⁸. In China, Yang et al.⁹ developed an XGBoost model for STEMI patients that outperformed both the GRACE and TIMI scores, even when externally validated with a reduced set of variables. In Malaysia, Kasim et al.¹⁰ applied an XGBoost model to a cohort of NSTEMI/UA patients and found it superior to TIMI across in-hospital, 30-day, and 1-year mortality predictions. In Iran, Kermani et al.¹¹ reported that ensemble methods like Random Forest and XGBoost outperformed other models in predicting in-hospital mortality among AMI patients undergoing PCI. Additionally, the PRAISE study by D’Ascenzo et al.¹² demonstrated the generalizability of an ML-based score for 1-year all-cause mortality using diverse international datasets. Collectively, these findings underscore the predictive advantages of ML and EL models over traditional risk scores in ACS populations and support their growing role in clinical risk stratification.

Although these approaches show strong predictive performance, they predominantly focus on clinical predictors and do not account for environmental risk factors such as air pollution. This limitation is especially important in the Western Pacific region, where pollution-related health risks are rising. While the link between air pollution and cardiovascular outcomes has been established^13,14, its integration into individualized mortality prediction models for hospitalized ACS patients remains scarce in the ML domain.

To date, there is a notable absence of studies applying ML techniques to examine the contribution of environmental exposures, especially air quality data, within established clinical risk scoring systems such as the Thrombolysis in Myocardial Infarction (TIMI) score. This leaves a significant gap in our understanding of, and ability to model, ACS mortality risk associated with air pollution, particularly in low- and middle-income countries across Asia.

This study is novel in its use of ML and stacked EL algorithms to improve mortality risk stratification of ACS outcomes in the context of air pollution. It aims to investigate the relationship between air pollution and the incidence of ACS in the Western Pacific region, with a focus on Malaysia. The study seeks to identify the most significant air pollution features contributing to the risk of ACS mortality, employing interpretable ML algorithms to provide deeper insights into this critical health issue. This study is particularly relevant to Western Pacific countries with environmental and health profiles similar to Malaysia’s. Countries like Indonesia, Thailand, and the Philippines, which face significant urban air pollution challenges, would greatly benefit from these findings. Similarly, Singapore, despite its stricter air quality controls, can utilize these insights, especially during haze episodes. These countries, sharing similar air pollution issues and cardiovascular health challenges, can adapt the study’s methodologies and findings to improve public health outcomes in their respective regions.

Methods

Study design and data sources

Study overview

In this study, we examine the relationship between air pollution and the onset of ACS in Malaysia, focusing on its subtypes: ST-Elevation Myocardial Infarction (STEMI) and Non-ST-Elevation Myocardial Infarction/Unstable Angina (NSTEMI/UA). We specifically study key air pollutants, namely NOx, SO₂, O₃, and PM10, which are known contributors to cardiovascular diseases¹⁵.

Our analysis leverages ML algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), naïve bayes (NB), eXtreme gradient boosting (XGBoost) and stacked EL. The study combines clinical and environmental data to investigate factors influencing mortality in ACS patients, particularly the impact of air pollution. The SHapley Additive exPlanations (SHAP) explainer was used to better understand and improve the predictability and transparency of these models. The study development flowchart is shown in Fig. 1 below.

Fig. 1 — Graphical Workflow of ML Model Development.

Study data

The data for this study was collected from two primary sources: The National Cardiovascular Disease Database (NCVD) for ACS data and the Department of Environment (DOE), Malaysia for air quality measurements. Both datasets were received as structured data.

The NCVD, supported by Malaysia’s Ministry of Health, gathers data on cardiovascular diseases. Our focus is on the NCVD-ACS registry, which includes information from 25 Malaysian hospitals and spans 2006 to 2017. The Medical Review & Ethics Committee (MREC) of the Ministry of Health approved the registry in 2007 (Approval Code: NMRR-07-38-164), with the UiTM ethics committee (Reference number: 600-TNCPI (5/1/6)) and the National Heart Association of Malaysia (NHAM) also granting approval. Key patient data, such as demographics, clinical profiles, and treatment details, are meticulously collected. Patient mortality is verified annually by the National Registration Department of Deaths. The NCVD data was accessed on 9th July 2021. The data used in this study were anonymized prior to analysis, as our research focuses solely on the values and features, without access to any personal information about the patients. All procedures in this study were conducted in accordance with the Declaration of Helsinki. Informed consent was waived by the Medical Review & Ethics Committee (MREC) of the Ministry of Health Malaysia, as the NCVD-ACS data used in this study were anonymized prior to access and analysis.

The study analyzed air quality data from the DOE Malaysia covering January 1, 2006, to April 13, 2017, comprising 61,816 daily measurements of NOx, SO₂, O₃, and PM10. The DOE air quality data were accessed on 23rd June 2021. These data, which consisted of 24-h mean concentrations from a network of monitoring stations, were combined with NCVD-ACS records from hospitals within a 15-km radius¹⁶. Air quality data at time lag 0 were aligned with the daily reporting of ACS onset in patient records to assess the impact of air pollution on ACS patients’ mortality risk. Google Earth analysis was used to clarify the spatial relationships among monitoring stations and hospitals. This supported the integration of air quality data with the NCVD-ACS dataset and ensured temporal matching of environmental exposure to ACS events.
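The linkage described above (stations within 15 km of a hospital, matched on the day of ACS onset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layouts, the function names `haversine_km` and `assign_lag0_exposure`, and the rule of taking the nearest in-range station that has a reading for that day are all hypothetical choices made for the example.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_lag0_exposure(hospital, onset_day, stations, readings, max_km=15.0):
    """Return the daily pollutant means for the onset day (lag 0).

    Assumption: the nearest station within max_km that has a reading for
    onset_day is used. Returns None when no such station exists.
    """
    in_range = []
    for station in stations:
        dist = haversine_km(hospital["lat"], hospital["lon"], station["lat"], station["lon"])
        if dist <= max_km:
            in_range.append((dist, station["id"]))
    for _, station_id in sorted(in_range):
        row = readings.get((station_id, onset_day))
        if row is not None:
            return row
    return None


# Hypothetical coordinates and values, for illustration only.
hospital = {"lat": 3.00, "lon": 101.50}
stations = [
    {"id": "A", "lat": 3.05, "lon": 101.52},  # about 6 km away
    {"id": "B", "lat": 4.00, "lon": 101.50},  # more than 100 km away
]
readings = {
    ("A", date(2010, 1, 5)): {"NOx": 20.0, "SO2": 3.0, "O3": 15.0, "PM10": 48.0},
    ("B", date(2010, 1, 5)): {"NOx": 99.0, "SO2": 9.0, "O3": 40.0, "PM10": 90.0},
}
exposure = assign_lag0_exposure(hospital, date(2010, 1, 5), stations, readings)
```

Records for which the function returns None would have no exposure value and, under the complete-case approach described later, would be excluded.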

Outcomes and candidate predictors

The primary outcome of the study was mortality among ACS patients following ACS onset, examined in relation to air pollution. The study sought to understand the impact of air pollution on mortality by using clinical and air pollution predictors together to predict ACS mortality outcomes.

Air pollution exposure assessment and rationale for hospital-based assignment

Recent ISI indexed studies from 2022 to 2025 continue to highlight the relevance of short-term air pollution exposure, particularly to NO₂, SO₂, and O₃, in triggering acute coronary events or influencing mortality outcomes. These findings are especially relevant for Southeast Asia and the Western Pacific. Wang et al.¹⁷ conducted a multi-city study in China and found that short-term exposure to PM2.5, NO₂, and SO₂ was significantly associated with increased hospital admissions for acute myocardial infarction (AMI). He et al.¹⁴, in Shanghai, reported that hourly exposure to PM2.5 and NO₂ correlated strongly with the onset of AMI, emphasizing the importance of very short-term triggers. Lee et al.¹⁸ from South Korea showed that acute exposure to PM2.5 and O₃ increased the risk of out-of-hospital cardiac arrest, a severe consequence commonly linked to ACS. In Malaysia, Mohamad Roslan et al.¹⁹ investigated cardiovascular admissions in Klang and found that NO₂ was significantly associated with ischemic heart disease admissions, especially in interaction with PM10, making it particularly relevant to our study. Similarly, Han et al.²⁰ observed that higher short-term PM2.5 and O₃ exposure, coupled with cold weather, was linked to elevated AMI mortality in Taiwan.

Given these findings, this study uses hospital-based air pollution data as a proxy for short-term exposure in ACS patients. The attribution of exposure based on the hospital district rather than patient home addresses was chosen due to practical limitations in accessing individual residential data and the common practice in Malaysia of patients presenting to the nearest tertiary hospital within approximately 100 km of their homes. Although we acknowledge that patients may not always be in their residential or working area at symptom onset, such occurrences are relatively uncommon. The selected approach aligns with established practices in air pollution epidemiology, where hospital or district-level exposure data are often used to estimate ambient conditions during the acute phase preceding hospitalization^14,19.

From a methodological standpoint, the choice of exposure assignment in studies of acute events like ACS depends on the hypothesized exposure window and data availability. Residence-based exposure is typically preferred when evaluating long-term or cumulative effects, using pollutant levels from monitors near the home. However, this method does not account for the time patients spend away from home. Conversely, hospital-based exposure, especially in time-series and case-crossover designs, is considered appropriate for short-term exposure studies and has been widely used when fine-scale geolocation data are not available. It reflects the ambient environment where the acute episode likely culminated, particularly when analyzing lag periods of zero to seven days prior to the event. While hybrid or advanced exposure models that incorporate spatio-temporal or personal monitoring data offer improved precision, such approaches are not feasible in large-scale retrospective hospital datasets like ours. Thus, the hospital-based exposure model used here represents a pragmatic and scientifically justified method for evaluating short-term environmental contributions to ACS onset and mortality in the Malaysian context.

Data preparation

Data preprocessing

The source dataset from the NCVD registry comprised 54 variables across 54,000 records. For this study, we focused on 14 key input features identified as significant in a previous study by Kasim et al.⁷. This selection refined the dataset to 14,145 instances, specifically tailored for model development involving ACS patients.

The merged dataset was examined for potential errors, missing values, duplicate records, and outliers. Steps were taken to address these issues systematically: rows with incomplete data or outliers were removed, prioritizing the retention of complete cases. This approach not only improved the dataset’s quality but also minimized the risk of introducing biases or inaccuracies in the ML and stacked EL models, as supported by findings from Psychogyios et al.²¹.

To address the significant class imbalance inherent in mortality prediction tasks, we employed the Random OverSampling Examples (ROSE) technique during model training. Traditional classifiers often struggle in imbalanced datasets, tending to favor the majority class (survivors) while underperforming on the minority class (mortality), which is clinically the most critical. ROSE generates synthetic examples of the minority class using a smoothed bootstrap approach that estimates the conditional density of the data, thereby creating new instances that are similar but not identical to existing ones. This method enhances model sensitivity and recall for mortality prediction, improves generalizability, and reduces the risk of overfitting associated with simple duplication methods^22,23. The effectiveness of ROSE in improving classification performance, particularly for rare outcomes, has been validated in biomedical and clinical informatics research, making it a suitable choice for our ACS mortality dataset.

For missing data, we opted for complete-case analysis by excluding records with missing values in key predictor variables. This approach was chosen to preserve the integrity and interpretability of the model, particularly given the risk of bias introduced by imputation methods when the missingness mechanism is uncertain or potentially not at random. Complete-case analysis provides unbiased parameter estimates when data are Missing Completely At Random (MCAR) and is particularly appropriate when the proportion of missing data is relatively low, as in our dataset²⁴. Moreover, while multiple imputation offers an alternative, it relies heavily on the Missing At Random (MAR) assumption and can result in misestimation or implausible imputations if the imputation model is misspecified²⁵. Given these considerations, the combined use of ROSE for class balancing and complete-case analysis for missing data represents a robust and transparent preprocessing strategy aligned with best practices in clinical prediction modeling.

The detailed breakdown of selected in-hospital variables stratified by survival outcome is provided in Supplementary Table 1. The distribution of variables between the training and testing datasets is summarized in Supplementary Table 2. Performance metrics for each machine learning model, including AUC values, confidence intervals, and statistical comparisons using DeLong’s test, are reported in Supplementary Table 3. The NCVD registry data were cleaned and merged with air pollution exposure data from the Department of Environment (DOE), Malaysia, prior to model development and evaluation.

Feature selection in this study was guided by both clinical relevance and findings from our previous work using the same NCVD-ACS registry. In our earlier publication, we developed a validated model to predict in-hospital mortality among ACS patients in Malaysia, using variables routinely collected in the NCVD dataset. These include demographic factors (e.g., age, gender), clinical history (e.g., diabetes, hypertension), vital signs, biochemical parameters, and treatment variables, all of which have well-established links to ACS outcomes.

For this study, we extended the feature set by incorporating air pollution variables—NOx, SO₂, O₃, and PM₁₀—based on data availability from the Department of Environment (DOE) and their documented associations with cardiovascular events in regional studies. Notably, Mohamad Roslan et al.¹⁹ demonstrated a strong link between NO₂ and ischemic heart disease admissions in Klang, Malaysia. Our approach builds on validated variables from prior model development while introducing regionally relevant environmental predictors, ensuring both methodological continuity and enhanced insight into ACS mortality risk within the Malaysian context.

Data splitting and cross-validation

Data partitioning is depicted in Fig. 2, with 70% allocated for model training and the remaining 30% reserved for validation, following the guidelines in the literature²⁶.

K-fold cross-validation was employed in this study. This technique involves dividing the input data into 'k' folds, for instance, k = 5, resulting in the dataset being split into 5 parts. The model undergoes training and evaluation 5 times, using each fold once for testing and the others for training²⁷. This method validates the performance of the developed ML models, ensuring the selection of the best model²⁸.
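The 70/30 partition and the 5-fold scheme described above can be sketched as below. This is an illustrative sketch only: the function names, the fixed seeds, and the use of simple (non-stratified) random shuffling are assumptions, since the text does not state how the authors drew the partitions.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=42):
    """Shuffle row indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=5, seed=42):
    """Yield (training, held_out) index lists; each index is held out exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, held_out


# Example with 100 rows: 70 for training, 30 for testing,
# then 5-fold cross-validation inside the training portion only.
train_idx, test_idx = train_test_split_indices(100)
cv_folds = list(k_fold_indices(train_idx, k=5))
```

Keeping cross-validation inside the training portion leaves the 30% test set untouched until the final evaluation.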

Data balancing and data normalization

Data imbalance is often found in medical datasets, where the number of instances per class varies, leading to reduced classifier performance and bias towards the majority class²⁹. To predict ACS mortality amid air pollution effectively, we applied the ROSE method to the training dataset only. ROSE is well suited to binary classification with imbalanced classes. It creates balanced samples for both continuous and categorical data using a smoothed bootstrap approach. This technique preserves the structure of the data while producing synthetic samples for the underrepresented class, improving the accuracy of the model and reducing its bias²³.
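To make the smoothed bootstrap idea concrete, the sketch below draws synthetic rows in the spirit of ROSE: pick a class with equal probability, pick one observed row of that class, and sample around it from a Gaussian kernel. It is not the R ROSE package used in the study. The function name `rose_like_sample`, the rule-of-thumb bandwidth, and the restriction to continuous features are assumptions made for this illustration.

```python
import random
import statistics


def rose_like_sample(X, y, n_out, seed=0):
    """Draw n_out synthetic rows with roughly balanced binary classes.

    Assumptions: all features are continuous, and the per-feature kernel
    width follows a Gaussian rule of thumb scaled by the class standard
    deviation. Categorical features are not handled in this sketch.
    """
    rng = random.Random(seed)
    d = len(X[0])
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)

    # One bandwidth per class and per feature.
    bandwidth = {}
    for label, rows in by_class.items():
        n = len(rows)
        c = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bandwidth[label] = [c * statistics.pstdev(col) for col in zip(*rows)]

    labels = sorted(by_class)
    X_new, y_new = [], []
    for _ in range(n_out):
        label = rng.choice(labels)             # each class has equal probability
        centre = rng.choice(by_class[label])   # bootstrap one observed row
        X_new.append([rng.gauss(v, h) for v, h in zip(centre, bandwidth[label])])
        y_new.append(label)
    return X_new, y_new


# Hypothetical imbalanced training data: 95 survivors (0), 5 deaths (1).
X_train = [[float(i), float(i % 7)] for i in range(95)] + [[200.0 + i, 3.0] for i in range(5)]
y_train = [0] * 95 + [1] * 5
X_bal, y_bal = rose_like_sample(X_train, y_train, n_out=1000, seed=0)
```

As in the study, resampling of this kind would be applied to the training rows only, so the test set keeps its original class distribution.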

For continuous variables such as age, heart rate, high-density lipoprotein (HDLC), low-density lipoprotein (LDLC), fasting blood glucose (FBG), NOx, SO₂, O₃, and PM10, data normalization was applied using the min–max normalization approach. Previous research has shown that data normalization can significantly improve the accuracy of ML algorithms³⁰.
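Min–max normalization rescales each value x to (x − min) / (max − min). The sketch below shows one way to apply it. Fitting the minimum and maximum on the training data and reusing them on the test data is an assumption here, as the text does not say how the scaling parameters were obtained, and the function names are hypothetical.

```python
def fit_min_max(column):
    """Return the (min, max) of a training column."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Rescale values to (x - lo) / (hi - lo); a constant column maps to 0.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


# Hypothetical ages: parameters are learned from training rows only.
train_age = [40.0, 60.0, 80.0]
test_age = [50.0, 100.0]
lo, hi = fit_min_max(train_age)
train_scaled = apply_min_max(train_age, lo, hi)
test_scaled = apply_min_max(test_age, lo, hi)
```

With this choice, test values outside the training range fall outside [0, 1], which is expected behaviour rather than an error.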

Machine learning model development

The detailed flow for classification model development presents the sequence of steps in our ML application, including model development, hyperparameter tuning, and the selection of the best-performing model (Fig. 3).

Fig. 3 — The flowchart of the classification ML predictive models’ development.

Hyperparameter tuning

Each of the models in our study went through hyperparameter tuning, which is critical for optimizing performance and ensuring the accuracy and robustness of our analysis, particularly in the context of air pollution and ACS incidence.

For this purpose, we utilized the ‘caret’ package in R, known for its ability to streamline the training of complex models³¹. This package was selected for its consistency in delivering outcomes across various model complexities. The hyperparameter values that gave optimum performance for the classification models are included in Supplementary Table 4.

Machine learning performance evaluation

The Area Under the Receiver Operating Characteristic (AUROC) curve is a key metric for evaluating classification models, particularly in medical diagnosis, as noted by Fawcett³². AUROC provides a consistent measure. It assesses a classifier’s ability to distinguish between classes based on true positive and false positive rates, independent of class distributions. This quality makes AUROC a reliable and informative tool for evaluating classifier performance.
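The AUROC equals the probability that a randomly chosen positive case (here, a death) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is a plain illustration with a hypothetical function name, not the routine used in the study, and its pairwise loop is only suitable for small examples.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = died) and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = auroc(labels, scores)  # 3 of 4 pairs are ordered correctly -> 0.75

# Duplicating the negative cases changes the class balance but not the AUROC.
auc_more_negatives = auroc(labels + [0, 0], scores + [0.10, 0.40])
```

The second call illustrates why the metric is described as independent of class distribution: it depends only on how positives are ranked relative to negatives.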

The raw testing dataset was used to evaluate the model’s performance without applying the ROSE balancing method. This approach was chosen so that the reported performance reflects the class distribution encountered in real-world scenarios.

Model interpretation and comparison

SHAP analysis

To address the ‘black box’ nature of ML algorithms, we used SHAP to interpret ML model predictions³³. Utilizing the ‘shap’ library, we computed SHAP values, which provide a unified measure of feature importance. This involved training ML models, making predictions, and then applying the SHAP explainer to these models. SHAP values reveal the contribution of each feature to the predictions, thereby enhancing the global interpretability of the models and understanding of feature importance.
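The idea behind SHAP values can be illustrated with an exact Shapley computation on a toy model. This brute-force sketch is not how the ‘shap’ library computes values for a random forest, since it enumerates every feature subset and so only works for a handful of features. The toy risk function, the baseline, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk model with two features and one interaction term."""
    return 0.05 + 0.10 * z[0] + 0.02 * z[1] + 0.03 * z[0] * z[1]


x = [1.0, 1.0]          # instance to explain
baseline = [0.0, 0.0]   # reference values
phi = shapley_values(toy_risk, x, baseline)
```

The contributions sum to the difference between the prediction for the instance and the prediction at the baseline, which is the additivity property that makes SHAP values comparable across features.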

Comparative analysis of ML models and conventional methods: NRI and performance metrics

The Net Reclassification Index (NRI) measures improvement in classifying individuals into higher or lower risk categories when a new model is compared with a pre-existing risk strategy, particularly for the prediction of ACS risk in relation to air pollution³⁴.

The study adopts a mortality risk threshold for high and low-risk patients, as proposed by Correia et al.³⁵, which is applied to the most effective machine learning models for NRI calculation. The determination of suitable cut-off points for the TIMI risk score, particularly for STEMI and NSTEMI/UA patients, is aligned with recognized standards in the field.
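With two risk categories (low and high), the NRI is the net proportion of events moved up by the new model plus the net proportion of non-events moved down. The sketch below computes this two-category form. The function name and the sample data are hypothetical, and the example does not reproduce the study's figures.

```python
def net_reclassification_index(events, old_high, new_high):
    """Two-category NRI.

    events:   1 if the patient died, 0 otherwise
    old_high: 1 if the reference score (for example TIMI) labels the patient high risk
    new_high: 1 if the new model labels the patient high risk
    Returns (nri, nri_events, nri_nonevents).
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for event, old, new in zip(events, old_high, new_high):
        old, new = bool(old), bool(new)
        moved_up = (not old) and new
        moved_down = old and (not new)
        if event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_ne += 1
            up_ne += moved_up
            down_ne += moved_down
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events + nri_nonevents, nri_events, nri_nonevents


# Hypothetical cohort: 4 deaths followed by 6 survivors.
events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_high = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
new_high = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
nri, nri_events, nri_nonevents = net_reclassification_index(events, old_high, new_high)
```

In this toy cohort two deaths move up and one moves down (net 1 of 4), while two survivors move down and one moves up (net 1 of 6), so the overall NRI is about 0.417.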

The TIMI (Thrombolysis In Myocardial Infarction) risk score was selected as the clinical benchmark in our study due to its widespread use and practicality in in-hospital settings, particularly in Malaysia and other regions within the Western Pacific. Compared to the GRACE score, which incorporates a broader set of variables and is more suitable for long-term risk prediction, TIMI is simpler, additive in structure, and relies on clinical variables that are readily available upon admission, making it especially valuable for early triage and mortality risk assessment. Several studies^36,37 have shown that the TIMI score performs comparably to GRACE in predicting in-hospital mortality, particularly in acute settings like ST-elevation myocardial infarction (STEMI). Its integration into emergency protocols and ease of bedside use have led to its adoption in national and hospital-level ACS care guidelines. As such, TIMI provides a clinically relevant and contextually appropriate baseline for evaluating the added predictive value of machine learning models for ACS-related in-hospital mortality.

Best model deployment on the web

The best performing algorithm identified in the study, the RF algorithm, has been implemented in an online platform using web programming languages. This web-based system features an interface for mortality prediction, utilizing both ACS and air pollution parameters. Additionally, it incorporates a database for storing patient results, which aids in the ongoing evaluation and enhancement of the system. A reporting mechanism is also integrated, further augmenting its utility in clinical and research settings.

Results

Baseline characteristics

Table 1 presents data on 14,145 in-hospital ACS patients, featuring 18 selected variables, with age being a primary demographic factor. The average age was 59 years (SD = 39). Notable statistical differences (p < 0.001) were observed between survivors and non-survivors in variables such as age, heart rate, Killip class, FBG, HDLC, LDLC, medication usage, and exposure to NOx and O₃. The overall mortality rate was 6.1%. Of these patients, 58.47% were diagnosed with STEMI, while the rest had NSTEMI and UA, evaluated using the TIMI risk score.

Table 1.

Baseline Characteristics for in-hospital selected variables.

Variables	Features	All cases (n = 14,145)	Survivors (n = 13,287)	Non-survivors (n = 858)	p-value
ACS Stratum	STEMI	8271 (58.5%)	7659 (57.6%)	612 (71.3%)	< 0.001
	NSTEMI	3460 (24.5%)	3244 (24.4%)	216 (25.2%)
	UA	2414 (17.1%)	2384 (17.9%)	30 (3.5%)
Age*		20.9 ± 96.6	20.9 ± 96.6	23.2 ± 92.2	< 0.001
Heart Rate*		22 ± 200	27 ± 200	22 ± 182	< 0.001
Chronic Angina (< 2 Weeks)		9610 (67.9%)	9031 (68.0%)	579 (67.5%)	0.767
Killip class*	I:	9767 (69.0%)	9561 (72.0%)	206 (24.0%)	< 0.001
	II:	2712 (19.2%)	2520 (19.0%)	192 (22.4%)
	III:	659 (4.7%)	545 (4.1%)	114 (13.3%)
	IV:	1007 (7.1%)	661 (5.0%)	346 (40.3%)
ECG Abnormalities*		3967 (28.0%)	3688 (27.8%)	279 (32.5%)	0.003
High Density Lipoprotein*		0.50 ± 4.94	0.50 ± 4.94	0.50 ± 3.00	< 0.001
Low Density Lipoprotein*		0.50 ± 18.0	0.60 ± 18.0	0.50 ± 9.44	< 0.001
Fasting Blood Glucose*		3.00 ± 49.0	3.00 ± 49.0	3.00 ± 46.4	< 0.001
Cardiac Catheterization		5166 (36.5%)	4878 (36.7%)	288 (33.6%)	0.064
Coronary Artery Bypass Graft (CABG)		124 (0.9%)	114 (0.9%)	10 (1.2%)	0.349
Statin*		13,278 (93.9%)	12,533 (94.3%)	745 (86.8%)	< 0.001
Other Lipid Lowering Agent		434 (3.1%)	422 (3.2%)	12 (1.4%)	0.003
Oral Hypoglycemic Agent*		3424 (24.2%)	3340 (25.1%)	84 (9.8%)	< 0.001
Antiarrhythmic agent*		680 (4.8%)	570 (4.3%)	110 (12.8%)	< 0.001
Nitrogen Oxides*		0 ± 209.22	0 ± 137.74	0 ± 187.87	< 0.001
Sulfur Dioxides		0 ± 192.05	0 ± 207.03	0 ± 211.81	0.024
Ozone*		0 ± 148.71	0 ± 129.91	0 ± 124.71	< 0.001
Particulate Matter 10		0 ± 390	0 ± 390	0 ± 322	0.643


The asterisk (*) indicates that the difference between the survivor and non-survivor groups is statistically significant for that variable (p-value < 0.001).

Classification models performance evaluation

Table 2 shows the classification model performance in this study using selected in-hospital features on the remaining 30% testing dataset. The detailed performance metrics are included in the supplementary section (Supplementary Table 5). The results show that the ML and stacked EL algorithms outperformed TIMI risk scores in predicting STEMI and NSTEMI outcomes in the presence of air pollution.

Table 2.

The AUC of ML models and TIMI risk score for in-hospital selected features based on 30% testing dataset.

Predictive models	Area under the ROC curve (95% CI), in-hospital selected features
Logistic Regression	0.834 (0.803–0.865)
SVM (Linear)	0.833 (0.803–0.864)
Random Forest	0.843 (0.813–0.873)
Naïve Bayes	0.838 (0.807–0.869)
XGBoost	0.836 (0.804–0.868)
Stacked EL	0.842 (0.812–0.873)
TIMI (STEMI)	0.791 (0.757–0.825)
TIMI (NSTEMI)	0.565 (0.505–0.625)


The RF model demonstrated the highest predictive accuracy, with an AUC of 0.843 (95% CI: 0.813–0.873) (p-value 0.001). The TIMI risk score performed less well, with AUCs of 0.791 for STEMI and 0.565 for NSTEMI. While TIMI’s performance in predicting STEMI mortality risk is acceptable, its performance in predicting NSTEMI mortality risk is markedly lower than that of the other models.

The ROC curve for the predictive models based on the testing dataset is shown in Fig. 4. ROC curves for in-hospital mortality prediction, stratified by STEMI and NSTEMI, are presented in Fig. 5.

Fig. 5 — ROC Curves of ML, stacked EL model and TIMI for in-Hospital selected variables mortality prediction for (a) STEMI and (b) NSTEMI patients.

SHAP analysis

The SHAP summary plot for the RF model with in-hospital selected features offers a detailed view of feature importance, combining it with the effect of each feature on the testing dataset (Fig. 6).

Fig. 6 — SHAP summary plot of RF model based on in-hospital selected features.

Features such as Killip class, FBG, patient’s age, heart rate, and usage of oral hypoglycemic agents are linked with higher negative effects on the outcome. This association suggests that an increase in these features correlates with an increased mortality risk. In addition, NOx and O₃ were found to have the strongest association with mortality risk among in-hospital ACS patients. Conversely, CABG appears to have the least influence. Overall, the plot reveals that both clinical and environmental factors significantly affect the model’s mortality risk predictions for ACS patients.

Comparison of ML to TIMI risk score on the validation dataset

Figures 7 and 8 compare mortality risk stratification by the TIMI risk score and by the RF model, respectively, for both STEMI and NSTEMI patients. The TIMI risk score for STEMI has a scale of 0–14, while the TIMI risk score for NSTEMI has a scale of 0–7. We categorized patients by ML-predicted probability as low risk (< 50%) or high risk (≥ 50%). This corresponds to a TIMI low-risk score of ≤ 5 and a high-risk score of > 5 for both the STEMI and NSTEMI risk scores³⁵.

Fig. 7 — Performance breakdown of the TIMI risk score for in-hospital selected variables mortality prediction for both STEMI and NSTEMI patients.

Fig. 8 — Performance breakdown of the ML model (RF model) for in-hospital selected variables mortality prediction for both STEMI and NSTEMI patients.

Among patients classified as high risk by the RF model, 25.53% of STEMI patients and 18.37% of NSTEMI patients died. Among patients classified as high risk by the TIMI score, the corresponding figures were 19.53% for STEMI patients and 9.38% for NSTEMI patients.

Table 3 tabulates the percentage of mortality among patients predicted to be low risk (TIMI score: ≤ 5; ML (STEMI) probabilities: < 0.5; ML (NSTEMI) probabilities: < 0.4) and high risk (TIMI score: > 5; ML (STEMI) probabilities: ≥ 0.5; ML (NSTEMI) probabilities: ≥ 0.4).

Table 3.

Percentage distribution of patient mortality as classified by TIMI Score and ML models across in-hospital selected features datasets.

Dataset	Predictive Models	High-Risk Threshold	Low-Risk (%)	High-Risk (%)
In-Hospital Selected Features	TIMI (STEMI)	> 5	0.85	19.53
	TIMI (NSTEMI)	> 5	4.86	9.38
	RF (STEMI)	> 0.5	3.03	25.53
	RF (NSTEMI)	> 0.4	1.78	18.37


Hence, the ML models demonstrated better predictive accuracy for mortality among high-risk patients compared to the TIMI risk score. Furthermore, ML models demonstrated the greatest improvement in predicting mortality among NSTEMI patients in the context of air pollution exposure.
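
The stratified mortality percentages in Table 3 follow from a simple grouping step, sketched below. The function name and the toy probabilities and outcomes are hypothetical and serve only to show how a probability threshold (0.5 or 0.4 in Table 3) splits patients into low-risk and high-risk strata before mortality is counted in each.

```python
def mortality_by_stratum(probabilities, died, threshold=0.5):
    """Return the percentage of deaths among patients predicted low risk
    (probability < threshold) and high risk (probability >= threshold)."""
    strata = {"low": [0, 0], "high": [0, 0]}  # [deaths, patients]
    for p, d in zip(probabilities, died):
        key = "high" if p >= threshold else "low"
        strata[key][0] += int(d)
        strata[key][1] += 1
    return {
        key: (100.0 * deaths / n if n else float("nan"))
        for key, (deaths, n) in strata.items()
    }


# Hypothetical predicted probabilities and outcomes (1 = died), not study data.
toy_probabilities = [0.10, 0.20, 0.30, 0.45, 0.55, 0.60, 0.80, 0.90]
toy_died = [0, 0, 0, 1, 0, 1, 1, 1]

at_050 = mortality_by_stratum(toy_probabilities, toy_died, threshold=0.5)
at_040 = mortality_by_stratum(toy_probabilities, toy_died, threshold=0.4)
print("threshold 0.5:", at_050)
print("threshold 0.4:", at_040)
```

A well-separated model shows low mortality in the low-risk stratum and high mortality in the high-risk stratum; lowering the threshold moves borderline patients into the high-risk group.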

Net reclassification index (NRI) analysis

The ML models had significantly better accuracy as assessed by NRI. The net reclassification for STEMI patients using the RF model was 8.71%, as shown in Table 4, indicating a statistically significant improvement over the original TIMI risk score (p < 0.001). The NRI for NSTEMI patients, shown in Table 5, indicates that the RF model improved classification by 86.94%, substantially outperforming the original TIMI risk score (p < 0.001).

Table 4.

Net Reclassification Improvement (NRI) of the RF Model compared to the TIMI risk score using the in-hospital selected features dataset. The table depicts the comparative performance of the RF model against the TIMI Risk Score for STEMI patients.

In-hospital Selected Features
		Number of individuals (Random Forest)		Reclassification		Net correctly reclassified (%)
	TIMI score	Low risk	High risk	Increased risk	Decreased risk
Individuals with events (died) (n = 171)
	TIMI score			34	26	8/171 = 4.68%
	Low risk	37	34
	High risk	26	74
Individuals without events (alive) (n = 2334)
	TIMI score			124	218	94/2334 = 4.03%
	Low risk	1798	124
	High risk	218	194
Net Reclassification Index (NRI)	4.68 + 4.03 = 8.71%
Z, p-value	189.41, p < 0.001
Conclusion	It was statistically significant. The predictive power of the RF model was improved as compared to the TIMI Risk Scores Model in predicting the mortality rate of ACS STEMI patients in the presence of air pollution, and the proportion of correct classification increased by 8.71%
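
The arithmetic behind the NRI in Table 4 can be reproduced with a short sketch. The counts below are the STEMI reclassification counts from Table 4; the function name and argument names are illustrative and are not taken from the authors' code.

```python
def net_reclassification(events_up, events_down, n_events,
                         nonevents_up, nonevents_down, n_nonevents):
    """Categorical NRI for a new model versus a reference score.

    For patients who died, moving to a higher risk category is correct;
    for survivors, moving to a lower risk category is correct.
    """
    event_part = (events_up - events_down) / n_events
    nonevent_part = (nonevents_down - nonevents_up) / n_nonevents
    return event_part, nonevent_part, event_part + nonevent_part


# STEMI counts from Table 4 (TIMI as reference, RF as the new model).
stemi_event, stemi_nonevent, stemi_nri = net_reclassification(
    events_up=34, events_down=26, n_events=171,
    nonevents_up=124, nonevents_down=218, n_nonevents=2334,
)
print(f"events: {100 * stemi_event:.2f}%")
print(f"non-events: {100 * stemi_nonevent:.2f}%")
print(f"NRI: {100 * stemi_nri:.2f}%")
```

The two components correspond to the 8/171 and 94/2334 entries in the table, and their sum is the reported NRI.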


Table 5.

Net Reclassification Improvement (NRI) of the RF Model Compared to the TIMI Risk Score using the In-Hospital Selected Features Dataset. The table depicts the comparative performance of the RF model against the TIMI Risk Score for NSTEMI Patients.

In-hospital Selected Features
		Number of individuals (Random Forest)		Reclassification		Net correctly reclassified (%)
	TIMI score	Low risk	High risk	Increased risk	Decreased risk
Individuals with events (died) (n = 86)
	TIMI score			54	2	52/86 = 95.35%
	Low risk	29	54
	High risk	2	1
Individuals without events (alive) (n = 1653)
	TIMI score			162	23	− 139/1653 = − 8.41%
	Low risk	1462	162
	High risk	23	6
Net Reclassification Index (NRI)	95.35 + (− 8.41) = 86.94%
Z, p-value	994.7, p < 0.001
Conclusion	It was statistically significant. The predictive power of the RF model was improved as compared to the TIMI Risk Scores Model in predicting the mortality rate of ACS NSTEMI patients in the presence of air pollution, and the proportion of correct classification increased by 86.94%


Deployment of the best ML algorithm online

The best-performing ML algorithm, RF, has been integrated into an online system available at https://myheartacsair.uitm.edu.my/home.php.

This system features an ML-based calculator for predicting mortality using air pollution parameters, a database for storing and retrieving information, and a reporting system. Detailed descriptions of the system are included in the supplementary materials, which also feature a video walkthrough of the study. The system's framework is designed to be easily adaptable for use in Western Pacific countries with similar environmental and health profiles.

Discussion

Our study is at the forefront of integrating environmental data with clinical features to predict ACS mortality, utilizing ML and stacked EL methodologies. This integrative approach reflects a novel direction in digital health, emphasizing the significance of environmental determinants in clinical outcomes. The robust performance of our RF model (AUC = 0.843) compared to traditional TIMI scores suggests that ML algorithms can offer more accurate risk assessments, particularly in the unique environmental and clinical landscape of the Western Pacific region and especially Southeast Asia. The study's relevance to Western Pacific countries lies in its approach to integrating environmental health data with medical insights. By applying ML to understand the relationship between air pollution and ACS, it offers a model that can be adapted by countries with environmental and health challenges similar to those of Malaysia. These include nations such as Indonesia, Thailand, the Philippines, Singapore, Vietnam, and Cambodia, which face urban air pollution and its impact on cardiovascular health. The methodology and findings of the study could guide these countries in developing targeted public health strategies and policies, enhancing ACS prevention and management in the context of environmental health.

The findings can be summarised as follows: (i) RF (AUC = 0.843) outperformed the other ML and EL models when using in-hospital selected features. (ii) The ML models and stacked EL developed using in-hospital selected features (AUC ranging from 0.833 to 0.843) outperformed the conventional TIMI risk score (STEMI AUC = 0.791 and NSTEMI AUC = 0.565). (iii) SHAP summary plots illustrate the model's explainability; among the air pollutants, NOx and O₃ show an impact on mortality risk in ACS patients. This supports the hypothesis that environmental factors contribute significantly to ACS outcomes, a critical consideration for regions with high pollution levels. (iv) For continuous validation and wide applicability, ML models should be deployed on a web-based platform.

Previous research has shown that models based on ML perform better in classification tasks than models based on conventional risk scores in ACS mortality studies^7,9,11. Our study reports similar findings. The absence of environmental factors in conventional risk scoring methods is notable, given the growing evidence of the influence of environmental factors, specifically air pollution, on cardiovascular health³⁸.

The application of ML algorithms is promising for predicting the in-hospital mortality risk of ACS patients in the presence of air pollution, particularly the RF algorithm, which exhibited superior performance. The Random Forest (RF) algorithm is well recognized for its robustness in handling high-dimensional data and capturing complex inter-feature interactions³⁹. This strength was demonstrated in the study by Hadanny et al.⁴⁰, which reported that the RF model achieved an AUC of 0.848 for predicting 30-day mortality in STEMI patients, significantly outperforming both the GRACE score (AUC = 0.773) and the TIMI score (AUC = 0.729). These results underscore the RF model's superior predictive performance in mortality risk stratification. Although the findings pertain specifically to 30-day post-STEMI mortality, they align with the improved classification performance of RF observed in this study for predicting in-hospital mortality among ACS patients, further supporting its utility in complex clinical risk prediction tasks.

Stacked ensemble learning (EL) was employed in this study to potentially enhance model performance. However, due to the already strong predictive power of the base models, the stacked EL did not yield significant improvements. This finding aligns with Zhang et al.⁴¹, who also observed minimal performance gains from stacked EL when base models already demonstrated high accuracy, highlighting the complexity and limited interpretability of stacked ensembles as potential drawbacks.

Nevertheless, other ACS-related studies have demonstrated the benefit of stacking methods. For instance, Wang et al.¹⁷ developed a two-layer stacked ensemble model to predict in-hospital mortality in 3,061 NSTEMI patients treated between 2017 and 2022 in a leading Chinese hospital. The stacking model achieved an AUC of 0.987, outperforming all individual models, including RF (0.948), SVM (0.942), and GBDT (0.920). It also exceeded the others in accuracy, precision, recall, and F1-score. The study concluded that stacked EL effectively combines the strengths of multiple algorithms, enabling more accurate risk stratification and earlier clinical intervention for high-risk NSTEMI patients.

The TIMI (Thrombolysis in Myocardial Infarction) score, though widely used in acute coronary syndrome (ACS) risk stratification, has notable limitations in diverse, contemporary clinical settings. Developed from Western cohorts before the modern reperfusion era, it uses a limited number of clinical variables and has demonstrated only modest predictive performance in non-Western populations. A previous validation study in an Asian STEMI cohort reported a moderate AUC of 0.78 for the TIMI score, further highlighting its limited applicability in higher-risk populations⁴². Although the GRACE score typically exhibits better discriminative ability than TIMI, its manual complexity and derivation from predominantly Caucasian datasets reduce its utility and generalizability in Asian contexts. In contrast, recent studies have shown that machine learning (ML) and ensemble learning (EL) models outperform traditional scores like TIMI and GRACE across various populations and mortality endpoints. A systematic review and meta-analysis by Al-Khayyat et al.⁸, covering over 250,000 ACS patients, reported a pooled C-statistic of 0.88 (95% CI: 0.86–0.91) for the best-performing ML models, significantly higher than the 0.82 (95% CI: 0.80–0.85) achieved by conventional methods. In China, Yang et al.⁹ developed an XGBoost model using the CAMI registry for STEMI patients, attaining an AUC of 0.896 and outperforming GRACE (AUC 0.809) and TIMI (AUC 0.782). Even in external validation using only 10 variables on the China PEACE dataset, the model maintained an AUC of 0.840, exceeding GRACE (AUC 0.762) and TIMI (AUC 0.789). In Malaysia, Kasim et al.¹⁰ applied an XGBoost model to an NSTEMI/UA cohort, achieving an AUC of 0.88 for in-hospital mortality compared to just 0.55 for the TIMI score. These findings support the growing evidence that ML and EL methods offer superior predictive performance over traditional risk scores and hold significant promise for improving ACS risk assessment, particularly in Asian populations. The limitations of conventional scores are compounded by environmental factors common in Asia: studies have shown that Asian countries are heavily affected by air pollution, which contributes significantly to premature mortality³⁸.

This is further supported by the NRI findings: using the in-hospital selected variables, the RF model produced an NRI of 8.71% for STEMI patients and 86.94% for NSTEMI patients when compared to the original TIMI risk score. The improvement is largest in the NSTEMI population, a cohort that accounts for half or more of all ACS cases worldwide. In the medical field, even a small increase in the performance of predictive models can have a significant impact⁴³. In this study, we found that TIMI underestimated mortality risk in both lower- and higher-risk groups. This may cause treatment to be delayed, increasing avoidable deaths.

The SHAP summary plot indicates that higher values of Killip class, FBG, age, and heart rate correlate with poorer outcomes, consistent with existing literature⁴⁴. SHAP analysis also confirms that statins and oral hypoglycaemic agents are important in the management of ACS, lowering mortality risk, which is consistent with the findings of Sposito and Chapman⁴⁵ on the benefits of early statin therapy after acute coronary events.

The SHAP analysis reveals that elevated NOx and O₃ levels are significantly associated with an increased mortality risk in ACS patients, correlating with baseline data that shows a clear link between these air pollutants and patient mortality.

In existing research on air pollutants affecting ACS patients, the emphasis is often on PM2.5's impact⁴⁶. Nevertheless, our study highlights the importance of NOx and O₃. Although PM2.5 is widely studied for its adverse health effects, our results suggest NOx and O₃ are equally important, particularly in regions with high or increasing levels of these pollutants.

Conclusion

In conclusion, our findings support the use of ML models that incorporate a broader range of determinants, including environmental factors, into clinical decision-making processes. Future research should explore the integration of such models into clinical workflows, as well as their effectiveness in real-world settings and their potential to inform public health strategies. The findings suggest a shift in digital health risk assessment toward more comprehensive models that recognize the complex interplay between environmental exposures and health outcomes related to ACS. Finally, deploying machine learning models on a web-based platform is crucial for continuous validation and wider applicability, especially in healthcare research such as ACS prediction in relation to air pollution. A web platform allows for real-time data analysis, accessibility by healthcare professionals globally, and the ability to continually update the model with new data. This approach enhances the model's accuracy, usability, and relevance to diverse regions, particularly Western Pacific countries with similar environmental health challenges.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (45.1 KB, docx)

Acknowledgements

This study received funding from the Ministry of Science, Technology, and Innovation (MOSTI) under the Technology Development Fund (TeD 1) (Grant Number: TDF03211036), which played a pivotal role in its successful execution. The views and findings presented in this publication are solely those of the authors and do not necessarily reflect the official policy or position of MOSTI. We also express our sincere gratitude to the National Heart Association of Malaysia (NHAM) and the Department of Environment (DOE) for their invaluable contribution of critical data, which significantly enhanced the quality and accuracy of our research. Their support has been essential in enabling us to delve deeper into the complexities of this study. In addition to these primary contributors, this project was brought to fruition thanks to the collective efforts and insights of numerous individuals and institutions whose dedication to the advancement of medical research is truly commendable. Their expertise and guidance have been indispensable throughout the course of this study.

Author contributions

Sazzli Kasim and Sorayya Malek conceived the article's content and structure, acquired funding, and supervised the study's planning and execution. Song Cheen, Putri Nur Fatin and Khairul Shafiq Ibrahim cleaned and pre-processed the data. Nurulain Ibrahim and Putri Nur Fatin performed the data analysis and visualisation. Data and results interpretation was done by Song Cheen, Putri Nur Fatin, Sorayya Malek and Sazzli Kasim. Song Cheen, Sorayya Malek and Xue Ning Kiew prepared the original draft. Hanis Hamidi, Wan Azman Wan Ahmad, Khairul Shafiq Ibrahim, Kazuaki Negishi, Meriam Nik Sulaiman and Alan Fong provided feedback and critical revisions on the results and proofread the manuscript.

Funding

Ministry of Science, Technology, and Innovation (MOSTI) under the Technology Development Fund (TeD1). This work was supported by the Higher Institution Centre of Excellence (HICoE) research grant 600-RMC/MOHE HICoE CARE-I 5/3 (01/2025) awarded to the Cardiovascular Advancement and Research Excellence Institute (CARE Institute), Universiti Teknologi MARA.

Data availability

The datasets obtained and analyzed during the current study are not publicly available due to institutional and ethical restrictions. The clinical data utilized in this study originate from the National Cardiovascular Disease Database (NCVD) of the National Heart Association of Malaysia (NHAM) and include sensitive patient information that requires multiple institutional agreements and ethical approvals for access. Similarly, the air pollution data were obtained from the Department of Environment (DOE) Malaysia and necessitate a formal application process for acquisition. These datasets are available from the corresponding author upon reasonable request, subject to necessary institutional approvals and application processes.

Declarations

Competing interests

Authors SM, SC, and PNF are affiliated with the Bioinformatics Department, Faculty of Science, University of Malaya, and have primarily contributed to the development of most ML models in this study. SK, HH, and KSI, serving as cardiologists in the Department of Medicine at Universiti Teknologi MARA, have provided valuable insights related to cardiovascular aspects of our research. W.A.W. Ahmad, from the Faculty of Medicine at the University of Malaya, has contributed significantly to the study’s perspective and supported the utilization of NCVD-ACS data. KN, affiliated with the Sydney Medical School Nepean, University of Sydney, Australia, brings an international perspective to our research, with previous experience in air pollution and cardiovascular disease studies. M.N. Sulaiman, from Environmental Engineering, Faculty of Engineering, University of Malaya, adds a multidisciplinary angle to the study. AF, affiliated with the Sarawak Heart Centre at the Sarawak General Hospital in Kuching, Malaysia, has also provided support on the NCVD-ACS data. All other authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

2/20/2026

The original online version of this Article was revised: The original version of this Article omitted an affiliation for Sazzli Kasim. Their correct affiliations are: “Cardiovascular Advancement and Research Excellence Institute (CARE Institute), Universiti Teknologi MARA, Selangor, Malaysia.” and “Cardiology, Faculty of Medicine, Universiti Teknologi Mara (UiTM), Shah Alam, Malaysia.” Additionally, the Funding section was incomplete. The Funding section now reads: “Ministry of Science, Technology, and Innovation (MOSTI) under the Technology Development Fund (TeD1). This work was supported by the Higher Institution Centre of Excellence (HICoE) research grant 600-RMC/MOHE HICoE CARE-I 5/3 (01/2025) awarded to the Cardiovascular Advancement and Research Excellence Institute (CARE Institute), Universiti Teknologi MARA.” The original article has been corrected.

Contributor Information

Sazzli Kasim, Email: sazzlishahlan@uitm.edu.my.

Sorayya Malek, Email: sorayya@um.edu.my.

Song Cheen, Email: song.cheen@monash.edu.

References

1. Roth, G. et al. Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study. J. Am. Coll. Cardiol. 76(25), 2982–3021 (2020).
2. Diplomat T. [Online]. Available: https://thediplomat.com/2023/11/extinguishing-a-point-of-contention-examining-transboundary-haze-in-southeast-asia/. [Accessed November 2024] (2023).
3. Cheong, K. H. et al. Acute health impacts of the Southeast Asian transboundary haze problem: A review. Int. J. Environ. Res. Public Health 16(18), 3286 (2019).
4. P. plc, Past trends and future projections of air quality and health implications in different countries/cities. Prudential plc, 2023.
5. M. Ministry of Health Malaysia, Cardiovascular diseases clinical practice guidelines. 2017. [Online]. Available: http://www.moh.gov.my/moh/resources/Penerbitan/CPG/CARDIOVASCULAR/3.pdf.
6. WHO. Ambient air pollution: A global assessment of exposure and burden of disease. World Health Organization, 121 (2018).
7. Kasim, S. et al. In-hospital mortality risk stratification of Asian ACS patients with artificial intelligence algorithm. PLoS ONE 17(12), e0278944 (2022).
8. Al-Khayyat, A., Al-Abdouli, A., Awan, A. & El-Deeb, Y. Machine learning versus traditional approaches to predict all-cause mortality for acute coronary syndrome: A systematic review and meta-analysis. Am. J. Cardiol. 41, 145–153 (2024).
9. Yang, X., Zhang, Y., Liu, Y., Li, H. & Wang, Z. A machine learning model for predicting in-hospital mortality in patients with ST-segment elevation myocardial infarction: Development and validation of an XGBoost model. J. Med. Internet Res. 26(1), e50067 (2024).
10. Kasim, S. et al. Data analytics approach for short- and long-term mortality prediction following acute non-ST-elevation myocardial infarction (NSTEMI) and unstable angina (UA) in Asians. PLoS ONE 19(2), e0298036 (2024).
11. Kermani, M., Azizi, M., Ghodsi, M. & Taghdisi, M. H. Predicting in-hospital mortality in patients with acute myocardial infarction: A comparison of machine learning approaches. Clin. Cardiol. 48(3), 123–130 (2025).
12. D'Ascenzo, F. et al. Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets. The Lancet 397(10270), 199–207 (2021).
13. Phosri, A., Ueda, K., Thepnoo, W., Charoenrattanaporn, C. & Honda, A. Effects of particulate matter (PM₁₀) on hospital admissions for cardiovascular and respiratory diseases in Chiang Mai, Thailand during the haze episode. Atmosphere 13(3), 490 (2022).
14. He, Z., Zhang, Y., Wang, S., Huang, Z. & Chen, Y. Hourly exposure to air pollutants and the onset of acute myocardial infarction in Shanghai, China. Chemosphere 313, 137547 (2023).
15. Franchini, M. & Mannucci, P. M. Air pollution and cardiovascular disease. Thromb. Res. 129(3), 230–234 (2012).
16. Khir, M. S. M. et al. Spatio-temporal analysis of PM10 in Southern Peninsular Malaysia. Int. J. Eng. Technol. 7(3), 27–30 (2018).
17. Wang, L., Zhang, Y., Li, F., Li, C. & Xu, H. Mortality prediction of inpatients with NSTEMI in a premier hospital in China based on stacking model. PLoS ONE 19(12), e0312448 (2024).
18. Lee, J., Kim, H. & Shin, J. Association of short-term exposure to ambient air pollution with out-of-hospital cardiac arrest: A nationwide study in South Korea. Environ. Res. 218, 114943 (2023).
19. Mohamad Roslan, M. A., Latif, M. T., Dominick, D., Ooi, M. C. G. & Abd Hamid, H. H. Acute effects of air pollution on cardiovascular hospital admissions in the port district of Klang, Malaysia: A time-series analysis. Environ. Res. 243, 117859 (2024).
20. Han, C. L., Lung, S. C. C., Wu, C. D. & Hwang, J. S. Effects of exposure to air pollution and cold weather on acute myocardial infarction mortality. Atmosphere 16(4), 469 (2024).
21. Psychogyios, K., Ilias, L. & Askounis, D. Comparison of missing data imputation methods using the framingham heart study dataset. In 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) (2022).
22. Menardi, G. & Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Disc. 28(1), 92–122 (2014).
23. Lunardon, N., Menardi, G. & Torelli, N. ROSE: A package for binary imbalanced learning. R J. 6(1), 79–89 (2014).
24. Jakobsen, J. C., Gluud, C., Wetterslev, J. & Winkel, P. When and how to perform multiple imputation in clinical research. BMC Med. Res. Methodol. 17, 162 (2017).
25. Twisk, J. W. R. Handling missing data in clinical research. J. Clin. Epidemiol. 151, 156–161 (2022).
26. Hastie, T., Tibshirani, R. & Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction Vol. 2 (Springer, New York, 2009).
27. Ajitesh, K. K-Fold Cross Validation—Python Example. 1 March 2023. [Online]. Available: https://vitalflux.com/k-fold-cross-validation-python-example/.
28. Rukshan, P. k-fold cross-validation explained in plain English. 19 December 2020. [Online]. Available: https://towardsdatascience.com/k-fold-cross-validation-explained-in-plain-english-659e33c0bc0.
29. Domingues, I., Amorim, J. P., Abreu, P. H., Duarte, H. & Santos, J. Evaluation of oversampling data balancing techniques in the context of ordinal classification. In 2018 International Joint Conference on Neural Networks (IJCNN) (2018).
30. Tina, R. P. & Sherekar, S. S. Performance analysis of naive Bayes and j48 classification algorithm for data classification. Int. J. Comput. Sci. Appl. 6, 256–261 (2013).
31. M. Kuhn and contributors, "Caret Package," 2023. [Online]. Available: https://cran.r-project.org/web/packages/caret/vignettes/caret.html.
32. Fawcett, T. An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006).
33. Rao, S., Mehta, S., Kulkarni, S., Dalvi, H., Katre, N. & Narvekar, M. A study of LIME and SHAP model explainers for autonomous disease predictions. In 2022 IEEE Bombay Section Signature Conference (IBSSC), Bombay (2022).
34. Zhou, Y., Wang, H. & Liu, H. Generalized function projective synchronization of incommensurate fractional-order chaotic systems with inputs saturation. Int. J. Fuzzy Syst. 21(3), 823–836 (2019).
35. Correia, L. et al. Prognostic value of TIMI score versus GRACE score in ST-segment elevation myocardial infarction. Arq. Bras. Cardiol. 103, 98–106 (2014).
36. Antman, E. M. et al. The TIMI risk score for unstable angina/non–ST elevation MI: A method for prognostication and therapeutic decision making. JAMA 284(7), 835–842 (2000).
37. de Araújo Gonçalves, P., Ferreira, J., Aguiar, C. & Seabra-Gomes, R. TIMI, PURSUIT, and GRACE risk scores: Sustained prognostic value and interaction with revascularization in NSTE-ACS. Eur. Heart J. 26(9), 865–872 (2005).
38. Lelieveld, J., Evans, J., Fnais, M., Giannadaki, D. & Pozzer, A. The contribution of outdoor air pollution sources to premature mortality on a global scale. Nature 525, 367–371 (2015).
39. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2(3), 18–22 (2002).
40. Hadanny, A. et al. Predicting 30-day mortality after ST-elevation myocardial infarction: Machine learning-based random forest and its external validation using two independent nationwide datasets. J. Cardiol. 78(5), 439–446 (2021).
41. Zhang, Z., Chen, L., Xu, P. & Hong, Y. Predictive analytics with ensemble modeling in laparoscopic surgery: a technical note. Laparosc. Endos. Robot. Surg. 5(1), 25–34 (2022).
42. Selvarajah, S. et al. Impact of cardiac care variation on ST-elevation myocardial infarction outcomes in Malaysia. Am. J. Cardiol. 111(9), 1270–1276 (2013).
43. Alahmar, A., Mohammed, E. & Benlamri, R. Application of data mining techniques to predict the length of stay of hospitalized patients with diabetes. In 2018 4th International Conference on Big Data Innovations and Applications (Innovate-Data), Barcelona (2018).
44. Van Den Berg, P. & Body, R. The HEART score for early rule out of acute coronary syndromes in the emergency department: a systematic review and meta-analysis. Eur. Heart J. Acute Cardiovasc. Care 7(2), 111–119 (2018).
45. Sposito, A. & Chapman, M. Statin therapy in acute coronary syndromes: mechanistic insight into clinical benefit. Arterioscler. Thromb. Vasc. Biol. 22(10), 1524–1534 (2002).
46. Chen, R. et al. Hourly air pollutants and acute coronary syndrome onset in 1.29 million patients. Circulation 145(24), 1749–1760 (2022).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material 1^{(45.1KB, docx)}

Data Availability Statement
