BMJ Open. 2025 Nov 29;15(11):e108527. doi: 10.1136/bmjopen-2025-108527

Development of explainable machine learning models to predict side effects in patients with rheumatoid arthritis taking methotrexate treatment: a nationwide multicentre cohort study

Junbeom Jang 1, Woo Jin Kim 2, Sang Won Park 3, Ki Won Moon 4
PMCID: PMC12666182  PMID: 41320203

Abstract

Objectives

Methotrexate (MTX) effectively controls rheumatoid arthritis (RA) but often leads to side effects (SE) such as gastrointestinal (GI) issues, liver toxicity and bone marrow suppression. We aimed to develop clinically interpretable machine learning (ML) models that accurately predict MTX-related SE in patients with RA. Specifically, we sought to enhance predictive accuracy and to identify patient-specific risk factors using explainable artificial intelligence (XAI), thereby enabling transparent clinical interpretation and addressing the unmet need for individualised risk stratification using real-world, multicentre observational data.

Design

Retrospective case-control study.

Setting

Across 23 rheumatology clinics in South Korea, based on data from a nationwide multicentre cohort.

Participants

A total of 5077 patients with RA were initially enrolled from the Korean Observational Study Network for Arthritis. After excluding those with missing clinical, demographic or prescription data and those not receiving MTX, 2375 patients remained eligible. Among these, 1654 and 1218 patients were included in the overall SE and GI SE analysis groups, respectively, after 1:1 propensity score matching. All patients were aged ≥18 years and met the 1987 American College of Rheumatology classification criteria.

Primary and secondary outcome measures

The primary outcome was the presence of SE in patients with RA taking MTX, categorised into overall SE and GI SE, based on standardised patient questionnaires and clinical assessments. The secondary outcome was the identification of key predictors using SHapley Additive exPlanations (SHAP) to enhance the interpretability of ML predictions.

Results

Among six ML classifiers, extreme gradient boosting demonstrated the highest performance in predicting overall SE (area under the curve (AUC) 0.781, F1 score 0.672, area under the precision-recall curve (AUPRC) 0.757) and GI SE (AUC 0.701, F1 score 0.690, AUPRC 0.670). SHAP analysis identified key predictive features including age, physician visual analogue scale score, alanine aminotransferase, Health Assessment Questionnaire score, celecoxib use and drug adherence. Logistic regression confirmed statistical significance for multiple variables (eg, OR 4.63; 95% CI 1.41 to 20.90 for non-adherence >30 days; OR 1.45; 95% CI 1.14 to 1.85 for celecoxib use). DeLong’s test indicated that boosting models significantly outperformed support vector machine (p<0.001).

Conclusions

Interpretable ML models using real-world clinical data can accurately predict SE in patients with RA taking MTX. These models may facilitate early identification of high-risk individuals and inform personalised treatment strategies. Integration into clinical decision support systems could improve MTX safety monitoring. Further prospective validation in external cohorts is warranted.

Keywords: machine learning, case-control studies, rheumatology


STRENGTHS AND LIMITATIONS OF THIS STUDY.

  • This study developed interpretable machine learning models based on real-world, multicentre data from over 2300 patients with rheumatoid arthritis treated with methotrexate, improving the relevance and potential clinical applicability of the findings.

  • The models were trained and evaluated using robust methodology, including propensity score matching, multiple classifiers and explainable artificial intelligence (SHapley Additive exPlanations), enabling identification of patient-specific risk factors in a clinically transparent manner.

  • All features used for prediction were routinely collected in practice, supporting future integration of the model into electronic medical record systems and clinical decision support workflows.

  • As a retrospective study, the findings are susceptible to residual confounding and do not establish causal relationships between predictors and side-effect outcomes.

  • External validation using independent datasets or prospective cohorts was not performed, which limits generalisability across populations and healthcare systems.

Introduction

Rheumatoid arthritis (RA) is a chronic autoimmune disease that affects approximately 18.5 million individuals globally, with a prevalence in South Korea estimated to range from 0.19% to 1.85%.1,3 Methotrexate (MTX) is widely recommended as the first-line disease-modifying antirheumatic drug (DMARD) for RA based on international treatment guidelines.4 5 While MTX is effective in disease control, up to 47% of patients experience an inadequate therapeutic response, require additional treatments or report adverse side effects (SE).6 Commonly reported SE include gastrointestinal (GI) disturbances, liver function abnormalities and haematological toxicity.6 These issues frequently complicate long-term treatment adherence and clinical outcomes, underscoring the need for better tools to predict and manage MTX-related SE in real-world clinical practice.

Co-administration of non-steroidal anti-inflammatory drugs (NSAIDs) with MTX is common in the management of RA. Still, it significantly increases the risk of adverse drug reactions, including GI bleeding, hepatic dysfunction and renal impairment.7 8 The long-term use of MTX itself has been associated with a broad spectrum of SE, most notably GI intolerance and hepatotoxicity. Many previous studies have reported that up to 70% of patients with RA receiving MTX experience treatment-related adverse events (AEs).9,15 GI SE, such as nausea, diarrhoea and dyspepsia, are reported in approximately 20%–30% of patients. Elevated liver enzymes are observed in 15%–25%, and haematological abnormalities occur less frequently but remain clinically relevant.14 15 Notably, one cohort study identified moderate-to-severe MTX intolerance in nearly 68% of patients with RA or psoriatic arthritis, with 19% experiencing severe intolerance.12 In addition to drug-related factors, several patient-specific variables have been implicated as predictors of MTX-related toxicity. Studies have shown that female sex, younger age, smoking status and high baseline disease activity, as measured by the Disease Activity Score in 28 joints (DAS28), are significantly associated with an increased risk of adverse effects.16,21 These associations highlight the multifactorial nature of MTX toxicity and the limitations of conventional statistical approaches in capturing complex, non-linear interactions between clinical, demographic and behavioural risk factors. Accordingly, there is a critical need for robust, data-driven tools that can integrate diverse patient-level variables to improve individual risk stratification. Such tools may support early identification of patients at high risk for MTX intolerance and inform clinical decision-making aimed at minimising toxicity while optimising therapeutic outcomes.

Machine learning (ML) has increasingly been adopted in various medical fields, including rheumatology, for its ability to uncover complex patterns in high-dimensional clinical data and enhance predictive performance beyond conventional statistical methods. In RA, ML applications have primarily focused on predicting treatment efficacy, such as MTX discontinuation5 22 or response classification based on DAS28-erythrocyte sedimentation rate (ESR) or American College of Rheumatology (ACR)20 criteria.22 23 However, relatively little attention has been given to using ML for the prediction of adverse drug reactions, particularly MTX-related toxicity. Given the high prevalence and clinical impact of MTX-associated SE, reliable prediction models can serve as a core component of clinical decision support systems (CDSS), enabling the early identification of high-risk patients and facilitating individualised treatment planning. Despite this clinical relevance, robust ML-based tools for predicting SE in RA remain underdeveloped, especially those trained on standardised, multicentre datasets with real-world applicability. Furthermore, the integration of explainable artificial intelligence (XAI) techniques, such as SHapley Additive exPlanations (SHAP), enables individualised interpretation of ML predictions.24 25 These methods facilitate transparency by quantifying the contribution of each input variable to a given prediction, providing case-level explanations that support clinical interpretation and informed decision-making.

In this study, we aimed to develop and validate ML models to predict treatment-related SE in patients with RA receiving MTX. Using a large-scale, real-world dataset derived from multiple medical centres across South Korea, we stratified patients into two outcome groups: those experiencing overall SE and those with GI SE. By incorporating SHAP, we identified clinically relevant risk factors contributing to SE occurrence and evaluated the predictive performance of several ML classifiers. Our goal was to propose an optimised, interpretable model framework that could support personalised risk assessment and future CDSS in rheumatology.

Methods

Study design and data source

This retrospective case-control study aimed to classify the occurrence of SE in patients with RA using ML. ML models were developed to predict drug-related SE, focusing on both overall SE and GI SE. The definition of GI SE is provided in online supplemental table S1. Participants for this study were recruited from the Korean Observational Study Network for Arthritis (KORONA) cohort, a dataset compiled by the Clinical Research Center for RA, encompassing data from 23 hospitals across South Korea.26 The KORONA cohort is a prevalent cohort of patients with RA who had been diagnosed and received drug treatment prior to enrolment. Enrolment was conducted between 2009 and 2012 and participants were followed annually for approximately 5 years. All patients included in this study had been receiving MTX treatment before enrolment. Information on both overall SE and GI SE was obtained at the time of cohort enrolment (2009–2012) using a structured questionnaire. The question about overall SE was: “Have you experienced any drug-related discomfort after starting treatment?” The question about GI SE was: “Have you experienced any discomfort related to the gastrointestinal tract?” Additionally, we used the Strengthening the Reporting of Observational Studies in Epidemiology reporting guideline to draft this manuscript.27

Participants

A total of 5077 patients were enrolled in the KORONA database between July 2009 and March 2012 through interviews, self-administered questionnaires and clinical examinations. Disease activity was evaluated by qualified physicians. Patients were excluded for the following reasons: missing demographic information (n=111), missing clinical data (n=2050), missing drug information (n=130) and no MTX prescription (n=411), leaving 2375 eligible patients. After 1:1 propensity score matching (PSM), 1654 and 1218 patients were included in the overall SE and GI SE analyses, respectively (figure 1). All patients with RA were over 18 years of age, were recruited by rheumatologists during routine clinic visits, met the 1987 ACR classification criteria for RA and were scheduled for routine blood tests.

Figure 1. Flow chart of patient selection. ACR, American College of Rheumatology; DAS28, Disease Activity Score in 28 joints; DMARD, disease-modifying antirheumatic drug; ESR, erythrocyte sedimentation rate; GI, gastrointestinal; ICD-10, International Classification of Diseases, 10th Revision; MTX, methotrexate; VAS, visual analogue scale.


Variables

The 58 variables analysed in this study are detailed in online supplemental table S2 and reflect patient status at the time of cohort registration. These variables were categorised into demographic and clinical data. Demographic information included family history (first-degree and second-degree relatives), body mass index (BMI), smoking status, alcohol consumption and national health insurance status. Smoking was classified as: (1) never (fewer than 100 cigarettes in a lifetime), (2) quit (previously smoked but no longer active) or (3) current (actively smoking at the time of data collection). Alcohol use was categorised as: (1) never (excludes occasional ceremonial sips), (2) abstinent (previously consumed but no longer active) or (3) current (actively consuming alcohol). Insurance status was classified as either national health insurance or medical aid enrolment. Comorbidities, including hypertension, diabetes and thyroid disease, were recorded based on the patient’s status at the time of data collection. Additional health-related variables included fracture history, prior surgeries for RA, hospitalisation, morning stiffness and bone mineral density check-ups, all noted according to the patient’s reported experience.

Disease duration (in months) was calculated from the time of diagnosis to the cohort entry date. The ACR criteria were documented using the 1987 standard, recording the number of items met at the time of initial diagnosis. Laboratory data, including white blood cell count, haemoglobin, alanine aminotransferase (ALT), aspartate transaminase, blood urea nitrogen (BUN) and anticyclic citrullinated peptide antibody levels, reflected the most recent test conducted within 3 months of cohort registration. Medication usage was recorded based on prescriptions at the time of cohort entry. Drug compliance was categorised by the number of days on which medication was missed in the past 2 months, as follows: 0 days missed, 1–5 days missed, 6–15 days missed, 16–30 days missed and >30 days missed. The DAS28-ESR was calculated using tender/swollen joint counts, ESR and the visual analogue scale (VAS) for global health.27 The Health Assessment Questionnaire (HAQ) was assessed using the Korean adaptation.28
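As an illustration of how the four components combine, the sketch below uses the widely published DAS28-ESR formula. The article does not print the formula, so treating this as the calculation used in the cohort is an assumption; the function name and the example patient are hypothetical.

```python
import math


def das28_esr(tender28: int, swollen28: int, esr_mm_h: float, global_health_vas: float) -> float:
    """DAS28-ESR from 28-joint counts, ESR (mm/h) and global health VAS (0-100 mm).

    Standard published formula (assumed, not quoted from the article):
    0.56*sqrt(TJC28) + 0.28*sqrt(SJC28) + 0.70*ln(ESR) + 0.014*GH
    """
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive because the formula takes its logarithm")
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.70 * math.log(esr_mm_h)
        + 0.014 * global_health_vas
    )


# Hypothetical patient, not taken from the cohort
score = das28_esr(tender28=4, swollen28=1, esr_mm_h=20, global_health_vas=30)
print(round(score, 2))
```

Because the joint counts enter through a square root and ESR through a logarithm, the score responds most strongly to changes at the low end of each component.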

ML models

Six ML models, including support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGB), light gradient boosting (LGB) and categorical boosting (CAT), were constructed to classify both overall SE and GI SE.29,34 SVM handles high-dimensional data by creating a decision hyperplane for non-linear classification using kernel functions. RF enhances accuracy and robustness through an ensemble of decision trees and bootstrap sampling. GBM improves predictive accuracy by sequentially training decision trees to correct errors from prior iterations. XGB optimises gradient boosting with parallel processing and bucket-based data discretisation. LGB improves efficiency through gradient-based sampling and feature bundling. CAT processes categorical variables efficiently with minimal preprocessing, using ordered boosting to reduce overfitting and target leakage. The dataset was split into 80% training and 20% testing subsets. Minimum-maximum normalisation was applied to mitigate the impact of scale differences on model performance.35 For model validation, stratified fivefold cross-validation and Bayesian optimisation were used to fine-tune hyperparameters.36 Model performance was assessed using two approaches presented as follows:

  • A fixed test dataset comprising 20% of the total data.

  • Results generated from 1000 bootstrap samples, reported as the mean and 95% CI.
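The second approach can be sketched as a percentile bootstrap over the fixed test set. This is a minimal sketch under stated assumptions: the article does not say whether a percentile or another interval was used, and the labels, predictions and function names below are hypothetical.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Resample the test set with replacement and summarise the metric.

    Returns (mean, lower, upper) where lower/upper are percentile bounds.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap resample
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.mean(stats), lower, upper


# Hypothetical test set of 100 patients with 30 misclassified (accuracy 0.70)
y_true = [1, 0] * 50
y_pred = [1 - t if i < 30 else t for i, t in enumerate(y_true)]
mean_acc, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy {mean_acc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Any metric with the same `(y_true, y_pred)` signature can be passed in place of `accuracy`.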

All ML programming was conducted in Python (V.3.8.10) using Scikit-learn (V.1.2.0). Hyperparameter tuning was performed using Optuna (V.3.3.0) with the Tree-structured Parzen Estimator algorithm.36

Model performance assessment and interpretation

The performance of the ML models was evaluated using the following metrics: accuracy, precision, recall, F1 score, area under the curve (AUC), specificity, Brier score and area under the precision-recall curve (AUPRC). Accuracy measures the overall correctness of the model. Precision evaluates the proportion of true positive predictions among all positive predictions. Recall (sensitivity) assesses the model’s ability to detect positive cases. Specificity measures the model’s ability to identify negative cases correctly. The F1 score represents the harmonic mean of precision and recall, balancing these two aspects of performance. Although these metrics rely on threshold-based evaluations, the Brier score and AUPRC provide additional insights. The Brier score assesses the accuracy of probabilistic predictions, with lower values indicating more accurate performance. In contrast, the AUPRC evaluates the precision-recall trade-off, which is particularly useful for datasets with class imbalances.
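The definitions above can be made concrete with a small pure-Python sketch. The labels and probabilities are hypothetical, the 0.5 threshold is an assumption (the article does not state the cut-off), and AUPRC is omitted for brevity.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, specificity and F1 at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = side effect) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
metrics = threshold_metrics(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
```

The AUC and Brier score use the probabilities directly, so they do not change when the threshold passed to `threshold_metrics` changes.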

Interpretation

To identify key predictors of SE classification, we used SHAP values, which quantify the contribution of each feature to a prediction.24 37 SHAP provides both global explanations ranking feature importance across the cohort and local explanations illustrating how each predictor influences an individual patient’s risk. Additionally, since SHAP assumes feature independence, we analysed the correlations among the model features to ensure that the results are reliable and interpretable (online supplemental figure S1). For each SE group, we inspected the top features ranked by mean absolute SHAP value and visualised SHAP summary plots to show how high or low values of each feature affected SE risk. This analysis highlighted the most influential factors for each SE.
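To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values for a toy three-feature model by enumerating every feature coalition. It is not the algorithm inside the SHAP library used in the study, which relies on faster model-specific methods; the toy model, its coefficients and the background rows are all hypothetical. Features outside a coalition are replaced by background values, which is where the feature-independence assumption enters.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x` relative to a background sample."""
    n = len(x)

    def value(subset):
        # Expected output when features in `subset` are fixed to x and the
        # rest are drawn from the background rows (independence assumption).
        total = 0.0
        for row in background:
            z = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())


def toy_model(z):
    """Hypothetical risk score over (age, HAQ, celecoxib use); not the study model."""
    age, haq, celecoxib = z
    return 0.5 - 0.01 * (age - 55) + 0.2 * haq + 0.1 * celecoxib + 0.05 * haq * celecoxib


background = [[50, 0.5, 0], [60, 1.0, 1], [55, 0.0, 0], [65, 1.5, 1]]
patient = [40, 1.2, 1]
phi, base_value = shapley_values(toy_model, patient, background)
print(dict(zip(["age", "HAQ", "celecoxib"], (round(v, 3) for v in phi))))
```

The contributions always sum to the difference between the patient's prediction and the average background prediction, which is the additive property that lets a summary plot be read feature by feature.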

Statistical analysis

To enhance robustness and balance baseline covariates, we performed 1:1 nearest neighbour PSM, evaluated using standardised mean difference (SMD), where SMD <0.1 indicated negligible differences between groups (online supplemental table S3 and figure S2).38 Patient characteristics were reported as means with SD or counts (%). Differences between the SE and non-SE groups, for overall SE and GI SE separately, were assessed using independent t-tests for continuous variables and χ² tests for categorical variables. These comparisons were performed for descriptive purposes, to compare baseline characteristics between groups. As this analysis was exploratory in nature, no multiple comparison correction was applied. Statistical significance and interpretation of associations were primarily based on the multivariate logistic regression models. ORs were calculated using univariate and multivariate logistic regression models. To statistically assess differences in the discriminative performance among models, DeLong’s test was applied to compare AUC values. This test is suitable for comparing correlated ROC curves derived from the same set of subjects. To check feature independence, as assumed by SHAP, we performed Pearson’s correlation analysis, which showed low correlations among features, with no pairs exceeding a threshold of 0.7 (online supplemental figure S4), indicating minimal multicollinearity. All statistical analyses were conducted using R software (V.4.4.1).
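The matching and balance check can be sketched as follows. This is a simplified illustration, not the authors' R workflow: it assumes greedy matching without replacement and no caliper, takes the propensity scores as given rather than fitting a logistic model, and uses hypothetical ages and scores.

```python
import math
import statistics


def smd(group_a, group_b):
    """Absolute standardised mean difference for a continuous covariate."""
    pooled_sd = math.sqrt((statistics.variance(group_a) + statistics.variance(group_b)) / 2)
    return abs(statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def nearest_neighbour_match(ps_cases, ps_controls):
    """Greedy 1:1 nearest neighbour matching on the propensity score.

    Returns (case_index, control_index) pairs; each control is used once.
    """
    available = set(range(len(ps_controls)))
    pairs = []
    for i, score in enumerate(ps_cases):
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_controls[k] - score))
        available.remove(j)
        pairs.append((i, j))
    return pairs


# Hypothetical propensity scores and ages
ps_cases = [0.30, 0.55, 0.70]
age_cases = [50, 60, 70]
ps_controls = [0.10, 0.32, 0.50, 0.58, 0.72, 0.90]
age_controls = [30, 52, 40, 61, 69, 80]

pairs = nearest_neighbour_match(ps_cases, ps_controls)
matched_control_ages = [age_controls[j] for _, j in pairs]

smd_before = smd(age_cases, age_controls)
smd_after = smd(age_cases, matched_control_ages)
print(f"SMD before {smd_before:.2f}, after {smd_after:.2f}")
```

In this toy example the SMD for age falls below the 0.1 threshold after matching, which is the balance criterion described above.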

Results

Patient characteristics

Original patient characteristics are summarised in online supplemental table S4, with PSM characteristics detailed in table 1 and online supplemental table S5. In the overall SE group, the mean age was 54.8±11.4 years and the mean BMI was 22.7±3.1. The proportion of females was high at 1461 (88.3%), consistent with the entire cohort. Specific comorbidities, including pulmonary tuberculosis, gastritis and diabetes mellitus, showed significant differences (p<0.05) between patients with and without SE. Additional clinical variables, such as fracture history, hospitalisation, systolic blood pressure (SBP) and tender/swollen joint count, were also significantly associated with the presence of SE (p<0.05). Among patients with SE, the physician’s VAS score was 26.2±17.4 and the sleep disturbance score was 31.0±30.0, both significantly higher than in patients without SE (p<0.05). Medication-related factors, including drug adherence and the use of hydroxychloroquine, leflunomide, other DMARDs, celecoxib and nabumetone, also showed significant differences (p<0.05).

Table 1. Investigated patient characteristics.

| Variable | Overall SE: Total (n=1654) | Overall SE: Non-existent (n=827) | Overall SE: Exist (n=827) | Overall SE: P value | GI SE: Total (n=1218) | GI SE: Non-existent (n=609) | GI SE: Exist (n=609) | GI SE: P value |
|---|---|---|---|---|---|---|---|---|
| Age | 54.8±11.4 | 55.8±10.5 | 53.9±12.1 | <0.05 | 54.7±11.6 | 55.3±10.9 | 54.0±12.2 | <0.05 |
| Sex (male) | 193 (11.7%) | 113 (13.7%) | 80 (9.7%) | <0.05 | 123 (10.1%) | 71 (11.7%) | 52 (8.5%) | 0.071 |
| BMI | 22.7±3.1 | 22.9±2.9 | 22.5±3.2 | <0.05 | 22.6±3.1 | 22.7±2.9 | 22.4±3.2 | 0.112 |
| MTX dose (mg/week) | 12.68±2.92 | 12.55±2.99 | 12.79±2.82 | 0.086 | 12.63±2.92 | 12.51±2.99 | 12.73±2.83 | 0.192 |
| National health insurance | 1573 (95.1%) | 791 (95.6%) | 782 (94.6%) | 0.305 | 1155 (94.8%) | 586 (96.2%) | 569 (93.4%) | <0.05 |
| Pulmonary tuberculosis | 91 (5.5%) | 33 (4.0%) | 58 (7.0%) | <0.05 | 70 (5.7%) | 28 (4.6%) | 42 (6.9%) | 0.085 |
| Gastritis | 315 (19.0%) | 136 (16.4%) | 179 (21.6%) | <0.05 | 249 (20.4%) | 98 (16.1%) | 151 (24.8%) | <0.001 |
| Gastric ulcer | 85 (5.1%) | 35 (4.2%) | 50 (6.0%) | 0.095 | 70 (5.7%) | 25 (4.1%) | 45 (7.4%) | <0.05 |
| Diabetes mellitus | 133 (8.0%) | 79 (9.6%) | 54 (6.5%) | <0.05 | 88 (7.2%) | 53 (8.7%) | 35 (5.7%) | <0.05 |
| Fracture history | 300 (18.1%) | 131 (15.8%) | 169 (20.4%) | <0.05 | 225 (18.5%) | 89 (14.6%) | 136 (22.3%) | <0.05 |
| Hospitalisation | 467 (28.2%) | 212 (25.6%) | 255 (30.8%) | <0.05 | 344 (28.2%) | 157 (25.8%) | 187 (30.7%) | 0.056 |
| SBP | 126.4±14.6 | 125.3±13.0 | 127.5±16.0 | <0.05 | 126.4±14.2 | 125.5±13.0 | 127.2±15.3 | <0.05 |
| Morning stiffness | 932 (56.3%) | 452 (54.7%) | 480 (58.0%) | 0.165 | 691 (56.7%) | 325 (53.4%) | 366 (60.1%) | <0.05 |
| Tender joint count | 3.5±4.3 | 3.1±3.9 | 3.9±4.7 | <0.001 | 3.6±4.3 | 3.1±3.8 | 4.0±4.7 | <0.001 |
| Swollen joint count | 1.7±2.7 | 1.6±2.5 | 1.9±2.8 | <0.05 | 1.8±2.6 | 1.6±2.4 | 2.0±2.9 | <0.05 |
| Drug adherence (days) | | | | <0.001 | | | | <0.05 |
| 0 days | 1111 (67.2%) | 588 (71.1%) | 523 (63.2%) | | 791 (64.9%) | 412 (67.7%) | 379 (62.2%) | |
| 1–5 days | 346 (20.9%) | 165 (20.0%) | 181 (21.9%) | | 266 (21.8%) | 133 (21.8%) | 133 (21.8%) | |
| 6–15 days | 110 (6.7%) | 38 (4.6%) | 72 (8.7%) | | 92 (7.6%) | 34 (5.6%) | 58 (9.5%) | |
| 16–30 days | 46 (2.8%) | 16 (1.9%) | 30 (3.6%) | | 41 (3.4%) | 16 (2.6%) | 25 (4.1%) | |
| >30 days | 16 (1.0%) | 3 (0.4%) | 13 (1.6%) | | 12 (1.0%) | 4 (0.7%) | 8 (1.3%) | |
| Inapplicable | 25 (1.5%) | 17 (2.1%) | 8 (1.0%) | | 16 (1.3%) | 10 (1.6%) | 6 (1.0%) | |
| Alternative medicines | 1190 (71.9%) | 569 (68.8%) | 621 (75.1%) | <0.05 | 881 (72.3%) | 428 (70.3%) | 453 (74.4%) | 0.109 |
| Hydroxychloroquine | 451 (27.3%) | 252 (30.5%) | 199 (24.1%) | <0.05 | 335 (27.5%) | 185 (30.4%) | 150 (24.6%) | <0.05 |
| Leflunomide | 547 (33.1%) | 244 (29.5%) | 303 (36.6%) | <0.05 | 407 (33.4%) | 184 (30.2%) | 223 (36.6%) | <0.05 |
| Other DMARDs | 169 (10.2%) | 64 (7.7%) | 105 (12.7%) | <0.05 | 134 (11.0%) | 53 (8.7%) | 81 (13.3%) | <0.05 |
| Celecoxib | 519 (31.4%) | 226 (27.3%) | 293 (35.4%) | <0.001 | 392 (32.2%) | 173 (28.4%) | 219 (36.0%) | <0.05 |
| Nabumetone | 374 (22.6%) | 209 (25.3%) | 165 (20.0%) | <0.05 | 272 (22.3%) | 142 (23.3%) | 130 (21.3%) | 0.409 |
| ALT | 20.4±10.8 | 19.6±9.2 | 21.2±12.2 | <0.05 | 19.9±10.6 | 19.2±9.2 | 20.7±11.8 | <0.05 |
| AST | 22.3±7.5 | 21.9±6.5 | 22.7±8.5 | <0.05 | 22.1±7.5 | 21.8±6.6 | 22.5±8.3 | 0.151 |
| DAS28-ESR | 3.8±1.3 | 3.7±1.3 | 3.9±1.3 | <0.05 | 3.8±1.3 | 3.7±1.3 | 3.9±1.3 | <0.05 |
| HAQ | 0.7±0.6 | 0.6±0.6 | 0.8±0.7 | <0.001 | 0.7±0.6 | 0.6±0.6 | 0.8±0.6 | <0.001 |
| Physician VAS | 24.1±16.5 | 22.1±15.4 | 26.2±17.4 | <0.001 | 24.6±16.2 | 22.4±14.7 | 26.7±17.3 | <0.001 |
| Sleep disturbances | 29.5±29.2 | 28.0±28.4 | 31.0±30.0 | <0.05 | 30.3±29.3 | 28.7±28.4 | 32.0±30.0 | <0.05 |

Only variables with significant p values are listed. Drug adherence refers to the number of days each patient has not taken their medication over the past 2 months. Sex is recorded as male. National health insurance is provided to enrolled patients. Comorbidities, including pulmonary tuberculosis, fracture history, hospitalisation, morning stiffness and use of alternative medicines, are recorded for individuals experiencing them. All drugs from hydroxychloroquine to nabumetone are presented for those who have been prescribed them. Continuous variables were presented as mean±SD, and categorical variables were presented as numbers and percentages. P values for continuous variables were calculated using Student’s t-test, and p values for categorical variables were calculated using the χ2 test. Drug variables indicate whether each medication is currently prescribed to the patient.

ALT, alanine aminotransferase; AST, aspartate transaminase; BMI, body mass index; DAS28, Disease Activity Score in 28 joints; DMARD, disease-modifying antirheumatic drug; ESR, erythrocyte sedimentation rate; GI, gastrointestinal; HAQ, Health Assessment Questionnaire; MTX, methotrexate; SBP, systolic blood pressure; VAS, visual analogue scale.

OR by logistic regression

The results of univariate and multivariate logistic regression analyses for each SE group are summarised in table 2 and online supplemental table S6.

Table 2. ORs for each variable in each group.

| Variable | Overall SE, univariate: OR (95% CI) | P value | Overall SE, multivariate: OR (95% CI) | P value | GI SE, univariate: OR (95% CI) | P value | GI SE, multivariate: OR (95% CI) | P value |
|---|---|---|---|---|---|---|---|---|
| Age | 0.99 (0.98 to 0.99) | <0.001 | 0.97 (0.96 to 0.98) | <0.001 | 0.99 (0.98 to 1.00) | <0.05 | 0.97 (0.96 to 0.99) | <0.001 |
| National health insurance | 1.26 (0.81 to 1.99) | 0.300 | | | 1.79 (1.07 to 3.07) | <0.05 | 1.76 (1.01 to 3.14) | <0.05 |
| Pulmonary tuberculosis | 1.81 (1.18 to 2.84) | <0.05 | 1.84 (1.15 to 3.01) | <0.05 | 1.54 (0.94 to 2.54) | 0.087 | | |
| Gastritis | 1.40 (1.10 to 1.80) | <0.05 | 1.43 (1.10 to 1.87) | <0.05 | 1.72 (1.30 to 2.29) | <0.001 | 1.88 (1.39 to 2.55) | <0.001 |
| Gastric ulcer | 1.46 (0.94 to 2.28) | 0.100 | | | 1.86 (1.14 to 3.12) | <0.05 | 1.74 (1.02 to 3.01) | <0.05 |
| Fracture history | 1.36 (1.06 to 1.76) | <0.05 | 1.34 (1.01 to 1.77) | <0.05 | 1.68 (1.25 to 2.26) | <0.001 | 1.65 (1.20 to 2.28) | <0.05 |
| Hospitalisation | 1.29 (1.04 to 1.60) | <0.05 | 1.27 (1.00 to 1.61) | <0.05 | 1.28 (0.99 to 1.64) | 0.056 | | |
| SBP | 1.01 (1.00 to 1.02) | <0.05 | 1.02 (1.01 to 1.03) | <0.001 | 1.01 (1.00 to 1.02) | <0.05 | 1.01 (1.00 to 1.02) | <0.05 |
| Tender joint count | 1.04 (1.02 to 1.07) | <0.001 | 1.04 (1.01 to 1.09) | <0.05 | 1.05 (1.02 to 1.08) | <0.001 | 1.05 (1.00 to 1.10) | <0.05 |
| Drug adherence (days) | | | | | | | | |
| 1–5 days | 1.23 (0.97 to 1.57) | 0.089 | 1.28 (0.98 to 1.65) | 0.067 | 1.09 (0.82 to 1.44) | 0.600 | 1.18 (0.88 to 1.59) | 0.300 |
| 6–15 days | 2.13 (1.42 to 3.24) | <0.001 | 2.29 (1.49 to 3.57) | <0.001 | 1.85 (1.19 to 2.92) | <0.05 | 2.24 (1.39 to 3.65) | <0.05 |
| 16–30 days | 2.11 (1.15 to 4.00) | <0.05 | 2.16 (1.13 to 4.27) | <0.05 | 1.70 (0.90 to 3.29) | 0.110 | 1.69 (0.86 to 3.40) | 0.130 |
| >30 days | 4.87 (1.56 to 21.3) | <0.05 | 4.63 (1.41 to 20.9) | <0.05 | 2.17 (0.68 to 8.20) | 0.200 | 2.33 (0.70 to 9.05) | 0.200 |
| Inapplicable | 0.53 (0.21 to 1.20) | 0.140 | 0.45 (0.18 to 1.07) | 0.081 | 0.65 (0.22 to 1.77) | 0.400 | 0.56 (0.18 to 1.60) | 0.300 |
| Other DMARDs | 1.73 (1.25 to 2.41) | <0.001 | 1.95 (1.34 to 2.83) | <0.001 | 1.61 (1.12 to 2.33) | <0.05 | 1.73 (1.14 to 2.62) | <0.05 |
| Celecoxib | 1.46 (1.18 to 1.80) | <0.001 | 1.45 (1.14 to 1.85) | <0.05 | 1.42 (1.11 to 1.80) | <0.05 | 1.45 (1.12 to 1.89) | <0.05 |
| ALT | 1.01 (1.00 to 1.02) | <0.05 | 1.02 (1.00 to 1.04) | <0.05 | 1.01 (1.00 to 1.02) | <0.05 | 1.02 (1.01 to 1.03) | <0.05 |
| DAS28-ESR | 1.11 (1.03 to 1.19) | <0.05 | 0.87 (0.76 to 1.00) | 0.052 | 1.11 (1.02 to 1.22) | <0.05 | 0.82 (0.70 to 0.97) | <0.05 |
| HAQ | 1.40 (1.20 to 1.64) | <0.001 | 1.32 (1.07 to 1.63) | <0.05 | 1.48 (1.23 to 1.78) | <0.001 | 1.40 (1.09 to 1.79) | <0.05 |
| Physician VAS | 1.02 (1.01 to 1.02) | <0.001 | 1.02 (1.01 to 1.03) | <0.001 | 1.02 (1.01 to 1.02) | <0.001 | 1.02 (1.01 to 1.03) | <0.001 |

Only variables with significant p values are listed. Drug adherence refers to the number of days each patient has not taken their medication over the past 2 months. National health insurance is provided to enrolled patients. Comorbidities, including pulmonary tuberculosis, fracture history and hospitalisation, are recorded for individuals experiencing them. All drugs, from other DMARDs to celecoxib, are available for those who have been prescribed them.

ALT, alanine aminotransferase; DAS28, Disease Activity Score in 28 joints; DMARD, disease-modifying antirheumatic drug; ESR, erythrocyte sedimentation rate; GI, gastrointestinal; HAQ, Health Assessment Questionnaire; SBP, systolic blood pressure; VAS, visual analogue scale.

In the overall SE group, lower drug adherence, specifically missing doses for >30 days, showed the strongest association with SE (adjusted OR 4.63; 95% CI 1.41 to 20.9; p<0.05). Comorbid pulmonary tuberculosis (OR 1.84; 95% CI 1.15 to 3.01; p<0.05), use of other DMARDs (OR 1.95; 95% CI 1.34 to 2.83; p<0.001) and celecoxib use (OR 1.45; 95% CI 1.14 to 1.85; p<0.05) were also independently associated with SE occurrence.

In the GI SE group, several clinical and treatment factors were significant. Fracture history (OR 1.65; 95% CI 1.20 to 2.28; p<0.05), higher HAQ scores (OR 1.40; 95% CI 1.09 to 1.79; p<0.05), physician VAS (OR 1.02; 95% CI 1.01 to 1.03; p<0.001) and celecoxib use (OR 1.45; 95% CI 1.12 to 1.89; p<0.05) increased GI SE risk, whereas older age (OR 0.97; 95% CI 0.96 to 0.99; p<0.001) and higher DAS28-ESR (OR 0.82; 95% CI 0.70 to 0.97; p<0.05) were inversely associated.

Taken together, poor medication adherence and comorbid conditions such as pulmonary tuberculosis and fracture history were consistent risk factors across models, highlighting the interplay of treatment-related and disease-related factors in MTX-related SE.
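For a single binary predictor, the unadjusted OR can be checked by hand from the counts in table 1. The sketch below does this for celecoxib using a Wald interval on the log-odds scale. The choice of interval is an assumption (the article does not state how the CIs were derived), so agreement with table 2 is approximate, and the adjusted ORs cannot be reproduced this way.

```python
import math


def odds_ratio_wald(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio from a 2x2 table with a Wald 95% CI on the log scale."""
    odds_ratio = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_controls + 1 / unexposed_cases + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Celecoxib counts from table 1 (827 patients per arm for overall SE, 609 for GI SE)
overall = odds_ratio_wald(293, 226, 827 - 293, 827 - 226)
gi = odds_ratio_wald(219, 173, 609 - 219, 609 - 173)

for label, (or_, lo, hi) in (("overall SE", overall), ("GI SE", gi)):
    print(f"{label}: OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Both results land close to the univariate celecoxib estimates in table 2 (1.46, 1.18 to 1.80 and 1.42, 1.11 to 1.80).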

Model performance

Table 3 and figure 2 present the test performance of the six ML models developed to classify each SE group, with the training results from fivefold cross-validation presented in online supplemental table S7.

Table 3. Model test results performance.

| Outcome | Model | ACC | Precision | Recall | F1 score | AUC | Specificity | AUPRC | Brier score |
|---|---|---|---|---|---|---|---|---|---|
| Overall | SVM | 0.575±0.006 | 0.605±0.009 | 0.448±0.007 | 0.514±0.007 | 0.606±0.006 | 0.703±0.007 | 0.598±0.009 | 0.244±0.001 |
| Overall | RF | 0.608±0.006 | 0.621±0.009 | 0.571±0.009 | 0.594±0.007 | 0.650±0.007 | 0.647±0.007 | 0.652±0.009 | 0.240±0.000 |
| Overall | GBM | 0.691±0.005 | 0.702±0.008 | 0.669±0.008 | 0.684±0.006 | 0.781±0.005 | 0.713±0.008 | 0.743±0.008 | 0.191±0.002 |
| Overall | XGB | 0.684±0.005 | 0.703±0.008 | 0.646±0.008 | 0.672±0.006 | 0.781±0.005 | 0.723±0.007 | 0.757±0.008 | 0.194±0.002 |
| Overall | LGB | 0.679±0.005 | 0.702±0.008 | 0.629±0.007 | 0.662±0.006 | 0.763±0.005 | 0.730±0.007 | 0.725±0.009 | 0.205±0.002 |
| Overall | CAT | 0.662±0.005 | 0.646±0.008 | 0.725±0.007 | 0.683±0.006 | 0.721±0.006 | 0.596±0.008 | 0.714±0.008 | 0.216±0.002 |
| GI | SVM | 0.496±0.006 | 0.498±0.007 | 0.919±0.005 | 0.645±0.006 | 0.547±0.007 | 0.072±0.005 | 0.541±0.009 | 0.313±0.003 |
| GI | RF | 0.532±0.006 | 0.519±0.007 | 0.870±0.006 | 0.650±0.006 | 0.646±0.008 | 0.194±0.007 | 0.619±0.011 | 0.244±0.001 |
| GI | GBM | 0.634±0.006 | 0.596±0.008 | 0.828±0.007 | 0.693±0.007 | 0.679±0.007 | 0.438±0.008 | 0.643±0.011 | 0.228±0.002 |
| GI | XGB | 0.623±0.007 | 0.585±0.008 | 0.843±0.007 | 0.690±0.007 | 0.701±0.007 | 0.402±0.008 | 0.670±0.010 | 0.234±0.003 |
| GI | LGB | 0.621±0.006 | 0.584±0.008 | 0.841±0.007 | 0.689±0.007 | 0.707±0.007 | 0.401±0.008 | 0.677±0.010 | 0.231±0.003 |
| GI | CAT | 0.637±0.007 | 0.643±0.010 | 0.616±0.009 | 0.629±0.008 | 0.698±0.008 | 0.658±0.008 | 0.667±0.011 | 0.224±0.003 |

Results of each model were presented as mean with 95% CI.

ACC, accuracy; AUC, area under the curve; AUPRC, area under the precision-recall curve; CAT, categorical boosting; GBM, gradient boosting machine; GI, gastrointestinal; LGB, light gradient boosting machine; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting.

Figure 2. The ROC curve of predictive classification. The predictive classification results for the drug SE group, by six ML models, are presented. (A) Overall SE, (B) GI SE, (C) best optimisation model of XGB for both groups. AUC, area under the curve; CAT, categorical boosting; GBM, gradient boosting machine; GI, gastrointestinal; LGB, light gradient boosting machine; ML, machine learning; RF, random forest; ROC, receiver operating characteristic; SE, side effect; SVC, support vector classifier; XGB, extreme gradient boosting.


In the overall SE group, GBM and XGB achieved the highest area under the curve (AUC) of 0.781±0.005, followed by LGB with an AUC of 0.763±0.005. The CAT and RF models demonstrated AUCs of 0.721±0.006 and 0.650±0.007, respectively. In the GI SE group, the LGB model exhibited the highest AUC of 0.707±0.007, followed by XGB with an AUC of 0.701±0.007. Thus, in terms of AUC alone, GBM (tied with XGB) performed best for overall SE and LGB performed best for GI SE. However, when the other metrics were also considered, XGB showed the most consistent performance across both outcomes, with an AUPRC of 0.757±0.008 and 0.670±0.010 and an F1 score of 0.672±0.006 and 0.690±0.007 for overall SE and GI SE, respectively.

Additionally, the AUCs in the two groups were compared using DeLong’s test (table 4). In the overall SE group, the AUC of SVM (labelled SVC in table 4) differed significantly from those of GBM, XGB and LGB (p<0.001) and from that of CAT (p=0.006). RF differed significantly from GBM, XGB and LGB, whereas no significant differences were found among the boosting models. In the GI SE group, SVM differed significantly from LGB (p<0.001) and from GBM, XGB and CAT (all p<0.05). No significant differences were found between RF and the other models.

Table 4. ROC comparison by DeLong’s test.

Comparison Overall side effect (p value) GI side effect (p value)
SVC versus RF 0.317 0.050
SVC versus GBM <0.001 0.007
SVC versus XGB <0.001 0.001
SVC versus LGB <0.001 <0.001
SVC versus CAT 0.006 0.002
RF versus GBM 0.001 0.455
RF versus XGB 0.001 0.216
RF versus LGB 0.005 0.159
RF versus CAT 0.077 0.275
GBM versus XGB 0.981 0.624
GBM versus LGB 0.643 0.508
GBM versus CAT 0.142 0.731
XGB versus LGB 0.626 0.864
XGB versus CAT 0.135 0.884
LGB versus CAT 0.314 0.751

CAT, categorical boosting; GBM, gradient boosting machine; LGB, light gradient boosting machine; RF, random forest; ROC, receiver operating characteristic; SVC, support vector classifier; XGB, extreme gradient boosting.
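The pairwise comparisons in table 4 rely on DeLong’s test for two correlated ROC curves. The following minimal Python sketch shows the standard formulation of that test for two models scored on the same patients. It is not the authors' implementation: the function names and the toy scores are hypothetical, and the quadratic-time placement computation is chosen for readability rather than speed.

```python
import math

def _placements(pos, neg):
    """Structural components: per-positive and per-negative mean win rates."""
    psi = lambda x, y: 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def delong_test(y_true, score_a, score_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models on the same samples."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    a10, a01 = _placements([score_a[i] for i in pos_idx], [score_a[i] for i in neg_idx])
    b10, b01 = _placements([score_b[i] for i in pos_idx], [score_b[i] for i in neg_idx])
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    # Variance of the AUC difference; the variance of the paired component
    # differences equals S_aa + S_bb - 2 * S_ab in DeLong's notation.
    var = (_sample_var([x - y for x, y in zip(a10, b10)]) / m
           + _sample_var([x - y for x, y in zip(a01, b01)]) / n)
    if var <= 0:
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p

# Toy scores (assumption): two models evaluated on the same eight patients.
y = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
model_b = [0.6, 0.7, 0.4, 0.5, 0.8, 0.3, 0.5, 0.2]
auc_a, auc_b, z, p = delong_test(y, model_a, model_b)
print(f"AUC A={auc_a:.3f}, AUC B={auc_b:.3f}, z={z:.2f}, p={p:.3f}")
```

Because the test uses paired components from the same patients, it accounts for the correlation between the two ROC curves, which an unpaired comparison of AUCs would ignore.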

Identification of risk factors and interpretation

The SHAP interpretation results for the six ML models are shown in figure 3 and online supplemental figures S3, S4. Figure 3 highlights the SHAP results for the XGB models in both groups. The SHAP summary plot visualises the effects of the top features on model predictions, with each instance represented by a dot for each feature. The x-axis represents the SHAP values, which reflect the impact of each feature on the prediction outcome, while the y-axis lists the features in ranked order; the spread of points along the x-axis indicates the magnitude of each feature’s effect on the model output. In the overall SE group, the 10 features with the greatest influence on the predictions were age, BUN, BMI, WBC count, HAQ score, SBP, haemoglobin, physician’s VAS score, celecoxib use and fatigue. These variables were consistent across the other ML models. In the GI SE group, the top 10 features were age, gastritis, BMI, BUN, HAQ score, haemoglobin, C reactive protein, oral steroid use, physician’s VAS score and disease duration. Notably, celecoxib use had a clear effect in the overall SE group but only a minimal impact in the GI SE group.
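The idea behind a SHAP value can be illustrated with a brute-force Shapley computation on a toy model. The sketch below is illustrative only: the risk function, its coefficients and the patient values are invented, features are "removed" by replacing them with a single baseline value, and the exhaustive enumeration is practical only for a handful of features. SHAP libraries use faster, model-specific algorithms for tree ensembles such as XGB.

```python
import itertools
import math

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude feature i
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in itertools.combinations(others, r):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

def toy_risk(v):
    """Hypothetical risk score over (age, BUN, BMI); coefficients are invented."""
    age, bun, bmi = v
    return 0.03 * age + 0.05 * bun - 0.02 * bmi + 0.001 * age * bun

patient = [68.0, 22.0, 21.5]    # hypothetical patient
reference = [55.0, 15.0, 23.0]  # hypothetical baseline (for example, cohort means)

phi = shapley_values(toy_risk, patient, reference)
for name, value in zip(["age", "BUN", "BMI"], phi):
    print(f"{name}: {value:+.4f}")

# Additivity: the contributions sum to the difference from the baseline prediction.
print(sum(phi), toy_risk(patient) - toy_risk(reference))
```

Each dot in a SHAP summary plot corresponds to one such per-patient, per-feature contribution, and the additivity property is what allows the contributions to be read as a decomposition of an individual prediction.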

Figure 3. Interpretation of best performance results model by SHapley Additive exPlanations (SHAP) for each side effect group. ACR, American College of Rheumatology; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMD, bone mineral density; BMI, body mass index; BP, blood pressure; BUN, blood urea nitrogen; CCP, cyclic citrullinated peptide; CRP, C reactive protein; DAS28, Disease Activity Score in 28 joints; DMARD, disease-modifying antirheumatic drug; ESR, erythrocyte sedimentation rate; HAQ, Health Assessment Questionnaire; NSAID, non-steroidal anti-inflammatory drug; RA, rheumatoid arthritis; RF, rheumatoid factor; VAS, visual analogue scale.


Discussion

This study aimed to develop six ML models that accurately predict SE in patients with RA taking MTX, and to enhance model interpretability using SHAP. Using a multicentre Korean cohort (KORONA), we constructed prediction models for two clinical outcomes: overall SE and GI SE. The incorporation of XAI allowed us to identify patient-level predictors with clinical relevance, offering a foundation for individualised risk assessment. RA is a systemic autoimmune disease that affects over 18 million people worldwide and has a prevalence of 0.19% in South Korea.1,3 Although MTX remains the standard first-line treatment, adverse effects, particularly GI toxicity, are common, with reported rates as high as 70%.9 11,13 These toxicities often lead to dose reduction or discontinuation, underscoring the need for tools that enable early identification of high-risk patients. Among the six ML models evaluated, the optimised XGB algorithm demonstrated the best performance for overall SE prediction (AUC of 0.781±0.005, accuracy of 0.684±0.005, F1 score of 0.672±0.006 and AUPRC of 0.757±0.008). For GI SE, the same model achieved an AUC of 0.701±0.007, accuracy of 0.623±0.007, F1 score of 0.690±0.007 and AUPRC of 0.670±0.010. These models surpassed traditional classifiers in both discrimination and reliability. Importantly, SHAP-enabled model interpretation allowed transparent, case-level explanations of prediction outcomes. Such SHAP-enhanced ML frameworks combine strong predictive power with clinical interpretability, making them suitable for future integration into CDSS in rheumatology.

Several recent studies have used the KORONA dataset to investigate various aspects of RA in Korean patients. For instance, Choi et al examined disparities in the use of biological agents based on socioeconomic status,39 Kim et al studied the impact of menopause on RA clinical outcomes40 and Park et al applied ML techniques to predict osteoporosis in patients with RA.41 However, to date, no studies have focused on the SE experienced by patients receiving MTX using this dataset. Kearsley-Fleet et al used Kaplan-Meier survival analysis and logistic regression to study MTX use in patients with juvenile idiopathic arthritis, reporting that 54% discontinued treatment within 2 years and 37% experienced adverse drug reactions.18 Similarly, Hu et al applied ML to predict MTX-induced hepatotoxicity in 782 Chinese patients with RA using electronic medical records (EMR).42 While both studies provided valuable insights into MTX safety, the absence of laboratory data limited their ability to accurately identify predictors of adverse outcomes, underscoring the importance of more comprehensive datasets.18 42 Kim et al identified novel genetic markers, such as rs12551103 and rs13265933, associated with dermatological AEs in patients with RA treated with tumour necrosis factor-alpha inhibitors, emphasising the role of FERM Domain Containing 3 in these complications.43 Vodencarevic et al conducted the first ML-based flare prediction study using high-quality randomised controlled trial data, identifying key features such as biological DMARD dose change rates and inflammatory markers.44 However, both studies were constrained by small sample sizes, highlighting the need for large-scale research to validate their findings.43 44 Conversely, our study focused solely on patients with RA using MTX, developing predictive ML models for two types of SE and comparing these models to identify the most optimised approach for classification.

This study presents several strengths that support its clinical relevance and translational value. First, we developed interpretable ML models to predict SE in patients with RA taking MTX, using a large-scale, multicentre real-world cohort (KORONA). Unlike previous ML-based studies, which have primarily focused on treatment efficacy such as MTX response or discontinuation prediction,5 22 23 our work targets the underexplored yet clinically critical domain of drug safety. Notably, we applied XAI techniques—specifically SHAP—to quantify the influence of individual risk factors. This enhanced model transparency and enabled case-level interpretation, supporting trust and utility in clinical decision-making.24 25 37 Second, we demonstrated that the use of routinely collected clinical variables—including demographics, physician VAS, laboratory profiles (ALT, BUN, haemoglobin) and treatment patterns (eg, NSAID and steroid co-administration)—can enable accurate prediction of both overall SE and GI SE. Key predictors identified in our models, such as elevated ALT and HAQ scores, have been previously associated with MTX intolerance and liver toxicity.6 14 15 42 45 Additionally, SBP variability and co-use of cyclooxygenase-2 inhibitors, both of which emerged as meaningful contributors, are supported by previous findings linking them to vascular complications and GI toxicity.46,48 These results indicate that our models are grounded in clinically relevant mechanisms and could be readily integrated into risk assessment protocols. Third, our work advances the clinical utility of predictive modelling by emphasising interpretable, individualised risk assessment, a critical component for CDSS. 
While prior RA-related ML studies have shown high predictive performance, few have incorporated both interpretable AI and statistically validated multicentre cohorts.41 43 44 By combining SHAP-based interpretability with logistic regression confirmation of feature relevance, our models deliver high AUC while maintaining transparency, an essential feature for clinician adoption. Fourth, we addressed confounding bias by applying 1:1 PSM, which enhanced baseline balance and supported causal inference.49 Compared with other balancing techniques such as oversampling or synthetic augmentation, PSM provided improved reliability for observational datasets while preserving real-world generalisability. Furthermore, our validation strategy, which combined stratified fivefold cross-validation with 1000-iteration bootstrap resampling, supports robust and reproducible estimates of model performance.50 Finally, this study bridges a critical translational gap by demonstrating that advanced, explainable ML models can predict treatment-limiting AEs in patients with RA using variables that are readily available in clinical practice. The framework proposed here could feasibly be implemented in an EMR-linked CDSS for real-time patient risk stratification and treatment optimisation. Given the high global burden of RA,1,3 the limitations of MTX despite guideline endorsement4 6 and the frequent co-use of NSAIDs and steroids that elevate AE risks,7 8 47 47 51,54 our findings provide timely and actionable insights for improving therapeutic safety.
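The 1:1 matching step can be sketched as follows. The matching algorithm is not described in this section, so greedy nearest-neighbour matching without replacement, the 0.05 caliper, the function name and the toy propensity scores below are all assumptions made for illustration.

```python
def match_one_to_one(ps_cases, ps_controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement.

    Returns a list of (case_index, control_index) pairs. Cases with no
    unmatched control within the caliper are left unmatched.
    """
    available = list(range(len(ps_controls)))
    pairs = []
    # Process cases in order of propensity score so the result is deterministic.
    for case in sorted(range(len(ps_cases)), key=lambda i: ps_cases[i]):
        if not available:
            break
        best = min(available, key=lambda j: abs(ps_controls[j] - ps_cases[case]))
        if abs(ps_controls[best] - ps_cases[case]) <= caliper:
            pairs.append((case, best))
            available.remove(best)  # each control is used at most once
    return pairs

# Toy propensity scores (assumption): three patients with SE, five without.
ps_cases = [0.30, 0.62, 0.90]
ps_controls = [0.10, 0.31, 0.60, 0.65, 0.33]

print(match_one_to_one(ps_cases, ps_controls))
```

In this toy example the third case has no control within the caliper and is dropped, which mirrors how matching reduces the analysed sample relative to the eligible cohort.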

Despite its strengths, our study also has some limitations. First, AE information was collected at cohort enrolment using a structured questionnaire that was uniformly applied across all participating hospitals in the KORONA cohort. The questionnaire included the following items: “Have you experienced any drug-related discomfort after starting treatment?” (overall SE) and “Have you experienced any discomfort related to the gastrointestinal tract?” (GI SE). While this standardised form ensured consistent data capture, the attribution of AEs specifically to MTX cannot be fully established because most patients were on combination regimens including NSAIDs and other DMARDs. Thus, the model predicts the occurrence of treatment-related AEs in patients with RA receiving MTX, rather than MTX-specific toxicity. Second, although MTX dosage information was included as a baseline variable, detailed cumulative exposure, dosing history and interval data were unavailable. This limitation may have restricted our ability to evaluate potential dose-response relationships. Third, some patient information, including comorbidities and family histories, was derived from self-reported questionnaires. Although these data offer valuable clinical insights, they may introduce recall bias and lack the quantitative reliability of clinically measured parameters. Fourth, a portion of participants was excluded due to missing data or during the PSM process. While this reduced the sample size, it improved internal validity by enhancing covariate balance, as verified by SMD <0.1 across both continuous and categorical variables. Fifth, table 1 comparisons were exploratory and not corrected for multiple testing. However, the main statistical inferences were drawn from multivariate logistic regression analyses, which mitigates the risk of false positives. 
Sixth, the KORONA dataset reflects clinical practice from 2009 to 2012, and the treatment landscape for RA has since evolved, particularly with the increased use of biologics and targeted synthetic DMARDs. Nonetheless, MTX remains the cornerstone first-line therapy for RA, and the identified risk factors are still mechanistically and clinically relevant in current practice. Finally, this study focused on identifying key predictive features for ML models, but their performance was not validated using independent external cohorts. Additionally, longitudinal follow-up was not conducted to assess temporal variations in SE occurrence. Although data were obtained from multiple medical institutions, the cohort predominantly comprised patients with high adherence and regular hospital attendance, which may limit generalisability to populations with lower healthcare accessibility.
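The covariate balance criterion mentioned above (SMD <0.1) can be checked with the commonly used standardised mean difference formulas for continuous and binary variables. The following minimal Python sketch assumes those standard formulas; the function names and the toy values are hypothetical and do not come from the KORONA data.

```python
import math
import statistics

def smd_continuous(group_a, group_b):
    """Standardised mean difference for a continuous covariate."""
    pooled_sd = math.sqrt((statistics.variance(group_a) + statistics.variance(group_b)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def smd_binary(group_a, group_b):
    """Standardised mean difference for a binary (0/1) covariate."""
    p_a = sum(group_a) / len(group_a)
    p_b = sum(group_b) / len(group_b)
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (p_a - p_b) / pooled_sd

# Toy matched groups (assumption): age in years and a binary comorbidity flag.
age_se = [61, 58, 70, 66, 55, 63]
age_no_se = [60, 59, 69, 67, 56, 62]
flag_se = [1, 0, 0, 1, 0, 1]
flag_no_se = [1, 0, 1, 0, 0, 1]

for name, value in [("age", smd_continuous(age_se, age_no_se)),
                    ("comorbidity", smd_binary(flag_se, flag_no_se))]:
    status = "balanced" if abs(value) < 0.1 else "imbalanced"
    print(f"{name}: SMD={value:.3f} ({status})")
```

Unlike a p value, the SMD does not depend on sample size, which is why it is often preferred for judging balance after matching.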

Conclusion

In conclusion, our study demonstrated the predictive classification of SE in patients with RA taking MTX using large real-world cohort data from South Korea. By analysing nationwide observational data encompassing clinical, pathological and lifestyle factors, we identified key risk predictors of drug-related SE, thereby enhancing the reliability and clinical relevance of our findings. Six ML models were developed from 58 variables, with the best models achieving AUCs of 0.781 for overall SE and 0.707 for GI SE. Furthermore, XAI methods highlighted critical predictors, including age, physician VAS score, drug adherence and celecoxib use. These results provide robust prognostic tools to aid clinicians in optimising treatment strategies and improving the management of patients with RA.

Supplementary material

online supplemental file 1
bmjopen-15-11-s001.pptx (1.4MB, pptx)
DOI: 10.1136/bmjopen-2025-108527
online supplemental file 2
bmjopen-15-11-s002.docx (81.2KB, docx)
DOI: 10.1136/bmjopen-2025-108527

Acknowledgements

The authors acknowledge the invaluable contributions of all investigators in the Korean Observational Study Network for Arthritis (KORONA).

Footnotes

Funding: This study was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (RS-2025-24535069 and HI23C0733). In addition, Eisai Korea provided support for this study.

Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2025-108527).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Not applicable.

Data availability free text: Data are available on reasonable request and provided at the discretion of the corresponding author. This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy or reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages) and is not responsible for any errors or omissions arising from translation, adaptation or otherwise.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Ethics approval: This study was approved by the Ethics Committee or Institutional Review Board of Kangwon National University Hospital (approval no. KNUH-2024-05-013), which waived the requirement for informed consent because anonymised data were used. In addition, all experiments were performed in accordance with relevant guidelines and regulations.

Data availability statement

Data are available on reasonable request.

References

  • 1. Shi G, Liao X, Lin Z, et al. Estimation of the global prevalence, incidence, years lived with disability of rheumatoid arthritis in 2019 and forecasted incidence in 2040: results from the Global Burden of Disease Study 2019. Clin Rheumatol. 2023;42:2297–309. doi: 10.1007/s10067-023-06628-2.
  • 2. Kim H, Cho SK, Kim JW, et al. An increased disease burden of autoimmune inflammatory rheumatic diseases in Korea. Semin Arthritis Rheum. 2020;50:526–33. doi: 10.1016/j.semarthrit.2019.11.007.
  • 3. Hur N-W, Choi C-B, Uhm W-S, et al. The Prevalence and Trend of Arthritis in Korea: Results from Korea National Health and Nutrition Examination Surveys. J Korean Rheum Assoc. 2008;15:11. doi: 10.4078/jkra.2008.15.1.11.
  • 4. Smolen JS, Landewé RBM, Bijlsma JWJ, et al. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2019 update. Ann Rheum Dis. 2020;79:685–99. doi: 10.1136/annrheumdis-2019-216655.
  • 5. Duong SQ, Crowson CS, Athreya A, et al. Clinical predictors of response to methotrexate in patients with rheumatoid arthritis: a machine learning approach using clinical trial data. Arthritis Res Ther. 2022;24:162. doi: 10.1186/s13075-022-02851-5.
  • 6. Shea B, Swinden MV, Tanjong Ghogomu E, et al. Folic acid and folinic acid for reducing side effects in patients receiving methotrexate for rheumatoid arthritis. Cochrane Database Syst Rev. 2013;2014:CD000951. doi: 10.1002/14651858.CD000951.pub2.
  • 7. Singh G, Ramey DR, Morfeld D, et al. Gastrointestinal tract complications of nonsteroidal anti-inflammatory drug treatment in rheumatoid arthritis. A prospective observational cohort study. Arch Intern Med. 1996;156:1530–6.
  • 8. Singh G, Rosen Ramey D. NSAID induced gastrointestinal complications: the ARAMIS perspective--1997. Arthritis, Rheumatism, and Aging Medical Information System. J Rheumatol Suppl. 1998;51:8–16.
  • 9. McKendry RJ, Cyr M. Toxicity of methotrexate compared with azathioprine in the treatment of rheumatoid arthritis. A case-control study of 131 patients. Arch Intern Med. 1989;149:685–9.
  • 10. Zhang A, Sun H, Qiu S, et al. NMR‐based metabolomics coupled with pattern recognition methods in biomarker discovery and disease diagnosis. Magnetic Reson in Chemistry. 2013;51:549–56. doi: 10.1002/mrc.3985.
  • 11. Zhang A, Sun H, Wang X. Potentiating therapeutic effects by enhancing synergism based on active constituents from traditional medicine. Phytother Res. 2014;28:526–33. doi: 10.1002/ptr.5032.
  • 12. Ćalasan MB, Bosch OFC, Creemers MCW, et al. Prevalence of methotrexate intolerance in rheumatoid arthritis and psoriatic arthritis. Arthritis Res Ther. 2013;15:R217. doi: 10.1186/ar4413.
  • 13. Li X, Zhang A, Sun H, et al. Metabolic characterization and pathway analysis of berberine protects against prostate cancer. Oncotarget. 2017;8:65022–41. doi: 10.18632/oncotarget.17531.
  • 14. Salliot C, van der Heijde D. Long-term safety of methotrexate monotherapy in patients with rheumatoid arthritis: a systematic literature research. Ann Rheum Dis. 2009;68:1100–4. doi: 10.1136/ard.2008.093690.
  • 15. Albrecht K, Müller-Ladner U. Side effects and management of side effects of methotrexate in rheumatoid arthritis. Clin Exp Rheumatol. 2010;28:S95–101.
  • 16. Möttönen T, Hannonen P, Korpela M, et al. Delay to institution of therapy and induction of remission using single‐drug or combination–disease‐modifying antirheumatic drug therapy in early rheumatoid arthritis. Arthritis & Rheumatism. 2002;46:894–8. doi: 10.1002/art.10135.
  • 17. Ganggayah MD, Taib NA, Har YC, et al. Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak. 2019;19:48. doi: 10.1186/s12911-019-0801-4.
  • 18. Kearsley-Fleet L, Vicente González L, Steinke D, et al. Methotrexate persistence and adverse drug reactions in patients with juvenile idiopathic arthritis. Rheumatology (Sunnyvale) 2019;58:1453–8. doi: 10.1093/rheumatology/kez048.
  • 19. Saevarsdottir S, Wedrén S, Seddighzadeh M, et al. Patients with early rheumatoid arthritis who smoke are less likely to respond to treatment with methotrexate and tumor necrosis factor inhibitors: Observations from the Epidemiological Investigation of Rheumatoid Arthritis and the Swedish Rheumatology Register cohorts. Arthritis & Rheumatism. 2011;63:26–36. doi: 10.1002/art.27758.
  • 20. Drouin J, Haraoui B, 3e Initiative Group. Predictors of clinical response and radiographic progression in patients with rheumatoid arthritis treated with methotrexate monotherapy. J Rheumatol. 2010;37:1405–10. doi: 10.3899/jrheum.090838.
  • 21. Teitsma XM, Jacobs JWG, Welsing PMJ, et al. Inadequate response to treat-to-target methotrexate therapy in patients with new-onset rheumatoid arthritis: development and validation of clinical predictors. Ann Rheum Dis. 2018;77:1261–7. doi: 10.1136/annrheumdis-2018-213035.
  • 22. Duquesne J, Bouget V, Cournède PH, et al. Machine learning identifies a profile of inadequate responder to methotrexate in rheumatoid arthritis. Rheumatology (Oxford) 2023;62:2402–9. doi: 10.1093/rheumatology/keac645.
  • 23. Lee S, Kang S, Eun Y, et al. Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis. Arthritis Res Ther. 2021;23:254. doi: 10.1186/s13075-021-02635-3.
  • 24. Ahmad MA, Eckert C, Teredesai A. Interpretable machine learning in healthcare. BCB ’18; Washington DC USA. Aug 15, 2018. pp. 559–60.
  • 25. Xu H, Tang RSY, Lam TYT, et al. Artificial Intelligence-Assisted Colonoscopy for Colorectal Cancer Screening: A Multicenter Randomized Controlled Trial. Clin Gastroenterol Hepatol. 2023;21:337–46. doi: 10.1016/j.cgh.2022.07.006.
  • 26. Sung Y-K, Cho S-K, Choi C-B, et al. Korean Observational Study Network for Arthritis (KORONA): establishment of a prospective multicenter cohort for rheumatoid arthritis in South Korea. Semin Arthritis Rheum. 2012;41:745–51. doi: 10.1016/j.semarthrit.2011.09.007.
  • 27. Elm E von, Altman DG, Egger M, et al. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335:806–8. doi: 10.1136/bmj.39335.541782.AD.
  • 28. Prevoo MLL, Van’T Hof MA, Kuper HH, et al. Modified disease activity scores that include twenty-eight-joint counts development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis & Rheumatism. 1995;38:44–8. doi: 10.1002/art.1780380107.
  • 29. Bae SC, Cook EF, Kim SY. Psychometric evaluation of a Korean Health Assessment Questionnaire for clinical research. J Rheumatol. 1998;25:1975–9.
  • 30. Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. COLT92; Pittsburgh Pennsylvania USA. Jul, 1992. pp. 144–52.
  • 31. Breiman L. Random Forests. Mach Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
  • 32. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist. 2001;29:1189–232. doi: 10.1214/aos/1013203451.
  • 33. Chen T, Guestrin C. XGBoost: a scalable tree boosting system; 2016. pp. 785–94.
  • 34. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. 2017:3149–57.
  • 35. Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features. 2019. Available: http://arxiv.org/abs/1706.09516
  • 36. Patro SGK, Sahu KK. Normalization: A Preprocessing Stage. International Advanced Research Journal in Science, Engineering and Technology. 2015:20–2. doi: 10.17148/IARJSET.2015.2305.
  • 37. Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework; Oct 28, 2019. pp. 2623–31.
  • 38. Lundberg SM, Lee SI. A unified approach to interpreting model predictions
  • 39. Choi Y, Kim HJ, Park J, et al. Acute and post-acute respiratory complications of SARS-CoV-2 infection: population-based cohort study in South Korea and Japan. Nat Commun. 2024;15:4499. doi: 10.1038/s41467-024-48825-w.
  • 40. Kim HW, Lee YJ, Ha YJ, et al. Impact of socioeconomic status on biologics utilization in rheumatoid arthritis: revealing inequalities and healthcare efficiency. Korean J Intern Med. 2024;39:668–79. doi: 10.3904/kjim.2023.276.
  • 41. Park EH, Kang EH, Lee YJ, et al. Impact of early age at menopause on disease outcomes in postmenopausal women with rheumatoid arthritis: a large observational cohort study of Korean patients with rheumatoid arthritis. RMD Open. 2023;9:e002722. doi: 10.1136/rmdopen-2022-002722.
  • 42. Lee C, Joo G, Shin S, et al. Prediction of osteoporosis in patients with rheumatoid arthritis using machine learning. Sci Rep. 2023;13:21800. doi: 10.1038/s41598-023-48842-7.
  • 43. Hu Q, Wang H, Xu T. Predicting Hepatotoxicity Associated with Low-Dose Methotrexate Using Machine Learning. JCM. 2023;12:1599. doi: 10.3390/jcm12041599.
  • 44. Kim W, Oh S-J, Kim H-J, et al. Development of a Risk Prediction Model for Adverse Skin Events Associated with TNF-α Inhibitors in Rheumatoid Arthritis Patients. JCM. 2024;13:4050. doi: 10.3390/jcm13144050.
  • 45. Vodencarevic A, Tascilar K, Hartmann F, et al. Advanced machine learning for predicting individual risk of flares in rheumatoid arthritis patients tapering biologic drugs. Arthritis Res Ther. 2021;23:67. doi: 10.1186/s13075-021-02439-5.
  • 46. Curtis JR, Beukelman T, Onofrei A, et al. Elevated liver enzyme tests among patients with rheumatoid arthritis or psoriatic arthritis treated with methotrexate and/or leflunomide. Ann Rheum Dis. 2010;69:43–7. doi: 10.1136/ard.2008.101378.
  • 47. Myasoedova E, Crowson CS, Green AB, et al. Longterm blood pressure variability in patients with rheumatoid arthritis and its effect on cardiovascular events and all-cause mortality in RA: a population-based comparative cohort study. J Rheumatol. 2014;41:1638–44. doi: 10.3899/jrheum.131170.
  • 48. Vonkeman HE, van de Laar MAFJ. Nonsteroidal anti-inflammatory drugs: adverse effects and their prevention. Semin Arthritis Rheum. 2010;39:294–312. doi: 10.1016/j.semarthrit.2008.08.001.
  • 49. Franck H, Rau R, Herborn G. Thrombocytopenia in patients with rheumatoid arthritis on long-term treatment with low dose methotrexate. Clin Rheumatol. 1996;15:163–7. doi: 10.1007/BF02230334.
  • 50. Sherbini AA, Sharma SD, Gwinnutt JM, et al. Prevalence and predictors of adverse events with methotrexate mono- and combination-therapy for rheumatoid arthritis: a systematic review. Rheumatology (Oxford) 2021;60:4001–17. doi: 10.1093/rheumatology/keab304.
  • 51. Park SW, Yeo NY, Kang S, et al. Early Prediction of Mortality for Septic Patients Visiting Emergency Room Based on Explainable Machine Learning: A Real-World Multicenter Study. J Korean Med Sci. 2024;39:e53. doi: 10.3346/jkms.2024.39.e53.
  • 52. Hernández-Díaz S, Rodríguez LAG. Steroids and risk of upper gastrointestinal complications. Am J Epidemiol. 2001;153:1089–93. doi: 10.1093/aje/153.11.1089.
  • 53. Fardet L, Kassar A, Cabane J, et al. Corticosteroid-Induced Adverse Events in Adults. Drug Saf. 2007;30:861–81. doi: 10.2165/00002018-200730100-00005.
  • 54. Piper JM, Ray WA, Daugherty JR, et al. Corticosteroid Use and Peptic Ulcer Disease: Role of Nonsteroidal Anti-inflammatory Drugs. Ann Intern Med. 1991;114:735–40. doi: 10.7326/0003-4819-114-9-735.


    Articles from BMJ Open are provided here courtesy of BMJ Publishing Group
