Journal of Managed Care & Specialty Pharmacy
. 2026 Mar;32(3):336–347. doi: 10.18553/jmcp.2026.32.3.336

Predicting early discontinuation of adalimumab in patients with rheumatoid arthritis using machine learning: A specialty pharmacy–based approach

Angie H Yoon 1,2, Peter Gedeck 2, Marlette Oelofsen 3
PMCID: PMC12948748  PMID: 41760567

Abstract

BACKGROUND:

Patients with rheumatoid arthritis (RA) prescribed adalimumab often discontinue treatment within 6 months because of a perceived lack of benefit. Specialty pharmacies are well positioned to intervene early, but identifying patients at high risk for early discontinuation remains difficult with existing tools.

OBJECTIVE:

To develop a predictive model using machine learning (ML) to identify patients with RA at high risk of discontinuing adalimumab within 6 months, enabling targeted pharmacist interventions.

METHODS:

We used retrospective data of patients with RA who initiated adalimumab at a specialty pharmacy between 2020 and 2023. Eligible patients completed a patient-reported assessment at first dispense and maintained medication adherence (proportion of days covered ≥80%) before discontinuation. A total of 38 features were collected at pharmacy service initiation from an integrated dispensing and clinical management platform. Predictors were selected based on low missingness (<20%) and low interfeature correlation (|Pearson coefficient| ≤0.4). Several ML classification models were trained and evaluated using metrics including area under the receiver operating characteristic curve (AUC-ROC) and F1 score.

RESULTS:

Of 300 eligible patients with RA, 37.7% were classified as high risk for discontinuing adalimumab within 6 months owing to loss of efficacy. A total of 19 predictors were selected, including sex, age, treatment initiation status (new vs transfer), pain score, joint swelling, morning stiffness, RA duration, body mass index, bone health, infection, history of joint injury, and comorbidities. Elastic Net achieved the highest performance (AUC-ROC = 0.886; F1 score = 0.741), followed closely by linear discriminant analysis and support vector machines, which also performed well in identifying high-risk patients.

CONCLUSIONS:

Predictive modeling using routinely collected specialty pharmacy data can identify patients with RA at risk of early adalimumab discontinuation. In particular, the Elastic Net regularized logistic regression model offered high discriminative performance and may support pharmacist-led follow-up and timely interventions to reduce medication waste and improve patient outcomes.

Plain language summary

We used pharmacy records and patient-reported information to build a computer model that helps find patients with rheumatoid arthritis (RA) likely to stop taking adalimumab early because of a lack or loss of effectiveness. Our model, based on real-world data, can help pharmacists act early by reaching out to these patients to improve care.

Implications for managed care pharmacy

This study offers a potential machine learning tool for specialty pharmacy to identify patients with RA at high risk of adalimumab discontinuation. Early flagging supports targeted follow-up and timely therapy adjustments, more effective medication use, reduced waste, and improved patient outcomes.


Rheumatoid arthritis (RA) is a chronic autoimmune disease that requires long-term treatment with disease-modifying antirheumatic drugs (DMARDs). Biologic DMARDs (bDMARDs) are typically initiated when conventional synthetic DMARDs (csDMARDs) fail, but many patients discontinue bDMARDs within the first 6 months, often because of lack of benefit or therapeutic failure.1–5 These early discontinuations can delay disease control, increase health care costs, and lead to unnecessary medication waste.6 In some cases, even initial responders discontinue later because of secondary loss of efficacy and the development of antidrug antibodies.7 Early identification of patients at risk for discontinuation could support timely treatment adjustments and improve the efficient use of specialty medications.8

Artificial intelligence (AI) is increasingly applied across health care, from radiologic interpretation to drug discovery.9 In RA, AI techniques offer promising applications in diagnosis, disease classification, and flare detection, as well as in monitoring disease activity and predicting treatment outcomes.10,11 Machine learning (ML) studies have shown promise for predicting treatment outcomes, with reported area under the receiver operating characteristic curve (AUC-ROC) values ranging from 0.63 to 0.92.12–18 Many of these models, however, have relied on data types such as laboratory results, imaging, biomarkers, or genetic data that are not consistently available in routine care settings. To improve applicability, other approaches have explored more accessible predictors, including patient-reported outcomes (PROs), pain scores, and self-assessments that reflect the patient experience.19–24 Similarly, outcome definitions have extended beyond clinical disease activity scores, which may not be routinely collected in all care environments. Alternative endpoints such as treatment persistence, therapy changes, or medication discontinuation have been used, as they can be captured through hospital records, pharmacy dispensing data, or insurance claims, making them feasible for real-world prediction tasks.18,24–27 These adaptations in both predictors and outcome definitions support broader application of predictive modeling.

Specialty pharmacies are uniquely positioned to support the application of AI in rheumatology. They provide comprehensive medication management for complex bDMARD therapy by monitoring therapeutic response and managing adverse effects.28,29 As integral members of the RA health care community, they routinely collect clinically relevant patient data through onboarding assessments, ongoing counseling, and refill follow-ups. These interactions generate longitudinal, real-time data that can inform ML model development. By leveraging the frequency and accessibility of pharmacist-patient contact, specialty pharmacies offer a practical setting for implementing predictive tools in patient care. This study aimed to evaluate whether real-world data collected during routine specialty pharmacy care could support ML-based prediction of early adalimumab discontinuation.

Methods

STUDY DESIGN AND DATA COLLECTION

This retrospective study used patient data extracted from integrated dispensing and clinical management platforms at a mail order specialty pharmacy. The dispensing system, Complete Patient Records (CPR+), captured prescription fill history, diagnostic codes, demographics, external medication lists, and uploaded chart notes. The clinical management platform (therigySTM) documented pharmacist-collected patient assessments, chart note reviews, and communication notes from onboarding, refill, and follow-up interactions.

The study population included adult patients with RA who initiated care at the specialty pharmacy either as new starts or transfers between 2020 and 2023 and who completed the baseline PRO assessment including pain score and quality of life (QOL). All patients had adalimumab prescriptions and documented International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes within the M05 and M06 categories. The most common codes were M05.70, M05.79, M06.00, and M06.9, corresponding to uncomplicated RA with or without rheumatoid factor. Eligibility was not restricted by the number of ICD-10-CM codes or the type of encounter setting. All features used for predictive modeling were collected at onboarding and reflected the patient’s condition at or immediately before the initiation of specialty pharmacy services. Prescriber chart notes were required to be present in the pharmacy system at baseline. Also, patients were eligible for inclusion if they maintained adequate adherence (proportion of days covered [PDC] ≥ 80%) during the first 6 months. Patients were then classified based on treatment duration (discontinued within 6 months as high risk vs continued for ≥6 months as regular risk). Among those who discontinued early, only patients with documented lack or loss of efficacy were retained; patients whose discontinuation was attributed to access barriers, insurance changes, adverse effects, or unknown reasons were excluded. Discontinuation reasons were identified based on documentation from pharmacist follow-up assessments and prescriber communications as recorded in CPR+ and therigySTM. Despite these safeguards, some misclassification may remain because of incomplete follow-up or undocumented cases.
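The PDC eligibility screen can be illustrated with a short interval-based calculation. This is a generic sketch of the standard PDC definition, not the pharmacy's actual query; the `fills` record layout is an illustrative assumption, and the 180-day window stands in for the study's 6-month period.

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, window_start, window_days=180):
    """Compute PDC over a fixed window from (fill_date, days_supply) records.

    Counts each calendar day inside the window at most once, so overlapping
    fills are not double-counted. Hypothetical helper, not the study's code.
    """
    covered = set()
    window_end = window_start + timedelta(days=window_days)
    for fill_date, days_supply in fills:
        for d in range(days_supply):
            day = fill_date + timedelta(days=d)
            if window_start <= day < window_end:
                covered.add(day)
    return len(covered) / window_days

# Example: four 30-day fills spaced 45 days apart cover 120 of 180 days,
# so this patient falls below the 80% eligibility cutoff.
fills = [(date(2021, 1, 1) + timedelta(days=45 * i), 30) for i in range(4)]
pdc = proportion_of_days_covered(fills, date(2021, 1, 1))
```

A patient would be retained in the cohort only when `pdc >= 0.8`.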

DATA PREPROCESSING

A comprehensive preprocessing step was conducted to address inconsistencies resulting from data collection. Some features were recorded more than once, particularly during the clinical onboarding chart note review and the initial counseling session. Additionally, patient-reported symptom measures such as pain score and morning stiffness were often provided as ranges rather than single values, reflecting the fluctuating nature of these symptoms even within the same day.

Missing data were commonly observed and arose from several sources. Subjective assessments, including pain scores and QOL, required patient participation. Disease-specific characteristics were inconsistently documented across patients because of variability in chart note styles among different prescribers. Laboratory data often exhibited heterogeneity in units and reference ranges across reporting laboratories, making them difficult to harmonize.

To address these issues, we retained symptom values recorded closest to the therapy start date. For patients with unstable symptom reporting, we selected the values reflecting the worst-case severity at baseline. Although this may inflate symptom severity, it reflects a real-world approach in the absence of standardized reporting and will be evaluated against alternative methods (eg, averaging) in future work. Laboratory values with inconsistent formatting or unclear units were excluded from the analysis. Missing values were handled through an imputation and filtering process, as detailed in the “Feature Selection” section. A description of the collected data is provided in Table 1.
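The worst-case rule for ranged symptom reports can be sketched as follows; the function name and parsing details are illustrative assumptions, not the study's actual cleaning code.

```python
import re

def worst_case_score(raw):
    """Collapse a patient-reported value that may be a range (eg, '6-8')
    to its worst-case (highest) severity, per the baseline rule described
    in the text. A single value passes through unchanged; empty or missing
    entries return None.
    """
    if raw is None or str(raw).strip() == "":
        return None
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", str(raw))]
    return max(numbers) if numbers else None

# A pain score reported as '6-8' is recorded as 8 at baseline
baseline_pain = worst_case_score("6-8")
```

Averaging the range endpoints would be the natural alternative mentioned in the text; only the `max` call would change.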

TABLE 1.

Patient Characteristics, Descriptive Analysis, and Missing Values

| Characteristic | Level | High risk (n = 113) | Regular risk (n = 187) | P valueᵃ | Missing, n (%); imputation value |
| --- | --- | --- | --- | --- | --- |
| Categorical variables, n (%) | | | | | |
| Sex | Female | 91 (80.5) | 143 (76.5) | 0.50 | 0 |
| | Male | 22 (19.5) | 44 (23.5) | | |
| Therapy type of adalimumab | New therapy | 90 (79.6) | 81 (43.3) | <0.001 | 0 |
| | Continuation | 23 (20.4) | 106 (56.7) | | |
| Serology at initial diagnosis | Seronegative | 38 (36.8) | 41 (25.8) | 0.076 | 38 (12.7); seropositive |
| | Seropositive | 65 (63.2) | 118 (74.2) | | |
| Presence of joint swelling | Yes | 94 (88.7) | 100 (61.3) | <0.001 | 31 (10.3); yes |
| | No | 12 (11.3) | 63 (38.7) | | |
| History of joint injury | Yes | 25 (22.7) | 19 (10.4) | <0.001 | 14 (4.7); no |
| | No | 85 (77.3) | 163 (89.6) | | |
| Fibromyalgia | Yes | 25 (22.7) | 19 (10.4) | 0.007 | 8 (2.7); no |
| | No | 85 (77.3) | 163 (89.6) | | |
| Insomnia | Yes | 30 (29.1) | 26 (14.3) | 0.004 | 16 (5.3); no |
| | No | 73 (70.9) | 155 (85.7) | | |
| Depression | Yes | 54 (50.9) | 58 (35.6) | 0.018 | 31 (10.3); no |
| | No | 52 (49.1) | 105 (64.4) | | |
| Osteoporosis/osteopenia | Yes | 50 (45.5) | 63 (34.8) | 0.092 | 9 (3.0); no |
| | No | 60 (54.5) | 118 (65.2) | | |
| Infectionᵇ | Yes | 33 (29.5) | 32 (18.0) | 0.032 | 10 (3.3); no |
| | No | 79 (70.5) | 146 (82.0) | | |
| Other autoimmune disease | Yes | 39 (35.1) | 31 (17.2) | <0.001 | 9 (3.0); no |
| | No | 72 (64.9) | 149 (82.8) | | |
| Number of prior csDMARDs | 0 | 6 (5.3) | 21 (11.2) | <0.001 | 0 |
| | 1 | 45 (39.8) | 105 (56.1) | | |
| | 2 | 40 (35.4) | 49 (26.2) | | |
| | 3 | 18 (15.9) | 10 (5.3) | | |
| | 4 | 4 (3.5) | 2 (1.1) | | |
| Number of concomitant csDMARDs | 0 | 36 (31.9) | 47 (25.1) | 0.20 | 0 |
| | 1 | 68 (60.2) | 113 (60.4) | | |
| | 2 | 9 (8.0) | 27 (14.4) | | |
| Number of comorbiditiesᶜ | 0 | 27 (23.9) | 65 (34.8) | 0.15 | 0 |
| | 1 | 27 (23.9) | 48 (25.7) | | |
| | 2 | 28 (24.8) | 36 (19.3) | | |
| | 3 | 19 (16.8) | 30 (16.0) | | |
| | 4 | 7 (6.2) | 5 (2.7) | | |
| | 5 | 5 (4.4) | 3 (1.6) | | |
| Continuous variables, mean (SD) | | | | | |
| Age, years | | 53 (12.2) | 52 (13.3) | 0.70 | 0 |
| Body mass index | | 32 (7.3) | 31 (7.1) | 0.30 | 0 |
| Painᵈ | | 7.24 (2.1) | 3.84 (2.3) | <0.001 | 0 |
| Morning stiffness, hours | | 2.38 (2.8) | 0.82 (1.4) | <0.001 | 54 (18.0); 1.45 |
| RA duration, years | | 6.6 (6.2) | 7.5 (5.0) | 0.20 | 0 |

The high-risk group discontinued within 6 months because of a lack or loss of efficacy. The regular-risk group continued treatment for at least 6 months.

a

P values for categorical variables were calculated using the Pearson chi-square test except for the number of prior csDMARDs and the number of comorbidities, which were evaluated using the Fisher exact test. For continuous variables, the Welch 2-sample t-test was applied.

b

Infection category includes chronic hepatitis B or C, latent tuberculosis infection, and any acute infection within the past month.

c

Number of comorbidities excludes fibromyalgia, insomnia, depression, osteoporosis, osteopenia, and other autoimmune diseases.

d

Pain severity was assessed using the numeric rating scale, ranging from 0 (no pain) to 10 (worst imaginable pain).

csDMARD = conventional synthetic disease-modifying antirheumatic drug; RA = rheumatoid arthritis.

FEATURE SELECTION

From the initial pool of patient characteristics, features with 20% or more missingness were excluded to ensure data integrity and practical applicability. The remaining variables were further assessed through exploratory data analysis (EDA) to evaluate their distribution and ability to distinguish between the 2 risk groups.

Additionally, we applied zero-variance filtering during preprocessing to confirm that none of the selected features had constant values across all observations. Highly correlated features were excluded to reduce redundancy.30,31 An absolute Pearson correlation coefficient threshold of 0.4 was used to identify highly correlated variables. As shown in Supplementary Figure 1 (available in online article), the highest observed absolute correlation was 0.37, so the selected threshold had no effect on feature selection. Missing data in the selected features were handled as part of model development. We applied median imputation for missing numeric values and mode imputation for missing categorical variables.31–33
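The filtering and imputation steps above can be sketched in Python. The study was done in R; this is an equivalent-logic illustration with hypothetical data, not the study pipeline, and the column names are invented.

```python
import numpy as np
import pandas as pd

def select_and_impute(df, max_missing=0.20, max_abs_corr=0.4):
    """Sketch of the paper's filters: drop features with >=20% missingness
    or zero variance, drop the later member of any numeric pair whose
    |Pearson r| exceeds the threshold, then median-impute numeric columns
    and mode-impute categorical columns.
    """
    keep = df.loc[:, df.isna().mean() < max_missing].copy()
    keep = keep.loc[:, keep.nunique() > 1]            # zero-variance filter
    num = keep.select_dtypes(include="number")
    upper = num.corr().abs().where(                   # upper triangle only
        np.triu(np.ones((num.shape[1], num.shape[1]), dtype=bool), k=1))
    drop = [c for c in upper.columns if (upper[c] > max_abs_corr).any()]
    keep = keep.drop(columns=drop)
    for col in keep.columns:
        if pd.api.types.is_numeric_dtype(keep[col]):
            keep[col] = keep[col].fillna(keep[col].median())
        else:
            keep[col] = keep[col].fillna(keep[col].mode().iloc[0])
    return keep

# Toy frame: 'mostly_missing' fails the missingness cut, 'constant' fails the
# variance cut, and 'age_months' is perfectly correlated with 'age'.
df = pd.DataFrame({
    "age": [50, 60, 55, 65, 45, 70],
    "age_months": [600, 720, 660, 780, 540, 840],
    "pain": [6, 2, None, 8, 4, 5],
    "sex": ["F", "F", "M", None, "F", "M"],
    "constant": [1, 1, 1, 1, 1, 1],
    "mostly_missing": [None, None, None, None, None, 1.0],
})
clean = select_and_impute(df)
```

After filtering, only `age`, `pain`, and `sex` survive; the missing pain score is filled with the column median and the missing sex with the mode.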

MODEL DEVELOPMENT

ML Models

The complete dataset was randomly split into a training set comprising 80% (n = 240) of the observations and a test set containing the remaining 20% (n = 60). Model development and evaluation were performed using the ‘tidymodels’ framework in R (version 4.4.1). Multiple supervised classification algorithms were evaluated. These included linear discriminant analysis (LDA); Elastic Net regularized logistic regression (Elastic Net); support vector machines (SVM) with linear, polynomial, and radial basis function kernels; random forest (RF); k-nearest neighbors (KNN); and extreme gradient boosting (XGBoost). For KNN, we applied 3 commonly used weighting methods: rectangular, Gaussian, and optimal weighting functions. These are widely adopted in practice and were selected to explore how different neighbor-weighting strategies might affect model performance during hyperparameter tuning. Each model was trained using 10-fold cross validation (CV) on the training set, where the data were split into 10 parts, with each part used once for validation and the rest for training. This approach improves model reliability by reducing variance and guarding against overfitting.30
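A minimal Python analogue of this setup, using scikit-learn in place of the R ‘tidymodels’ framework and synthetic data as a stand-in for the 300-patient, 19-feature cohort (the real data are not public):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in dataset; class weights mimic the ~62%/38% outcome split
X, y = make_classification(n_samples=300, n_features=19, n_informative=8,
                           weights=[0.62, 0.38], random_state=42)

# 80/20 split, stratified to preserve the outcome ratio (240/60 as in the paper)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Elastic Net regularized logistic regression, scored by AUC-ROC under
# 10-fold CV on the training set, mirroring the paper's evaluation scheme
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="roc_auc").mean()
```

The other algorithms slot into the same pipeline by swapping the final estimator (eg, `RandomForestClassifier`, `SVC`, `KNeighborsClassifier`).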

Hyperparameter Tuning

Hyperparameter tuning was performed to optimize the performance of Elastic Net, SVM, RF, KNN, and XGBoost by selecting the best combination of model-specific parameters. Many ML models rely on hyperparameters that influence how the model learns from data. Rather than using default values, we conducted systematic tuning to identify the configuration that yields the most accurate and generalizable predictions. Bayesian optimization was used for Elastic Net, SVM, RF, and XGBoost to search for optimal hyperparameter configurations. For KNN, we used grid search, a simpler approach appropriate for models with fewer tuning parameters.30 To guide the selection of the best model within each algorithm, we used the AUC-ROC as the primary performance metric. AUC-ROC was chosen for its robustness in assessing the model’s discriminative ability across various classification thresholds.30,34 A summary of the hyperparameters used for tuning in each algorithm and their optimized values for the selected models is provided in Tables 2 and 3.
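The KNN grid search can be sketched as follows. This uses scikit-learn rather than tidymodels, and its built-in 'uniform' (rectangular) and 'distance' weightings only approximate the rectangular/Gaussian/optimal kernels described above; the neighbor grid is an illustrative assumption, not the paper's grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 240-patient training set
X, y = make_classification(n_samples=240, n_features=19, n_informative=8,
                           weights=[0.62, 0.38], random_state=0)

# Exhaustive grid over neighbor count and weighting, scored by AUC-ROC
# under 10-fold stratified CV, echoing the paper's tuning objective
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = {"knn__n_neighbors": [5, 15, 25, 50],
        "knn__weights": ["uniform", "distance"]}
search = GridSearchCV(pipe, grid, scoring="roc_auc",
                      cv=StratifiedKFold(10, shuffle=True, random_state=0))
search.fit(X, y)
best = search.best_params_
```

Bayesian optimization (used for the other models) replaces the exhaustive grid with a surrogate-model-guided search; libraries such as scikit-optimize or Optuna provide it in Python.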

TABLE 2.

Selected Hyperparameters and Classification Thresholds

| Model | Hyperparametersᵃ | Thresholdᵃ,ᵇ |
| --- | --- | --- |
| LDA | n/aᶜ | 0.34 |
| Elastic Net | penalty = 0.0368, mixture = 0.4793 | 0.32 |
| RF | mtry = 3, min_n = 38 | 0.45 |
| KNN | neighbors = 50, weight function = rectangularᵈ | 0.31 |
| SVM with linear kernelᵉ | cost = 0.0170, margin = 0.0825 | 0.34 |
| XGBoost | mtry = 11, min_n = 5, tree_depth = 8, learn_rate = 0.0074, loss_function = 4.6130, sample_size = 0.6757 | 0.31 |
a

Hyperparameters are reported to 4 decimal places, performance metrics to 3, and thresholds to 2 for clarity and consistency.

b

Thresholds were selected to maximize the F1 score on cross validation.

c

No hyperparameters are applicable for logistic regression and LDA.

d

Three weighting functions were evaluated for KNN: rectangular (equal weight to all neighbors), Gaussian (exponentially higher weight to closer neighbors), and optimal (automatically selects a kernel function to optimize classification performance).

e

Three kernel types were evaluated for SVM: linear, polynomial, and radial basis function.

cost = penalty for misclassification; KNN = k-nearest neighbors; LDA = linear discriminant analysis; learn_rate = learning rate for boosting; loss_function = objective/loss function used for optimization; margin = margin width for separation; min_n = minimum observations required to split a node; mixture = proportion of L1 vs L2 penalty; mtry = number of predictors sampled for each split; n/a = not applicable; neighbors = number of nearest neighbors; penalty = overall regularization strength; RF = random forest; sample_size = proportion of training data used in each boosting round; SVM = support vector machines; tree_depth = maximum depth of each tree; weight function = distance weighting method; XGBoost = extreme gradient boosting.

TABLE 3.

Model Performance Metrics

Performance metricsᵃ

| Model | Set | Accuracy | Recall | Precision | PR-AUC | F1 score | AUC-ROC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LDA | CV | 0.816 | 0.733 | 0.768 | 0.827 | 0.740 | 0.892 |
| LDA | Test | 0.770 | 0.870 | 0.645 | 0.799 | 0.741 | 0.871 |
| Elastic Net | CV | 0.837 | 0.733 | 0.822 | 0.830 | 0.764 | 0.894 |
| Elastic Net | Test | 0.770 | 0.870 | 0.645 | 0.802 | 0.741 | 0.886 |
| RF | CV | 0.833 | 0.722 | 0.813 | 0.840 | 0.756 | 0.893 |
| RF | Test | 0.787 | 0.696 | 0.727 | 0.740 | 0.711 | 0.865 |
| KNN | CV | 0.728 | 0.311 | 0.867 | 0.823 | 0.447 | 0.869 |
| KNN | Test | 0.754 | 0.435 | 0.833 | 0.797 | 0.571 | 0.883 |
| SVM with linear kernel | CV | 0.837 | 0.711 | 0.835 | 0.841 | 0.758 | 0.901 |
| SVM with linear kernel | Test | 0.770 | 0.826 | 0.655 | 0.807 | 0.731 | 0.880 |
| XGBoost | CV | 0.841 | 0.778 | 0.801 | 0.853 | 0.778 | 0.903 |
| XGBoost | Test | 0.787 | 0.783 | 0.692 | 0.757 | 0.735 | 0.817 |

Bold values indicate the 2 highest performing models for each metric.

a

Hyperparameters are reported to 4 decimal places, performance metrics to 3, and thresholds to 2 for clarity and consistency.

AUC-ROC = area under the receiver operating characteristic curve; CV = cross validation; KNN = k-nearest neighbors; LDA = linear discriminant analysis; PR-AUC = area under the precision-recall curve; RF = random forest; SVM = support vector machines; XGBoost = extreme gradient boosting.

Threshold Selection

Following model selection, we applied threshold tuning to determine the optimal cutoff at which predicted probabilities would be classified as either high risk or regular risk. This process was essential to align with the goals of our study. The primary objective was to proactively identify high-risk patients while also minimizing the burden of unnecessary follow-up by limiting false positives.

To balance these priorities, we optimized the classification threshold to maximize the F1 score, which is the harmonic mean of precision and recall (sensitivity). Most classification algorithms generate predicted probabilities rather than fixed class labels, and the default threshold of 0.5 may not be appropriate for imbalanced datasets such as ours. In this study, precision refers to the proportion of patients predicted to be high risk who actually discontinued therapy early, whereas recall refers to the proportion of truly high-risk patients correctly identified by the model.

Threshold selection was performed within the 10-fold CV framework, and the optimal cutoff was subsequently applied to the test set for final model evaluation. As shown in Tables 2 and 3, the selected optimal thresholds to maximize F1 score were similar across models, ranging from 0.31 to 0.34 except for RF (0.45). The model development flowchart is presented in Figure 1.
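The F1-maximizing threshold search can be sketched as follows; the toy labels and probabilities are illustrative, not study data.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_f1_threshold(y_true, probs, grid=np.linspace(0.05, 0.95, 91)):
    """Pick the probability cutoff that maximizes F1 for the positive
    (high-risk) class, as in the paper's threshold-tuning step."""
    scores = [f1_score(y_true, (probs >= t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(scores))])

# Toy example: with a minority positive class, the F1-optimal cutoff falls
# below the default 0.5, echoing the 0.31-0.45 thresholds in Table 2
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
p = np.array([0.9, 0.6, 0.4, 0.45, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05])
t = best_f1_threshold(y, p)
```

In the paper this search was run inside the 10-fold CV loop, and only the resulting cutoff was carried forward to the test set.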

FIGURE 1.

Model Development Flowchart


Overview of the supervised machine learning pipeline, including preprocessing, model development with cross validation and hyperparameter tuning, threshold selection, and performance evaluation.

AUC-ROC = area under the receiver operating characteristic curve; Elastic Net = Elastic Net regularized logistic regression; KNN = k-nearest neighbors; LDA = linear discriminant analysis; PR-AUC = area under the precision-recall curve; RF = random forest; SVM = support vector machine; XGBoost = extreme gradient boosting.

PERFORMANCE METRICS

To evaluate model performance, we assessed both classification metrics and class probability-based metrics. The selected classification metrics were accuracy, F1 score, precision, and recall. The probability-based metrics included AUC-ROC and area under the precision-recall curve (PR-AUC).

The reported metrics were selected based on their relevance to the study objectives. Accuracy was included for reference but is less informative, as it treats both high risk and regular risk equally and can remain high even when high-risk patients are missed. Emphasis was placed on metrics that focus on the event of interest (high risk). F1 score, precision, recall, AUC-ROC, and PR-AUC better capture the model’s ability to identify and prioritize high-risk patients.
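The metric panel can be computed from predicted probabilities as sketched below. scikit-learn's average precision is used here as the PR-AUC estimator (one of several common estimators), and the toy data are illustrative.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score,
                             roc_auc_score)

def evaluate(y_true, probs, threshold):
    """Classification metrics at the chosen cutoff plus the two
    probability-based metrics (AUC-ROC and PR-AUC) used in the paper."""
    pred = (probs >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, pred),
        "recall": recall_score(y_true, pred),
        "precision": precision_score(y_true, pred),
        "f1": f1_score(y_true, pred),
        "auc_roc": roc_auc_score(y_true, probs),
        "pr_auc": average_precision_score(y_true, probs),
    }

# Toy scores evaluated at a tuned cutoff of 0.32
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
p = np.array([0.8, 0.7, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1])
m = evaluate(y, p, threshold=0.32)
```

Note that only the first four metrics depend on the cutoff; AUC-ROC and PR-AUC are threshold-free, which is why they guided model selection before threshold tuning.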

Model performance was evaluated at 2 key stages. First, after hyperparameter tuning, we reported these metrics for the selected model from each algorithm on the CV training set (Tables 2 and 3). Corresponding ROC curves for each model are presented in Figure 2.

FIGURE 2.

Receiver Operating Characteristic Curves for Each Classification Model


Each panel displays the cross validation receiver operating characteristic (solid red) and the holdout test receiver operating characteristic (solid green).

XGBoost = extreme gradient boosting.

Second, following threshold selection, we evaluated the final model performance by comparing these same metrics between the CV training set and the test set (Tables 2 and 3). To facilitate comparison across models, a grouped bar chart was used to visually display the F1 score and AUC-ROC across datasets (Supplementary Figure 2). ROC curves for the final models on the test set are also shown in Figure 2.

A subgroup sensitivity analysis was conducted on the final model to evaluate whether model performances were consistent between new start patients (initiated adalimumab at our pharmacy) and transfer patients (already on adalimumab before transferring to our pharmacy). The DeLong test was used to compare AUC-ROC values between the 2 subgroups to evaluate whether the performance difference was statistically significant.
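The paper used the DeLong test for this comparison; a bootstrap comparison of subgroup AUCs is a simpler stand-in that can be sketched as follows (this is not the DeLong procedure itself, and the synthetic subgroups are illustrative).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(y_a, p_a, y_b, p_b, n_boot=2000, seed=0):
    """Compare AUCs of two independent subgroups by resampling each with
    replacement; returns the observed AUC difference and a two-sided
    bootstrap p-value for it."""
    rng = np.random.default_rng(seed)
    obs = roc_auc_score(y_a, p_a) - roc_auc_score(y_b, p_b)
    diffs = []
    for _ in range(n_boot):
        ia = rng.integers(0, len(y_a), len(y_a))
        ib = rng.integers(0, len(y_b), len(y_b))
        try:
            diffs.append(roc_auc_score(y_a[ia], p_a[ia]) -
                         roc_auc_score(y_b[ib], p_b[ib]))
        except ValueError:      # a resample may contain a single class
            continue
    diffs = np.asarray(diffs)
    # How often does the centered bootstrap difference exceed |observed|?
    p_value = float(np.mean(np.abs(diffs - diffs.mean()) >= abs(obs)))
    return obs, p_value

# Two synthetic subgroups with similar discrimination, as a stand-in for
# the new-start vs transfer comparison
rng = np.random.default_rng(42)
y_a = np.r_[np.zeros(60, int), np.ones(40, int)]
p_a = np.r_[rng.uniform(0.0, 0.6, 60), rng.uniform(0.4, 1.0, 40)]
y_b = np.r_[np.zeros(60, int), np.ones(40, int)]
p_b = np.r_[rng.uniform(0.0, 0.6, 60), rng.uniform(0.4, 1.0, 40)]
diff, p_val = bootstrap_auc_diff(y_a, p_a, y_b, p_b, n_boot=500)
```

In R, `pROC::roc.test` implements the actual DeLong comparison used in the paper.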

Results

PATIENT CHARACTERISTICS AND EDA

Among 1,640 patients with RA who opted into our clinical services for bDMARDs between 2020 and 2023, 300 met the study inclusion criteria. Of these, 113 patients (37.7%) were classified as high risk, and 187 patients (62.3%) were classified as regular risk (Figure 3). An initial pool of 38 characteristics was examined, covering demographics, disease-specific factors, RA therapy–related factors, laboratory values, comorbidities, patient-reported assessments, supplement use, past medical history, family history, alcohol consumption, and smoking status.

FIGURE 3.

Study Sample Attrition Diagram


The diagram outlines the inclusion and exclusion criteria applied to patients with RA onboarded between 2020 and 2023, leading to classification into high-risk and regular-risk groups. The high-risk group discontinued within 6 months because of a lack or loss of efficacy. The regular-risk group continued treatment for at least 6 months.

PDC = proportion of days covered; QOL = quality of life; RA = rheumatoid arthritis.

Descriptive analysis and EDA were conducted to understand the dataset. A total of 19 features were selected as predictor variables through the feature selection process described in the Methods section. These consist of age, sex, type of adalimumab therapy (new start or continuation), RA duration, body mass index, prior csDMARD use, concomitant csDMARD use, pain score, serology at diagnosis, morning stiffness, joint swelling, infection status, history of joint injury, osteoporosis, depression, insomnia, fibromyalgia, other autoimmune disease, and other comorbidities.

Descriptive statistics for patient characteristics and missing data rates are presented in Table 1. In total, 19 characteristics were excluded owing to more than 20% missingness, with rates ranging from 27.3% to 60.3%. They included laboratory data, treatment history (number of prior bDMARDs and steroid duration), provider-assessed disease characteristics (disease severity and erosion), patient lifestyle factors (smoking and alcohol use), family history of autoimmune disease, and patient-reported QOL. Supplementary Figures 3 and 4 display a visual comparison of key features between the 2 risk groups using bar charts and density plots. Several features were notably associated with the high-risk patient group. They included adalimumab therapy type as new start and greater symptom burden (elevated pain scores, longer duration of morning stiffness, and presence of joint swelling).

A correlation heatmap (Supplementary Figure 1) summarizes the relationships among numeric variables. Comorbidity showed the strongest positive correlation with age (Pearson correlation coefficient, r = 0.37), whereas all other pairwise correlations had absolute r values below 0.24.

MODEL PERFORMANCE

After hyperparameter tuning, XGBoost demonstrated the highest overall performance on the CV training set, achieving a mean AUC-ROC of 0.903 and an F1 score of 0.778 (Tables 2 and 3). However, on the test set, Elastic Net achieved the highest performance, with an AUC-ROC of 0.886 and an F1 score of 0.741. Figure 2 displays the ROC curves of the models on the CV training and test sets.

Both LDA and SVM also performed well on the test set, achieving AUC-ROCs of 0.871 and 0.880 and F1 scores of 0.741 and 0.731, respectively. The version of SVM used in threshold selection was based on the linear kernel, which had yielded the best AUC-ROC during CV. KNN underperformed, with a test set F1 score of 0.571 despite competitive results on the CV training set. Although XGBoost performed best during training, its test set performance declined (AUC-ROC = 0.817), suggesting possible overfitting (Tables 2 and 3; Figure 2).

As shown in Figure 2, ROC curves for the Elastic Net model demonstrated the strongest separation between the high- and regular-risk groups across the full range of thresholds. XGBoost, in comparison, showed a flatter ROC curve consistent with its reduced test set performance. PR-AUC further supported Elastic Net as providing the best trade-off between precision and recall on the test set.

To gain insights into feature influence, we examined the final Elastic Net model, which identified 13 patient characteristics with nonzero coefficients. Features positively associated with high risk included pain (1.160), joint swelling (0.558), infection history (0.498), joint injury (0.483), use of adalimumab as new start (0.475), depression (0.388), morning stiffness (0.380), fibromyalgia (0.336), other autoimmune disease (0.066), and age (0.020). Conversely, features negatively associated with high risk included concomitant csDMARDs (−0.379) and seropositivity (−0.019). These findings suggest that both symptomatic burden and underlying comorbidities contribute to early treatment discontinuation risk. In subgroup analysis, Elastic Net demonstrated slightly higher AUC-ROC in transfer patients compared with new start patients (0.918 vs 0.888). However, the difference was not statistically significant (P = 0.479), indicating comparable model performance across subgroups.
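The coefficient-inspection step can be sketched as follows; synthetic data and placeholder feature names stand in for the study dataset, so the surviving coefficients here will not match the values reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; the point is the interpretability step, not the values
X, y = make_classification(n_samples=300, n_features=19, n_informative=6,
                           random_state=7)
X = StandardScaler().fit_transform(X)        # coefficients on a common scale
names = [f"feature_{i}" for i in range(X.shape[1])]   # placeholder names

model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)

# Elastic Net's L1 component zeroes out weak predictors; the survivors and
# their signs are what the paper reads off as feature influence
nonzero = {n: float(c) for n, c in zip(names, model.coef_[0]) if c != 0.0}
```

Standardizing the inputs first (as above) is what makes the coefficient magnitudes comparable across features.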

Discussion

Our study expands the application of ML in rheumatology beyond traditional care settings by incorporating a diverse set of real-world data. Using information available at the time of clinical service initiation at a specialty pharmacy, we demonstrated the feasibility of predicting patients at high risk of early discontinuation because of a perceived lack of benefit. This represents a meaningful contribution, as predictive modeling has rarely been applied to specialty pharmacy datasets despite their potential. Among the models evaluated, Elastic Net demonstrated the most consistent performance across CV and test sets. This reinforces its potential utility as a relatively interpretable yet flexible algorithm for pharmacy settings. Unlike black-box models such as XGBoost or SVM, Elastic Net offers interpretability through its model coefficients, which reflect variable importance. In our case, this allows pharmacists to understand which patient features strongly influence risk predictions. The observed variation in performance between CV and test sets also highlights the importance of evaluating generalizability beyond training data, especially when using more complex models. To further assess robustness, we conducted a subgroup sensitivity analysis by separating new start and transfer patients. Elastic Net maintained statistically comparable performance across both subgroups, supporting its generalizability to distinct patient populations within the specialty pharmacy setting.

Although laboratory data were initially considered to enrich the clinical context, they were excluded because of high missingness caused by inconsistent units from external sources. This underscores the challenge of laboratory data harmonization in pharmacy settings. In contrast, PROs were available and served as a valuable source of information. As detailed in the Results section on Elastic Net’s nonzero coefficients, key predictive factors included symptom burden (eg, pain, joint swelling, or morning stiffness), medical history (eg, infection or joint injury), comorbidities, and treatment regimen (eg, concomitant csDMARDs). This highlights the value of combining PROs with clinical characteristics. These patient-reported measures may provide early signals of treatment failure and are critical for identifying patients who may benefit from additional support. This approach enables specialty pharmacists to proactively engage high-risk patients earlier than standard follow-up intervals through enhanced counseling and symptom monitoring. For example, pharmacists may assess adherence to concomitant csDMARDs, evaluate pain-related comorbidities such as fibromyalgia, and coordinate care by referring patients to dietitians when lifestyle-related factors (eg, poor diet or weight gain) may be contributing to symptom worsening. Additional interventions may include reaching out to prescribers to discuss therapy adjustments or securing timely refill authorizations for supportive medications.

To enhance clinical utility, we applied post hoc threshold adjustment to prioritize patients at high risk of discontinuation. This strategy aimed to balance sensitivity and precision, minimizing missed opportunities for intervention while limiting false positives. The final thresholds resulted in 19.7%-50.8% of patients being flagged as high risk across models, suggesting a manageable caseload for pharmacists. Although the F1 score guided initial model selection, clinical workflows may favor higher sensitivity, even at the expense of more false positives, given the comparatively greater cost of missed treatment failure. The decision to prioritize sensitivity vs precision should be tailored to the operational context of each pharmacy. Threshold recalibration serves as a flexible tool to align model outputs with these varying goals. For pharmacies focused on cost-efficiency, future work could incorporate cost-benefit optimization frameworks that better reflect real-world clinical and economic trade-offs.35

LIMITATIONS

Several limitations of this study should be considered. First, the analysis was conducted using data from a single specialty pharmacy, which may limit the generalizability of the findings. Patients with missing baseline assessments were excluded from the analytic sample to leverage PROs captured closest to the time of pharmacy service initiation. We also defined the cohort as patients who demonstrated a PDC of at least 80% during the first 6 months of service to reduce the risk of misclassifying poor adherence as treatment failure. As a result, our final sample may reflect a subset of patients who were more engaged, and the model may underestimate discontinuation risk among high-risk but nonadherent patients. Future research should validate these models in broader, prospective cohorts across diverse specialty pharmacy settings.

Second, although efforts were made to accurately define early discontinuation based on patient- or provider-reported data, outcome misclassification remains possible. Although we excluded known nonefficacy cases (eg, insurance barriers or adverse effects), residual misclassification may have influenced model training.

Third, although features were selected through a structured process with prespecified cutoffs for missingness and correlation, a detailed analysis of missing data patterns was beyond the scope of this study. Incorporating formal diagnostics and advanced imputation strategies such as Multivariate Imputation by Chained Equations36 in future work may further strengthen model robustness. Additionally, future studies should include tools such as SHapley Additive exPlanations37 to explain individual predictions by identifying the most influential features for each patient. This would help clarify how model interpretability can be operationalized to promote clinical uptake.
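As a purely illustrative sketch of the chained-equations approach cited above, scikit-learn's `IterativeImputer` implements a strategy similar in spirit to MICE, modeling each feature with missing values as a function of the others. The data, missingness rate, and column count below are synthetic assumptions, not study values.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and requires this import
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
# Synthetic numeric features (e.g., stand-ins for lab-like measurements)
X = rng.normal(loc=[5.0, 25.0, 3.0], scale=[1.0, 4.0, 0.5], size=(200, 3))

# Introduce roughly 15% missingness completely at random
mask = rng.random(X.shape) < 0.15
X_missing = X.copy()
X_missing[mask] = np.nan

# Each feature is iteratively regressed on the others to fill in gaps
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X_missing)

print(np.isnan(X_imputed).any())  # → False: no missing values remain
```

Note that a single imputed dataset, as here, understates imputation uncertainty; full MICE draws multiple imputed datasets and pools estimates across them.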

Fourth, the study period overlapped with the COVID-19 pandemic, which may have affected patient behavior, therapy access, and pharmacy operations. Although such disruptions could influence adherence and data availability, analyzing discontinuation patterns during this time remains valuable for informing future mitigation strategies and model development in similar scenarios.

Despite these data limitations, our findings demonstrate that predictive modeling can still generate actionable insights to support patient identification and service prioritization. Importantly, this study highlights the potential role of specialty pharmacists as domain experts in future model development. Their clinical experience and deep understanding of specialty pharmacy data can meaningfully contribute to building and refining ML tools, which is an essential aspect of applying AI in real-world health care settings.

Finally, consistent with the principles of the “no free lunch” theorem, no single model is universally optimal.38 Continued evaluation of diverse modeling approaches remains essential as pharmacy data sources and care settings evolve.

Conclusions

This study demonstrates the feasibility of using routinely collected specialty pharmacy data, including both clinical and patient-reported measures, to develop ML models that predict early discontinuation of adalimumab among patients with RA. Among the models evaluated, Elastic Net and LDA achieved strong and consistent discriminative performance. These findings highlight the potential for predictive modeling to be operationalized within specialty pharmacy workflows to support targeted outreach, timely treatment adjustments, and improved patient outcomes. Future work will focus on prospective validation, incorporating model interpretability, and exploring integration into value-based care strategies.

Acknowledgments

The authors acknowledge David Skomo of HealthDyne and Robert Nicholas Page of Welldyne for their valuable advice on the clinical and operational relevance of predictive models to specialty pharmacy.

References

  • 1.Fraenkel L, Bathon JM, England BR, et al. 2021 American College of Rheumatology guideline for the treatment of rheumatoid arthritis. Arthritis Rheumatol. 2021;73(7):1108-23. doi: 10.1002/art.41752 [DOI] [PubMed] [Google Scholar]
  • 2.Smolen JS, Landewé RBM, Bergstra SA, et al. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2022 update. Ann Rheum Dis. 2023;82(1):3-18. doi: 10.1136/ard-2022-223356 [DOI] [PubMed] [Google Scholar]
  • 3.Zink A, Strangfeld A, Schneider M, et al. Effectiveness of tumor necrosis factor inhibitors in rheumatoid arthritis in an observational cohort study: Comparison of patients according to their eligibility for major randomized clinical trials. Arthritis Rheum. 2006;54(11):3399-407. doi: 10.1002/art.22193 [DOI] [PubMed] [Google Scholar]
  • 4.Bolge SC, Goren A, Tandon N. Reasons for discontinuation of subcutaneous biologic therapy in the treatment of rheumatoid arthritis: A patient perspective. Patient Prefer Adherence. 2015;9:121-31. doi: 10.2147/PPA.S70834 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Souto A, Maneiro JR, Gómez-Reino JJ. Rate of discontinuation and drug survival of biologic therapies in rheumatoid arthritis: A systematic review and meta-analysis of drug registries and health care databases. Rheumatology (Oxford). 2016;55(3):523-34. doi: 10.1093/rheumatology/kev374 [DOI] [PubMed] [Google Scholar]
  • 6.Gu T, Mutebi A, Stolshek BS, Tan H. Cost of biologic treatment persistence or switching in rheumatoid arthritis. Am J Manag Care. 2018;24(8 Spec No.):SP338-45. [PubMed] [Google Scholar]
  • 7.Kalden JR, Schulze-Koops H. Immunogenicity and loss of response to TNF inhibitors: Implications for rheumatoid arthritis treatment. Nat Rev Rheumatol. 2017;13(12):707-18. doi: 10.1038/nrrheum.2017.187 [DOI] [PubMed] [Google Scholar]
  • 8.Pappas DA, Kremer JM, Griffith J, et al. Long-term effectiveness of adalimumab in patients with rheumatoid arthritis: An observational analysis from the Corrona Rheumatoid Arthritis Registry. Rheumatol Ther. 2017;4(2):375-89. doi: 10.1007/s40744-017-0077-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28(1):31-8. doi: 10.1038/s41591-021-01614-0 [DOI] [PubMed] [Google Scholar]
  • 10.Gilvaz VJ, Reginato AM. Artificial intelligence in rheumatoid arthritis: Potential applications and future implications. Front Med (Lausanne). 2023;10:1280312. doi: 10.3389/fmed.2023.1280312 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Momtazmanesh S, Nowroozi A, Rezaei N. Artificial intelligence in rheumatoid arthritis: Current status and future perspectives: A state-of-the-art review. Rheumatol Ther. 2022;9(5):1249-304. doi: 10.1007/s40744-022-00475-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Alten R, Behar C, Merckaert P, et al. Predicting abatacept retention using machine learning. Arthritis Res Ther. 2025;27(1):20. doi: 10.1186/s13075-025-03484-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Kim BY, Koo J, Yeo J, et al. Machine learning-based prediction of treatment response to anti-tumor necrosis factor and Janus kinase inhibitor therapies in patients with rheumatoid arthritis. Ann Rheum Dis. 2024;83:803-4. doi: 10.1136/annrheumdis-2024-eular.2784 [DOI] [Google Scholar]
  • 14.Guan Y, Zhang H, Quang D, et al. Machine learning to predict anti–tumor necrosis factor drug responses of rheumatoid arthritis patients by integrating clinical and genetic markers. Arthritis Rheumatol. 2019;71(12):1987-96. doi: 10.1002/art.41056 [DOI] [PubMed] [Google Scholar]
  • 15.Koo BS, Eun S, Shin K, et al. Machine learning model for identifying important clinical features for predicting remission in patients with rheumatoid arthritis treated with biologics. Arthritis Res Ther. 2021;23(1):178. doi: 10.1186/s13075-021-02567-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Benavent D, Carmona L, García Llorente JF, et al. Artificial intelligence to predict treatment response in rheumatoid arthritis and spondyloarthritis: A scoping review. Rheumatol Int. 2025;45(4):91. doi: 10.1007/s00296-025-05825-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Matsuo H, Kamada M, Imamura A, et al. Machine learning-based prediction of relapse in rheumatoid arthritis patients using data on ultrasound examination and blood test. Sci Rep. 2022;12(1):7224. doi: 10.1038/s41598-022-11361-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Öberg Sysojev A, Delcoigne B, Frisell T, et al. Impact of genetics on predicting persistence to treatment with methotrexate in early rheumatoid arthritis. Ann Rheum Dis. 2024;83(Suppl 1):810. doi: 10.1136/annrheumdis-2024-eular.2921 [DOI] [Google Scholar]
  • 19.Salehi F, Lopera Gonzalez LI, Bayat S, et al. Machine learning prediction of treatment response to biological disease-modifying antirheumatic drugs in rheumatoid arthritis. J Clin Med. 2024;13(13):3890. doi: 10.3390/jcm13133890 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kalweit M, Burden AM, Boedecker J, Hügle T, Burkard T. Patient groups in rheumatoid arthritis identified by deep learning respond differently to biologic or targeted synthetic DMARDs. PLoS Comput Biol. 2023;19(6):e1011073. doi: 10.1371/journal.pcbi.1011073 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lee S, Kang S, Eun Y, et al. Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis. Arthritis Res Ther. 2021;23(1):254. doi: 10.1186/s13075-021-02635-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Sun Z, Wang F, Chen J, et al. Establishment and verification of a nomogram and a preliminary study on predicting the clinical response of conventional synthetic disease-modifying antirheumatic drugs (csDMARDs) in rheumatoid arthritis patients. Ann Transl Med. 2022;10(24):1365. doi: 10.21037/atm-22-5791 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Venerito V, Angelini O, Fornaro M, Cacciapaglia F, Lopalco G, Iannone F. A machine learning approach for predicting sustained remission in rheumatoid arthritis patients on biologic agents. J Clin Rheumatol. 2022;28(2):e334-9. doi: 10.1097/RHU.0000000000001720 [DOI] [PubMed] [Google Scholar]
  • 24.Vodencarevic A, Tascilar K, Hartmann F, et al. Prediction of flares for rheumatoid arthritis patients on biologic DMARDs using machine learning and subsets of variables available to physicians, patients and payers. Ann Rheum Dis. 2020;79:959-60. doi: 10.1136/annrheumdis-2020-eular.1553 [DOI] [Google Scholar]
  • 25.Shipa MRA, Di Cicco M, Balogh E, et al. Drug-survival profiling of second-line biologic therapy in rheumatoid arthritis: Choice of another tumour necrosis factor inhibitor or a biologic of different mode of action? Mod Rheumatol. 2023;33(4):700-7. doi: 10.1093/mr/roac086 [DOI] [PubMed] [Google Scholar]
  • 26.Ukalovic D, Leeb BF, Rintelen B, et al. Prediction of ineffectiveness of biological drugs using machine learning and explainable AI methods: Data from the Austrian Biological Registry BioReg. Arthritis Res Ther. 2024;26(1):44. doi: 10.1186/s13075-024-03277-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Jin Y, Landon J, Krueger W, Liede A, Kim S. Predicting treatment change in rheumatoid arthritis patients treated with TNF inhibitors as first-line biologic agent. Arthritis Rheumatol. 2021;73(suppl 9). [Google Scholar]
  • 28.Barlow JF, Faris RJ, Wang W, et al. Impact of specialty pharmacy on treatment costs for rheumatoid arthritis. Am J Pharm Benefits. 2012;4(6). [Google Scholar]
  • 29.Barat E, Soubieux A, Brevet P, et al. Impact of the clinical pharmacist in rheumatology practice: A systematic review. Healthcare (Basel). 2024;12(15):1463. doi: 10.3390/healthcare12151463 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: With Applications in R. 2nd ed. Springer; 2021. [Google Scholar]
  • 31.Kuhn M, Johnson K. Applied Predictive Modeling. Springer; 2013. [Google Scholar]
  • 32.Graham JW. Missing data analysis: Making it work in the real world. Annu Rev Psychol. 2009;60(1):549-76. doi: 10.1146/annurev.psych.58.110405.085530 [DOI] [PubMed] [Google Scholar]
  • 33.Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. John Wiley & Sons; 2002:59-74. [Google Scholar]
  • 34.Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27(8):861-74. doi: 10.1016/j.patrec.2005.10.010 [DOI] [Google Scholar]
  • 35.Sooklal S, Hosein P. A benefit optimization approach to the evaluation of classification algorithms. In: Silhavy R, ed. Artificial Intelligence in Industry 4.0. Proceedings of the International Conference on Artificial Intelligence 2019. Springer; 2019:36-46. [Google Scholar]
  • 36.Van Buuren S, Groothuis-Oudshoorn K. MICE: multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3):1-67. doi: 10.18637/jss.v045.i03 [DOI] [Google Scholar]
  • 37.Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems. Vol 30. NIPS; 2017:4765-74. [Google Scholar]
  • 38.Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1(1):67-82. doi: 10.1109/4235.585893 [DOI] [Google Scholar]

