Journal of Translational Autoimmunity. 2026 Mar 14;12:100365. doi: 10.1016/j.jtauto.2026.100365

Machine learning for predicting macrophage activation syndrome in adult patients with Still's disease

Yihe Zheng 1, Changyi Yang 1, Xiaoxuan Cai 1, Jie Zhao 1, Zile Chen 1, Shuni Ying 1, Jianjun Qiao 1
PMCID: PMC13010447  PMID: 41884119

Abstract

Background

Macrophage activation syndrome (MAS) secondary to Still's disease is a potentially fatal complication, associated with mortality rates exceeding 10%. Early identification is critical for survival but remains challenging due to the lack of specific predictive biomarkers.

Objective

To develop and test an explainable model to predict MAS in adult patients with Still's disease using routine baseline clinical parameters, and to implement it as an interactive tool.

Methods

We conducted a retrospective model development and testing study across four hospital sites from Aug 1, 2019 to Jul 31, 2025. Adults meeting the Yamaguchi criteria for Still's disease were included. Demographics, imaging/physical findings, and routine laboratory tests within 48 h of admission were analyzed. Predictors were selected using nested cross-validated LASSO, and five algorithms (logistic regression, random forest, SVM, XGBoost, and LightGBM) were compared. Model interpretability was assessed with SHAP, and a bedside score was derived using Firth's penalized logistic regression.

Results

A total of 312 patients with Still's disease were included, with model development in two centers (n = 226) and testing in two independent centers (n = 86). The final XGBoost model retained five key predictors: ferritin, splenomegaly, platelet count, total cholesterol, and erythrocyte sedimentation rate, achieving an AUC of 0.839 in the test set, with a sensitivity of 0.824, specificity of 0.710, acceptable calibration (Brier 0.136), and favorable net clinical benefit. The derived 0–10 bedside risk score stratified the training cohort into low- (1%), intermediate- (14.6%), and high-risk (75%) MAS groups.

Conclusions

We present an interpretable machine learning model based on baseline data and a simplified risk score for predicting in-hospital MAS in adult patients with Still's disease. To our knowledge, this study represents one of the larger adult cohorts assembled for Still's disease-associated MAS.

Keywords: Still's disease, Macrophage activation syndrome, Machine learning

1. Introduction

Still's disease is now recognized as a single disease entity encompassing both pediatric- and adult-onset manifestations [1]. It is a rare systemic autoinflammatory disorder characterized by high-spiking fevers, evanescent salmon-colored rash, and arthritis or arthralgia, often accompanied by marked hyperferritinaemia [2]. Approximately 10 to 25% of adult patients with Still's disease may develop macrophage activation syndrome (MAS), a severe hyperinflammatory complication that can lead to life-threatening multisystem organ dysfunction [3–5]. Clinical manifestations are heterogeneous, ranging from persistent fever, hepatosplenomegaly, cytopenia, elevated ferritin level, and disseminated intravascular coagulopathy to fulminant multiorgan dysfunction [6]. Importantly, MAS can occur at any stage of Still's disease and may progress rapidly over hours to days, posing significant diagnostic and therapeutic challenges [7,8]. Early detection of MAS is therefore crucial, as timely initiation of immunosuppressive therapy may substantially improve outcomes [9].

Despite the clinical urgency, early prediction of MAS remains challenging. Current diagnostic and classification frameworks for MAS and related hemophagocytic lymphohistiocytosis (HLH) are primarily developed to establish a formal diagnosis rather than to enable proactive risk stratification. Several pivotal diagnostic assessments are either not routinely available in most clinical settings or only become abnormal after the optimal therapeutic window has narrowed [3]. This underscores a critical void in accessible, admission-based tools that support early and actionable risk assessment.

Machine-learning (ML) methods provide a pragmatic strategy for MAS risk prediction by integrating multiple variables and capturing complex, potentially non-linear relationships that are not well addressed by conventional one-dimensional statistical analyses [10]. This is particularly relevant in Still's disease, where systemic inflammatory features overlap substantially between active disease and evolving MAS and where multicollinearity among inflammatory markers is common [11]. Given the rarity of the condition, automated feature selection and regularization may also help mitigate redundancy and improve robustness. In addition, tools such as SHAP can provide individualized, transparent explanations of model outputs, helping bridge algorithmic predictions with clinical trust.

In this study, we developed and evaluated an explainable ML model to predict in-hospital MAS among adult patients with Still's disease at the point of admission by using high-accessibility variables, which include physical examination findings and routine hematological indices. We identified a parsimonious set of predictors and derived a simplified bedside risk score to facilitate early risk stratification and support timely therapeutic intervention in this rare but high-mortality complication.

2. Methods

2.1. Study design and participants

This model development and testing study was conducted using data from four campuses of the First Affiliated Hospital of Zhejiang University (ie, Qingchun Campus, Chengzhan Campus, Yuhang Campus, and Zhijiang Campus). Patients diagnosed with Still's disease in any one of these four campuses between August 1, 2019, and July 31, 2025, were recruited. Patients were eligible for inclusion if they were aged at least 18 years and met the Yamaguchi classification criteria for Still's disease [12]. Exclusion criteria were: (1) prior diagnosis of HLH/MAS; (2) evidence of active or suspected infection, a prior diagnosis of malignancy, or other autoimmune diseases; (3) glucocorticoid or immunosuppressive therapy within 2 weeks prior to admission; (4) lack of more than two essential HLH/MAS diagnostic parameters; (5) lack of more than 20% candidate predictor data (Fig. 1a). MAS diagnosis was adjudicated by at least two physicians according to the modified HLH-2009 diagnostic criteria (Fig. 1b) [13]. This study was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang University, and the requirement for informed consent was waived due to the retrospective nature. This study was reported following the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [14].

Fig. 1.

Overview of model development and validation workflow.

A) Inclusion and exclusion criteria. B) Adapted HLH-2009 diagnostic criteria employed for our study. C) Experimental design. M MAS; NM non-MAS.

2.2. Data preprocessing

Fig. 1c summarizes the experimental design. Demographics, imaging findings (hepatomegaly, splenomegaly, lymphadenopathy, pleural effusion, and pericardial effusion), and 55 routine laboratory parameters (Supplementary Table 1) reported within 48 h of admission were extracted from each patient's electronic medical records. Baseline systemic symptoms (arthralgia or arthritis, sore throat, lymphadenopathy, and rash) were ascertained from clinical documentation. Rash, including both typical evanescent rashes fading with defervescence and atypical persistent pruritic eruptions, was adjudicated based on clinical notes and photographic/histopathological records when available. Lesions with mixed features were classified as persistent. For analyses, rash phenotype was coded as 0 = no rash, 1 = typical evanescent rash, and 2 = atypical persistent pruritic eruptions. Fever was defined as a documented peak body temperature ≥39.0 °C within 48 h of admission, consistent with the characteristic spiking fever of Still's disease and aligned with the Yamaguchi criteria.

Variables with ≥20% missingness were excluded. Remaining missing values were imputed using multiple imputation by chained equations (MICE; method = 'pmm', m = 5, seed = 123). Training and test cohorts were imputed separately to prevent data leakage. Data from Qingchun Campus and Chengzhan Campus were used for model training, while data from Yuhang Campus and Zhijiang Campus were reserved as an independent test set; these sites are geographically separated and operate with autonomous medical teams and site-specific instrumentation.

2.3. Feature selection and model construction

Feature selection was performed exclusively in the training cohort to prevent information leakage. A three-stage approach was employed. First, pairwise Pearson correlation coefficients were calculated to identify highly correlated feature pairs (|r| ≥ 0.90); within each pair, the variable with greater missingness was removed. Second, multicollinearity was addressed using variance inflation factor (VIF) filtering with a prespecified cutoff of 5, iteratively removing the feature with the highest VIF until all remaining predictors had VIF <5. Third, predictor stability was assessed using LASSO-penalized logistic regression (L1 penalty; α = 1) with class weights proportional to inverse class frequencies to mitigate outcome imbalance. Feature selection was embedded within a nested, stratified 10-fold cross-validation framework. Within each outer fold, categorical predictors were dummy encoded and continuous predictors were z-score standardized using parameters estimated from the outer-fold training data only, and the same transformations were then applied to the held-out fold.
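
The first filtering stage described above can be illustrated with a minimal Python sketch (the study itself was analyzed in R). The function names `pearson_r` and `correlation_filter`, the toy values, and the tie-break used when two variables have equal missingness are assumptions made for illustration, not the authors' code.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_filter(columns, missing_frac, cutoff=0.90):
    """Drop one variable from every pair with |r| >= cutoff.

    columns: dict of variable name -> list of numeric values.
    missing_frac: dict of variable name -> fraction missing before imputation.
    Within a flagged pair, the variable with greater missingness is removed;
    on a tie the second variable of the pair is removed (an arbitrary assumption).
    """
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson_r(columns[a], columns[b])) >= cutoff:
            dropped.add(a if missing_frac[a] > missing_frac[b] else b)
    return [name for name in columns if name not in dropped]


# Toy data with hypothetical values (not study data).
toy = {
    "wbc": [5.0, 9.0, 13.0, 17.0, 21.0, 8.0],
    "neutrophils": [3.9, 7.2, 11.1, 14.8, 19.0, 6.1],  # tracks wbc closely
    "ferritin": [900.0, 300.0, 15000.0, 700.0, 2500.0, 12000.0],
}
toy_missing = {"wbc": 0.00, "neutrophils": 0.05, "ferritin": 0.02}
kept = correlation_filter(toy, toy_missing)
```

In this toy example the two white-cell variables are almost perfectly correlated, so the one with more missing data is dropped while the weakly correlated ferritin column is kept. The VIF stage would then be applied to the surviving columns.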

The LASSO regularization parameter (λ) was tuned via an inner 5-fold cross-validation using area under the receiver operating characteristic curve (AUC) as the optimization criterion. The λ at one standard error above the minimum (λ1se) was preferred to prioritize model parsimony; if the one-standard-error solution yielded fewer than five predictors, λmin (or the smallest λ achieving ≥5 predictors) was used instead. Feature stability was quantified by selection frequency across the 10 outer folds, and predictors selected in at least 50% of folds were considered stable. The final feature set was restricted to five predictors to meet events-per-variable requirements and reduce the risk of overfitting [15]. When more than five predictors met the stability criterion, the five predictors with the largest absolute LASSO coefficients were retained. A six-predictor model was also evaluated but did not improve discrimination or overall performance compared with the five-predictor model (Supplementary Table 2; Supplementary Fig. 1).
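
The stability rule can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `stable_predictors`, the placeholder predictor names `x1` to `x7`, and the toy coefficients are hypothetical, and ranking by the mean absolute coefficient across the folds in which a predictor was selected is one plausible reading of "largest absolute LASSO coefficients" rather than a detail given in the text.

```python
from collections import Counter


def stable_predictors(fold_coefs, min_freq=0.5, max_predictors=5):
    """Select stable predictors from per-fold LASSO results.

    fold_coefs: list with one dict per outer fold, mapping predictor name to
    its coefficient; only predictors with nonzero coefficients are listed.
    """
    n_folds = len(fold_coefs)
    counts = Counter(name for coefs in fold_coefs for name in coefs)
    stable = [name for name, c in counts.items() if c / n_folds >= min_freq]

    def mean_abs(name):
        # Assumption: average |coefficient| over folds where it was selected.
        vals = [abs(coefs[name]) for coefs in fold_coefs if name in coefs]
        return sum(vals) / len(vals)

    stable.sort(key=mean_abs, reverse=True)
    return stable[:max_predictors]


# Hypothetical results from 10 outer folds (placeholder names and values).
toy_folds = []
for k in range(10):
    coefs = {"x1": 0.9, "x2": -0.7, "x3": 0.5, "x4": -0.4, "x5": 0.3}
    if k < 6:
        coefs["x6"] = 0.1  # stable (6/10 folds) but small coefficient
    if k < 3:
        coefs["x7"] = 1.2  # large coefficient but unstable (3/10 folds)
    toy_folds.append(coefs)

selected = stable_predictors(toy_folds)
```

Here `x7` is excluded despite its large coefficient because it appears in fewer than half of the folds, and `x6` is excluded by the five-predictor cap because its coefficient is the smallest among the stable candidates.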

Using the final stable feature set, five machine learning algorithms were fitted: ridge-penalized logistic regression (LR), random forest (RF), support vector machine with a radial basis function kernel (SVM), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). Class weights inversely proportional to class frequencies were applied across all algorithms to address outcome imbalance.
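
As a rough illustration of inverse-frequency class weighting, the Python sketch below uses the training-cohort counts reported in this study (41 MAS and 185 non-MAS patients among 226). The "balanced" formula n / (k × n_c) and the function name `inverse_frequency_weights` are assumptions; the text does not specify the exact weighting formula used for each algorithm.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by n_total / (n_classes * n_class)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * n) for label, n in class_counts.items()}


# Training-cohort class counts reported in the Results (41/226 with MAS).
weights = inverse_frequency_weights({"non_MAS": 185, "MAS": 41})

# Relative weight of a MAS case versus a non-MAS case; equals 185 / 41.
ratio = weights["MAS"] / weights["non_MAS"]
```

Under this convention each MAS case counts roughly 4.5 times as much as a non-MAS case, which is the same ratio a positive-class weight of negatives divided by positives would give.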

Hyperparameters for each algorithm were optimized via grid search using 5-fold cross-validation with AUC as the optimization criterion. To enhance comparability of predicted probabilities across algorithms, Platt scaling was fitted on the training cohort and applied to the external test cohort. Cross-validated training performance was evaluated using out-of-fold (OOF) predictions without post-hoc calibration to avoid additional resampling complexity. Detailed hyperparameter search spaces and optimal configurations for each algorithm are provided in Supplementary Table 3.
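
Platt scaling, as used above, amounts to fitting a one-variable logistic regression that maps raw model scores to probabilities on the training cohort and then applying the fitted mapping unchanged to the test cohort. The Python sketch below is a minimal illustration: the function names, the toy scores, and the use of a plain maximum-likelihood fit (without the target smoothing proposed in Platt's original method) are assumptions, not the study's implementation.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_platt(scores, labels, n_iter=50):
    """Fit p = sigmoid(a * score + b) by Newton-Raphson on the log-loss."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            g_a += (p - y) * s          # gradient with respect to a
            g_b += p - y                # gradient with respect to b
            w = p * (1.0 - p)
            h_aa += w * s * s           # Hessian entries
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det < 1e-12:
            break
        # Newton step: solve H * delta = gradient for the 2x2 system.
        d_a = (h_bb * g_a - h_ab * g_b) / det
        d_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - d_a, b - d_b
    return a, b


# Hypothetical raw training scores and outcomes (not study data).
train_scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
train_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(train_scores, train_labels)

# The mapping fitted on training data is applied unchanged to test scores.
test_scores = [-1.2, 0.3, 1.8]
test_probs = [sigmoid(a * s + b) for s in test_scores]
train_probs = [sigmoid(a * s + b) for s in train_scores]
```

Because the calibration parameters are estimated only from training data, the test cohort contributes nothing to the mapping, which is the property that keeps the external evaluation free of leakage.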

2.4. Model evaluation

Model performance was assessed using internal validation in the training cohort with stratified 10-fold cross-validation and independent evaluation in a prespecified test cohort. Discrimination was quantified using the AUC.

In the training cohort, AUC was estimated from pooled OOF predictions obtained during cross-validation, with 95% confidence intervals derived from 2000 bootstrap resamples of the OOF predictions. In the independent test cohort, predicted probabilities were first calibrated using Platt scaling fitted on the training data, and test AUCs and corresponding 95% confidence intervals were estimated using DeLong's method. Pairwise comparisons of test cohort AUCs across algorithms were conducted using DeLong's test and complemented by net reclassification improvement (NRI) and integrated discrimination improvement (IDI). When performance differences were not statistically significant, the most parsimonious and clinically interpretable model was selected.
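
The training-cohort AUC estimate and its bootstrap confidence interval can be sketched in Python as below. The sketch assumes a percentile interval and resampling of patients with replacement; the text does not state which bootstrap interval was used, and the function names and toy predictions are hypothetical.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: P(positive score > negative score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=123):
    """Percentile bootstrap CI for the AUC (assumed interval type)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical pooled out-of-fold predictions (not study data).
toy_scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30,
              0.35, 0.33, 0.45, 0.50, 0.70, 0.90]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

point_estimate = auc(toy_scores, toy_labels)
ci_lower, ci_upper = bootstrap_auc_ci(toy_scores, toy_labels)
```

DeLong's method, which was used for the test cohort, gives an analytic variance for the same Mann-Whitney statistic and is not reproduced here.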

Given the clinical severity of MAS and the priority of minimizing false-negative predictions, operating thresholds were determined exclusively from training cohort OOF predictions using a sensitivity-constrained strategy. Specifically, multiple candidate thresholds were evaluated, and the final operating threshold was selected to achieve a prespecified target sensitivity of at least 80% while maximizing specificity. The Youden index–derived threshold was computed for reference but was not used for final evaluation. The selected operating threshold was then fixed and applied unchanged to both the training and independent test cohorts. Threshold-dependent performance metrics, including sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV), were calculated at the fixed operating threshold. Calibration was assessed using the Brier score. Decision curve analysis (DCA) was performed to evaluate net clinical benefit across a range of clinically relevant threshold probabilities and to assess the clinical utility of the selected model.
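
The sensitivity-constrained threshold search can be illustrated with the Python sketch below. It assumes that candidate thresholds are the observed predicted probabilities and that a patient is classified positive when the probability is at or above the threshold; the function name and toy predictions are hypothetical.

```python
def sensitivity_constrained_threshold(probs, labels, target_sens=0.80):
    """Return (threshold, sensitivity, specificity) for the threshold that
    maximizes specificity among those with sensitivity >= target_sens.

    Returns None if no candidate threshold reaches the target sensitivity.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < t)
        sens, spec = tp / n_pos, tn / n_neg
        if sens >= target_sens and (best is None or spec > best[2]):
            best = (t, sens, spec)
    return best


# Hypothetical out-of-fold probabilities and outcomes (not study data).
toy_probs = [0.02, 0.03, 0.04, 0.05, 0.06, 0.10, 0.20, 0.30, 0.50, 0.70]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

best = sensitivity_constrained_threshold(toy_probs, toy_labels)
```

In the toy data, raising the threshold beyond 0.20 would drop sensitivity below 80%, so 0.20 is the highest-specificity threshold that still satisfies the constraint. Once chosen on training predictions, the threshold would be frozen before any test-cohort evaluation.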

Feature contributions were quantified using SHapley Additive exPlanations (SHAP) values. For the XGBoost algorithm, TreeSHAP-style additive feature attributions were computed using the native prediction contribution method (predcontrib). For other algorithms, permutation-based SHAP approximations were obtained using the fastshap package with 20 Monte Carlo simulations.

Firth's penalized logistic regression was used to obtain bias-corrected coefficient estimates suitable for small-sample and rare-event settings during the development of a simplified risk score. Continuous predictors were categorized into quartiles based on training cohort distributions, while binary predictors were retained in their original form. Integer points were assigned to each predictor category in proportion to the corresponding β-coefficient from the penalized regression model. An open-access web calculator was implemented to automate score computation and risk-group assignment.
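
The two mechanical steps of score construction, quartile binning from the training distribution and conversion of coefficients to integer points, can be sketched in Python as below. Everything specific here is an assumption for illustration: the function names, the toy training values, the hypothetical β-coefficients, and the choice to scale by the smallest nonzero |β| before rounding. The study's actual coefficients and point assignments are those shown in its scorecard, not these.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_cutpoints(train_values):
    """Q1, Q2, Q3 estimated from training-cohort values only."""
    return quantiles(train_values, n=4)


def quartile_index(value, cutpoints):
    """0 for the lowest quartile through 3 for the highest."""
    return bisect_right(cutpoints, value)


def integer_points(betas):
    """Points proportional to beta: divide by the smallest nonzero |beta|
    and round to the nearest integer (assumed scaling rule)."""
    ref = min(abs(b) for b in betas.values() if b != 0)
    return {name: round(b / ref) for name, b in betas.items()}


# Hypothetical training values for one continuous predictor.
toy_train = [10, 20, 30, 40, 50, 60, 70, 80]
cuts = quartile_cutpoints(toy_train)
new_patient_bin = quartile_index(50, cuts)

# Hypothetical coefficients for quartiles 2 to 4 versus the reference quartile.
toy_betas = {"Q2": 0.4, "Q3": 0.9, "Q4": 1.7}
points = integer_points(toy_betas)
```

Estimating the cut points from the training cohort alone means a test-cohort patient is binned with thresholds that were fixed before the test data were seen.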

2.5. Statistical analysis

Continuous variables were presented as mean (s.d.) or median (IQR), as appropriate, and categorical variables as counts (%). Between-group comparisons used Student's t-test or the Wilcoxon rank-sum test for continuous variables and the χ2 test or Fisher's exact test for categorical variables, as appropriate. All statistical tests were two-sided, with p < 0.05 considered statistically significant. Analyses were performed using R (version 4.4.3).

3. Results

3.1. Baseline characteristics of the study population

Between August 1, 2019, and July 31, 2025, a total of 312 individuals were included in the study, comprising 226 patients in the training cohort and 86 patients in the independent test cohort (Fig. 1). MAS occurred in 41/226 (18.1%) training patients and 17/86 (19.8%) test patients. Baseline characteristics are summarized in Table 1 and Supplementary Tables 4–6. Overall, age and sex distributions were broadly comparable between MAS and non-MAS groups in both cohorts (Table 1).

Table 1.

Baseline characteristics of included patients.

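| Characteristics | Group | Training Set (N = 226) | Test Set (N = 86) |
|---|---|---|---|
| MAS | All patients | 41 (18.1%) | 17 (19.8%) |
| Age, years | Non-MAS | 47.0 (35.0–60.0) | 46.0 (30.0–57.0) |
| Age, years | MAS | 46.0 (29.0–57.0) | 35.0 (26.0–59.0) |
| Female | Non-MAS | 130 (70.3%) | 46 (66.7%) |
| Female | MAS | 32 (78.0%) | 12 (70.6%) |
| Fever (≥39 °C) | Non-MAS | 149 (80.5%) | 53 (76.8%) |
| Fever (≥39 °C) | MAS | 35 (85.4%) | 16 (94.1%) |
| Maximum body temperature, °C | Non-MAS | 39.39 (±0.82) | 39.33 (±0.89) |
| Maximum body temperature, °C | MAS | 39.51 (±0.78) | 39.74 (±0.75) |
| Splenomegaly | Non-MAS | 55 (29.7%) | 12 (17.4%) |
| Splenomegaly | MAS | 14 (34.1%) | 8 (47.1%) |
| White blood cell count (WBC, ×10^9/L) | Non-MAS | 13.31 (±6.54) | 13.45 (±7.97) |
| White blood cell count (WBC, ×10^9/L) | MAS | 12.34 (±7.40) | 10.76 (±5.32) |
| Neutrophil count (NEU, ×10^9/L) | Non-MAS | 11.20 (±6.44) | 11.41 (±7.91) |
| Neutrophil count (NEU, ×10^9/L) | MAS | 10.51 (±7.15) | 9.01 (±5.15) |
| Platelet count (PLT, ×10^9/L) | Non-MAS | 298.10 (±111.81) | 290.80 (±127.34) |
| Platelet count (PLT, ×10^9/L) | MAS | 202.20 (±100.95) | 237.71 (±143.34) |
| Total cholesterol (TC, mmol/L) | Non-MAS | 3.72 (±0.76) | 3.76 (±0.78) |
| Total cholesterol (TC, mmol/L) | MAS | 4.26 (±1.08) | 4.41 (±1.12) |
| C-reactive protein (CRP, mg/L) | Non-MAS | 89.03 (±66.25) | 85.21 (±69.37) |
| C-reactive protein (CRP, mg/L) | MAS | 109.38 (±81.49) | 102.13 (±93.09) |
| Aspartate transaminase (AST, U/L) | Non-MAS | 78.64 (±202.72) | 63.41 (±74.89) |
| Aspartate transaminase (AST, U/L) | MAS | 116.71 (±167.78) | 124.88 (±134.99) |
| Ferritin (ng/mL) | Non-MAS | 10276.94 (±17140.75) | 11199.24 (±18957.48) |
| Ferritin (ng/mL) | MAS | 24835.32 (±27025.27) | 22087.28 (±27949.92) |
| Erythrocyte sedimentation rate (ESR, mm/h) | Non-MAS | 61.44 (±28.89) | 60.29 (±29.73) |
| Erythrocyte sedimentation rate (ESR, mm/h) | MAS | 37.71 (±29.15) | 54.00 (±31.47) |
| Rash: absent | Non-MAS | 26 (14.1%) | 10 (14.5%) |
| Rash: typical/evanescent rash | Non-MAS | 107 (57.8%) | 38 (55.1%) |
| Rash: atypical/persistent pruritic eruptions | Non-MAS | 52 (28.1%) | 21 (30.4%) |
| Rash: absent | MAS | 3 (7.3%) | 2 (11.8%) |
| Rash: typical/evanescent rash | MAS | 19 (46.3%) | 10 (58.8%) |
| Rash: atypical/persistent pruritic eruptions | MAS | 19 (46.3%) | 5 (29.4%) |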

Values are presented as mean ± standard deviation (SD) or median (interquartile range, IQR), as appropriate. Categorical variables are presented as n (%).

Patients with MAS exhibited a more severe inflammatory and organ injury phenotype, characterized by higher levels of aspartate aminotransferase (AST), triglycerides, total cholesterol (TC), lactate dehydrogenase (LDH), procalcitonin (PCT), and ferritin, along with lower platelet counts (PLT) and higher rates of splenomegaly and coagulation abnormalities. Compared with the training cohort, the test cohort showed lower rates of lymphadenopathy and pericardial effusion, as well as lower PCT levels.

3.2. Feature selection and model evaluation

Pearson correlation filtering removed lymphocyte percentage, neutrophil count, and direct bilirubin; the subsequent VIF filtering iteratively removed total protein, prothrombin time, AST, fibrinogen, monocyte percentage, and LDH (Supplementary Fig. 2 a-c). Cross-validated LASSO then identified a stable, parsimonious set of five predictors retained for model development: ferritin, splenomegaly, PLT, TC, and erythrocyte sedimentation rate (ESR) (Supplementary Fig. 2d; Supplementary Table 7).

Discrimination was estimated in the training cohort via stratified 10-fold cross-validation with OOF predictions to minimize optimism, and then assessed in the independent test cohort. Across five algorithms trained on the same feature set, test AUCs ranged from 0.617 to 0.839 (LR 0.762; RF 0.617; XGBoost 0.839; SVM 0.783; LightGBM 0.640; Fig. 2). DeLong's tests indicated no statistically significant pairwise AUC differences. However, NRI analyses favored XGBoost, showing a significant NRI of 0.653 (P = 0.012) compared with the reference LR model (Supplementary Table 8; Supplementary Fig. 3). We therefore selected XGBoost for downstream analyses.

Fig. 2.

Discrimination and classification performance across algorithms.

The receiver operating characteristic (ROC) curves in the training set (A) and test set (B). C) Sensitivity and specificity at the prespecified operating threshold (0.0470). D) Confusion matrices. LR, logistic regression; RF, random forest; XGBoost, extreme gradient boosting; SVM, support vector machine; LightGBM, light gradient boosting machine.

Using a prespecified probability threshold of 0.0470 derived from the sensitivity-anchored strategy (Supplementary Fig. 4), XGBoost achieved a test cohort sensitivity of 0.824 and specificity of 0.710, corresponding to a PPV of 0.412, an NPV of 0.942, and an F2 score of 0.687. Probabilistic accuracy, summarized by the Brier score, was 0.111 in training OOF predictions and 0.136 in the test cohort. DCA suggested a positive net benefit compared with treat-all and treat-none strategies across a broad range of clinically reasonable threshold probabilities of approximately 0.05 to 0.60 (Supplementary Fig. 5).
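
The threshold-dependent metrics above follow directly from a 2 × 2 confusion matrix. The Python sketch below uses counts inferred from the reported test-cohort size (17 MAS, 69 non-MAS) together with the reported sensitivity and specificity, namely 14 true positives, 3 false negatives, 49 true negatives, and 20 false positives. These counts are a reconstruction for illustration, not values read from the published confusion matrix, and the function name is hypothetical.

```python
def threshold_metrics(tp, fn, tn, fp):
    """Standard classification metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    acc = (tp + tn) / (tp + fn + tn + fp)
    # F-beta with beta = 2 weights recall four times as much as precision.
    f2 = 5 * ppv * sens / (4 * ppv + sens)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "accuracy": acc,
        "f2": f2,
    }


# Counts inferred from reported rates and cohort sizes (an assumption).
metrics = threshold_metrics(tp=14, fn=3, tn=49, fp=20)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

With these inferred counts the sensitivity, specificity, PPV, and NPV round to the reported 0.824, 0.710, 0.412, and 0.942. The F2 score computed from the counts is about 0.686, which differs from the reported 0.687 only in the third decimal and may reflect rounding of intermediate values.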

3.3. Feature importance ranking

To characterize the drivers of model predictions, we computed SHAP values for the final XGBoost model. Ferritin, splenomegaly, and PLT were the leading contributors, with TC and ESR providing additional signal (Fig. 3a and b). Higher ferritin, splenomegaly, lower PLT, higher TC, and lower ESR increased predicted MAS risk (Fig. 3a), and an individual force plot illustrates how feature contributions aggregate into an individual-level prediction (Fig. 3c). No predictor showed a negligible contribution or a cliff-like drop in attribution in our model.

Fig. 3.

SHAP-based interpretation of the selected XGBoost model.

A) SHAP summary (beeswarm) plot showing the distribution of per-individual SHAP values for the predictors. Each dot represents one individual; the x-axis denotes the SHAP value. B) Global feature importance quantified by mean absolute SHAP value. C) SHAP force plot for an illustrative individual. SHAP, SHapley additive explanations; PLT, platelet count; TC, total cholesterol; ESR, erythrocyte sedimentation rate; splenomegaly_X1, indicator of splenomegaly.

Because TC and thrombin time (TT) competed for inclusion at the final LASSO-based selection step, we performed an additional SHAP analysis including TT as an alternative candidate predictor by extending the selection cutoff to the top six features. In this expanded specification, TT showed minimal incremental SHAP contribution, supporting retention of TC (Supplementary Fig. 1d and e).

3.4. Risk stratification and model development

For clinical usability, we translated the final XGBoost model into a five-variable integer risk score (0–10 points) based on ferritin, PLT, splenomegaly, ESR, and TC (Fig. 4a–c). The score showed good discrimination in the training cohort (AUC 0.919, 95% CI 0.872–0.957) and preserved discrimination in the test cohort (AUC 0.788, 95% CI 0.686–0.884). Additionally, the Kolmogorov–Smirnov (KS) curve demonstrated a maximum KS statistic of 0.68 (Fig. 4e), indicating good separation between MAS and non-MAS patients.

Fig. 4.

Five-variable integer risk score for early identification of MAS in adult patients with Still’s disease.

A) Scorecard showing point assignment for each predictor. B) Prespecified risk-group cutoffs (low 0–5, intermediate 6–7, high 8–10) and observed MAS rates. C) Overview of the scorecard. D) Score distributions in MAS vs non-MAS patients. E) KS curve for the score.

Using prespecified cut-offs, patients were stratified into low (0–5), intermediate (6–7), and high (8–10) risk groups. In the training cohort, observed MAS rates were 1.0% (1/101), 14.6% (13/89), and 75.0% (27/36), respectively (Fig. 4b–d). The high-risk group captured 65.9% of MAS events (27/41) with a PPV of 75.0%, while the low-risk group achieved an NPV of 99.0%. In the test cohort, the corresponding MAS rates were 7.5% (3/40), 22.2% (6/27), and 42.1% (8/19) (Supplementary Tables 9–10). An open-access web calculator implementing the score is available at https://maspredictor.shinyapps.io/clinical_score_app2241/.
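
The risk-group assignment and the stratum-level rates reported above can be reproduced with a short Python sketch. The cut-offs and the event counts are taken from the text; the function names are hypothetical, and the sketch does not implement the scorecard itself, only the mapping from a total score to a risk group.

```python
def risk_group(score):
    """Map a 0-10 bedside score to its prespecified risk group."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 5:
        return "low"
    if score <= 7:
        return "intermediate"
    return "high"


def observed_rates(strata):
    """strata: dict of group -> (MAS events, patients). Returns percentages."""
    return {g: round(100 * events / n, 1) for g, (events, n) in strata.items()}


# Event counts per stratum as reported in the text.
training = {"low": (1, 101), "intermediate": (13, 89), "high": (27, 36)}
test = {"low": (3, 40), "intermediate": (6, 27), "high": (8, 19)}

training_rates = observed_rates(training)
test_rates = observed_rates(test)

# Share of all training-cohort MAS events falling in the high-risk group.
total_events = sum(events for events, _ in training.values())
high_risk_capture = round(100 * training["high"][0] / total_events, 1)

# NPV of a low-risk classification is 100% minus the low-risk MAS rate.
low_risk_npv = {"training": round(100 - 100 * 1 / 101, 1),
                "test": round(100 - 100 * 3 / 40, 1)}
```

The stratum sizes sum to 226 and 86 patients and the events to 41 and 17, matching the cohort totals, which is a useful consistency check when transcribing such tables.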

4. Discussion

Our study developed a clinically interpretable model for early risk stratification of MAS in adult patients with Still's disease and tested it across independent centers. By strictly restricting model inputs to clinical parameters recorded within the first 48 h of hospitalization, the model is explicitly aligned with early clinical decision-making and resource allocation. The final model used baseline ferritin, splenomegaly, PLT, TC, and ESR as its five predictors and applied the XGBoost method, demonstrating good discrimination (training cohort: AUC 0.867, sensitivity 0.805, specificity 0.778; test cohort: AUC 0.839, sensitivity 0.824, specificity 0.710) at an operating threshold of 0.0470, alongside acceptable probabilistic accuracy. DCA further indicated a positive net clinical benefit across a broad range of clinically relevant thresholds, supporting its potential utility for guiding early diagnostic escalation or empirical immunosuppressive therapy. By providing actionable risk estimates at the point of admission, the model is designed to serve as an early-warning system that complements existing diagnostic criteria, enabling timely monitoring and therapeutic escalation before overt clinical deterioration occurs.

To our knowledge, this study represents the first application of ML for early risk prediction of MAS in patients with Still's disease. Existing MAS criteria, such as the modified Ravelli criteria, the MS score, the 2016 EULAR/ACR/PRINTO criteria, and the HScore, were primarily developed for classification and may not be suitable for some patients with evolving or established MAS who would benefit from treatment [3]. Against this backdrop, our approach focuses on pragmatic variables and deliberately adopts a conservative and robust modeling strategy to generate actionable risk estimates before overt clinical deterioration.

Several methodological choices were made to enhance robustness and transportability. Feature filtering and selection were strictly nested within the training data to prevent data leakage, and the fully automated ML pipeline was intended to reduce redundancy among inflammatory markers and mitigate overfitting. Moreover, we re-evaluated every algorithm on the final feature set and assessed multiple candidate operating thresholds. In this context, XGBoost served as a gradient-boosted decision-tree ensemble that aggregates multiple simple decision rules, enabling nonlinear relationships and interactions to be captured using routine admission variables. For interpretability, SHAP values quantify each predictor's contribution to an individual prediction, allowing clinicians to understand why a given patient is classified as higher or lower risk. We also provided an open web calculator to facilitate real-time implementation when full model deployment is not feasible.

From a clinical perspective, our model and derived score may support differentiated care pathways at admission. The operating point was chosen to prioritize sensitivity, which is clinically reasonable given the high morbidity of missed or delayed MAS recognition. The low-risk group achieved a negative predictive value of 99.0% in the training cohort and 92.5% in the test cohort, supporting allocation of these patients to standard monitoring pathways. Conversely, the high-risk stratum captured most MAS events in the training cohort (65.9%), justifying intensified surveillance, earlier consultation with rheumatology or hematology, and a lower threshold for initiating life-saving interventions such as interleukin-1 or interleukin-6 inhibitors. The score's AUC, the density plot, the KS curve, and the clear gradient in event rates across the score strata all support its clinical interpretability, even where absolute risks differ across cohorts. A pragmatic stepwise implementation, starting from local calibration and pilot testing with audit-and-feedback and then linking each risk stratum to predefined monitoring and escalation pathways, may facilitate translation into routine care settings.

The five selected predictors are biologically plausible and map onto complementary domains along the MAS trajectory [16]. In SHAP analyses, ferritin and splenomegaly were the most influential contributors, consistent with the central role of hyperferritinaemia in amplifying cytokine storm and hepatic inflammation [17,18]. Thrombocytopenia and a paradoxically low ESR may reflect consumptive coagulopathy, bone-marrow suppression, hemophagocytosis, or reduced fibrinogen synthesis due to liver dysfunction, all of which occur in MAS [19–22]. TC conferred independent predictive value over traditional indices, potentially reflecting the hepatic-metabolic axis of the disease [23,24]. Future studies incorporating serial laboratory trajectories and cytokine profiling may help clarify the biological pathways underpinning MAS development.

This study has several strengths. It provides first insights into ML-based prediction of Still's disease-associated MAS. The split-site design across geographically and operationally independent centers provides a more pragmatic test of transportability than random splits. Importantly, predictors were intentionally restricted to widely available parameters obtained early in hospitalization to maximize feasibility and transportability. The modeling pipeline emphasized prevention of information leakage, control of multicollinearity, and stability-oriented feature selection, while explainability was built in via SHAP to support clinician trust. The translation of the final model into a simplified integer-based score, together with a web calculator, further supports bedside implementation and clinical usability.

However, several limitations should also be acknowledged. First, the study lacked prospective validation, and future prospective studies are necessary to confirm its clinical utility. We did not evaluate whether model-assisted workflows translate into earlier recognition of MAS, more timely escalation of care, or improved survival; these are critical impact endpoints that should be assessed in future implementation studies. Second, some clinical variables, including arthralgia and rash, may be subject to inter-observer variability and documentation bias, which could influence the results of feature selection. Third, transportability across diverse ethnic populations and healthcare systems worldwide warrants further validation. Finally, important indicators such as interleukin-18 and CXCL9 were not included in our study because they are not routinely available [25,26].

5. Conclusion

We developed and tested an interpretable ML model to predict in-hospital MAS in adult patients with Still's disease using five routine baseline variables: ferritin, splenomegaly, PLT, TC, and ESR. By enabling a shift from reactive diagnosis to proactive risk stratification, the proposed approach has the potential to support earlier intervention. Given the rarity of Still's disease and of secondary MAS, and the difficulty of designing dedicated prospective studies, the findings in these adult patients may pragmatically inform the clinical management of this disease.

CRediT authorship contribution statement

Yihe Zheng: Writing – original draft, Methodology, Formal analysis, Data curation, Conceptualization. Changyi Yang: Data curation. Xiaoxuan Cai: Data curation. Jie Zhao: Data curation. Zile Chen: Data curation. Shuni Ying: Data curation. Jianjun Qiao: Writing – review & editing, Validation, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant numbers 82373465 and 82573965.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jtauto.2026.100365.

Data availability

Data will be made available on request.

References

  • 1. Fautrel B., Mitrovic S., De Matteis A., et al. EULAR/PReS recommendations for the diagnosis and management of Still's disease, comprising systemic juvenile idiopathic arthritis and adult-onset Still's disease. Ann. Rheum. Dis. 2024;83(12):1614–1627. doi: 10.1136/ard-2024-225851.
  • 2. Ruscitti P., Cantarini L., Nigrovic P.A., McGonagle D., Giacomelli R. Recent advances and evolving concepts in Still's disease. Nat. Rev. Rheumatol. 2024;20:116–132. doi: 10.1038/s41584-023-01065-6.
  • 3. Nigrovic P.A. Macrophage activation syndrome. Arthritis Rheumatol Hoboken NJ. 2025;77:367–379. doi: 10.1002/art.43052.
  • 4. Chaisrimaneepan N., Yingchoncharoen P., Pangkanon W., Kanitthamniyom C. Macrophage activation syndrome-associated adult onset still disease treatment: a scoping review of case reports and case series. Proc. - Bayl. Univ. Med. Cent. 2025;38:499–511. doi: 10.1080/08998280.2025.2482315.
  • 5. Ruscitti P., Cipriani P., Ciccia F., Masedu F., Liakouli V., Carubbi F., et al. Prognostic factors of macrophage activation syndrome, at the time of diagnosis, in adult patients affected by autoimmune disease: analysis of 41 cases collected in 2 rheumatologic centers. Autoimmun. Rev. 2017;16:16–21. doi: 10.1016/j.autrev.2016.09.016.
  • 6. Ruscitti P., Cipriani P., Masedu F., Iacono D., Ciccia F., Liakouli V., et al. Adult-onset still's disease: evaluation of prognostic tools and validation of the systemic score by analysis of 100 cases from three centers. BMC Med. 2016;14:194. doi: 10.1186/s12916-016-0738-8.
  • 7. Dziedzic R., Bazan-Socha S., Korkosz M., Kosałka-Węgiel J. Characteristics of 21 patients with secondary hemophagocytic lymphohistiocytosis-insights from a single-center retrospective study. Med Kaunas Lith. 2025;61:977. doi: 10.3390/medicina61060977.
  • 8. Triggianese P., Vitale A., Lopalco G., Mayrink Giardini H.A., Ciccia F., Al-Maghlouth I., et al. Clinical and laboratory features associated with macrophage activation syndrome in Still's disease: data from the international AIDA network Still's Disease registry. Intern. Emerg. Med. 2023;18:2231–2243. doi: 10.1007/s11739-023-03408-3.
  • 9. El Jammal T., Guerber A., Prodel M., Fauter M., Sève P., Jamilloux Y. Diagnosing hemophagocytic lymphohistiocytosis with machine learning: a proof of concept. J. Clin. Med. 2022;11:6219. doi: 10.3390/jcm11206219.
  • 10. Goecks J., Jalili V., Heiser L.M., Gray J.W. How machine learning will transform biomedicine. Cell. 2020;181:92–101. doi: 10.1016/j.cell.2020.03.022.
  • 11. Shakoory B., Geerlinks A., Wilejto M., Kernan K., Hines M., Romano M., et al. The 2022 EULAR/ACR points to consider at the early stages of diagnosis and management of suspected haemophagocytic lymphohistiocytosis/macrophage activation syndrome (HLH/MAS). Ann. Rheum. Dis. 2023;82:1271–1285. doi: 10.1136/ard-2023-224123.
  • 12. Yamaguchi M., Ohta A., Tsunematsu T., Kasukawa R., Mizushima Y., Kashiwagi H., et al. Preliminary criteria for classification of adult Still's disease. J. Rheumatol. 1992;19:424–430.
  • 13. Filipovich A.H. Hemophagocytic lymphohistiocytosis (HLH) and related disorders. Hematol Am Soc Hematol Educ Program. 2009:127–131. doi: 10.1182/asheducation-2009.1.127.
  • 14. Collins G.S., Reitsma J.B., Altman D.G., Moons K.G.M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350 doi: 10.1136/bmj.g7594.
  • 15. Austin P.C., Allignol A., Fine J.P. The number of primary events per variable affects estimation of the subdistribution hazard competing risks model. J. Clin. Epidemiol. 2017;83:75–84. doi: 10.1016/j.jclinepi.2016.11.017.
  • 16. Dong Y., Wang T., Wu H. Heterogeneity of macrophage activation syndrome and treatment progression. Front. Immunol. 2024;15 doi: 10.3389/fimmu.2024.1389710.
  • 17. Macovei L.A., Burlui A., Bratoiu I., Rezus C., Cardoneanu A., Richter P., et al. Adult-onset still's Disease-A complex disease, a challenging treatment. Int. J. Mol. Sci. 2022;23 doi: 10.3390/ijms232112810.
  • 18. Onuora S. Ferritin-induced NETs lead to cytokine storm in AOSD. Nat. Rev. Rheumatol. 2023;19:61. doi: 10.1038/s41584-022-00899-w.
  • 19. Crayne C.B., Albeituni S., Nichols K.E., Cron R.Q. The immunology of macrophage activation syndrome. Front. Immunol. 2019;10:119. doi: 10.3389/fimmu.2019.00119.
  • 20. Cron R.Q., Davi S., Minoia F., Ravelli A. Clinical features and correct diagnosis of macrophage activation syndrome. Expert Rev Clin Immunol. 2015;11:1043–1053. doi: 10.1586/1744666X.2015.1058159.
  • 21. Eloseily E.M.A., Minoia F., Crayne C.B., Beukelman T., Ravelli A., Cron R.Q. Ferritin to erythrocyte sedimentation rate ratio: simple measure to identify macrophage activation syndrome in systemic juvenile idiopathic arthritis. ACR Open Rheumatol. 2019;1:345–349. doi: 10.1002/acr2.11048.
  • 22. Han J.H., Ahn M.-H., Jung J.-Y., Suh C.-H., Kwon J.E., Yim H., et al. The levels of CXCL12 and its receptor, CXCR4, as a biomarker of disease activity and cutaneous manifestation in adult-onset still's disease. Clin. Exp. Rheumatol. 2019;37(Suppl 121):67–73.
  • 23. Khovidhunkit W., Kim M.-S., Memon R.A., Shigenaga J.K., Moser A.H., Feingold K.R., et al. Effects of infection and inflammation on lipid and lipoprotein metabolism: mechanisms and consequences to the host. J. Lipid Res. 2004;45:1169–1196. doi: 10.1194/jlr.R300019-JLR200.
  • 24. Ruscitti P., Cipriani P., Di Benedetto P., Ciccia F., Liakouli V., Carubbi F., et al. Increased level of H-ferritin and its imbalance with L-ferritin, in bone marrow and liver of patients with adult onset still's disease, developing macrophage activation syndrome, correlate with the severity of the disease. Autoimmun. Rev. 2015;14:429–437. doi: 10.1016/j.autrev.2015.01.004.
  • 25. Rocco J.M., Oved J.H., Patel R.J., Herskovits A.Z., Nair N., Shakoory B., et al. CXCL9 as a novel prognostic marker to identify high-risk adults with hemophagocytic lymphohistiocytosis. Blood. 2025 doi: 10.1182/blood.2025030976.
  • 26. Weiss E.S., Girard-Guyonvarc’h C., Holzinger D., de Jesus A.A., Tariq Z., Picarsic J., et al. Interleukin-18 diagnostically distinguishes and pathogenically promotes human and murine macrophage activation syndrome. Blood. 2018;131:1442–1455. doi: 10.1182/blood-2017-12-820852.


Articles from Journal of Translational Autoimmunity are provided here courtesy of Elsevier
