Abstract
Background
Motor fluctuations are a common complication in later stages of Parkinson's disease (PD) and significantly affect patients' quality of life. Robustly identifying risk and protective factors for this complication across distinct cohorts could lead to improved disease management.
Objectives
The goal was to identify key prognostic factors for motor fluctuations in PD by using machine learning and exploring their associations in the context of the prior literature.
Methods
We applied interpretable machine learning techniques for time‐to‐event analysis and prediction of motor fluctuations within 4 years in three longitudinal PD cohorts. Prognostic models were cross‐validated to identify robust predictors, and the performance, stability, calibration, and utility for clinical decision‐making were assessed.
Results
Cross‐validation analyses suggest the effectiveness of the models in identifying significant baseline predictors. Movement Disorder Society‐Unified Parkinson's Disease Rating Scale parts I and II, freezing of gait, axial symptoms, rigidity, and pathogenic GBA and LRRK2 variants were positively correlated with motor fluctuations. Conversely, motor fluctuations were inversely associated with tremors and late age of onset of PD. Cross‐cohort data integration provides more stable predictions, reducing cohort‐specific bias and enhancing robustness. Decision curve and calibration analysis confirms the models' practical utility and alignment of predictions with observed outcomes.
Conclusions
Interpretable machine learning models can effectively predict motor fluctuations in PD from baseline clinical data. Cross‐cohort data integration increases the stability of selected predictors. Calibration and decision curve analyses confirm the model's reliability and utility for practical clinical applications. © 2025 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Keywords: cross‐cohort analysis, longitudinal cohorts, machine learning, motor fluctuations, predictive modeling
Motor fluctuations (MF) are a common complication in Parkinson's disease (PD), characterized by alternating periods of worsening and improving motor function. 1 They have a significant negative effect on emotional well‐being and quality of life 2 and are associated with increased healthcare costs 3 and higher hospitalization rates. 4 Management strategies include deploying complementary medications, dietary modifications, 5 and invasive therapies, such as deep brain stimulation (DBS) or pump systems. 6 , 7 The timing of MF varies considerably among patients, 8 with approximately half of them developing MF within 5 years. 9
Significant risk‐associated factors for developing MF include younger age at onset (AAO) 5 , 9 and the presence of rigidity and bradykinesia. 10 PD‐associated genetic variants, for example, in the GBA gene, 11 , 12 , 13 and gastrointestinal (GI) disorders have also been linked with MF. 5 , 8 , 14 Conversely, the tremor‐predominant subtype is associated with a decreased risk of MF. 9
Early and reliable prognosis of MF could pave the way for new therapeutic approaches to alleviate the progression of motor symptoms. 15 However, the disease‐modifying benefit of potential early medical, physical, dietary, or lifestyle adjustments or interventions still needs to be demonstrated. 14 , 16 For this purpose, identifying prognostic factors associated with MF could facilitate precision medicine trials to test preventive or early disease management strategies for improving patient outcomes. 17 , 18
Previous studies on MF have focused on individual cohorts, which can be prone to cohort‐specific biases. 19 To address this limitation, we analyze MF by comparing single‐ and multi‐cohort machine learning (ML) analyses. Cross‐cohort studies offer a more reliable identification of risk factors and exploration of their interactions by accounting for variability across different patient populations. By combining multi‐cohort analyses with interpretable ML approaches, we aim to provide more robust and insightful prognostic models that can pave the way for designing earlier and more effective therapeutic interventions.
Patients and Methods
Study Population
Three PD cohorts were covered in our analyses: the Luxembourg Parkinson's Study (LuxPARK, 395 participants), 19 the Parkinson's Progression Markers Initiative (PPMI, 485 participants), 20 and the French ICEBERG cohort study (ICEBERG, 116 participants) 21 (see Supplementary Section S1 for inclusion/exclusion criteria). We included PD patients with or without MF, where MF was defined according to the Movement Disorder Society‐Unified Parkinson's Disease Rating Scale (MDS‐UPDRS) part IV. 22 , 23
Furthermore, participants were required to meet the following inclusion criteria:
(1) A diagnosis of PD according to the United Kingdom Parkinson's Disease Society Brain Bank (UKPDSBB) diagnostic criteria, 24 or the presence of at least two of the following symptoms: resting tremor, bradykinesia, or rigidity, with either resting tremor or bradykinesia being one of the essential symptoms; or a single asymmetric resting tremor or asymmetric bradykinesia. 25
(2) The presence of MF within 4 years following the baseline visit or the completion of an MF assessment without presenting symptoms within 4 years.
Participants were classified into MF+ (developed MF within 4 years from baseline) and MF− groups (no MF symptoms within the 4‐year period). Specifically, inclusion in the MF+ group required a score of ≥1 on items 4.3 (time spent in off state), 4.4 (functional impact of fluctuations), or 4.5 (complexity of motor fluctuations). This information, available either from the stored MDS‐UPDRS assessments or from a corresponding dedicated clinical assessment variable capturing the same information, was consistently recorded in binary format (yes/no) for all participants.
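The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's code: the item keys (`item_4_3`, `item_4_4`, `item_4_5`), the `years_from_baseline` field, and the choice to treat missing items as 0 and to return `None` when no follow-up assessment falls inside the window are all assumptions made for the example.

```python
# Hypothetical keys for MDS-UPDRS part IV items 4.3, 4.4 and 4.5.
MF_ITEMS = ("item_4_3", "item_4_4", "item_4_5")


def visit_has_mf(visit):
    """True if any fluctuation item is scored >= 1 (missing items count as 0)."""
    return any(visit.get(item, 0) >= 1 for item in MF_ITEMS)


def classify_participant(visits, horizon_years=4.0):
    """Return 'MF+', 'MF-', or None if no follow-up visit lies in the window."""
    # Keep follow-up visits after baseline and up to the prediction horizon.
    in_window = [v for v in visits
                 if 0 < v["years_from_baseline"] <= horizon_years]
    if not in_window:
        return None  # assumption: no usable assessment, participant not classified
    if any(visit_has_mf(v) for v in in_window):
        return "MF+"
    return "MF-"


example_visits = [
    {"years_from_baseline": 1.0, "item_4_3": 0, "item_4_4": 0, "item_4_5": 0},
    {"years_from_baseline": 3.0, "item_4_3": 0, "item_4_4": 2, "item_4_5": 0},
]
print(classify_participant(example_visits))
```

Baseline MF is deliberately left out of this helper, since the text handles it separately when building validation and hold-out sets.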
Supplementary Table S1 details the number of patients per group meeting these criteria. Individuals with baseline MF were excluded from the validation set in the cross‐validation (CV) process and the external hold‐out set to prevent bias and overly optimistic predictive performance. Demographic and baseline clinical characteristics for the cross‐cohort analysis are shown in Supplementary Table S2, with cohort‐specific statistics for LuxPARK, PPMI, and ICEBERG in Supplementary Tables S3–S5.
ML Analysis of Motor Fluctuations
We applied ML approaches for classification and time‐to‐event analysis for MF prognosis in PD (see study workflow shown in Supplementary Fig. S1). The classification approaches predict MF occurrence within 4 years, whereas the time‐to‐event analysis estimates the duration until MF onset or the last follow‐up for censored data. Data preprocessing involved testing multiple cross‐study normalization methods (mean‐centering, 26 standardization, quantile normalization, 27 , 28 ComBat, 29 , 30 Ratio‐A, 31 M‐ComBat, 32 see details in the Supplementary Section S3). The CV workflow, including model evaluation and data processing steps, is detailed in Supplementary Figures S2–S3. Variable aggregation was applied (see Supplementary Table S6) to combine related features that reflect similar underlying clinical features, thereby improving robustness and ensuring a more informative representation of predictors across cohorts. The total MDS‐UPDRS part IV score was excluded from the input to avoid circularity. Furthermore, because the LuxPARK cohort was assessed exclusively during the on state, only on state measurements were included.
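Of the cross-study normalization methods listed, mean-centering is the simplest to illustrate: each cohort's mean is subtracted from that cohort's values of a given feature. The sketch below is an assumption-laden illustration (function name, data layout, and example values are hypothetical) and does not reproduce the study's preprocessing pipeline.

```python
from collections import defaultdict
from statistics import fmean


def mean_center_by_cohort(values, cohorts):
    """Subtract each cohort's own mean from its values for one feature."""
    grouped = defaultdict(list)
    for value, cohort in zip(values, cohorts):
        grouped[cohort].append(value)
    cohort_means = {cohort: fmean(vals) for cohort, vals in grouped.items()}
    return [value - cohort_means[cohort]
            for value, cohort in zip(values, cohorts)]


# Hypothetical feature values from two cohorts with different offsets.
scores = [10.0, 12.0, 20.0, 24.0]
labels = ["A", "A", "B", "B"]
print(mean_center_by_cohort(scores, labels))
```

After centering, every cohort has a mean of zero for the feature, which removes additive cohort offsets but leaves differences in spread untouched.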
Nine classification algorithms, focusing on decision tree/rule‐based methods, were evaluated, including AdaBoost, 33 , 34 CART, 35 CatBoost, 36 C4.5, 37 Fast Interpretable Greedy‐Tree Sums (FIGS), 38 GOSDT‐GUESSES, 39 GBoost, 40 Hierarchical Shrinkage (HS), 41 and XGBoost. 42 For time‐to‐event analysis, we used component‐wise GBoost (CW‐GBoost), 43 Survival Trees, 44 Extra Survival Trees (EST), 45 Survival GBoost, 46 LSVM, NLSVM, 47 Penalized Cox, 48 , 49 and Survival Random Forests (SRF). 50 Classification models incorporated the scikit‐learn (v1.2.2), imodels (v1.2.6), CatBoost (v1.1), and XGBoost (v1.6.2) packages, whereas scikit‐survival (v0.20.0) was used for time‐to‐event models. All preprocessing methods, including imputation (multiple imputation by chained equations [MICE] method for variables with <50% missingness) and cross‐study normalization, were performed independently in the training and validation sets to avoid data leakage (Supplementary Fig. S3). In each outer fold, an inner cross‐validation loop handled hyperparameter tuning and feature selection for model optimization (see Fig. 1 and Supplementary Section S4).
FIG. 1.

Illustration of the machine learning and cross‐validation workflow. The workflow involves training and evaluating machine learning models for motor fluctuation prognosis using 5‐fold cross‐validation to assess average performance. A 3‐fold nested cross‐validation on the training set data was used to optimize hyperparameters and select the most informative features. [Color figure can be viewed at wileyonlinelibrary.com]
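The nested scheme described in the caption (5 outer folds for performance estimation, 3 inner folds on each training portion for tuning and feature selection) can be sketched as index bookkeeping. The code below is a simplified, unstratified illustration with hypothetical function names; the study's actual splitting procedure (for example, any stratification by outcome or cohort) is not specified here and is not assumed.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_val, inner_splits) for each outer fold."""
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_val in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        # Inner folds are drawn only from the outer training indices,
        # so the outer validation fold never informs tuning.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_val, inner_splits


for outer_train, outer_val, inner_splits in nested_cv_splits(30):
    print(len(outer_train), len(outer_val), len(inner_splits))
```

Hyperparameters and features would be chosen using only `inner_splits`, and the resulting model would then be scored once on `outer_val`.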
We observed varying degrees of imbalance in MF outcomes across our study cohorts (see Results for details). To address this, we implemented an undersampling technique during model training. This involved randomly removing samples from the majority class (MF+) to achieve a more balanced representation of outcomes. Although this reduces the overall sample size, it improves the model's ability to learn patterns associated with the minority class (MF−).
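Random undersampling can be illustrated as follows. This sketch assumes an exact 1:1 class ratio after sampling, which is one common choice; the text only states that a more balanced representation was sought, so the target ratio, function name, and seed handling are assumptions.

```python
import random


def undersample_majority(labels, seed=0):
    """Return sorted indices keeping every class at the minority class size."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_minority = min(len(idxs) for idxs in by_class.values())
    rng = random.Random(seed)
    kept = []
    for idxs in by_class.values():
        # The minority class is kept in full; larger classes are subsampled.
        kept.extend(rng.sample(idxs, n_minority))
    return sorted(kept)


# Hypothetical training labels: 7 MF+ (1) and 3 MF- (0).
train_labels = [1] * 7 + [0] * 3
kept_indices = undersample_majority(train_labels)
print(kept_indices)
```

Undersampling should be applied to the training portion only, so that validation and hold-out sets keep the original outcome distribution.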
To ensure cross‐cohort comparability, MF analyses used only shared baseline features across all cohorts (see the list of features in Supplementary Tables S2–S5). Two types of models were trained independently:
(1) Comprehensive model, which incorporates all shared baseline features across cohorts, capturing potential interactions and nonlinearities influencing MF.
(2) Refined model, which excludes baseline MF and levodopa intake to facilitate discovery of new predictors and improve model interpretability and robustness.
Furthermore, we refer to the best model as the one achieving the highest average area under the receiver‐operating characteristic (ROC) curve (AUC) (classification), or concordance index (C‐index) (time‐to‐event analysis) across the hyperparameter optimization within the nested CV.
Model Interpretation
Shapley additive explanations (SHAP) value analysis 51 was used to interpret the MF prediction models by quantifying each variable's predictive impact. Furthermore, we examined the median conversion time for 50% of PD patients to develop MF, using log‐rank tests to compare Kaplan–Meier (KM) curves between groups for significant differences in time‐to‐MF occurrence.
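The median conversion time is the first time point at which the Kaplan–Meier estimate of remaining MF-free drops to 50% or below. A minimal sketch of that calculation is given below, with hypothetical follow-up data; it is not the study's implementation, and the log-rank comparison is omitted.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Process all subjects sharing the same observed time together.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


def median_time(curve):
    """First time at which S(t) <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up: years to MF (event=1) or to last visit (event=0).
km_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 1])
print(median_time(km_curve))
```

When fewer than half of a group convert during follow-up, the curve never reaches 0.5 and the median is undefined, which the helper signals with `None`.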
Hazard ratios (HR) were derived from SHAP scores to quantify the relative MF risk, highlighting key associations. Confidence intervals (CI) for HR estimates were calculated using bootstrapping and the percentile method at a 5% significance level. 52 Additionally, the log‐rank test was used to determine the optimal threshold for continuous variables, enabling HR calculation by stratifying patients into high‐ and low‐risk groups.
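The percentile bootstrap mentioned above can be sketched generically: resample with replacement, recompute the statistic, and take the 2.5th and 97.5th percentiles of the resampled values. The example uses the sample mean as a stand-in statistic; deriving HR from SHAP scores is specific to the study and is not reproduced here, and the function name, number of replicates, and seed are assumptions.

```python
import random
from statistics import fmean


def bootstrap_percentile_ci(sample, statistic=fmean, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boot_stats = sorted(statistic(rng.choices(sample, k=n))
                        for _ in range(n_boot))
    lower_idx = int(n_boot * alpha / 2)
    upper_idx = n_boot - 1 - lower_idx
    return boot_stats[lower_idx], boot_stats[upper_idx]


# Hypothetical per-patient values; any scalar statistic could be plugged in.
values = list(range(1, 21))
low, high = bootstrap_percentile_ci(values)
print(low, high)
```

Passing a different `statistic` callable (for example, one that fits a model on the resample and returns a ratio) gives the same interval construction for that quantity.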
Predictive Model Evaluation
MF prediction models were evaluated using the AUC for classification performance and the C‐index for time‐to‐MF analysis. The AUC (ranging from 0–1, where 0.5 indicates no discriminative ability, values below 0.5 suggest worse than random performance, and 1.0 indicates perfect discrimination) was used as a measure of a model's discriminative ability in classification tasks, whereas the C‐index (with a similar range and interpretation) served as a comparable measure of discrimination in time‐to‐event models. Both metrics offer advantages over simpler performance measures such as accuracy, because they are insensitive to class imbalance and provide a more comprehensive assessment of model performance across different classification thresholds. DeLong's test and one‐shot nonparametric tests were used to compare hold‐out test set AUCs and C‐indices to identify significant differences. Using these statistical tests, we evaluated the impact of excluding the variables “levodopa medication” and “baseline MF” and compared models with and without cross‐study normalization (normalized vs. unnormalized model). P‐values were adjusted for multiple comparisons. 53 Furthermore, Bayesian signed‐rank tests were applied to compare the best models' cross‐validated performances across cohorts, 54 whereas model stability was assessed by examining the standard deviation (SD) across CV cycles.
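Both metrics have a simple pairwise interpretation, which the following sketch makes explicit. The AUC is computed as the fraction of positive/negative pairs ranked correctly, and the C-index as Harrell's concordance over comparable pairs. These are plain reference implementations written for illustration, with hypothetical data; optimized library routines would normally be used in practice.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (higher risk = earlier event)."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable if i had the event strictly before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(concordance_index([1, 2, 3], [1, 1, 0], [0.9, 0.5, 0.1]))
```

Because both quantities depend only on the ordering of predictions, they do not change when a fixed classification threshold is moved, which is the property the text relies on.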
Analysis of Feature Importance across Multiple Cohorts
We analyzed feature selection statistics for MF classification and time‐to‐MF analysis. First, the feature selection frequency was determined across CV cycles for each cohort. We then compared mean selection percentages across the LuxPARK, PPMI, and ICEBERG cohorts to identify the most consistent predictors.
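The two steps (per-cohort selection frequency, then the mean across cohorts) amount to simple counting. The sketch below uses hypothetical feature names and cohort labels; treating a feature that was never selected in a cohort as 0% is an assumption of this example.

```python
from collections import Counter


def selection_frequency(selected_per_cycle):
    """Percentage of CV cycles in which each feature was selected."""
    counts = Counter(feature
                     for cycle in selected_per_cycle
                     for feature in set(cycle))
    n_cycles = len(selected_per_cycle)
    return {feature: 100.0 * count / n_cycles
            for feature, count in counts.items()}


def mean_across_cohorts(freq_by_cohort):
    """Rank features by mean selection percentage across cohorts."""
    features = set().union(*freq_by_cohort.values())
    n_cohorts = len(freq_by_cohort)
    means = {f: sum(freq.get(f, 0.0) for freq in freq_by_cohort.values())
                / n_cohorts
             for f in features}
    # Sort by descending mean, breaking ties alphabetically.
    return sorted(means.items(), key=lambda item: (-item[1], item[0]))


freqs = {
    "cohort_A": selection_frequency([["age", "tremor"], ["age"]]),
    "cohort_B": selection_frequency([["age"], ["rigidity"]]),
}
print(mean_across_cohorts(freqs))
```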
Statistical Analyses
Hypothesis tests were used to identify between‐group differences in baseline clinical characteristics: the Mann–Whitney U test for non‐normally distributed variables, the two‐sample t test for normally distributed variables, and Fisher's exact test for categorical variables. Normality was assessed using the Shapiro–Wilk test. Analysis of variance (ANOVA) combined with Tukey's honestly significant difference test was applied to compare normally distributed data across cohorts, whereas the Kruskal‐Wallis test combined with Dunn's test was used for non‐normal data.
Correlation analysis used Spearman's correlation for continuous/ordinal variables, point‐biserial correlation for binary and continuous/ordinal variables, the Matthews correlation coefficient (MCC) for binary variables, and Kendall's τ for ordinal variables. Statistical significance was defined as P < 0.05.
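For two binary variables, the MCC is computed from the 2×2 contingency table as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A short reference sketch follows; returning 0 when a margin is empty is a convention adopted for this example, and the variable names are hypothetical.

```python
from math import sqrt


def matthews_corrcoef(x, y):
    """Matthews correlation coefficient between two 0/1 coded variables."""
    tp = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: undefined cases (an empty margin) return 0.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical binary variables, e.g. presence of two symptoms.
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 1, 0]))
```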
Decision Curve and Calibration Analysis
Decision curve analysis (DCA) on the hold‐out test set was used to evaluate the clinical utility of predictive models by comparing the net benefit across decision thresholds when using the model to make treatment decisions versus treating all patients or none, 55 , 56 assuming adequate interventions are available.
The area under the net benefit curve (AUNBC) quantifies the model's clinical utility, with a larger AUNBC indicating a greater decision‐making advantage. AUNBC differences were tested using bootstrapped hypothesis testing with 1000 replicates for P‐value estimation. 57
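At a decision threshold p_t, the standard net benefit is TP/n − (FP/n)·p_t/(1 − p_t), and an area under the net benefit curve can be approximated by integrating over a grid of thresholds. The sketch below uses the trapezoidal rule and a hypothetical threshold grid; the threshold range and integration scheme used in the study are not stated here, so both are assumptions.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def area_under_net_benefit(labels, probs, thresholds):
    """Trapezoidal area under the net benefit curve over sorted thresholds."""
    nbs = [net_benefit(labels, probs, t) for t in thresholds]
    return sum((thresholds[i + 1] - thresholds[i]) * (nbs[i] + nbs[i + 1]) / 2
               for i in range(len(thresholds) - 1))


# Hypothetical outcomes and predicted probabilities.
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.8, 0.3, 0.1]
treat_all = [1.0] * len(outcomes)  # "treat all" = everyone above any threshold

print(net_benefit(outcomes, predicted, 0.5))
print(net_benefit(outcomes, treat_all, 0.5))
print(area_under_net_benefit(outcomes, predicted, [0.4, 0.5, 0.6]))
```

The "treat none" strategy has a net benefit of 0 at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both 0 and the "treat all" value.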
Additionally, calibration analysis was performed to assess the agreement between predicted probabilities and observed outcomes for MF classification. 56 For time‐to‐MF outcomes, predicted conversion probabilities were compared to KM estimates at year 4. 58 Calibration slope and mean squared error (MSE) were examined to ensure accuracy and reliability.
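One simple way to examine calibration is to bin the predicted probabilities, compare the mean prediction with the observed event fraction in each bin, and summarize the agreement with a fitted slope and an MSE. The sketch below follows that binned approach; it is an assumption of this example and may differ from how the calibration slope and MSE were computed in the study (a slope is often estimated by logistic regression on the logit of the predictions instead).

```python
def calibration_points(labels, probs, n_bins=5):
    """Return (mean predicted, observed fraction) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # equal-width bins on [0, 1]
        bins[idx].append((p, y))
    return [(sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
            for b in bins if b]


def slope_and_mse(points):
    """Least-squares slope of observed vs. predicted, and their MSE."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    mse = sum((x - y) ** 2 for x, y in points) / n
    return sxy / sxx, mse


# Hypothetical, perfectly calibrated predictions in three risk groups.
probs = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
labels = [1] * 1 + [0] * 9 + [1] * 5 + [0] * 5 + [1] * 9 + [0] * 1
print(slope_and_mse(calibration_points(labels, probs)))
```

A slope near 1 with an MSE near 0 indicates that predicted and observed risks agree; slopes below 1 suggest predictions that are too extreme.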
Results
Single‐Cohort Analyses
Among the three cohorts, the PPMI cohort provided the highest classification performance, with the best comprehensive MF model reaching a cross‐validated AUC of 0.71 ± 0.04 and a hold‐out AUC of 0.71 (Supplementary Table S7). The refined model (excluding baseline MF and levodopa medication) yielded a cross‐validated AUC of 0.70 ± 0.05 with a hold‐out AUC of 0.63 (Supplementary Table S8). For time‐to‐MF analysis, the comprehensive model achieved cross‐validated and hold‐out C‐indices of 0.71 ± 0.05 and 0.72 (Table 1), respectively, whereas the refined model had C‐indices of 0.69 ± 0.05 and 0.70 (Supplementary Table S9).
TABLE 1.
Summary of predictive performance metrics for comprehensive time‐to‐motor fluctuations models
Single‐cohort analyses

| Algorithm | LuxPARK: Mean (SD) | LuxPARK: Hold‐out C‐index | LuxPARK: No. of features | PPMI: Mean (SD) | PPMI: Hold‐out C‐index | PPMI: No. of features | ICEBERG: Mean (SD) | ICEBERG: Hold‐out C‐index | ICEBERG: No. of features |
|---|---|---|---|---|---|---|---|---|---|
| CW‐GBoost | 0.669 (0.134) | 0.606 | 9 (9) | 0.708 (0.022) | 0.705 | 11 (23) | 0.587 (0.094) | 0.594 | 5 (13) |
| Extra Survival | 0.587 (0.139) | 0.629 | 10 (10) | 0.711 (0.047) | 0.718 | 108 (114) | 0.575 (0.145) | 0.642 | 10 (10) |
| Survival GBoost | 0.598 (0.081) | 0.588 | 81 (88) | 0.681 (0.044) | 0.684 | 12 (17) | 0.628 (0.093) | 0.687 | 12 (19) |
| LSVM | 0.605 (0.161) | 0.608 | 8 (8) | 0.679 (0.060) | 0.726 | 27 (27) | 0.482 (0.068) | 0.600 | 11 (11) |
| NLSVM | 0.532 (0.173) | 0.601 | 16 (16) | 0.695 (0.043) | 0.736 | 30 (30) | 0.501 (0.105) | 0.661 | 15 (15) |
| Penalized Cox | 0.533 (0.144) | 0.565 | 1 (2) | 0.700 (0.036) | 0.710 | 29 (37) | 0.518 (0.049) | 0.624 | 14 (14) |
| Survival RF | 0.569 (0.109) | 0.626 | 9 (9) | 0.687 (0.013) | 0.716 | 65 (94) | 0.596 (0.046) | 0.603 | 11 (11) |
| Survival Trees | 0.555 (0.050) | 0.595 | 14 (20) | 0.640 (0.032) | 0.637 | 6 (7) | 0.538 (0.054) | 0.633 | 6 (8) |

Multi‐cohort analyses

| Algorithm | Cross‐Cohort: Mean (SD) | Cross‐Cohort: Hold‐out C‐index | Cross‐Cohort: No. of features | Leave‐ICEBERG‐out: Mean (SD) | Leave‐ICEBERG‐out: Hold‐out C‐index | Leave‐ICEBERG‐out: No. of features | Leave‐PPMI‐out: Mean (SD) | Leave‐PPMI‐out: Hold‐out C‐index | Leave‐PPMI‐out: No. of features | Leave‐LuxPARK‐out: Mean (SD) | Leave‐LuxPARK‐out: Hold‐out C‐index | Leave‐LuxPARK‐out: No. of features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CW‐GBoost | 0.646 (0.035) | 0.649 | 8 (9) | 0.687 (0.025) | 0.667 | 16 (27) | 0.630 (0.065) | 0.698 | 13 (24) | 0.734 (0.032) | 0.625 | 15 (24) |
| Extra Survival | 0.640 (0.045) | 0.680 | 158 (160) | 0.685 (0.011) | 0.588 | 144 (150) | 0.644 (0.033) | 0.698 | 21 (21) | 0.729 (0.032) | 0.617 | 123 (123) |
| Survival GBoost | 0.640 (0.032) | 0.654 | 18 (29) | 0.660 (0.069) | 0.656 | 11 (24) | 0.616 (0.043) | 0.690 | 12 (21) | 0.712 (0.024) | 0.560 | 29 (57) |
| LSVM | 0.627 (0.038) | 0.687 | 42 (42) | 0.674 (0.033) | 0.589 | 57 (57) | 0.632 (0.036) | 0.696 | 40 (40) | 0.725 (0.039) | 0.572 | 53 (53) |
| NLSVM | 0.614 (0.033) | 0.648 | 47 (47) | 0.672 (0.043) | 0.612 | 55 (55) | 0.625 (0.045) | 0.661 | 32 (32) | 0.727 (0.039) | 0.586 | 54 (54) |
| Penalized Cox | 0.589 (0.082) | 0.614 | 2 (3) | 0.671 (0.046) | 0.500 | 1 (3) | 0.607 (0.042) | 0.590 | 2 (4) | 0.749 (0.037) | 0.585 | 83 (118) |
| Survival RF | 0.641 (0.034) | 0.664 | 90 (129) | 0.681 (0.037) | 0.565 | 24 (55) | 0.619 (0.030) | 0.697 | 101 (115) | 0.728 (0.034) | 0.623 | 71 (103) |
| Survival Trees | 0.590 (0.058) | 0.589 | 2 (2) | 0.629 (0.064) | 0.489 | 4 (6) | 0.597 (0.043) | 0.642 | 10 (18) | 0.665 (0.036) | 0.614 | 8 (10) |
The table presents cross‐validated and hold‐out C‐indices for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated C‐indices in each cohort analysis across the nested cross‐validation (CV) hyperparameter optimizations are highlighted in bold and are referred to as the best models. The model with the highest hold‐out C‐index is indicated in italics. The “number of features” column includes the number of candidate features selected during CV (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Abbreviations: LuxPARK, Luxembourg Parkinson's Study; PPMI, Parkinson's Progression Markers Initiative; ICEBERG, French ICEBERG cohort study; SD, standard deviation; No, number; CW‐GBoost, component‐wise Gradient Boosting; CV, cross‐validation; Survival GBoost, Survival Gradient Boosting; LSVM, Linear Support Vector Machine; NLSVM, Naive Linear Support Vector Machine; RF, Random Forest.
Lower predictive performances observed for LuxPARK and ICEBERG (Supplementary Figs. S4–S7) can be explained by the more imbalanced MF outcome distribution in LuxPARK (71.1% MF+) and a limited sample size in ICEBERG. In addition, differences in patient characteristics may have influenced the results, with participants from LuxPARK presenting with longer disease duration and higher MDS‐UPDRS scores than PPMI and ICEBERG (Supplementary Table S10), and PPMI presenting with the lowest MDS‐UPDRS scores overall.
The cohort datasets showed varying but generally low levels of average missingness (LuxPARK: 2.1%, PPMI: 0.7%, and ICEBERG: 4.6%), suggesting a limited impact on the observed predictive performances.
Informative Predictors from Single‐Cohort Analyses
Key MF predictors across cohorts include disease duration, AAO, body weight, axial symptoms, MDS‐UPDRS parts I and II, Benton Judgment of Line Orientation (JLO), and autonomic dysfunction, including GI (Supplementary Table S11). Supplementary Table S12 highlights the consistently selected features across algorithms and cohorts, further emphasizing the importance of disease duration, MDS‐UPDRS parts I and II, and body weight as key predictors. These variables are interrelated, with disease duration influencing many covariates, including axial symptoms, MDS‐UPDRS scores, Scale for Outcomes in Parkinson's disease for Autonomic symptoms (SCOPA‐AUT), and disease severity features in general.
Multi‐Cohort Analyses
Multi‐cohort models showed higher robustness of performance statistics and generalizability than the single‐cohort models, despite slightly lower average predictive performance than PPMI‐only models. The cross‐cohort classification model with the highest performance reached a cross‐validated AUC of 0.64 ± 0.01 and a hold‐out AUC of 0.62 (Supplementary Table S7). For time‐to‐MF analysis, the corresponding models provided a cross‐validated C‐index of 0.65 ± 0.04 and a hold‐out C‐index of 0.65 (Table 1).
Despite variation in performance when single cohorts were left out, multi‐cohort models generally provided more stable results in the CV. Supplementary Table S13 presents the significance of differences between the performance statistics for different model types (comprehensive vs. refined). In addition, we investigated the effect of cross‐study normalization (mean‐centering), showing that it significantly improves hold‐out performance in the cross‐cohort time‐to‐MF models (P = 0.011) (Supplementary Tables S13–S14).
Overall, multi‐cohort analyses provided more stable and robust models than single‐cohort analyses, although the PPMI cohort reached the best performance. Despite slightly lower performance, multi‐cohort models may still be preferable because of their greater stability (Supplementary Figs. S8–S11) and generalizability across different patient populations.
Informative Predictors from Multi‐Cohort Analyses
To identify informative clinical features for MF prognosis, we analyzed their predictive impact and statistical associations. In the cross‐cohort analyses, levodopa medication, disease duration, the presence of dyskinesia, and resting tremor were the most frequently selected features (Supplementary Table S15). Levodopa medication, disease duration, and rigidity were consistently linked to higher MF likelihood, whereas resting tremor and Benton JLO (higher scores indicate better visuospatial ability) were negatively correlated with MF severity (Fig. 2, Supplementary Fig. S12, and Supplementary Table S16 for the correlations with MF severity).
FIG. 2.

Shapley additive explanations (SHAP) value plot illustrating the top individual predictors for the best comprehensive model in cross‐cohort motor fluctuations classification. The SHAP plot provides a detailed view of how individual predictors influence the model's predictions. This visualization shows not just the overall importance of each predictor, but also how different values of that predictor impact the likelihood of motor fluctuations across our dataset. Each row represents a predictor, ordered by overall importance from top to bottom, and each point represents an individual observation in the dataset. The color of the points ranges from blue (lower values) to red (higher values) for that predictor. The position of points on the x‐axis shows the impact on the model's prediction: points to the right of zero indicate the predictor pushed the model toward predicting motor fluctuations, and points to the left of zero indicate the predictor pushed the model away from predicting motor fluctuations. Therefore, the SHAP plot shows whether high or low values of a predictor are associated with a higher or lower likelihood of predicted motor fluctuations. [Color figure can be viewed at wileyonlinelibrary.com]
In the time‐to‐MF analysis, key predictors included levodopa medication, disease duration, and dyskinesia (Fig. 3 and Supplementary Fig. S13). Although HR for these factors in the best comprehensive time‐to‐MF model did not reveal substantial differences between patient groups, log‐rank tests showed significant differences for disease duration and dyskinesia. Levodopa, although a key predictor, did not display significant differences in the time‐to‐MF analysis (Table 2). We note that disease duration significantly correlates with other features, including AAO, axial symptoms, bradykinesia, dyskinesia, the Hoehn and Yahr (H&Y) stage, and rigidity (Supplementary Table S17), highlighting the complexity of PD progression.
FIG. 3.

Shapley additive explanations (SHAP) value plot illustrating the most informative predictors for the best comprehensive model in the cross‐cohort time‐to‐motor fluctuation analysis. The SHAP plot provides a detailed view of how individual predictors influence the model's predictions. This visualization shows not just the overall importance of each predictor, but also how different values of that predictor impact the likelihood of motor fluctuations across our dataset. Each row represents a predictor, ordered by overall importance from top to bottom, and each point represents an individual observation in the dataset. The color of the points ranges from blue (lower values) to red (higher values) for that predictor. The position of points on the x‐axis shows the impact on the model's prediction: points to the right of zero indicate the predictor pushed the model toward predicting motor fluctuations, and points to the left of zero indicate the predictor pushed the model away from predicting motor fluctuations. Therefore, the SHAP plot shows whether high or low values of a predictor are associated with a higher or lower likelihood of predicted motor fluctuations. [Color figure can be viewed at wileyonlinelibrary.com]
TABLE 2.
A summary of hazard ratio (HR), median conversion times with 95% confidence interval (CI), and P‐values from the log‐rank test for the top 15 predictors in the cross‐cohort time‐to‐MF model is presented below.
| Predictors | HR (95% CI) | Median conversion times (95% CI) | Log‐rank (P‐values) |
|---|---|---|---|
| Dyskinesia score | | | |
| ≥1 | 1.00 (0.8–2.37) | 0.95 (0–2.44) | 1.44E−06 |
| <1 | | 4.07 (3.77–4.36) | |
| Levodopa treatment | | | |
| Yes | 1.14 (1.00–1.56) | 3.01 (2.27–4.07) | 0.446 |
| No | | 4.19 (3.77–4.51) | |
| MDS‐UPDRS I‐fatigue (normal) | | | |
| Yes | 0.70 (0.51–0.94) | 4.44 (4.03–4.76) | 0.116 |
| No | | 3.03 (2.34–4.00) | |
| MDS‐UPDRS IV‐painful off‐state dystonia (slight) | | | |
| Yes | 5.06 (1.72–16.23) | 0 (0–0) | 9.87E−08 |
| No | | 4.01 (3.44–4.28) | |
| Pathogenic LRRK2 variant | | | |
| Yes | 2.02 (1.41–2.92) | 0.99 (0.11–2.03) | 2.15E−06 |
| No | | 4.11 (3.84–4.44) | |
| Pathogenic GBA variant (pathogenic variant) | | | |
| Yes | 3.78 (2.13–6.26) | 1.02 (0.28–2.26) | 5.45E−07 |
| No | | 4.11 (3.79–4.44) | |
| Disease duration since PD diagnosis (years) | | | |
| ≥2 | 1.10 (0.98–1.49) | 2.76 (2.03–4.01) | 0.011 |
| <2 | | 4.28 (4–4.68) | |
| MDS‐UPDRS I‐depressed moods (mild) | | | |
| Yes | 1.45 (1.00–2.57) | 3.01 (0–5.02) | 0.167 |
| No | | 4.03 (3.31–4.28) | |
The HR indicates the risk associated with each predictor, whereas the median conversion time and the log‐rank test assess the Kaplan–Meier curve differences between groups.
Abbreviations: HR, hazard ratio; CI, confidence interval; MF, motor fluctuations; MDS‐UPDRS, Movement Disorder Society‐Unified Parkinson's Disease Rating Scale; PD, Parkinson's disease.
Additionally, fatigue and dystonia were associated with increased MF likelihood (HR of 1.4 and 5.1, respectively). Genetic factors were also significant predictors, including pathogenic LRRK2 variants (HR, 2.0; 95% CI, [1.4–2.9]), and pathogenic GBA variants (HR, 3.8; 95% CI, [2.1–6.3]).
Levodopa equivalent daily dose (LEDD) analysis showed that MF+ patients had a significantly higher mean baseline LEDD (780.7 mg) compared to MF− patients (587.2 mg) (P = 4.5E−04) (Supplementary Table S18). Moreover, time‐to‐MF was significantly longer for patients with LEDD <400 mg (2.3 years) compared to those with LEDD ≥400 mg (1.3 years, P = 3.2E−03).
Decision Curve and Calibration Analysis
DCA and calibration analysis were applied to assess model reliability and clinical utility. Both classification models, such as GBoost and AdaBoost, and time‐to‐MF analysis models, such as LSVM and NLSVM, were well calibrated (Supplementary Table S19) and showed higher net benefits than simple “treat all” or “treat none” strategies (Supplementary Figs. S14–S17), supporting the models' potential for individualized risk assessment.
Discussion
Predicting MF in PD is challenging because of interindividual variability and a multitude of influencing factors. Our study addresses these complexities by evaluating the robustness and accuracy of interpretable prediction models across multiple cohorts. Instead of focusing on a single approach, we assessed several multivariable ML classification and time‐to‐MF models. In addition, we ranked predictors robustly using feature selection statistics across CV cycles and cohorts. Below, we discuss the potential of these ML models as prognostic tools and interpret the identified predictors.
Comparative Evaluation of Predictive Models
Previous studies predicting MF in PD have achieved varying degrees of success. A few candidate prognostic factors have been highlighted; for example, one study reported disease duration and levodopa dose as key predictors. 59 A further study confirmed that longer disease duration and younger AAO are significant predictors. 60 However, most prior studies relied on a single cohort with small sample sizes and a large number of variables, which can lead to model overfitting, potentially affecting generalizability. To help address these limitations, we used multivariable ML approaches, performed cross‐cohort analyses, and conducted external evaluations.
We used various modeling approaches, each with distinct advantages. Tree‐based algorithms such as AdaBoost, CatBoost, and XGBoost handle complex variable interactions for classification well, while support vector machines (LSVM and NLSVM) perform well in time‐to‐event analyses because of their ability to handle censored information. Although we focused on modeling approaches providing interpretable information on feature relevance, future studies might still consider more complex approaches, such as meta‐learning combinations of multiple algorithms, to further improve predictive performance.
Given the variability in MF severity score distributions across cohorts, we focused on binary classification instead of modeling MF severity as a continuous variable. Although severity scores may provide more specific results, the distinctive score distributions suggest that severity assessments may differ strongly between studies, potentially leading to inconsistencies. Conversely, binary classification facilitates cross‐cohort comparability and enables performance evaluation using standard ROC‐based metrics, providing both robust and comparable assessments of discriminative ability.
Our results highlight the value of cross‐cohort data integration, which improved model robustness and generalizability. Although cross‐study prediction is more challenging than building models for a single cohort, the best‐performing cross‐cohort model achieved AUCs/C‐indices around 0.7, indicating a significant, although likely still improvable, discriminative ability. Moreover, the comparatively higher performance achieved for the PPMI cohort indicates the benefits of large sample sizes and extended follow‐ups. Although our longitudinal cross‐study classification presents additional challenges compared to previous cross‐sectional, single‐study analyses, the predictive power for MF prognosis in our study is comparable to that observed in previous studies on motor complications in PD. For instance, previous studies reached an AUC of 0.68 for dyskinesia prognosis, 61 an accuracy of 72% for cross‐sectional MF detection, 59 and AUC scores between 60% and 82% for cross‐sectional discrimination between on and off medication states. 62
The predictive models for MF classification and time‐to‐MF analysis also demonstrate superior net benefit over simple strategies such as “treat all” or “treat none” in DCA, supporting their potential role in risk‐based patient management. By identifying patients at higher risk of MF, these models may aid in designing earlier intervention strategies, including pharmacological or lifestyle interventions, to delay or mitigate symptom progression. In addition, calibration analysis confirmed a strong agreement between predicted and observed outcomes, supporting the reliability of the models for real‐world applications. This agreement indicates the potential for optimized versions of these models to be integrated into clinical decision‐making processes to improve patient stratification for precision medicine trials of MF‐targeted interventions. Future research should explore the prospective validation of such optimized models in clinical settings to assess their utility in guiding treatment strategies.
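The comparison with the “treat all” and “treat none” strategies can be illustrated with the standard net benefit formula from decision curve analysis, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below is a minimal example under hypothetical data and is not the study's implementation; function names are assumptions.

```python
def net_benefit(labels, predicted_risk, threshold):
    """Net benefit of intervening on subjects with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, predicted_risk) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted_risk) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of intervening on every subject regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


NET_BENEFIT_TREAT_NONE = 0.0  # no true positives and no false positives

# Hypothetical outcomes (1 = MF within 4 years) and predicted risks
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted_risk = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.1, 0.2, 0.3, 0.1]

nb_model = net_benefit(labels, predicted_risk, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}, treat none: {NET_BENEFIT_TREAT_NONE:.2f}")
```

A decision curve repeats this calculation over a range of threshold probabilities; a model is considered useful at thresholds where its net benefit exceeds both reference strategies.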
However, important challenges remain for clinical application. Variability in performance across cohorts suggests that cohort‐specific biases influence model accuracy. Identified prognostic features such as LEDD, disease duration, and non‐motor symptoms underscore the complexity of predicting MF onset, because these are known to vary widely among patients. 63 Although these models could inform future precision medicine studies, further optimization and external validation are still needed.
Interpretation of Models and Predictive Features
When interpreting baseline clinical features as MF prognostic predictors, it is important to consider the variations in baseline characteristics across the LuxPARK, PPMI, and ICEBERG cohorts. Longer disease durations in LuxPARK suggest current clinical characteristics in this cohort may have a greater influence on MF development. In contrast, lower average body weight and body mass index in ICEBERG may influence motor symptom progression because of potential differences in lifestyle, physical activity, and dietary habits. Although PPMI shares similar disease durations with ICEBERG, it displays significantly lower MDS‐UPDRS and SCOPA‐AUT scores, indicating other factors may be implicated in modulating symptom severity and non‐motor symptoms. Because of these cohort‐specific characteristics, our focus on cross‐cohort modeling ensures a more balanced interpretation of MF predictors.
Previous research has linked levodopa medication to MF, 64 , 65 but a recent trial contradicted these findings. 66 Our univariate analysis showed a significant but potentially indirect association, with higher LEDD (≥400 mg) correlating with increased MF risk. 9 , 15 However, multivariate ML models did not find a statistically significant difference in MF risk based on baseline levodopa use, and KM curves indicated no significant impact. These results emphasize the value of multivariate models in capturing complex interactions that univariate analyses may overlook.
Significant correlations of levodopa intake at baseline with disease duration and H&Y stage (0.47 and 0.34; P = 5.4E−55 and 1.5E−27, respectively) 14 suggest that levodopa's association with MF may be indirectly linked to disease progression (we note that the variable levodopa refers only to levodopa intake at baseline, not to LEDD). Besides its role in MF risk, higher LEDD correlated with greater MF severity (Spearman correlation in LuxPARK of 0.42, P = 1.7E−18), further supporting its association with symptom severity. Furthermore, the comprehensive and refined models demonstrated comparable hold‐out performance in cross‐cohort analyses, with no statistically significant differences. This suggests that, despite the apparent influence of levodopa in univariate analyses, its predictive value may be overshadowed by other strongly correlated predictors. The performance of the refined model indicates that predictors associated with levodopa, such as disease duration, can be sufficient for maintaining predictive capability when levodopa is excluded.
Disease duration and H&Y stage were consistently associated with an increased risk of developing MF, 12 reflecting associations between disease stage and motor impairments. 67 In addition, longer disease duration was significantly correlated with greater MF severity, further emphasizing its association with overall symptom burden (Spearman correlation of 0.41, P = 1.0E−38). Moreover, an earlier AAO was associated with longer disease duration (Spearman correlation of −0.22, P = 3.4E−12), which in turn was associated with a higher risk of MF (point‐biserial correlation of 0.3, P = 6.7E−22; see beginning of article). 5 , 9 These results highlight the association of disease progression with both the onset and severity of MF and the relevance of disease duration and staging in identifying high‐risk patients for targeted management strategies.
Higher baseline MDS‐UPDRS part II scores, reflecting severe motor impairments such as freezing of gait (FoG), are associated with the development of MF. 61 This is expected, as FoG often occurs during off‐phases, which are central to MF. Impaired movement speed, gait instability, axial symptoms (including gait impairments), and rigidity, derived as aggregated measures from MDS‐UPDRS part II and III assessments, have also been linked with MF onset and severity. 10 , 22 , 68 Conversely, the observed negative correlation between tremors and MF may reflect the milder disease progression in tremor‐dominant patients. 69 In addition, superior visuospatial ability (Benton JLO) was associated with a lower risk of MF, potentially because of its association with reduced risk of FoG and lower tremor severity. 70 , 71
MDS‐UPDRS part I scores, capturing non‐motor symptoms, were also associated with an elevated MF risk, likely because of the high correlation with motor symptoms and general disease severity. 61 Furthermore, fatigue and sleep disturbances (including insomnia and excessive daytime sleepiness) were found to be significantly correlated 72 and positively associated with MF, 73 possibly because of their association with overall disease burden and medication response. 74 , 75 Patients experiencing fatigue are more likely to develop apathy and depression, 76 suggesting common underlying pathophysiological mechanisms, potentially linked to dopaminergic dysfunction. 77 Fatigue symptoms have been associated with impaired dopaminergic neurotransmission in the striatum, 78 which may affect the levodopa response 79 and, consequently, the risk of MF. 80 Similarly, autonomic dysfunction showed positive MF associations, in line with previous studies. 8 , 81 Among autonomic symptoms, urinary and GI dysfunction correlate with the severity of motor and non‐motor impairments 82 , 83 and have been proposed as early markers of motor and non‐motor disability in PD, 84 , 85 preceding the onset of more pronounced motor complications.
Finally, considering genetic factors, the cross‐cohort analysis revealed that pathogenic GBA mutations are associated with increased risk of MF, 13 potentially reflecting the more aggressive disease progression in GBA mutation carriers. 11 , 13 Although pathogenic LRRK2 mutations are also associated with MF, 86 the effect size is lower (HR, 2.0 and 3.8 for LRRK2 and GBA, respectively). GBA mutation carriers also show more severe motor progression compared to LRRK2 mutation carriers 87 and a more advanced disease stage as measured by H&Y, 88 , 89 which is positively correlated with MF severity (Spearman correlation of 0.19, P = 3.3E−09). In addition, both GBA and LRRK2 mutations have been linked to dyskinesia, 90 a common PD complication linked to MF. This highlights the multifaceted interplay between these genetic variants, disease progression, and MF severity, and indicates the need to consider these factors when assessing motor complications in PD.
Although some of our findings, such as the association of LEDD, disease duration, and disease severity with MF risk, align with the clinical experience of many practitioners and previous research, our study goes beyond confirming these clinical observations in several ways. First, we provide precise quantifications of these associations, which can inform evidence‐based clinical decision‐making. Second, by demonstrating these patterns across multiple cohorts, we offer robust, generalizable evidence for these clinical observations. Third, our comprehensive approach has identified less obvious predictors, such as specific genetic variants, which may not be as readily apparent in routine clinical practice. Finally, by combining these factors into robust, multi‐factorial predictive models validated across multiple cohorts, we offer a tool that goes beyond simple association to provide individualized risk prediction. These models may, therefore, help to lay the ground for the design of precision medicine trials and prognostic applications to enable earlier and potentially more effective interventions against MF in PD.
Practical Applications in Clinical Trial Design
The predictive models developed in this study have several potential applications in improving protocol development and participant recruitment for clinical trials focused on MF in PD. First, our models can aid in risk stratification of potential participants for upcoming precision medicine trials. By identifying patients at higher risk of developing MF, researchers can enrich their study population with individuals more likely to experience the outcome of interest. This could potentially reduce the required sample size and duration of trials, making them more efficient and cost‐effective. Second, the identified predictors can inform inclusion and exclusion criteria for trials. For instance, trials testing interventions to prevent or delay MF might specifically target patients with MF‐associated characteristics derived from our models, for example, patients with PD‐associated GBA and LRRK2 variants, given their higher risk of developing MF. Last, these models could be used to develop personalized follow‐up schedules in trials. Participants predicted to be at higher risk of MF might be monitored more frequently, allowing for more timely detection of MF onset.
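The effect of risk enrichment on trial size can be made concrete with the standard sample size approximation for comparing two proportions. The Python sketch below is a rough illustration only: the event rates (25% in an unselected population, 45% in an enriched population) and the 30% relative risk reduction are hypothetical values chosen for this example, not estimates from our data.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(control_event_rate, relative_risk_reduction,
                        alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided test of two proportions."""
    p1 = control_event_rate
    p2 = control_event_rate * (1.0 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical 4-year MF event rates without and with model-based enrichment
n_unselected = sample_size_per_arm(control_event_rate=0.25, relative_risk_reduction=0.30)
n_enriched = sample_size_per_arm(control_event_rate=0.45, relative_risk_reduction=0.30)
print(f"per arm, unselected: {n_unselected}, enriched: {n_enriched}")
```

Under these assumptions, a higher expected event rate in the control arm lowers the number of participants needed to detect the same relative effect, which is the rationale for enrichment described above.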
Summary and Conclusions
This study combined interpretable ML techniques and data from three cohorts to identify robust MF predictors in PD. Cross‐cohort data integration improved model generalizability, reduced cohort‐specific biases, and improved overall stability and robustness. Moreover, decision curve and calibration analyses showed that the models offer greater benefits for clinical decision‐making than simple “treat all” or “treat none” strategies.
Among the key predictors, baseline LEDD was strongly associated with MF, likely because of its association with disease duration and progression. Furthermore, disease severity and disease stage, as indicated by variables such as MDS‐UPDRS part I scores, H&Y stage and symptoms associated with late disease stage, such as dyskinesia, were important predictors, and subjects with tremor‐dominant PD showed a less severe progression and delayed onset of MF.
Overall, this study highlights the potential of interpretable ML models for predicting MF in PD, emphasizing the benefits of cross‐cohort data integration for improving model stability and generalizability. Despite promising results in terms of statistical significance and net benefit of the models, variability in hold‐out performance across cohorts remains a challenge. These predictive models offer practical tools for future clinical trial design in PD research. By helping to identify patients at higher risk of developing MF, they can contribute to more targeted and efficient trials of preventive or early interventions. However, it is important to note that these models are imperfect predictors, and their use should be combined with expert clinical judgment and consideration of other relevant factors in trial design and patient care. Follow‐up research should further optimize and validate the predictive models across more diverse cohorts to increase their value for the design of future precision clinical trials. Moreover, although a standardized methodology was used to assess MF, capturing general MF risk patterns and facilitating cross‐cohort comparisons, this methodology does not differentiate between more specific symptoms, such as gait disturbance versus tremor fluctuations. Future prospective studies incorporating more precise symptom tracking, such as patient diaries or wearable sensors, could provide deeper insights into symptom‐specific fluctuations and at the same time reveal features that are more robust for ML applications. These tools could provide a clearer understanding of specific patterns of MF and their associations with underlying predictors.
In conclusion, our study provides a comprehensive, data‐driven analysis of predictors for MF in PD. Although some of our findings, such as the importance of LEDD, disease duration, and disease severity, align with well‐established clinical experience, our work provides precise quantification of these associations, which can be valuable for clinical decision‐making and trial design. Moreover, we have identified additional predictors, particularly genetic factors such as GBA and LRRK2 variants, which add new dimensions to our understanding of MF risk. By integrating several of these factors into robust predictive models trained and validated across distinct cohorts, we offer a tool that could improve the design of precision medicine trials and potentially inform personalized patient management strategies. Future research should focus on prospective validation of these models and exploration of their utility in clinical practice and trial design.
Author Roles
Rebecca Ting Jiin Loo: Writing original draft, visualization, validation, methodology, formal analysis. Lukas Pavelka: Data collection, review & editing. Graziella Mangone: Review & editing. Fouad Khoury: Review & editing. Marie Vidailhet: Review & editing. Jean‐Christophe Corvol: Review & editing. Rejko Krüger: Review & editing. Enrico Glaab: Review & major editing of all parts of the manuscript, supervision, methodology, investigation, funding acquisition.
Ethics Statement
All participants enrolled in the Luxembourg Parkinson's Study provided written informed consent. The study received approval from the National Research Ethics Committee (CNER Ref: 201407/13) and adhered to the principles outlined in the Declaration of Helsinki. Additionally, the Luxembourg Parkinson's Study is registered with ClinicalTrials.gov under the identifier NCT05266872.
Supporting information
Data S1.
Fig. S1. The study workflow includes single‐cohort and multi‐cohort analyses, focusing on motor fluctuations (MF) classification and time‐to‐MF analysis to identify key predictors for MF. In the single‐cohort analysis, predictive models were developed and validated separately for the LuxPARK, PPMI, and ICEBERG cohorts, identifying cohort‐specific predictors. The multi‐cohort analysis integrated data across all cohorts to increase the generalizability and robustness of the predictions.
Fig. S2. Workflow for data processing and model development, illustrating the process, including variable aggregation and the nested cross‐validation workflow (Fig. 1) for model training and evaluation. Data processing steps, including missing value imputation, cross‐study normalization, one‐hot encoding, undersampling, and feature selection, were conducted independently on the training and validation sets to avoid data leakage, as detailed in Supplementary Fig. S3. The comprehensive model included all features, while the refined model was trained by excluding baseline motor fluctuations and levodopa intake.
Fig. S3. Overview of the data processing and analysis workflow applied during each cross‐validation cycle to optimize and evaluate the prognostic models. The workflow covers multiple steps, including missing value imputation, cross‐study normalization, one‐hot encoding, undersampling, and feature selection.
Fig. S4. A comparison of cross‐validated AUC scores and probabilities for superior predictive performance for the best comprehensive classification model for motor fluctuations in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated AUC scores for each cohort, with the dotted line indicating the hold‐out AUC scores. The lower section shows probabilities of one cohort's predictive performance surpassing another's. The arrows indicate a higher probability of predictive performance, highlighting that the cohort with a higher likelihood of superior performance for the best model is the one towards which the arrows point.
Fig. S5. A comparison of cross‐validated AUC scores and probabilities for superior predictive performance for the best refined classification model for motor fluctuations in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated AUC scores for each cohort, with the dotted line indicating the hold‐out AUC scores. The lower section shows probabilities of one cohort's predictive performance surpassing another's. The arrows indicate a higher probability of predictive performance, highlighting that the cohort with a higher likelihood of superior performance for the best model is the one towards which the arrows point.
Fig. S6. A comparison of cross‐validated C‐indices and probabilities for superior predictive performance for the best comprehensive time‐to‐MF model in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated C‐indices for each cohort, with the dotted line indicating the hold‐out C‐indices. The lower section shows probabilities of one cohort's predictive performance surpassing another's. The arrows indicate a higher probability of predictive performance, highlighting that the cohort with a higher likelihood of superior performance for the best model is the one towards which the arrows point.
Fig. S7. A comparison of cross‐validated C‐indices and probabilities for superior predictive performance for the best refined time‐to‐MF model in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated C‐indices for each cohort, with the dotted line indicating the hold‐out C‐indices. The lower section shows probabilities of one cohort's predictive performance surpassing another's. The arrows indicate a higher probability of predictive performance, highlighting that the cohort with a higher likelihood of superior performance for the best model is the one towards which the arrows point.
Fig. S8. Stability analysis for the comprehensive models for predicting motor fluctuations (MF) in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the area under the curve (AUC) values across the cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S9. Stability analysis of comprehensive time to motor fluctuation (MF) models in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the C‐indices across the cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S10. Stability analysis of refined models for predicting motor fluctuations (MF) in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the area under the curve (AUC) values across the cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S11. Stability analysis of refined time to motor fluctuation (MF) models in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the C‐indices across the different cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S12. SHAP value plot illustrating the most informative predictors for the best refined model for cross‐cohort classification of motor fluctuations. The SHAP (SHapley Additive exPlanations) plot provides a detailed view of how individual predictors influence the model's predictions. This visualization allows us to understand not just the overall importance of each predictor, but also how different values of that predictor impact the likelihood of motor fluctuations across our dataset. Each row represents a predictor, ordered by overall importance from top to bottom, and each point represents an individual observation in the dataset. The color of the points ranges from blue (lower values) to red (higher values) for that predictor. The position of points on the x‐axis shows the impact on the model's prediction: Points to the right of zero indicate the predictor pushed the model towards predicting motor fluctuations, and points to the left of zero indicate the predictor pushed the model away from predicting motor fluctuations. Thus, the SHAP plot shows whether high or low values of a predictor are associated with a higher or lower likelihood of predicted motor fluctuations.
Fig. S13. SHAP value plot illustrating the most informative predictors for the best refined model in the cross‐cohort time‐to‐motor fluctuation analysis. The SHAP (SHapley Additive exPlanations) plot provides a detailed view of how individual predictors influence the model's predictions. This visualization allows us to understand not just the overall importance of each predictor, but also how different values of that predictor impact the likelihood of motor fluctuations across our dataset. Each row represents a predictor, ordered by overall importance from top to bottom, and each point represents an individual observation in the dataset. The color of the points ranges from blue (lower values) to red (higher values) for that predictor. The position of points on the x‐axis shows the impact on the model's prediction: Points to the right of zero indicate the predictor pushed the model towards predicting motor fluctuations, and points to the left of zero indicate the predictor pushed the model away from predicting motor fluctuations. Thus, the SHAP plot shows whether high or low values of a predictor are associated with a higher or lower likelihood of predicted motor fluctuations.
Fig. S14. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort comprehensive motor fluctuations classification models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Fig. S15. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort refined motor fluctuations classification models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Fig. S16. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort comprehensive time‐to‐MF models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Fig. S17. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort refined time‐to‐MF models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Table S1. For each considered cohort and their combination (see column 1), the following statistics are provided: Column 2: The numbers of PD patients who met inclusion criterion 1, i.e. a diagnosis of Parkinson's disease according to the UK Parkinson's Disease Society Brain Bank Diagnostic Criteria (UKPDSBB), or the presence of at least two of the following symptoms: resting tremor, bradykinesia, or rigidity with either resting tremor or bradykinesia, or a single asymmetric rest tremor or asymmetric bradykinesia (see also the Methods sub‐section ‘Study population’ in the main manuscript). Column 3: The number of PD patients who met inclusion criterion 2, i.e. subjects with the confirmed presence of motor fluctuations within four years following the baseline visit or the completion of a motor fluctuations assessment without presenting symptoms within four years. Columns 3 and 4: The “Events” columns indicate the number and percentage of the patients for whom motor fluctuations were detected within the 4‐year follow‐up period considered for classification analysis (Column 3) and the time‐to‐MF analyses (Column 4) for each cohort separately (first three rows) and across all cohorts combined (fourth row).
Table S2. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the cross‐cohort analysis during a 4‐year follow‐up period. p‐Values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S3. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the LuxPARK cohort during a 4‐year follow‐up period. p‐Values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S4. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the PPMI cohort during a 4‐year follow‐up period. P‐values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S5. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the ICEBERG cohort during a 4‐year follow‐up period. P‐values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S6. The definition of aggregated feature variables derived from original MDS‐UPDRS variables, where missing values were addressed by averaging non‐missing related variables. Items retained for analysis were marked with an asterisk (*), ensuring that the data were representative and reducing the potential for bias.
Table S7. Summary of predictive performance metrics for comprehensive motor fluctuations classification. The table presents cross‐validated and hold‐out AUC values for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated AUC scores in each cohort analysis across the nested CV hyperparameter optimizations are highlighted in bold and are referred to as the best models. The model with the highest hold‐out AUC score is indicated in italics. The “number of features” column includes the number of candidate features selected during cross‐validation (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Table S8. Summary of predictive performance statistics for refined motor fluctuation classification (excluding baseline motor fluctuations and levodopa medication as input features). The table presents cross‐validated and hold‐out AUC values for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated AUC scores in each cohort analysis are highlighted in bold and are referred to as the best models. The model with the highest hold‐out AUC score is indicated in italics. The “number of features” column includes the number of candidate features selected during cross‐validation (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Table S9. Summary of predictive performance statistics for refined time‐to‐motor fluctuations (excluding baseline motor fluctuations and levodopa medication as input features). The table presents cross‐validated and hold‐out C‐indices for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated C‐indices in each cohort analysis are highlighted in bold and are referred to as the best models. The model with the highest hold‐out C‐index is indicated in italics. The “number of features” column includes the number of candidate features selected during cross‐validation (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Table S10. Comparative analysis of baseline features’ mean differences between the LuxPARK, PPMI, and ICEBERG cohorts. The p‐values highlight statistically significant differences in predictor averages between specific cohort pairs, revealing variations in predictor distributions specific to each cohort in motor fluctuations analysis.
Table S11. Statistics on the average percentage of times predictors were selected during 5‐fold cross‐validation analyses. It compares data for the best comprehensive and refined models in motor fluctuations classification and time‐to‐motor fluctuations analyses across the LuxPARK, PPMI, and ICEBERG cohorts. The information presented includes: “Average in CV (%)” – the average percentage of times each feature was chosen in 5‐fold CV for single‐cohort analyses in LuxPARK, PPMI, and ICEBERG for both motor fluctuations and time‐to‐motor fluctuations analyses; and “Average (%)” – the mean of the “Average in CV (%)” across all cohorts. Features are listed in descending order based on their overall average selection percentages in the best comprehensive and refined models for motor fluctuations and time‐to‐motor fluctuations analyses, with the top 15 most consistent features presented.
Table S12. Statistics on the average percentage of times predictors were selected during 5‐fold cross‐validation analyses across the machine learning algorithms. It compares data for the best comprehensive and refined models for each machine learning algorithm in motor fluctuations classification and time‐to‐motor fluctuations analyses across the LuxPARK, PPMI, and ICEBERG cohorts. The information presented includes: “Average in CV (%)” – the average percentage of times each feature was chosen in 5‐fold CV for single‐cohort analyses in LuxPARK, PPMI, and ICEBERG for both motor fluctuations and time‐to‐motor fluctuations analyses across the algorithms; and “Average (%)” – the mean of the “Average in CV (%)” across all cohorts. Features are listed in descending order based on their overall average selection percentages in the best comprehensive and refined models for motor fluctuations and time‐to‐motor fluctuations analyses, with the top 15 most consistent features presented.
Table S13. Comparison of the statistical significance in hold‐out predictive metrics between the best comprehensive and refined models across cohorts and between the best cross‐study normalized and unnormalized models. The type of normalization applied is indicated in the “Normalization” column. The statistical significance of the differences was assessed using DeLong's test for motor fluctuation classification (top) and a one‐shot nonparametric test for time‐to‐motor fluctuations analysis (bottom). The comprehensive model demonstrated superior predictive performance compared to the refined model, with a statistically significant difference in the hold‐out AUC/C‐index.
Table S14. Evaluation of the predictive performance for comprehensive and refined prognostic models for motor fluctuations (MF), including MF classification and time‐to‐MF analysis models. This evaluation provides cross‐validated and hold‐out AUC values and C‐indices for normalized and non‐normalized models. The “number of features” column indicates both the total number of candidate features selected during cross‐validation (in brackets) and the number of features with significant predictive impact, as determined by permutation importance analysis (preceding the brackets).
Table S15. Top 10 predictors for motor fluctuations (MF) prognosis, ranked by average permutation importance across best comprehensive and refined models for MF classification and time‐to‐MF analysis in the cross‐cohort analysis. The final rank is the average of the non‐missing ranks across the best models.
Table S16. Correlation of the predictors with the occurrence of motor fluctuations within 4 years and with baseline motor fluctuation severity. The correlation is measured using the point‐biserial correlation for continuous or ordinal predictors and the Matthews correlation coefficient (MCC) for binary predictors.
Table S17. The results of the correlation analysis of the predictors. The Spearman correlation coefficient was used for continuous or ordinal variable pairs, the point‐biserial correlation coefficient was used for pairs of one continuous or ordinal variable and one binary variable, and the Matthews correlation coefficient (MCC) was used for binary variable pairs. The p‐values in parentheses accompanying the correlation coefficients indicate the statistical significance of each correlation.
Table S18. Summary statistics for the levodopa equivalent daily dose (LEDD) among MF− and MF+ patients with PD in the LuxPARK cohort. The statistical significance of the observed differences is indicated by p‐values derived from t‐tests (for normally distributed data) or Mann‐Whitney U‐tests (for data that are not normally distributed). The table also presents time‐to‐MF statistics for PD patients with LEDD < 400 mg and LEDD ≥ 400 mg, including p‐values from log‐rank tests comparing these two groups.
Table S19. The calibration analysis results for comprehensive and refined models in both motor fluctuations (MF) classification and time‐to‐MF analysis in the cross‐cohort analysis. The calibration slope and mean squared error (MSE) quantify the agreement between predicted probabilities and observed outcomes. A slope of 1 indicates perfect calibration.
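As an illustration of the two calibration quantities named in Table S19, the following minimal Python sketch computes a calibration slope and an MSE from binned predictions. The caption does not describe the procedure, so everything here is an assumption: the function name `calibration_summary`, the equal‐width binning, the bin count, and the least‐squares fit of observed event frequency on mean predicted probability are illustrative choices and not the authors' implementation.

```python
from statistics import mean


def calibration_summary(y_true, y_prob, n_bins=5):
    """Return (slope, intercept, mse) of a binned calibration curve.

    Hypothetical helper: predictions are grouped into equal-width
    probability bins, and the observed event frequency per bin is
    compared with the mean predicted probability per bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Bin index in 0..n_bins-1; a probability of exactly 1.0 goes to the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        bins[b].append((float(p), float(y)))

    # One point per non-empty bin: (mean predicted probability, observed frequency).
    points = [
        (mean(p for p, _ in b), mean(y for _, y in b))
        for b in bins
        if b
    ]
    if len(points) < 2:
        raise ValueError("need predictions in at least two bins to fit a slope")

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least-squares line through the calibration points.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Mean squared gap between observed frequency and predicted probability.
    mse = mean((y - x) ** 2 for x, y in points)
    return slope, intercept, mse
```

Under these assumptions, perfectly calibrated predictions give a slope of 1 and an MSE of 0, whereas a slope below 1 would suggest that the predicted probabilities are too extreme.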
Acknowledgments
The machine learning (ML) predictions in this article were partly performed using the high‐performance computing facilities of the University of Luxembourg (see http://hpc.uni.lu). We are grateful to Patrick May and Zied Landoulsi for providing the genetic information relevant to this work. We also acknowledge the valuable contributions of the pre‐publication check team for their pre‐review process (see https://r3.lcsb.uni.lu/). We thank all participants of the Luxembourg Parkinson's Study for their important support of our research. Furthermore, we acknowledge the joint effort of the National Centre of Excellence in Research on Parkinson's Disease (NCER‐PD) Consortium members from the partner institutions Luxembourg Centre for Systems Biomedicine, Luxembourg Institute of Health, Centre Hospitalier de Luxembourg, and Laboratoire National de Santé generally contributing to the Luxembourg Parkinson's Study, and also acknowledge the ICEBERG study group for their contribution, as listed in the Supplementary Material.
Relevant conflicts of interest/financial disclosures: The authors have no competing interests to declare.
Funding agencies: This research was funded by the Luxembourg National Research Fund (FNR) for the project RECAST (INTER/22/17104370/RECAST) as part of the Joint Programme‐Neurodegenerative Disease Research (JPND) and for the project PreDYT (INTER/EJP RD22/17027921/PreDYT). The National Centre of Excellence in Research on Parkinson's Disease (NCER‐PD) received funding from the Luxembourg National Research Fund (FNR/NCER13/BM/11264123). The ICEBERG cohort received funding and support from the Agence Nationale de la Recherche (ANR) under grant agreements ANR‐10‐IAIHU‐06 (IHU ICM), association France Parkinson, the Fondation d'Entreprise EDF, the Fondation Saint Michel, and Energipole.
Contributor Information
Enrico Glaab, Email: enrico.glaab@uni.lu.
the NCER‐PD Consortium:
Geeta Acharya, Gloria Aguayo, Myriam Alexandre, Muhammad Ali, Wim Ammerlann, Giuseppe Arena, Michele Bassis, Roxane Batutu, Katy Beaumont, Sibylle Béchet, Guy Berchem, Alexandre Bisdorff, Ibrahim Boussaad, David Bouvier, Lorieza Castillo, Gessica Contesotto, Nancy De Bremaeker, Brian Dewitt, Nico Diederich, Rene Dondelinger, Nancy E. Ramia, Angelo Ferrari, Katrin Frauenknecht, Joëlle Fritz, Carlos Gamio, Manon Gantenbein, Piotr Gawron, Laura Georges, Soumyabrata Ghosh, Marijus Giraitis, Enrico Glaab, Martine Goergen, Elisa Gómez De Lope, Jérôme Graas, Mariella Graziano, Valentin Groues, Anne Grünewald, Gaël Hammot, Anne‐Marie Hanff, Linda Hansen, Michael Heneka, Estelle Henry, Margaux Henry, Sylvia Herbrink, Sascha Herzinger, Alexander Hundt, Nadine Jacoby, Sonja Jónsdóttir, Jochen Klucken, Olga Kofanova, Rejko Krüger, Pauline Lambert, Zied Landoulsi, Roseline Lentz, Ana Festas Lopes, Victoria Lorentz, Tainá M. Marques, Guilherme Marques, Patricia Martins Conde, Patrick May, Deborah Mcintyre, Chouaib Mediouni, Francoise Meisch, Alexia Mendibide, Myriam Menster, Maura Minelli, Michel Mittelbronn, Saïda Mtimet, Maeva Munsch, Romain Nati, Ulf Nehrbass, Sarah Nickels, Beatrice Nicolai, Jean‐Paul Nicolay, Maria Fernanda Niño Uribe, Fozia Noor, Clarissa P. C. Gomes, Sinthuja Pachchek, Claire Pauly, Laure Pauly, Lukas Pavelka, Magali Perquin, Achilleas Pexaras, Armin Rauschenberger, Rajesh Rawal, Dheeraj Reddy Bobbili, Lucie Remark, Ilsé Richard, Olivia Roland, Kirsten Roomp, Eduardo Rosales, Stefano Sapienza, Venkata Satagopam, Sabine Schmitz, Reinhard Schneider, Jens Schwamborn, Raquel Severino, Amir Sharify, Ruxandra Soare, Ekaterina Soboleva, Kate Sokolowska, Maud Theresine, Hermann Thien, Elodie Thiry, Rebecca Ting Jiin Loo, Johanna Trouet, Olena Tsurkalenko, Michel Vaillant, Carlos Vega, Liliana Vilas Boas, Paul Wilmes, Evi Wollscheid‐Lengeling, and Gelani Zelimkhanov
Data Availability Statement
The LuxPARK clinical dataset used in this study was obtained from the NCER‐PD. The dataset for this manuscript is not publicly available as it is linked to the Luxembourg Parkinson's Study and its internal regulations. Any requests for accessing the dataset can be directed to request.ncer-pd@uni.lu. Data used in the preparation of this article were obtained on January 11, 2023, from the PPMI database (https://www.ppmi-info.org/access-data-specimens/data, RRID:SCR_006431). For up‐to‐date information on the study, please visit the PPMI website (www.ppmi-info.org). Data from the ICEBERG cohort analyzed during this study is available from the corresponding study group (jean-christophe.corvol@aphp.fr, marie.vidailhet@aphp.fr). The data processing, normalization and statistical analyses were performed using the R statistical programming language (v4.2.1). Python‐3.8.6‐GCCcore‐10.2.0 was used for efficient machine learning predictions. The open‐source code is accessible in the GitLab repository under the MIT license: https://gitlab.com/uniluxembourg/lcsb/biomedical-data-science/bds/ml-motor-fluctuations.
Supplementary Materials
Data S1.
Fig. S1. The study workflow includes single‐cohort and multi‐cohort analyses, focusing on motor fluctuations (MF) classification and time‐to‐MF analysis to identify key predictors for MF. In the single‐cohort analysis, predictive models were developed and validated separately for the LuxPARK, PPMI, and ICEBERG cohorts, identifying cohort‐specific predictors. The multi‐cohort analysis integrated data across all cohorts to increase the generalizability and robustness of the predictions.
Fig. S2. Workflow for data processing and model development, including variable aggregation and the nested cross‐validation workflow (Fig. 1) for model training and evaluation. Data processing steps, including missing value imputation, cross‐study normalization, one‐hot encoding, undersampling, and feature selection, were conducted independently on the training and validation sets to avoid data leakage, as detailed in Supplementary Fig. S3. The comprehensive model included all features, whereas the refined model was trained without baseline motor fluctuations and levodopa intake.
Fig. S3. Overview of the data processing and analysis workflow applied during each cross‐validation cycle to optimize and evaluate the prognostic models. The workflow covers multiple steps, including missing value imputation, cross‐study normalization, one‐hot encoding, undersampling, and feature selection.
Fig. S4. A comparison of cross‐validated AUC scores and probabilities for superior predictive performance for the best comprehensive classification model for motor fluctuations in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated AUC scores for each cohort, with the dotted line indicating the hold‐out AUC scores. The lower section shows the probability of one cohort's predictive performance surpassing another's. Each arrow points towards the cohort with the higher probability of superior predictive performance for the best model.
Fig. S5. A comparison of cross‐validated AUC scores and probabilities for superior predictive performance for the best refined classification model for motor fluctuations in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated AUC scores for each cohort, with the dotted line indicating the hold‐out AUC scores. The lower section shows the probability of one cohort's predictive performance surpassing another's. Each arrow points towards the cohort with the higher probability of superior predictive performance for the best model.
Fig. S6. A comparison of cross‐validated C‐indices and probabilities for superior predictive performance for the best comprehensive time‐to‐MF model in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated C‐indices for each cohort, with the dotted line indicating the hold‐out C‐indices. The lower section shows the probability of one cohort's predictive performance surpassing another's. Each arrow points towards the cohort with the higher probability of superior predictive performance for the best model.
Fig. S7. A comparison of cross‐validated C‐indices and probabilities for superior predictive performance for the best refined time‐to‐MF model in the multi‐cohort analyses. The upper section presents boxplots of cross‐validated C‐indices for each cohort, with the dotted line indicating the hold‐out C‐indices. The lower section shows the probability of one cohort's predictive performance surpassing another's. Each arrow points towards the cohort with the higher probability of superior predictive performance for the best model.
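For readers unfamiliar with the C‐index reported for the time‐to‐MF models, the following minimal sketch computes Harrell's concordance index from observed times, event indicators, and predicted risk scores. It is an illustration under simplifying assumptions (pairs with tied times are skipped, and the function name is hypothetical), not the implementation used in the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are ignored
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed event.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the earlier event: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; for a binary outcome without censoring, the same pairwise idea yields the AUC.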
Fig. S8. Stability analysis for the comprehensive models for predicting motor fluctuations (MF) in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the area under the curve (AUC) values across the cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S9. Stability analysis of comprehensive time to motor fluctuation (MF) models in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the C‐indices across the cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S10. Stability analysis of refined models for predicting motor fluctuations (MF) in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the area under the curve (AUC) values across the cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
Fig. S11. Stability analysis of refined time to motor fluctuation (MF) models in Parkinson's disease (PD) across different algorithms and cohort studies. The stability of the model is evaluated by calculating the standard deviations of the C‐indices across the different cross‐validation cycles. A lower standard deviation (SD) indicates a higher stability of the predictive models.
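The stability measure described in Figs. S8 to S11 can be sketched as follows. The scores and model names are made‐up placeholders, and the use of the sample standard deviation (rather than the population form) is an assumption.

```python
from statistics import mean, stdev


def stability_table(cv_scores):
    """Return (model, mean score, SD) rows sorted from most to least stable."""
    rows = [(name, mean(scores), stdev(scores)) for name, scores in cv_scores.items()]
    return sorted(rows, key=lambda row: row[2])  # lower SD = more stable


# Hypothetical cross-validated scores (AUC or C-index) per model.
example_scores = {
    "model_a": [0.70, 0.72, 0.74],
    "model_b": [0.60, 0.75, 0.81],
}

for name, avg, sd in stability_table(example_scores):
    print(f"{name}: mean={avg:.3f}, SD={sd:.3f}")
```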
Fig. S12. SHAP value plot illustrating the most informative predictors for the best refined model for cross‐cohort classification of motor fluctuations. The SHAP (SHapley Additive exPlanations) plot provides a detailed view of how individual predictors influence the model's predictions. This visualization shows not only the overall importance of each predictor, but also how different values of that predictor affect the predicted likelihood of motor fluctuations across the dataset. Each row represents a predictor, ordered by overall importance from top to bottom, and each point represents an individual observation in the dataset. The color of the points ranges from blue (lower values) to red (higher values) for that predictor. The position of points on the x‐axis shows the impact on the model's prediction: points to the right of zero indicate that the predictor pushed the model towards predicting motor fluctuations, and points to the left of zero indicate that the predictor pushed the model away from predicting motor fluctuations. Thus, the SHAP plot shows whether high or low values of a predictor are associated with a higher or lower likelihood of predicted motor fluctuations.
Fig. S13. SHAP value plot illustrating the most informative predictors for the best refined model in the cross‐cohort time‐to‐motor fluctuation analysis. The SHAP (SHapley Additive exPlanations) plot provides a detailed view of how individual predictors influence the model's predictions. This visualization shows not only the overall importance of each predictor, but also how different values of that predictor affect the predicted likelihood of motor fluctuations across the dataset. Each row represents a predictor, ordered by overall importance from top to bottom, and each point represents an individual observation in the dataset. The color of the points ranges from blue (lower values) to red (higher values) for that predictor. The position of points on the x‐axis shows the impact on the model's prediction: points to the right of zero indicate that the predictor pushed the model towards predicting motor fluctuations, and points to the left of zero indicate that the predictor pushed the model away from predicting motor fluctuations. Thus, the SHAP plot shows whether high or low values of a predictor are associated with a higher or lower likelihood of predicted motor fluctuations.
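To make the sign convention of the SHAP plots concrete, the sketch below computes exact Shapley values for a single observation of a toy model by enumerating all predictor subsets. This is only an illustration: the function name, the toy linear model, and the use of a single reference point for "absent" predictors are assumptions, and practical SHAP implementations rely on faster approximations and a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for observation x relative to a reference point."""
    n = len(x)

    def value(subset):
        # Predictors in `subset` take their observed value, others the reference.
        z = [x[k] if k in subset else background[k] for k in range(n)]
        return predict(z)

    contributions = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal effect of adding predictor i to the subset.
                total += weight * (value(s | {i}) - value(s))
        contributions.append(total)
    return contributions


def toy_model(z):
    """Hypothetical linear risk score with three predictors."""
    return 0.5 * z[0] - 0.2 * z[1] + 0.1 * z[2]


phi = shapley_values(toy_model, x=[2.0, 1.0, 3.0], background=[0.0, 0.0, 0.0])
```

A positive entry in `phi` corresponds to a point to the right of zero in the plot (the predictor raises the prediction for that observation), a negative entry to a point on the left, and the entries sum to the difference between the prediction for `x` and the prediction for the reference point.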
Fig. S14. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort comprehensive motor fluctuations classification models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Fig. S15. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort refined motor fluctuations classification models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Fig. S16. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort comprehensive time‐to‐MF models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
Fig. S17. Bar plot illustrating the area under the positive net benefit curve for various best cross‐cohort refined time‐to‐MF models. The lines indicate significant differences in the net benefit area across the models. The blue bars represent models with a larger positive net benefit area than the negative net benefit, whereas the red bars indicate the opposite. The numbers within the bars represent the difference in net benefit area relative to the “all intervention” strategy. Arrows pointing upwards (↑) indicate a larger area than the “all intervention” strategy, while arrows pointing downwards (↓) indicate a smaller area.
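The quantities summarized in Figs. S14 to S17 build on the standard net benefit formula from decision curve analysis, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The following minimal sketch shows that formula, the "all intervention" reference strategy, and a trapezoidal area over the positive part of a curve. The function names are hypothetical, and clipping negative values to zero before integrating is an assumed simplification that may differ from the exact area definition used for the figures.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when the predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of the 'all intervention' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def positive_area(thresholds, values):
    """Trapezoidal area under the curve after clipping negative values to zero."""
    clipped = [max(v, 0.0) for v in values]
    return sum(
        (thresholds[k + 1] - thresholds[k]) * (clipped[k] + clipped[k + 1]) / 2
        for k in range(len(thresholds) - 1)
    )
```

Thresholds must lie strictly below 1 because the odds term pt/(1 − pt) is undefined at pt = 1.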
Table S1. For each considered cohort and their combination (see column 1), the following statistics are provided: Column 2: The number of PD patients who met inclusion criterion 1, i.e. a diagnosis of Parkinson's disease according to the UK Parkinson's Disease Society Brain Bank (UKPDSBB) diagnostic criteria, or the presence of at least two of the following symptoms: resting tremor, bradykinesia, or rigidity with either resting tremor or bradykinesia, or a single asymmetric rest tremor or asymmetric bradykinesia (see also the Methods sub‐section ‘Study population’ in the main manuscript). Column 3: The number of PD patients who met inclusion criterion 2, i.e. subjects with the confirmed presence of motor fluctuations within four years following the baseline visit or the completion of a motor fluctuations assessment without presenting symptoms within four years. Columns 3 and 4: The “Events” columns indicate the number and percentage of the patients for whom motor fluctuations were detected within the 4‐year follow‐up period considered for classification analysis (Column 3) and the time‐to‐MF analyses (Column 4) for each cohort separately (first three rows) and across all cohorts combined (fourth row).
Table S2. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the cross‐cohort analysis during a 4‐year follow‐up period. p‐Values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S3. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the LuxPARK cohort during a 4‐year follow‐up period. p‐Values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S4. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the PPMI cohort during a 4‐year follow‐up period. p‐Values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S5. Overview of the demographic and baseline clinical characteristics for the subjects who developed MF (MF+) or did not develop MF (MF−) in the ICEBERG cohort during a 4‐year follow‐up period. p‐Values for the significance of differences between the MF− and MF+ groups for individual features are shown in the last column.
Table S6. The definition of aggregated feature variables derived from original MDS‐UPDRS variables, where missing values were addressed by averaging non‐missing related variables. Items retained for analysis were marked with an asterisk (*), ensuring that the data were representative and reducing the potential for bias.
Table S7. Summary of predictive performance metrics for comprehensive motor fluctuations classification. The table presents cross‐validated and hold‐out AUC values for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated AUC scores in each cohort analysis across the nested CV hyperparameter optimizations are highlighted in bold and are referred to as the best models. The model with the highest hold‐out AUC score is indicated in italics. The “number of features” column includes the number of candidate features selected during cross‐validation (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Table S8. Summary of predictive performance statistics for refined motor fluctuation classification (excluding baseline motor fluctuations and levodopa medication as input features). The table presents cross‐validated and hold‐out AUC values for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated AUC scores in each cohort analysis are highlighted in bold and are referred to as the best models. The model with the highest hold‐out AUC score is indicated in italics. The “number of features” column includes the number of candidate features selected during cross‐validation (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Table S9. Summary of predictive performance statistics for refined time‐to‐motor fluctuations (excluding baseline motor fluctuations and levodopa medication as input features). The table presents cross‐validated and hold‐out C‐indices for single and multi‐cohort analyses and the number of features used in each model. Models with the highest average cross‐validated C‐indices in each cohort analysis are highlighted in bold and are referred to as the best models. The model with the highest hold‐out C‐index is indicated in italics. The “number of features” column includes the number of candidate features selected during cross‐validation (shown in brackets) and the subset of features demonstrating significant predictive impact identified through permutation importance analysis (preceding the brackets).
Table S10. Comparative analysis of baseline features’ mean differences between the LuxPARK, PPMI, and ICEBERG cohorts. The p‐values highlight statistically significant differences in predictor averages between specific cohort pairs, revealing variations in predictor distributions specific to each cohort in motor fluctuations analysis.
Table S11. Statistics on the average percentage of times predictors were selected during 5‐fold cross‐validation analyses. It compares data for the best comprehensive and refined models in motor fluctuations classification and time‐to‐motor fluctuations analyses across the LuxPARK, PPMI, and ICEBERG cohorts. The information presented includes: “Average in CV (%)” – the average percentage of times each feature was chosen in 5‐fold CV for single‐cohort analyses in LuxPARK, PPMI, and ICEBERG for both motor fluctuations and time‐to‐motor fluctuations analyses; and “Average (%)” – the mean of the “Average in CV (%)” across all cohorts. Features are listed in descending order based on their overall average selection percentages in the best comprehensive and refined models for motor fluctuations and time‐to‐motor fluctuations analyses, with the top 15 most consistent features presented.
Table S12. Statistics on the average percentage of times predictors were selected during 5‐fold cross‐validation analyses across the machine learning algorithms. It compares data for the best comprehensive and refined models for each machine learning algorithm in motor fluctuations classification and time‐to‐motor fluctuations analyses across the LuxPARK, PPMI, and ICEBERG cohorts. The information presented includes: “Average in CV (%)” – the average percentage of times each feature was chosen in 5‐fold CV for single‐cohort analyses in LuxPARK, PPMI, and ICEBERG for both motor fluctuations and time‐to‐motor fluctuations analyses across the algorithms; and “Average (%)” – the mean of the “Average in CV (%)” across all cohorts. Features are listed in descending order based on their overall average selection percentages in the best comprehensive and refined models for motor fluctuations and time‐to‐motor fluctuations analyses, with the top 15 most consistent features presented.
Table S13. Comparison of the statistical significance in hold‐out predictive metrics between the best comprehensive and refined models across cohorts and between the best cross‐study normalized and unnormalized models. The type of normalization applied is indicated in the “Normalization” column. The statistical significance of the differences was assessed using DeLong's test for motor fluctuation classification (top) and a one‐shot nonparametric test for time‐to‐motor fluctuations analysis (bottom). The comprehensive model demonstrated superior predictive performance compared to the refined model, with a statistically significant difference in the hold‐out AUC/C‐index.
Table S14. Evaluation of the predictive performance for comprehensive and refined prognostic models for motor fluctuations (MF), including MF classification and time‐to‐MF analysis models. This evaluation provides cross‐validated and hold‐out AUC values and C‐indices for normalized and non‐normalized models. The “number of features” column indicates both the total number of candidate features selected during cross‐validation (in brackets) and the number of features with significant predictive impact, as determined by permutation importance analysis (preceding the brackets).
Table S15. Top 10 predictors for motor fluctuations (MF) prognosis, ranked by average permutation importance across best comprehensive and refined models for MF classification and time‐to‐MF analysis in the cross‐cohort analysis. The final rank is the average of the non‐missing ranks across the best models.
Table S16. The correlation between predictors and both the occurrence of motor fluctuations within 4 years and the severity of motor fluctuations at baseline. The correlation is measured using the point‐biserial correlation for continuous or ordinal predictors and the Matthews correlation coefficient (MCC) for binary predictors.
Table S17. The results of the correlation analysis of the predictors. The Spearman correlation coefficient was used for continuous or ordinal variable pairs, the point‐biserial correlation coefficient was used for pairs of one continuous or ordinal variable and one binary variable, and the Matthews correlation coefficient (MCC) was used for binary variable pairs. p‐Values accompanying the correlation coefficients are presented in parentheses to indicate the statistical significance of the correlation.
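The two coefficients used for binary variables in Tables S16 and S17 can be sketched in a few lines. The point‐biserial correlation is the Pearson correlation between a 0/1 variable and a continuous one, and the MCC is computed from the 2×2 contingency counts. The function names are hypothetical, and p‐value computation is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def point_biserial(binary, continuous):
    """Point-biserial correlation: Pearson correlation with a 0/1 variable."""
    return pearson([float(b) for b in binary], continuous)


def mcc(a, b):
    """Matthews correlation coefficient between two 0/1 variables."""
    tp = sum(1 for u, v in zip(a, b) if u == 1 and v == 1)
    tn = sum(1 for u, v in zip(a, b) if u == 0 and v == 0)
    fp = sum(1 for u, v in zip(a, b) if u == 0 and v == 1)
    fn = sum(1 for u, v in zip(a, b) if u == 1 and v == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For two binary variables the MCC coincides with the Pearson correlation of their 0/1 codes, so the three coefficients in Table S17 are on a comparable scale from −1 to 1.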
Table S18. Summary statistics for the levodopa equivalent daily dose (LEDD) among MF− and MF+ patients with PD in the LuxPARK cohort. The statistical significance of the observed differences is indicated by p‐values derived from t‐tests (for normally distributed data) or Mann‐Whitney U‐tests (for data that are not normally distributed). The table also presents time‐to‐MF statistics for PD patients with LEDD < 400 mg and LEDD ≥ 400 mg, including p‐values from log‐rank tests comparing these two groups.
Table S19. The calibration analysis results for comprehensive and refined models in both motor fluctuations (MF) classification and time‐to‐MF analysis in the cross‐cohort analysis. The calibration slope and mean squared error (MSE) illustrate the agreement between predicted probabilities and observed outcomes. A slope of 1 indicates perfect calibration.
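One simple way to obtain a calibration slope and MSE of the kind reported in Table S19 is to group predictions into equally sized bins, compare the mean predicted probability with the observed event frequency in each bin, and fit a least‐squares line through these points. The sketch below follows that binned approach as an assumption; the function name and the number of bins are hypothetical, and the study may have used a different estimator (for example, a regression on the logit scale).

```python
def calibration_summary(y_true, y_prob, n_bins=5):
    """Binned calibration: slope of observed vs. predicted, and their MSE."""
    pairs = sorted(zip(y_prob, y_true))
    size = len(pairs) / n_bins
    predicted, observed = [], []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if not chunk:
            continue
        predicted.append(sum(p for p, _ in chunk) / len(chunk))  # mean predicted risk
        observed.append(sum(y for _, y in chunk) / len(chunk))   # observed frequency
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    # Least-squares slope of observed frequency regressed on predicted risk.
    slope = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed)) / sum(
        (p - mean_p) ** 2 for p in predicted
    )
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    return slope, mse
```

Under this construction, a slope below 1 suggests predictions that are too extreme, and a slope above 1 suggests predictions that are too conservative.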
Data Availability Statement
The LuxPARK clinical dataset used in this study was obtained from the NCER‐PD. The dataset for this manuscript is not publicly available as it is linked to the Luxembourg Parkinson's Study and its internal regulations. Any requests for accessing the dataset can be directed to request.ncer-pd@uni.lu. Data used in the preparation of this article were obtained on January 11, 2023, from the PPMI database (https://www.ppmi-info.org/access-data-specimens/data, RRID:SCR_006431). For up‐to‐date information on the study, please visit the PPMI website (www.ppmi-info.org). Data from the ICEBERG cohort analyzed during this study are available from the corresponding study group (jean-christophe.corvol@aphp.fr, marie.vidailhet@aphp.fr). The data processing, normalization and statistical analyses were performed using the R statistical programming language (v4.2.1). Python‐3.8.6‐GCCcore‐10.2.0 was used for efficient machine learning predictions. The open‐source code is accessible in the GitLab repository under the MIT license: https://gitlab.com/uniluxembourg/lcsb/biomedical-data-science/bds/ml-motor-fluctuations.
