PLOS Medicine
. 2025 Aug 21;22(8):e1004665. doi: 10.1371/journal.pmed.1004665

Predicting knee osteoarthritis progression using neural network with longitudinal MRI radiomics, and biochemical biomarkers: A modeling study

Ting Wang 1,2,, Hao Liu 1,, Wenbo Zhao 1, Peihua Cao 3, Jia Li 4, Tianyu Chen 5, Guangfeng Ruan 6, Yan Zhang 3, Xiaoshuai Wang 3, Qin Dang 3,7, Mengdi Zhang 3,7, Alexander Tack 8, David Hunter 3,9, Changhai Ding 3,10,*, Shengfa Li 1,*
Editor: Christelle Nguyen
PMCID: PMC12370028  PMID: 40839548

Abstract

Background

Knee osteoarthritis (KOA) worsens both structurally and symptomatically, yet no model predicts KOA progression using Magnetic Resonance Image (MRI) radiomics and biomarkers. This study aimed to develop and test the longitudinal Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model (LBTRBC-M) to predict KOA progression.

Methods and findings

Data from the Foundation of the National Institutes of Health Osteoarthritis Biomarkers Consortium were used. We selected 594 participants with Kellgren-Lawrence grades 1–3 and complete biomarker data. The mean age was 61.6 ± 8.9 years, 58.8% were female, and the racial distribution was 79.3% White, 18.0% Black or African American, and 2.7% Asian or other non-White. A total of 1,753 knee MRIs were included across the study period, comprising 594 at baseline, 575 at 1-year follow-up, and 584 at 2-year follow-up. Outcomes included (1) both Joint Space Narrowing (JSN) and pain progression (n = 567), (2) only JSN progression (n = 303), (3) only pain progression (n = 295), and (4) non-progression (JSN or pain) (n = 588), corresponding to an approximate ratio of 2:1:1:2. JSN progression was defined as a minimum joint space width (JSW) loss of ≥0.7 mm, and pain progression as a sustained (≥2 time points) increase of ≥9 points on the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) pain subscale (0–100 scale). Using the eXtreme Gradient BOOSTing (XGBOOST) algorithm, the model was developed in the total development cohort (n = 877) and tested in the total test cohort (n = 876). In the total test cohort, the Area Under the receiver operating characteristic Curve (AUC) of LBTRBC-M for predicting JSN and pain progression, JSN progression, pain progression, and non-progression were 0.880 (95% confidence interval (CI) [0.853, 0.903]), 0.913 (95% CI [0.881, 0.937]), 0.886 (95% CI [0.856, 0.910]), and 0.909 (95% CI [0.888, 0.926]), respectively. The overall accuracy of LBTRBC-M was 70.1%. With LBTRBC-M assistance, the prognostic accuracy of resident physicians (n = 7) improved from 44.7%–49.0% to 64.4%–66.5%. The main limitations include the use of a non-routine MRI sequence, the lack of external validation in independent cohorts, and limited incorporation of all knee joint structures in radiomic feature extraction.

Conclusions

In this study, we observed that longitudinal MRI radiomic features of load-bearing knee joint tissues provide potentially informative markers for predicting knee osteoarthritis progression. These findings may help guide future efforts toward early risk stratification and personalized management of KOA.

Author summary

Why was this study done?

  • Knee osteoarthritis (KOA) is a progressive disease with both structural and symptomatic worsening.

  • There is currently no established predictive model that integrates longitudinal MRI radiomic features, biochemical biomarkers, and clinical variables to forecast KOA progression.

  • Early prediction of KOA progression could support timely intervention and personalized management.

What did the researchers do and find?

  • We analyzed data from 594 participants with Kellgren-Lawrence grades 1–3 in the FNIH Osteoarthritis Biomarkers Consortium dataset, incorporating 1,753 knee MRIs over a 2-year period.

  • We developed a predictive model, the Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model (LBTRBC-M), using the XGBOOST algorithm.

  • The model achieved high accuracy in predicting KOA progression outcomes in an independent test cohort, with AUCs ranging from 0.880 to 0.913.

  • The use of LBTRBC-M improved the prognostic accuracy of seven resident physicians from 44.7%–49.0% to 64.4%–66.5%.

What do these findings mean?

  • Longitudinal MRI radiomic features of load-bearing knee joint tissues, when combined with biomarkers and clinical data, may help identify patients at higher risk of KOA progression.

  • These results support the potential clinical utility of AI-assisted prediction tools for enhancing early diagnosis and individualized treatment planning in KOA.

  • The model still needs to be tested in other groups of patients and should include more parts of the knee joint to make sure it works well for different people and in real-world clinical settings.


Ting Wang and Hao Liu develop and test a predictive model integrating longitudinal MRI radiomic features, biochemical biomarkers, and clinical variables to help forecast the progression of knee osteoarthritis.

Introduction

Osteoarthritis (OA) is a prevalent articular disease that can cause chronic joint pain and debilitating symptoms, significantly impacting patients’ quality of life [1,2]. With the aging population and increasing risk factors, the global incidence of OA is expected to rise [3]. The Global Burden of Disease (GBD) project estimates a staggering 303.1 million cases of OA worldwide [4]. Unfortunately, no approved Disease-Modifying Osteoarthritis Drugs (DMOADs) are available, making OA a severe condition with unmet medical needs [5]. It is worth noting that approximately 20%–30% of Knee Osteoarthritis (KOA) patients may progress to end-stage disease, necessitating Total Knee Replacement (TKR) [6].

The knee is the most commonly affected weight-bearing joint in older adults. In KOA patients, excessive stress and strain on load-bearing tissues, encompassing bone, cartilage, and meniscus, can lead to cartilage deterioration and microscopic bone damage [7]. Several models have been developed using knee Magnetic Resonance Image (MRI) and biochemical markers to predict KOA progression in the Foundation for the National Institutes of Health (FNIH) OA Biomarkers Consortium study [8–12]. Combining quantitative and semiquantitative data of load-bearing tissues with biochemical biomarkers, models achieved an Area Under the receiver operating characteristic Curve (AUC) of 0.641–0.722 for predicting Joint Space Narrowing (JSN) and pain progression [8]. Using semiquantitative change of cartilage and meniscus data, models achieved AUCs of 0.706–0.740 [9]. Predictive models based on radiographic subchondral Trabecular Bone Texture (TBT) had AUCs of 0.633–0.649 [10]. Finally, models incorporating urinary C-terminal cross-linked telopeptides of type II collagen (CTX-Ⅱ), serum hyaluronan, and serum N-telopeptide of type I collagen (NTX-Ⅰ) had an AUC of 0.667 for predicting JSN and pain progression [12].

Recent advancements by Saarakkala and colleagues [13] and Lespessailles and colleagues [14] have demonstrated the potential of integrating imaging biomarkers with biochemical and clinical data. Saarakkala and colleagues highlighted the utility of deep learning on structural MRI for KOA progression prediction, emphasizing the power of data-driven representations over manually extracted biomarkers [13]. Lespessailles and colleagues underscored the role of trabecular bone texture and multimodal biomarker integration for improved predictive accuracy [14]. These studies highlight the growing recognition of combining multiple biomarker types to enhance precision in KOA predictions.

To date, MRI semiquantitative data, biochemical biomarkers, and KOA symptom scores are widely used to evaluate clinical outcomes in KOA patients. However, a predictive model integrating longitudinal MRI radiomics, serum or urine biomarkers, and clinical high-risk factors is unavailable. In this study, using a Convolutional Neural Network (CNN) algorithm [15], we automatically segmented the load-bearing tissues on knee MRI, including the femur, tibia, femorotibial cartilage, and menisci [16,17]. Then, using the eXtreme Gradient BOOSTing (XGBOOST) algorithm, we developed and tested the Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model (LBTRBC-M), incorporating the 2-year follow-up of load-bearing tissue MRI radiomics, biochemical biomarkers, and clinical variables, to predict KOA progression within the subsequent 2 years in the FNIH OA Biomarkers Consortium cohort study.

Patients and methods

Study design and participants

The current study used publicly available, de-identified data from the Osteoarthritis Initiative (OAI), an ongoing, multicenter, prospective cohort study (ClinicalTrials.gov identifier: NCT00080171). As such, no additional ethical approval was required for secondary analysis. The original OAI study protocol was approved by the institutional review boards of all participating centers, including the coordinating center at the University of California, San Francisco (IRB number: 10-00532). The study was conducted in accordance with the Health Insurance Portability and Accountability Act (HIPAA), and all participants provided informed consent.

Six hundred participants were selected based on frequent knee pain and a Kellgren-Lawrence Grade (KLG) of 1, 2, or 3 on knee radiographs at baseline [18]. They were required to have baseline and 24-month radiographic data on medial tibiofemoral joint space width (JSW) [19], knee MRI, stored serum and urine specimens, and clinical data. Further details can be found in our study’s flowchart (S1 Fig).

Inclusion and exclusion criteria

FNIH OA Biomarkers Consortium undertook a nested case-control study (194 JSN and pain progression cases and 406 OA comparators) of progressive KOA within the OAI, a unique longitudinal cohort with a publicly available repository of joint images, biologic specimens, and clinical data obtained at annual clinic visits. Details of the study design have been published previously [18]. Briefly, participants eligible for the present study were those who had at least 1 knee with a KLG of 1–3 at baseline determined at a central reading site and for whom knee radiographs, knee MRIs, stored serum and urine specimens, and clinical data were available for the baseline and 24-month visits. One index knee was selected for each participant.

Knees were excluded from analysis under the following circumstances: if progression criteria were met by 12 months to enable the study of change in biomarkers before the progression definition was met, if radiographic lateral joint space narrowing (JSN) grade 2 or 3 was present at baseline, or if total knee replacement or total hip replacement had occurred prior to 24 months due to possible effects on biochemical markers.

Clinical outcome

A predetermined number of index knees were categorized into four groups based on outcome assessment at 24 months: (1) knees with both JSN and pain progression (n = 194), (2) knees with JSN progression but not pain progression (n = 103), (3) knees with pain but not JSN progression (n = 103), and (4) knees with neither JSN nor pain progression (n = 200). The main analysis focused on comparing knees with both JSN and pain progression to all other knees [12,20]. JSN progression was defined as a minimum JSW loss of ≥0.7 mm, and pain progression as a sustained (≥2 time points) increase of ≥9 points on the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) pain subscale (0–100 scale) [18].
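The two progression definitions above determine the four-group assignment used throughout the study. A minimal sketch of that logic (the function name and argument layout are hypothetical, not taken from the study's code):

```python
def classify_progression(jsw_loss_mm, womac_pain_increases):
    """Assign a knee to one of the four FNIH outcome groups.

    jsw_loss_mm: minimum joint space width loss at 24 months (mm).
    womac_pain_increases: WOMAC pain-subscale increases (0-100 scale)
        at each follow-up time point; pain progression requires an
        increase of >= 9 points sustained at >= 2 time points.
    """
    jsn = jsw_loss_mm >= 0.7
    pain = sum(1 for d in womac_pain_increases if d >= 9) >= 2
    if jsn and pain:
        return "JSN and pain progression"
    if jsn:
        return "JSN progression"
    if pain:
        return "pain progression"
    return "non-progression"
```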

After enrollment, participants underwent routine clinical examination, knee MRI, and knee joint radiography every 12 months for 2 years.

Clinical examination and radiography

The FNIH OA Biomarkers Consortium cohort study used the following clinical examinations and assessments: (1) knee pain severity scale; (2) participant global assessment; (3) WOMAC Osteoarthritis Index; (4) Knee injury and Osteoarthritis Outcome Score (KOOS); (5) limitation in activity due to knee pain; (6) general health and functional status; (7) walking ability and endurance; (8) upper leg strength; and (9) serum and urine biochemical biomarker assays.

The knee fixed-flexion, posterior-anterior weight-bearing radiographs obtained at baseline and all annual follow-up visits were acquired using a Plexiglas positioning frame (SynaFlexer; BioClinica, Newark, CA), with knees flexed to 5–15° and feet internally rotated 10°. All radiographs were centrally read to determine the KLG [21]. For the baseline through 24-month visits, KLG assessments were performed by Dr. Piran Aliabadi and Dr. Burt Sack, board-certified radiologists with extensive experience in musculoskeletal imaging, under the supervision of Dr. David Felson, a senior rheumatologist and epidemiologist with decades of expertise in osteoarthritis research at the Boston University Clinical Epidemiology Research and Training Unit. The minimum JSW in the medial femorotibial compartment was measured using automated software [22].

Biochemical biomarkers

Eighteen serum and urine biochemical biomarkers were measured. Serum biomarkers included Cartilage Oligomeric Matrix Protein (sCOMP), Hyaluronic Acid (sHA), Type IIA Procollagen Amino-terminal Propeptide (sPIIANP), Type I Collagen C-terminal Telopeptide (sCTX-I), Aggrecan Chondroitin Sulfate 846 Epitope (sCS846), Matrix Metalloproteinase-3 (sMMP-3), Cleavage neoepitope of type II collagen (sC2C), Type II Collagen Neoepitope (sC1, 2C), C-Propeptide of type II collagen (sCPⅡ), sNTX-Ⅰ, and Nitrated triple helix of type II collagen (sColl2−1NO2). Urine biomarkers included C-terminal cross-linked telopeptide of type Ⅰ collagen (α-isomer) (uCTX-Ⅰα), C-terminal cross-linked telopeptide of type I collagen (β-isomer) (uCTX-Ⅰβ), uNTX-Ⅰ, Cleavage neoepitope of type II collagen (uC2C), Type II collagen neoepitope (uC1, 2C), Nitrated triple helix of type II collagen (uColl2−1NO2), and uCTX-Ⅱ. These markers reflect processes such as cartilage degradation and synthesis, bone turnover, and joint inflammation [12]. Urinary markers were standardized to urinary creatinine concentration. The interassay coefficients of variation for these markers ranged from 3% to 12% [12].

MRI protocol and assessment

This study is based on baseline SAGittal 3-Dimensional Double Echo Steady-State with selective Water Excitation (SAG-3D-DESS-WE) MRI data acquired by the FNIH OA Biomarkers Consortium cohort study using Siemens Trio 3.0 Tesla scanners (Magnetom Trio, Siemens Healthcare, Erlangen, Germany) [23]. The SAG-3D-DESS-WE series utilizes near-isotropic voxels (0.7 mm slice thickness × 0.37 mm × 0.46 mm) to maximize in-plane sagittal spatial resolution within a reasonable acquisition time (10.5 min). The protocol of SAG-3D-DESS-WE can be found in S1 Table.

Two experienced musculoskeletal radiologists (F. Roemer and A. Guermazi) independently assessed the MRIs according to Magnetic resonance imaging OsteoArthritis Knee Score (MOAKS). All scores showed substantial (0.61–0.8) or high (0.81–1.0) agreements regarding intra-observer and inter-observer reliabilities.

An automated MRI segmentation approach using CNNs [15] was developed for six anatomical structures (femur, tibia, femoral and tibial cartilages [16], and both menisci [17]). To compute features suitable for use as biomarkers of KOA progression, we employed automatic image segmentation to specify the anatomical Volumes of Interest (VOIs) using CNNs. The method of Ambellan and colleagues [16] was used to segment the femur, tibia, and femoral and tibial cartilage, and the method of Tack and colleagues [17] was used to segment the medial and lateral menisci.

VOIs (n = 20 knees) defining the femur, tibia, femorotibial cartilages, and menisci were manually adjusted by two independent authors (including S.F.L., with 6 years of experience in orthopedics), both of whom were blinded to the clinical outcome data. The segmentations and MRIs were viewed and adjusted using itk-SNAP version 3.8.0 software (www.itksnap.org). The Dice Similarity Coefficient (DSC) was used to assess agreement between manual adjustment and automated segmentation (S4 Table).
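The DSC for comparing an automated mask against a manually adjusted one is straightforward to compute. A minimal NumPy sketch (a hypothetical helper, not the study's implementation):

```python
import numpy as np

def dice_similarity(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary segmentation masks:
    2 * |A ∩ B| / (|A| + |B|), with 1.0 returned for two empty masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * intersection / denom if denom else 1.0
```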

Double echo steady-state signal feature map of load-bearing tissues

We developed feature maps of load-bearing tissues from their Double Echo Steady-State (DESS) signal intensities (mean pixel values), building upon previous studies that visualized the texture of knee MRI [24–26]. We used this method to detect MRI changes in participants with KOA progression.
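Assuming each VOI is available as a binary mask over the DESS volume, the per-tissue feature-map value (mean pixel value) can be sketched as follows (hypothetical helper):

```python
import numpy as np

def mean_dess_signal(volume, voi_mask):
    """Mean DESS signal intensity (mean pixel value) within a VOI mask.

    volume: 3-D array of DESS signal intensities.
    voi_mask: boolean array of the same shape marking the tissue VOI.
    """
    volume = np.asarray(volume, dtype=float)
    mask = np.asarray(voi_mask, dtype=bool)
    return float(volume[mask].mean())
```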

Model development and test

In the FNIH OA Biomarkers Consortium cohort study, after excluding 47 non-conforming knee MRIs, 594 participants with 1,753 knee MRIs were selected over the 2-year follow-up. The knee MRIs were randomly split into a development cohort and a test cohort at a 1:1 ratio at each visit time point. With ~100 samples per group at each time point, an 8:2 or 7:3 split would leave only ~20 test samples per group; a 1:1 split ensured better representation, enhancing model reliability. Although the FNIH dataset was originally constructed as a nested case-control study with matched pairs, we disrupted the original matching during the cohort split: participants were randomly assigned to the development and test cohorts, disregarding their original case-control pairings. This strategy allowed us to evaluate model performance in a setting more reflective of real-world clinical variability and supported the use of generalized learning approaches rather than those relying on strict pairwise comparison. The ratio of the JSN and pain progression, JSN progression, pain progression, and non-progression groups was 2:1:1:2 in each cohort.
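A 1:1 split that preserves the 2:1:1:2 group ratio at each visit amounts to stratified sampling within each (visit, outcome-group) stratum. A minimal sketch (the function and record layout are hypothetical):

```python
import random
from collections import defaultdict

def stratified_half_split(records, seed=0):
    """Split knee records 1:1 into development and test cohorts within
    each (visit, outcome-group) stratum, so that the outcome-group
    ratio is approximately preserved in both cohorts."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["visit"], rec["group"])].append(rec)
    development, test = [], []
    for items in strata.values():
        rng.shuffle(items)
        half = len(items) // 2
        development.extend(items[:half])
        test.extend(items[half:])
    return development, test
```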

The predictive model was developed in a total development cohort (n = 877), which pooled the development cohorts from the three visits. The model was tested in a total test cohort (n = 876), which pooled the test cohorts from the three visits: test cohort 1 (n = 301) at baseline, test cohort 2 (n = 297) at 1-year follow-up, and test cohort 3 (n = 278) at 2-year follow-up (Fig 1). The predictive performance was validated using 10-fold cross-validation (repeated 100 iterations). To assess model stability, we compared LBTRBC-M performance across 100, 300, 1,000, and 25,000 cross-validation iterations (S18 Table). Given the limited performance gain and the high computational cost, we chose 100 iterations for the final analysis as a practical balance between accuracy and efficiency. The hardware specifications were an Intel Core i5-1135G7 CPU at 2.40 GHz, 16 GB of RAM, and a 512 GB solid-state drive.

Fig 1. Participant flow and MRI timeline.

Fig 1

A total of 600 eligible knees were identified from the FNIH OA Biomarkers Consortium cohort. After excluding 47 non-conforming knee MRIs, 1,753 knee MRIs were included over the 2-year follow-up period: 594 at baseline, 575 at 1-year follow-up, and 584 at 2-year follow-up. The predictive model was developed in the total development cohort (877 knee MRIs) and tested in the total test cohort (876 knee MRIs), each pooled across the three visits: baseline, 1-year follow-up, and 2-year follow-up. The ratio between the development cohort and test cohort was 1:1 at each visit time point. FNIH OA Biomarkers Consortium cohort: Foundation of the National Institutes of Health Osteoarthritis Biomarkers Consortium cohort, MRI: Magnetic Resonance Image, JSN: Joint Space Narrowing.

Three-dimensional MRI radiomic feature analysis

LBTRBC-M was developed in the total development cohort by integrating MRI radiomic features from load-bearing tissues, biochemical biomarkers, and clinical high-risk variables (Fig 2). The MRI protocol (S1 Table) and the automatic CNN-based segmentation scheme (S2 Fig) are shown in our study. Image Biomarker Standardisation Initiative (IBSI)-standardized MRI radiomic features [27] were extracted using the Standardized Environment for Radiomics Analysis (SERA) package [28] from baseline, 1-year follow-up, and 2-year follow-up SAG-3D-DESS-WE MRIs, and predictive models were constructed using an XGBOOST algorithm after applying Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression (repeated 1,000,000 times) for feature selection (S4 and S5 Figs).

We compared the number of features before and after LASSO selection. Initially, 2,947 features were considered in the model: 2,922 MRI radiomic features, 17 biochemical biomarkers, and 7 clinical variables. After applying LASSO, 255 non-zero features were retained in the final model, including 236 MRI radiomic features, 13 biochemical biomarkers, and 6 clinical variables.

In our analysis, we used Statistical Analysis System software (SAS, version 9.4), leveraging the generalized regression platform with the LASSO method. The selection of the optimal λ was guided by adaptive weighting combined with the corrected Akaike Information Criterion (AICc) validation method: λ was chosen from the solution path where the AICc value was minimized, ensuring the best balance between model fit and complexity. The assumptions underlying LASSO regression, including the distribution of errors and the linearity of relationships between predictors and the outcome, were assessed during model development. The linearity assumption was checked through exploratory data analysis, and transformations were applied to variables as necessary. Residual diagnostics were performed to confirm that the error terms were approximately normally distributed.

In addition to LBTRBC-M, several other models were constructed for comparison, including the Load-Bearing Tissue Radiomic Model (LBT-RM), Femur Radiomic Model (FE-RM), Femoral Cartilage Radiomic Model (FC-RM), Tibia Radiomic Model (TI-RM), Tibial Cartilage Radiomic Model (TC-RM), Lateral Meniscal Radiomic Model (LM-RM), Medial Meniscal Radiomic Model (MM-RM), MOAKS models, Biochemical biomarker Model (BM), and Clinical Model (CM).
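As a rough stand-in for the SAS LASSO step (which selected λ by AICc), an L1-penalized logistic regression illustrates how non-zero coefficients identify the retained features. This sketch assumes scikit-learn; the penalty strength `C` is illustrative, not the study's λ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, C=0.1):
    """Return indices of features retained (non-zero coefficients) by an
    L1-penalized logistic regression -- an analogue of LASSO feature
    selection (C is the inverse penalty strength)."""
    X_std = StandardScaler().fit_transform(X)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X_std, y)
    return np.flatnonzero(model.coef_[0])
```

On the study's data this step reduced 2,947 candidate features to 255 non-zero ones; here the same mechanism is shown on synthetic data.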

Fig 2. Workflow of the study.

Fig 2

Knee MRIs of the FNIH OA Biomarkers Consortium cohort were automatically segmented by CNN for feature extraction. After feature evaluation and modeling using the LASSO and XGBOOST algorithm, respectively, eight types of features (biochemical biomarkers, clinical variables, FE, FC, TI, TC, LM, and MM MRI radiomic features) were generated and further used to develop the LBTRBC-M. The performance of LBTRBC-M in predicting KOA progression (i.e., JSN and pain progression vs. JSN progression vs. pain progression vs. non progression) was tested in three visits. FNIH OA Biomarkers Consortium cohort: Foundation of the National Institutes of Health Osteoarthritis Biomarkers Consortium cohort, SAG-3D-DESS-WE: Sagittal 3-Dimensional Double Echo Steady-State with selective Water Excitation, CNN: Convolutional Neural Network, XGBOOST: eXtreme Gradient BOOSTing, LASSO: Least Absolute Shrinkage and Selection Operator, FE-RM: Femur Radiomic Model, FC-RM: Femoral Cartilage Radiomic Model, TI-RM: Tibia Radiomic Model, TC-RM: Tibial Cartilage Radiomic Model, LM-RM: Lateral Meniscal Radiomic Model, MM-RM: Medial Meniscal Radiomic Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, JSN: Joint Space Narrowing, SERA: Standardized Environment for Radiomics Analysis, MRI: Magnetic Resonance Image.

Selection of hyperparameters, model fitting and optimization process

The hyperparameters for XGBOOST and LASSO were selected based on prior literature and initial experiments. Specifically, the values for XGBOOST (max_depth: 6, subsample: 1, colsample_bytree: 1, min_child_weight: 1, α: 0, λ: 1, learning_rate: 0.3, iterations: 100) were chosen in accordance with recommendations in previous studies and through empirical testing to optimize model performance (S16 Table). These settings were further refined using grid search (150 grid points) with cross-validation to identify the best-performing combination. For LASSO, the penalty parameter (λ) was optimized using cross-validation to minimize the mean squared error (MSE). The selection of hyperparameters was guided by Hastie and colleagues [29] and adapted to the specifics of the dataset and model requirements.

In this study, we used XGBOOST with 10-fold cross-validation to fit and optimize the model. Model performance was assessed using metrics such as AUC, accuracy, Logloss, and Root Average Squared Error (RASE). Key hyperparameters (S15 Table), including max_depth, learning_rate, subsample, and λ, were optimized via grid search with cross-validation, with practical constraints based on prior literature (e.g., max_depth between 3 and 10, learning_rate between 0.1 and 0.3).
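The grid-search-with-cross-validation tuning described above can be illustrated with scikit-learn; `GradientBoostingClassifier` stands in for XGBOOST here, and the data, grid, and effect sizes are synthetic rather than the study's pipeline:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 200 "knees" with 5 features and a binary
# progression label driven by the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Small illustrative grid over tree depth and learning rate, scored by
# AUC under 10-fold cross-validation (the study searched 150 grid points).
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [3, 6], "learning_rate": [0.1, 0.3]},
    scoring="roc_auc",
    cv=10,
)
grid.fit(X, y)
```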

Reader experiments

Seven resident physicians (clinical practice years: 1–4; physicians: Liu, Zhao, Cao, J Li, Chen, X Wang, Dang, M Zhang) participated in a study to predict the progression of KOA using knee MRI (SAG-3D-DESS-WE sequence), biochemical biomarkers (17 types of serum and urine biomarkers), and clinical variables (age, sex, Body Mass Index (BMI), race, WOMAC pain score, WOMAC disability score, and pain medication use). The study also evaluated the use of LBTRBC-M in assisting the predictions. The LBTRBC-M system provided the probability of four clinical outcomes (JSN and pain progression, JSN progression, pain progression, and non-progression). We used color coding to differentiate prediction results within various probability ranges. Since the outcome variable was a four-class label, random prediction would assign approximately 25% probability to each class. Therefore, when the LBTRBC-M model output a probability above 25.0% for a given class, we considered this indicative of stronger prediction confidence and displayed the result in red font; otherwise, it was shown in black font. This color-coding scheme was designed to help resident physicians better understand and utilize the model’s output.
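The 25% color-coding rule amounts to a simple threshold on each class probability. A minimal sketch (the helper function is hypothetical; the labels are the study's four outcome classes):

```python
def color_code(probabilities, threshold=0.25):
    """Map each outcome's predicted probability to a display color:
    red when it exceeds the four-class random-guess level (25%),
    black otherwise."""
    return {label: ("red" if p > threshold else "black")
            for label, p in probabilities.items()}
```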

Evaluation of model performance

We evaluated the accuracy of the XGBOOST MRI radiomic models in the prediction of KOA progression using ROC analysis. Evaluation metrics were computed, including the Area Under ROC Curve (AUC), sensitivity, specificity, and kappa value. In this study, a “successful prediction” is defined through a combination of predictive accuracy and clinical relevance. From a statistical perspective, a prediction is considered successful if the model achieves an AUC greater than 0.70, a widely accepted threshold indicating reliable discriminative performance for clinical decision-making [30]. Beyond statistical accuracy, clinical utility serves as a crucial measure of success. A prediction is clinically meaningful if it helps identify high-risk patients likely to experience JSN progression and/or pain progression within the next two years. Furthermore, the model’s ability to integrate multiple biomarkers and clinical variables enhances its capacity to provide a comprehensive risk assessment, ensuring its applicability in real-world clinical decision-making. The performance of the predictive model was validated using 10-fold cross-validation (S3 Fig).
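The AUC and sensitivity/specificity metrics used above can be computed directly. A minimal NumPy sketch (not the study's SAS/pROC code):

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC computed as the probability that a random positive case is
    scored above a random negative case (ties count half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(y_score, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for binary predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)
```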

Missing data

Missing data were addressed using multiple imputation (Predictive mean matching). Specifically, data were missing for 2 knees in the MOAKS femur lateral posterior bone marrow lesion percentage (lesion that is edema), 13 knees for serum biomarkers, and 10 knees for urine biomarkers. The percentages of missing data were 0.3%, 2.1%, and 1.7%, respectively. Given these low percentages, the impact on the results was deemed minimal.
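Predictive mean matching fills each missing value with an observed "donor" value whose regression prediction is closest to the prediction for the missing case. A simplified single-draw sketch (the study used multiple imputation; this helper is hypothetical):

```python
import numpy as np

def pmm_impute(x, predictors):
    """Single predictive-mean-matching draw for one variable.

    x: 1-D array with np.nan marking missing entries.
    predictors: 2-D array of complete covariates.
    Each missing value is replaced by the observed value whose
    least-squares prediction is nearest to the missing case's
    prediction (a simplified, single-imputation sketch of PMM).
    """
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones(len(x)), predictors])
    obs = ~np.isnan(x)
    beta, *_ = np.linalg.lstsq(Z[obs], x[obs], rcond=None)
    pred = Z @ beta
    filled = x.copy()
    for i in np.flatnonzero(~obs):
        donor = np.argmin(np.abs(pred[obs] - pred[i]))
        filled[i] = x[obs][donor]
    return filled
```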

Statistical analysis

Generalized Estimating Equations (GEE) were used to assess the risk of KOA progression adjusting for confounders (age, sex, BMI, knee side, race, WOMAC knee pain score) [31–34]. These confounders were selected based on prior literature and their potential impact on KOA progression. The GEE model assumed an autoregressive correlation structure, as this was deemed most suitable for the repeated measures in our data. MRI radiomic feature extraction and DSC calculation were performed in Matlab R2021a (version 9.10.0). GEE, LASSO logistic regression, XGBOOST modeling, and the DeLong test were performed in SAS. Model performance was assessed using AUC, sensitivity, specificity, accuracy, and kappa value in all development and test cohorts. We compared cohorts using the unpaired t test or one-way ANOVA for normally distributed continuous variables (normality assessed using the Shapiro–Wilk test), the Mann–Whitney or Kruskal–Wallis test for non-normally distributed continuous variables, and the χ² test for categorical variables. To control for type I error, both Tukey’s honestly significant difference (HSD) test and Bonferroni correction were applied for multiple comparisons following one-way ANOVA. Area differences between two Receiver Operating Characteristic (ROC) curves were compared using the DeLong test. For Bonferroni correction, the adjusted P value threshold was calculated by dividing 0.05 by the number of comparisons; results were considered statistically significant if the adjusted p value was below this threshold. The exact adjusted significance levels are reported where relevant. A P value < 0.05 was considered statistically significant unless otherwise specified after correction. R (version 4.1.1) with the pROC package (version 1.18.0) was used for ROC curve analyses.

Results

Baseline characteristics

At the baseline visit, 293 (178/293, 61% females) individuals were included in the development cohort 1, while 301 (171/301, 57% females) individuals were included in the test cohort 1 (S2 Table). Results of baseline characteristics showed no significant differences between the two cohorts (S2 Table).

Baseline characteristics among the four groups (JSN and pain progression, JSN progression, pain progression, and non-progression) were generally similar, except for age (p = 0.018), female (p = 0.016), and KLG (p = 0.003) in test cohort 1 (Table 1). In addition, the levels of baseline serum/urine biochemical biomarkers showed no significant differences between the four groups in both cohorts (S3 Table).

Table 1. Baseline characteristics of participants in development cohort 1 and test cohort 1.

Development cohort 1 (n = 293) Test cohort 1 (n = 301)
JSN and pain (n = 85) JSN (n = 52) Pain (n = 47) Non (n = 109) p value JSN and pain (n = 108) JSN (n = 50) Pain (n = 52) Non (n = 91) p value
Agea 62.3 ± 9.2 63.2 ± 8.2 61.1 ± 8.2 62.0 ± 9.2 0.703 61.8 ± 8.5 63.2 ± 8.6 57.9 ± 9.1 60.8 ± 9.0 0.018
Femaleb 53 (62%) 25 (48%) 29 (62%) 71 (65%) 0.361 56 (52%) 21 (42%) 35 (67%) 59 (65%) 0.057
BMIa 30.7 ± 4.5 30.0 ± 4.8 30.8 ± 4.4 30.4 ± 4.7 0.826 30.7 ± 5.0 31.3 ± 4.3 31.2 ± 5.6 30.6 ± 4.9 0.845
PMb 30 (35%) 11 (21%) 17 (36%) 32 (29%) 0.488 33 (31%) 11 (22%) 18 (35%) 23 (25%) 0.653
Injuryb 24 (28%) 18 (35%) 19 (40%) 34 (31%) 0.694 44 (41%) 22 (44%) 19 (37%) 32 (35%) 0.798
Surgeryb 16 (19%) 9 (17%) 5 (11%) 23 (21%) 0.777 19 (18%) 11 (22%) 10 (19%) 13 (14%) 0.890
KLGb 0.960 0.003
 1 8 (9%) 8 (15%) 6 (13%) 11 (10%) 16 (15%) 6 (12%) 7 (13%) 13 (14%)
 2 50 (59%) 25 (48%) 23 (49%) 58 (53%) 34 (31%) 21 (42%) 35 (67%) 56 (62%)
 3 27 (32%) 19 (37%) 18 (38%) 40 (37%) 58 (54%) 23 (46%) 10 (20%) 22 (24%)
MJSWa 4.1 ± 1.3 3.9 ± 1.2 4.0 ± 1.1 3.8 ± 1.0 0.358 3.6 ± 1.4 3.6 ± 1.2 3.9 ± 1.0 4.0 ± 1.0 0.109
WOMAC_PSb 1 (0, 3) 2 (0, 5) 1 (0, 3) 1 (0, 4) 0.219 1 (0, 3) 2 (0, 5) 1 (0, 2) 1 (0, 5) 0.091
WOMAC_SSb 2 (0, 2) 2 (0, 3) 1 (0, 2) 1 (0, 2) 0.552 1 (0, 2) 1 (0, 3) 2 (0, 2) 1 (0, 3) 0.905
WOMAC_DSb 4 (0, 13) 3 (0, 15) 3 (1, 10) 4 (0, 11) 0.962 6 (1, 15) 5 (0, 18) 6 (1, 15) 2 (0, 16) 0.401

Data are mean ± SD or number (%).

a One-way ANOVA tests are used for differences between means.

b Kruskal–Wallis tests are used for differences between ranks.

The results of development cohort 1 and test cohort 1 correspond to the baseline visit. JSN: Joint Space Narrowing, BMI: Body Mass Index, PM: Pain Medication, KLG: Kellgren and Lawrence Grade, MJSW: Minimum Joint Space Width, WOMAC_PS: Western Ontario and McMaster Universities Arthritis Index Pain Score, WOMAC_SS: Western Ontario and McMaster Universities Arthritis Index Stiffness Score, WOMAC_DS: Western Ontario and McMaster Universities Arthritis Index Disability Score, SD: Standard Deviation.

Reliability of automatic segmentation

The CNN segmentation dataset was used for extracting MRI radiomic features, and all DSCs were above 0.800 (S4 Table). The feature selection process using LASSO regression is displayed in S4 and S5 Figs, and the selected features are listed in S5 Table.

Feature maps of load-bearing tissues

Fig 3 illustrates the distinct DESS signal intensity changes observed among the four groups across load-bearing structures. High values in the femur and tibia were detected in the JSN and pain progression and pain progression groups, whereas high values in the femoral cartilage, tibial cartilage, lateral meniscus, and medial meniscus were detected in the JSN and pain progression and JSN progression groups. As shown in S6 Fig, in the femur, tibia, and medial meniscus, the JSN and pain progression group exhibited the highest MRI radiomic values, followed by the pain progression group and then the JSN progression group. Conversely, in the femoral cartilage, tibial cartilage, and lateral meniscus, the JSN and pain progression group had the highest MRI radiomic values, followed by the JSN progression group and then the pain progression group.

Fig 3. The DESS signal feature maps of load-bearing tissues in different groups and prediction performance of LBT-RM, LBT-MOM, BM, CM, BCM, and LBTRBC-M in the test cohorts.

Fig 3

DESS signal intensity maps of load-bearing tissues were developed in the JSN and pain progression (A), JSN progression (B), pain progression (C), and non-progression (D) groups. The performance of predicting JSN and pain progression (E–H), JSN progression (I–L), pain progression (M–P), and non-progression (Q–T) by LBT-RM, LBT-MOM, BM, CM, BCM, and LBTRBC-M in test cohorts 1 to 3 and the total test cohort is shown. The results of test cohort 1, test cohort 2, test cohort 3, and the total test cohort correspond to baseline, 1-year follow-up, 2-year follow-up, and all of the aforementioned time points combined, respectively. LBT-RM: Load-Bearing Tissue Radiomic Model, LBT-MOM: Load-Bearing Tissue MOAKS Model, BM: Biochemical biomarker Model, CM: Clinical Model, BCM: Biochemical biomarker plus Clinical variable Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score, DESS: Double Echo Steady-State.

Algorithm selection

In our study, the performance of the LBTRBC-M model was compared across several machine learning algorithms, including XGBOOST, bootstrap forest, shallow neural network, support vector machines, decision tree, nominal logistic regression, and naive Bayes (S15 Table). Among these, XGBOOST demonstrated the best overall performance, achieving the highest AUC (0.897) and the lowest Root Average Squared Error (RASE) (0.508). Additionally, XGBOOST had a favorable Entropy R Square (ERS) of 0.409 and a low misclassification rate (0.299), outperforming all other algorithms, including bootstrap forest and shallow neural network. Given its superior discriminative ability and error minimization, XGBOOST was selected as the optimal algorithm for predicting KOA progression in this study.
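Among the selection metrics above, the misclassification rate is the most direct to make concrete: each knee is assigned the outcome class with the highest predicted probability, and the rate is the fraction assigned incorrectly. A hedged pure-Python sketch (the probabilities and labels below are invented for illustration, not taken from the study):

```python
def predict_class(prob_row, classes):
    """Assign the class with the highest predicted probability (argmax)."""
    return classes[max(range(len(classes)), key=lambda j: prob_row[j])]

def misclassification_rate(y_true, prob_rows, classes):
    """Fraction of observations whose argmax prediction disagrees with truth."""
    wrong = sum(predict_class(row, classes) != y
                for y, row in zip(y_true, prob_rows))
    return wrong / len(y_true)

# Toy example over the four outcome categories.
classes = ["JSN+pain", "JSN", "pain", "non"]
probs = [[0.6, 0.2, 0.1, 0.1],   # predicted "JSN+pain"
         [0.1, 0.2, 0.3, 0.4],   # predicted "non"
         [0.2, 0.5, 0.2, 0.1]]   # predicted "JSN"
truth = ["JSN+pain", "pain", "JSN"]
print(misclassification_rate(truth, probs, classes))  # one of three misclassified
```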

Performance of single-structure MRI radiomic models: Comparison with MOAKS imaging marker

The Single-Structure Radiomic Models (SS-RMs), including FE-RM, FC-RM, TI-RM, TC-RM, LM-RM, and MM-RM, showed low kappa values ranging from 0.042 to 0.296 (S7 Fig), moderate AUCs ranging from 0.507 (95% confidence interval (CI) [0.424, 0.589]) to 0.747 (95% CI [0.685, 0.801]) (S8–S13 Figs and S6 Table), and low accuracy ranging from 30.9% to 51.5% (S7 Table) in the test cohorts.

In the total test cohort, compared with Single-Structure MOAKS Models (SS-MOMs), SS-RMs showed significant AUC improvements in predicting KOA progression (JSN and pain progression, JSN progression, and pain progression), except for FC-RM versus FC-MOM in predicting JSN and/or pain progression (S8 Table).

Performance of LBTRBC-M: Comparison with SS-RMs

Integrating the SS-RMs into the LBT-RM improved predictive accuracy. The LBT-RM achieved kappa values ranging from 0.355 to 0.450 (S14 Fig), AUCs ranging from 0.761 (95% CI [0.685, 0.823]) to 0.857 (95% CI [0.796, 0.903]) (Fig 3E–3T), and accuracy ranging from 54.0% to 61.6% (S7 Table) in the test cohorts. In comparison, the Load-Bearing Tissue MOAKS Model (LBT-MOM), BM, CM, and Biochemical biomarker plus Clinical variable Model (BCM) had lower AUCs in the test cohorts, ranging from 0.643 (95% CI [0.541, 0.733]) to 0.736 (95% CI [0.668, 0.794]), 0.694 (95% CI [0.597, 0.776]) to 0.777 (95% CI [0.717, 0.827]), 0.563 (95% CI [0.487, 0.636]) to 0.741 (95% CI [0.669, 0.802]), and 0.706 (95% CI [0.613, 0.784]) to 0.822 (95% CI [0.751, 0.877]), respectively (S6 Table). Furthermore, the kappa values, AUCs, and accuracy of LBTRBC-M ranged from 0.551 to 0.628, 0.869 (95% CI [0.819, 0.907]) to 0.919 (95% CI [0.883, 0.946]), and 68.0% to 74.1%, respectively, in the test cohorts (Fig 3E–3T and S6 and S7 Tables).

When predicting KOA progression, LBTRBC-M outperformed LBT-RM, BM, CM, BCM, and the Load-Bearing Tissue MOAKS plus Biochemical biomarker and Clinical variable Model (LBTMBC-M), with significant AUC differences (Tables 2 and S8, p < 0.050). For predicting JSN and pain progression in the total test cohort, the AUC differences between LBTRBC-M and LBT-RM, BM, CM, BCM, and LBTMBC-M were 0.072 (95% CI [0.044, 0.100]; p < 0.001), 0.122 (95% CI [0.088, 0.157]; p < 0.001), 0.224 (95% CI [0.183, 0.264]; p < 0.001), 0.100 (95% CI [0.067, 0.134]; p < 0.001), and 0.096 (95% CI [0.064, 0.128]; p < 0.001), respectively (Table 2). Additionally, as shown in S8 Table, there was no significant AUC difference between LBT-RM and BCM in the total test cohort: for JSN and pain progression, the AUC difference was 0.028 (95% CI [−0.015, 0.072], p = 0.203); for JSN progression, 0.023 (95% CI [−0.029, 0.075], p = 0.379); and for pain progression, 0.030 (95% CI [−0.026, 0.086], p = 0.288).

Table 2. Comparing the areas under two correlated ROC curves between predictive models in the test cohorts.

Predicting models Test cohort 1 Test cohort 2 Test cohort 3 Total test cohort
AUC difference p value AUC difference p value AUC difference p value AUC difference p value
JSN and pain progression
 LBTRBC-M vs. LBT-RM 0.078 (0.025, 0.131) 0.004 0.063 (0.020, 0.106) 0.004 0.084 (0.036, 0.132) <0.001 0.072 (0.044, 0.100) <0.001
 LBTRBC-M vs. BM 0.092 (0.034, 0.151) 0.002 0.163 (0.100, 0.226) <0.001 0.115 (0.058, 0.172) <0.001 0.122 (0.088, 0.157) <0.001
 LBTRBC-M vs. Clinical model 0.204 (0.132, 0.275) <0.001 0.243 (0.176, 0.309) <0.001 0.227 (0.156, 0.299) <0.001 0.224 (0.183, 0.264) <0.001
 LBTRBC-M vs. BCM 0.070 (0.012, 0.127) 0.018 0.137 (0.078, 0.197) <0.001 0.100 (0.041, 0.158) <0.001 0.100 (0.067, 0.134) <0.001
 LBTRBC-M vs. LBTMBC-M 0.067 (0.014, 0.121) 0.014 0.119 (0.061, 0.176) <0.001 0.109 (0.053, 0.165) <0.001 0.096 (0.064, 0.128) <0.001
JSN progression
 LBTRBC-M vs. LBT-RM 0.120 (0.066, 0.175) <0.001 0.049 (−0.009, 0.107) 0.098 0.108 (0.061, 0.155) <0.001 0.092 (0.061, 0.123) <0.001
 LBTRBC-M vs. BM 0.147 (0.063, 0.231) <0.001 0.139 (0.062, 0.217) <0.001 0.176 (0.080, 0.272) <0.001 0.153 (0.104, 0.202) <0.001
 LBTRBC-M vs. Clinical model 0.193 (0.121, 0.265) <0.001 0.225 (0.143, 0.307) <0.001 0.189 (0.116, 0.263) <0.001 0.201 (0.158, 0.245) <0.001
 LBTRBC-M vs. BCM 0.097 (0.030, 0.163) 0.004 0.120 (0.050, 0.190) <0.001 0.132 (0.052, 0.212) 0.002 0.115 (0.074, 0.157) <0.001
 LBTRBC-M vs. LBTMBC-M 0.111 (0.044, 0.177) <0.001 0.152 (0.078, 0.225) <0.001 0.149 (0.072, 0.227) <0.001 0.136 (0.094, 0.178) <0.001
Pain progression
 LBTRBC-M vs. LBT-RM 0.088 (0.030, 0.146) 0.003 0.072 (0.022, 0.123) 0.005 0.120 (0.059, 0.181) <0.001 0.096 (0.063, 0.129) <0.001
 LBTRBC-M vs. BM 0.128 (0.053, 0.202) <0.001 0.199 (0.122, 0.276) <0.001 0.125 (0.046, 0.204) 0.002 0.149 (0.105, 0.193) <0.001
 LBTRBC-M vs. Clinical model 0.145 (0.081, 0.210) <0.001 0.261 (0.161, 0.362) <0.001 0.190 (0.108, 0.273) <0.001 0.195 (0.148, 0.243) <0.001
 LBTRBC-M vs. BCM 0.085 (0.020, 0.149) 0.010 0.187 (0.104, 0.270) <0.001 0.116 (0.043, 0.189) 0.002 0.126 (0.084, 0.169) <0.001
 LBTRBC-M vs. LBTMBC-M 0.133 (0.058, 0.207) <0.001 0.211 (0.123, 0.299) <0.001 0.143 (0.075, 0.211) <0.001 0.160 (0.116, 0.203) <0.001
Non progression
 LBTRBC-M vs. LBT-RM 0.109 (0.062, 0.155) <0.001 0.094 (0.055, 0.133) <0.001 0.109 (0.066, 0.152) <0.001 0.104 (0.079, 0.128) <0.001
 LBTRBC-M vs. BM 0.159 (0.097, 0.221) <0.001 0.172 (0.112, 0.232) <0.001 0.176 (0.111, 0.241) <0.001 0.169 (0.134, 0.205) <0.001
 LBTRBC-M vs. Clinical model 0.223 (0.155, 0.292) <0.001 0.333 (0.260, 0.405) <0.001 0.334 (0.253, 0.415) <0.001 0.295 (0.252, 0.337) <0.001
 LBTRBC-M vs. BCM 0.133 (0.073, 0.193) <0.001 0.177 (0.116, 0.238) <0.001 0.174 (0.109, 0.239) <0.001 0.161 (0.125, 0.196) <0.001
 LBTRBC-M vs. LBTMBC-M 0.097 (0.039, 0.155) 0.002 0.151 (0.096, 0.206) <0.001 0.151 (0.092, 0.211) <0.001 0.132 (0.099, 0.165) <0.001

Data are AUC difference (95% CI).

The results of test cohort 1, test cohort 2, test cohort 3, and the total test cohort correspond to baseline, 1-year follow-up, 2-year follow-up, and all of the aforementioned time points combined, respectively. ROC: Receiver Operating Characteristic, AUC: Area Under the ROC Curve, JSN: Joint Space Narrowing, CI: Confidence Interval, LBT-RM: Load-Bearing Tissue Radiomic Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, BM: Biochemical biomarker Model, BCM: Biochemical biomarker plus Clinical variable Model, LBTMBC-M: Load-Bearing Tissue MOAKS plus Biochemical biomarker and Clinical variable Model, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

Association of the model output with KOA progression

S9 Table displays the Odds Ratios (ORs) of KOA progression for the predictive model outputs in the entire cohort. Among the predictive models, the output of LBTRBC-M showed the highest ORs of JSN and pain progression. The adjusted OR of JSN and pain progression for the LBTRBC-M output was 30.906 (95% CI [22.470, 42.511]); the adjusted OR of JSN progression was 6.465 (95% CI [4.740, 8.820]); and the adjusted OR of pain progression was 3.307 (95% CI [2.457, 4.452]).

Performance of LBTRBC-M-supported resident physicians: Comparison with unassisted resident physicians

Seven resident physicians predicted KOA progression using knee MRI, biochemical biomarkers, and clinical data (Fig 4A). Based on the predictive performance of individual physicians (S10 and S11 Tables), we found that LBTRBC-M assistance significantly improved the accuracy of resident physicians in predicting KOA progression from 46.9% (95% CI [44.7%, 49.0%]) to 65.4% (95% CI [64.4%, 66.5%]) in the total test cohort (S12 Table). With LBTRBC-M support, the performance of physicians in predicting JSN and pain progression also improved, with sensitivity and specificity increasing to 68.1% (95% CI [66.2%, 70.0%]; p < 0.050) and 80.4% (95% CI [78.9%, 81.8%]; p < 0.050) in the total test cohort, respectively (compared with 57.5% (95% CI [51.7%, 63.2%]) and 51.8% (95% CI [48.8%, 54.9%]) without LBTRBC-M support). Similarly, the sensitivity and specificity of physicians for JSN progression, pain progression, and non-progression improved with LBTRBC-M assistance. These improvements were validated across the test cohorts of different visits (Fig 4B–4I and S12 and S15 Tables).

Fig 4. Performance of resident physicians and models in predicting the KOA progression in the total test cohort.

Fig 4

A schematic workflow of the assessment of KOA progression risk by resident physicians with LBTRBC-M assistance (A). AUCs for predicting JSN and pain progression (B), JSN progression (D), pain progression (F), and non-progression (H) among the LBTRBC-M, LBTMBC-M, and the average performance of all resident physicians without (blue dot) and with (red dot) the support of LBTRBC-M in the total test cohort are shown. The performance of LBTRBC-M was superior to that of LBTMBC-M, which reflects clinical experts given baseline MOAKS, serum biochemical levels, and clinical data; moreover, as shown in B, D, F, and H, both the sensitivity and specificity of resident physicians improved when assistance was provided by LBTRBC-M (black arrow). Individual performance of resident physicians is represented by open shapes (without LBTRBC-M support) and filled shapes (with LBTRBC-M support). The prognostic performance of resident physicians was greatly boosted by integrating LBTRBC-M into the loop, as shown by the dashed connection lines in C, E, G, and I. The results of the total test cohort encompass the baseline, 1-year follow-up, and 2-year follow-up time points. Each colored line represents a different resident physician. AUC: Area Under receiver operating characteristic Curve, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, LBTMBC-M: Load-Bearing Tissue MOAKS plus Biochemical biomarker and Clinical variable Model, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score, AI: Artificial Intelligence.

Sensitivity analyses

To evaluate the robustness of our findings, sensitivity analyses were performed by re-running the GEE model using alternative correlation structures (independent, exchangeable, and unstructured). The results were consistent across these assumptions, confirming that our findings were robust (S13 Table). To assess the impact of missing data, we compared the results of the multiple imputation approach with those of a complete case analysis (excluding cases with missing data) (S14 Table); the results were consistent across both approaches, confirming that the handling of missing data did not materially alter the study's conclusions. To evaluate model robustness, we performed a sensitivity analysis by varying key hyperparameters (S17 Table), such as max_depth, learning_rate, subsample, and λ, and used the DeLong test to assess the effect of each parameter on model performance. This analysis aimed to identify the most influential parameters and quantify uncertainty, improving the model's reliability and our understanding of its behavior in predicting KOA progression.
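The hyperparameter sensitivity analysis above can be sketched as an exhaustive scan over a small grid. This is an illustrative pure-Python sketch: the parameter names follow XGBoost conventions, but the grid values and the toy scorer are our own assumptions, standing in for refitting the model and computing its AUC at each setting:

```python
from itertools import product

# Hypothetical grids for the hyperparameters named above; the values are
# illustrative, not the study's actual search space.
GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "subsample": [0.8, 1.0],
    "reg_lambda": [1.0, 10.0],
}

def sensitivity_scan(grid, evaluate):
    """Evaluate every hyperparameter combination and return (score, params)
    pairs sorted best-first; the spread of scores across each parameter's
    values indicates how sensitive performance is to that setting."""
    names = list(grid)
    results = [(evaluate(dict(zip(names, values))), dict(zip(names, values)))
               for values in product(*(grid[n] for n in names))]
    return sorted(results, key=lambda r: r[0], reverse=True)

# Placeholder scorer standing in for model refitting + AUC computation.
toy_score = lambda p: 0.8 + 0.01 * p["max_depth"] - 0.02 * p["reg_lambda"] / 10.0
ranked = sensitivity_scan(GRID, toy_score)
print(len(ranked), ranked[0][1]["max_depth"])  # 24 combinations; deepest trees rank first here
```

In practice each evaluation would refit the model under cross-validation, and paired AUC differences between settings could then be compared with the DeLong test as described above.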

Testing of interactions

We tested the significance of interaction terms by comparing AUC differences between model combinations (LBTRBC-M versus LBTRB-M, LBTRBC-M versus LBTRC-M, and LBTRB-M versus LBTRC-M) using the DeLong test. The results, presented in S8 Table, showed no significant AUC differences across test cohorts 1, 2, and 3 and the total cohort, indicating that adding the different feature sets (biochemical biomarkers, clinical variables) individually did not significantly improve model performance.

Probability distribution patterns in LBTRBC-M model predictions

In our contour plot analysis of the LBTRBC-M model (S3C Fig), probability distributions varied across progression categories. JSN and pain progression clustered between 5.0% and 99.0%, JSN progression between 15.0%–25.0% and 55.0%–75.0%, pain progression between 2.0% and 25.0%, and non-progression between 50.0% and 99.0%. These patterns suggest distinct probability thresholds for each outcome, highlighting the need for ROC and precision-recall curve analyses to optimize sensitivity and specificity for clinical use.

Stratified cross-validation

To ensure a balanced distribution of outcome types across datasets, we performed a stratified split of the total development and test cohorts for the LBTRBC-M model, maintaining an approximate 2:1:1:2 ratio of JSN and pain progression, JSN progression, pain progression, and non-progression (S16A and S16B Fig). Prior to stratification, the AUCs in the total test cohort were 0.880 (95% CI [0.853, 0.903]) for JSN and pain progression, 0.913 (95% CI [0.881, 0.937]) for JSN progression, 0.886 (95% CI [0.856, 0.910]) for pain progression, and 0.909 (95% CI [0.888, 0.926]) for non-progression. Following stratification, the corresponding AUCs were 0.853 (95% CI [0.826, 0.877]) for JSN and pain progression (ΔAUC = 0.027, 95% CI [−0.003, 0.058]; p = 0.080), 0.860 (95% CI [0.823, 0.891]) for JSN progression (ΔAUC = 0.053, 95% CI [0.016, 0.090]; p = 0.005), 0.878 (95% CI [0.843, 0.906]) for pain progression (ΔAUC = 0.008, 95% CI [−0.031, 0.047]; p = 0.697), and 0.853 (95% CI [0.824, 0.877]) for non-progression (ΔAUC = 0.056, 95% CI [0.030, 0.082]; p < 0.001) (S16C and S16D Fig and S19 Table).
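The stratified split described above can be sketched in pure Python: indices are grouped by outcome class and each class is divided in the same proportion, preserving the approximate 2:1:1:2 outcome ratio in both partitions. This is an illustrative sketch with invented toy labels, not the study's code:

```python
import random
from collections import defaultdict, Counter

def stratified_split(labels, test_fraction=0.5, seed=0):
    """Split sample indices so that each outcome class appears in the test
    set in (approximately) the same proportion as in the full cohort."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                            # randomize within each class
        cut = int(round(len(idx) * test_fraction))  # per-class test quota
        test.extend(idx[:cut])
        train.extend(idx[cut:])
    return sorted(train), sorted(test)

# Toy cohort with the paper's approximate 2:1:1:2 outcome ratio.
labels = ["JSN+pain"] * 40 + ["JSN"] * 20 + ["pain"] * 20 + ["non"] * 40
train, test = stratified_split(labels)
print(Counter(labels[i] for i in test))  # each class contributes half its members
```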

Discussion

In this longitudinal and multicenter study, we followed 594 participants, with 1,753 knee MRIs over a 2-year follow-up. Our final predictive model, LBTRBC-M, used MRI radiomics of load-bearing tissues, biochemical biomarkers, and clinical high-risk factors for KOA to predict JSN and/or pain progression. The model achieved AUCs ranging from 0.869 (95% CI [0.819, 0.907]) to 0.919 (95% CI [0.861, 0.954]), kappa values ranging from 0.551 to 0.628, and accuracy ranging from 68.0% to 74.1% in predicting JSN and/or pain progression in the test cohorts. The output of LBTRBC-M indicated an elevated risk of JSN and pain progression (adjusted ORs ranging from 3.307 to 30.906). LBTRBC-M significantly improved the performance of resident physicians in predicting KOA progression, increasing sensitivity, specificity, and accuracy in the test cohorts. The performance of this predictive model was validated in test cohorts from different visits. LBTRBC-M is a potential non-invasive tool to assist physicians in evaluating whether individuals with KOA may benefit from personalized decision-making.

Recent studies have advanced KOA progression prediction by integrating MRI radiomics, biochemical biomarkers, and clinical risk factors into predictive models. For example, Saarakkala and colleagues [13] utilized deep learning models on MRI data to predict KOA progression, achieving an AUC of 0.78. Similarly, Lespessailles and colleagues [14] demonstrated the predictive utility of trabecular bone texture from radiographs combined with clinical and biochemical data, emphasizing the value of multimodal integration. Meanwhile, Hu and colleagues [35] developed the DeepKOA model, integrating MRI and clinical data but observed limited improvement when adding clinical features.

Previous research from the FNIH OA Biomarkers Consortium identified MRI and biochemical biomarkers associated with KOA progression [12,36–40]. High subchondral bone Trabecular Thickness (Tb.Th) of the medial tibia has been associated with KOA progression [36], and specific bone phenotypes with an increased risk of KOA progression [37]. Baseline MRI measurements of bone shape and area in the femur, tibia, and patella were also identified as risk factors for JSN and pain progression [38]. Loss of medial femorotibial cartilage thickness was strongly associated with JSN and pain progression, with higher ORs for JSN than for pain progression [39], and baseline cartilage volume in the lateral femoral plate predicted medial JSN progression [40]. Additionally, baseline urine levels of CTX-II and CTX-Iα were predictors of JSN and pain progression [12]. In our study, the outputs of LBT-RM, BCM, and LBTRBC-M all increased the odds of KOA progression, including JSN and pain progression, JSN progression, and pain progression, with the output of LBTRBC-M yielding the highest ORs of KOA progression (adjusted ORs ranging from 9.938 to 944.796).

Several models have been developed to predict KOA progression in the FNIH OA Biomarkers Consortium study [8,9,12]. However, radiomic models are lacking, particularly those combining longitudinal load-bearing tissue MRI radiomics with biochemical biomarkers. Integrating different image sets and biochemical biomarkers has shown higher predictive accuracy. For instance, semiquantitative MRI-based cartilage markers alone yielded an AUC of 0.706 for predicting JSN and pain progression, whereas adding meniscal scores raised the AUC to 0.722 [9]. Similarly, combining MRI and radiographic markers resulted in an AUC of 0.718, which increased to 0.722 with the addition of biochemical biomarkers [8]. Urine CTX-II as a sole predictor achieved an AUC of only 0.583, but combining it with other biochemical biomarkers, clinical variables, and radiographic scores raised the AUC to 0.668 [12]. In our study, single-structure MRI radiomic features alone yielded AUCs of 0.507 (95% CI [0.424, 0.589]) to 0.747 (95% CI [0.685, 0.801]) for predicting JSN and/or pain progression. However, by integrating MRI radiomics of load-bearing tissues, the predictive performance of LBT-RM was enhanced compared with the single-structure MRI radiomic models. For predicting JSN and pain progression, incorporating biochemical biomarkers increased the AUC of LBT-RM from 0.808 (95% CI [0.776, 0.836]) to 0.860 (95% CI [0.832, 0.883]) in the total test cohort; this 6.9% increase in AUC was modest and not statistically significant and must therefore be interpreted cautiously. Furthermore, combining biochemical biomarkers with clinical KOA high-risk factors, the BCM showed performance similar to that of LBT-RM in predicting KOA progression. Finally, by integrating MRI radiomics of load-bearing tissues, biochemical biomarkers, and clinical KOA high-risk factors, LBTRBC-M improved the prediction of KOA progression compared with LBT-RM, with AUCs increasing by 5.7%–15.8% and kappa values by 32.5%–55.5% in the test cohorts. In addition, LBTRBC-M also outperformed the other models for predicting KOA progression, including LBT-MOM, BCM, and LBTMBC-M.

In clinical practice, physicians have limited experience in predicting KOA progression. In our research, using MOAKS scoring data assessed by two experienced musculoskeletal radiologists (each with more than 10 years of experience in assessing KOA radiology) [9], LBT-MOM yielded AUCs ranging from 0.643 (95% CI [0.541, 0.733]) to 0.736 (95% CI [0.668, 0.794]) and accuracy ranging from 48.6% to 52.9% in the test cohorts. The accuracy of resident physicians in predicting KOA progression ranged from 44.8% to 48.4% in the test cohorts, similar to the predictive performance of single-structure radiomic models such as FE-RM, TI-RM, and MM-RM. However, with the support of LBTRBC-M, resident physicians showed improved accuracy in predicting KOA progression, increasing from 46.9% to 65.4% in the total test cohort. In addition, their sensitivity and specificity for predicting JSN and pain progression in the total test cohort also improved, with mean sensitivity and specificity increasing from 57.5% and 51.8% to 68.1% and 80.4%, respectively. The LBTRBC-M model thus offers valuable support for resident physicians in predicting KOA progression, improving diagnostic accuracy and aiding clinical decision-making. However, real-world implementation faces key barriers, including high costs for data acquisition and software integration, time constraints for data processing, and the need for clinician training to ensure proper interpretation of results. Additionally, addressing regulatory compliance and ensuring data privacy are essential for successful adoption. Overcoming these challenges through streamlined workflows and institutional support is critical to unlocking the model's full potential in clinical practice.

To mitigate class imbalance, we applied a stratified cohort split in the LBTRBC-M model to ensure proportional representation of each KOA progression subtype. This improved evaluation fairness and revealed that stratification notably reduced AUC inflation in JSN progression and non-progression groups. These results underscore the necessity of stratified validation in multi-class modeling to enhance generalizability and avoid performance overestimation. In this study, we took several measures to mitigate overfitting in the XGBOOST model. First, we used 10-fold cross-validation to robustly evaluate the model’s generalizability. Hyperparameters were optimized via grid search with cross-validation, balancing complexity and performance. L1 and L2 regularization (α and λ) were applied to reduce noise fitting. Simpler models, such as Logistic Regression and Decision Trees, were also tested to confirm the improvements were meaningful. These steps ensured accurate, generalizable predictions for KOA progression.
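The 10-fold cross-validation used above to curb overfitting reduces to index bookkeeping: shuffle once, carve ten disjoint validation folds, and train on the remaining nine folds each time. An illustrative pure-Python sketch (not the study's implementation):

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation:
    each sample appears in exactly one validation fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k near-equal disjoint folds
    for i in range(k):
        valid = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, valid

# The 10 validation folds are disjoint and together cover all n samples.
splits = list(kfold_indices(100))
print(len(splits), len(splits[0][1]))  # 10 folds, 10 validation samples each
```

Averaging the model's AUC over the ten held-out folds gives the generalization estimate reported in S3 Fig.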

Our predictive model represents a step toward tools that could support clinical decision-making for KOA progression, pending further validation in diverse populations and clinical settings. It enables personalized risk assessments, aiding clinicians in early interventions and tailored treatment plans to slow disease progression and reduce invasive procedures, and it supports patient monitoring for timely care adjustments, with the potential to enhance precision medicine and optimize resource allocation in KOA management.

In this study, we identified key sources of uncertainty, including parameter uncertainty, data uncertainty, and model structure uncertainty. Parameter uncertainty stems from the variability in hyperparameters, such as max_depth, learning_rate, and λ, which were optimized through grid search and cross-validation. Data uncertainty arises from potential measurement errors in MRI scans, biomarkers, and clinical data, as well as class imbalances and missing data. Model structure uncertainty relates to assumptions made by the XGBOOST algorithm, which may not capture all complex relationships, particularly between biomarkers and imaging features. While we addressed some of these uncertainties through optimization and validation, others, like data collection variations, remain unquantified. Future studies will focus on quantifying these uncertainties and improving the model’s robustness.

There are a few limitations to the present study. First, the MRI sequence we used (SAG-3D-DESS-WE) is not commonly employed in clinical practice, so future studies should include common clinical MRI sequences such as Turbo Spin Echo (TSE). Second, our results need validation in independent populations; specifically, we plan to test the model on independent datasets from other cohorts or institutions, such as the Multicenter Osteoarthritis Study (MOST) cohort, which would provide a robust assessment of its generalizability across different populations and settings. Third, our study did not incorporate MRI radiomics of all knee joint structures; in future research, we plan to include additional structures such as the patella, ligaments, synovium, and muscles. Fourth, while the CNN automatic segmentation model we used is suitable for small sample sizes, it is not the latest deep learning architecture, and we aim to introduce updated architectures, such as the transformer architecture, to enhance segmentation accuracy. Fifth, the MRI radiomics and machine learning algorithms we used are conducive to model interpretation and understanding by clinicians, but their degree of automation is limited; future efforts will focus on deep learning algorithms to establish predictive models and improve automation. Sixth, the impact of outliers on model performance was not assessed; future studies will conduct robustness checks by excluding or mitigating outliers to evaluate their influence. Seventh, while this study focused on XGBOOST, future work will explore additional methods such as Generalized Additive Models (GAMs) or spline regression to confirm the stability of the findings. Eighth, model stability across demographic subsets (e.g., age, sex) was not analyzed; future studies will test diverse subsets to validate findings across groups. Ninth, we propose incorporating nested cross-validation in future studies to further enhance methodological rigor. Tenth, scenario analysis can test model robustness under demographic shifts or new treatments, and future studies will include it to enhance clinical applicability. Eleventh, our study lacked a temporal analysis of feature dynamics; exploring changes in MRI radiomic features, biomarkers, and clinical variables across time points could improve model interpretability and predictive accuracy, and future studies will incorporate such analysis to better capture disease progression trajectories. Lastly, the generalizability of our findings to other joints requires further validation.

In conclusion, the integrated model using longitudinal MRI radiomics, biochemical biomarkers, and clinical KOA high-risk factors improved the prediction of KOA progression. LBTRBC-M enhances the accuracy of resident physicians in predicting KOA progression, and these findings require further validation in future trials.

Ethical approval

All deidentified patient-level clinical data, outcome data, and MRI data in our study were obtained from the OAI, an ongoing, multicenter, prospective cohort study designed to identify biomarkers for KOA. The Health Insurance Portability and Accountability Act (HIPAA)-compliant protocol of the OAI study received institutional review board (IRB) approval from all participating centers, including the coordinating center at the University of California, San Francisco (IRB number: 10-00532).

Supporting information

S1 Fig

Flow chart of FNIH OA Biomarkers Consortium cohort study inclusion. FNIH OA Biomarkers Consortium cohort: Foundation of the NIH OsteoArthritis Biomarkers Consortium cohort, MR: Magnetic Resonance, JSW: Joint Space Width, WOMAC: Western Ontario and McMaster Universities Arthritis Index, KLG: Kellgren-Lawrence Grade, BMI: Body Mass Index, BL: Baseline.

(TIF)

pmed.1004665.s001.tif (15MB, tif)
S2 Fig

The MRI segmentation scheme in our study. MRI: Magnetic Resonance Image.

(TIF)

pmed.1004665.s002.tif (20.8MB, tif)
S3 Fig

The results of 10-fold cross-validation for predicting knee osteoarthritis progression and the contour plot of predicted label probability under actual labels. The quartiles of AUC are shown in (A), and the quartiles of AUC in each fold in (B). The contour plot of predicted label probability under actual labels in the final LBTRBC-M is shown in (C). The 10-fold cross-validation was repeated for 100 iterations. AUC: Area Under receiver operating characteristic Curve. JSN: Joint Space Narrowing, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model.

(TIF)

pmed.1004665.s003.tif (1.8MB, tif)
S4 Fig

Feature selection process by LASSO regression in single-structure MRI radiomic models. Panels (A)–(F) show the magnitude of scaled parameter estimates for FE-RM, FC-RM, TI-RM, TC-RM, LM-RM, and MM-RM, respectively, indicating the importance of each MRI radiomic feature in predicting KOA progression. Panels (G)–(L) present the scaled parameter estimates of the same models using the corrected Akaike Information Criterion (AICc) for feature selection, providing an additional measure of model performance and fit. Panels (M)–(R) illustrate the feature weights for each model, showing the relative contribution of each selected feature to the overall predictive power of the model. These results highlight the most influential features in each MRI radiomic model for predicting KOA progression. FE-RM: Femur Radiomic Model, FC-RM: Femoral Cartilage Radiomic Model, TI-RM: Tibia Radiomic Model, TC-RM: Tibial Cartilage Radiomic Model, LM-RM: Lateral Meniscal Radiomic Model, MM-RM: Medial Meniscal Radiomic Model, AICc: Akaike Information Criterion, corrected.

(TIF)

pmed.1004665.s004.tif (20.8MB, tif)
S5 Fig

Feature selection process by LASSO regression in the LBT-RM and LBTRBC-M models. Panels (A) and (B) show the magnitude of scaled parameter estimates for the LBT-RM and the LBTRBC-M, respectively. Panels (C) and (D) represent the AICc-based selection of the most important features for both models, offering an alternative approach to assessing model performance and fit. Panels (E) and (F) show the feature weights in LBT-RM and LBTRBC-M, respectively, demonstrating how individual features contribute to the predictive power of each model. These visualizations clarify the key features selected by LASSO regression and their impact on model performance in predicting KOA progression. LBT-RM: Load-Bearing Tissue Radiomic Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical Biomarker and Clinical Variable Model, AICc: Akaike Information Criterion, corrected.

(TIF)

pmed.1004665.s005.tif (17MB, tif)
S6 Fig

The DESS signal feature maps of load-bearing tissues in different groups. DESS signal intensity maps of the femur (A–D), femoral cartilage (E–H), tibia (I–L), tibial cartilage (M–P), lateral meniscus (Q–T), and medial meniscus (U–X) were developed in the four groups. High values for the femur and tibia were detected in the JSN and pain progression group and the pain progression group. High values for the femoral cartilage, tibial cartilage, lateral meniscus, and medial meniscus were detected in the JSN and pain progression group and the JSN progression group. JSN: Joint Space Narrowing, DESS: Double Echo Steady-State.

(TIF)

pmed.1004665.s006.tif (22.1MB, tif)
S7 Fig

The confusion matrix results of single-structure MRI radiomic models in the test cohorts. The confusion matrices of FE-RM (A–D), FC-RM (E–H), TI-RM (I–L), TC-RM (M–P), LM-RM (Q–T), and MM-RM (U–X) in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. FE-RM: Femur Radiomic Model, FC-RM: Femoral Cartilage Radiomic Model, TI-RM: Tibia Radiomic Model, TC-RM: Tibial Cartilage Radiomic Model, LM-RM: Lateral Meniscal Radiomic Model, MM-RM: Medial Meniscal Radiomic Model.

(TIF)

pmed.1004665.s007.tif (26.4MB, tif)
S8 Fig

Comparisons of AUC between FE-RM and FE-MOM in predicting KOA progression. Performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for FE-RM and FE-MOM in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. FE-RM: Femur Radiomic Model, FE-MOM: Femur MOAKS Model, AUC: Area Under the receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

(TIF)

pmed.1004665.s008.tif (18.2MB, tif)
S9 Fig

Comparisons of AUC between FC-RM and FC-MOM in predicting KOA progression. Performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for FC-RM and FC-MOM in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. FC-RM: Femoral Cartilage Radiomic Model, FC-MOM: Femoral Cartilage MOAKS Model, AUC: Area Under the receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

(TIF)

pmed.1004665.s009.tif (18.2MB, tif)
S10 Fig

Comparisons of AUC between TI-RM and TI-MOM in predicting KOA progression. Performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for TI-RM and TI-MOM in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. TI-RM: Tibia Radiomic Model, TI-MOM: Tibia MOAKS Model, AUC: Area Under the receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

(TIF)

pmed.1004665.s010.tif (18.2MB, tif)
S11 Fig

Comparisons of AUC between TC-RM and TC-MOM in predicting KOA progression. Performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for TC-RM and TC-MOM in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. TC-RM: Tibial Cartilage Radiomic Model, TC-MOM: Tibial Cartilage MOAKS Model, AUC: Area Under the receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

(TIF)

pmed.1004665.s011.tif (18.2MB, tif)
S12 Fig

Comparisons of AUC between LM-RM and LM-MOM in predicting KOA progression. Performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for LM-RM and LM-MOM in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. LM-RM: Lateral Meniscal Radiomic Model, LM-MOM: Lateral Meniscal MOAKS Model, AUC: Area Under the receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

(TIF)

pmed.1004665.s012.tif (18.2MB, tif)
S13 Fig

Comparisons of AUC between MM-RM and MM-MOM in predicting KOA progression. Performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for MM-RM and MM-MOM in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. MM-RM: Medial Meniscal Radiomic Model, MM-MOM: Medial Meniscal MOAKS Model, AUC: Area Under the receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

(TIF)

pmed.1004665.s013.tif (18.2MB, tif)
S14 Fig

The confusion matrix results of load-bearing tissue MRI radiomic models in the test cohorts. The confusion matrices of LBT-RM (A–D) and LBTRBC-M (E–H) in test cohorts 1–3 and the total test cohort. Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively; the total test cohort encompasses all three time points. LBT-RM: Load-Bearing Tissue Radiomic Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model.

(TIF)

pmed.1004665.s014.tif (18MB, tif)
S15 Fig

Performance of resident physicians and models in predicting KOA progression in the test cohorts at different time points. AUCs for predicting JSN and pain progression (A–F), JSN progression (G–L), pain progression (M–R), and non-progression (S–X) are shown for the LBTRBC-M, the LBTMBC-M, and the average performance of all resident physicians without (blue dot) and with (red dot) the support of LBTRBC-M in test cohorts 1–3. As shown in (A), (C), (E), (G), (I), (K), (M), (O), (Q), (S), (U), and (W), both the sensitivity and specificity of resident physicians improved with the aid of LBTRBC-M (black arrow) in test cohorts 1–3. As shown in (B), (D), (F), (H), (J), (L), (N), (P), (R), (T), (V), and (X), the individual performance of resident physicians is represented by open shapes (without LBTRBC-M aid) and filled shapes (with LBTRBC-M aid). Test cohorts 1, 2, and 3 correspond to baseline, 1-year follow-up, and 2-year follow-up, respectively. The colored dotted lines in yellow, orange, purple, green, blue, teal, and red represent the change in predictive performance of Liu, Zhao, Cao, J Li, Chen, X Wang, Dang, and M Zhang, respectively. AUC: Area Under the receiver operating characteristic Curve, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, LBTMBC-M: Load-Bearing Tissue MOAKS plus Biochemical biomarker and Clinical variable Model, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score, AI: Artificial Intelligence.

(TIF)

pmed.1004665.s015.tif (18.8MB, tif)
S16 Fig

The predictive performance of the LBTRBC-M model with and without stratified cross-validation in the total test cohort. (A–B) We implemented a stratified cohort split for the LBTRBC-M model to ensure proportional representation of each KOA progression subtype, maintaining an approximate 2:1:1:2 ratio of JSN and pain progression, JSN progression, pain progression, and non-progression. (C–D) The AUC of the LBTRBC-M model with and without stratified cross-validation in the total test cohort. AUC: Area Under the receiver operating characteristic Curve, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, JSN: Joint Space Narrowing, KOA: Knee Osteoarthritis.

(TIF)

pmed.1004665.s016.tif (1.8MB, tif)
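The stratified split described in S16 Fig, which keeps the approximate 2:1:1:2 subtype ratio within every fold, can be illustrated with a minimal pure-Python sketch (hypothetical label names and fold sizes; not the authors' implementation):

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds that each preserve the label
    proportions of the full cohort."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_label):
        idxs = by_label[lab]
        rng.shuffle(idxs)
        # deal each subtype's indices round-robin across the k folds
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds

# Toy cohort with the approximate 2:1:1:2 subtype ratio reported in the study.
labels = ["both"] * 200 + ["jsn"] * 100 + ["pain"] * 100 + ["none"] * 200
folds = stratified_kfold(labels, k=10)
for fold in folds:
    counts = {lab: sum(1 for i in fold if labels[i] == lab)
              for lab in ("both", "jsn", "pain", "none")}
    # each fold keeps the 2:1:1:2 ratio: 20, 10, 10, 20 per fold
    assert counts == {"both": 20, "jsn": 10, "pain": 10, "none": 20}
```

Without this stratification, a random split of an imbalanced cohort can leave a fold with too few samples of a rare subtype, which inflates the variance of per-fold AUC estimates.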
S1 Table

MRI protocol details.

(DOCX)

pmed.1004665.s017.docx (17.8KB, docx)
S2 Table

Baseline characteristics of participants in the development cohort 1 and test cohort 1.

(DOCX)

pmed.1004665.s018.docx (24.6KB, docx)
S3 Table

Baseline biochemical biomarker levels of participants in the development cohort 1 and test cohort 1.

(DOCX)

pmed.1004665.s019.docx (23.2KB, docx)
S4 Table

DSCs for the CNN-based automated segmentation and the manually adjusted segmentation. DSC: Dice Similarity Coefficient, CNN: Convolutional Neural Network.

(DOCX)

pmed.1004665.s020.docx (17KB, docx)
S5 Table

Selected features of predictive models in total development cohort.

(DOCX)

pmed.1004665.s021.docx (58KB, docx)
S6 Table

The areas under ROC curves of predictive models in the test cohorts.

(DOCX)

pmed.1004665.s022.docx (37.7KB, docx)
S7 Table

The accuracy of predictive models in the test cohorts.

(DOCX)

pmed.1004665.s023.docx (22.8KB, docx)
S8 Table

Comparing the areas under two correlated ROC curves between predictive models in the test cohorts.

(DOCX)

pmed.1004665.s024.docx (59KB, docx)
S9 Table

Relative risks of outcomes for predictive model outputs.

(DOCX)

pmed.1004665.s025.docx (23.2KB, docx)
S10 Table

Predictive performance of resident physicians under the assistance of LBTRBC-M.

(DOCX)

pmed.1004665.s026.docx (37.4KB, docx)
S11 Table

The accuracy of resident physicians under the assistance of LBTRBC-M.

(DOCX)

pmed.1004665.s027.docx (22.5KB, docx)
S12 Table

Predictive performance of resident physicians under the assistance of LBTRBC-M in the test cohorts.

(DOCX)

pmed.1004665.s028.docx (24.4KB, docx)
S13 Table

Relative risks of outcomes for LBTRBC-M outputs using different GEE models.

(DOCX)

pmed.1004665.s029.docx (17.3KB, docx)
S14 Table

Comparison of the predictive performance of the LBTRBC-M model with and without multiple imputation (MI) in the total test cohort.

(DOCX)

pmed.1004665.s030.docx (15.1KB, docx)
S15 Table

Comparing predictive performance of the LBTRBC-M model using different algorithms in the total test cohort.

(DOCX)

pmed.1004665.s031.docx (15.6KB, docx)
S16 Table

The parameters of the LBTRBC-M model.

(DOCX)

pmed.1004665.s032.docx (17KB, docx)
S17 Table

The predictive performance of LBTRBC-M using different hyperparameters in the total test cohort.

(DOCX)

pmed.1004665.s033.docx (19.5KB, docx)
S18 Table

Comparison of the predictive performance of the LBTRBC-M model with different numbers of iterations in the total test cohort.

(DOCX)

pmed.1004665.s034.docx (16.4KB, docx)
S19 Table

Comparison of the predictive performance of the LBTRBC-M model with and without stratified cross-validation in the total test cohort.

(DOCX)

pmed.1004665.s035.docx (15.3KB, docx)
S1 Checklist

TRIPOD+AI checklist.

(PDF)

pmed.1004665.s036.pdf (407.3KB, pdf)

Acknowledgments

We would like to acknowledge the dedication and commitment of the OAI study participants. The OAI is a public-private partnership comprising five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the NIH and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories, Novartis Pharmaceuticals Corporation, GlaxoSmithKline, and Pfizer. Private sector funding for the OAI is managed by the Foundation for the NIH. This manuscript was prepared using an OAI public use data set (in addition to data obtained within NIH/NIAMS funded ancillary grants) and does not necessarily reflect the opinions or views of the OAI investigators, the NIH, or the private funding partners. Special thanks go to the subjects who made this study possible, the OAI investigators, and the Foundation of the NIH Osteoarthritis Biomarkers Consortium investigators, staff, and participants. We used ChatGPT for language editing in this study.

Abbreviations

GBD

global burden of disease

JSN

joint space narrowing

KOA

knee osteoarthritis

OAI

osteoarthritis initiative

TBT

trabecular bone texture

TKR

total knee replacement

Data Availability

The data that support the findings of this study are publicly available through the Osteoarthritis Initiative (OAI) repository at https://nda.nih.gov/oai/. De-identified patient-level clinical data, outcome data, and MRI imaging data used in this study can be accessed from this repository. The specific dataset utilized is clearly identifiable upon accessing the repository. Additionally, the source code for the predictive model developed in this study is available at https://github.com/dmlc/xgboost, and a permanently archived version has been deposited in Zenodo at https://doi.org/10.5281/zenodo.15680828.

Funding Statement

This work was supported by the National Key Research & Development Program of China (2023YFE0209700, to C.H.D.; https://service.most.gov.cn/), the National Natural Science Foundation of China (Grant Nos. 82373653 and 82103903, to C.H.D.; 82002344, to T.W.; https://www.nsfc.gov.cn/), Science and Technology Projects in Guangzhou (Grant No. SL2023A04J02586, to C.H.D.; https://kjj.gz.gov.cn/), the Key Development Projects of the Sichuan Provincial Science and Technology Plan (Grant No. 2024YFFK0298, to S.F.L.; https://kjt.sc.gov.cn/), and the Chengdu Medical Research Project (Grant No. 2024487, to S.F.L.; https://cdwjw.chengdu.gov.cn/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hunter DJ, Bierma-Zeinstra S. Osteoarthritis. Lancet. 2019;393(10182):1745–59.
  • 2. Latourte A, Kloppenburg M, Richette P. Emerging pharmaceutical therapies for osteoarthritis. Nat Rev Rheumatol. 2020;16(12):673–88. doi: 10.1038/s41584-020-00518-6
  • 3. GBD 2015 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet. 2016;388(10053):1545–602.
  • 4. Safiri S, Kolahi A-A, Smith E, Hill C, Bettampadi D, Mansournia MA, et al. Global, regional and national burden of osteoarthritis 1990–2017: a systematic analysis of the Global Burden of Disease Study 2017. Ann Rheum Dis. 2020;79(6):819–28. doi: 10.1136/annrheumdis-2019-216515
  • 5. US Food and Drug Administration. Osteoarthritis: structural endpoints for the development of drugs, devices, and biological products for treatment. Guidance for industry. 2018.
  • 6. Weinstein AM, Rome BN, Reichmann WM, Collins JE, Burbine SA, Thornhill TS, et al. Estimating the burden of total knee replacement in the United States. J Bone Joint Surg Am. 2013;95(5):385–92. doi: 10.2106/JBJS.L.00206
  • 7. Martel-Pelletier J. Pathophysiology of osteoarthritis. Osteoarthr Cartil. 2004;12(Suppl A):S31–3.
  • 8. Hunter DJ, Deveza LA, Collins JE, Losina E, Katz JN, Nevitt MC, et al. Multivariable modeling of biomarker data from the phase I Foundation for the National Institutes of Health Osteoarthritis Biomarkers Consortium. Arthritis Care Res (Hoboken). 2022;74(7):1142–53. doi: 10.1002/acr.24557
  • 9. Collins JE, Losina E, Nevitt MC, Roemer FW, Guermazi A, Lynch JA, et al. Semiquantitative imaging biomarkers of knee osteoarthritis progression: data from the Foundation for the National Institutes of Health Osteoarthritis Biomarkers Consortium. Arthritis Rheumatol. 2016;68(10):2422–31. doi: 10.1002/art.39731
  • 10. Kraus VB, Collins JE, Charles HC, Pieper CF, Whitley L, Losina E, et al. Predictive validity of radiographic trabecular bone texture in knee osteoarthritis: the Osteoarthritis Research Society International/Foundation for the National Institutes of Health Osteoarthritis Biomarkers Consortium. Arthritis Rheumatol. 2018;70(1):80–7.
  • 11. Sun Y, Deng C, Zhang Z, Ma X, Zhou F, Liu X. Novel nomogram for predicting the progression of osteoarthritis based on 3D-MRI bone shape: data from the FNIH OA biomarkers consortium. BMC Musculoskelet Disord. 2021;22(1):782. doi: 10.1186/s12891-021-04620-y
  • 12. Kraus VB, Collins JE, Hargrove D, Losina E, Nevitt M, Katz JN, et al. Predictive validity of biochemical biomarkers in knee osteoarthritis: data from the FNIH OA Biomarkers Consortium. Ann Rheum Dis. 2017;76(1):186–95.
  • 13. Panfilov E, Saarakkala S, Nieminen MT, Tiulpin A. Predicting knee osteoarthritis progression from structural MRI using deep learning. IEEE; 2022.
  • 14. Almhdie-Imjabbar A, Toumi H, Lespessailles E. Performance of radiological and biochemical biomarkers in predicting radio-symptomatic knee osteoarthritis progression. Biomedicines. 2024;12(3).
  • 15. Tack A, Ambellan F, Zachow S. Towards novel osteoarthritis biomarkers: multi-criteria evaluation of 46,996 segmented knee MRI data from the osteoarthritis initiative. PLoS One. 2021;16(10):e0258855. doi: 10.1371/journal.pone.0258855
  • 16. Ambellan F, Tack A, Ehlke M, Zachow S. Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: data from the osteoarthritis initiative. Med Image Anal. 2019;52:109–18. doi: 10.1016/j.media.2018.11.009
  • 17. Tack A, Mukhopadhyay A, Zachow S. Knee menisci segmentation using convolutional neural networks: data from the osteoarthritis initiative. Osteoarthr Cartil. 2018;26(5):680–8. doi: 10.1016/j.joca.2018.02.907
  • 18. Hunter DJ, Nevitt M, Losina E, Kraus V. Biomarkers for osteoarthritis: current position and steps towards further validation. Best Pract Res Clin Rheumatol. 2014;28(1):61–71. doi: 10.1016/j.berh.2014.01.007
  • 19. Neumann G, Hunter D, Nevitt M, Chibnik LB, Kwoh K, Chen H, et al. Location specific radiographic joint space width for osteoarthritis progression. Osteoarthr Cartil. 2009;17(6):761–5. doi: 10.1016/j.joca.2008.11.001
  • 20. Kraus VB, Hargrove DE, Hunter DJ, Renner JB, Jordan JM. Establishment of reference intervals for osteoarthritis-related soluble biomarkers: the FNIH/OARSI OA Biomarkers Consortium. Ann Rheum Dis. 2017;76(1):179–85.
  • 21. Kellgren JH, Lawrence JS. Radiological assessment of osteo-arthrosis. Ann Rheum Dis. 1957;16(4):494–502. doi: 10.1136/ard.16.4.494
  • 22. Wirth W, Duryea J, Hellio Le Graverand M-P, John MR, Nevitt M, Buck RJ, et al. Direct comparison of fixed flexion, radiography and MRI in knee osteoarthritis: responsiveness data from the osteoarthritis initiative. Osteoarthr Cartil. 2013;21(1):117–25. doi: 10.1016/j.joca.2012.10.017
  • 23. Peterfy CG, Schneider E, Nevitt M. The osteoarthritis initiative: report on the design rationale for the magnetic resonance imaging protocol for the knee. Osteoarthr Cartil. 2008;16(12):1433–41.
  • 24. Stehling C, Liebl H, Krug R, Lane NE, Nevitt MC, Lynch J, et al. Patellar cartilage: T2 values and morphologic abnormalities at 3.0-T MR imaging in relation to physical activity in asymptomatic subjects from the osteoarthritis initiative. Radiology. 2010;254(2):509–20. doi: 10.1148/radiol.09090596
  • 25. Pan J, Stehling C, Muller-Hocker C, Schwaiger BJ, Lynch J, McCulloch CE, et al. Vastus lateralis/vastus medialis cross-sectional area ratio impacts presence and degree of knee joint abnormalities and cartilage T2 determined with 3T MRI – an analysis from the incidence cohort of the osteoarthritis initiative. Osteoarthr Cartil. 2011;19(1):65–73. doi: 10.1016/j.joca.2010.10.023
  • 26. Li S, Cao P, Li J, Chen T, Luo P, Ruan G, et al. Integrating radiomics and neural networks for knee osteoarthritis incidence prediction. Arthritis Rheumatol. 2024.
  • 27. Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328–38. doi: 10.1148/radiol.2020191145
  • 28. Ashrafinia S. Quantitative nuclear medicine imaging using advanced image reconstruction and radiomics. The Johns Hopkins University; 2019.
  • 29. Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer; 2009.
  • 30. Carrington AM, Manuel DG, Fieguth PW, Ramsay T, Osmani V, Wernly B, et al. Deep ROC analysis and AUC as balanced average accuracy, for improved classifier selection, audit and explanation. IEEE Trans Pattern Anal Mach Intell. 2023;45(1):329–41. doi: 10.1109/TPAMI.2022.3145392
  • 31. Eaton CB, Sayeed M, Ameernaz S, Roberts MB, Maynard JD, Driban JB, et al. Sex differences in the association of skin advanced glycation endproducts with knee osteoarthritis progression. Arthritis Res Ther. 2017;19(1):36. doi: 10.1186/s13075-017-1226-z
  • 32. Vina ER, Ran D, Ashbeck EL, Kwoh CK. Natural history of pain and disability among African–Americans and Whites with or at risk for knee osteoarthritis: a longitudinal study. Osteoarthr Cartil. 2018;26(4):471–9. doi: 10.1016/j.joca.2018.01.020
  • 33. Zhang C, Zhuang Z, Chen X, Li K, Lin T, Pang F, et al. Osteoporosis is associated with varus deformity in postmenopausal women with knee osteoarthritis: a cross-sectional study. BMC Musculoskelet Disord. 2021;22(1):694. doi: 10.1186/s12891-021-04580-3
  • 34. Driban JB, Price LL, Eaton CB, Lu B, Lo GH, Lapane KL, et al. Individuals with incident accelerated knee osteoarthritis have greater pain than those with common knee osteoarthritis progression: data from the Osteoarthritis Initiative. Clin Rheumatol. 2016;35(6):1565–71. doi: 10.1007/s10067-015-3128-2
  • 35. Hu J, Zheng C, Yu Q, Zhong L, Yu K, Chen Y, et al. DeepKOA: a deep-learning model for predicting progression in knee osteoarthritis using multimodal magnetic resonance images from the osteoarthritis initiative. Quant Imaging Med Surg. 2023;13(8):4852–66. doi: 10.21037/qims-22-1251
  • 36. Pishgar F, Guermazi A, Roemer FW, Link TM, Demehri S. Conventional MRI-based subchondral trabecular biomarkers as predictors of knee osteoarthritis progression: data from the Osteoarthritis Initiative. Eur Radiol. 2021;31(6):3564–73. doi: 10.1007/s00330-020-07512-2
  • 37. Roemer FW, Collins JE, Neogi T, Crema MD, Guermazi A. Association of knee OA structural phenotypes to risk for progression: a secondary analysis from the Foundation for National Institutes of Health Osteoarthritis Biomarkers study (FNIH). Osteoarthritis Cartilage. 2020;28(9):1220–8. doi: 10.1016/j.joca.2020.05.008
  • 38. Hunter D, Nevitt M, Lynch J, Kraus VB, Katz JN, Collins JE, et al. Longitudinal validation of periarticular bone area and 3D shape as biomarkers for knee OA progression? Data from the FNIH OA Biomarkers Consortium. Ann Rheum Dis. 2016;75(9):1607–14. doi: 10.1136/annrheumdis-2015-207602
  • 39. Eckstein F, Collins JE, Nevitt MC, Lynch JA, Kraus VB, Katz JN, et al. Brief report: cartilage thickness change as an imaging biomarker of knee osteoarthritis progression: data from the Foundation for the National Institutes of Health Osteoarthritis Biomarkers Consortium. Arthritis Rheumatol. 2015;67(12):3184–9. doi: 10.1002/art.39324
  • 40. Hafezi-Nejad N, Guermazi A, Roemer FW, Hunter DJ, Dam EB, Zikria B, et al. Prediction of medial tibiofemoral compartment joint space loss progression using volumetric cartilage measurements: data from the FNIH OA biomarkers consortium. Eur Radiol. 2017;27(2):464–73.

Decision Letter 0

Alexandra Tosun

Dear Dr Li,

Thank you for submitting your manuscript entitled "Predicting KOA Progression: Integrating Neural network, Longitudinal MRI Radiomics, and Biochemical Biomarkers" for consideration by PLOS Medicine.

Your manuscript has now been evaluated by the PLOS Medicine editorial staff and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Aug 22 2024.

Login to Editorial Manager here: https://www.editorialmanager.com/pmedicine

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Feel free to email me at atosun@plos.org or us at plosmedicine@plos.org if you have any queries relating to your submission.

Kind regards,

Alexandra Tosun, PhD

Associate Editor

PLOS Medicine

Decision Letter 1

Alexandra Tosun

Dear Dr Li,

Many thanks for submitting your manuscript "Predicting KOA Progression: Integrating Neural network, Longitudinal MRI Radiomics, and Biochemical Biomarkers" (PMEDICINE-D-24-02732R1) to PLOS Medicine. The paper has been reviewed by subject experts and a statistician; their comments are included below and can also be accessed here: [LINK]

As you will see, the reviewers find the study interesting, but point out the lack of detail and the need to clarify the methodology. After discussing the paper with the editorial team and an academic editor with relevant expertise, I'm pleased to invite you to revise the paper in response to the reviewers' comments. We plan to send the revised paper to some or all of the original reviewers, and we cannot provide any guarantees at this stage regarding publication.

When you upload your revision, please include a point-by-point response that addresses all of the reviewer and editorial points, indicating the changes made in the manuscript and either an excerpt of the revised text or the location (eg: page and line number) where each change can be found. Please also be sure to check the general editorial comments at the end of this letter and include these in your point-by-point response. When you resubmit your paper, please include a clean version of the paper as the main article file and a version with changes tracked as a marked-up manuscript. It may also be helpful to check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper.

We ask that you submit your revision by Dec 19 2024. However, if this deadline is not feasible, please contact me by email, and we can discuss a suitable alternative.

Don't hesitate to contact me directly with any questions (atosun@plos.org).

Best regards,

Alexandra

Alexandra Tosun, PhD

Associate Editor

PLOS Medicine

atosun@plos.org

-----------------------------------------------------------

Comments from the reviewers:

Reviewer #1: 1. The use of various statistical tests (e.g., unpaired t-test, one-way ANOVA, χ² test, Mann-Whitney test, Kruskal-Wallis test) for comparing different groups is appropriate. However, the rationale behind selecting specific tests for continuous versus categorical variables should be explicitly stated. Clarifying why a parametric or non-parametric test was chosen, based on the data distribution, would enhance the transparency of the methodology.

2. The use of Generalized Estimating Equations (GEE) for analyzing correlated data, such as repeated measures, is appropriate. However, more details should be provided on the correlation structure assumed in the GEE model. Additionally, it would be beneficial to specify how the model handles missing data and whether any sensitivity analyses were conducted to assess the robustness of the findings.

3. The application of LASSO for feature selection is valid, but further details could be provided on how the optimal penalty parameter (lambda) was chosen.
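For illustration of the point (a generic sketch, not the authors' pipeline): in the simplest case of a single standardized predictor, the LASSO solution reduces to soft-thresholding the OLS coefficient, and lambda can then be chosen by minimizing held-out squared error over a grid via k-fold cross-validation:

```python
import random

def soft_threshold(b, lam):
    # LASSO estimate for one standardized predictor: shrink the OLS
    # coefficient toward zero by lam, snapping small values to exactly zero.
    return max(b - lam, 0.0) if b >= 0 else min(b + lam, 0.0)

def cv_choose_lambda(x, y, grid, k=10, seed=0):
    # k-fold cross-validation: for each candidate lambda, accumulate the
    # held-out squared error and return the lambda with the lowest total.
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in grid:
        err = 0.0
        for fold in folds:
            train = [i for i in idx if i not in fold]
            xt, yt = [x[i] for i in train], [y[i] for i in train]
            b_ols = sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)
            beta = soft_threshold(b_ols, lam)
            err += sum((y[i] - beta * x[i]) ** 2 for i in fold)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

In practice "lambda.min" (lowest CV error) or "lambda.1se" (most parsimonious model within one standard error of the minimum) is reported; stating which rule was used would suffice.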

4. 10-fold cross-validation method used to validate model performance is appropriate. However, the manuscript should describe whether stratified cross-validation was applied, particularly when dealing with imbalanced datasets, as this could significantly impact the performance metrics.
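To make the distinction concrete (an illustrative stdlib sketch, not the authors' implementation): stratified fold assignment deals each class's indices round-robin across folds, so every fold preserves the overall class proportions:

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    # Assign sample indices to k folds so that each fold preserves the
    # overall class proportions (up to rounding).
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal class members round-robin
    return folds
```

With an approximate 2:1:1:2 outcome ratio, as in this study, each fold would then mirror that ratio instead of varying by chance.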

5. Using LASSO regression and cross-validation for both feature selection and model validation carries a risk of overfitting, especially if the same data is used for both processes. The authors should clarify how they mitigated feature selection bias, potentially through techniques like nested cross-validation or using a separate validation set.

6. The manuscript mentions multiple tests, but it lacks a clear indication of how multiple comparisons were managed to control for Type I error. If multiple comparisons were indeed made, methods such as Bonferroni correction or false discovery rate (FDR) control should be discussed to ensure the robustness of the findings.
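As a sketch of the two corrections mentioned (illustrative only, with hypothetical p-values):

```python
def bonferroni(pvals, alpha=0.05):
    # Reject H_i when p_i <= alpha / m (controls the family-wise error rate).
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, q=0.05):
    # Benjamini-Hochberg step-up procedure (controls the false discovery
    # rate): sort the p-values, find the largest rank i with
    # p_(i) <= i*q/m, and reject every hypothesis at or below that cutoff.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0.0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            cutoff = pvals[i]
    return [p <= cutoff for p in pvals]
```

Bonferroni is the more conservative of the two; FDR control typically retains more discoveries while bounding the expected proportion of false positives, which is why the choice should be stated and justified.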

7. The ROC analysis and DeLong test are standard methods for comparing the performance of predictive models. The authors should ensure that the interpretation of AUC, sensitivity, specificity, and kappa values is clear and tied to the clinical relevance of the findings. Including confidence intervals for the AUC values would also provide a sense of the precision of these estimates.

8. It is essential to check and report whether the assumptions underlying each statistical model were verified. For example, for LASSO regression, assumptions related to the distribution of errors and the linearity of relationships should be considered. Similarly, for GEE models, the choice of correlation structure (e.g., exchangeable, autoregressive) should be justified based on the data characteristics.

9. While cross-validation was employed, it would be beneficial to discuss any plans or considerations for external validation. If external datasets are available, they should be used to validate the model's performance, as external validation is crucial for assessing the generalizability of predictive models.

10. While the manuscript briefly mentions the handling of missing data, a more detailed discussion is warranted. The authors should specify how missing data were handled (e.g., imputation methods, exclusion) and whether any sensitivity analyses were performed to assess the impact of missing data on the study's findings.

11. The methods section should discuss how potential confounders were addressed in the statistical models. For instance, were any variables included to adjust for confounding? If so, how were these variables selected?

12. The manuscript does not mention whether interaction terms were considered in the models. Given the complexity of the relationships between radiomic features, biomarkers, and clinical variables, it might be worthwhile to explore interaction terms to capture any synergistic effects. If interactions were tested but found not to be significant, this should be clearly stated.

13. If there are subgroups within the data (e.g., based on demographic variables like age or sex), the authors might consider stratified analysis or clustering techniques to explore whether the model performs differently across these subgroups.

14. While LASSO and XGBOOST can handle some non-linearity, it might be worth exploring whether more explicitly non-linear modeling approaches (e.g., spline regression, generalized additive models) could provide better fits for certain variables, especially those that exhibit complex relationships with the outcome.

15. Given that XGBOOST and LASSO involve the selection of hyperparameters, the manuscript should discuss how sensitive the results are to different hyperparameter choices. Grid search or Bayesian optimization methods for hyperparameter tuning could be mentioned to ensure that the model is optimally configured.

16. While XGBOOST is a powerful algorithm, the authors might consider discussing why this specific method was chosen over other machine learning techniques (e.g., Random Forest, Support Vector Machines, Neural Networks). A brief comparison or justification could strengthen the manuscript's methodological rigor.

17. While the manuscript discusses techniques like cross-validation, it would be prudent to further elaborate on how the risk of overfitting was mitigated, particularly given the complexity of models like XGBOOST. The discussion could include the importance of balancing model complexity with simplicity in design.

18. Authors could include robustness checks to test the stability of their findings under different assumptions or subsets of data. For example, re-running analyses after excluding outliers, using different modeling techniques, or testing the model on various subsets of data could provide additional confidence in the results.

19. Given the complexity of the models, the authors should discuss how uncertainty in the model predictions is quantified and reported. This could include confidence intervals, prediction intervals, or Bayesian approaches that provide a probabilistic interpretation of model outputs.

20. The manuscript could benefit from a discussion on scenario analysis, where the model's predictions are tested under different hypothetical scenarios. This approach could help in understanding how changes in key variables (e.g., a sudden change in patient demographics or the introduction of a new treatment) might impact predictions.

21. While the manuscript compares the proposed model with other models, a more detailed comparative analysis with traditional statistical methods or simpler machine learning models (e.g., logistic regression, decision trees) could provide insights into the added value of the more complex modeling approaches used.

22. The manuscript discusses sensitivity and specificity, but it would be valuable to delve deeper into the trade-offs between these metrics. For instance, depending on the clinical application, different thresholds might be appropriate, and ROC curves or precision-recall curves could be used to explore these trade-offs in more detail.
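The trade-off can be made explicit by sweeping the decision threshold; an illustrative sketch (hypothetical scores, not the study's data) is:

```python
def confusion_at_threshold(scores, labels, t):
    # Classify score >= t as "progressor" and compute sensitivity/specificity.
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def roc_points(scores, labels):
    # One (sensitivity, specificity) pair per candidate threshold;
    # lowering the threshold raises sensitivity at the cost of specificity.
    thresholds = sorted(set(scores), reverse=True)
    return [confusion_at_threshold(scores, labels, t) for t in thresholds]
```

Reporting the operating point chosen for clinical use, rather than AUC alone, would let readers judge whether the model favors ruling in or ruling out progression.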

23. The statistical analysis should be carefully interpreted within the context of clinical significance, not just statistical significance. The authors should avoid over-reliance on p-values and instead emphasize effect sizes and confidence intervals.

Additional comments:

24. The statistical analyses were performed using R (version 4.1.1) and SAS (version 9.4). The authors should ensure that all relevant packages (e.g., pROC in R) and their versions are cited. This is crucial for ensuring the reproducibility of the analyses.

25. It is commendable that the authors have made their data and source code publicly available. However, they should also ensure that all the scripts used for the analysis are well-documented and that a clear workflow is provided. This would facilitate others in reproducing the results.

26. Ensure that all figures and their legends in the supplementary materials are self-explanatory. For example, Figure S4 and Figure S5 detail the feature selection process using LASSO regression, but the legends could benefit from a more detailed explanation of the significance of the selected features and their impact on model performance. Adding more context would help readers who may not be intimately familiar with LASSO or radiomic features.

27. Supplementary tables, such as Table S2 and Table S3, contain extensive data on baseline characteristics and biomarker levels. Each table should have clear annotations explaining what each biomarker or clinical variable represents, especially for an interdisciplinary audience. For instance, terms like "sCOMP" or "sHA" should be briefly described or referenced to where a description can be found.

28. Including a workflow diagram in the supplementary materials could aid in understanding complex processes. For example, a diagram showing the steps from raw MRI data through to the final model could be beneficial, especially in understanding the MRI segmentation scheme outlined in Figure S2.

Reviewer #2: Osteoarthritis (OA) is the most frequent musculoskeletal disease, and the knee is its most frequently involved symptomatic site.

In this paper, the authors aimed to provide a new model for predicting KOA progression using clinical, biochemical, and imaging parameters, based on the FNIH biomarker consortium.

This paper is relevant and contributes to the field; however, there are some issues that must be addressed before the paper can be accepted.

Major issues:

1. It is quite unusual to see modeling processes with 25,000 repetitions for cross-validation. Typically, standard practices involve around 100 iterations, with 300 iterations being a common upper limit unless the data complexity demands more. This excessive number of iterations might lead to diminishing returns and could indicate overfitting if not carefully monitored. It therefore would be beneficial for the authors to clarify the rationale behind such a high number of repetitions. They should explain how this approach improves predictive performance and whether they observed any significant differences in outcomes compared to more conventional methods. This would help strengthen their methodology section and provide a clearer justification for their choices.

2. The extensive iterations used in both the cross-validation and feature selection processes lend credibility to the predictive modeling efforts in the study. However, it is crucial to address the computational feasibility and interpret the potential implications of such high iteration counts. Overall, these aspects contribute to the manuscript's strength and could be emphasized further to showcase the rigor of the research methodology.

3. Clarification on Feature Selection Process: The methodology mentions using LASSO for feature selection before applying the XGBOOST algorithm. Providing more details on the feature selection process, including criteria for inclusion or exclusion of features, would improve transparency and reproducibility.

4. Temporal Analysis and Model Interpretation: Given the longitudinal nature of the data, discussing how changes over time influence the predictive model could be beneficial. Analyzing the temporal dynamics of specific features and their relationship with KOA progression would provide deeper insights into the disease's progression and the model's interpretability.

5. Consideration of Clinical Relevance: Discussing the clinical implications of their predictive model, including how it could be applied in real-world settings and its potential impact on patient management, would add significant value to the manuscript.

6. A more comprehensive review of the literature, citing prominent groups and contextualizing this study among both classic and recent research, would enhance the scientific rigor and relevance of the manuscript.

The lack of citations from influential groups such as Saarakkala or Lespessailles raises concerns about comprehensiveness in the literature review.

Minor issues:

1. P7, second paragraph: "The main analysis focused on comparing knees with both JSN and pain progression to all other knees. (16)". I wonder if ref (16) is appropriate, because it is an old reference (2014) and a position paper rather than an original article.

2. P8, again: ref (18) reported the effects of sprifermin in a clinical trial and is thus not appropriate.

3. P9, ref (19) is the main reference describing the KL method, but the present study does not explain how, and by whom, the radiographs were analysed.

4. P18, BCM, this has not been defined previously.

5. P19, in fact, a recent paper has investigated the combination of biomechanical and radiological biomarkers to predict KOA progression in the FNIH dataset (https://doi.org/10.3390/biomedicines12030666 Performance of Radiological and Biochemical Biomarkers in Predicting Radio-Symptomatic Knee Osteoarthritis Progression- Biomedicines-2024).

6. P19, the authors wrote "Similarly, combining MRI and radiographic markers resulted in an AUC of 0.718, which increased to 0.722 with the addition of biochemical biomarkers.(8)". However, moving from 0.718 to 0.722, was this numerically higher AUC statistically and clinically significant?

Reviewer #3: In this paper, the authors propose a diagnostic and prognostic model for knee osteoarthritis progression. Using data from the Osteoarthritis Initiative (OAI) and advanced modeling techniques like XGBoost, convolutional neural networks, and longitudinal MRI radiomics, the study demonstrates the potential to improve prediction accuracy for joint space narrowing and pain progression. While this work is innovative and aligns with the journal's scope, significant revisions are necessary before publication.

-The introduction lacks a clear articulation of the study's specific goals and the gaps it aims to address.

-While the background is comprehensive, it could be condensed to focus more on relevant prior studies and their shortcomings.

-Please provide more details on the statistical and machine learning methodologies, such as the rationale behind using LASSO logistic regression for feature selection and its effect on model interpretability.

-Please, explain why a 1:1 split between the development and test cohorts was used, and discuss how this split ensures robust external validation. How does this approach to cohort splitting compare to alternative validation strategies, such as k-fold cross-validation, in terms of generalizability?

-While the results are robust and well-documented, the authors should clarify what defines a "successful" prediction (e.g., an accuracy threshold or clinical utility).

- In the Discussion Section please include a more explicit comparison with recent literature, highlighting where this study excels or falls short.

- While the discussion mentions the potential utility of the model for resident physicians, more emphasis is needed on real-world implementation barriers (e.g., cost, time, and training).

- In some sections the English language needs to be polished to improve fluency and clarity

Any attachments provided with reviews can be seen via the following link: [LINK]

--------------------------------------------------------- ---

General editorial requests:

(Note: not all will apply to your paper, but please check each item carefully)

* We ask every co-author listed on the manuscript to fill in a contributing author statement, making sure to declare all competing interests. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. If new competing interests are declared later in the revision process, this may also hold up the submission. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. Please do not add or remove authors without first discussing this with the handling editor. You can see our competing interests policy here: http://journals.plos.org/plosmedicine/s/competing-interests.

* Please upload any figures associated with your paper as individual TIF or EPS files with 300dpi resolution at resubmission; please read our figure guidelines for more information on our requirements: http://journals.plos.org/plosmedicine/s/figures. While revising your submission, please upload your figure files to the PACE digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at PLOSMedicine@plos.org.

* Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information (web or email address) for obtaining the data. Please note that a study author cannot be the contact person for the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

* We expect all researchers with submissions to PLOS in which author-generated code underpins the findings in the manuscript to make all author-generated code available without restrictions upon publication of the work. In cases where code is central to the manuscript, we may require the code to be made available as a condition of publication. Authors are responsible for ensuring that the code is reusable and well documented. Please make any custom code available, either as part of your data deposition or as a supplementary file. Please add a sentence to your data availability statement regarding any code used in the study, e.g. "The code used in the analysis is available from Github [URL] and archived in Zenodo [DOI link]" Please review our guidelines at https://journals.plos.org/plosmedicine/s/materials-software-and-code-sharing and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. Because Github depositions can be readily changed or deleted, we encourage you to make a permanent DOI'd copy (e.g. in Zenodo) and provide the URL.

* FINANCIAL DISCLOSURES: The funding statement should include: specific grant numbers, initials of authors who received each award, URLs to sponsors’ websites. Also, please state whether any sponsors or funders (other than the named authors) played any role in study design, data collection and analysis, the decision to publish, or preparation of the manuscript. If they had no role in the research, include this sentence: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

* DATA AVAILABILITY: The Data Availability Statement (DAS) requires revision. For each data source used in your study:

a) If the data are freely or publicly available, note this and state the location of the data: within the paper, in Supporting Information files, or in a public repository (include the DOI or accession number).

b) If the data are owned by a third party but freely available upon request, please note this and state the owner of the data set and contact information for data requests (web or email address). Note that a study author cannot be the contact person for the data.

c) If the data are not freely available, please describe briefly the ethical, legal, or contractual restriction that prevents you from sharing it. Please also include an appropriate contact (web or email address) for inquiries (again, this cannot be a study author).

* ETHICS STATEMENT: Please provide the name(s) of the institutional review board(s) that provided ethical approval. Please specify whether informed consent was written or oral.

FORMATTING - GENERAL

* Abstract: Please structure your abstract using the PLOS Medicine headings (Background, Methods and Findings, Conclusions). Please combine the Methods and Findings sections into one section.

* At this stage, we ask that you include a short, non-technical Author Summary of your research to make findings accessible to a wide audience that includes both scientists and non-scientists. The Author Summary should immediately follow the Abstract in your revised manuscript. This text is subject to editorial change and should be distinct from the scientific abstract. Ideally each sub-heading should contain 2-3 single sentence, concise bullet points containing the most salient points from your study. In the final bullet point of 'What Do These Findings Mean?', please include the main limitations of the study in non-technical language. Please see our author guidelines for more information: https://journals.plos.org/plosmedicine/s/revising-your-manuscript#loc-author-summary.

* Please express the main results with 95% CIs as well as p values. When reporting p values please report as p<0.001 and where higher as the exact p value p=0.002, for example. Throughout, suggest reporting statistical information as follows to improve clarity for the reader "22% (95% CI [13%,28%]; p</=)". Please be sure to define all numerical values at first use.

* Please include page numbers and line numbers in the manuscript file. Use continuous line numbers (do not restart the numbering on each page).

* Please cite the reference numbers in square brackets. Citations should precede punctuation.

FIGURES AND TABLES

* Please provide titles and legends for all figures and tables (including those in Supporting Information files).

* Please define all abbreviations used in each figure/table (including those in Supporting Information files).

* Please consider avoiding the use of red and green in order to make your figure more accessible to those with color blindness.

SUPPLEMENTARY MATERIAL

* Please note that supplementary material will be posted as supplied by the authors. Therefore, please amend it according to the relevant comments outlined here.

* Please cite your Supporting Information as outlined here: https://journals.plos.org/plosmedicine/s/supporting-information

REFERENCES

* PLOS uses the numbered citation (citation-sequence) method and first six authors, et al.

* Please ensure that journal name abbreviations match those found in the National Center for Biotechnology Information (NCBI) databases (http://www.ncbi.nlm.nih.gov/nlmcatalog/journals), and are appropriately formatted and capitalised.

* Where website addresses are cited, please include the complete URL and specify the date of access (e.g. [accessed: 12/06/2024]).

* Please also see https://journals.plos.org/plosmedicine/s/submission-guidelines#loc-references for further details on reference formatting.

STUDY TYPE-SPECIFIC REQUESTS

The following list is derived from Geoffrey P Garnett, Simon Cousens, Timothy B Hallett, Richard Steketee, Neff Walker. Mathematical models in the evaluation of health programmes. (2011) Lancet DOI:10.1016/S0140-6736(10)61505-X:

* If pertinent, please provide a diagram that shows the model structure, including how the natural history of the disease is represented, the process and determinants of disease acquisition, and how the putative intervention could affect the system.

* Please provide a complete list of model parameters, including clear and precise descriptions of the meaning of each parameter, together with the values or ranges for each, with justification or the primary source cited and important caveats about the use of these values noted.

* Please provide a clear statement about how the model was fitted to the data, including goodness-of-fit measure, the numerical algorithm used, which parameter varied, constraints imposed on parameter values, and starting conditions.

* For uncertainty analyses, please state the sources of uncertainties quantified and not quantified [can include parameter, data, and model structure].

* Please provide sensitivity analyses to identify which parameter values are most important in the model. Uncertainty estimates seek to derive a range of credible results on the basis of an exploration of the range of reasonable parameter values. The choice of method should be presented and justified.

* Please discuss the scientific rationale for the choice of model structure and identify points where this choice could influence conclusions drawn. Please also describe the strength of the scientific basis underlying the key model assumptions.

* For studies that develop a prediction model or evaluate its performance, please ensure that the study is reported according to the TRIPOD statement (https://www.equator-network.org/reporting-guidelines/tripod-statement) and include the completed checklist as Supporting Information. Please add the following statement, or similar, to the Methods: "This study is reported as per the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) statement (S1 Checklist)." For studies using machine learning, please use the TRIPOD-AI checklist. When completing the checklist, please use section and paragraph numbers, rather than page numbers.

Decision Letter 2

Alexandra Tosun

Dear Dr Li,

Many thanks for re-submitting your manuscript "Predicting KOA Progression: Integrating Neural network, Longitudinal MRI Radiomics, and Biochemical Biomarkers" (PMEDICINE-D-24-02732R2) to PLOS Medicine. The paper has been seen again by one subject expert and the statistician; their comments are included below and can also be accessed here: [LINK]

Thank you for your response to the reviewers' comments. As you can see, while the statistical reviewer is satisfied with your responses to their comments, the subject-matter reviewer still has concerns about your study. Please note that any changes you make in response to the reviewer and/or editorial comments must be within the text and provide full clarity to the reviewers for us to consider the manuscript further. It's not sufficient (in all cases) to address the reviewer comments by adding them as a limitation. After discussing the paper with the editorial team, we ask you to carefully and robustly address the comments in a further revision. We plan to send the revised paper to some or all of the original reviewers.

When you upload your revision, please include a point-by-point response that addresses all of the reviewer and editorial points, indicating the changes made in the manuscript and either an excerpt of the revised text or the location (eg: page and line number) where each change can be found. Please also be sure to check the general editorial comments at the end of this letter and include these in your point-by-point response. When you resubmit your paper, please include a clean version of the paper as the main article file and a version with changes tracked as a marked-up manuscript. It may also be helpful to check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper.

We ask that you submit your revision by Mar 07 2025. However, if this deadline is not feasible, please contact me by email, and we can discuss a suitable alternative.

Don't hesitate to contact me directly with any questions (atosun@plos.org).

Best regards,

Alexandra

Alexandra Tosun, PhD

Associate Editor

PLOS Medicine

atosun@plos.org

-----------------------------------------------------------

Comments from the reviewers:

Reviewer #1: Thank you for thoroughly addressing all my comments and providing detailed clarifications in your revised manuscript. I appreciate the effort you have put into enhancing the rigor and transparency of your analyses.

The manuscript now reflects a robust and valuable contribution to subject, I am happy to recommend this work for publication and look forward to seeing its positive impact in the field.

Reviewer #2: R1- We thank the authors for clarifying their rationale for performing 25,000 repetitions of cross-validation. While ensuring stability in performance estimates is important, the justification for such an unusually high number remains unclear, especially since they acknowledge that predictive accuracy plateaued before reaching this level. Beyond that point, the additional computational burden becomes difficult to justify. A quantitative analysis demonstrating how variance decreases across iterations would strengthen their argument. Their plan to explore fewer iterations in future studies is welcome, but a more systematic justification of their current approach is essential to enhance the study's methodological transparency and rigor. We therefore request that the authors include this justification in the current manuscript.

R2- Once again, we thank the authors for recognizing the significance of clarifying the implications of their high number of iterations, especially in relation to computational feasibility. Although they invoke robustness to support their decision, further explanation is required to properly address the methodological trade-offs and feasibility.

The computational feasibility is still unclear. Cross-validation with 25,000 iterations can be computationally demanding, particularly when dealing with high-dimensional data. To show that their method is practical and scalable, it would be helpful if the authors provided information on hardware specifications, runtime, and resource requirements.

The authors also do not provide quantitative evidence to support the need for 25,000 iterations, even though they claim that fewer iterations (such as 100-300) resulted in higher variability in performance indicators. A statistical analysis (such as a plot showing how variability changes with the number of iterations) would show clearly where performance metrics stabilize and whether this high number of iterations is justified.
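One way such an analysis could be presented (an illustrative sketch using simulated estimates, not the authors' actual CV results): track the running standard error of the mean across repetitions, which shrinks roughly as 1/sqrt(R), and report where it first falls below a chosen tolerance:

```python
import statistics

def repetitions_to_stabilize(estimates, tol, burn_in=30):
    # Running Monte-Carlo standard error of the mean across repeated CV
    # estimates (SEM ~ sd / sqrt(R)); return the smallest repetition count
    # at which it drops below tol. A short burn-in avoids unstable
    # standard-deviation estimates from the first few repetitions.
    for r in range(burn_in, len(estimates) + 1):
        sem = statistics.stdev(estimates[:r]) / r ** 0.5
        if sem < tol:
            return r
    return None
```

If, for example, the SEM of the AUC stabilizes below 0.001 after a few hundred repetitions, 25,000 repetitions would be demonstrably unnecessary; the plot the authors are asked to provide would settle this directly.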

R3- We appreciate the authors' detailed clarification of the LASSO feature selection process, which greatly enhances the clarity of their methodology. As an optional improvement, we suggest including a comparison between the number of features initially considered and those retained after LASSO selection.

R4- Authors response acknowledges the limitation regarding the lack of temporal analysis of feature dynamics over time. Addressing it as a limitation in their discussion section is sufficient.

R5- We appreciate the authors' detailed response concerning the clinical relevance of their predictive model.

R6- Although the authors have cited 3 recent papers from 3 different research teams, the conclusions drawn about their results are not entirely fair and balanced. Specifically, they state, 'Compared to these studies, our model (LBTRBC-M) achieved higher AUC values (0.869-0.920) by integrating load-bearing tissue MRI radiomics, biochemical biomarkers, and clinical variables. This suggests a more comprehensive risk assessment and superior predictive accuracy.' However, the results from their study and those from the other cited studies cannot be directly compared, as the datasets and methodologies used in those studies differ. Comparing AUC values is only relevant in the context of 'challenge' studies, where different teams use the same dataset and aim for the same goal, as seen in the KOA progression prediction challenge (KNOAP2020, Ref.1). Therefore, the sentence from Line 526 to Line 529 should be deleted.

Ref.1: Hirvasniemi et al., The KNee OsteoArthritis Prediction (KNOAP2020) challenge: An image analysis challenge to predict incident symptomatic radiographic knee osteoarthritis from MRI and X-ray images, Osteoarthritis Cartilage (2023), 31(1):115-125. doi:10.1016/j.joca.2022.10.001

Furthermore, I have several additional concerns that the authors should address before I can give positive consideration for publication:

Major concerns:

1) Although the authors indicate that both Tukey's honestly significant difference test and the Bonferroni correction were applied to control for type I error due to multiple comparisons, they should report the nominal P value now considered statistically significant in their work. Thus, the P value indicated at line 375 in the statistical analysis section is probably wrong, given the numerous comparisons made in the present study.

2) While DeLong's test is widely recognized as a non-parametric method for comparing the AUCs of different models, it does not fully account for the shared variance between nested models, as in this study. This limitation can lead to biased or overly conservative results (Demler et al., 2012; Ref. 1). To address this issue, the authors should consider using alternative approaches, such as bootstrap resampling or permutation-based testing.

Ref.1: Demler et al., Misuse of DeLong test to compare AUCs for nested models. Statistics in medicine, 31.23 (2012): 2577-2587
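The bootstrap alternative the reviewer suggests could be sketched as follows. This is a minimal illustration (not the authors' implementation), assuming paired predicted probabilities `p_base` and `p_full` from the nested and full models on the same test subjects; resampling subjects with replacement keeps the two AUCs paired, so their shared variance is preserved:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_difference(y_true, p_base, p_full, n_boot=2000, seed=0):
    """Percentile bootstrap for the AUC difference between a full model
    and a nested (base) model evaluated on the same subjects."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    p_base = np.asarray(p_base)
    p_full = np.asarray(p_full)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, n)  # resample subjects with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample needs both classes for AUC to be defined
        diffs.append(roc_auc_score(y_true[idx], p_full[idx])
                     - roc_auc_score(y_true[idx], p_base[idx]))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return float(np.mean(diffs)), (float(lo), float(hi))
```

A 95% CI for the difference that excludes 0 would support the added predictive value of the extra predictors in the full model.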

3) Lines 563-566: Typically, the clinical relevance of results is discussed when a statistically significant difference is observed. However, if the effect size is limited, the clinical relevance may be considered poor. In this study, the authors take a different approach, suggesting that a non-statistically significant result may still hold clinical relevance. I find this reasoning inappropriate, as statistical significance is a fundamental criterion for establishing meaningful clinical interpretation. Moreover, if the authors wish to argue for clinical relevance despite the absence of statistical significance, they should provide additional supporting evidence through other clinical metrics or relevant justifications, such as positive and negative predictive values (PPV and NPV).

4) Lines 327-329: The authors chose a 25% threshold for displaying results in red font based on the output of the LBTRBC-M model. However, this threshold seems somewhat arbitrary without further justification. It would be beneficial for the authors to provide a rationale for selecting 25% as the cut-off point, as this decision could impact the interpretation of the results.

5) Line 599: The authors did not implement stratified cross-validation, which significantly affects the validity of their comparisons to other studies. Without stratification, the distribution of cases and controls across folds may be highly imbalanced, leading to misleading model performance metrics. In particular, folds with a much lower case/control ratio can artificially inflate the AUC, as the model may achieve high accuracy primarily by correctly predicting the majority class (controls) rather than effectively distinguishing progressors. This can create an overestimation of the model's discriminatory power and reduce the reliability of the results. To ensure a fair and meaningful comparison with other studies and to obtain robust performance estimates, the authors should implement stratified cross-validation before final acceptance. This will help maintain a consistent class distribution across folds, preventing potential bias in model evaluation. Acknowledging the lack of stratification in the cross-validation process, as the authors do in their revised manuscript, is not sufficient for a study of this kind.
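As a minimal sketch of the stratification the reviewer asks for (with hypothetical labels, not the study's data), scikit-learn's `StratifiedKFold` preserves the paper's approximate 2:1:1:2 outcome ratio in every held-out fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels in the paper's approximate 2:1:1:2 outcome ratio
# (0 = JSN + pain progression, 1 = JSN only, 2 = pain only, 3 = non-progression).
y = np.repeat([0, 1, 2, 3], [200, 100, 100, 200])
X = np.random.default_rng(0).normal(size=(len(y), 5))  # placeholder features

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Each held-out fold keeps the overall class ratio, so no fold's AUC
    # is inflated by an over-represented majority class.
    fold_counts.append(np.bincount(y[test_idx], minlength=4))
```

Because every class count here is divisible by the number of splits, each fold contains exactly the same class distribution as the full cohort.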

Minor concerns:

1) Throughout the paper, the authors should consistently specify that they are using "MRI radiomics". When 'radiomics' is mentioned without this clarification, readers may assume that X-ray radiomics was also studied, which is not the case.

2) The authors do not report the 95% confidence interval (CI) values for the AUC throughout the paper. Reporting the 95% CI is essential for assessing the precision and reliability of the AUC estimates. Without these intervals, it is difficult to evaluate the statistical significance and the robustness of the model's performance. I strongly recommend including the 95% CI for AUC to provide a more comprehensive interpretation of the results.
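A percentile-bootstrap 95% CI for a single AUC, as requested here, could be computed along these lines (an illustrative sketch under stated assumptions, not the authors' code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """AUC point estimate with a percentile-bootstrap (1 - alpha) CI."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    point = roc_auc_score(y_true, y_score)
    n = len(y_true)
    boots = []
    while len(boots) < n_boot:
        idx = rng.integers(0, n, n)  # resample subjects with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample must contain both classes
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(point), (float(lo), float(hi))
```

Reporting the interval alongside the point estimate, e.g., "AUC 0.88 (95% CI [lo, hi])", lets readers judge the precision of each model's discrimination.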

3) Line 154: The period of KOA progression should be specified upon its first mention. For example, it should be stated as 'KOA progression over 4 years' or 'KOA progression within the subsequent 2 years'.

4) Line 171: It would be better to specify that the 194 cases refer to those with both JSN and pain progression.

5) Line 217: It would be necessary to specify which serum and/or urine biochemical markers were included, or at least provide a reference. The authors mention the number of biochemical markers on line 322 but do not specify which markers were studied, and this information is not provided until four pages later.

6) Lines 410, 508, 510 and 559: It is important to specify what type of progression this value refers to.

Any attachments provided with reviews can be seen via the following link: [LINK]

Decision Letter 3

Alexandra Tosun

Dear Dr. Li,

Thank you very much for re-submitting your manuscript "Predicting KOA Progression: Integrating Neural network, Longitudinal MRI Radiomics, and Biochemical Biomarkers" (PMEDICINE-D-24-02732R3) for review by PLOS Medicine.

Thank you for your detailed response to the reviewers' and editors’ comments. I have discussed the paper with my colleagues, and it has also been seen again by two of the original reviewers. The changes made to the paper were satisfactory to the reviewers. As such, we intend to accept the paper for publication, pending your attention to the reviewers' and editors' comments below in a further revision. When submitting your revised paper, please once again include a detailed point-by-point response to the editorial comments.

The remaining issues that need to be addressed are listed at the end of this email. Any accompanying reviewer attachments can be seen via the link below. Please take these into account before resubmitting your manuscript:

[LINK]

In revising the manuscript for further consideration here, please ensure you address the specific points made by each reviewer and the editors. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments and the changes you have made in the manuscript. Please submit a clean version of the paper as the main article file. A version with changes marked must also be uploaded as a marked up manuscript file. Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper.

In addition to these revisions, you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly.

Please note, when your manuscript is accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you've already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosmedicine@plos.org.

We ask that you submit your revision within 1 week (Jun 18 2025). However, if this deadline is not feasible, please contact me by email, and we can discuss a suitable alternative.

Please do not hesitate to contact me directly with any questions (atosun@plos.org). If you reply directly to this message, please be sure to 'Reply All' so your message comes directly to my inbox.

We look forward to receiving the revised manuscript.   

Sincerely,

Alexandra Tosun, PhD

Senior Editor 

PLOS Medicine

plosmedicine.org

***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.***

------------------------------------------------------------

Comments from Reviewers:

Reviewer #2:

I would like to sincerely thank the authors for having consistently addressed all my concerns and comments in their last revised manuscript.

I am glad to consider that, with these revisions, the paper is now recommendable for publication.

Any attachments provided with reviews can be seen via the following link:

[LINK]

------------------------------------------------------------

Requests from Editors:

GENERAL

* Please confirm that your title complies with PLOS Medicine's style. Your title must be nondeclarative and not a question. It should begin with the main concept if possible. "Effect of" should be used only if causality can be inferred, i.e., for an RCT. Please place the study design ("A randomized controlled trial," "A retrospective study," "A modelling study," etc.) in the subtitle (i.e., after a colon).

* Statistical reporting: Please revise throughout the manuscript, including tables and figures.

- Please report statistical information as follows to improve clarity for the reader: "22% (95% CI [13,28]; p≤)".

- Please separate upper and lower bounds with commas instead of hyphens as the latter can be confused with reporting of negative values.

- Please repeat statistical definitions (HR, CI etc.) for each set of parentheses.

* Please ensure that all abbreviations are defined at first use throughout the text (including statistical abbreviations). Please also check figures and tables.

* Please ensure that tables and figures, including those in supplementary files, are appropriately referenced in the main text.

* Please note that the funding statement should also include URLs to sponsors’ websites.

* Data availability: Please clarify whether, once access to the repository is gained (via the link you provided), it will be clear to anyone interested in the data which dataset was used for this study.

* Many abbreviations are used throughout the main text. Please check whether the number could potentially be reduced to improve readability. Please also carefully check whether abbreviations are defined at first use throughout the text (including statistical abbreviations). Please also check figures and tables.

* Please consider whether removing some of the numerical results in main text would improve readability in cases where the numbers are easy to find in the relevant table/figure.

* The manuscript is still quite complex and, at times, difficult to follow, particularly considering that PLOS Medicine serves a broad medical audience. Please keep this in mind when revising the manuscript. Please streamline the manuscript and create easy-to-follow methods and results sections. These sections should enable readers who are unfamiliar with the topic to understand your research.

* Because Github depositions can be readily changed or deleted, we encourage you to make a permanent DOI'd copy in Zenodo and provide the URL.

ABSTRACT

* Please confirm that your abstract complies with our requirements, including providing all the information relevant to this study type https://journals.plos.org/plosmedicine/s/submission-guidelines#loc-abstract

* Please ensure that all numbers presented in the abstract are present and identical to numbers presented in the main manuscript text.

* We suggest changing the description of outcomes to the following: “Outcomes included 1) both Joint Space Narrowing (JSN) and pain progression, 2) only JSN progression, 3) only pain progression, and 4) non-progression (JSN or pain)”

* We don’t think it’s currently 100% clear what you mean with “with a ratio of 2:1:1:2”. Please revise for clarity. We suggest including the exact numbers in the parentheses of the development and the total test cohort.

* “A total of 1753 knee MRIs were included over a 2-year follow-up.” We suggest describing this more accurately, providing the numbers for each time point.

*We suggest adding the definitions of JSN progression and pain progression in the Abstract.

* Please include basic demographics of the participants, i.e. mean age, sex, ethnicity/race.

* Please define MRI at first use.

* Why do you only report the AUC of the test and not the development cohort?

* Please include the number of resident physicians.

* In the last sentence of the Abstract Methods and Findings section, please describe the main limitation(s) of the study's methodology.

* Abstract Conclusions:

- Please address the study implications without overreaching what can be concluded from the data; the phrase "In this study, we observed ..." may be useful.

- Please interpret the study based on the results presented in the abstract, emphasizing what is new without overstating your conclusions.

- Please avoid vague statements such as "these results have major implications for policy/clinical care". Mention only specific implications substantiated by the results.

- Please avoid assertions of primacy ("We report for the first time....")

AUTHOR SUMMARY

* It seems that the Author Summary of the previous version was not incorporated into the main text. Please revise. The Author Summary should follow the Abstract.

* In the author summary, please revise formatting and ensure you use bullet points.

METHODS AND RESULTS

* Please consider reducing the number of subheadings.

* Ethics: Please clarify whether the need for ethical approval was waived for your study and if so, why. Please provide the name(s) of the institutional review board(s) together with the approval numbers that provided ethical approval.

* “The KLG assessments for radiographs were performed by Dr. Piran Aliabadi, MD and Dr. Burt Sack, MD, under the direction of Dr. David Felson, MD from the Boston University Clinical Epidemiology Research and Training Unit for the baseline through 24-month visits.” – We think it would be useful to add descriptions of job level and experience.

* “Seventeen types of serum and urine biochemical markers were included in the FNIH cohort study, as detailed in reference (12).” – we suggest describing these in detail here instead of using a reference only.

* l.320: Please clarify whether the clinical variable was ‘race’, ‘ethnicity’ or ‘race/ethnicity’.

* “At the baseline visit, 178 (61%) females were included in the development cohort 1, while 171 (57%) females were included in the test cohort 1 (Table S2).” – The description you have provided seems misleading. Please revise. Suggestion: “At the baseline visit, 293 (178/293, 61% females) individuals were included in the development cohort 1, while 301 (171/301, 57% females) individuals were included in the test cohort 1 (Table S2).”

* The terms gender and sex are not interchangeable (as discussed in https://www.who.int/health-topics/gender#tab=tab_1 ); please use the appropriate term.

* l.493: “This analysis will identify the most influential parameters and quantify uncertainty, improving the model's reliability and understanding its behavior in predicting KOA progression.” – The use of future tense seems wrong – please revise.

* Please check that any use of statistical terms (such as trend or significant) are supported by the data, and if not please remove them.

* Figure 1: Please consider splitting the figure into two separate figures.

* Figure 3: “The colored dotted line of yellow, orange, purple, green, blue, teal, and red represented the predictive performance change of Liu, Zhao, Cao, J Li, Chen, X Wang, Dang, and M Zhang, respectively.” – is it necessary to be able to identify the performer? We suggest removing the names and simply stating that each color belongs to one of the test individuals.

* Figure 3: For H and I, why is there no heading (total test cohort)?

DISCUSSION

* Please remove all subheadings.

* We don’t think that it’s necessary to cite the ORs of other studies in parentheses throughout the discussion.

“Our predictive model shows clinical potential in managing KOA progression by integrating MRI radiomic features, biomarkers, and clinical data.” – Based on the discussion and its limitations, do you think this statement is supported, or would you rather say that the model is a step toward developing a model that could be used in a clinic? We feel it would be better to tone down any statement about clinical utility.

------------------------------------------------------------

General Editorial Requests

1) We ask every co-author listed on the manuscript to fill in a contributing author statement. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT.

2) Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

3) Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

Decision Letter 4

Alexandra Tosun

Dear Dr Li, 

On behalf of my colleagues and the Academic Editor, Christelle Nguyen, I am pleased to inform you that we have agreed to publish your manuscript "Predicting KOA Progression using Neural network with Longitudinal MRI Radiomics, and Biochemical Biomarkers: A Modeling Study" (PMEDICINE-D-24-02732R4) in PLOS Medicine.

I appreciate your thorough responses to the reviewers' and editors' comments throughout the editorial process. We look forward to publishing your manuscript. Editorially, there are a few remaining points that should be addressed prior to publication. We will carefully check whether the changes have been made. If you have any questions or concerns regarding these final requests, please feel free to contact me at atosun@plos.org.

Please see below the minor points that we request you respond to:

1) Title: Please change 'KOA' to 'knee osteoarthritis'.

2) Abstract: Please remove claims of primacy, such as 'novel'.

3) Author Summary: In the final bullet point of 'What Do These Findings Mean?', please state the main limitations of the study in non-technical language.

4) Funding statement/Financial Disclosure Statement: Please include URLs to all sponsors’ websites in the statement in the online submission form.

5) Ethics Statement: Please update the statement in the online submission form using the details provided in lines 68-75 of the last track changes version and in your last rebuttal. This is particularly important for the information regarding consent and the fact that no additional ethical approval was required for this study.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email (including the editorial points above). Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at http://www.editorialmanager.com/pmedicine/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process. 

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with medicinepress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for submitting to PLOS Medicine. We look forward to publishing your paper. 

Sincerely, 

Alexandra Tosun, PhD 

Senior Editor 

PLOS Medicine

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig

    Flow chart of FNIH OA Biomarkers Consortium cohort study inclusion. FNIH OA Biomarkers Consortium cohort: Foundation of the NIH OsteoArthritis Biomarkers Consortium cohort, MR: Magnetic Resonance, JSW: Joint Space Width, WOMAC: Western Ontario and McMaster Universities Arthritis Index, KLG: Kellgren-Lawrence Grade, BMI: Body Mass Index, BL: Baseline.

    (TIF)

    pmed.1004665.s001.tif (15MB, tif)
    S2 Fig

    The MRI segmentation scheme in our study. MRI: Magnetic Resonance Image.

    (TIF)

    pmed.1004665.s002.tif (20.8MB, tif)
    S3 Fig

    The results of 10-fold cross-validation to predict knee osteoarthritis progression and contour plot of predicted label probability under actual labels. The quartiles of AUC are shown in A. The quartiles of AUC in each fold are shown in B. The contour plot of predicted label probability under actual labels in the final LBTRBC-M is shown in C. The 10-fold cross-validation was repeated for 100 iterations. AUC: Area Under receiver operating characteristic Curve, JSN: Joint Space Narrowing, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model.

    (TIF)

    pmed.1004665.s003.tif (1.8MB, tif)
    S4 Fig

    Feature selection process by LASSO regression in single-structure MRI radiomic models. Panel (A) to (F) show the magnitude of scaled parameter estimates for each model (FE-RM, FC-RM, TI-RM, TC-RM, LM-RM, and MM-RM), which indicate the importance of each MRI radiomic feature in predicting KOA progression. Panels (A), (B), (C), (D), (E), and (F) represent the magnitude of scaled parameter estimates for FE-RM, FC-RM, TI-RM, TC-RM, LM-RM, and MM-RM, respectively. Panels (G) to (L) present the scaled parameter estimates of the same models using the Akaike Information Criterion (AICc) for feature selection, providing an additional measure of model performance and fit. Panels (M) to (R) illustrate the weight of features for each model, showing the relative contribution of each selected feature to the overall predictive power of the model. These results highlight the most influential features in each MRI radiomic model for predicting KOA progression. FE-RM: Femur Radiomic Model, FC-RM: Femoral Cartilage Radiomic Model, TI-RM: Tibia Radiomic Model, TC-RM: Tibial Cartilage Radiomic Model, LM-RM: Lateral Meniscal Radiomic Model, MM-RM: Medial Meniscal Radiomic Model, AICc: Akaike Information Criterion, corrected.

    (TIF)

    pmed.1004665.s004.tif (20.8MB, tif)
    S5 Fig

    Feature selection process by LASSO regression in the LBT-RM and LBTRBC-M models. Panel (A) and (B) show the magnitude of scaled parameter estimates for the LBT-RM and the LBTRBC-M, respectively. Panels (C) and (D) represent the AICc-based selection of the most important features for both models, offering an alternative approach to assess the performance and fit of the models. Panels (E) and (F) show the weight of features in LBT-RM and LBTRBC-M, respectively, demonstrating how individual features contribute to the predictive power of each model. These visualizations offer a clearer understanding of the key features selected by LASSO regression and their impact on model performance for predicting KOA progression. LBT-RM: Load-Bearing Tissue Radiomic Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical Biomarker and Clinical Variable Model, AICc: Akaike Information Criterion, corrected.

    (TIF)

    pmed.1004665.s005.tif (17MB, tif)
    S6 Fig

    The DESS signal feature maps of load-bearing tissues in different groups. DESS signal intensity maps of the femur (A–D), femoral cartilage (E–H), tibia (I–L), tibial cartilage (M–P), lateral meniscus (Q–T), and medial meniscus (U–X) were developed for the four groups. High values of the femur and tibia were detected in the JSN and pain progression group and the pain progression group. High values of the femoral cartilage, tibial cartilage, lateral meniscus, and medial meniscus were detected in the JSN and pain progression group and the JSN progression group. JSN: Joint Space Narrowing, DESS: Double Echo Steady-State.

    (TIF)

    pmed.1004665.s006.tif (22.1MB, tif)
    S7 Fig

    The confusion matrix results of single-structure MRI radiomic models in the test cohorts. The confusion matrices of FE-RM (A–D), FC-RM (E–H), TI-RM (I–L), TC-RM (M–P), LM-RM (Q–T), and MM-RM (U–X) in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. FE-RM: Femur Radiomic Model, FC-RM: Femoral Cartilage Radiomic Model, TI-RM: Tibia Radiomic Model, TC-RM: Tibial Cartilage Radiomic Model, LM-RM: Lateral Meniscal Radiomic Model, MM-RM: Medial Meniscal Radiomic Model.

    (TIF)

    pmed.1004665.s007.tif (26.4MB, tif)
    S8 Fig

    The comparisons of AUC between FE-RM and FE-MOM in predicting KOA progression. The performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for FE-RM and FE-MOM in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. FE-RM: Femur Radiomic Model, FE-MOM: Femur MOAKS Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

    (TIF)

    pmed.1004665.s008.tif (18.2MB, tif)
    S9 Fig

    The comparisons of AUC between FC-RM and FC-MOM in predicting KOA progression. The performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for FC-RM and FC-MOM in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. FC-RM: Femoral Cartilage Radiomic Model, FC-MOM: Femoral Cartilage MOAKS Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

    (TIF)

    pmed.1004665.s009.tif (18.2MB, tif)
    S10 Fig

    The comparisons of AUC between TI-RM and TI-MOM in predicting KOA progression. The performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for TI-RM and TI-MOM in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. TI-RM: Tibia Radiomic Model, TI-MOM: Tibia MOAKS Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

    (TIF)

    pmed.1004665.s010.tif (18.2MB, tif)
    S11 Fig

    The comparisons of AUC between TC-RM and TC-MOM in predicting KOA progression. The performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for TC-RM and TC-MOM in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. TC-RM: Tibial Cartilage Radiomic Model, TC-MOM: Tibial Cartilage MOAKS Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

    (TIF)

    pmed.1004665.s011.tif (18.2MB, tif)
    S12 Fig

    The comparisons of AUC between LM-RM and LM-MOM in predicting KOA progression. The performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for LM-RM and LM-MOM in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. LM-RM: Lateral Meniscal Radiomic Model, LM-MOM: Lateral Meniscal MOAKS Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

    (TIF)

    pmed.1004665.s012.tif (18.2MB, tif)
    S13 Fig

    The comparisons of AUC between MM-RM and MM-MOM in predicting KOA progression. The performance in predicting JSN and pain progression (A–D), JSN progression (E–H), pain progression (I–L), and non-progression (M–P) for MM-RM and MM-MOM in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. MM-RM: Medial Meniscal Radiomic Model, MM-MOM: Medial Meniscal MOAKS Model, AUC: Area Under receiver operating characteristic Curve, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score.

    (TIF)

    pmed.1004665.s013.tif (18.2MB, tif)
    S14 Fig

    The confusion matrix results of load-bearing tissue MRI radiomic models in the test cohorts. The confusion matrices of LBT-RM (A–D) and LBTRBC-M (E–H) in test cohorts 1–3 and the total test cohort. Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up, respectively, while the total test cohort encompassed all of these time points. LBT-RM: Load-Bearing Tissue Radiomic Model, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model.

    (TIF)

    pmed.1004665.s014.tif (18MB, tif)
    S15 Fig

    Performance of resident physicians and models in predicting KOA progression in the different time point test cohorts. AUC for predicting JSN and pain progression (A–F), JSN progression (G–L), pain progression (M–R), and non-progression (S–X) among the LBTRBC-M, the LBTMBC-M, and the average performance of all resident physicians without (blue dot) and with (red dot) the support of LBTRBC-M in test cohorts 1–3 is demonstrated. As shown in A, C, E, G, I, K, M, O, Q, S, U, and W, both the sensitivity and specificity of resident physicians improved when aided by LBTRBC-M (black arrow) in test cohorts 1–3. As shown in B, D, F, H, J, L, N, P, R, T, V, and X, the individual performance of resident physicians is represented by open shapes (without LBTRBC-M aid) and filled shapes (with LBTRBC-M aid). Test cohort 1, test cohort 2, and test cohort 3 corresponded to baseline, 1-year follow-up, and 2-year follow-up time points. The yellow, orange, purple, green, blue, teal, and red dotted lines represented the predictive performance changes of Liu, Zhao, Cao, J Li, Chen, X Wang, Dang, and M Zhang, respectively. AUC: Area Under receiver operating characteristic Curve, LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model, LBTMBC-M: Load-Bearing Tissue MOAKS plus Biochemical biomarker and Clinical variable Model, MOAKS: Magnetic resonance imaging OsteoArthritis Knee Score, AI: Artificial Intelligence.

    (TIF)

    pmed.1004665.s015.tif (18.8MB, tif)
    S16 Fig

    The predictive performance of the LBTRBC-M model with and without stratified cross-validation in the total test cohort. (A–B) We implemented a stratified cohort split for the LBTRBC-M model to ensure proportional representation of each KOA progression subtype, maintaining an approximate 2:1:1:2 ratio of JSN and pain progression, JSN progression, pain progression, and non-progression. (C–D) The AUCs of the LBTRBC-M model with and without stratified cross-validation are displayed for the total test cohort. AUC: Area Under the receiver operating characteristic Curve; LBTRBC-M: Load-Bearing Tissue Radiomic plus Biochemical biomarker and Clinical variable Model; JSN: Joint Space Narrowing; KOA: Knee Osteoarthritis.

    (TIF)

    pmed.1004665.s016.tif (1.8MB, tif)
    S1 Table

    MRI protocol details.

    (DOCX)

    pmed.1004665.s017.docx (17.8KB, docx)
    S2 Table

    Baseline characteristics of participants in the development cohort 1 and test cohort 1.

    (DOCX)

    pmed.1004665.s018.docx (24.6KB, docx)
    S3 Table

    Baseline biochemical biomarker levels of participants in the development cohort 1 and test cohort 1.

    (DOCX)

    pmed.1004665.s019.docx (23.2KB, docx)
    S4 Table

    Dice similarity coefficients (DSCs) for CNN-based automated segmentation and manually adjusted segmentation.

    (DOCX)

    pmed.1004665.s020.docx (17KB, docx)
    S5 Table

    Selected features of predictive models in total development cohort.

    (DOCX)

    pmed.1004665.s021.docx (58KB, docx)
    S6 Table

    The areas under ROC curves of predictive models in the test cohorts.

    (DOCX)

    pmed.1004665.s022.docx (37.7KB, docx)
    S7 Table

    The accuracy of predictive models in the test cohorts.

    (DOCX)

    pmed.1004665.s023.docx (22.8KB, docx)
    S8 Table

    Comparing the areas under two correlated ROC curves between predictive models in the test cohorts.

    (DOCX)

    pmed.1004665.s024.docx (59KB, docx)
    S9 Table

    Relative risks of outcomes for predictive model outputs.

    (DOCX)

    pmed.1004665.s025.docx (23.2KB, docx)
    S10 Table

    Predictive performance of resident physicians under the assistance of LBTRBC-M.

    (DOCX)

    pmed.1004665.s026.docx (37.4KB, docx)
    S11 Table

    The accuracy of resident physicians under the assistance of LBTRBC-M.

    (DOCX)

    pmed.1004665.s027.docx (22.5KB, docx)
    S12 Table

    Predictive performance of resident physicians under the assistance of LBTRBC-M in the test cohorts.

    (DOCX)

    pmed.1004665.s028.docx (24.4KB, docx)
    S13 Table

    Relative risks of outcomes for LBTRBC-M outputs using different GEE models.

    (DOCX)

    pmed.1004665.s029.docx (17.3KB, docx)
    S14 Table

    Comparison of the predictive performance of the LBTRBC-M model with and without MI in the total test cohort.

    (DOCX)

    pmed.1004665.s030.docx (15.1KB, docx)
    S15 Table

    Comparison of the predictive performance of the LBTRBC-M model using different algorithms in the total test cohort.

    (DOCX)

    pmed.1004665.s031.docx (15.6KB, docx)
    S16 Table

    The parameters of the LBTRBC-M model.

    (DOCX)

    pmed.1004665.s032.docx (17KB, docx)
    S17 Table

    The predictive performance of LBTRBC-M using different hyperparameters in the total test cohort.

    (DOCX)

    pmed.1004665.s033.docx (19.5KB, docx)
    S18 Table

    Comparison of the predictive performance of the LBTRBC-M model with different interactions in the total test cohort.

    (DOCX)

    pmed.1004665.s034.docx (16.4KB, docx)
    S19 Table

    Comparison of the predictive performance of the LBTRBC-M model with and without stratified cross-validation in the total test cohort.

    (DOCX)

    pmed.1004665.s035.docx (15.3KB, docx)
    S1 Checklist

    TRIPOD+AI checklist.

    (PDF)

    pmed.1004665.s036.pdf (407.3KB, pdf)
    Attachment

    Submitted filename: PMEDICINE-D-24-02732R1-Response to Review.docx

    pmed.1004665.s039.docx (66.4KB, docx)
    Attachment

    Submitted filename: PMEDICINE-D-24-02732R2-Response to Review.docx

    pmed.1004665.s040.docx (27.4KB, docx)
    Attachment

    Submitted filename: respond to reviewer and editor.docx

    pmed.1004665.s041.docx (36.6KB, docx)

    Data Availability Statement

    The data that support the findings of this study are publicly available through the Osteoarthritis Initiative (OAI) repository at https://nda.nih.gov/oai/. De-identified patient-level clinical data, outcome data, and MRI imaging data used in this study can be accessed from this repository. The specific dataset utilized is clearly identifiable upon accessing the repository. Additionally, the source code for the predictive model developed in this study is available at https://github.com/dmlc/xgboost, and a permanently archived version has been deposited in Zenodo at https://doi.org/10.5281/zenodo.15680828.


    Articles from PLOS Medicine are provided here courtesy of PLOS
