Abstract
Reliable analysis of subchondral trabecular microstructure is critical for knee osteoarthritis assessment. However, this analysis largely relies on high-resolution MRI acquired with balanced fast field echo (BFFE) sequences, which are rarely included in routine clinical protocols. Clinical CT is widely acquired, yet its limited spatial resolution and soft-tissue contrast make direct trabecular parameter estimation unreliable. There is therefore a clear need for methods that enable accurate trabecular microstructural analysis and osteoarthritis diagnosis from routine clinical CT while approaching the reliability of MR-based analysis. In this paper, we propose the CT-based Subchondral Microstructural Analysis (CT-SMA) method, which transfers microstructural knowledge learned from high-resolution MR to CT through cross-modal knowledge distillation: a pre-trained MR-based teacher model supervises a CT-based student model at the prediction and feature levels, while inference requires CT alone. To support effective distillation, CT-SMA further introduces a synthesis-based, multi-stage MR–CT registration strategy that establishes patch-level correspondences across modalities despite substantial differences in resolution, contrast, and appearance. Experiments on a clinical knee imaging cohort demonstrate that CT-SMA substantially improves CT-based trabecular parameter estimation, achieving strong agreement (ICC = 0.742) with MR-derived references across key trabecular biomarkers. Moreover, when aggregated by a Transformer-based model, the regressed CT-derived parameters enable patient-level osteoarthritis diagnosis with an AUC of 0.883, substantially outperforming CT-based prediction without distillation (AUC = 0.778).
These results indicate that routine clinical CT can support reliable subchondral bone analysis via the proposed CT-SMA, establishing a practical foundation for large-scale studies.
Keywords: cross-modal registration, distillation learning, knee osteoarthritis, subchondral bone analysis, computed tomography
1. Introduction
Knee Osteoarthritis (KOA) is a prevalent degenerative joint disease that causes pain and disability in older individuals (1, 2). Accumulating evidence suggests that microstructural alterations in subchondral trabecular bone are closely linked to early disease onset and progression, often preceding visible cartilage degeneration (3). Quantitative trabecular biomarkers such as bone volume fraction (BV/TV), trabecular thickness (Tb.Th), trabecular separation (Tb.Sp), and trabecular number (Tb.N) have been shown to correlate with histological degeneration, biomechanical competence, and long-term structural outcomes (4, 5). High-resolution MRI acquired with balanced fast field echo (BFFE) sequences provides the spatial detail required to compute these biomarkers reliably (6). However, its long acquisition time, high cost, and limited accessibility restrict large-scale use in clinical environments and population studies (7). On the other hand, routine clinical CT is widely available and fast to acquire, but its voxel resolution (150–500 μm) is substantially coarser than trabecular bone microstructure (50–200 μm) (8, 9). This substantial resolution mismatch leads to strong partial volume effects and makes trabecular morphology almost indiscernible on CT images (10, 11). As a result, recovering MR-level trabecular information from routine CT remains technically challenging, despite its potential to enable scalable and cost-effective assessment of subchondral bone microstructure in osteoarthritis.
To enable reliable subchondral trabecular microstructural analysis using clinical CT, the fundamental challenge lies in compensating for the severe loss of fine-scale structural information inherent to CT acquisition. This information loss makes it difficult for traditional radiomics analysis approaches to extract stable microstructural descriptors, and limits the effectiveness of deep learning models that attempt to regress trabecular parameters directly from CT appearance. In contrast, high-resolution MR provides detailed representations of trabecular architecture, and the relationship between MR texture patterns and quantitative biomarkers is well established (12). In this way, the anatomical structure information and the trabecular parameters derived from acquired MR images can serve as explicit supervisory signals to guide parameter estimation from CT.
Cross-modality knowledge distillation (13, 14) provides a natural mechanism for transferring MR-based structural knowledge to a CT-based model, addressing the limitations of CT-based subchondral bone microstructural analysis. In practice, for paired MR/CT patches from the same anatomical region, a prediction-consistency loss distills trabecular-parameter knowledge from the MR-based teacher model to the CT-based student model. However, successful distillation requires CT patches to correspond accurately to their MR counterparts, so that the MR model can act as an effective teacher during training. Accurate MR–CT registration is therefore required to establish reliable correspondence between the MR and CT patches. Such cross-modal registration is challenging, however, due to the considerable resolution difference, the substantial contrast and appearance discrepancy, and the frequent variation in knee flexion angle between CT and MR acquisitions.
To address these challenges, we propose a novel CT-based Subchondral Microstructural Analysis (CT-SMA) method, a unified computational framework designed to enable reliable trabecular analysis and osteoarthritis assessment using routine clinical CT. The central idea of CT-SMA is to leverage high-resolution MR as a teacher modality during training, while operating exclusively on CT at inference time. To achieve this, CT-SMA integrates three tightly coupled components. First, to make distillation feasible in practice, we introduce a synthesis-guided MR–CT registration module that establishes anatomically faithful cross-modal patch correspondences despite large contrast/resolution gaps and pose variations. Second, built on these aligned MR/CT patch pairs, we develop a distillation-learning-based regression module (DLRM) that transfers MR-derived trabecular-parameter knowledge to a CT model via a prediction-consistency objective. Finally, to support patient-level OA assessment, we design a Transformer-based aggregation module that integrates spatially distributed trabecular parameters into a global representation for downstream prediction. We demonstrate that this strategy substantially improves the reliability of CT-based trabecular parameter estimation and enables downstream OA analysis with performance approaching that of MR-based methods. The main contributions of this work are summarized as follows:
(1) We present CT-SMA, a unified framework that bridges MR-guided training and CT-only inference for subchondral trabecular analysis.
(2) We propose a synthesis-guided cross-modal registration pipeline that produces anatomically consistent MR–CT patch correspondences for distillation.
(3) We introduce a cross-modality distillation strategy that transfers high-resolution MR structural priors to CT for accurate trabecular-parameter regression.
(4) We validate the clinical utility of the distilled CT representations on OA prediction, achieving large gains over CT-only baselines and approaching MR-based performance.
2. Related work
2.1. Trabecular microstructure analysis for OA classification
Trabecular microstructure has been increasingly recognized as a prognostic substrate for osteoarthritis (OA), motivating computational methods that quantify trabecular biomarkers from clinical imaging and leverage them for OA prediction. Early studies on radiographs primarily relied on classical trabecular texture analysis, using statistical descriptors of local intensity patterns, and demonstrated that subchondral trabecular patterns are predictive of OA status and progression. However, radiographs provide only two-dimensional projections and typically rely on handcrafted features that are sensitive to acquisition conditions and offer limited structural fidelity (15, 16). In contrast, high-resolution magnetic resonance imaging has enabled more direct characterization of trabecular microarchitecture. In particular, balanced fast field echo (BFFE) sequences have been widely adopted for subchondral bone microstructural analysis of the knee region (2, 17). By exploiting steady-state free precession and strong T2/T1 contrast, BFFE provides high spatial resolution and clear delineation between bone, cartilage, and marrow spaces, allowing individual trabeculae to be resolved more reliably than with conventional sequences. Based on BFFE acquisitions, classical three-dimensional morphometric parameters such as bone volume fraction (BV/TV), trabecular thickness (Tb.Th), trabecular separation (Tb.Sp), and trabecular number (Tb.N) can be estimated, and these biomarkers have shown strong associations with OA severity, progression, and cartilage degeneration (3). Reported results show strong associations between MR-derived features and the underlying trabecular structure (3), yet the clinical application of MR-based analysis has been limited by its high acquisition cost (7).
Meanwhile, CT-based trabecular analysis has also been explored, but mainly in controlled or ex vivo settings where scan quality is sufficiently high to extract three-dimensional structural descriptors linked to biomechanical properties. These studies suggest that CT-derived structural features are informative for osteoarthritic bone changes, but this evidence is largely obtained under high-resolution imaging conditions that are uncommon in clinical CT (18, 19). Collectively, prior work establishes the predictive value of trabecular microstructure for OA, yet existing approaches require high-resolution imaging and do not address the core challenge of recovering fine-scale trabecular information from routine low-resolution clinical CT.
2.2. Cross-modal image registration
Cross-modality registration plays a fundamental role in medical image analysis, as it enables the integration of complementary anatomical information across imaging modalities for tasks such as diagnosis, treatment planning, and multimodal learning. However, multimodal alignment remains challenging due to substantial appearance discrepancies between modalities, complex geometric inconsistencies introduced by patient positioning, and the instability of the nonconvex optimization procedures required for deformable matching. Traditional approaches rely either on intensity-based similarity measures or on deformable transformation models, but these methods often struggle to obtain reliable correspondences when appearance differences are large (20). More recent learning-based registration techniques focus primarily on unsupervised prediction of deformation fields, yet their reliance on intensity-based consistency limits their robustness in cross-modality settings (21). Synthesis-assisted registration attempts to reduce the appearance gap by generating modality-converted images before alignment, although synthesis errors and structural distortions frequently impair the accuracy of the downstream registration (20, 22).
Knee-specific MR and CT registration has also been investigated using segmentation-based workflows that rely on bone surface extraction followed by intensity-driven alignment. Prior work demonstrated that registering MR and CT bone surfaces is inherently challenging because MR provides limited visibility of cortical boundaries, which reduces the reliability of surface-based correspondence and highlights the difficulty of establishing accurate anatomical alignment between the two modalities. More advanced strategies register individually segmented skeletal components as rigid bodies and interpolate the surrounding deformation field before applying a rigidity-constrained deformable refinement (23). Although these approaches improve the geometric alignment of bone structures, they depend on accurate bone segmentation and primarily operate on surface representations rather than volumetric correspondences. Overall, existing studies in this area remain limited, and learning-based multimodal registration methods for the knee are still relatively underexplored, particularly in scenarios that require voxel-level alignment within trabecular bone for downstream microstructural analysis (24).
2.3. Cross-modal knowledge distillation
Knowledge distillation provides a mechanism for transferring useful information between models or modalities by encouraging consistency at the prediction or feature level, enabling complementary modality-specific information to be shared across representations (25). Cross-modal distillation techniques typically operate through divergence minimization between output distributions, consistency regularization on intermediate features, or adversarial learning schemes that promote modality-invariant embeddings (26).
In medical image analysis, cross-modality distillation has been studied primarily in the context of segmentation. One representative approach integrates an image alignment module with online mutual distillation to exploit modality-shared shape priors and enhance segmentation performance across CT and MR modalities (13, 26). Another line of work achieves unpaired multimodal segmentation by parameter sharing and constraining the prediction distributions of CT and MRI through a distillation-inspired loss, significantly improving segmentation accuracy with a compact model architecture (27). Beyond segmentation, cross-modal distillation has been used to address incomplete multimodal data in neuroimaging, where a multimodal model serves as a teacher to guide a single-modality student for Alzheimer's disease classification (28).
Although these approaches demonstrate the effectiveness of cross-modal distillation for classification and segmentation, they primarily focus on semantic outputs and do not address continuous microstructural biomarker regression, large resolution discrepancies between modalities, or the need for patch-level anatomical correspondence within trabecular bone. To the best of our knowledge, cross-modal distillation has not been explored for knee osteoarthritis analysis or subchondral trabecular microstructure estimation.
3. Method
The overall pipeline of the CT-SMA method, which aims to enable trabecular parameter regression from clinical CT by transferring high-resolution structural information from MR, is illustrated in Figure 1. Let $I_{\mathrm{MR}}$ and $I_{\mathrm{CT}}$ denote the original MR and CT volumes of size H × W × D. As shown in Figure 2, CT-SMA contains three major components. The first component constructs anatomically meaningful correspondences between modalities through synthesis and registration. The second component performs patch-level distillation learning to transfer trabecular knowledge from MR to CT. The final component aggregates patch-level predictions into a patient-level representation for osteoarthritis classification.
Figure 1.
Our CT-SMA framework for cross-modal trabecular parameter regression and subchondral bone microstructural analysis. The synthesis and registration module achieves MR-to-CT alignment through three dedicated submodules. First, a coarse MR-to-CT registration submodule aligns MR images to CT space to generate crMR images. Second, a synthesis submodule based on SPSNet transforms crMR images into sCT while preserving anatomical structures. Third, a refined registration submodule performs component-wise registration from sCT to CT for individual bone regions. With paired MR and CT patches, the distillation-learning-based regression module (DLRM) first pre-trains an MR-based teacher model and then supervises the CT-based student model with knowledge distillation. The regressed parameters are further employed for KOA analysis, including KL-grade-based KOA classification and MOAKS score prediction.
Figure 2.
Detailed designs of submodules in CT-SMA. (A) The structure of SPSNet, which consists of two generators ($G_{\mathrm{MR\to CT}}$ and $G_{\mathrm{CT\to MR}}$) and two discriminators ($D_{\mathrm{CT}}$ and $D_{\mathrm{MR}}$). The mutual information (MI) loss constrains structural similarity during the cross-modal synthesis process.
3.1. Synthesis and registration module
CT and MR exhibit substantial discrepancies in field of view, voxel spacing, intensity distribution, and knee articulation. Direct voxel-level alignment is unreliable due to these differences. Therefore, this module progressively reduces cross-modal discrepancies through three stages: coarse global alignment, structure-preserving synthesis and bone-wise refined alignment.
3.1.1. Coarse registration
The goal of coarse registration is to eliminate global misalignment in field of view and resolution. We estimate a rigid transformation $T_{\mathrm{rigid}}$ that maximizes mutual information between the transformed MR volume and CT:

$$T_{\mathrm{rigid}}^{*} = \arg\max_{T_{\mathrm{rigid}}} \; \mathrm{MI}\big(T_{\mathrm{rigid}}(I_{\mathrm{MR}}),\, I_{\mathrm{CT}}\big)$$

Mutual information between two intensity distributions is defined as

$$\mathrm{MI}(X, Y) = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}$$

which provides robustness for multimodal alignment. Applying the rigid transform yields the coarsely aligned MR volume as

$$I_{\mathrm{crMR}} = T_{\mathrm{rigid}}^{*}(I_{\mathrm{MR}})$$
This volume is now spatially consistent with CT, providing the foundation for subsequent finer alignment.
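For illustration, the mutual information criterion used in this stage can be estimated from a joint intensity histogram. The following NumPy sketch computes MI between two aligned intensity arrays; the bin count is an arbitrary illustrative choice, not a setting from the paper.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate MI(a, b) from a joint intensity histogram.

    `a` and `b` are same-shape intensity arrays (e.g. the transformed MR
    volume and the CT volume); `bins` controls histogram granularity.
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_xy = joint / joint.sum()              # joint distribution p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = p_xy > 0                           # skip empty bins to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```

In practice, registration toolkits optimize a smoothed, differentiable variant of this quantity; the histogram form above only conveys the definition.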
3.1.2. Structural-preserving synthesis submodule (SPSNet)
Although coarse registration reduces global spatial misalignment, direct multimodal registration remains challenging due to the large intensity distribution gap between MR and CT. To mitigate this issue while preserving anatomical structures, we design a Structural Preserving Synthesis Network (SPSNet). As illustrated in Figure 2, the key idea is to map MR to a CT-like appearance and map CT to an MR-like appearance, while enforcing structural consistency using mutual information.
Let $I_{\mathrm{crMR}}$ denote the coarsely registered MR volume and $I_{\mathrm{CT}}$ denote the CT volume. SPSNet contains two generators: $G_{\mathrm{MR\to CT}}$ synthesizes a CT-like image from MR, and $G_{\mathrm{CT\to MR}}$ synthesizes an MR-like image from CT:

$$I_{\mathrm{sCT}} = G_{\mathrm{MR\to CT}}(I_{\mathrm{crMR}}), \qquad I_{\mathrm{sMR}} = G_{\mathrm{CT\to MR}}(I_{\mathrm{CT}})$$

Two discriminators $D_{\mathrm{CT}}$ and $D_{\mathrm{MR}}$ are used to distinguish synthesized images from real images in the corresponding target domains. The adversarial objectives are defined as:

$$\mathcal{L}_{\mathrm{adv}} = \mathbb{E}\big[\log D_{\mathrm{CT}}(I_{\mathrm{CT}})\big] + \mathbb{E}\big[\log\big(1 - D_{\mathrm{CT}}(I_{\mathrm{sCT}})\big)\big] + \mathbb{E}\big[\log D_{\mathrm{MR}}(I_{\mathrm{crMR}})\big] + \mathbb{E}\big[\log\big(1 - D_{\mathrm{MR}}(I_{\mathrm{sMR}})\big)\big]$$
To prevent structural distortions and ensure content preservation during synthesis, the cycle consistency loss is computed as:

$$\mathcal{L}_{\mathrm{cyc}} = \big\| G_{\mathrm{CT\to MR}}\big(G_{\mathrm{MR\to CT}}(I_{\mathrm{crMR}})\big) - I_{\mathrm{crMR}} \big\|_{1} + \big\| G_{\mathrm{MR\to CT}}\big(G_{\mathrm{CT\to MR}}(I_{\mathrm{CT}})\big) - I_{\mathrm{CT}} \big\|_{1}$$
To encourage preservation of anatomical structures, we explicitly enforce structural similarity between the synthesized images and their corresponding inputs using mutual information. Specifically, we impose mutual information consistency between the coarsely registered MR image and its synthesized CT counterpart, as well as between the original CT image and the synthesized MR image:

$$\mathcal{L}_{\mathrm{MI}} = -\,\mathrm{MI}(I_{\mathrm{crMR}}, I_{\mathrm{sCT}}) - \mathrm{MI}(I_{\mathrm{CT}}, I_{\mathrm{sMR}})$$
By maximizing mutual information across synthesis directions, SPSNet discourages geometric distortions and hallucinated structures while allowing flexible intensity translation across modalities.
The overall objective of the synthesis submodule is defined as the weighted sum of the adversarial, cycle consistency, and mutual information losses:

$$\mathcal{L}_{\mathrm{SPSNet}} = \mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}} + \lambda_{\mathrm{MI}}\,\mathcal{L}_{\mathrm{MI}}$$

This step produces a CT-like representation that remains structurally faithful to MR and enables robust mono-modal alignment in the next stage. A larger weight $\lambda_{\mathrm{cyc}}$ is assigned to the cycle-consistency loss to preserve semantic consistency during cross-modal translation, while the mutual information term, weighted by $\lambda_{\mathrm{MI}}$, provides an auxiliary constraint to encourage anatomical structure consistency across modalities.
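As a minimal sketch, the cycle-consistency term and the weighted overall objective can be written as below; the generator callables and the weighting values are illustrative placeholders rather than the paper's actual settings.

```python
import numpy as np

def cycle_consistency_loss(g_mr2ct, g_ct2mr, mr, ct):
    """L1 cycle loss: translating to the other modality and back
    should reproduce the input, in both directions."""
    fwd = np.abs(g_ct2mr(g_mr2ct(mr)) - mr).mean()   # MR -> sCT -> MR
    bwd = np.abs(g_mr2ct(g_ct2mr(ct)) - ct).mean()   # CT -> sMR -> CT
    return float(fwd + bwd)

def spsnet_total_loss(l_adv, l_cyc, l_mi, lam_cyc=10.0, lam_mi=1.0):
    """Weighted sum of adversarial, cycle-consistency, and MI losses.
    lam_cyc and lam_mi are placeholder weights, not the paper's values."""
    return l_adv + lam_cyc * l_cyc + lam_mi * l_mi
```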
3.1.3. Refined bone-wise registration
After synthesis, residual misalignment persists due to differences in knee-bent angles during MR and CT acquisition. Importantly, femur and tibia exhibit distinct rigid motions that cannot be adequately modeled by a single global transformation. To address this articulation-dependent misalignment, we design refined registration submodule (RRS) in a bone-wise manner.
Let $M_{\mathrm{fem}}$ and $M_{\mathrm{tib}}$ denote binary masks corresponding to the femur and tibia, respectively. These are segmented by a pre-trained UNet (29). Separate rigid transformations are estimated by maximizing mutual information within each bone region:

$$T_{b}^{*} = \arg\max_{T_{b}} \; \mathrm{MI}\big(T_{b}(I_{\mathrm{sCT}} \odot M_{b}),\, I_{\mathrm{CT}} \odot M_{b}\big), \quad b \in \{\mathrm{fem}, \mathrm{tib}\}$$

The refined registered MR image is obtained by composing the bone-specific transformations:

$$I_{\mathrm{rMR}} = T_{\mathrm{fem}}^{*}(I_{\mathrm{crMR}} \odot M_{\mathrm{fem}}) + T_{\mathrm{tib}}^{*}(I_{\mathrm{crMR}} \odot M_{\mathrm{tib}})$$
This step yields anatomically consistent MR–CT alignment at the bone level, enabling reliable extraction of corresponding patches for downstream regression.
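The bone-wise composition can be illustrated with a toy sketch in which each bone region receives its own transform and the transformed regions are overlaid. Real usage would apply rigid resampling with interpolation; the simple shift used in the test below is only a stand-in.

```python
import numpy as np

def compose_bonewise(volume, masks, transforms):
    """Apply a separate spatial transform to each bone region and overlay
    the results. `masks` maps bone name -> boolean mask; `transforms`
    maps bone name -> a callable acting on the masked volume. Nonzero
    voxels from later bones overwrite earlier ones at overlaps."""
    out = np.zeros_like(volume)
    for bone, mask in masks.items():
        moved = transforms[bone](volume * mask)   # T_b applied to region b
        out = np.where(moved != 0, moved, out)
    return out
```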
3.2. Distillation-learning-based regression module (DLRM)
After establishing accurate anatomical correspondence, paired MR and CT patches can be extracted from the refined registered images. The objective of this module is to construct a CT-based regression model capable of predicting trabecular parameters that are otherwise difficult to recover from low-resolution CT alone.
Direct training on whole volumes is infeasible due to limited sample size. Therefore, we adopt a patch-level learning strategy, which substantially increases the number of training samples and allows the model to focus on local trabecular patterns. Patch-level predictions are later aggregated to support patient-level osteoarthritis analysis. To compensate for the limited spatial resolution of CT, as illustrated in Figure 3, we employ a teacher–student distillation framework, in which high-resolution MR patches guide the learning of the CT-based model.
Figure 3.
Details of distillation-learning-based regression module (DLRM).
3.2.1. Teacher model pre-training
In the pre-training stage, the trabecular parameters obtained from morphology calculation serve as the ground truths, and the teacher model $T$ is trained to predict these parameters directly from the MR image patches. Let $x^{\mathrm{MR}}$ denote an MR patch and $y$ denote the corresponding trabecular parameter ground truths. The teacher prediction is

$$\hat{y}^{\mathrm{MR}} = T(x^{\mathrm{MR}})$$

and the teacher is optimized using a regression loss that minimizes the discrepancy between predictions and ground-truth parameters:

$$\mathcal{L}_{\mathrm{reg}}^{T} = \big\| \hat{y}^{\mathrm{MR}} - y \big\|_{2}^{2}$$
3.2.2. Distillation-guided student training
The student network $S$ learns to regress trabecular parameters from CT patches $x^{\mathrm{CT}}$. Its prediction is denoted as

$$\hat{y}^{\mathrm{CT}} = S(x^{\mathrm{CT}})$$

Knowledge distillation is applied at both the prediction and feature levels. The distillation loss is defined as

$$\mathcal{L}_{\mathrm{KD}} = \sum_{i} \big\| \hat{y}_{i}^{\mathrm{CT}} - \hat{y}_{i}^{\mathrm{MR}} \big\|_{2}^{2} + \sum_{i}\sum_{j} \big\| F_{i,j}^{S} - F_{i,j}^{T} \big\|_{2}^{2}$$

where $\hat{y}_{i}^{\mathrm{CT}}$ and $\hat{y}_{i}^{\mathrm{MR}}$ are the student and teacher predictions for the $i$-th patch, and $F_{i,j}^{S}$ and $F_{i,j}^{T}$ denote the $j$-th intermediate feature maps of the student and teacher networks, respectively. The prediction-level term encourages the student to approximate the teacher's output, while the feature-level term promotes alignment of internal representations across modalities.
There is also a supervised regression loss that penalizes the discrepancy between the student prediction and the MR-derived reference $y$:

$$\mathcal{L}_{\mathrm{reg}}^{S} = \big\| \hat{y}^{\mathrm{CT}} - y \big\|_{2}^{2}$$
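Combining the supervised and distillation terms, the student objective can be sketched as follows; `lam_kd` is an assumed balancing weight, not a value from the paper.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def student_loss(pred_s, pred_t, feats_s, feats_t, target, lam_kd=1.0):
    """Supervised regression loss plus prediction-level and feature-level
    distillation terms for one batch of paired MR/CT patches."""
    l_reg = mse(pred_s, target)    # vs. MR-derived reference parameters
    l_pred = mse(pred_s, pred_t)   # match the teacher's predictions
    l_feat = sum(mse(fs, ft) for fs, ft in zip(feats_s, feats_t))  # feature KD
    return l_reg + lam_kd * (l_pred + l_feat)
```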
During inference, only the CT-based student model is used. The regressed trabecular parameters are subsequently utilized for knee osteoarthritis analysis.
3.3. Patch-level aggregation and patient-level OA classification
The trabecular parameters regressed from individual CT patches characterize local subchondral bone microstructure. However, knee osteoarthritis is a joint-level disease whose manifestation is spatially heterogeneous, which cannot be reliably inferred from isolated local measurements. Therefore, patient-level diagnosis requires integrating trabecular information across multiple anatomical regions while preserving their spatial context.
To this end, we introduce a Transformer-based aggregation module that fuses patch-level trabecular parameters into a global representation for osteoarthritis classification. This proposed design explicitly models inter-patch relationships and enables the network to capture long-range dependencies among spatially distributed microstructural patterns.
3.3.1. Patch-level feature construction
For each subject, the CT-based student model produces a set of patch-level trabecular parameter predictions. Let $\hat{y}_{i}$ denote the predicted trabecular parameters for the $i$-th patch. While these parameters encode local microstructural properties, they do not contain explicit spatial information.
To incorporate anatomical context, each patch is associated with its spatial coordinate $c_{i} = (x_{i}, y_{i}, z_{i})$, defined in the CT image coordinate system. A positional embedding function $\phi(\cdot)$ maps the coordinate into a low-dimensional representation. The final patch descriptor is constructed by concatenating the trabecular parameters and positional encoding: $f_{i} = [\hat{y}_{i};\, \phi(c_{i})]$. This design allows the aggregation module to jointly reason about microstructural characteristics and their spatial distribution within the joint.
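One possible realization of this descriptor is sketched below; the sinusoidal form and its dimensionality are illustrative assumptions, since the paper does not fix the embedding function.

```python
import numpy as np

def positional_embedding(coord, dim_per_axis=4):
    """Sinusoidal embedding of a 3D patch coordinate (assumed form)."""
    freqs = 2.0 ** np.arange(dim_per_axis // 2)   # geometric frequency ladder
    parts = []
    for c in coord:
        parts.append(np.sin(freqs * c))
        parts.append(np.cos(freqs * c))
    return np.concatenate(parts)

def patch_descriptor(trab_params, coord):
    """Concatenate regressed trabecular parameters with the positional code."""
    return np.concatenate([np.asarray(trab_params, float),
                           positional_embedding(coord)])
```

With four trabecular parameters (BV/TV, Tb.Th, Tb.Sp, Tb.N) and this embedding, each patch yields a 16-dimensional descriptor.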
3.3.2. Transformer-based aggregation
Each patch descriptor $f_{i}$ is first projected into a $d$-dimensional embedding space using a linear transformation $e_{i} = W f_{i} + b$, where $W$ and $b$ denote learnable parameters. The resulting sequence is treated as a set of patch tokens and processed by a Transformer encoder consisting of two stacked layers:

$$\{z_{i}\}_{i=1}^{N} = \mathrm{TransformerEncoder}\big(\{e_{i}\}_{i=1}^{N}\big)$$
Within each Transformer layer, self-attention mechanisms allow each patch token to selectively attend to other patches, enabling the model to capture correlations between trabecular patterns at different anatomical locations. This is particularly important for osteoarthritis analysis, where disease-related changes often emerge as coordinated alterations across multiple subregions rather than isolated local abnormalities.
After two Transformer layers, the updated patch embeddings are aggregated into a global representation using average pooling:

$$h = \frac{1}{N} \sum_{i=1}^{N} z_{i}$$

The vector $h$ summarizes joint-level trabecular microstructural characteristics and serves as the patient-level feature representation.
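The aggregation step can be illustrated with a single-head self-attention layer followed by average pooling, a simplified stand-in for the two-layer Transformer encoder described above (the weight matrices here are plain NumPy arrays):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(tokens, w_q, w_k, w_v):
    """Single-head self-attention over N patch tokens of width d,
    followed by average pooling into one joint-level vector h."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (N, N) attention map
    updated = scores @ v                              # attended embeddings
    return updated.mean(axis=0)                       # h = mean over patches
```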
3.3.3. OA classification
The aggregated representation $h$ is fed into a classification head to predict the osteoarthritis grade of the subject. Specifically, a linear classifier followed by a softmax activation produces a multi-class prediction:

$$\hat{p} = \mathrm{softmax}(W_{c} h + b_{c})$$

where $\hat{p}$ denotes the predicted probability distribution over OA classes, and $W_{c}$ and $b_{c}$ are learnable parameters. The model is trained using a cross-entropy loss:

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{c} y_{c} \log \hat{p}_{c}$$

where $y_{c}$ is the ground-truth indicator for the $c$-th OA class. By integrating spatially distributed trabecular parameters into a unified representation, this module enables CT-SMA to perform patient-level osteoarthritis classification using only CT data at inference time.
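A minimal NumPy sketch of the classification head and its loss:

```python
import numpy as np

def classify(h, w_c, b_c):
    """Linear head followed by a numerically stable softmax over OA classes."""
    logits = w_c @ h + b_c
    e = np.exp(logits - logits.max())
    return e / e.sum()

def cross_entropy(probs, label):
    """Cross-entropy against a ground-truth class index."""
    return float(-np.log(probs[label]))
```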
4. Experiments and results
To evaluate the proposed method, we first introduce the experimental settings and then sequentially compare the performance of different methods on MR-to-CT synthesis, cross-modal registration, trabecular parameter regression, and KOA analysis.
4.1. Dataset and preprocessing
This retrospective study was approved by the institutional review board of Shanghai Ninth People's Hospital, and written informed consent was obtained from all participants. A total of 149 consecutive subjects with knee pain and suspected cartilage injury were recruited between October 2020 and May 2021. Among them, 96 subjects underwent both CT and MR-BFFE imaging. The flowchart of patient selection is presented in Figure 4. After excluding cases with secondary osteoarthritis due to fracture, rheumatoid arthritis, osteonecrosis, prior knee surgery, or insufficient image quality, a final cohort of 80 subjects was included in the analysis. The specific demographic information of the selected subjects is listed in Table 1.
Figure 4.
Flowchart illustrating the patient selection process in this study.
Table 1.
Demography analysis of patients with various stages of OA.
| Variables | Total (n = 80) | Normal (n = 30) | Mild OA (n = 25) | Advanced OA (n = 25) | P value |
|---|---|---|---|---|---|
| Ages (Years) | 51.3 ± 14.5 | 38.6 ± 9.6 | 56.3 ± 9.7 | 62.4 ± 10.1 | < 0.001 |
| Sex | | | | | 0.58 |
| Male | 45 (56.2%) | 15 (33.3%) | 14 (31.1%) | 16 (35.6%) | |
| Female | 35 (43.8%) | 15 (42.9%) | 11 (31.4%) | 9 (25.7%) | |
| BMI | 25.2 ± 3.1 | 24.6 ± 2.7 | 24.9 ± 3.6 | 26.1 ± 2.6 | 0.16 |
| Knee | | | | | 0.78 |
| Left knee | 41 (51.2%) | 14 (34.1%) | 13 (31.7%) | 14 (34.1%) | |
| Right knee | 39 (48.8%) | 16 (41.0%) | 12 (30.8%) | 11 (28.2%) | |
BMI, body mass index. Age and BMI are reported as mean ± standard deviation.
Subjects were stratified into three groups according to the Kellgren–Lawrence (KL) grading system: normal control (KL = 0, n = 30), mild OA (KL = 1–2, n = 25), and advanced OA (KL = 3–4, n = 25). Knee MR images were further assessed using the MRI Osteoarthritis Knee Score (MOAKS), which evaluates cartilage loss across five anatomical subregions.
For preprocessing, MR volumes were cropped to 640 × 640 × 120 voxels, while CT volumes were cropped to 150 × 150 × 90 voxels to define regions of interest covering the knee joint. For training the distillation-learning-based regression (DLRM) module, CT patches of size 12 × 12 × 12 were sampled from trabecular regions. The corresponding MR patches were extracted at the same anatomical locations and resized to 48 × 48 × 16 to match the resolution discrepancy between modalities. More details on patch sampling are provided in Section 4.5. The matrix size, resolution, and scanner of the pre-processed MR and CT images are compared in Table 2.
Table 2.
Comparison of matrix size, resolution and scanner of MR and CT images.
| Imaging parameter | MR image | CT image |
|---|---|---|
| Matrix size (voxels) | 640 × 640 × 120 | 150 × 150 × 90 |
| Resolution (mm3) | 0.234 × 0.234 × 1.500 | 0.977 × 0.977 × 1.000 |
| Scanner | Philips Healthcare Achieva 3.0TX | Philips Brilliance 64 CT |
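Using the voxel spacings in Table 2, the correspondence between a CT patch location and its MR counterpart can be sketched by mapping voxel indices through physical (mm) coordinates. The shared-origin assumption below is a simplification for illustration; the actual pipeline maps locations through the estimated registration transforms.

```python
import numpy as np

CT_SPACING = np.array([0.977, 0.977, 1.000])  # mm per voxel (Table 2)
MR_SPACING = np.array([0.234, 0.234, 1.500])  # mm per voxel (Table 2)

def ct_to_mr_index(ct_index):
    """Map a CT voxel index to the nearest MR voxel index via physical
    coordinates, assuming the registered volumes share an origin."""
    physical_mm = np.asarray(ct_index, float) * CT_SPACING
    return np.round(physical_mm / MR_SPACING).astype(int)
```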
4.2. Implementation and evaluation
All components of the proposed CT-SMA framework were implemented in PyTorch and trained on a single NVIDIA Tesla A100 GPU. Quantitative results are reported as mean ± standard deviation over five-fold cross-validation. All cross-validation splits were performed strictly at the patient level to prevent information leakage. Specifically, all patches derived from a given subject were assigned to the same fold, ensuring that no patches from the same patient appeared across the training, validation, and test sets.
The training details of the networks in the different modules of CT-SMA are listed in Table 3, including the dimension, learning rate, optimizer, training epochs, and datasets used for training. In addition to the optimization settings listed in Table 3, regularization strategies were applied to mitigate overfitting. Specifically, dropout (rate = 0.5) was employed in both the DLRM and Transformer aggregation modules, and a weight decay of 1 × 10⁻⁴ was applied during training. For the SPSNet module, paired geometric data augmentation, including random flipping, rotation, and spatial shifting, was performed consistently on MR–CT image pairs to preserve anatomical correspondence. Further implementation details and regularization configurations are provided in Supplementary Material A1.
Table 3.
The training details of networks in our proposed CT-SMA.
| Networks | Dimension | Learning rate | Optimizer | Epochs | Dataset |
|---|---|---|---|---|---|
| SPSNet | 2.5D | 0.0002 | Adam | 100 | crMR, CT |
| UNet | 2.5D | 0.0001 | Adam | 50 | CT |
| DLRM | 3D | 0.0003 | Adam | 50 | MR, CT |
UNet is used for segmentation of bone components in the refined registration submodule.
The proposed CT-SMA framework performs CT-only inference and does not require MR data or registration at deployment. On a workstation equipped with an Intel Core i5-13600 CPU and an NVIDIA RTX 4090 GPU, the average end-to-end inference time is approximately 2.3 s per subject (GPU inference). Additional computational and deployment details are provided in Supplementary Material A2.
Registration accuracy was evaluated using target registration error (TRE) computed on corresponding anatomical landmarks in MR and CT images. Regression performance was assessed using intraclass correlation coefficients (ICCs) for trabecular parameters, including BV/TV, Tb.Th, Tb.Sp, and Tb.N. For osteoarthritis classification, precision, recall, F1-score, and area under the ROC curve (AUC) with 95% confidence intervals were reported. Statistical significance between methods was assessed using paired t-tests with a 5% significance level.
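For reference, a NumPy sketch of ICC(2,1) (two-way random effects, absolute agreement, single measurement; an assumed variant, since the paper does not state which ICC form was used) is given below.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_raters) array, e.g. CT-regressed
    vs. MR-derived values of one trabecular parameter per subject."""
    x = np.asarray(ratings, float)
    n, k = x.shape
    grand = x.mean()
    msr = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)  # between subjects
    msc = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)  # between raters
    sse = np.sum((x - grand) ** 2) - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Unlike plain correlation, this form penalizes systematic offsets between the two measurement sources, which matters when comparing CT-regressed values against MR-derived references.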
4.3. Results for knee MR-CT registration
4.3.1. MR-to-CT synthesis
We first evaluate the performance of MR-to-CT synthesis using SPSNet. As the synthesis submodule aims to perform modality transformation while retaining anatomical structures, we begin by comparing the anatomical similarity between sCT and crMR, and the consistency of intensity distributions between sCT and CT. The qualitative synthesis results are shown in Figure 5, which compares our SPSNet with CycleGAN (31) and MUNIT (30). CycleGAN fails to preserve the anatomical structures of crMR, and both MUNIT and CycleGAN introduce noticeable distortions in the femur region of sCT. In contrast, our proposed SPSNet preserves anatomical structures more faithfully while successfully accomplishing the modality transformation from crMR.
Figure 5.
Comparison of crMR-to-sCT synthesis using different methods. Each row displays the synthesis results of a sampled slice from the test dataset. The columns, arranged from left to right, represent CT images, sCT images by MUNIT (30), sCT images by CycleGAN (31), sCT images by our proposed SPSNet and the original crMR images, respectively. crMR = coarsely-registered MR, sCT = synthesized CT.
To further quantify the anatomical structural similarity between sCT and crMR, the TRE of landmarks is tabulated in Table 4. A significantly lower TRE (2.93 ± 1.75 mm) is observed on sCT images generated by our proposed SPSNet than on those generated by CycleGAN (6.70 ± 4.22 mm) or MUNIT (4.03 ± 2.81 mm).
Table 4.
The TRE comparison for the MR-to-CT synthesis and the overall registration.
| MR-to-CT Synthesis (crMR–sCT) | | | |
|---|---|---|---|
| Methods | CycleGAN (31) | MUNIT (30) | SPSNet (ours) |
| TRE/mm | 6.70 ± 4.22 (95% CI: 5.78–7.62) | 4.03 ± 2.81 (95% CI: 3.41–4.65) | **2.93 ± 1.75** (95% CI: 2.55–3.31) |
| Overall Registration (MR–CT) | | | |
| Methods | ANTs (32) | ANTs (32) + SPSNet | ANTs (32) + SPSNet + RRS (ours) |
| TRE/mm | 13.56 ± 8.41 (95% CI: 11.72–15.40) | 8.06 ± 4.98 (95% CI: 6.97–9.15) | **3.67 ± 2.43** (95% CI: 3.14–4.20) |
RRS, refined registration submodule. TRE values are reported as mean ± standard deviation, with 95% confidence intervals shown in parentheses. In the synthesis process, a lower TRE indicates a smaller spatial gap between the same landmark in the synthesized and real images. For the overall registration, lower TREs indicate better registration results.
Bold values indicate the lowest registration error in each row.
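The TRE values above are simply distances between corresponding landmarks after registration. A minimal sketch, assuming the landmarks are given as N × 3 arrays of physical coordinates in millimeters:

```python
import numpy as np

def target_registration_error(fixed_pts, moved_pts):
    """Per-landmark Euclidean distances (mm) between corresponding points
    in the fixed (CT) and registered (MR or sCT) images.
    Returns (mean, std) over all landmarks."""
    fixed = np.asarray(fixed_pts, dtype=float)
    moved = np.asarray(moved_pts, dtype=float)
    d = np.linalg.norm(fixed - moved, axis=1)  # one distance per landmark
    return d.mean(), d.std()
```

For example, two landmarks displaced by 3 mm and 4 mm respectively give a mean TRE of 3.5 mm.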
This improvement can be attributed to the design of SPSNet, which explicitly constrains structural consistency during cross-modal synthesis. By preserving the physical locations of anatomical structures, SPSNet effectively reduces the multi-modal registration problem to a mono-modal registration between synthesized CT and real CT, thereby simplifying subsequent alignment and improving robustness.
4.3.2. Overall MR-to-CT registration
Building upon the synthesized CT images, we evaluate the overall MR-to-CT registration performance, which aims to align the two modalities under simultaneous differences in resolution, intensity distribution, and knee-bent angles. By converting the original multi-modal registration problem into a mono-modal registration between synthesized CT and real CT, the synthesis submodule provides a more favorable initialization for subsequent alignment.
Qualitative comparisons of the overall registration are shown in Figure 6. While the tibia region is well registered in crMR, unsatisfactory registration is observed in the femur due to the angle differences between the femur and tibia. The proposed refined registration submodule (RRS) addresses this issue by separately registering the femur and tibia regions, enabling more accurate compensation for relative angular discrepancies and yielding improved alignment in the refined registered MR (rrMR) images.
Figure 6.
Comparisons of different registration methods. Each row shows the registration results of a sampled slice from the test dataset. From left to right, the columns are sequentially real MR images, real CT images, coarsely-registered MR (crMR) images by ANTs (rigid only) (32), refined-registered MR images (rrMR), and the visualization of registration performance between rrMR and real CT. In the visualization images, the high signals of the registered CT images are colored yellow and overlaid onto the corresponding rrMR images, showing satisfactory structural alignment between the rrMR and CT images.
Quantitative ablation results using target registration error (TRE) are also summarized in Table 4. Compared to the coarse registration by ANTs (32), SPSNet achieves modality transformation and mitigates misalignment caused by intensity distribution differences, yielding a lower TRE of 8.06 ± 4.98 mm (vs. 13.56 ± 8.41 mm). Furthermore, by explicitly modeling knee-bent angle variations through region-wise rigid registration, the refined registration submodule further reduces TRE to 3.67 ± 2.43 mm (P < .001), demonstrating its effectiveness in handling joint-specific geometric inconsistencies.
4.4. Results for distillation-learning-based regression (DLRM)
The regression module aims to estimate trabecular microstructural parameters from CT image patches. Quantitative results, summarized in Table 5, report the intraclass correlation coefficients (ICCs) between the regressed parameters and the corresponding ground truth derived from high-resolution MR images.
Table 5.
ICC comparison of trabecular parameter regression results.
| Trabecular parameter | Radiomic-feature-based model | CNN regression model | Distillation-learning-based CNN regression model |
|---|---|---|---|
| BV/TV | 0.548 ± 0.011 | 0.670 ± 0.053 | 0.804 ± 0.037 |
| Tb.Th | 0.407 ± 0.032 | 0.488 ± 0.038 | 0.773 ± 0.042 |
| Tb.Sp | 0.356 ± 0.017 | 0.523 ± 0.054 | 0.711 ± 0.063 |
| Tb.N | 0.326 ± 0.036 | 0.502 ± 0.085 | 0.622 ± 0.133 |
| Overall | 0.409 ± 0.026 | 0.545 ± 0.059 | 0.742 ± 0.046 |
The compared regression methods include the radiomic-feature-based model, CNN regression model, and the proposed distillation-learning-based CNN regression model. The trabecular parameters that were regressed are BV/TV, Tb.Th, Tb.Sp, and Tb.N. BV/TV, bone volume/total volume; Tb.Th, trabecular thickness; Tb.Sp, trabecular separation; Tb.N, trabecular number.
Compared with radiomic-feature-based regression (overall ICC: 0.409 ± 0.026), CNN-based models achieve notably higher ICCs (0.545 ± 0.059), indicating the advantage of learning hierarchical representations directly from image patches. However, directly applying mono-modal CNN regression to CT remains challenging due to the limited spatial resolution of conventional CT and the substantial appearance discrepancy between CT and MR, from which the ground-truth parameters are computed.
By incorporating the proposed distillation learning strategy, the CT-based student model is guided by a well pre-trained MR-based teacher model, effectively transferring microstructural knowledge encoded in high-resolution MR patches. As shown in Figure 7, this cross-modal supervision substantially improves regression performance, yielding a mean ICC of 0.742 ± 0.046 (P < .001). Importantly, although MR information is exploited during training, the resulting model operates exclusively on CT data at inference time, demonstrating the feasibility of CT-based subchondral bone microstructural analysis despite pronounced modality and resolution differences.
Figure 7.
Scatter plots of trabecular parameter regression results. Both the regression results from the teacher (left) and student (right) models with four plots representing BV/TV, Tb.Th, Tb.Sp, and Tb.N are presented. Each blue point represents a regression result from a single CT patch in the trabecular section of the knee of one patient. BV/TV, bone volume/total volume; Tb.Th, trabecular thickness; Tb.Sp, trabecular separation; Tb.N, trabecular number.
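The feature-map supervision used by the DLRM can be sketched as a combined objective: a regression loss on the trabecular parameters plus a feature-matching term against the frozen MR teacher. The plain MSE form of the feature-matching term and the weighting λ are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, student_pred, target, lam=0.5):
    """Hypothetical combined training objective for the CT student:
    - `reg`: MSE between predicted and MR-derived trabecular parameters;
    - `feat`: MSE pulling student (CT) feature maps toward the frozen
      MR teacher's feature maps at corresponding patches.
    `lam` balances the two terms (assumed value)."""
    reg = np.mean((np.asarray(student_pred) - np.asarray(target)) ** 2)
    feat = np.mean((np.asarray(student_feat) - np.asarray(teacher_feat)) ** 2)
    return reg + lam * feat
```

At inference, only the student branch is evaluated on CT patches; the teacher and the feature-matching term are used during training only.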
To evaluate sensitivity to residual MR–CT misalignment, we performed a controlled perturbation analysis by introducing synthetic translation errors (1–15 mm) into MR patch sampling coordinates. Regression agreement (ICC) showed minimal degradation under small perturbations (1–2 mm), moderate decline at 5 mm, and substantial deterioration when misalignment approached or exceeded the patch extent (10–15 mm), indicating graceful degradation under anatomically plausible residual errors (see Supplementary Section B.1).
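The perturbation protocol can be reproduced with a sketch like the following, which displaces each MR patch-sampling coordinate by a fixed magnitude in a random direction. The isotropic-direction model and the 1 mm default spacing are assumptions for illustration:

```python
import numpy as np

def perturb_coords(coords, magnitude_mm, spacing_mm=1.0, rng=None):
    """Shift each patch-sampling coordinate by `magnitude_mm` in a random
    3D direction, converted to voxel units via the image spacing.
    coords: (N, 3) array of sampling centers in voxels."""
    rng = rng or np.random.default_rng(0)
    coords = np.asarray(coords, dtype=float)
    direction = rng.normal(size=coords.shape)
    direction /= np.linalg.norm(direction, axis=-1, keepdims=True)  # unit vectors
    return coords + direction * (magnitude_mm / spacing_mm)
```

Re-running the regression with coordinates perturbed at 1, 2, 5, 10, and 15 mm and recomputing the ICC then traces the degradation curve described above.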
4.5. KOA analysis based on regressed trabecular parameters
Following patch-level trabecular parameter regression, we further evaluated the proposed CT-SMA framework in patient-level knee osteoarthritis (KOA) analysis. Regressed trabecular parameters obtained from different regression methods were adopted as input features for two clinically relevant tasks: KL-grade-based KOA classification and MOAKS score prediction.
The trabecular regions of the femur and tibia were first automatically segmented using a pre-trained 3D U-Net model following (33). The resulting bone masks were subsequently reviewed and manually refined by a radiologist with 15 years of experience to ensure anatomical accuracy. Patch sampling for the DLRM module was restricted to these validated trabecular regions. Consistent with prior work (34), each trabecular region of the femur and tibia was partitioned into K = 100 anatomically coherent subregions using maskSLIC (35). For each subregion, trabecular parameters regressed from the central patch were extracted and aggregated as patient-level descriptors. To avoid excessive redundancy, the CT patch size was set to 12 × 12 × 12 voxels (approximately 11.74 × 11.74 × 12 mm³), which ensures that the intersection-over-union (IoU) between any two sampled CT patches remains below 0.5 within each subject. To investigate the impact of different aggregation strategies, we compared three types of patient-level models: a Multi-Layer Perceptron (MLP) (36), a CNN-based aggregation model, and a Transformer-based aggregation model.
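The non-redundancy constraint on sampled patches can be checked with a closed-form IoU for two axis-aligned cubes; the 12-voxel edge matches the patch size above, and the minimum-corner coordinate convention is an assumption:

```python
import numpy as np

def patch_iou(corner_a, corner_b, size=12):
    """IoU of two axis-aligned cubic patches of equal edge length `size`,
    given their minimum-corner coordinates in voxels. The overlap along
    each axis is max(0, size - |delta|); the product gives the
    intersection volume of the two cubes."""
    diff = np.abs(np.asarray(corner_a, float) - np.asarray(corner_b, float))
    overlap = np.clip(size - diff, 0.0, None).prod()
    vol = float(size) ** 3
    return overlap / (2.0 * vol - overlap)  # intersection / union
```

For instance, two 12-voxel patches offset by 6 voxels along one axis share half their extent on that axis, giving an IoU of 1/3, which already satisfies the < 0.5 criterion.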
The MLP serves as a lightweight baseline commonly adopted under limited patient-level supervision. The CNN-based model introduces local interaction among subregions but remains constrained by fixed receptive fields. In contrast, the Transformer-based model explicitly models long-range dependencies among spatially distributed trabecular subregions through self-attention, enabling more effective reasoning over global microstructural patterns within the joint.
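The long-range reasoning of the Transformer-based aggregation can be illustrated with a single self-attention layer over the K subregion descriptors; the single-head form and the descriptor dimension are simplifying assumptions:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over K subregion descriptors x: (K, d).
    Each output row is a data-dependent weighted mixture over all K rows,
    so every subregion can attend to spatially distant trabecular regions."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (K, K) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax rows sum to 1
    return attn @ v
```

In contrast, a CNN over the same K descriptors mixes only a fixed local neighborhood per layer, which matches the receptive-field limitation noted above.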
Quantitative results for OA vs. non-OA classification are summarized in Table 6, with ROC curves and scatter plots illustrated in Figure 8. For KL-grade-based (OA vs. non-OA) classification, CT-SMA achieves a significantly higher AUC compared to radiomic-based methods (P < .001) and vanilla CNN-based regression methods (P < .001). This performance gain can be attributed to the improved quality of patch-level trabecular features produced by the distillation-guided regression module, which captures microstructural patterns that are more discriminative for disease staging. To further assess staging capability, we additionally evaluated the model for OA severity stratification (Mild OA: KL = 1–2 vs. Advanced OA: KL = 3–4) using the existing predictions. The corresponding AUC and F1-score results are provided in the Supplementary Materials Section B.3.
Table 6.
The statistical analysis of KL-grade-based KOA classification and MOAKS score prediction using trabecular parameters from different regression methods.
| Methods | Metrics | Radiomic-based (CT) | CNN (CT) | CT-SMA (CT) | CNN (MR) |
|---|---|---|---|---|---|
| KL-grade-based KOA classification | Precision | 0.689 ± 0.078 | 0.745 ± 0.041 | 0.863 ± 0.053 | 0.887 ± 0.061 |
| Recall | 0.773 ± 0.045 | 0.794 ± 0.047 | 0.903 ± 0.068 | 0.935 ± 0.037 | |
| F1 score | 0.738 ± 0.057 | 0.763 ± 0.040 | 0.879 ± 0.056 | 0.910 ± 0.044 | |
| AUC | 0.742 ± 0.032 | 0.778 ± 0.044 | 0.883 ± 0.029 | 0.914 ± 0.048 | |
| MOAKS score prediction | Mean Average Error | 1.438 ± 0.153 | 1.059 ± 0.072 | 0.893 ± 0.041 | 0.876 ± 0.038 |
| R-Square | 0.723 ± 0.129 | 0.790 ± 0.064 | 0.881 ± 0.037 | 0.903 ± 0.030 |
The classification is implemented using regressed trabecular parameters on image patches as features. Here we compare different methods to obtain the trabecular parameters; Transformer layers are employed for aggregation in all methods. We compare the radiomic-based method, the CNN-based method, and our proposed CT-SMA on CT images. Results of the CNN-based method on MR images are provided in the rightmost column for reference.
Bold values indicate the best performance among the CT-based methods.
Figure 8.
Results of KOA analysis based on regressed trabecular parameters. (a) presents the Receiver Operating Characteristic (ROC) curve of KL-grade-based KOA classification based on regressed trabecular parameters. The green line corresponds to the performance of a CNN on CT images, the orange line indicates the performance using CT images with the proposed CT-SMA method, and the blue line represents the performance using MR images as a reference. (b–d) are scatter plots of cartilage score predictions based on regressed trabecular bone parameters. (b) shows predictions from CT images using a CNN, (c) illustrates predictions from CT images using the proposed CT-SMA method, and (d) displays predictions from MR images using a CNN for reference.
Moreover, we compare the methods to aggregate patch-level trabecular parameters for OA analysis tasks as ablation in Table 7. Across both KL-grade-based classification and MOAKS score prediction tasks, a consistent performance ranking is observed among aggregation strategies, with Transformer-based aggregation outperforming CNN-based aggregation, which in turn surpasses the MLP baseline. This trend indicates that modeling global inter-subregion relationships is critical for accurate KOA assessment, as disease-related trabecular alterations often manifest as coordinated changes across multiple anatomical locations rather than isolated local abnormalities. These results further demonstrate that, beyond accurate patch-level regression, the choice of patient-level aggregation model plays a decisive role in translating microstructural predictions into clinically meaningful KOA outcomes.
Table 7.
The statistical analysis of KL-grade-based KOA classification and MOAKS score prediction using different aggregation methods for patch-level trabecular parameters.
| Methods | Metrics | MLP (CT) | CNN (CT) | Transformers (CT) | Transformers (MR) |
|---|---|---|---|---|---|
| KL-grade-based KOA classification | Precision | 0.843 ± 0.058 | 0.846 ± 0.051 | 0.863 ± 0.053 | 0.887 ± 0.061 |
| Recall | 0.836 ± 0.074 | 0.899 ± 0.065 | 0.903 ± 0.068 | 0.935 ± 0.037 | |
| F1 score | 0.844 ± 0.067 | 0.868 ± 0.050 | 0.879 ± 0.056 | 0.910 ± 0.044 | |
| AUC | 0.742 ± 0.032 | 0.872 ± 0.033 | 0.883 ± 0.029 | 0.914 ± 0.048 | |
| MOAKS score prediction | Mean Average Error | 1.076 ± 0.146 | 0.945 ± 0.052 | 0.893 ± 0.041 | 0.876 ± 0.038 |
| R-Square | 0.837 ± 0.062 | 0.866 ± 0.044 | 0.881 ± 0.037 | 0.903 ± 0.030 |
The classification is implemented using regressed trabecular parameters on image patches as features. Here we compare different methods to aggregate the regressed trabecular parameters. Our CT-SMA employs Transformer layers for aggregation. The first four rows are results of KL-grade-based KOA classification.
Bold values indicate the best performance among the CT-based methods.
Although a performance gap remains between CT-SMA and MR-based analysis (P = .016), likely due to the inevitable information loss caused by the resolution discrepancy between CT and high-resolution MR, CT-SMA attains an AUC of 0.883 (95% CI, 0.854–0.912). This result highlights the strong potential of CT-based KOA diagnosis, especially considering the lower cost and broader availability of CT imaging in clinical practice. Furthermore, for MOAKS score prediction, CT-SMA achieves an R-square value of 0.881 ± 0.037, demonstrating performance comparable to MR-based methods (P = .143).
5. Conclusions and discussions
In this paper, we propose a distillation-learning-based method, named CT-SMA, for trabecular parameter regression, and investigate the feasibility of CT-based subchondral bone microstructural analysis. The proposed CT-SMA framework is composed of three tightly integrated modules that jointly address cross-modal correspondence construction, microstructural knowledge transfer, and patient-level osteoarthritis modeling. Specifically, CT-SMA consists of (1) a synthesis-based MR–CT registration module for establishing anatomically consistent cross-modal patch correspondences, (2) a distillation-learning-based regression module (DLRM) that transfers high-resolution trabecular knowledge from MR to CT, and (3) a Transformer-based aggregation module that integrates spatially distributed trabecular parameters for patient-level OA prediction.
Experimental results substantiate the functional role of each module within CT-SMA. The synthesis-based registration module enables stable and anatomically meaningful MR–CT alignment, which is essential for constructing reliable cross-modal patch pairs and avoiding error propagation in subsequent learning stages. Built upon these correspondences, the DLRM consistently improves the agreement between CT-predicted trabecular parameters and MR-derived references, as reflected by substantially increased intraclass correlation coefficients. These results indicate that, despite the limited spatial resolution of clinical CT, the proposed framework can recover microstructural patterns with sufficient fidelity for quantitative analysis.
Furthermore, downstream osteoarthritis prediction experiments indicate that CT-based trabecular parameters regressed by CT-SMA preserve sufficient disease-discriminative information, yielding patient-level performance comparable to MR-based analysis while requiring only CT at inference. This demonstrates that the transferred microstructural knowledge is both quantitatively consistent and effective for joint-level OA modeling, supporting the applicability of CT-SMA in large-scale population studies.
In addition, we evaluated the computational efficiency of the deployed model. On a standard GPU workstation, the average end-to-end inference time was approximately 2–3 s per subject, and the model size remains within a lightweight deployment range. These results suggest that CT-SMA can be integrated into routine CT-based analysis pipelines without substantial computational burden. Although prospective validation and full integration into clinical workflows (e.g., PACS systems or inter-reader studies) were beyond the scope of this retrospective study, the computational feasibility observed here supports the potential for future real-world deployment and workflow integration.
More broadly, the present study demonstrates that CT-based assessment of subchondral microstructure can be reformulated as a structured cross-modal learning problem. By constructing anatomically aligned correspondences and transferring morphometric knowledge from MR to CT at the patch level, CT-SMA bridges the gap between routine CT accessibility and high-resolution structural characterization. This system-level integration enables CT to serve not merely as a coarse anatomical modality, but as a source of quantitatively meaningful structural biomarkers for osteoarthritis modeling.
The limitations of this study and potential remedies are listed as follows: First, the cohort was derived from a single institution with a consistent scanner configuration and a relatively homogeneous population, which may introduce acquisition-specific and selection bias and limit generalizability. Although the cohort size (n = 80) is comparable to several prior imaging studies of trabecular microstructure (2, 3, 10), a larger cohort would provide greater statistical power for subgroup analysis and more robust estimation of model performance. Scanner-dependent imaging characteristics and population-specific baseline bone properties may influence structural feature distributions under domain shifts. Future multi-center validation across diverse scanners and populations will be necessary to assess robustness and improve external generalizability. Second, although the selected trabecular morphometric parameters (BV/TV, Tb.Th, Tb.Sp, and Tb.N) are widely adopted structural descriptors associated with OA severity, they capture only part of the complex pathological processes involved in KOA. Additional topology-based measures (37, 38) or complementary clinical and imaging variables (39) may further enhance phenotyping. Future extensions may incorporate broader structural or multimodal descriptors to provide more comprehensive disease characterization. Finally, MR-derived morphometric parameters serve as established in vivo structural surrogates but do not represent direct histological or micro-CT ground truth. The present study therefore evaluates CT-SMA relative to MR-equivalent biomarkers rather than micron-scale trabecular architecture. Dedicated ex vivo validation, biomechanical correlation studies, or longitudinal outcome-based assessments may further strengthen biological interpretability.
Funding Statement
The author(s) declared that financial support was not received for this work and/or its publication.
Footnotes
Edited by: Francesca Angelone, University of Naples Federico II, Italy
Reviewed by: Ye Wu, Nanjing University of Science and Technology, China
Youngjun Lee, Indiana University Purdue University, United States
Abbreviations KOA, knee osteoarthritis; TRE, target registration error; ICC, intraclass correlation coefficient; BV/TV, bone volume/total volume; Tb.Th, trabecular thickness; Tb.Sp, trabecular separation; Tb.N, trabecular number; CT-SMA, CT-based subchondral microstructural analysis; KL, Kellgren-Lawrence; AUC, area under the receiver operating characteristic curve.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Ethics statement
The studies involving humans were approved by Shanghai Ninth People's Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
YH: Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. XZ: Conceptualization, Validation, Writing – review & editing. GQ: Data curation, Formal analysis, Writing – review & editing. KX: Formal analysis, Investigation, Project administration, Supervision, Writing – review & editing. CL: Data curation, Project administration, Resources, Writing – review & editing. LZ: Conceptualization, Investigation, Project administration, Resources, Writing – review & editing.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fradi.2026.1798348/full#supplementary-material
References
- 1.Whittaker JL, Truong LK, Dhiman K, Beck C. Osteoarthritis year in review 2020: rehabilitation and outcomes. Osteoarthritis Cartilage. (2021) 29:190–207. 10.1016/j.joca.2020.10.005 [DOI] [PubMed] [Google Scholar]
- 2.Liu C, Liu C, Ren X, Si L, Shen H, Wang Q, et al. Quantitative evaluation of subchondral bone microarchitecture in knee osteoarthritis using 3 T MRI. BMC Musculoskelet Disord. (2017) 18:1–9. 10.1186/s12891-017-1865-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Xue Z, Wang L, Sun Q, Xu J, Liu Y, Ai S, et al. Radiomics analysis using MR imaging of subchondral bone for identification of knee osteoarthritis. J Orthop Surg Res. (2022) 17:1–11. 10.1186/s13018-022-03314-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Pishgar F, Guermazi A, Roemer FW, Link TM, Demehri S. Conventional MRI-based subchondral trabecular biomarkers as predictors of knee osteoarthritis progression: data from the osteoarthritis initiative. Eur Radiol. (2021) 31:3564–73. 10.1007/s00330-020-07512-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Holzer L, Kraiger M, Talakic E, Fritz G, Avian A, Hofmeister A, et al. Microstructural analysis of subchondral bone in knee osteoarthritis. Osteoporos Int. (2020) 31:2037–45. 10.1007/s00198-020-05461-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Tan C, Li K, Yan Z, Yi J, Wu P, Yu HJ, et al. Towards large-scale MR thigh image analysis via an integrated quantification framework. Neurocomputing. (2017) 229:63–76. 10.1016/j.neucom.2016.05.108 [DOI] [Google Scholar]
- 7.Wessman BV, Moriarity AK, Ametlli V, Kastan DJ. Reducing barriers to timely MR imaging scheduling. Radiographics. (2014) 34:2064–70. 10.1148/rg.347140025 [DOI] [PubMed] [Google Scholar]
- 8.Adams JE. Advances in bone imaging for osteoporosis. Nat Rev Endocrinol. (2013) 9:28–42. 10.1038/nrendo.2012.217 [DOI] [PubMed] [Google Scholar]
- 9.Lowitz T, Museyko O, Bousson V, Chappard C, Laouisset L, Laredo JD, et al. Advanced knee structure analysis (AKSA): a comparison of bone mineral density and trabecular texture measurements using computed tomography and high-resolution peripheral quantitative computed tomography of human knee cadavers. Arthritis Res Ther. (2017) 19:1–11. 10.1186/s13075-016-1210-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Burghardt AJ, Link TM, Majumdar S. High-resolution computed tomography for clinical imaging of bone microarchitecture. Clin Orthop Relat Res. (2011) 469:2179–93. 10.1007/s11999-010-1766-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Graeff C, Campbell GM, Peña J, Borggrefe J, Padhi D, Kaufman A, et al. Administration of romosozumab improves vertebral trabecular and cortical bone as assessed with quantitative computed tomography and finite element analysis. Bone. (2015) 81:364–9. 10.1016/j.bone.2015.07.036 [DOI] [PubMed] [Google Scholar]
- 12.Eckstein F, Guermazi A, Gold G, Duryea J, Hellio Le Graverand MP, Wirth W, et al. Imaging of cartilage and bone: promises and pitfalls in clinical trials of osteoarthritis. Osteoarthritis Cartilage. (2014) 22:1516–32. 10.1016/j.joca.2014.06.023 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Li K, Yu L, Wang S, Heng P-A. Towards cross-modality medical image segmentation with online mutual knowledge distillation. Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI Press (2020). p. 775–83 [Google Scholar]
- 14.Huang W, Xu Z, Wan R, Yang X, Zhang B. KD-KI: knowledge distillation with knowledge infusion for anomaly detection and localization. Neurocomputing. (2025) 656:131423. 10.1016/j.neucom.2025.131423 [DOI] [Google Scholar]
- 15.Stachowiak GW, Wolski M, Woloszynski T, Podsiadlo P. Detection and prediction of osteoarthritis in knee and hand joints based on the x-ray image analysis. Biosurf Biotribol. (2016) 2:162–72. 10.1016/j.bsbt.2016.11.004 [DOI] [Google Scholar]
- 16.Woloszynski T, Podsiadlo P, Stachowiak G, Kurzynski M, Lohmander L, Englund M. Prediction of progression of radiographic knee osteoarthritis using tibial trabecular bone texture. Arthritis Rheum. (2012) 64:688–95. 10.1002/art.33410 [DOI] [PubMed] [Google Scholar]
- 17.Jung JY, Yoon YC, Kim HR, Choe BK, Wang JH, Jung JY. Knee derangements: comparison of isotropic 3D fast spin-echo, isotropic 3D balanced fast field-echo, and conventional 2D fast spin-echo MR imaging. Radiology. (2013) 268:802–13. 10.1148/radiol.13121990 [DOI] [PubMed] [Google Scholar]
- 18.Baum T, Carballido-Gamio J, Huber M, Müller D, Monetti R, Räth C, et al. Automated 3D trabecular bone structure analysis of the proximal femur—prediction of biomechanical strength by CT and DXA. Osteoporos Int. (2010) 21:1553–64. 10.1007/s00198-009-1090-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Khovanova NA, Shaikhina T, Mallick KK. Neural networks for analysis of trabecular bone in osteoarthritis. Bioinspired Biomimetic Nanobiomater. (2015) 4:90–100. 10.1680/bbn.14.00006 [DOI] [Google Scholar]
- 20.Darzi F, Bocklitz T. A review of medical image registration for different modalities. Bioengineering. (2024) 11:786. 10.3390/bioengineering11080786 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Song X, Chao H, Xu X, Guo H, Xu S, Turkbey B, et al. Cross-modal attention for multi-modal image registration. Med Image Anal. (2022) 82:102612. 10.1016/j.media.2022.102612 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Lian C, Li X, Kong L, Wang J, Zhang W, Huang X, et al. Cocyclereg: collaborative cycle-consistency method for multi-modal medical image registration. Neurocomputing. (2022) 500:799–808. 10.1016/j.neucom.2022.05.113 [DOI] [Google Scholar]
- 23.Kuiper RJ, Van Stralen M, Sakkers RJ, Bergmans RH, Zijlstra F, Viergever MA, et al. CT To MR registration of complex deformations in the knee joint through dual quaternion interpolation of rigid transforms. Phys Med Biol. (2021) 66:175024. 10.1088/1361-6560/ac1769 [DOI] [PubMed] [Google Scholar]
- 24.Murtha P, Watterson N, Nikou C, Jaramaz B. Accuracy of ultrasound to MR registration of the knee. Int J Med Robot Comput Assist Surg. (2008) 4:51–7. 10.1002/rcs.170 [DOI] [PubMed] [Google Scholar]
- 25.Zhang X, Han L, Davies S, Sobeih T, Han L, Dancey D. A novel energy-efficient spike transformer network for depth estimation from event cameras via cross-modality knowledge distillation. Neurocomputing. (2025) 658:131745. 10.1016/j.neucom.2025.131745 [DOI] [Google Scholar]
- 26.Liu W, Hou J, Zhong X, Zhan H, Cheng J, Fang Y, et al. Improving multi-modal brain tumor segmentation via pre-training and knowledge distillation based post-training. Neurocomputing. (2025) 640:130318. 10.1016/j.neucom.2025.130318 [DOI] [Google Scholar]
- 27.Dou Q, Liu Q, Heng PA, Glocker B. Unpaired multi-modal segmentation via knowledge distillation. IEEE Trans Med Imaging. (2020) 39:2415–25. 10.1109/TMI.2019.2963882 [DOI] [PubMed] [Google Scholar]
- 28.Kwak MG, Mao L, Zheng Z, Su Y, Lure F, Li J. A cross-modal mutual knowledge distillation framework for Alzheimer’s disease diagnosis: addressing incomplete modalities. IEEE Trans Autom Sci Eng. (2025) 22:14218–33. 10.1109/TASE.2025.3556290 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Proceedings, Part III. Munich: Springer (2015). p. 234–41 [Google Scholar]
- 30.Huang X, Liu MY, Belongie S, Kautz J. Multimodal unsupervised image-to-image translation. Proceedings of the European Conference on Computer Vision (ECCV). Munich: Springer (2018). p. 172–89. [Google Scholar]
- 31.Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE (2017). p. 2223–32. [Google Scholar]
- 32.Avants BB, Tustison N, Song G. Advanced normalization tools (ANTS). Insight J. (2009) 2:1–35. 10.54294/uvnhin [DOI] [Google Scholar]
- 33.Marsilio L, Moglia A, Rossi M, Manzotti A, Mainardi L, Cerveri P. Combined edge loss UNet for optimized segmentation in total knee arthroplasty preoperative planning. Bioengineering. (2023) 10:1433. 10.3390/bioengineering10121433 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Bayramoglu N, Tiulpin A, Hirvasniemi J, Nieminen MT, Saarakkala S. Adaptive segmentation of knee radiographs for selecting the optimal ROI in texture analysis. Osteoarthritis Cartilage. (2020) 28:941–52. 10.1016/j.joca.2020.03.006 [DOI] [PubMed] [Google Scholar]
- 35.Irving B, maskSLIC: regional superpixel generation with application to local pathology characterisation in medical images, arXiv preprint arXiv:1606.09518. (2016).
- 36.Haykin S. Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice Hall PTR; (1998). [Google Scholar]
- 37.Cottam WJ, Iwabuchi SJ, Drabek MM, Reckziegel D, Auer DP. Altered connectivity of the right anterior insula drives the pain connectome changes in chronic knee osteoarthritis. Pain. (2018) 159:929–38. 10.1097/j.pain.0000000000001209 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Mao C, Yang H, Dong T, Wang S, Shi Z, Guo R, et al. Thalamocortical dysconnectivity is associated with pain in patients with knee osteoarthritis. Eur J Neurosci. (2024) 60:5831–48. 10.1111/ejn.16531 [DOI] [PubMed] [Google Scholar]
- 39.Yusuf E, Kortekaas MC, Watt I, Huizinga TW, Kloppenburg M. Do knee abnormalities visualised on MRI explain knee pain in knee osteoarthritis? A systematic review. Ann Rheum Dis. (2011) 70:60–7. 10.1136/ard.2010.131904 [DOI] [PubMed] [Google Scholar]
Associated Data
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.