ABSTRACT
Background
Accurately evaluating the radiological features of facet joint osteoarthritis (FJOA) may help to elucidate its relationship with pain. Multitask deep learning (DL) models have emerged as promising tools for this purpose.
Materials and Methods
This retrospective study employed a dataset of 13 223 axial CT facet joint (FJ) patches cropped from 1360 patients across two hospitals. At the image level, the dataset was divided into a training dataset (n = 7430), a validation dataset (n = 2000), an internal test dataset (n = 1890), and an external test dataset (n = 1903). The radiological features of FJOA were qualitatively assessed using a multitask DL model based on ResNet‐18, following the FJOA grading guidelines proposed by Weishaupt et al. Two batches of images from each of the internal and external test datasets were used to test the change in readers' assessment accuracy with and without DL assistance, as measured using a paired t test.
Results
In this study, the accuracy of the model on the internal and external test datasets was 89.8% and 76.6% for joint space narrowing (JSN), 79.6% and 80.2% for osteophytes, 65.5% and 56.0% for hypertrophy, 88.0% and 89.6% for subchondral bone erosions, and 82.8% and 89.8% for subchondral cysts, respectively. The model's Gwet κ values reached up to 0.88. When junior readers used the DL model for assistance, their accuracy improved significantly (p values ranged from < 0.001 to 0.043).
Conclusion
A multitask DL model is a viable method for assessing the severity of radiological features in FJOA, offering support to readers during image evaluation.
Keywords: CT, deep learning, facet joint osteoarthritis, radiological feature
To develop a multitask deep learning (DL) model for assessing the severity of five radiological features of facet joint osteoarthritis (FJOA) on lumbar spine CT and evaluate the performance of readers with DL assistance.

1. Introduction
Facet joints (FJs), as core components of the vertebral “three‐joint complex,” play critical roles in facilitating spinal movement, supporting axial loads, and limiting excessive rotation and displacement [1]. Similar to other synovial joints, degenerative changes in FJs primarily manifest as facet joint osteoarthritis (FJOA) [2]. Weishaupt et al. described the typical radiological features of FJOA, including joint space narrowing (JSN), osteophytes, hypertrophy, subchondral bone erosion, and subchondral cysts [3]. These structural lesions are mainly evaluated through conventional CT examinations [4, 5]. Previous studies have often constructed 0–3 grade composite scoring scales based on single or combined manifestations of these features to explore associations between FJOA and pain, intervertebral disc degeneration, and spine stability [1, 6, 7]. However, the results are often contradictory: For example, although FJOA has been confirmed as an important cause of low back pain [2, 8], whether more significant pain symptoms necessarily accompany high‐grade FJOA remains controversial; it is still unclear whether spinal degeneration starts with FJOA or intervertebral disc degeneration [6]; and although facet tropism is a risk factor for spinal instability [9], whether FJOA promotes or protects against instability has not yet been determined.
The root cause of these contradictions may lie in the limitations of FJOA composite scoring scales. On one hand, FJOA of the same grade may correspond to distinctly different structural changes: grade 3 FJOA in different patients may only present as subchondral cysts, or may simultaneously involve JSN, severe osteophytes, and severe hypertrophy without subchondral cysts. Such scales that incorporate an excessive number of features directly undermine the reliability of studies investigating the correlations between FJOA grades and pain or intervertebral disc degeneration. On the other hand, FJ osteophytes and hypertrophy exhibit dual effects: They may induce pain by causing spinal canal stenosis, lateral recess stenosis, and foraminal stenosis [10], yet also serve as a self‐stabilizing mechanism against spinal instability [11]. Thus, accurately evaluating individual radiological features holds greater clinical value than relying solely on overall severity grading. Regrettably, manual assessment of individual features shows only poor‐to‐moderate reliability [4]. Before further exploring associations between FJOA features, spinal degeneration, and clinical outcomes, improving the accuracy and consistency of individual feature assessments is imperative.
In recent years, deep learning (DL)‐based algorithms have demonstrated significant advantages in the field of medical image analysis. They not only enable vertebral segmentation to support the detection of abnormal lesions [12] but also exhibit higher accuracy, consistency, and objectivity compared to manual assessment when identifying subtle structural changes in medical images. Previous studies have confirmed that DL can perform binary classification to identify the presence or absence of FJOA in axial lumbar MRI slices [13]. Further research has shown that DL models can automatically grade FJ degeneration in MRI with a certain margin of error [14], even when affected by the low resolution of MRI for FJs. Similarly, a DL model for semi‐automatic assessment of facet tropism has been developed and shown to be more reliable than manual assessment [15]. However, as mentioned above, accurately evaluating individual features and their severity may be more meaningful than condensing an overall FJOA grade. To our knowledge, no studies have yet explored the use of DL to quantify individual radiological features of FJOA comprehensively.
This study aims to develop a two‐stage multitask DL model based on a two‐center dataset. First, the segmentation model in the first stage will localize and segment the FJ regions in axial CT images. Subsequently, the classification model in the second stage will identify multiple radiological features of FJOA and their severity, including the presence of JSN, four‐level severity grading of osteophytes and hypertrophy, three‐level severity grading of subchondral bone erosion, and the presence of subchondral cysts. Meanwhile, gradient‐weighted class activation mapping (Grad‐CAM) will be used to visualize the focus of the DL model on abnormal regions. To further verify the practicality of this model as a clinical decision‐support tool, this study will also evaluate whether the assessment accuracy of FJOA radiological features by two junior readers (with < 5 years of experience) is improved under the guidance of the DL model.
2. Materials and Methods
The study protocol and its potential benefits were reviewed by the review boards of both participating institutions (Approval numbers: 2022 K199‐1 and IRB‐AF/SC‐04/02.0). Due to the retrospective nature of the study, informed consent from participants was deemed unnecessary. The study is registered on ClinicalTrials.gov (NCT05635006).
2.1. Datasets Collection
This retrospective study collected data from 1400 patients who underwent lumbar CT examinations for low back pain at the Anonymized hospital between January 2020 and September 2022 as internal data. Inclusion criteria limited the study to patients with low back pain undergoing lumbar CT imaging. Exclusion criteria removed 268 patients with conditions such as spondylolysis, lumbar spondylolisthesis, lumbar spine fractures, previous lumbar surgery, or severe scoliosis. External data were collected from 300 patients between November 2022 and April 2023 at the Anonymized hospital; 72 patients were excluded using the same exclusion criteria as for the internal data. Ultimately, the internal data included 1132 patients, and the external data included 228 patients.
A spine radiologist with 5 years of experience in spinal imaging (anonymized reader) selected representative CT slices at spinal levels (L1/L2, L2/L3, L3/L4, L4/L5, L5/S1) for each patient. The selection criteria required CT slices to contain bilateral complete FJs and demonstrate all possible radiological features, yielding 5660 CT slices for the internal dataset and 984 slices for the external dataset. Subsequently, the anonymized reader systematically delineated bilateral FJs in CT slices using bounding boxes. Each bounding box matched the corresponding FJ size and completely enclosed the joint (Figure 1 illustrates the specific shape of the bounding boxes). This resulted in a total of 11 320 FJ CT patches in the internal dataset. At the CT patch level, the internal dataset was randomly divided into a training set (n = 7430), a validation set (n = 2000), and a test set (n = 1890). The external dataset included 1903 FJ CT patches from 228 patients, collected using the same methodology as the internal dataset (Figure E1 shows the patient selection for both internal and external datasets).
FIGURE 1.

The bounding boxes were annotated by researchers using Pair software (Version 2.7, Shenzhen, China). The size of each bounding box was determined based on the facet joint, requiring it to enclose the inferior articular process, superior articular process, and surrounding osteophytes or other proliferative tissues in the image. Typically, the joints at the L4‐L5/L5‐S1 levels are the largest. All CT patches were subsequently resized to 320 × 320 pixels and fed into the second‐stage model.
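The per-joint cropping and resizing step can be sketched as follows; the bounding-box coordinate convention (`x_min, y_min, x_max, y_max`) and the nearest-neighbour resampling are illustrative assumptions, since the study does not specify an interpolation method:

```python
import numpy as np

def crop_and_resize(ct_slice, box, size=(320, 320)):
    """Crop a facet-joint bounding box from an axial CT slice and resample
    the patch to the fixed 320 x 320 input size of the second-stage model.
    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed)."""
    x0, y0, x1, y1 = box
    patch = ct_slice[y0:y1, x0:x1]
    # Nearest-neighbour resampling; the study's interpolation is unspecified.
    rows = np.linspace(0, patch.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, patch.shape[1] - 1, size[1]).round().astype(int)
    return patch[np.ix_(rows, cols)]

resized = crop_and_resize(np.zeros((512, 512)), (100, 120, 180, 190))
```

Because the annotated boxes already match each joint's extent, patches of different absolute sizes all arrive at the classifier on a comparable scale after this step.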
The CT scanners used in the study differed between the two hospitals. In the internal dataset, patients were primarily scanned using a uCT960+ scanner (United Imaging; Shanghai, China) with a section thickness of 1.0 mm, a section interval of 0.5 mm, and a peak tube voltage of 120 kVp. The external dataset consisted of scans acquired on an iCT256 scanner (Philips) with a slice thickness of 0.9 mm, a section interval of 0.7 mm, and a peak tube voltage of 120 kVp. All lumbar CT images from both datasets were stored in the digital imaging and communications in medicine (DICOM) format.
2.2. Dataset Labeling
To ensure unbiased assessment, readers were blinded to the CT reports. First, an anonymized reader trained in standardized FJOA reading and an anonymized reader (a spine radiologist with more than 10 years of experience reading spine images) independently assessed the following features of FJs in both internal and external datasets using the criteria developed by Weishaupt et al. [3]: JSN (defined as a joint space < 2 mm, categorized as absent/present), osteophytes and hypertrophy (each categorized as no/mild/moderate/severe), subchondral bone erosions (categorized as no/mild/severe), and subchondral cysts (categorized as absent/present). Image assessments were performed using Pair (Version 2.7, Shenzhen, China), a universal labeling software for medical images. Readers received decision support through example images from Weishaupt et al. [3]. In cases of agreement between the two readers, their grading results served as the ground truth. For disagreements, a third spinal radiologist (anonymized reader) with 8 years of experience re‐assessed the images and determined the ground truth. The Gwet κ coefficient was used to assess interreader reliability between the two initial readers for each radiological feature [16].
2.3. DL Model Implementation
Our two‐stage model aimed to qualitatively assess each radiological feature of FJOA (Figure 2). In the first stage, we addressed potential missed detections from the initial object detection model by employing nnU‐Net, a DL‐based segmentation method. This method segmented the FJ region on both sides by predicting binary masks corresponding to the previously labeled bounding boxes. Padding the masks prevented image deformation during the subsequent cropping step, in which CT patches were extracted under the masks. Since the bounding box size matched the FJ size, the cropped patches were inherently similar in size. These patches were then resized to 320 × 320 pixels and fed into the second‐stage model. Because FJOA manifests as a failure of the whole joint, the multitask DL strategy intrinsically encodes correlations among FJOA degeneration features through shared representation learning, in contrast to sequential single‐task paradigms that neglect feature interdependencies. Furthermore, the lightweight architecture is well suited for deployment in clinical workflows with limited computational resources. A pretrained ResNet‐18 model served as the backbone, extracting multiscale semantic information from the CT patches. The extracted features were fed into five parallel fully connected networks, one for each FJOA feature (JSN, osteophytes, hypertrophy, subchondral bone erosions, and subchondral cysts). The model's decision‐making process for each feature classification can be visualized using the gradient‐weighted class activation map generated after the final convolutional layer.
FIGURE 2.

The scheme of the proposed deep learning approach. First, an nnU‐Net was trained to detect and localize the facet joints on both sides. Next, we cropped the ROI of each facet joint and resized it to 320 × 320 pixels. Then, the cropped images were fed into a pretrained ResNet‐18 for radiological feature extraction. Five parallel fully connected networks were trained to qualitatively evaluate joint space narrowing (JSN), osteophytes, hypertrophy, subchondral bone erosions (SUBERO), and subchondral cysts (SUBCYST), respectively, and to output the severity of each feature.
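The shared-backbone, parallel-heads design can be sketched in a few lines. This numpy illustration stands in for the actual PyTorch model: the 512-dimensional shared feature corresponds to the pooled output of a ResNet-18 backbone, and the task names and random head weights are hypothetical shorthand:

```python
import numpy as np

# Classes per task, following the grading scheme described in the text:
# JSN absent/present; osteophytes and hypertrophy no/mild/moderate/severe;
# erosions no/mild/severe; cysts absent/present.
TASKS = {"jsn": 2, "osteophytes": 4, "hypertrophy": 4,
         "erosions": 3, "cysts": 2}

def multitask_heads(features, heads):
    """Apply five parallel linear classification heads to one shared
    feature vector, returning one logit vector per FJOA feature."""
    return {task: features @ W + b for task, (W, b) in heads.items()}

rng = np.random.default_rng(0)
# Hypothetical randomly initialized heads over a 512-d shared feature,
# the output dimension of a ResNet-18 backbone after global pooling.
heads = {t: (rng.standard_normal((512, k)) * 0.01, np.zeros(k))
         for t, k in TASKS.items()}
logits = multitask_heads(rng.standard_normal(512), heads)
```

Because all five heads read the same representation, gradients from each grading task shape one shared feature extractor, which is how the multitask design encodes interfeature correlations.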
We implemented our DL model using the PyTorch framework (Version 1.11.0) in a Python 3.7 environment with the SciPy and Pandas libraries. In the first stage, we utilized nnU‐Net's automated configuration capability to optimize critical hyperparameters, including patch size, voxel spacing, batch size, and network depth, based on dataset characteristics and GPU memory constraints. nnU‐Net computes target spacing by analyzing the spacing distribution across training images (e.g., adopting the median or 10th percentile), followed by resampling of all images to balance spatial resolution retention and computational efficiency. To ensure robust hyperparameter optimization, we implemented fivefold cross‐validation, partitioning the training data into five subsets and iteratively validating on one subset while training on the others. During this process, we fixed all hyperparameters except the learning rate, which was decayed progressively from 1e‐2 through 8e‐3, 6e‐3, 4e‐3, and 2e‐3 to a final value of 1e‐3. For optimizer selection, we first employed nnU‐Net's default SGD optimizer and then comparatively evaluated Adam and AdamW; results demonstrated SGD's superior performance for our specific task and dataset, leading to its retention. Due to GPU memory limitations, we maintained the patch size (320 × 320) and batch size as automatically determined by nnU‐Net. Additionally, the number of training epochs was reduced from the default 1000 to 100, which achieved stable convergence and sufficient performance. Correspondingly, the second‐stage ResNet‐18 network was trained using identical learning rates and batch sizes for 100 epochs. All training procedures were conducted on an Ubuntu 18.04 system with two NVIDIA TITAN RTX GPUs. Project source code is publicly available at: https://github.com/DRhattaneee/FacetJoint.
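The learning-rate decay can be expressed as a piecewise-constant schedule over the reported values; how the 100 epochs were divided among the six steps is not stated, so equal-length phases are assumed here:

```python
def lr_at_epoch(epoch, total_epochs=100,
                schedule=(1e-2, 8e-3, 6e-3, 4e-3, 2e-3, 1e-3)):
    """Piecewise-constant learning-rate decay stepping through the values
    reported in the text. Equal-length phases are an assumption, as the
    paper does not state when each decay step occurred."""
    phase = min(epoch * len(schedule) // total_epochs, len(schedule) - 1)
    return schedule[phase]
```

Under this assumption, training starts at 1e-2, reaches 4e-3 around the midpoint, and finishes the final phase at 1e-3.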
2.4. Reader Performance Assessment
Two readers (anonymized readers with 3 and 2 years of experience, respectively) independently evaluated two random batches of data (500 images each, with no duplicates) from the internal test dataset. The readers assessed the first batch without assistance from the DL model. Subsequently, following a washout period of 2 weeks to 1 month, they assessed the second batch with assistance from the DL model. Two batches of data randomly selected from the external test dataset were evaluated in the same way. Paired t tests were then conducted to compare assessment accuracy between the two rounds for each reader.
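A paired t test on matched per-feature accuracies reduces to a few lines; the sketch below assumes the pairing is over the five FJOA features (the paper does not state the pairing unit explicitly), using reader 1's internal-test accuracies from the Results as an example:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic for two matched series of accuracies (e.g., a
    reader's per-feature accuracy without vs. with DL assistance).
    Returns the t value and degrees of freedom; the p value follows from
    the t distribution with df degrees of freedom."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Reader 1, internal test dataset: five per-feature accuracies
# without and with DL assistance, as reported in the Results.
without_dl = [0.606, 0.338, 0.408, 0.888, 0.702]
with_dl = [0.820, 0.698, 0.590, 0.908, 0.784]
t, df = paired_t(without_dl, with_dl)
```

With these values, t ≈ 2.93 on 4 degrees of freedom, giving a two-sided p just under 0.05, which is consistent with the reported p = 0.043.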
2.5. Statistics
Model performance was evaluated by an anonymized reader (an information technology engineer specializing in image analysis) using both internal and external test datasets. In the first stage, the model's ability to detect FJs was assessed using Dice scores with 95% confidence intervals (CIs). For the second‐stage classification tasks (evaluating each feature of FJOA), we employed a comprehensive set of metrics: precision, accuracy, sensitivity, specificity, F1 score, confusion matrices, and receiver operating characteristic (ROC) curves. Interreader reliability and model reliability were evaluated using the Gwet κ coefficient with 95% CIs. The level of agreement was interpreted as follows: less than 0, poor; 0–0.2, slight; 0.21–0.4, fair; 0.41–0.6, moderate; 0.61–0.8, substantial; and 0.81–1, almost perfect. To understand and visualize the DL model's decision‐making process for FJOA feature assessment, we employed the Grad‐CAM method, which generates heatmaps localizing regions with high activation within the image. Furthermore, paired t tests (SPSS, version 27.0, IBM Corporation, Chicago, IL, USA) were used to compare reader performance with and without DL assistance. We set a significance threshold of p < 0.05 and used accuracy radar plots to visualize trends in readers' assessments.
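The Gwet coefficient used here is the first-order agreement coefficient (AC1), which corrects observed agreement for chance while remaining robust to skewed category prevalence; a minimal two-rater sketch, assuming categorical ratings:

```python
from collections import Counter

def gwet_ac1(r1, r2):
    """Gwet's first-order agreement coefficient (AC1) for two raters with
    categorical ratings -- the chance-corrected agreement statistic the
    study reports as Gwet kappa."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    # Observed proportion of items on which the raters agree.
    pa = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Average marginal proportion of each category across the two raters.
    pi = {k: (c1[k] + c2[k]) / (2 * n) for k in cats}
    # Chance-agreement term specific to AC1.
    pe = sum(p * (1 - p) for p in pi.values()) / (len(cats) - 1)
    return (pa - pe) / (1 - pe)
```

Unlike Cohen's kappa, AC1 does not collapse toward zero when one grade (e.g., "no erosions") dominates, which matters given the skewed grade distributions in Table 1.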
3. Results
3.1. Participant Characteristics
This study included a total of 1360 patients. The internal dataset comprised 1132 patients (mean age, 46.1 ± 15.0 years [standard deviation]; 568 women; mean body mass index, 24.5 ± 4.1 kg/m2). The external dataset included 228 patients (mean age, 51.5 ± 16.8 years; 119 women; mean body mass index, 23.6 ± 3.2 kg/m2). A total of 13 223 FJ CT patches were obtained from both datasets (the patient screening process for both datasets is shown in Figure E1; patient demographics are summarized in Table E1). Table 1 details the distribution of ground truth grades in the internal and external datasets.
TABLE 1.
Frequencies of FJOA feature grades from internal dataset and external test dataset.
| Feature grades | Internal training and validation dataset (n = 9430) | Internal test dataset (n = 1890) | External test dataset (n = 1903) |
|---|---|---|---|
| Joint space narrowing | |||
| Absent | 2933 (31.1) | 612 (32.4) | 860 (45.2) |
| Present | 6497 (68.9) | 1278 (67.6) | 1043 (54.8) |
| Osteophytes | |||
| No | 7196 (76.3) | 1379 (72.9) | 1518 (79.8) |
| Mild | 1726 (18.3) | 378 (20.0) | 237 (12.5) |
| Moderate | 356 (3.8) | 110 (5.9) | 113 (5.9) |
| Severe | 152 (1.6) | 23 (1.2) | 35 (1.8) |
| Hypertrophy | |||
| No | 3719 (39.4) | 638 (33.8) | 834 (43.8) |
| Mild | 3365 (35.7) | 637 (33.8) | 815 (42.8) |
| Moderate | 2138 (22.7) | 581 (30.6) | 240 (12.6) |
| Severe | 208 (2.2) | 34 (1.8) | 14 (0.8) |
| Subchondral bone erosions | |||
| No | 8350 (88.6) | 1638 (86.7) | 1689 (88.6) |
| Mild | 909 (9.6) | 220 (11.6) | 188 (9.9) |
| Severe | 171 (1.8) | 32 (1.7) | 26 (1.5) |
| Subchondral cysts | |||
| Absent | 5004 (53.1) | 1062 (56.2) | 1782 (93.6) |
| Present | 4426 (46.9) | 828 (43.8) | 121 (6.4) |
Note: Unless otherwise stated, data are numbers of facet joint CT images, with percentages in parentheses.
Across all datasets and radiological features, interreader agreement between the two reference radiologists establishing the ground truth ranged from moderate to almost perfect. The Gwet κ for overall agreement was 0.81 (95% CI: 0.79–0.83) for the internal data and 0.71 (95% CI: 0.69–0.74) for the external data. Agreement on specific features varied: JSN showed almost perfect agreement at 0.99 (internal, 95% CI: 0.99–1.00) and 0.86 (external, 95% CI: 0.84–0.88). Osteophytes also demonstrated almost perfect agreement at 0.85 (internal, 95% CI: 0.84–0.87) and 0.81 (external, 95% CI: 0.79–0.83). Agreement for hypertrophy was 0.88 (internal, 95% CI: 0.86–0.90) and 0.52 (external, 95% CI: 0.49–0.55). Subchondral bone erosions exhibited almost perfect agreement at 0.90 (internal, 95% CI: 0.89–0.92) and 0.89 (external, 95% CI: 0.88–0.91). Subchondral cysts achieved moderate agreement of 0.58 (internal, 95% CI: 0.54–0.61) and 0.57 (external, 95% CI: 0.53–0.60).
3.2. Model Performance for FJs Detection
After training on 9430 axial CT slices of FJs from the internal dataset, the nnU‐Net model achieved a promising Dice score of 0.81 (95% CI: 0.77–0.84) in the first stage, demonstrating its effectiveness in FJ detection. Subsequently, nnU‐Net was employed to segment the remaining 3793 images from the internal and external test datasets. To ensure accuracy, an anonymized reader reviewed all images and confirmed that the region masks predicted by nnU‐Net fully encompassed the corresponding FJs.
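The Dice score used to evaluate the first stage is straightforward to compute from binary masks; a minimal sketch:

```python
import numpy as np

def dice_score(pred_mask, gt_mask):
    """Dice similarity coefficient between a predicted binary mask and
    the ground-truth facet-joint mask: 2|A intersect B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```

A score of 0.81 therefore means the predicted and annotated joint regions overlap in roughly four fifths of their combined area, on average.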
3.3. Model Performance of Characteristic Classification on Internal Test Dataset
The performance of the DL model in classifying features was evaluated using 1890 FJ axial CT patches from the internal test dataset. The overall accuracy and AUC were 89.8% (1697 of 1890) and 0.96 for JSN, 79.6% (1504 of 1890) and 0.92 for osteophytes, 65.5% (1237 of 1890) and 0.86 for hypertrophy, 88.0% (1663 of 1890) and 0.93 for subchondral bone erosions, and 82.8% (1565 of 1890) and 0.91 for subchondral cysts, respectively. Furthermore, the precision of grading each feature was determined. The precision for grading JSN was 85.6% (504 of 589) for absent and 91.7% for present. The precision for grading osteophytes was 91.6% (1187 of 1296) for no osteophytes, 52.8% (255 of 483) for mild, 54.0% (54 of 100) for moderate, and 72.7% (8 of 11) for severe. The precision for grading hypertrophy was 85.2% (386 of 453) for no hypertrophy, 51.5% (456 of 886) for mild, 71.5% (392 of 548) for moderate, and 100% (3 of 3) for severe. The precision for grading subchondral bone erosions was 95.6% (1503 of 1573) for no erosions, 49.5% (150 of 303) for mild, and 71.4% (10 of 14) for severe. Finally, the precision for grading subchondral cysts was 88.8% (844 of 951) for absent and 76.8% (721 of 939) for present. The ROC curves for all features are illustrated in Figure 3A.
FIGURE 3.

Performance of the deep learning (DL) model on the internal test dataset. (A) Receiver operating characteristic (ROC) curves for the classification of radiological features of facet joint osteoarthritis (FJOA): joint space narrowing (JSN), osteophytes, hypertrophy, subchondral bone erosions (SUBERO), and subchondral cysts (SUBCYST). JSN showed the highest area under the ROC curve (0.96). (B–F) Confusion matrices of the DL model for grading radiological features of FJOA. Grading discrepancies between nonneighboring grades occurred in 1.1% (21 of 1890) of joints for osteophytes, 1.0% (19 of 1890) for hypertrophy, and 0.2% (4 of 1890) for subchondral bone erosions.
Detailed statistics for the five FJOA radiological features within the internal test dataset are shown in Figure 3B–F. The analysis revealed that the most significant discrepancies occurred between adjacent grades (e.g., mild vs. moderate).
Additionally, variations in grading consistency were observed across the features. Subchondral bone erosions exhibited the highest reliability, achieving a Gwet κ value of 0.86 (95% CI: 0.85–0.88), followed closely by JSN with a Gwet κ of 0.82 (95% CI: 0.80–0.84). Overall, the DL model demonstrated moderate to almost perfect reliability, as summarized in Table 2.
TABLE 2.
Overview of multitask deep learning model performance results on internal test dataset.
| Assessed characteristics | Precision (%) | F1 score (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) | Gwet κ |
|---|---|---|---|---|---|---|
| Joint space narrowing | 88.6 | 88.2 | 89.8 | 87.9 | 89.8 | 0.82 |
| Absent | 85.6 | 83.9 | 82.4 | |||
| Present | 91.7 | 92.5 | 93.3 | |||
| Osteophytes | 67.8 | 61.6 | 79.6 | 59.4 | 93.2 | 0.76 |
| No | 91.6 | 88.8 | 86.1 | |||
| Mild | 52.8 | 59.2 | 67.5 | |||
| Moderate | 54 | 51.4 | 49.1 | |||
| Severe | 72.7 | 47.1 | 34.8 | |||
| Hypertrophy | 77.1 | 54.1 | 65.5 | 52.1 | 88.5 | 0.56 |
| No | 85.2 | 70.8 | 60.5 | |||
| Mild | 51.5 | 59.9 | 71.6 | |||
| Moderate | 71.5 | 69.4 | 67.5 | |||
| Severe | 100 | 16.2 | 8.9 | |||
| Subchondral bone erosions | 72.2 | 64.8 | 88.0 | 63.7 | 94.0 | 0.86 |
| No | 95.6 | 93.6 | 91.8 | |||
| Mild | 49.5 | 57.4 | 68.2 | |||
| Severe | 71.4 | 43.5 | 31.3 | |||
| Subchondral cysts | 82.8 | 82.7 | 82.8 | 83.3 | 82.8 | 0.66 |
| Absent | 88.8 | 83.9 | 79.5 | |||
| Present | 76.8 | 81.6 | 87.1 |
Note: Precision is defined as true‐positive findings divided by the sum of false‐positive and true‐positive findings. F1 score is defined as the harmonic mean of precision and recall. Sensitivity is defined as true‐positive findings divided by the sum of true‐positive and false‐negative findings.
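The per-class metrics in Table 2 follow directly from each feature's confusion matrix; the sketch below uses the table note's definitions, with a made-up 2 × 2 matrix for illustration. The feature-level precision values appear to be macro averages of the per-class values (an inference from the reported numbers, e.g., 88.6 ≈ mean of 85.6 and 91.7 for JSN):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, sensitivity (recall), and F1 from a confusion
    matrix with rows = ground truth and columns = predictions, matching
    the definitions given in the table note."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)      # TP / (TP + FP), per class
    sensitivity = tp / cm.sum(axis=1)    # TP / (TP + FN), per class
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1

# Illustrative 2x2 matrix for a binary feature such as JSN (made-up counts).
precision, sensitivity, f1 = per_class_metrics([[8, 2], [1, 9]])
macro_precision = precision.mean()  # assumed form of the feature-level value
```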
3.4. Model Performance of Characteristic Classification on External Test Dataset
The external validation tests were conducted using 1903 FJ CT patches from the 228 patients. The overall accuracy and AUC for feature classification were as follows, respectively: 76.6% (1457 of 1903) and 0.90 for JSN, 80.2% (1527 of 1903) and 0.94 for osteophytes, 56.0% (1066 of 1903) and 0.81 for hypertrophy, 89.6% (1705 of 1903) and 0.95 for subchondral bone erosions, and 89.8% (1709 of 1903) and 0.93 for subchondral cysts. The ROC curves for all features of the external test dataset are shown in Figure 4A.
FIGURE 4.

Performance of the deep learning (DL) model on the external test dataset. (A) Receiver operating characteristic (ROC) curves for the classification of radiological features of facet joint osteoarthritis (FJOA): joint space narrowing (JSN), osteophytes, hypertrophy, subchondral bone erosions (SUBERO), and subchondral cysts (SUBCYST). SUBERO showed the highest area under the ROC curve (0.95). (B–F) Confusion matrices of the DL model for grading radiological features of FJOA. Grading discrepancies between nonneighboring grades occurred in 1.6% (30 of 1903) of joints for osteophytes, 4.9% (94 of 1903) for hypertrophy, and 0.7% (14 of 1903) for subchondral bone erosions.
Detailed statistics for the five FJOA radiological features evaluated in the external test dataset are provided in Figure 4B–F. Subchondral bone erosions exhibited the greatest reliability, with a Gwet κ coefficient of 0.88 (95% CI: 0.86–0.90), followed by subchondral cysts at 0.87 (95% CI: 0.86–0.89). These findings are summarized in Table 3.
TABLE 3.
Overview of multitask deep learning model performance results on external test dataset.
| Assessed characteristics | Precision (%) | F1 score (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) | Gwet κ |
|---|---|---|---|---|---|---|
| Joint space narrowing | 82.2 | 74.3 | 76.6 | 74.4 | 74.4 | 0.57 |
| Absent | 93.5 | 66.6 | 51.7 | |||
| Present | 70.9 | 81.9 | 97.0 | |||
| Osteophytes | 65.7 | 59.9 | 80.2 | 62.1 | 91.9 | 0.77 |
| No | 96.1 | 89.5 | 83.7 | |||
| Mild | 39.5 | 51.3 | 73.4 | |||
| Moderate | 57.8 | 61.4 | 65.5 | |||
| Severe | 69.2 | 37.5 | 25.7 | |||
| Hypertrophy | 61.6 | 47.0 | 56.0 | 49.1 | 83.2 | 0.44 |
| No | 90.9 | 57.5 | 42.1 | |||
| Mild | 52.1 | 58.1 | 65.6 | |||
| Moderate | 36.6 | 49.0 | 74.2 | |||
| Severe | 66.7 | 23.5 | 14.3 | |||
| Subchondral bone erosions | 68.0 | 74.9 | 89.6 | 87.6 | 94.3 | 0.88 |
| No | 99.1 | 94.5 | 90.2 | |||
| Mild | 48.8 | 61.7 | 84.0 | |||
| Severe | 56.1 | 68.7 | 88.5 | |||
| Subchondral cysts | 67.7 | 72.5 | 89.8 | 86.5 | 86.5 | 0.87 |
| Absent | 98.7 | 94.3 | 90.3 | |||
| Present | 36.6 | 50.8 | 82.6 |
Note: Precision is defined as true‐positive findings divided by the sum of false‐positive and true‐positive findings. F1 score is defined as the harmonic mean of precision and recall. Sensitivity is defined as true‐positive findings divided by the sum of true‐positive and false‐negative findings.
3.5. Model Visualization
Figure 5 displays original axial CT patches of FJs alongside their Grad‐CAMs, including JSN (Figure 5A,F), osteophytes (Figure 5B,G), hypertrophy (Figure 5C,H), subchondral bone erosions (Figure 5D,I), and subchondral cysts (Figure 5E,J). Notably, in Figure 5A,F, the model emphasizes the localization of the articular surface center when evaluating JSN and considers the entire joint space. Areas of anomalies are also accurately identified when evaluating other radiological features. Additionally, when presented with the same CT images of the FJ (Figure 5D,E), the model focuses on different regions depending on the assessment target (Subchondral bone erosions vs. subchondral cysts), as indicated by the arrows in Figure 5I,J. This observation demonstrates that the model's decision‐making process emulates human experts by selectively attending to pathological regions rather than relying solely on interimage correlations.
FIGURE 5.

Heat maps of facet joint osteoarthritis (FJOA) radiological feature assessment. (A–E) Axial CT images of FJOA. (F–J) Gradient‐weighted class activation maps, overlaid on the facet joint images as heat maps (red indicating higher activation, blue indicating lower activation). (A, F) Joint space narrowing (JSN) (arrows) of the facet joint. (B, G) Severe osteophytes (arrows). (C, H) Hypertrophy of the facet joint. (D, I) Subchondral bone erosions (SUBERO). (E, J) Subchondral cysts (SUBCYST). The heat maps demonstrate the neural network's emphasis on the abnormal region during its evaluation.
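Grad-CAM heat maps of this kind reduce to a short computation over the final convolutional layer; the sketch below assumes the feature maps and the gradients of the target class score with respect to them have already been extracted (e.g., via framework hooks):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Minimal Grad-CAM: given final-conv feature maps (C, H, W) and the
    gradients of the target class score with respect to them, weight each
    channel by its spatially pooled gradient, sum over channels, rectify,
    and normalize to [0, 1] for heat-map overlay."""
    weights = gradients.mean(axis=(1, 2))  # one importance weight per channel
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

The resulting low-resolution map is upsampled to the 320 × 320 patch size before being blended with the CT image; a class-specific map is computed per feature, which is why the same patch yields different highlighted regions for erosions versus cysts.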
3.6. Reader Performance Using the DL Model
Figure 6 depicts the variations in accuracy observed between the two readers when evaluating with and without the assistance of the DL model. The incorporation of the DL model improved accuracy for reader 1 (anonymized reader) across the features (p = 0.043). In the internal test dataset, accuracy for JSN increased from 60.6% (303 of 500) to 82.0% (410 of 500). Similarly, accuracy for osteophytes improved from 33.8% (169 of 500) to 69.8% (349 of 500), for hypertrophy from 40.8% (204 of 500) to 59.0% (295 of 500), for subchondral bone erosions from 88.8% (444 of 500) to 90.8% (454 of 500), and for subchondral cysts from 70.2% (351 of 500) to 78.4% (392 of 500). Reader 2 (anonymized reader) also demonstrated improved accuracy with the assistance of the DL model (p = 0.029). For JSN, accuracy increased from 88.4% (442 of 500) to 90.4% (452 of 500), for osteophytes from 52.0% (260 of 500) to 68.4% (342 of 500), for hypertrophy from 42.2% (211 of 500) to 60.4% (302 of 500), for subchondral bone erosions from 78.4% (392 of 500) to 85.8% (429 of 500), and for subchondral cysts from 74.8% (374 of 500) to 81.8% (409 of 500). In the evaluation of the external test dataset, both readers similarly achieved significant improvements in accuracy (p < 0.001 and p = 0.001, respectively). These findings are summarized in Table 4. Additionally, the two readers reported reduced assessment times with the aid of the DL model: the time per CT slice decreased from approximately 40 s to approximately 15–20 s.
FIGURE 6.

Radar plots of the assessment accuracy of two spine radiologists with and without the assistance of the deep learning (DL) model. (A, B) The two radiologists' accuracy trends in the internal test dataset. (C, D) Accuracy trends in the external test dataset. With the assistance of the DL model, the two radiologists' assessment accuracy improved for all features of facet joint osteoarthritis (FJOA) (p < 0.05).
TABLE 4.
Reader performance with and without DL model aid.
| Reader and assessed characteristics | Internal test dataset (n = 1000) | External test dataset (n = 1000) | ||||
|---|---|---|---|---|---|---|
| Without DL (n = 500) | With DL (n = 500) | p | Without DL (n = 500) | With DL (n = 500) | p | |
| Reader 1 | ||||||
| Joint space narrowing | 303 (60.6) | 410 (82) | 0.043 | 311 (62.2) | 413 (82.6) | < 0.001 |
| Osteophytes | 169 (33.8) | 349 (69.8) | 226 (45.2) | 374 (74.8) | ||
| Hypertrophy | 204 (40.8) | 295 (59) | 222 (44.4) | 314 (62.8) | ||
| Subchondral bone erosion | 444 (88.8) | 454 (90.8) | 322 (64.4) | 440 (88) | ||
| Subchondral cysts | 351 (70.2) | 392 (78.4) | 296 (59.2) | 379 (75.8) | ||
| Reader 2 | ||||||
| Joint space narrow | 442 (88.4) | 452 (90.4) | 0.029 | 331 (66.2) | 415 (83) | 0.001 |
| Osteophytes | 260 (52) | 342 (68.4) | 254 (50.8) | 386 (77.2) | ||
| Hypertrophy | 211 (42.2) | 302 (60.4) | 247 (49.4) | 318 (63.6) | ||
| Subchondral bone erosion | 392 (78.4) | 429 (85.8) | 347 (69.4) | 454 (90.8) | ||
| Subchondral cysts | 374 (74.8) | 409 (81.8) | 322 (64.4) | 400 (80) | ||
Note: Data are numbers of correctly assessed facet joint CT images; no image was duplicated between assessments. Data in parentheses are accuracies, expressed as percentages. The internal and external test datasets each comprised 1000 images (500 assessed without DL, 500 with DL). DL = deep learning. Statistical comparisons used paired t tests.
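The paired t-test behind Table 4 can be reproduced in a few lines. The sketch below pairs Reader 1's internal-dataset accuracies across the five features (values taken from Table 4); the study's exact pairing scheme is not detailed here, so treating each feature as one paired observation is an illustrative assumption.

```python
# Hedged sketch: paired t-test over Reader 1's per-feature accuracies
# (internal test dataset, percentages from Table 4). Assumes one paired
# observation per radiological feature.
from scipy import stats

without_dl = [60.6, 33.8, 40.8, 88.8, 70.2]  # % accuracy without DL aid
with_dl    = [82.0, 69.8, 59.0, 90.8, 78.4]  # % accuracy with DL aid

t_stat, p_value = stats.ttest_rel(with_dl, without_dl)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # p ≈ 0.043 for these values
```

Under this pairing, the computed p-value matches the 0.043 reported for Reader 1 on the internal dataset.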
4. Discussion
This study implemented a two‐stage DL model for multitask classification of radiological features in multisegment FJOA on axial lumbar CT scans. To our knowledge, this is the first study to grade the severity of FJOA radiological features across the entire lumbar spine using a two‐stage multitask DL approach. In the first stage, the model achieved a Dice score of 0.81 for detecting and localizing FJs on CT slices. In the second stage, the model assessed the presence of pathological radiological features in FJs, including JSN, osteophytes, hypertrophy, subchondral bone erosion, and subchondral cysts. Across the internal and external test datasets, the model achieved accuracies of 56%–89.8% and a maximum Gwet κ value of 0.88. Grad‐CAM applied to the final layer of the DL model provided interpretability, demonstrating that the model accurately focused on the abnormal regions relevant to each feature. The lightweight model, based on the ResNet‐18 architecture, is expected to be integrated into clinical workflows as an auxiliary tool to improve the diagnostic accuracy and reading efficiency of junior readers.
Facetogenic pain is a prevalent source of low back pain, and interventions targeting it constitute the second most frequently performed category of pain management procedures in the United States [17]. To identify the radiological features of FJOA related to pain and treatment effect, researchers have conducted numerous studies using X‐rays [18], CT [19], MRI [20], and SPECT [21]. However, the relationship between the radiological features of FJOA and pain remains controversial. In reviewing the existing literature on the association between FJOA and pain, we found that researchers commonly use simplified four‐point grading scales to characterize the degree of FJ degeneration and incorporate such composite scores into pain correlation analyses. This approach has obvious limitations: current scales bundle an excessive number of radiological features into each grade and are therefore prone to assessment bias, which may lead to inaccurate conclusions when exploring associations with pain. Furthermore, when assessing individual radiological features, researchers often face time‐consuming processes, high subjectivity, and poor consistency [4]. Thus, objective quantification of FJOA radiological features through standardized tools is not only key to accurately determining disease prevalence but also supports future in‐depth investigation of the associations between each feature and pain manifestations, treatment decisions, and prognostic outcomes.
By training DL models on large datasets, accurate and efficient evaluation and classification of medical images can be achieved, with reliability comparable to that of experienced clinicians [22, 23]. In a previous study, Bharadwaj et al. [13] used a two‐stage model for regional localization and binary classification of lumbar FJOA on MRI: the Dice score in the localization stage reached 0.83, and after lumbar FJOA was dichotomized (normal/mild vs. moderate/severe) according to the Pathria criteria [18], the AUC of the DL model reached 0.93. Nikpasand et al. [14] developed a neural network based on the Fujiwara scale to perform four‐class grading of FJ degeneration. Although agreement between the model results and the gold standard was only 49%, owing to the limited efficacy of MRI for this task, the study still offered a new perspective: using DL to grade FJ degeneration is feasible. However, existing grading systems (including the Kellgren‐Lawrence [24], Weishaupt [3], and Pathria [18] criteria) have fundamental limitations. These scales provide an oversimplified, global assessment of osteoarthritis, making it difficult to capture the disease's various radiological features thoroughly and precisely. To address this, Schacky et al. [25] employed a multitask DL model to evaluate the severity of five radiological features (including acetabular osteophytes) in hip osteoarthritis, achieving a precision of 75.8%–97.6%. Their evaluation accuracy was highest for "Absent" subchondral cysts, reaching 98.1%; it should be noted, however, that dataset imbalance (Absent: 15 018 cases; Present: 346 cases) may have led to overestimation of model performance. Nevertheless, that study highlighted the potential of multitask DL models in this field.
In evaluating FJ radiological features, the two‐stage model proposed in this study does not match the Dice coefficient reported by Bharadwaj et al. [13] for first‐stage segmentation. However, the core objective of this stage is to ensure that the segmented region fully encloses the target FJ on CT, rather than to crop it precisely. Manual inspection of all CT patches segmented by the model confirmed that the first‐stage localization and segmentation fully met the requirements of the subsequent analyses. The second‐stage classification model adopted evaluation criteria recommended by international guidelines [26] to assess the severity of five radiological features using data from two centers. In the internal test set, JSN achieved the highest accuracy (89.8%); even for osteophytes and hypertrophy, which are highly subjective in clinical assessment, the model's overall accuracy reached 79.6% and 65.5%, respectively, while the accuracy for subchondral bone erosion and subchondral cysts was 88% and 82.8%. In the external test set, performance remained high without significant degradation, supporting the model's generalization ability. Moreover, in evaluating features with multiple degeneration grades in both internal and external test sets, our model outperformed the comparable study by Schacky et al. [25] on hip osteoarthritis features, with better precision in the mild (52.8% vs. 46.4%), moderate (71.5% vs. 68%), and severe (100% vs. 80%) grades of osteophytes or joint hypertrophy. It should be noted that Schacky et al.'s study was based on hip radiographs, whereas the axial lumbar CT used in this study is more sensitive to bony degeneration of FJs, which may contribute to our model's superior performance.
This study also found that, in both internal and external test sets, evaluation performance for the extreme grades ("Absent," "Present," "Severe") was higher than for the intermediate grades ("Mild," "Moderate"). This may be attributed to the lack of clear quantitative thresholds between adjacent grades, especially for osteophytes, hypertrophy, and subchondral bone erosion, which results in low interrater agreement even among experienced clinicians. The model's Gwet's κ values for osteophytes and hypertrophy were 0.76 and 0.56 in the internal test set and 0.77 and 0.44 in the external test set, all exceeding the previously reported agreement of manual assessment (κ = 0.33–0.36) [4]. This indicates that the model performs more robustly than manual assessment, even for radiological features lacking quantitative criteria. Furthermore, this study underscores the importance of data balance for model performance. Feature prevalence differed markedly across datasets; for example, the incidence of subchondral cysts was 43.8% in the internal dataset but 6.4% in the external dataset. This discrepancy may explain why the external dataset yielded better accuracy (89.8% vs. 82.8%) and Gwet's κ (0.87 vs. 0.66) for subchondral cyst assessment, consistent with previous reports that data imbalance can distort estimates of model performance [27, 28]. A practical remedy is to increase data volume so as to balance the class distribution.
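Gwet's first-order agreement coefficient (AC1), the chance-corrected statistic used throughout this study [16], is straightforward to compute for two raters: observed agreement is corrected by a chance term built from the mean marginal proportion of each category. The sketch below is a minimal implementation; the toy ratings are invented for illustration only.

```python
# Hedged sketch: Gwet's AC1 agreement coefficient for two raters grading
# categorical severity. AC1 = (Pa - Pe) / (1 - Pe), where Pe uses the mean
# marginal proportion of each category across both raters.
from collections import Counter

def gwet_ac1(ratings_a, ratings_b):
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    categories = sorted(set(ratings_a) | set(ratings_b))
    k = len(categories)
    # Observed agreement: fraction of items given identical grades.
    pa = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum of pi*(1 - pi) over categories, scaled by 1/(k-1).
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    pe = sum(
        (p := (counts_a[c] + counts_b[c]) / (2 * n)) * (1 - p) for c in categories
    ) / (k - 1)
    return (pa - pe) / (1 - pe)

# Toy example: two raters grading 10 joints on a 0-3 severity scale.
a = [0, 1, 2, 3, 0, 1, 2, 2, 3, 0]
b = [0, 1, 2, 3, 0, 1, 1, 2, 3, 0]
print(round(gwet_ac1(a, b), 3))  # → 0.867
```

Unlike Cohen's κ, AC1 remains stable when one category dominates, which is why it suits features with skewed prevalence such as subchondral cysts.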
Finally, clinical utility is a core focus of DL model research. This study evaluated the impact of DL assistance on the diagnostic accuracy of junior readers through randomized sampling from the internal and external datasets. DL assistance significantly improved interpretation accuracy for both radiologists (reader 1: p = 0.043 and < 0.001; reader 2: p = 0.029 and 0.001). Notably, however, for specific features such as osteophytes and hypertrophy, readers' accuracy under DL assistance did not consistently surpass the standalone performance of the DL model. This observation aligns with prior research [22] suggesting that readers may occasionally reject correct detections provided by DL models and revert to subjective judgment. Such subjective variability may be a key factor behind the wide variation (3%–76%) in FJOA prevalence reported across the literature [4]. To mitigate this heterogeneity, standardized image interpretation training for junior readers and researchers is imperative. The DL model developed in this study has demonstrated clinical utility and may also serve as an educational aid, providing objective quantitative support for standardized training initiatives.
The present study has certain limitations that should be acknowledged. First, accuracy for the "Mild" and "Moderate" grades of osteophytes and hypertrophy was low, which could reflect class imbalance in the data and the inherent subjectivity of ground truth labeling. Second, although our study used axial CT slices for grading in compliance with established FJOA guidelines, the restriction to two‐dimensional imaging is a recognized limitation, as single axial slices cannot comprehensively capture all degenerative features of the three‐dimensional joint architecture. Finally, the manual selection of CT slices may diminish the model's clinical deployment potential. A significant advance would be a DL model that automatically identifies individual FJs among hundreds of axial lumbar CT images and autonomously evaluates the relevant slices; we plan to address this challenge in future research.
In conclusion, our study provides evidence supporting the potential of a multitask DL model for assessing various radiological features of FJOA on lumbar axial CT slices. This model has the potential to be a valuable tool in large‐scale epidemiological studies and may ultimately aid clinical practice.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1: Supporting Information.
Acknowledgments
We thank Home for Researchers editorial team (www.home‐for‐researchers.com) for language editing service.
Wang Y., Chen Z., Huang J., et al., “Comprehensive Evaluation of Facet Joints Osteoarthritis Radiological Features on Lumbar CT: A Multitask Deep Learning Approach,” JOR Spine 8, no. 3 (2025): e70115, 10.1002/jsp2.70115.
Funding: This work was supported by the “Yun‐Ling Scholars” Program of Yunnan Province (XDYC‐YLXZ‐2022‐0015), the Central Government to Guide Local Scientific and Technological Development Fund (202407AA11001), the Yunnan International Joint Laboratory of Intelligent Orthopaedics (202503AP140037), the R&D Project of Pazhou Lab (Huangpu) (2023K0604), the Yunnan Clinical Medical Center for Spinal Cord Diseases (ZX2022000101), the National Natural Science Foundation of China (82172442), the Joint Funding Scheme 2022 for Scientific Research Projects (FDCT‐GDST Projects) by the Science and Technology Development Fund of Macau and the Department of Science and Technology of Guangdong Province (2022A0505020019 and 0056/2021/AGJ), the Yunnan Key Laboratory of Digital Orthopaedics (202005AG070004), the Yunnan Provincial Department of Science and Technology Social Development Special Project (202403AC100003), and the Guangdong Climbing Plan (pdjh2023b0011).
Contributor Information
Zhihai Su, Email: suzhh23@mail.sysu.edu.cn.
Hai Lu, Email: lvhai@mail.sysu.edu.cn.
Sheng Lu, Email: lusheng@kust.edu.cn.
Data Availability Statement
Data generated or analyzed during the study are available from the corresponding author by request.
References
- 1. Gellhorn A. C., Katz J. N., and Suri P., “Osteoarthritis of the Spine: The Facet Joints,” Nature Reviews Rheumatology 9 (2013): 216–224, 10.1038/nrrheum.2012.199.
- 2. Perolat R., Kastler A., Nicot B., et al., “Facet Joint Syndrome: From Diagnosis to Interventional Management,” Insights Into Imaging 9, no. 5 (2018): 773–789, 10.1007/s13244-018-0638-x.
- 3. Weishaupt D., Zanetti M., Boos N., and Hodler J., “MR Imaging and CT in Osteoarthritis of the Lumbar Facet Joints,” Skeletal Radiology 28, no. 4 (1999): 215–219, 10.1007/s002560050503.
- 4. Berg L., Thoresen H., Neckelmann G., Furunes H., Hellum C., and Espeland A., “Facet Arthropathy Evaluation: CT or MRI?,” European Radiology 29, no. 9 (2019): 4990–4998, 10.1007/s00330-019-06047-5.
- 5. Cao G., Wang H., Xie S., et al., “Assessment of T2‐Weighted MRI‐Derived Synthetic CT for the Detection of Suspected Lumbar Facet Arthritis: A Comparative Analysis With Conventional CT,” European Spine Journal (2025), 10.1007/s00586-025-08958-y.
- 6. Tiwari P., Kaur H., Jha V., and Bansal N., “Chronology of Disc Degeneration and Facet Joint Arthritis in Lumbar Spine Is Variable—A CT Based Cross‐Sectional Study,” Journal of Orthopaedic Science 27, no. 6 (2022): 1197–1202, 10.1016/j.jos.2021.07.016.
- 7. Shi H., Li S., Liu S., et al., “Facet Joint Tropism, Pelvic Incidence and Intervertebral Height Index: Associations With Facet Joint Osteoarthritis in Lumbar Spinal Stenosis,” Spine Journal 24, no. 2 (2024): 317–324, 10.1016/j.spinee.2023.10.001.
- 8. Borenstein D., “Does Osteoarthritis of the Lumbar Spine Cause Chronic Low Back Pain?,” Current Rheumatology Reports 6, no. 1 (2004): 14–19, 10.1007/s11926-004-0079-z.
- 9. Alonso F., Kirkpatrick C. M., Jeong W., et al., “Lumbar Facet Tropism: A Comprehensive Review,” World Neurosurgery 102 (2017): 91–96, 10.1016/j.wneu.2017.02.114.
- 10. Suri P., Rainville J., Kalichman L., and Katz J. N., “Does This Older Adult With Lower Extremity Pain Have the Clinical Syndrome of Lumbar Spinal Stenosis?,” Journal of the American Medical Association 304, no. 23 (2010): 2628–2636, 10.1001/jama.2010.1833.
- 11. Izzo R., Guarnieri G., Guglielmi G., and Muto M., “Biomechanics of the Spine. Part II: Spinal Instability,” European Journal of Radiology 82, no. 1 (2013): 127–138, 10.1016/j.ejrad.2012.07.023.
- 12. Qadri S. F., Lin H. X., Shen L. L., et al., “CT‐Based Automatic Spine Segmentation Using Patch‐Based Deep Learning,” International Journal of Intelligent Systems (2023): 2345835, 10.1155/2023/2345835.
- 13. Bharadwaj U. U., Christine M., Li S., et al., “Deep Learning for Automated, Interpretable Classification of Lumbar Spinal Stenosis and Facet Arthropathy From Axial MRI,” European Radiology 33, no. 5 (2023): 3435–3443, 10.1007/s00330-023-09483-6.
- 14. Nikpasand M., Middendorf J. M., Ella V. A., et al., “Automated Magnetic Resonance Imaging‐Based Grading of the Lumbar Intervertebral Disc and Facet Joints,” JOR Spine 7, no. 3 (2024): e1353, 10.1002/jsp2.1353.
- 15. Kowlagi N., Kemppainen A., Panfilov E., et al., “Semiautomatic Assessment of Facet Tropism From Lumbar Spine MRI Using Deep Learning: A Northern Finland Birth Cohort Study,” Spine (Phila Pa 1976) 49, no. 9 (2024): 630–639, 10.1097/BRS.0000000000004909.
- 16. Wongpakaran N., Wongpakaran T., Wedding D., and Gwet K. L., “A Comparison of Cohen's Kappa and Gwet's AC1 When Calculating Inter‐Rater Reliability Coefficients: A Study Conducted With Personality Disorder Samples,” BMC Medical Research Methodology 13 (2013): 61, 10.1186/1471-2288-13-61.
- 17. Cohen S. P., Huang J. H., and Brummett C., “Facet Joint Pain—Advances in Patient Selection and Treatment,” Nature Reviews Rheumatology 9, no. 2 (2013): 101–116, 10.1038/nrrheum.2012.198.
- 18. Pathria M., Sartoris D. J., and Resnick D., “Osteoarthritis of the Facet Joints: Accuracy of Oblique Radiographic Assessment,” Radiology 164, no. 1 (1987): 227–230, 10.1148/radiology.164.1.3588910.
- 19. Kalichman L., Li L., Kim D. H., et al., “Facet Joint Osteoarthritis and Low Back Pain in the Community‐Based Population,” Spine (Phila Pa 1976) 33, no. 23 (2008): 2560–2565, 10.1097/BRS.0b013e318184ef95.
- 20. Jensen M. C., Brant‐Zawadzki M. N., Obuchowski N., Modic M. T., Malkasian D., and Ross J. S., “Magnetic Resonance Imaging of the Lumbar Spine in People Without Back Pain,” New England Journal of Medicine 331, no. 2 (1994): 69–73, 10.1056/NEJM199407143310201.
- 21. Russo V. M., Dhawan R. T., Baudracco I., Dharmarajah N., Lazzarino A. I., and Casey A. T., “Hybrid Bone SPECT/CT Imaging in Evaluation of Chronic Low Back Pain: Correlation With Facet Joint Arthropathy,” World Neurosurgery 107 (2017): 732–738, 10.1016/j.wneu.2017.08.092.
- 22. Bennani S., Regnard N. E., Ventre J., et al., “Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs,” Radiology 309 (2023): e230860, 10.1148/radiol.230860.
- 23. Foreman S. C., Schinz D., El H. M., et al., “Deep Learning to Differentiate Benign and Malignant Vertebral Fractures at Multidetector CT,” Radiology 310 (2024): e231429, 10.1148/radiol.231429.
- 24. Kellgren J. H. and Lawrence J. S., “Radiological Assessment of Osteo‐Arthrosis,” Annals of the Rheumatic Diseases 16, no. 4 (1957): 494–502, 10.1136/ard.16.4.494.
- 25. von Schacky C. E., Sohn J. H., Liu F., et al., “Development and Validation of a Multitask Deep Learning Model for Severity Grading of Hip Osteoarthritis Features on Radiographs,” Radiology 295 (2020): 136–145, 10.1148/radiol.2020190925.
- 26. Kettler A. and Wilke H. J., “Review of Existing Grading Systems for Cervical or Lumbar Disc and Facet Joint Degeneration,” European Spine Journal 15 (2006): 705–718, 10.1007/s00586-005-0954-y.
- 27. Hirose T. A., Arimura H., Ninomiya K., Yoshitake T., Fukunaga J. I., and Shioyama Y., “Radiomic Prediction of Radiation Pneumonitis on Pretreatment Planning Computed Tomography Images Prior to Lung Cancer Stereotactic Body Radiation Therapy,” Scientific Reports 10 (2020): 20424, 10.1038/s41598-020-77552-7.
- 28. Ait Nasser A. and Akhloufi M. A., “A Review of Recent Advances in Deep Learning Models for Chest Disease Detection Using Radiography,” Diagnostics (Basel, Switzerland) 13, no. 1 (2023): 159, 10.3390/diagnostics13010159.