Quantitative Imaging in Medicine and Surgery
2025 Sep 18;15(10):9492–9514. doi: 10.21037/qims-24-1976

Lumbar region perception segmentation: a novel algorithm for automated vertebral motion parameter measurement

Zhiyi Zhou 1,#, Qian Zhang 1,#, Ruizhang Zhu 2, Shenjun Wang 1, Xiang Pan 2, Haoxia Yang 1, Zhengfeng Lu 1, Ying Yang 3,, Suming Wei 1,
PMCID: PMC12514737  PMID: 41081138

Abstract

Background

Lower back pain frequently results from irregular lumbar spine movement. Quantifying vertebral motion is crucial for diagnosing lumbar spine disorders, but the accuracy and efficiency of traditional methods are limited. To address this limitation and automate the measurement of vertebral motion parameters for clinical diagnosis, we developed the vertebra segmentation network (VerSeg-Net), a novel deep learning-based approach for segmenting lumbar vertebrae and measuring motion parameters from dynamic X-ray images.

Methods

The VerSeg-Net integrates two modules: the region-aware (RA) module, which partitions features into non-overlapping blocks and applies dynamic sparse attention to filter irrelevant regions; and the adaptive receptive field feature fusion (AFF) module, which fuses multi-scale contextual features via deformable convolutions. The model was trained on the data of 50 patients with lumbar disorders (2,000 dynamic X-ray images; 512×512 pixels), acquired using a Philips UNIQ FD20 C-arm. Its performance was benchmarked against the U-Net, ResUnet, DeeplabV3+, and PFNet models. Statistical significance was assessed via paired t-tests (α=0.05).

Results

In terms of its segmentation accuracy, VerSeg-Net had a mean dice similarity coefficient (DSC) of 96.2% (vs. 92.77% for DeeplabV3+; P<0.001). Additionally, it had a mean intersection over union (MIoU) of 88.84%. In terms of its motion parameter errors, VerSeg-Net had a displacement [anterior displacement of superior vertebra (AZ)/posterior displacement of superior vertebra (BZ)] of 1.12±0.73 mm/1.22±0.70 mm [coefficient of variation (CV) =0.09–0.92], a rotation [vertebral rotation angle (RX)] of 1.21±0.46° (CV =0.05–0.85), and an intervertebral height [anterior disc height (Ha)/middle disc height (Hb)/posterior disc height (Hc)] of 1.52±0.36 mm/1.49±0.06 mm/1.70±0.05 mm. In terms of its efficiency, VerSeg-Net had a processing speed of 4.2 ms/frame (vs. 18 ms/frame for U-Net++).

Conclusions

VerSeg-Net is a reliable and accurate method for analyzing lumbar spine motion, and thus could significantly aid in clinical diagnosis and treatment planning.

Keywords: Lumbar spine, vertebral motion, deep learning, image segmentation, dynamic X-ray

Introduction

Background

Lower back pain is a common chronic condition among middle-aged and elderly individuals, particularly those over 60 years old. Degenerative lumbar spine instability is a significant factor contributing to symptoms such as lower back pain and sciatica (1,2). Lumbar spine instability has been associated with spondylolisthesis (2-4). Clinical studies (1,3,5,6) have found that the anatomical parameters of the lumbar spine in patients with lumbar spine instability, including vertebral body displacement and facet joint angle, differ significantly from those in healthy individuals. However, the clinical manifestation of lumbar spine movement is often complex (5), and most previous research has focused on the triggers of lumbar instability and the clinical symptoms of lumbar spine diseases. To date, no authoritative diagnostic standard for degenerative lumbar instability has been established (6).

Lumbar instability is a common spinal disorder with a complex pathophysiological mechanism involving multiple factors. Recent studies (1,7) have shown that it is closely related to various factors such as intervertebral disc degeneration, ligamentous laxity, and facet joint pathology. Lumbar instability can lead to abnormal relative motion between vertebrae, which in turn triggers a series of clinical symptoms, including low back pain and radicular pain in the lower limbs, significantly affecting the quality of life of patients. Therefore, accurate assessment of lumbar instability is of great significance for disease diagnosis, treatment planning, and prognosis evaluation.

Functional impairment and instability of the lumbar spine are often reflected in its movement. Patients with lumbar spine diseases exhibit increased segmental lumbar motion, and measuring intervertebral kinematic parameters helps in the study of lumbar spine activity patterns. Specifically, this involves analyzing the horizontal displacement, rotation angle, and disc height changes in the sagittal plane to determine spatial changes in the lumbar spine. This method provides a comprehensive assessment of the lumbar spine’s movement status, which is essential for diagnosing instability. Therefore, measuring lumbar spine movement parameters is essential for the diagnosis and assessment of lumbar instability.

The main objective of our research was to comprehensively evaluate the movement status of patients with degenerative spondylolisthesis, and to establish and refine a dynamic quantitative analysis model for the lumbar spine. Specifically, this study sought to analyze the horizontal displacement, rotation angle, and disc height changes between vertebral bodies in the sagittal plane to determine spatial changes in the lumbar spine, thereby providing a comprehensive assessment of the lumbar spine’s movement status.

In recent years, with the rapid development of artificial intelligence, especially the widespread application of deep learning in medical image analysis, new ideas and methods for automated lumbar instability assessment have emerged. Deep learning algorithms can automatically learn and recognize features in medical images, thereby enabling the rapid and accurate measurement of parameters related to lumbar instability. For example, Zhang et al. (7) proposed an automated method for vertebral landmark detection and spinal curvature estimation using nondirectional part affinity fields. Their method employs deep learning algorithms to identify vertebral landmarks and calculate spinal curvature parameters, offering a new tool for spinal disease diagnosis. Gertzbein et al. (8) used artificial neural networks to recognize vertebral landmarks in the lumbar spine, providing an automated solution for measuring lumbar anatomical parameters. Yeh et al. (9) adopted a deep learning approach to achieve automatic landmark detection and alignment analysis in whole-spine lateral radiographs, further advancing the automation of spinal image analysis. Zhou et al. (10) introduced an artificial intelligence-based X-ray measurement technology for assessing anatomical parameters related to lumbosacral stability, demonstrating the potential of deep learning in lumbar disease diagnosis. Cina et al. (11) developed a two-step deep learning model for precise landmark localization in spinal radiographs, enhancing the accuracy and efficiency of landmark detection. Additionally, Hipp et al. (12) established reference data for diagnosing spondylolisthesis and disc space narrowing using second National Health and Nutrition Examination Survey X-rays, providing standardized references for clinical diagnosis. Al Arif et al. (13) improved active shape models with random classification forests for cervical vertebra segmentation, offering a new technical approach for spinal image analysis. 
These studies have shown that deep learning technology has broad application prospects in spinal image analysis, and can significantly improve the accuracy and efficiency of measurements.

Despite significant progress in automated vertebral landmark detection, existing methods still have limitations. For example, some methods lack accuracy and robustness when dealing with complex spinal images, especially in cases of vertebral deformities or poor image quality. Moreover, most existing methods focus solely on landmark detection without comprehensive analysis of vertebral motion parameters. Therefore, an automated method that can accurately detect vertebral landmarks and simultaneously analyze vertebral motion parameters urgently needs to be developed.

Our study

Previous studies have employed various methods for analyzing the lumbar vertebrae; however, most of these methods rely on traditional texture feature extraction and corner detection, which suffer from poor detection performance and long detection times. We designed an automated measurement method based on lumbar vertebra motion sequences to address these issues. Since direct measurement in vivo is impractical, we used medical imaging techniques for measurement. Traditional X-ray transmission techniques have drawbacks, such as high radiation doses and challenges in continuous image acquisition (14,15). Dynamic X-ray technology overcomes these shortcomings: it not only reduces the radiation dose, minimizing potential harm to the human body, but also allows for continuous sampling of the moving vertebrae, thereby obtaining continuous and clear lumbar vertebra motion image sequences (3,8,9).

These motion image sequences are primarily studied to track vertebra positions, compute relevant dynamic parameters, and output visual results. Lumbar vertebra movement is complex and three-dimensional (3D), including sagittal plane flexion and extension, coronal plane lateral bending, and transverse plane rotation. Currently, in clinical practice, abnormal lumbar movements during flexion and extension in the sagittal plane are commonly used as standards to determine the presence of lumbar instability (11-13). Therefore, our experiment mainly focused on the sagittal plane and emphasized the analysis of its motion parameters.

The lumbar vertebra motion parameters mainly include the anterior displacement of the superior vertebra (AZ), the posterior displacement of the superior vertebra (BZ), the vertebral rotation angle (RX), and the anterior, middle, and posterior disc heights (Ha, Hb, and Hc, respectively), as shown in Figure 1. The left panel in Figure 1 illustrates the calculation of RX, where RX is the angle between the vertebra’s positions before and after movement; the middle panel in Figure 1 shows the calculation of AZ and BZ, with coordinate systems established for the upper-left and lower-left vertices of the vertebra to represent their horizontal displacement distances; the right panel in Figure 1 depicts the height of the disc between the vertebra’s anterior and posterior edges, with Ha, Hb, and Hc representing the heights at the front, middle, and back edges, respectively.

Figure 1.

Figure 1

Schematic diagram of lumbar vertebra motion parameters. AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra; Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height; RX, vertebral rotation angle.
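As an illustration, the three parameter families above can be computed from the four corner points of each segmented vertebra. This is a hypothetical sketch, not the authors' published code: the corner ordering [upper-left, upper-right, lower-right, lower-left] and the exact point conventions for the anterior/posterior edges are our assumptions.

```python
import math

def rotation_angle(v_before, v_after):
    """RX: angle (degrees) between a vertebra's superior-edge direction
    before and after movement; each v is 4 (x, y) corner points."""
    def edge_angle(v):
        (x1, y1), (x2, y2) = v[0], v[1]   # upper-left, upper-right corners
        return math.degrees(math.atan2(y2 - y1, x2 - x1))
    return abs(edge_angle(v_after) - edge_angle(v_before))

def horizontal_displacements(v_before, v_after):
    """AZ / BZ: horizontal shifts of the upper-left and lower-left vertices."""
    az = abs(v_after[0][0] - v_before[0][0])   # upper-left corner
    bz = abs(v_after[3][0] - v_before[3][0])   # lower-left corner
    return az, bz

def disc_heights(upper, lower):
    """Ha / Hb / Hc: distances between the lower edge of the upper vertebra
    and the upper edge of the lower vertebra at the front, middle, and back."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    ha = dist(upper[3], lower[0])              # anterior (left) edge
    hc = dist(upper[2], lower[1])              # posterior (right) edge
    mid_u = ((upper[3][0] + upper[2][0]) / 2, (upper[3][1] + upper[2][1]) / 2)
    mid_l = ((lower[0][0] + lower[1][0]) / 2, (lower[0][1] + lower[1][1]) / 2)
    hb = dist(mid_u, mid_l)                    # middle
    return ha, hb, hc
```

Pixel results would then be converted to millimeters with the reported pixel size (approximately 0.74 mm/pixel).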

In recent years, deep learning has been widely applied across various fields, including image recognition and natural language processing (16-19). Its model architecture can automatically extract key features from data without manual design. With large amounts of data, continuous training can optimize the model to enhance its generalization ability, allowing it to handle more complex test sets (20-22). Therefore, we proposed a method using deep learning to measure lumbar vertebra motion parameters. Specifically, we developed the lumbar vertebra segmentation network (VerSeg-Net), which includes a region-related perception module and an adaptive receptive field feature fusion (AFF) module. This network was then trained on dynamic X-ray lumbar vertebra images, and model prediction results were obtained using the test set. Since the vertebral body is somewhat rectangular, the segmentation results were processed to extract the minimum enclosing rectangle to obtain the four vertices of the vertebra, and the motion parameters of these vertices were calculated. The process is illustrated in Figure 2.

Figure 2.

Figure 2

Flowchart of our proposed method. AFF, adaptive receptive field feature fusion; AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra; GT, ground truth; Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height; RA, region-aware; RX, vertebral rotation angle.
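The minimum-enclosing-rectangle step in the pipeline above can be sketched without OpenCV as follows. In practice one would typically call cv2.minAreaRect and cv2.boxPoints on the segmentation mask's contour; this pure-NumPy rotating-edges version is only an illustration of the same idea.

```python
import numpy as np

def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, pts))
    lower, upper = [], []
    for chain, seq in ((lower, pts), (upper, reversed(pts))):
        for p in seq:
            while len(chain) >= 2 and _cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
    return np.array(lower[:-1] + upper[:-1])

def min_area_rect(points):
    """Minimum-area enclosing rectangle (4 corners) of 2-D points.
    The optimal rectangle is aligned with some convex-hull edge, so we
    test each edge direction and keep the smallest bounding box."""
    hull = convex_hull(points)
    best_area, best_corners = np.inf, None
    for i in range(len(hull)):
        e = hull[(i + 1) % len(hull)] - hull[i]
        e = e / np.linalg.norm(e)
        R = np.stack([e, [-e[1], e[0]]])       # edge-aligned orthonormal frame
        proj = hull @ R.T                      # hull rotated into that frame
        mins, maxs = proj.min(0), proj.max(0)
        area = np.prod(maxs - mins)
        if area < best_area:
            corners = np.array([[mins[0], mins[1]], [maxs[0], mins[1]],
                                [maxs[0], maxs[1]], [mins[0], maxs[1]]])
            best_area, best_corners = area, corners @ R   # rotate back
    return best_corners
```

The four returned corners then serve as the vertebral vertices from which the motion parameters are computed.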

Highlights

The highlights of this study are as follows:

  1. We proposed a new method for measuring lumbar vertebra motion parameters based on deep learning. Specifically, we designed VerSeg-Net, which focuses on instance segmentation of dynamic X-ray lumbar vertebra images. Using the prior feature of the vertebral body resembling a rectangle, vertex detection is performed on the segmented vertebrae in each frame, and motion parameters are calculated based on these vertex coordinates. The final output is the visualization of the motion results.

  2. We combined region perception algorithms with the downsampling process of deep neural networks by using a region-aware (RA) (23) module, which partitions features into regions of the same size. An attention-based screening algorithm enables each region to dynamically focus on areas with higher matching, avoiding interference from soft tissue regions in lumbar vertebra images.

  3. Based on SK-Net, we improved traditional feature fusion methods by integrating an AFF (24) module, which performs branch partitioning after feature fusion. Each branch undergoes convolution with different sizes, followed by the addition of fully connected layers. Each branch is selected and then fused, with channel and spatial attention added to enhance the model’s generalization ability.

  4. We found that VerSeg-Net for lumbar vertebra images based on dynamic X-rays performed well compared to other algorithms. Ablation experiments on the added modules demonstrated that each module improved segmentation accuracy.

  5. We found that compared to expert annotations, our lumbar vertebra measurement method achieved accurate prediction results, with the mean absolute error for lumbar vertebra horizontal displacement being within 1.25 pixels (approximately 0.93 mm), vertebral motion angle within 1.25 degrees, and anterior-posterior edge height within 2 pixels (approximately 1.48 mm) when compared to the expert measurements. Thus, our method could aid in clinical diagnosis and lays a foundation for future lumbar vertebra parameter analysis.

We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1976/rc).

Methods

Model

VerSeg-Net

We proposed the lumbar VerSeg-Net. In the downsampling stage, this network uses a ResUnet-based backbone network to extract features from input lumbar vertebra images. Each layer includes 3×3 convolution, normalization, activation, and pooling operations; our pooling method is maximum pooling with a 2×2 pooling region. The input to the model is a single lumbar vertebra image I ∈ ℝ^{1×H×W}. After each downsampling stage, the spatial dimensions of the features are halved, and the features enter a region perception module where the regions of interest are paired, i.e., grouped or matched based on shared characteristics such as similar intensity values or spatial proximity. This pairing facilitates further feature extraction and enhances the accuracy of distinguishing the lumbar vertebrae from the surrounding tissues and background.

After four downsampling stages, the network moves to the decoding phase. To enhance feature fusion, skip connections are used to concatenate tensors, followed by feature aggregation using the AFF module. After four decoding operations, bilinear interpolation is used to upsample and obtain the segmentation result, Pre ∈ ℝ^{1×H×W}. The network model diagram is shown in Figure 3. The algorithm flowchart for VerSeg-Net is illustrated in Table 1.

Figure 3.

Figure 3

VerSeg-Net. AFF, adaptive receptive field feature fusion; RA, region-aware.

Table 1. Algorithm flowchart of VerSeg-Net.
Algorithm Algorithm flowchart of VerSeg-Net
Input Current image features I ∈ ℝ^{1×H×W}; models F, RA, and AFF
Param Training times: T
Output Processed features: Pre ∈ ℝ^{1×H×W}
Flowchart
   1 For t = 1: T do:
   2    Downsampling: I = F(I)
   3    Extract regional correlation features: I = RA(I)
   4    Repeat step 2 until downsampled to 16×16
   5    Upsampling: Î = upsample(I)
   6    Perform feature aggregation: Î = AFF(Î, I)
   7    Compute the loss between the features and the segmentation labels
   8 End

AFF, adaptive receptive field feature fusion; RA, region-aware.
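The encoder-decoder loop of Table 1 can be sketched shape-wise as follows. This is a minimal illustration with NumPy placeholders: F, RA, AFF, and upsample here are hypothetical stand-ins that only model tensor shapes, not the real convolutional modules.

```python
import numpy as np

def F(x):
    """Downsampling stage stand-in: halve spatial size, double channels."""
    c, h, w = x.shape
    return np.zeros((c * 2, h // 2, w // 2))

def RA(x):
    """Region-aware module stand-in: preserves the feature shape."""
    return x

def AFF(up, skip):
    """Feature-aggregation stand-in: fuse upsampled features with the skip path."""
    assert up.shape == skip.shape
    return up + skip

def upsample(x):
    """Bilinear-upsampling stand-in: double spatial size, halve channels."""
    c, h, w = x.shape
    return np.zeros((c // 2, h * 2, w * 2))

def verseg_forward(image):
    """Shape-level walk through Table 1 for a 1xHxW input image."""
    feats, x = [], image
    while x.shape[-1] > 16:            # steps 2-4: downsample + RA to 16x16
        x = RA(F(x))
        feats.append(x)
    for skip in reversed(feats[:-1]):  # steps 5-6: upsample + AFF fusion
        x = AFF(upsample(x), skip)
    return upsample(x)                 # final upsampling to input resolution
```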

Cross-entropy loss is used to update the model parameters, as defined in Eq. [1], which is expressed as follows:

$L_{seg}=-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n}y_j^{(i)}\log\left(p_j^{(i)}\right)$ [1]

where m represents the number of samples, n denotes the number of classes, y represents the true labels, p represents the model’s predicted probabilities, and the index j runs from the first class to the n-th class.
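For illustration, Eq. [1] can be written directly in code. This is a plain-Python sketch with one-hot labels; in practice, a framework routine such as PyTorch's built-in cross-entropy loss would be used instead.

```python
import math

def cross_entropy(y_true, y_pred):
    """Eq. [1]: mean over m samples of -sum_j y_j * log(p_j),
    where y is a one-hot label vector and p a probability vector."""
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        total += -sum(yj * math.log(pj) for yj, pj in zip(y, p))
    return total / m
```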

Region correlation perception module

The presence of black shadows in dynamic lumbar X-rays and edge regions filled with 0 pixels can interfere with the segmentation network. Therefore, based on Biformer (25-27), we introduced a dynamic query-aware sparse attention mechanism to filter out most irrelevant key-value pairs during region partitioning, retaining only a small subset of the most relevant areas to achieve content-aware and more flexible feature extraction. Additionally, fine-grained tokens are applied to these areas and their attention is focused on other regions (28,29).

The specific method is as follows: the two-dimensional (2D) input feature map X ∈ ℝ^{C×H×W} is first divided into S×S non-overlapping regions, each containing HW/S² feature vectors; the feature X is then reshaped to Xr ∈ ℝ^{S²×(HW/S²)×C}; next, linear operations are used to derive Q, K, and V. The linear operations are defined in Eqs. [2,3], which are expressed as follows:

$Q,K,V\in\mathbb{R}^{S^2\times\frac{HW}{S^2}\times C}$ [2]
$Q=X_rW^q,\ K=X_rW^k,\ V=X_rW^v$ [3]

where Wq, Wk, and Wv represent the weight matrices for queries, keys, and values, respectively, each with shape ℝ^{C×C}.

The algorithm then identifies the attention regions corresponding to each area. First, the average Qr and Kr for each region are computed, where Qr, Kr ∈ ℝ^{S²×C}. A relationship matrix Ar ∈ ℝ^{S²×S²} is obtained by multiplying Qr by the transpose of Kr; Ar represents the degree of association between different regions. To retain only the effective inter-region associations, the core step is to prune Ar by keeping the top k connections for each region. A row-wise top-k operator is used to derive the index matrix Ir ∈ ℕ^{S²×k}; thus, each row i of Ir corresponds to the i-th region and contains the k indices of its most relevant regions. The algorithm is defined in Eqs. [4,5], which are expressed as follows:

$A^r=Q^r(K^r)^T$ [4]
$I^r=\mathrm{topkIndex}(A^r)$ [5]

For the index matrix Ir, its associative features can be used with finer granularity: each region i attends to the key-value pairs of the k associated regions indexed by Ir(i,1), Ir(i,2), …, Ir(i,k). The gathered tensors Kg and Vg are computed from these indices and have shape S²×(k·HW/S²)×C, representing the aggregated K and V tensors. The attention mechanism is then used to fuse the key-value pairs, as defined in Eqs. [6-8]. The principle of the RA module is shown in Figure 4, and its algorithm flowchart is presented in Table 2. Eqs. [6-8] are expressed as follows:

Figure 4.

Figure 4

Region-aware module.

Table 2. Flowchart of the region-aware algorithm.
Algorithm Region-aware algorithm
Input Current image features: X ∈ ℝ^{C×H×W}
Param The square root of the number of regions, S, and the number of selected regions, k
Output Processed features: X̂ ∈ ℝ^{C×H×W}
Flowchart
   1 Partition the features into regions of size S²×(HW/S²)×C
   2 Use linear projections to compute Q, K, and V, respectively
   3 Average each region’s feature vectors to obtain Qr and Kr
   4 Multiply Qr by the transpose of Kr to obtain the relationship matrix, Ar
   5 Select the top k entries per row of Ar to obtain the index matrix, Ir
   6 Perform gathering to obtain the key-value pairs Kg and Vg
   7 Perform attention computation for Q, Kg, and Vg
   8 Output features
$K^g=\mathrm{gather}(K,I^r)$ [6]
$V^g=\mathrm{gather}(V,I^r)$ [7]
$O=\mathrm{Attention}(Q,K^g,V^g)+\mathrm{LCE}(V)$ [8]
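The routing steps of Eqs. [2]-[8] can be illustrated with a minimal NumPy sketch. This is a simplified single-head version that omits the LCE(V) local-context term and all learned-parameter details; the function and variable names are ours, not the authors'.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_aware_attention(X, Wq, Wk, Wv, k):
    """X: (S^2, n, C) features grouped into S^2 regions of n tokens each.
    Each region attends only to its top-k most related regions."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # Eq. [3]: linear projections
    Qr, Kr = Q.mean(axis=1), K.mean(axis=1)       # per-region means, (S^2, C)
    Ar = Qr @ Kr.T                                # Eq. [4]: region affinity
    Ir = np.argsort(-Ar, axis=1)[:, :k]           # Eq. [5]: row-wise top-k indices
    out = np.empty_like(Q)
    for i in range(X.shape[0]):
        Kg = K[Ir[i]].reshape(-1, K.shape[-1])    # Eq. [6]: gathered keys
        Vg = V[Ir[i]].reshape(-1, V.shape[-1])    # Eq. [7]: gathered values
        out[i] = softmax(Q[i] @ Kg.T) @ Vg        # Eq. [8], without LCE(V)
    return out
```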

AFF module

In the decoding process of the lumbar spine segmentation network, the presence of skip connections makes feature fusion a crucial factor to consider. Due to different convolution parameters, the fused features cannot fit a fixed convolutional region. To enable the network to adaptively adjust its receptive field size, we used an SK-Net-based network structure and improved it by adding an attention localization module (30). This primarily involves three processes: splitting, fusion, and selection. The adaptive receptive field model is shown in Figure 5.

Figure 5.

Figure 5

Adaptive feature aggregation module. 2D, two dimensional.

In the splitting process, for a given feature map X ∈ ℝ^{C×H×W}, to enlarge its receptive field, we apply convolutions with kernel sizes of 3 and 5, splitting it into two branches, F̃: X → Ũ ∈ ℝ^{C×H×W} and F̂: X → Û ∈ ℝ^{C×H×W}. Both F̃ and F̂ consist of depthwise convolution, normalization, and activation functions. To allow neurons to adaptively adjust the kernel size, we first fuse the results from the two branches through element-wise summation, as defined in Eq. [9], which is expressed as follows:

$U=\tilde{U}+\hat{U}$ [9]

Next, global average pooling is used to generate the feature s ∈ ℝ^C. Average pooling refers to the averaging of all elements over the spatial dimensions, H×W, as defined in Eq. [10], which is expressed as follows:

$s_c=F_{gp}(U_c)=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}U_c(i,j)$ [10]

To achieve precise adaptive selection guidance, a fully connected layer is used to generate z ∈ ℝ^{d×1}, where δ represents the activation function, B represents normalization, and W represents model weights, as defined in Eq. [11], which is expressed as follows:

$z=F_{fc}(s)=\delta(B(Ws))$ [11]

During the selection process, we apply a “softmax” operation to z after the fully connected layer. This helps U˜ and U^ adaptively choose different spatial scales of information, as defined in Eq. [12], which is expressed as follows:

$a_c=\frac{e^{A_cz}}{e^{A_cz}+e^{B_cz}},\quad b_c=\frac{e^{B_cz}}{e^{A_cz}+e^{B_cz}}$ [12]

where A, B ∈ ℝ^{C×d}, while a and b represent the soft attention for Ũ and Û, respectively. Specifically, Ac ∈ ℝ^{1×d} is the c-th row of A, and ac denotes the c-th element of a; similarly, Bc ∈ ℝ^{1×d} is the c-th row of B, and bc denotes the c-th element of b. The final feature map is obtained from Eqs. [13,14], which are expressed as follows:

$V_c=a_c\cdot\tilde{U}_c+b_c\cdot\hat{U}_c,\quad a_c+b_c=1$ [13]
$V=[V_1,V_2,\ldots,V_C],\quad V_c\in\mathbb{R}^{H\times W}$ [14]
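Under simplifying assumptions (normalization omitted, ReLU standing in for δ, and a fixed projection standing in for the learned fully connected layer), Eqs. [9]-[14] can be sketched as follows; names and shapes are ours.

```python
import numpy as np

def sk_fuse(U1, U2, W, A, B):
    """Selective-kernel fusion of two branch feature maps U1, U2 (C, H, W).
    W: (d, C) stand-in for the FC layer of Eq. [11]; A, B: (C, d) produce
    the per-channel soft attention of Eq. [12]."""
    U = U1 + U2                               # Eq. [9]: element-wise sum
    s = U.mean(axis=(1, 2))                   # Eq. [10]: global average pooling
    z = np.maximum(W @ s, 0.0)                # Eq. [11]: FC + ReLU (no norm)
    ea, eb = np.exp(A @ z), np.exp(B @ z)     # per-channel branch logits
    a = ea / (ea + eb)                        # Eq. [12]: softmax over branches
    b = 1.0 - a                               # ensures a_c + b_c = 1
    # Eqs. [13,14]: channel-wise weighted combination of the two branches
    return a[:, None, None] * U1 + b[:, None, None] * U2
```

With A = B the two branches receive equal weight, which makes the behavior easy to check by hand.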

Dataset

Dataset introduction

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of Wuxi Ninth People’s Hospital (No. KS2023006). All participants provided informed consent prior to their inclusion in the study.

Our experimental data were obtained from the Department of Radiology, Wuxi Ninth People’s Hospital Affiliated to Soochow University, using a Philips UNIQ FD20 C-arm X-ray system (Philips Healthcare, Best, the Netherlands) with a pixel size of approximately 0.74 millimeters. The data of 50 patients with lumbar spine disorders were collected. The data collection method was as follows: before testing, specialized equipment was used to secure each patient’s knees and pelvis; during the test, the patient maintained a standing position, and the C-arm remained stationary in a pre-calibrated position aligned with the L3 vertebra in the neutral standing position, ensuring that L3 remained centered in the imaging field throughout the acquisition. The patient then performed flexion and extension movements at a constant speed within a controlled range while maintaining consistent posture. The specific movement protocol was as follows: starting from a neutral standing position, the patient flexed at a constant speed for 10 seconds to a maximum flexion angle of 60 degrees, returned to the neutral position at the same speed, extended for 5 seconds to a maximum extension angle of 20 degrees, and returned to the neutral position again. This entire process constituted one complete movement cycle.

The mean radiation dose was 1.5±0.3 mGy, as reported by the imaging system. The window width was set to 1000 HU and the window level to 250 HU. During the movement, digital frame capture was performed at a rate of 15–30 frames per second, resulting in dynamic medical images of the lumbar spine with a resolution of 512×512 pixels. Figure 6 shows the imaging process and the sequence of lumbar spine movement images.

Figure 6.

Figure 6

Data collection process: the image in the upper panel represents the patient’s imaging process, while the image in the lower panel shows the sampling results. These images are published with the patient or participant’s consent.

The study included 50 patients (12 males, 38 females) with a mean age of 62.92±7.28 years (range, 45–73 years). The cohort comprised middle-aged to elderly individuals and no young adults (≤44 years), and had a female predominance (76%).

Data preprocessing

After data collection, we obtained Digital Imaging and Communications in Medicine (DICOM) (31) images, which were converted to portable network graphics format for easier data filtering and annotation. The following method was used for data filtering: images that appeared too dark, or in which the L3 vertebra could not be detected properly, were discarded. Additionally, due to the slow movement of the patients and the high sampling rate, there was almost no motion between adjacent frames; thus, images were selected every eight frames. After filtering, the patient data were represented as D×512×512, where D was the number of image slices and 512×512 was the pixel size of the images. To ensure consistency across all patients, each patient had 40 motion images, with some images from before and after the movement being discarded, resulting in a final dataset of 40×512×512. In the annotation phase, Labelme software (version 5.3.0, developed by B.C. Russell at the MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA; available at: https://github.com/wkentaro/labelme) was used to annotate all images with lumbar vertebrae, including L1, L2, L3, L4, L5, and S1. Additionally, 10 patients were selected by doctors for lumbar vertebra apex annotation and the manual calculation of parameters.
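The frame-selection step can be sketched as follows. This is a hypothetical helper: the exact rule for discarding excess frames from before and after the movement is our assumption, chosen only to yield a fixed 40-frame volume per patient.

```python
def subsample_frames(frames, step=8, keep=40):
    """Select every `step`-th frame from a motion sequence, then trim
    symmetrically to `keep` frames (discarding frames from both ends),
    giving each patient a fixed-length 40x512x512 volume."""
    picked = frames[::step]          # one frame per group of `step`
    extra = len(picked) - keep
    if extra <= 0:
        return picked
    front = extra // 2               # drop roughly half the excess up front
    return picked[front:front + keep]
```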

To ensure the reliability of these measurements, which serve as the comparison standard, a thorough reproducibility analysis was conducted. Two independent experts performed the annotations on the same set of images, allowing us to calculate both intra-observer and inter-observer variability. These results are presented in Table 3 and discussed further below.

Table 3. Example of manually calculated motion parameters by the physician (every eight frames represents one statistical unit).
Frames AZ (pixel) BZ (pixel) RX (°) Ha (pixel) Hb (pixel) Hc (pixel)
1–8 4 5 1.29 24.08 17.56 11.05
9–16 1 1 1.08 21.02 17.09 11.5
17–24 8 2 2.34 26.31 20.43 14.56
297–304 2 3 2.3 25.4 17.29 11.05
305–312 6 3 1.33 20.9 17.87 12.65
313–320 2 3 0.49 22.56 17.80 10.4

AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra; Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height; RX, vertebral rotation angle.

The mean error values reported in Table 3 reflect the average deviation from the gold standard measurements. The variance among different observers was captured through the calculation of intra-observer and inter-observer coefficient of variation (CV) values, which are also provided in Table 3. A lower CV indicated higher agreement between observers or within an observer’s repeated measures, suggesting greater reliability of the measurement process.

Figure 7 illustrates the annotated images used in the reproducibility study, while Table 3 sets out the manually calculated motion parameters, alongside their corresponding reproducibility metrics. The observed consistency across observers supports the robustness of our manual measurement protocol.

Figure 7.

Figure 7

Data annotation process: the left panel shows the original images, and the right panel shows the annotation results.

This approach aligns with established methods in the field, such as those described previously (29-33), which similarly emphasized the importance of assessing the reproducibility of manual measurements when they serve as the benchmark for automated systems.

A protocol was prepared before the study without registration.

Statistical analysis

The quantitative data are expressed as the mean ± standard deviation (SD). Model segmentation performance was evaluated using the dice similarity coefficient (DSC), precision, recall, and mean intersection over union (MIoU). Motion parameter accuracy was quantified through absolute error measurements between automated and manual methods, supplemented by CV calculations (SD/mean). Inter-observer reliability for landmark annotations was assessed using the intraclass correlation coefficient (ICC) under a two-way random-effects model [ICC(2, k)]. All the analyses were conducted in Python 3.8 (Python Software Foundation, https://www.python.org/) using statistical modules from SciPy 1.10 (SciPy Community, https://scipy.org/).
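The error and variability summaries used above can be sketched with simple helpers matching the stated definitions (absolute error between automated and manual values; CV = SD/mean). The names are ours, not the authors' code.

```python
import statistics

def mean_absolute_error(auto, manual):
    """Mean absolute deviation between automated and manual measurements."""
    return sum(abs(a - m) for a, m in zip(auto, manual)) / len(auto)

def coefficient_of_variation(values):
    """CV = sample SD / mean, used to summarize measurement spread."""
    return statistics.stdev(values) / statistics.mean(values)
```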

Results

Segmentation performance

All 50 samples had segmentation labels, among which, 10 had motion parameter labels. Thus, in our experiment, the data of the 10 patients with motion parameters were used as the test samples, while the data of the remaining 40 patients were used as the training samples. As shown in Figure 2, we first trained the model on the data of these 40 patients, using the experimental equipment parameters listed in Table 4.

Table 4. Experimental equipment parameters.

Device Parameters
CPU Intel i5-12600KF
Memory 32 GB
GPU NVIDIA GeForce RTX 4060 Ti
GPU-Memory 12 GB
Tool PyTorch
OS Windows 10

CPU, Central Processing Unit; GPU, Graphics Processing Unit; OS, Operating System.

Our network did not use any pre-trained models; rather, its weights were initialized randomly before training. The experimental parameters were as follows: batch size, 8; training epochs, 150; initial learning rate, 0.001; and neural network optimizer, Adam (32). The network converged after 50 epochs, and the training time was approximately 4 hours. Cross-entropy loss (33) was used for backpropagation in this experiment. Since the lumbar region does not include the sacral vertebra, S, and adding this label would have affected the model’s segmentation accuracy and consequently its motion parameter calculations, the sacral vertebra, S, label was discarded during the training phase and treated as background. None of the 50 cases included transitional vertebrae, ensuring that the analyzed lumbar vertebrae (L1–L5) were anatomically typical. During the testing phase, DSC, precision, recall, and MIoU were used as the evaluation metrics. Additionally, to validate the model’s accuracy, a comparison experiment was conducted with a basic network model. To assess the module’s effectiveness, an ablation study was performed, followed by both quantitative and qualitative analyses.
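The evaluation metrics can be sketched per mask with their standard definitions (a straightforward stand-in for whatever metric code was actually used; DSC = 2|P∩G|/(|P|+|G|), and MIoU averages per-class IoU):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient for two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def miou(pred, gt, n_classes):
    """Mean intersection-over-union for integer label maps,
    averaged over classes that appear in either map."""
    ious = []
    for c in range(n_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union:
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```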

Comparative analysis

To demonstrate the high accuracy of VerSeg-Net, we conducted comparative experiments with several segmentation networks, including the classic semantic segmentation networks U-Net (21), U-Net++ (33), ResUnet (34), and DeeplabV3+ (35). U-Net is a classic network for lumbar spine image segmentation, while U-Net++ and ResUnet are variants of U-Net. DeeplabV3+ is a classic segmentation network for extracting contextual information, employing dilated convolutions at multiple scales. Given that the grayscale appearance of the lumbar spine is similar to that of the surrounding soft tissues, and its edge structure is somewhat camouflaged, we also included the camouflaged object segmentation network PFNet (27) in the comparison experiments. PFNet is trained using a localization-focused approach: it initially predicts a rough model of the vertebrae, and then refines the segmentation by discovering and removing distracting regions based on this rough prediction.

The results of our comparative experiments are shown in Table 5. Notably, the proposed VerSeg-Net achieved the best performance in segmenting each vertebra, with the highest metric for L3 exceeding 95%. The average DSC metric reached 96.2%. In contrast, U-Net’s segmentation performance was less satisfactory, with a mean DSC of only 81.53%. Although DeeplabV3+ and PFNet each achieved a mean DSC of 92.77%, this figure was 1.15% lower than that of our VerSeg-Net. This indicates that with sufficient medical imaging data, appropriately enhancing the network structure can improve performance.

Table 5. Quantitative analysis of comparative experiments based on DSC (%) evaluation metrics.

Methods L1 L2 L3 L4 L5 Mean
U-Net 87.90 87.33 83.22 70.78 78.42 81.53
U-Net++ 88.47 85.55 86.88 85.41 85.31 86.32
ResUnet 89.06 90.18 91.25 92.64 91.49 90.92
DeeplabV3+ 90.78 93.54 94.05 93.59 91.89 92.77
PFNet 90.92 93.31 94.51 93.51 91.59 92.77
VerSeg-Net 92.07 94.18 95.25 94.64 93.50 93.92

The measurement results of the motion parameters are expressed in pixels, with 1 pixel approximately equal to 0.74 mm. DSC, dice similarity coefficient.

The qualitative analysis results of the comparative experiments on the X-ray dataset are shown in Figure 8. Notably, U-Net showed the poorest segmentation performance, exhibiting not only imprecise vertebral edge segmentation but also label prediction misalignment. This issue arises because U-Net has poor perception of the lumbar region. Compared to the other methods, our results are closer to the labels, while other networks, such as PFNet and DeeplabV3+, often misinterpreted discrete points as labels due to their lack of a region segmentation concept. Conversely, our proposed VerSeg-Net rarely exhibited this issue. Thus, the segmentation results from VerSeg-Net can be effectively used for apex detection.

Figure 8.

Figure 8

Qualitative analysis results of comparative experiments on the quantitative X-ray dataset.

Ablation study

To verify the effectiveness of the proposed RA module and feature aggregation module, we conducted a quantitative analysis of our ablation experiments on this dataset (Table 6). Based on our network structure, four sets of ablation experiments were designed. ResUnet was taken as the baseline network (baseline); the RA module was then added to it (baseline + RA), as was the AFF module (baseline + AFF). An additional set of experiments examined the attention mechanism added to the AFF module (baseline + AFF + ATT). Finally, the complete network (VerSeg-Net) was examined and compared with the baseline network.

Table 6. Quantitative analysis of ablation experiments.

Methods DSC (%) Precision (%) Recall (%) MIoU (%)
Baseline 90.92 90.23 90.49 86.67
Baseline + RA 91.54 91.71 92.08 87.17
Baseline + AFF 92.77 92.21 92.73 87.93
Baseline + AFF + ATT 93.77 93.66 94.14 88.56
VerSeg-Net 93.92 94.16 94.00 88.84

AFF, adaptive receptive field feature fusion; ATT, attention; DSC, dice similarity coefficient; MIoU, mean intersection over union; RA, region-aware.

The ablation experiment results are shown in Table 6. Notably, adding the region correlation perception module (baseline + RA) to the baseline network (baseline) improved the segmentation accuracy, such that the DSC and MIoU increased by 0.62% and 0.5%, respectively. Similarly, adding the feature aggregation module (baseline + AFF) led to increases of 1.85% and 1.26%, respectively. After all the modules had been added to the network, the increases were 3.00% and 2.17%, respectively. Thus, the designed modules help the network adaptively extract regional features and fuse features from different receptive fields, thereby improving accuracy.

The qualitative analysis results of the ablation experiments are shown in Figure 9. Notably, as the baseline network did not add the region correlation perception module and feature aggregation module, its segmentation result still showed misaligned segmentation, and its boundary clarity was also low. However, as the modules were added to the network, these issues gradually decreased.

Figure 9.

Figure 9

Qualitative analysis results of the ablation experiment. AFF, adaptive receptive field feature fusion; ATT, attention; B, baseline; RA, region-aware.

Motion parameter accuracy

Low back pain mostly originates in the lumbar spine and is often caused by displacement (offset) of the patient’s vertebrae. Abnormal lumbar spine motion parameters are often a precursor to this offset; therefore, the measurement of lumbar spine motion parameters is particularly important. Although L4–L5 instability demonstrates the highest clinical incidence (2,5,6), L3 was selected as the radiographic center for two reasons:

  1. Central positioning of L3 ensures full-field visualization of adjacent vertebrae (L2–L4) without edge artifacts during dynamic imaging;

  2. Relative motion analysis between L3 and its neighbors (L2/L4) indirectly quantifies L4–L5 stability via derived kinematic parameters.

Thus, while L3 served as a fiducial reference to enhance measurement accuracy, the ultimate aim of the experiment was to characterize segmental motion patterns across the lumbar spine, paying particular attention to the clinically critical L4–L5 level. The lumbar spine parameters were calculated using the following steps: (I) a reference rectangular coordinate system was established; (II) according to the segmentation result, the four vertex coordinates of the vertebral body were set, and its position was confirmed; (III) the relevant motion parameters were calculated according to its coordinates, including displacement parameters AZ and BZ; rotation angle parameter RX; and intervertebral heights Ha, Hb, and Hc; and (IV) the statistics were tabulated and the visual output was generated.

Vertex coordinate calculation

After segmenting the lumbar spine, four vertices of each vertebral body had to be extracted for parameter calculation. The segmentation result images were 512×512 pixels, and a rectangular coordinate system was established for each lumbar spine image: the upper left corner was the origin, the horizontal direction was the X-axis with positive values extending to the right, and the vertical direction was the Y-axis with positive values extending downward. This coordinate system was used as the basis for determining the vertex coordinates. Given the sagittal structure of the lumbar spine, the vertebral body is approximately rectangular. Therefore, we extracted the minimum circumscribed rectangle from the segmentation result and obtained its four vertices. Figure 10 shows the results obtained by this method alongside the results annotated by experts; notably, there was little deviation between them. The motion parameters of the comparison experiment are detailed below. The algorithm for extracting vertices is shown in Table 7.

Figure 10.

Figure 10

Schematic diagram of calculating coordinates according to the segmentation results.

Table 7. Algorithm flow for extracting vertebral body vertices.

Algorithm Vertebral body vertex extraction algorithm
Input Lumbar spine image prediction result sequence: Pre (N × 512 × 512)
Param The number, N, of lumbar spine images
Output The vertex coordinates of L3 in each image
Flowchart
   1 Establish a rectangular coordinate system for lumbar spine images
   2 For n = 1: N do:
   3 For Pren, change the L3 label to 255, and set the rest to 0
   4    Count the area of the largest area of L3
   5    Use cv2.findContours() to find its contour, Ln
   6    Perform minimum bounding rectangle extraction on Ln
   7    Obtain four vertices: A, B, C, and D
   8    Perform position division on A, B, C, and D
   9 End
   10 Output the position coordinates of each image

Figure 10 presents a visual display of the coordinate calculation. The upper left panel is the original image, the lower left panel shows the coordinates annotated by the doctor, the middle panel is the segmentation result, the upper right panel is the minimum circumscribed rectangle extraction result, and the lower right panel shows the position division of the rectangle’s vertices. After coordinate extraction, position division was required, because the minimum circumscribed rectangle alone does not indicate which vertex is the upper left corner and which is the upper right corner. Since the vertebral body in this experiment was not inverted, we adopted the proximity rule: the vertex closest to the upper left corner of the image was designated the upper left vertex, the vertex closest to the upper right corner of the image was designated the upper right vertex, and so on. However, this algorithm may fail if the patient flexes too far forward, as the shape of the vertebrae can change significantly. To address this issue, consideration should be given to incorporating prior knowledge of vertebral shape changes, using additional features to identify corner points, or training a machine learning model for accurate corner detection (20-22). An example calculation result is shown in the lower right panel.

Calculation of motion parameters

Calculation method for AZ and BZ (displacement parameters)

During movement of the lumbar spine, its vertebral bodies are displaced. The displacement motion parameters are represented by AZ and BZ, and can be calculated from the vertices of a single vertebral body across frames. AZ represents the horizontal displacement between two frames of the lower left corner vertex of the upper vertebral body, and BZ represents the horizontal displacement between two frames of the upper left corner vertex of the upper vertebral body (as shown in Figure 11).

Figure 11.

Figure 11

Calculation method for displacement parameters. AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra.

Let the vertex before movement be A1(xA1, yA1), and the vertex after movement be A2(xA2, yA2). The calculation methods for AZ and BZ are shown in Figure 11. The parameter calculation formulas are as shown in Eqs. [15,16], which are expressed as follows:

AZ = |xA2 − xA1| [15]
BZ = |xB2 − xB1| [16]
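Eqs. [15] and [16] reduce to an absolute horizontal difference between the same vertex in two frames; a one-line sketch (the coordinate values below are made up for illustration):

```python
def horizontal_displacement(p1, p2):
    """Eqs. [15]/[16]: |x2 - x1| for one vertex tracked across two frames.
    Points are (x, y) tuples in image coordinates (pixels)."""
    return abs(p2[0] - p1[0])

# Illustrative vertex positions before and after movement (pixels);
# multiply by ~0.74 to convert to millimeters, per the paper's scale.
AZ = horizontal_displacement((120.0, 300.0), (123.0, 301.0))
```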

Calculation method for RX (rotation angle parameter)

In the process of flexion and the extension of the lumbar spine, rotation is indispensable. This parameter is represented by RX. The angle between the line connecting the upper left corner vertex and the lower left corner vertex of the vertebral body before displacement, A1B1, and the line connecting after displacement, A2B2, represents the rotation angle RX. Given A1(xA1, yA1), A2(xA2, yA2), B1(xB1, yB1), and B2 (xB2, yB2), the slopes m1 of A1B1 and m2 of A2B2 are first obtained, after which, the angle is obtained according to their slopes. The calculation method is shown in Figure 12, and the parameter calculation formulas are as shown in Eqs. [17-19], which are expressed as follows:

Figure 12.

Figure 12

Calculation method for rotation angle. RX, vertebral rotation angle.

m1 = (yB1 − yA1) / (xB1 − xA1) [17]
m2 = (yB2 − yA2) / (xB2 − xA2) [18]
RX = arctan(|(m2 − m1) / (1 + m2·m1)|) [19]
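Eqs. [17]–[19] can be sketched directly; note that this slope-based form is undefined for perfectly vertical edges (zero denominator), which is one motivation for the alternative method discussed next:

```python
import math

def rotation_angle(A1, B1, A2, B2):
    """Eqs. [17]-[19]: angle (degrees) between line A1B1 before movement
    and line A2B2 after movement, computed from their slopes.
    Points are (x, y) tuples; vertical edges (xB == xA) are not handled."""
    m1 = (B1[1] - A1[1]) / (B1[0] - A1[0])   # Eq. [17]
    m2 = (B2[1] - A2[1]) / (B2[0] - A2[0])   # Eq. [18]
    return math.degrees(math.atan(abs((m2 - m1) / (1 + m2 * m1))))  # Eq. [19]
```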

Alternatively, the method described by McCane et al. (36) can also be used to calculate rotations (and the center-of-rotation) from landmarks. This method has the advantage of being able to handle scaling transformations, which may occur during the movement of the lumbar spine. The method is based on a least squares derivation and provides a more robust solution than the method based on slopes. However, the method of McCane et al. is more complex and computationally intensive.

The choice of method depends on the specific requirements of the application. If the goal is to obtain a simple and fast solution, the method based on slopes may be sufficient. However, if the goal is to obtain a more accurate and robust solution, especially in the presence of scaling transformations, the method of McCane et al. would be preferable.
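A generic least-squares rotation fit of this kind can be sketched with an SVD of the landmark cross-covariance (the Kabsch/Procrustes construction). This is an illustration of the least-squares idea, not the exact derivation of McCane et al. (36):

```python
import numpy as np

def fit_rotation(P, Q):
    """Least-squares rotation (degrees) mapping landmark set P onto Q
    (both n x 2 arrays), via SVD of the centered cross-covariance.
    Translation is removed by centering; reflections are guarded against."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # reflection guard
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

# Demo: recover a known 10-degree rotation applied to four landmarks
theta = np.radians(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
angle = fit_rotation(P, P @ R_true.T + np.array([2.0, 3.0]))
```

Unlike the slope-based form, this fit uses all available landmarks at once and remains defined for vertical edges, at the cost of the extra SVD.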

Calculation method for Ha, Hb, and Hc (the intervertebral height parameters)

The intervertebral height represents the distance between two adjacent vertebral bodies. Among them, Ha, Hb, and Hc, represent the distances of their anterior, middle, and posterior edges, respectively. A1(xA1, yA1) represents the vertex of the posterior edge of the upper vertebral body, and D1(xD1, yD1) represents the vertex of the anterior edge of the upper vertebral body. B2(xB2, yB2) represents the vertex of the posterior edge of the lower vertebral body, and C2(xC2, yC2) represents the vertex of the anterior edge of the lower vertebral body. The calculation method is shown in Figure 13, and the parameter calculation formulas are as shown in Eqs. [20-22], which are expressed as follows:

Figure 13.

Figure 13

Calculation method for intervertebral height. Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height.

Ha = √[(yA1 − yB2)² + (xA1 − xB2)²] [20]
Hc = √[(yD1 − yC2)² + (xD1 − xC2)²] [21]
Hb = √{[(yD1 + yA1)/2 − (yC2 + yB2)/2]² + [(xD1 + xA1)/2 − (xC2 + xB2)/2]²} [22]
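Eqs. [20]–[22] can be sketched as Euclidean distances between opposing edge vertices, with Hb taken between the two edge midpoints (our reading of Eq. [22]); the function name is ours:

```python
import math

def disc_heights(A1, D1, B2, C2):
    """Eqs. [20]-[22]: intervertebral heights between the lower edge of the
    upper vertebra (vertices A1, D1) and the upper edge of the lower vertebra
    (vertices B2, C2). Points are (x, y) tuples in image coordinates."""
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    mid = lambda p, q: ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    Ha = dist(A1, B2)                       # Eq. [20]
    Hc = dist(D1, C2)                       # Eq. [21]
    Hb = dist(mid(A1, D1), mid(B2, C2))     # Eq. [22], midpoint-to-midpoint
    return Ha, Hb, Hc
```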

Calculation results and analysis

Using the above formulas, we obtained the predicted motion parameters and compared them with the parameters manually calculated by expert physicians. Since adjacent lumbar spine images are nearly identical and the frame-to-frame movement of the lumbar spine is minimal, per-frame statistical errors are relatively large. Therefore, every eight frames were treated as one sampling unit, and the lumbar spine motion parameters were calculated accordingly. Tables 8,9 show the predicted values, true values, and their errors. We present the motion parameters of one patient as an example.

Table 8. Comparison of automated calculation method and manual method (displacement and rotation).

Frames Parameters
AZ (mm) BZ (mm) RX (°)
Predict Label Error Predict Label Error Predict Label Error
1–8 2.22 2.96 0.74 2.96 3.7 0.74 2.33 1.29 1.04
9–16 1.48 0.74 0.74 0 0.74 0.74 1.94 1.08 0.86
17–24 3.7 5.92 1.48 0.74 1.48 0.74 2.22 2.34 0.12
297–304 1.48 1.48 0 2.22 2.22 0 2.29 2.23 0.06
305–312 4.44 4.44 0 1.48 2.22 0.74 2.02 1.33 0.69
313–320 0.74 1.48 0.74 3.7 2.22 0 1.37 0.49 0.88

AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra; RX, vertebral rotation angle.

Table 9. Comparison of automated calculation method and manual method (intervertebral height).

Frames Parameters
Ha (mm) Hb (mm) Hc (mm)
Predict Label Error Predict Label Error Predict Label Error
1–8 15.61 17.82 2.21 11.94 12.99 1.05 8.27 8.18 0.09
9–16 15.27 15.55 0.28 11.17 12.65 1.48 9.03 9.73 0.7
17–24 17.9 19.47 1.57 14.2 15.12 0.92 9.03 10.77 1.74
297–304 17.28 17.42 0.14 12.5 12.79 0.29 7.73 8.18 0.45
305–312 16.35 17.09 0.74 11.16 13.22 2.06 7.44 9.36 1.92
313–320 15.26 16.69 1.43 11.77 13.17 1.4 9.01 9.65 0.64

Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height.

In terms of the qualitative analysis, Figure 14 shows the visualization of predicted values versus actual values for one patient, where green represents the true values analyzed by experts, and red represents the parameters measured by our method. Notably, our method’s results for AZ, BZ, and RX were closely aligned with the true values, indicating the effectiveness of our approach. For the parameters Ha, Hb, and Hc, our method’s predicted values were slightly smaller than the actual values. A numerical analysis after experimentation indicated that this discrepancy was due to occasional shifts in the vertices of the minimum enclosing rectangle extracted from the true vertices. However, overall, the curves were largely consistent, demonstrating that our method was also effective in measuring intervertebral height (see Tables 8,9, which compare the automated and manual methods for displacement and rotation parameters and intervertebral height measurements).

Figure 14.

Figure 14

Qualitative analysis curve comparison. One patient; each unit represents eight frames of images; green represents the true values; and red represents the predicted values. AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra; GT, ground truth; Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height; RX, vertebral rotation angle.

In addition, we performed a quantitative analysis of the measurement results. After calculating the results for each patient, we averaged the results to determine the range of our prediction errors (see Table 10). Notably, the error in our lumbar motion parameter measurements compared to the standard set by professional physicians was very minimal. The model’s measurements of lumbar displacement and rotation parameters performed particularly well, with AZ and BZ errors of only 1.12 and 1.22 mm, respectively, and a lumbar rotation angle error of only 1.21 degrees. However, while the intervertebral height measurement also achieved high accuracy, its error was slightly larger than that of the displacement measurements.

Table 10. Mean error for each patient.

No. Metric Parameters
AZ BZ RX Ha Hb Hc
1 ERROR 0.75±0.73 1.1±0.1 1.17±0.46 1.52±0.36 1.49±0.06 1.7±0.05
CV 0.97 0.09 0.39 0.24 0.04 0.03
2 ERROR 0.72±0.66 0.81±0.07 0.82±0.04 1.52±0.1 1.52±0.71 1.74±0.2
CV 0.92 0.09 0.05 0.07 0.47 0.11
3 ERROR 0.91±0.01 0.62±0.65 0.67±0.54 1.57±0.31 1.4±0.63 1.65±0.08
CV 0.01 1.05 0.81 0.2 0.45 0.05
4 ERROR 0.95±0.27 0.95±0.01 1.21±0.42 0.94±0.64 1.01±0.73 1.25±0.41
CV 0.28 0.01 0.35 0.68 0.72 0.33
5 ERROR 0.64±0.55 1±0.07 0.89±0.13 1.12±0.1 0.79±0.62 1.41±0.5
CV 0.86 0.07 0.15 0.09 0.78 0.35
6 ERROR 0.92±0.13 0.83±0.7 1.01±0.36 1.07±0.71 1.01±0.62 1.24±0.26
CV 0.14 0.84 0.36 0.66 0.61 0.21
7 ERROR 0.98±0.72 0.82±0.67 0.68±0.58 1.29±0.64 1.19±0.73 1.52±0.34
CV 0.73 0.82 0.85 0.5 0.61 0.22
8 ERROR 0.82±0.01 1±0.07 0.78±0.06 1.72±0.74 1.52±0.09 1.75±0.54
CV 0.01 0.07 0.08 0.43 0.06 0.31
9 ERROR 0.66±0.42 0.9±0.47 0.62±0.5 1.51±0.7 0.87±0.56 0.81±0.62
CV 0.64 0.52 0.81 0.46 0.64 0.77
10 ERROR 0.93±0.71 0.98±0.22 1.1±0.19 1.21±0.21 0.93±0.73 1±0.44
CV 0.76 0.22 0.17 0.17 0.78 0.44

ERROR denotes the mean error ± SD; CV represents the coefficient of variation, calculated as the ratio of the standard deviation to the mean (CV = SD/mean). The units for AZ, BZ, Ha, Hb, and Hc are in millimeters (mm); the unit for RX is degrees (°). AZ, anterior displacement of superior vertebra; BZ, posterior displacement of superior vertebra; CV, coefficient of variation; Ha, anterior disc height; Hb, middle disc height; Hc, posterior disc height; RX, vertebral rotation angle; SD, standard deviation.

Discussion

Interpretation

The study introduced VerSeg-Net, a novel deep learning–based approach for segmenting lumbar vertebrae and measuring motion parameters from dynamic X-ray images. The results demonstrated the VerSeg-Net’s high segmentation accuracy and low motion parameter calculation errors, suggesting that VerSeg-Net can effectively analyze lumbar spine motion and serve as a reliable method for clinical diagnosis.

Specifically, our proposed deep learning-based algorithm for automatically measuring lumbar motion parameters forms the foundational technical basis for these findings. The algorithm assesses lumbar instability by analyzing horizontal displacement, rotation angles, and disc height changes in the sagittal plane-key metrics that align with the motion parameters quantified by VerSeg-Net. Compared with traditional methods, our algorithm offers high automation, a fast measurement speed, and high accuracy, effectively overcoming the limitations of manual measurements (e.g., subjectivity and inefficiency). Additionally, its higher robustness when processing complex spinal images, especially in cases of vertebral deformities or poor image quality, directly contributes to the consistent segmentation results observed, even in challenging clinical scenarios. By comparing our method with existing approaches, we further validated these advantages in terms of accuracy and robustness, reinforcing why VerSeg-Net achieves low calculation errors and reliable performance.

Concerns may arise due to the erratic appearance of motion curves in Figure 14. The seemingly abnormal fluctuations in the curves may be attributed to several factors. First, measurement errors are inevitable. The imaging quality of lateral X-ray films can be affected by various factors such as patient positioning and exposure parameters during data collection, which may introduce inaccuracies in the measured motion parameters, and thus affect the smoothness of the curves. Second, landmark localization error may play a role. The accuracy of landmark localization depends on the specific localization protocol and the operator’s experience. Inaccurate landmark placement can lead to abnormal fluctuations in the calculated motion curves. Additionally, X-ray images inherently contain noise, such as quantum noise and scatter noise, which can also contribute to the non-smoothness of the curves.

To address these issues, we implemented a number of measures. We applied high-order spline interpolation and other advanced smoothing techniques to reduce the impact of abnormal fluctuations in the motion curves. This helped to better represent the underlying motion trends. We also conducted a detailed analysis of the motion curves and systematically identified and removed outliers that were likely caused by measurement errors or landmark localization inaccuracies. In future research, we intend to expand our dataset by incorporating additional samples. A larger sample size will enhance the robustness and reliability of our results, reducing the influence of individual outliers or errors on the curves (see Table 10, which sets out the mean error for each patient).
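The spline smoothing described above can be sketched with SciPy. We use a cubic smoothing spline here for illustration; the paper's exact spline order and smoothing factor are not specified, and the synthetic curve below merely stands in for a sampled motion parameter:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy motion curve: one value per 8-frame sampling unit (mm),
# a smooth trend plus additive noise standing in for measurement error.
t = np.arange(40, dtype=float)
curve = np.sin(t / 8.0) + np.random.default_rng(0).normal(0, 0.15, t.size)

# Cubic smoothing spline; `s` bounds the residual sum of squares and is
# tuned per curve (here ~n * noise variance) to trade fidelity for smoothness.
spline = UnivariateSpline(t, curve, k=3, s=len(t) * 0.15 ** 2)
smoothed = spline(t)
```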

Additionally, we acknowledge that variations in image projection due to spinal rotation about the cranial-caudal axis (transverse plane rotation) could affect measurements, as 2D X-ray projections inherently compress 3D motion. Such rotation may distort vertebral shape or position in 2D images, potentially affecting segmentation accuracy and introducing errors in vertex extraction, displacement, rotation, or height calculations. Although our imaging protocol minimized transverse rotation by instructing subjects to strictly maintain sagittal flexion-extension, residual rotation might persist. To address this limitation, in our future work, we will integrate 3D pose estimation techniques to correct projection-related variations, thereby enhancing measurement robustness.

Limitations

While the algorithm and VerSeg-Net show promising performance, the study had several limitations. The sample size was relatively small, and the data were collected from a single hospital, which may limit the generalizability of the findings, particularly in terms of the algorithm’s robustness in diverse clinical settings. Additionally, while the algorithm demonstrated robustness in handling complex images, it may face challenges in handling complex lumbar spine pathologies or poor-quality images. Future research should involve larger and more diverse samples, and explore the integration of multi-modality data.

Additionally, our method assumes a roughly rectangular vertebral shape for vertex extraction via minimum bounding rectangles. While effective for intact vertebrae in degenerative conditions (e.g., spondylolisthesis), this approach may yield errors in cases of vertebral fractures or severe deformations where the morphology deviates substantially from a quadrilateral. Future research will explore integrating fracture-aware segmentation modules or deformable shape models to address such pathological variations.

Usability of the model in the context of current care

The model can be used to provide immediate feedback on lumbar spine motion during examinations. Its high automation and fast measurement speed can be leveraged to facilitate timely diagnosis and treatment decisions, addressing the inefficiencies of traditional manual measurements. This aligns with our initial goal of promoting the development of spinal disease diagnosis and treatment technologies. Moving forward, future research should focus on the real-time application and clinical integration of the model.

Conclusions

Intervertebral kinematic parameter measurement aids in the study of lumbar spine motion patterns, and the accurate measurement of lumbar motion parameters can support clinical diagnosis. We proposed a measurement method using deep neural networks for vertebra segmentation, with the segmentation results assisting in parameter calculation. To this end, we introduced VerSeg-Net, a lumbar spine segmentation network based on dynamic X-rays, which includes an RA perception module. This module divides lumbar spine images into several feature blocks and incorporates a query-aware attention mechanism to filter out irrelevant key-value pairs between blocks, focusing attention on more relevant areas. The experimental results showed that the RA module helped the neural network learn more robust contextual features. Additionally, VerSeg-Net includes a feature aggregation module for skip connections, which, by using different convolutional kernel sizes, learns contextual features at various scales. We demonstrated VerSeg-Net’s effectiveness on a dynamic X-ray dataset from a local hospital. We extracted the minimum bounding rectangles from the segmentation results to obtain the four vertices of the lumbar spine, designed a calculation method based on these vertices, and visualized the results. Compared with annotations from professional physicians, the error fell within a reasonable range, indicating that our automated lumbar parameter measurement method has significant clinical relevance.

Supplementary

The article’s supplementary files are as follows:

qims-15-10-9492-rc.pdf (413KB, pdf)
DOI: 10.21037/qims-24-1976

Acknowledgments

We acknowledge Dr. Yuzhou Liu for technical support during manuscript revision.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This study was approved by the Ethics Committee of Wuxi Ninth People’s Hospital (No. KS2023006). All participants provided informed consent prior to their inclusion in the study.

Footnotes

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1976/rc

Funding: This work was supported by Wuxi Modern Industry Development Fund Project (No. Y20212056).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1976/coif). The authors have no conflicts of interest to declare.

Data Sharing Statement

Available at https://qims.amegroups.com/article/view/10.21037/qims-24-1976/dss

qims-15-10-9492-dss.pdf (75.4KB, pdf)
DOI: 10.21037/qims-24-1976

References

  • 1.Panjabi MM. Clinical spinal instability and low back pain. J Electromyogr Kinesiol 2003;13:371-9. 10.1016/s1050-6411(03)00044-0 [DOI] [PubMed] [Google Scholar]
  • 2.Cho IY, Park SY, Park JH, Suh SW, Lee SH. MRI findings of lumbar spine instability in degenerative spondylolisthesis. J Orthop Surg (Hong Kong) 2017;25:2309499017718907. 10.1177/2309499017718907 [DOI] [PubMed] [Google Scholar]
  • 3.Takayanagi K, Takahashi K, Yamagata M, Moriya H, Kitahara H, Tamaki T. Using cineradiography for continuous dynamic-motion analysis of the lumbar spine. Spine (Phila Pa 1976) 2001;26:1858-65. 10.1097/00007632-200109010-00008 [DOI] [PubMed] [Google Scholar]
  • 4.Esmailiejah AA, Abbasian M, Bidar R, Esmailiejah N, Safdari F, Amirjamshidi A. Diagnostic efficacy of clinical tests for lumbar spinal instability. Surg Neurol Int 2018;9:17. 10.4103/sni.sni_359_17 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Pitkänen MT, Manninen HI, Lindgren KA, Sihvonen TA, Airaksinen O, Soimakallio S. Segmental lumbar spine instability at flexion-extension radiography can be predicted by conventional radiography. Clin Radiol 2002;57:632-9. 10.1053/crad.2001.0899 [DOI] [PubMed] [Google Scholar]
  • 6.Iguchi T, Kanemura A, Kasahara K, Sato K, Kurihara A, Yoshiya S, Nishida K, Miyamoto H, Doita M. Lumbar instability and clinical symptoms: which is the more critical factor for symptoms: sagittal translation or segment angulation? J Spinal Disord Tech 2004;17:284-90. 10.1097/01.bsd.0000102473.95064.9d [DOI] [PubMed] [Google Scholar]
  • 7.Zhang C, Wang J, He J, Gao P, Xie G. Automated vertebral landmarks and spinal curvature estimation using non-directional part affinity fields. Neurocomputing 2021;438:280-9. [Google Scholar]
  • 8.Gertzbein SD, Seligman J, Holtby R, Chan KH, Kapasouri A, Tile M, Cruickshank B. Centrode patterns and segmental instability in degenerative disc disease. Spine (Phila Pa 1976) 1985;10:257-61. 10.1097/00007632-198504000-00014 [DOI] [PubMed] [Google Scholar]
  • 9. Yeh YC, Weng CH, Huang YJ, Fu CJ, Tsai TT, Yeh CY. Deep learning approach for automatic landmark detection and alignment analysis in whole-spine lateral radiographs. Sci Rep 2021;11:7618. 10.1038/s41598-021-87141-x
  • 10. Zhou S, Yao H, Ma C, Chen X, Wang W, Ji H, He L, Luo M, Guo Y. Artificial intelligence X-ray measurement technology of anatomical parameters related to lumbosacral stability. Eur J Radiol 2022;146:110071. 10.1016/j.ejrad.2021.110071
  • 11. Cina A, Bassani T, Panico M, Luca A, Masharawi Y, Brayda-Bruno M, Galbusera F. 2-step deep learning model for landmarks localization in spine radiographs. Sci Rep 2021;11:9482. 10.1038/s41598-021-89102-w
  • 12. Hipp JA, Grieco TF, Newman P, Reitman CA. Definition of Normal Vertebral Morphometry Using NHANES-II Radiographs. JBMR Plus 2022;6:e10677. 10.1002/jbm4.10677
  • 13. Al Arif SMMR, Knapp K, Slabaugh G. Fully automatic cervical vertebrae segmentation framework for X-ray images. Comput Methods Programs Biomed 2018;157:95-111. 10.1016/j.cmpb.2018.01.006
  • 14. Pfeiffer D, Pfeiffer F, Rummeny E. Advanced X-ray Imaging Technology. Recent Results Cancer Res 2020;216:3-30. 10.1007/978-3-030-42618-7_1
  • 15. Ou X, Chen X, Xu X, Xie L, Chen X, Hong Z, Bai H, Liu X, Chen Q, Li L, Yang H. Recent Development in X-Ray Imaging Technology: Future and Challenges. Research (Wash D C) 2021;2021:9892152. 10.34133/2021/9892152
  • 16. Tabrizchi H, Mosavi A, Vamossy Z, Varkonyi-Koczy AR. Densely Connected Convolutional Networks (DenseNet) for Diagnosing Coronavirus Disease (COVID-19) from Chest X-ray Imaging. 2021 IEEE International Symposium on Medical Measurements and Applications (MeMeA); 23-25 June, 2021.
  • 17. Wang W, Wang G, Wu X, Ding X, Cao X, Wang L, Zhang J, Wang P. Automatic segmentation of prostate magnetic resonance imaging using generative adversarial networks. Clin Imaging 2021;70:1-9. 10.1016/j.clinimag.2020.10.014
  • 18. Milletari F, Navab N, Ahmadi SA. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 2016 Fourth International Conference on 3D Vision (3DV); 25-28 Oct, 2016.
  • 19. Noh H, Hong S, Han B. Learning Deconvolution Network for Semantic Segmentation. 2015 IEEE International Conference on Computer Vision (ICCV); 2015:1520-8.
  • 20. Dickey JP, Pierrynowski MR, Bednar DA, Yang SX. Relationship between pain and vertebral motion in chronic low-back pain subjects. Clin Biomech (Bristol) 2002;17:345-52. 10.1016/s0268-0033(02)00032-3
  • 21. Li X, Wang W, Hu X, Yang J. Selective Kernel Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 15-20 June, 2019.
  • 22. Zhu L, Wang X, Ke Z, Zhang W, Lau R. BiFormer: Vision Transformer with Bi-Level Routing Attention. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023:10323-33.
  • 23. Dong X, Bao J, Chen D, Zhang W, Yu N, Yuan L, Chen D, Guo B. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022:12114-24.
  • 24. Huang G, Sun Y, Liu Z, Sedra D, Weinberger KQ. Deep Networks with Stochastic Depth. arXiv 2016. arXiv:1603.09382v3.
  • 25. Huang Z, Wang X, Wei Y, Huang L, Shi H, Liu W. CCNet: Criss-Cross Attention for Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell 2023;45:6896-908. 10.1109/TPAMI.2020.3007032
  • 26. Kirillov A, Girshick R, He K, Dollár P. Panoptic Feature Pyramid Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019:6392-401.
  • 27. Mei H, Ji GP, Wei Z, Yang X, Wei X, Fan DP. Camouflaged Object Segmentation with Distraction Mining. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 20-25 June, 2021.
  • 28. Kahn CE, Jr, Carrino JA, Flynn MJ, Peck DJ, Horii SC. DICOM and radiology: past, present, and future. J Am Coll Radiol 2007;4:652-7. 10.1016/j.jacr.2007.06.004
  • 29. Gleinert-Rożek MŁ, Kosiński A, Kaczyńska A, Zajączkowski M, Kuta W, Kamiński R, Piwko G. Metric analysis of the lumbar region of human vertebral column. Folia Morphol (Warsz) 2020;79:655-61. 10.5603/FM.a2020.0008
  • 30. Davis RJ, Lee DC, Wade C, Cheng B. Measurement Performance of a Computer Assisted Vertebral Motion Analysis System. Int J Spine Surg 2015;9:36. 10.14444/2036
  • 31. Bagheri R, Ebrahimi Takamjani I, Dadgoo M, Ahmadi A, Sarrafzadeh J, Pourahmadi MR, Jafarpisheh AS. Gender-Related Differences in Reliability of Thorax, Lumbar, and Pelvis Kinematics During Gait in Patients With Non-specific Chronic Low Back Pain. Ann Rehabil Med 2018;42:239-49. 10.5535/arm.2018.42.2.239
  • 32. Chen Z, Wang W, Chen X, Dong F, Cheng G, He L, Ma C, Yao H, Zhou S. Deep learning-based quantitative morphological study of anteroposterior digital radiographs of the lumbar spine. Quant Imaging Med Surg 2024;14:5385-95. 10.21037/qims-22-540
  • 33. Frobin W, Brinckmann P, Leivseth G, Biggemann M, Reikerås O. Precision measurement of segmental motion from flexion-extension radiographs of the lumbar spine. Clin Biomech (Bristol) 1996;11:457-65. 10.1016/s0268-0033(96)00039-3
  • 34. Kingma D, Ba J. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations; 2014:1293-8.
  • 35. Mao A, Mohri M, Zhong Y. Cross-Entropy Loss Functions: Theoretical Analysis and Applications. International Conference on Machine Learning; 2023:23803-28.
  • 36. McCane B, King TI, Abbott JH. Calculating the 2D motion of lumbar vertebrae using splines. J Biomech 2006;39:2703-8. 10.1016/j.jbiomech.2005.09.015

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

The article’s supplementary files are as follows:

qims-15-10-9492-rc.pdf (413KB, pdf)
DOI: 10.21037/qims-24-1976

Data Availability Statement

The data supporting this study are available at https://qims.amegroups.com/article/view/10.21037/qims-24-1976/dss

qims-15-10-9492-dss.pdf (75.4KB, pdf)
DOI: 10.21037/qims-24-1976

Articles from Quantitative Imaging in Medicine and Surgery are provided here courtesy of AME Publications
