Abstract
This paper proposes an automated system for segmenting Lumbo-Sacral (LS) spine Magnetic Resonance Imaging (MRI) and evaluating its geometrical characteristics. The LS spine MRI is segmented into anatomical parts (vertebrae, intervertebral discs (IVDs), and canal), and vertebral height and width, IVD height and width, canal diameter, IVD height index, and signal intensity parameters are computed to facilitate automatic analysis. To overcome the subjectivity and variability of manual analysis, an expert-verified dataset is developed. This dataset supports objective and consistent lumbar spine analysis, which can improve diagnostic accuracy and clinical outcomes. Furthermore, the generated data supports the development of personalized treatment programs that enhance patient care. For segmentation of LS spine MRIs, we use DeepLabv3+ with ResNet50 and an attention gate. Automated quantitative analysis of the LS spine offers several clinical advantages: (1) it lessens the workload of radiologists, allowing them to concentrate on challenging cases, and improves productivity during routine evaluations; (2) it offers objective and consistent analysis and boosts diagnostic precision by minimizing mistakes; and (3) it enables early detection and easy monitoring of spine pathology, allowing for timely intervention. The proposed model can improve clinical efficiency, accuracy, and quality of patient care, and may influence the management of spinal health and the treatment of low back pain.
Keywords: Lumbar spine MRI, Segmentation, Back pain, DeepLabv3+, Personalized treatment
1) Introduction
Low back pain is a significant global public health issue that has a substantial negative influence on productivity and quality of life [1–3]. Approximately 540 million people worldwide suffer from low back pain, which has become the foremost cause of disability. Accurate diagnosis and effective treatment of conditions like lumbar spondylosis, spinal stenosis, and disc herniation are critical for managing this widespread problem [4, 5]. Magnetic resonance imaging (MRI) is essential in the evaluation of lumbar spine diseases because it provides detailed images of spinal structures [6]. However, manual review of these images is time-consuming and prone to high interobserver variability [7], which can lead to inconsistent and subjective assessments.
To address these challenges, we have developed an automated system for the quantitative analysis of lumbar spine MRIs. This system focuses on the segmentation of key lumbar spine structures, including lumbar vertebrae, intervertebral discs (IVDs), and the vertebral canal. By automating the segmentation process, our system ensures consistent and objective identification of these structures, thereby enhancing the reliability of the analysis. Similar approaches to automate segmentation using deep learning have demonstrated promising results in improving the consistency and accuracy of spinal structure identification [8]. Automation ensures consistent and objective analysis, improving diagnostic accuracy and reducing the risk of errors. Early detection and monitoring of spine pathologies become more feasible, enabling timely interventions. Standardization of assessments and reporting enhances comparative studies and coordination among healthcare providers. Furthermore, detailed and precise data generated by automated systems support personalized treatment plans, improving patient outcomes. Overall, automating lumbar spine analysis enhances clinical efficiency, accuracy, and patient care quality.
The system measures a range of spine geometrical parameters that are crucial for diagnosing and monitoring lumbar spine conditions [9–13]. These parameters include disc height and width, vertebrae height and width, canal width, and signal intensity. Accurate measurement of these parameters is essential for evaluating spinal health, assessing the severity of degenerative changes, and planning appropriate interventions [14].
By integrating pretrained models as encoders, our system aims to be a vital tool to improve the diagnosis and treatment of low back pain [7]. The quantitative data provided by our system can help clinicians make more informed decisions, potentially leading to better patient treatment. Additionally, the objective nature of the automated measurements reduces the variability inherent in manual grading, offering a more standardized approach for lumbar spine evaluation. The development of this automated system represents a significant advancement in spinal health diagnostics, leveraging technology to enhance clinical accuracy and efficiency. Recent medical research on lumbar spine imaging dedicated its efforts toward automating quantitative analysis for improved accuracy and efficiency. Research has explored different approaches to accomplish this goal.
Much of the progress in this area comes from automated methods for segmenting lumbar spine structures, with techniques designed specifically for individual spinal components to improve measurement accuracy. An automatic dura-contouring tool enhanced radiographic quantification in patients with lumbar spine conditions by performing CTM measurements [15]. Deep learning methods have become a focus for developing automated analysis systems. The DeepSPINE model uses a convolutional neural network to perform vertebral segmentation, disc-level designation, and spinal stenosis grading, and achieves high accuracy in identifying lumbar spinal stenosis. The model was trained on 22,796 disc levels from 4,075 patients [16].
Research implemented automatic segmentation methods through MRI to examine lumbar spinal stenosis in patients by analysing intervertebral discs together with vertebral bodies and yellow ligament structures. The method enabled researchers to acquire global measurements at every disc level by separating healthy from diseased segments [17]. A separate AI system performed autonomous segmentation and measurements of spinal structures in MRI images through high Dice coefficient performance in different spine areas. The system provided high precision and reduced doctor workload while using AI to automatically measure IVD geometric parameters from MRI data [18]. Researchers utilized computed tomography (CT) scans to perform quantitative assessment of lumbar bone radiographs in their study. The process started by defining an area of interest followed by edge detection for vertebral bodies and border extraction and endplate identification through multiple preprocessing steps. Multiple preprocessing steps resulted in feature extraction that included area, mass, gray-level mean value and variance, internal and external perimeter, complexity, deflection, amplitude factor and endplate centroid [19].
The field of lumbar spine condition quantification uses machine learning algorithms for its analysis. The research created a completely automated system to measure lumbar spondylolisthesis quantities which shows the capability of machine learning in this application [20]. Researchers analysed disc bulging in patients with lumbar spinal stenosis through CT imaging in their study. The researchers evaluated surgical outcome effects by measuring how disc bulging extended beyond the endplate area on CT axial images [21].
These studies show how automated quantitative analysis improves lumbar spine imaging and demonstrate its future possibilities. Despite these advances, exact measurement of spinal geometrical parameters remains challenging. The main drawback of current deep learning models is that they focus on segmentation and on quantifying IVD geometric parameters without delivering clinically useful measurements. Essential spinal parameters remain difficult to quantify, including intervertebral disc height and width (average and central values), IVD height index, vertebral height and width, spinal canal dimensions (average and central values), and IVD signal intensity. Most current studies create segmentation masks but do not convert these masks into the clinical metrics required for diagnosis and treatment planning. Overcoming these limitations would substantially benefit the practical application of automated quantitative analysis in clinical settings.
By leveraging state-of-the-art deep learning-based segmentation, we have developed an automated and improved segmentation technique. The developed system provides detailed and reproducible position measurements, which is a valuable requirement in clinical settings. Such a system can have a significant impact on patient care:
It will lessen the workload and improve the productivity of radiologists.
It will offer objective and consistent analysis and boost diagnostic precision by minimizing mistakes.
It will enable early detection of spine pathology and timely monitoring of spinal abnormalities.
The rest of the paper is structured as follows. Section "Dataset Preparation and Pre-Processing" describes the dataset and its preprocessing, and Section "Proposed Methodology" presents the proposed methodology and segmentation models. The segmentation results and the geometrical measurements are analyzed in Section "Results and Analysis". The final section concludes the study and outlines future research directions.
Dataset Preparation and Pre-Processing
The dataset used in this study comprises mid-sagittal T2-weighted (T2 W) LS spine MRIs. The study is funded by the Department of Health Research (DHR) Multi-Disciplinary Research Units (MRU) and started after institute ethical clearance, in collaboration between AIIMS Raebareli and NIT Meghalaya. The data are retrospective and the MRIs are deidentified; written informed consent has been obtained, guaranteeing patient confidentiality and adhering to ethical standards.
Data Collection
The data were collected from patients of the Neurosurgery Department, AIIMS Raebareli (some MRIs come from outside sources), between March 2023 and February 2024, as the study was approved as a time-bound project. A total of 173 subjects underwent lumbar spine MRI on a 1.5-T MRI unit. The mid-sagittal T2 W images were acquired at a resolution of (512 × 512) pixels, which captures the anatomical structures of the LS spine in enough detail for accurate segmentation. The demographic distribution of the dataset is described below.
Demographical (Age group and Gender) Distribution
The dataset includes demographic information categorized by gender and age group. It contains MRIs of 105 male and 68 female subjects, as summarized in Table 1. The variation in age and gender across the collected dataset is expected to make the model more robust when measuring geometrical changes in the LS spine.
Table 1.
Gender- and age-group-wise description of the dataset received from AIIMS Raebareli
| S.No | Age Group | Male Subjects | Female Subjects | Total Number of MR Images |
|---|---|---|---|---|
| 1 | 0–15 | 3 | 1 | 13 |
| 2 | 16–30 | 31 | 13 | 185 |
| 3 | 31–45 | 28 | 24 | 225 |
| 4 | 46–60 | 24 | 20 | 190 |
| 5 | Above 60 | 19 | 10 | 110 |
Pre-Processing
Preprocessing is an essential step in preparing the dataset for training a semantic segmentation model. It involves image normalization, resizing and cropping, and image augmentation, which respectively standardize the images, enhance image quality, and increase the dataset volume to improve model performance.
Image Normalization
Normalization is done to get uniform pixel intensity. In the normalization process the pixel values are adjusted to a common scale by removing the mean intensity and dividing by the standard deviation of the pixel intensities. Normalization helps in reducing the variability due to different imaging conditions and improves the convergence of the model during training.
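The normalization described above can be sketched as follows. This is a minimal illustration that assumes per-image statistics and NumPy arrays; the function name `zscore_normalize`, the `eps` guard, and the toy array are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def zscore_normalize(image, eps=1e-8):
    """Subtract the mean intensity and divide by the standard deviation."""
    image = np.asarray(image, dtype=np.float64)
    mean = image.mean()
    std = image.std()
    # eps (an assumption) guards against division by zero on a constant image.
    return (image - mean) / (std + eps)

# Hypothetical 4 x 4 array standing in for a 512 x 512 MRI slice.
demo = np.arange(16, dtype=np.float64).reshape(4, 4)
normalized = zscore_normalize(demo)
```

After this step the slice has approximately zero mean and unit standard deviation, whatever the original intensity range of the scanner output.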
Resizing and Cropping
Most, but not all, of the original LS spine MRIs have a resolution of (512 × 512) pixels. Resizing or cropping is therefore necessary to fit the input requirements of the neural network architecture used for segmentation (discussed in Section "Proposed Methodology"). Maintaining the aspect ratio and important anatomical details is crucial during this step to avoid losing critical information. A resized and cropped spine MRI is shown in Fig. 1.
Fig. 1.
Example of Resized and cropped image. (a) Original MRI, (b) Resized and cropped MRI to match the dataset resolution of (512 × 512)
Data Augmentation
Data augmentation increases the volume of the training set to improve model resilience and generalizability. During augmentation, the images are randomly rotated, translated horizontally or vertically, and flipped to create a more diverse training set. This enables the model to learn invariant features and improves its performance on unseen data.
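For segmentation, each geometric transform has to be applied identically to an image and its label mask. The sketch below illustrates this for flips and integer translations only; rotation is omitted for brevity. The function names, the shift range, and the convention that label 0 is background are assumptions made for this illustration, not details reported in the study.

```python
import numpy as np

def shift_zero_fill(arr, dy, dx):
    """Translate a 2D array by (dy, dx) pixels; exposed borders become 0.

    Assumes abs(dy) < height and abs(dx) < width.
    """
    out = np.zeros_like(arr)
    h, w = arr.shape
    src = arr[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out

def augment_pair(image, mask, rng, max_shift=2):
    """Apply the same random horizontal flip and translation to image and mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return (shift_zero_fill(image, int(dy), int(dx)),
            shift_zero_fill(mask, int(dy), int(dx)))

rng = np.random.default_rng(42)
mask = np.zeros((6, 6), dtype=np.int64)
mask[2:4, 1:5] = 3                        # hypothetical class label
image = mask.astype(np.float64) * 10.0    # toy image aligned with the mask
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Because both arrays go through the same flip and shift, the augmented mask still outlines the same pixels as the augmented image.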
Data Annotation
Accurate data segmentation and labeling are critical steps in developing effective semantic segmentation models for medical imaging. All MRIs are annotated using LabelMe (version 3.3.6) [22], a widely used image annotation tool, to facilitate semantic segmentation and efficient labeling. This process involves detailed and precise labeling of key anatomical structures within the lumbar spine region, ensuring that the segmentation model can effectively learn to distinguish among these structures.
Annotation Tool
LabelMe (version 3.3.6) [22] is chosen for its user-friendly interface and robust functionality, making it well suited for detailed medical image annotation. The tool allows precise boundary delineation and supports various annotation formats, which are essential for training sophisticated machine learning models. Each anatomical structure is manually annotated by expert annotators. This manual process ensures high accuracy and reliability of the labels, which is vital for the performance of the segmentation model. Frequent cross-checks are made to keep the annotations accurate and consistent.
Annotation Process
The labeling process is conducted on T2 W lumbar spine images with a resolution of (512 × 512) pixels. Each image is meticulously annotated to identify and demarcate 14 distinct areas, which are essential for comprehensive lumbar spine analysis: five lumbar vertebrae (L1 to L5), one sacral vertebra (S1), six intervertebral discs (T12-L1 to L5-S1), the spinal canal, and the background. A sample annotated MRI is depicted in Fig. 2.
Fig. 2.
Example of Labeled Dataset for segmentation. (a) Original MRI, (b) Manually labeled ground truth mask of MRI, and (c) Class names of the labeled ground truth mask
Proposed Methodology
This work presents a comprehensive methodology for the segmentation and analysis of LS spine MRIs. The workflow involves several sequential steps, from initial segmentation to the final geometrical features measurements. The process is methodical and effective, with each stage building on the previous one to ensure improvement. The entire workflow of our suggested method is shown in Fig. 3. Individual steps of our proposed methodology are described below.
Fig. 3.
Workflow Diagram of Proposed Methodology
Segmentation Models
This section discusses the segmentation models used in our work. For the segmentation task, we considered several state-of-the-art deep learning models: the original U-Net [23], VM-UNet [24], Swin transformer-based U-Net [25], and U-Net with a transfer learning approach. The top performers among these models are discussed in detail below:
U-Net
The U-Net [23] architecture is a popular and powerful convolutional neural network (CNN) architecture for image segmentation, especially in medical imaging. Its capacity to capture both fine details and contextual information makes it well suited to segmenting MRIs into multiple classes. Below, we provide a detailed explanation of the U-Net model architecture implemented for this study. The U-Net architecture follows a symmetric encoder-decoder structure with skip connections. The encoder path (contracting path) captures context by progressively down-sampling the input image and learning feature representations, while the decoder path (expansive path) restores the spatial resolution through up-sampling and combining features from corresponding encoder layers. This design allows the model to achieve high segmentation accuracy by utilizing both high-level and low-level features.
The model consists of an input layer of shape (512 × 512 × 3) representing the three-channel MRIs. The encoder path consists of repeated convolutional layers with 64 filters, a kernel size of (3 × 3), the ‘ReLU’ activation function, and ‘same’ padding. Similarly, the decoder path consists of repeated up-sampling layers with kernel size (2 × 2), and the output layer comprises a convolutional block of kernel size (1 × 1) with ‘softmax’ as the activation function.
Swin U-Netr
We use a 2D version of the Swin UNetr [25] (UNet with Transformer Encoder) structure, which combines CNNs with transformer-based encoders. The input layer takes MRIs of size (512 × 512 × 3). It is followed by a patch embedding layer of dimensions (2 × 2 × 2), which extracts patch features and incorporates them into position embeddings. Self-attention is calculated within distinct non-overlapping windows, formed during the partitioning phase, to model token interactions efficiently.
The Swin transformer encoder is made up of multiple layers containing multi-head attention and feed-forward blocks, enabling the model to grasp distant relationships and context in the input image. The feature size is subsequently reduced with the resolution as shown in Eq. 1.
$$\frac{H}{2^{i}} \times \frac{W}{2^{i}} \times \frac{D}{2^{i}} \tag{1}$$
where $i$ denotes the stage and $H$, $W$, and $D$ denote height, width, and channel, respectively.
Skip connections are established at specific encoder layers so that detailed features are preserved and passed on to the decoding path. In decoding, convolutional blocks progressively reconstruct spatial details by merging information from the CNN features and the transformer encoder through deconvolutional (up-sampling) layers. The skip connections from the transformer encoder are reshaped and fed into a residual block with a (3 × 3 × 3) convolution layer and a (2 × 2 × 2) deconvolution layer at various stages, allowing the model to combine both global and local feature representations. The decoding path consists of several stages where convolutional and deconvolutional layers refine and up-sample the feature maps. The final output layer generates a multi-class segmentation map using a (1 × 1) convolution, with 14 output channels representing the class probabilities for each pixel.
U-Net with ResNet50
This model architecture leverages the strengths of both U-Net [23] and ResNet50 [26], employing a transfer learning approach for multiclass segmentation of LS spine MRIs. The model architecture integrates a pre-trained ResNet50 as the encoder with a custom U-Net decoder, designed for segmenting MRIs into multiple classes. This combination aims to exploit the robust feature extraction capabilities of ResNet50 and the fine-grained segmentation capability of U-Net.
The model consists of an input layer of shape (512 × 512 × 3) representing the three-channel MRIs. The encoder path is a ResNet50 pre-trained on the ImageNet dataset, and its deepest layer acts as a bridge connecting the encoder and decoder parts of the network. The decoder path consists of repeated up-sampling layers with kernel size (2 × 2), and the output layer comprises a convolutional block of kernel size (1 × 1) with ‘softmax’ as the activation function.
Proposed DeepLabv3+ with ResNet50 and Attention Gate
DeepLabv3+ [27] is a cutting-edge semantic segmentation model that extends DeepLabv3 with an improved decoder module. It uses a powerful CNN as its backbone, acting as an encoder to extract high-level features from input images. In this implementation, ResNet50, a widely used CNN architecture, serves as the encoder backbone. Because of its capability to capture subtle features and semantics in images, DeepLabv3+ [27] is well suited to a variety of computer vision applications, including the segmentation of medical images. Figure 4 depicts the architecture of our proposed model.
Fig. 4.
Architecture of Proposed Model
The model consists of an input layer of shape (512 × 512 × 3) and ResNet50 as the encoder backbone. The Atrous Spatial Pyramid Pooling (ASPP) module of the DeepLabv3+ architecture is a crucial part that enables efficient multi-scale feature extraction. It incorporates context information at several scales because it is composed of parallel atrous convolutional layers with varying dilation rates. By using dilated convolutions, the ASPP module expands the receptive field of each convolutional layer, which helps the model collect both local and global context information. The ASPP module also has a 1 × 1 convolutional layer to lower the dimensionality of the feature maps and an average pooling layer to gather global context information. The decoder module then combines features from the encoder with up-sampled feature maps to generate dense pixel-wise predictions. It is made up of skip connections from the encoder and a number of convolutional and up-sampling layers. The attention gate module is placed on the skip connection of the decoder module. It identifies salient regions of the feature maps and refines the feature responses so that only the activations relevant to the task are kept. The decoder module refines the predictions and improves segmentation accuracy: feature maps are first brought to a higher spatial resolution via bilinear up-sampling and then combined with encoder features. Lastly, a convolutional layer and a softmax activation function make up the output layer.
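To make the role of the attention gate more concrete, the sketch below shows one common formulation, the additive gate used in Attention U-Net. The text does not state which formulation the proposed model uses, so this choice is an assumption, as are the function name, the tensor shapes, and the random weights, which stand in for learned 1 × 1 convolutions.

```python
import numpy as np

def attention_gate(skip, gate, w_skip, w_gate, w_psi):
    """Additive attention gate applied to a skip connection.

    skip: (H, W, C_skip) encoder features.
    gate: (H, W, C_gate) decoder features, assumed already resized to H x W.
    The weight matrices act as 1 x 1 convolutions (one product per pixel).
    """
    hidden = np.maximum(skip @ w_skip + gate @ w_gate, 0.0)   # ReLU
    alpha = 1.0 / (1.0 + np.exp(-(hidden @ w_psi)))           # sigmoid, (H, W, 1)
    # Each spatial position of the skip features is scaled by its coefficient.
    return skip * alpha, alpha

rng = np.random.default_rng(0)
skip = rng.normal(size=(8, 8, 16))
gate = rng.normal(size=(8, 8, 32))
gated, alpha = attention_gate(
    skip, gate,
    w_skip=0.1 * rng.normal(size=(16, 4)),   # random stand-ins for learned weights
    w_gate=0.1 * rng.normal(size=(32, 4)),
    w_psi=0.1 * rng.normal(size=(4, 1)),
)
```

The coefficient map `alpha` lies between 0 and 1, so positions judged irrelevant are attenuated before the skip features are merged with the up-sampled decoder features.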
Loss Function
To direct the training process toward precise pixel-wise classification, the selection of the loss function is essential. For the U-Net, ResNet50 U-Net, Swin U-Netr, and DeepLabv3+ with ResNet50 and attention gate architectures, we use the categorical cross-entropy loss function [28], which is widely used in multiclass classification problems. It is defined in Eq. (2).
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log\left(\hat{y}_{i,c}\right) \tag{2}$$
where
$N$ is the number of pixels in the image.
$C$ is the number of classes.
$y_{i,c}$ is a binary indicator (0 or 1) of whether class label $c$ is the correct classification for pixel $i$.
$\hat{y}_{i,c}$ is the predicted probability that pixel $i$ belongs to class $c$.
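A small numerical sketch of the categorical cross-entropy loss is given below. It assumes one-hot labels and predicted probabilities flattened to shape (pixels, classes) and averages over pixels; the averaging, the clipping constant `eps`, and the toy values are assumptions for illustration.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over pixels of -sum_c y_true[i, c] * log(y_pred[i, c])."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # Clipping (an assumption) avoids log(0) for zero predicted probabilities.
    log_pred = np.log(np.clip(y_pred, eps, 1.0))
    return float(-np.mean(np.sum(y_true * log_pred, axis=1)))

# Two hypothetical pixels, three classes.
y_true = [[1, 0, 0],
          [0, 1, 0]]
y_pred = [[0.80, 0.10, 0.10],
          [0.25, 0.50, 0.25]]
loss = categorical_cross_entropy(y_true, y_pred)
```

Only the probability assigned to the correct class of each pixel contributes, so the loss falls toward zero as those probabilities approach 1.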
Results and Analysis
This section analyzes the performance of the segmentation models and then presents the measurement of geometrical features. Several evaluation metrics are used to assess the segmentation models.
Evaluation Metrics for Segmentation Models
To evaluate the performance, several evaluation metrics are employed: Accuracy, Precision, Recall, Dice coefficient, and Intersection over Union (IoU) (Table 2).
Table 2.
Evaluation metrics used to assess the performance of the different models
| Parameter | Equation | Description |
|---|---|---|
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | The percentage of all pixels that have been correctly categorized. It is a fundamental indicator that gives an overview of the model’s effectiveness |
| Precision | $\frac{TP}{TP + FP}$ | The percentage of correctly detected positive pixels out of all pixels predicted to be positive. It is sometimes referred to as Positive Predictive Value |
| Recall | $\frac{TP}{TP + FN}$ | The percentage of correctly detected positive pixels among all actual positive pixels, also known as sensitivity |
| Dice Coefficient | $\frac{2\,TP}{2\,TP + FP + FN}$ | Quantifies the degree of overlap between the predicted segmentation and the ground truth. It is sometimes referred to as the Dice Similarity Index |
| Intersection over Union (IoU) | $\frac{TP}{TP + FP + FN}$ | The ratio of the intersection to the union of the predicted segmentation and the ground truth. It is sometimes referred to as the Jaccard index |
TP stands for True Positive, TN for True Negative, FP for False Positive, and FN for False Negative.
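The five metrics can be computed from pixel-wise confusion counts, as in the sketch below. It treats a single class as a binary mask and uses the standard definitions of each metric; the function names and toy masks are illustrative, and how per-class scores are aggregated in the study is not specified here.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, tn, fp, fn

def segmentation_metrics(tp, tn, fp, fn):
    """Standard definitions; assumes every denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
    }

# Hypothetical flattened masks for one class.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(*confusion_counts(pred, truth))
```

For a single binary mask the Dice coefficient and IoU are linked by Dice = 2·IoU / (1 + IoU), which provides a quick consistency check on reported values.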
Performance Analysis of Segmentation Models
The performance of the segmentation models is evaluated on the test set, and the results are summarized in Table 3. To the best of our knowledge, no comparable state-of-the-art method for the joint segmentation of vertebrae, IVDs, and canal is available in the literature. However, U-Net and its variants are widely used for medical image segmentation, so we compare our model with some prominent state-of-the-art segmentation models from the literature. Fig. 5 shows sample segmentation results for vertebrae, IVDs, and canal using the proposed and compared models.
Table 3.
Results of various segmentation models used for vertebral bodies segmentation over the dataset
| S.No | Model | Accuracy | Precision | Recall | Dice Coefficient | IoU |
|---|---|---|---|---|---|---|
| 1 | U-Net | 0.915 | 0.907 | 0.913 | 0.911 | 0.825 |
| 2 | Swin U-Netr | 0.942 | 0.915 | 0.920 | 0.917 | 0.853 |
| 3 | U-Net with ResNet50 | 0.945 | 0.920 | 0.925 | 0.923 | 0.860 |
| 4 | Proposed Model | 0.965 | 0.945 | 0.942 | 0.944 | 0.895 |
Fig. 5.
Predictions of the Segmentation Models
U-Net usually achieves satisfactory results: its straightforward architecture, compatibility with pretrained models, and effective use of skip connections make it a reliable choice for medical image segmentation. However, Table 3 and Fig. 5 show that the proposed DeepLabv3+ with ResNet50 and attention gate model outperformed the other architectures in terms of Accuracy, Precision, Recall, Dice coefficient, and IoU. The ASPP module and the refined decoder contribute to this performance by capturing multi-scale context and improving localization.
To further validate the robustness and generalization capability of the DeepLabv3+ with ResNet50 and attention gate model, we use 8-fold cross-validation (K-FCV). The dataset is divided into 8 folds of approximately equal size. In each iteration, one fold is held out for validation while the remaining 7 folds are used for training, so that every fold serves as the validation set exactly once over the 8 iterations. Averaging the performance measures across all 8 trials gives a thorough assessment of the model.
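The fold rotation can be sketched as below. Splitting at the subject level (173 subjects), the shuffling seed, and the function name are assumptions for illustration; the text does not state whether folds are formed per subject or per image.

```python
import numpy as np

def k_fold_indices(n_samples, k=8, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)      # fold sizes differ by at most one
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# 173 is the number of subjects in the dataset; splitting by subject is assumed.
splits = list(k_fold_indices(173, k=8))
fold_sizes = [len(val_idx) for _, val_idx in splits]
```

Each of the 8 iterations would train a fresh model on `train_idx`, evaluate it on `val_idx`, and the per-fold metrics would then be averaged.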
Implementing K-FCV for the DeepLabv3+ with ResNet50 model is a crucial step towards ensuring its reliability and robustness in medical image segmentation tasks. Table 4 shows the performance improvement of our model with K-FCV. The architecture, coupled with a thorough validation process, positions this model as an effective tool for accurate and precise segmentation, which will ultimately benefit clinical diagnostics and treatment planning.
Table 4.
Results of DeepLabv3+ with ResNet50 and Attention gate using 8-Fold Cross validation
| S.No | Model with K-Cross | Accuracy | Precision | Recall | Dice Coefficient | IoU |
|---|---|---|---|---|---|---|
| 1 | DeepLabv3+ with ResNet50 and Attention gate | 0.965 | 0.945 | 0.942 | 0.944 | 0.895 |
| 2 | Proposed Model with 8 – Fold Cross | 0.987 | 0.971 | 0.975 | 0.972 | 0.954 |
Geometrical Feature measurements
In the field of medical imaging, precise and accurate measurement of anatomical structures is crucial for diagnosis, treatment planning, and monitoring of various spinal conditions. The geometrical parameters of interest include the height, width, and signal intensity of the IVDs, as well as the dimensions of the vertebrae. These measurements provide valuable insights into the structural integrity and health of the spine.
Vertebrae Height and Width
An image processing algorithm detects the corners of the vertebral body in the gray-scale image so that vertebral height and width can be measured. The Shi-Tomasi method is employed to identify up to four corners, ensuring that the most prominent points of the vertebral body are detected. The upper left, upper right, lower left, and lower right points of the vertebral body are identified by sorting and arranging these corner points. From the refined corner points, the height of the vertebra is calculated as the Euclidean distance between the top and bottom corner points, and the width as the Euclidean distance between the left and right corner points, as described in Eqs. (3) and (4), respectively.
$$VH_i = \sqrt{\sum_{j=0}^{1}\left(TCP_{i,j} - BCP_{i,j}\right)^2} \tag{3}$$
$$VW_i = \sqrt{\sum_{j=0}^{1}\left(LCP_{i,j} - RCP_{i,j}\right)^2} \tag{4}$$
where $VH_i$ and $VW_i$ are the height and width of the $i$th vertebral body, $i$ ranges from 1 to 6 (five lumbar vertebrae and one sacral vertebra), TCP and BCP stand for the top and bottom corner points, LCP and RCP stand for the left and right corner points, and $j$ indexes the coordinate dimension of a corner point and takes the values 0 and 1.
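The sorting and distance steps can be sketched as follows, starting from four corner points that have already been detected (for example with OpenCV's `goodFeaturesToTrack`, which implements the Shi-Tomasi detector). Which corner pairs define the height and the width is not fully specified in the text; this sketch assumes the left edge for height and the top edge for width, and the function names and coordinates are hypothetical.

```python
import numpy as np

def order_corners(points):
    """Return (upper_left, upper_right, lower_left, lower_right) from four (x, y) points."""
    pts = np.asarray(points, dtype=np.float64)
    by_y = pts[np.argsort(pts[:, 1])]           # image rows grow downwards
    upper = by_y[:2][np.argsort(by_y[:2, 0])]   # sort the two upper points by x
    lower = by_y[2:][np.argsort(by_y[2:, 0])]   # sort the two lower points by x
    return upper[0], upper[1], lower[0], lower[1]

def euclidean(p, q):
    """Euclidean distance summed over the two coordinate dimensions."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

def vertebra_height_width(points):
    ul, ur, ll, lr = order_corners(points)
    height = euclidean(ul, ll)   # assumption: top and bottom corners on the left edge
    width = euclidean(ul, ur)    # assumption: left and right corners on the top edge
    return height, width

# Hypothetical corner coordinates in pixels, given in arbitrary order.
corners = [(50, 35), (10, 5), (10, 35), (50, 5)]
height, width = vertebra_height_width(corners)
```

The results are in pixels; converting them to millimetres would require the pixel spacing of the scan.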
Disc Height and Width
An image processing algorithm likewise detects the corners used to measure IVD height and width. The same Shi-Tomasi method is employed to identify up to four corners, ensuring that the most prominent points of the intervertebral disc are detected. These corner points are sorted and organized to determine the upper left, lower left, upper right, and lower right points of the IVD. IVD height is calculated in two parts: first, the average IVD height is obtained by averaging the Euclidean distance over all possible points, and then the central IVD height is calculated at the center point of the IVD, as described in Eqs. (5) and (6), respectively. Similarly, IVD width is calculated in two parts: the average IVD width is obtained by averaging the Euclidean distance over all possible points, and the central IVD width is calculated at the center point of the IVD, as described in Eqs. (7) and (8), respectively.
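$$ADH_i = \frac{1}{n}\sum_{k=1}^{n}\sqrt{\sum_{j=0}^{1}\left(TCP_{i,k,j} - BCP_{i,k,j}\right)^2} \tag{5}$$
$$CDH_i = \sqrt{\sum_{j=0}^{1}\left(TCP_{i,c,j} - BCP_{i,c,j}\right)^2} \tag{6}$$
where $ADH_i$ and $CDH_i$ are the average and central heights of the $i$th IVD, $i$ ranges from 1 to 6 (T12-L1 to L5-S1), $j$ indexes the coordinate dimension of a point (values 0 and 1), $k$ indexes all possible points from 1 to $n$, $c$ denotes the center point, and TCP and BCP stand for the top and bottom corner points.
$$ADW_i = \frac{1}{n}\sum_{k=1}^{n}\sqrt{\sum_{j=0}^{1}\left(LCP_{i,k,j} - RCP_{i,k,j}\right)^2} \tag{7}$$
$$CDW_i = \sqrt{\sum_{j=0}^{1}\left(LCP_{i,c,j} - RCP_{i,c,j}\right)^2} \tag{8}$$
where $ADW_i$ and $CDW_i$ are the average and central widths of the $i$th IVD, $i$ ranges from 1 to 6 (T12-L1 to L5-S1), $j$ indexes the coordinate dimension of a point (values 0 and 1), $k$ and $c$ are as above, and LCP and RCP stand for the left and right corner points.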
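The average and central measurements can be illustrated with paired boundary points, as in the sketch below. How the paired points are sampled along the disc boundary, and the choice of the middle pair as the center point, are assumptions; the function name and coordinates are hypothetical.

```python
import numpy as np

def average_and_central_distance(first_pts, second_pts):
    """Return (mean distance over all pairs, distance at the middle pair)."""
    a = np.asarray(first_pts, dtype=np.float64)
    b = np.asarray(second_pts, dtype=np.float64)
    distances = np.sqrt(np.sum((a - b) ** 2, axis=1))   # Euclidean, per pair
    # Assumption: the middle pair in the list represents the center point.
    return float(distances.mean()), float(distances[len(distances) // 2])

# Hypothetical paired points on the top and bottom boundaries of one disc.
top_pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
bottom_pts = [(0, 10), (1, 12), (2, 14), (3, 12), (4, 10)]
adh, cdh = average_and_central_distance(top_pts, bottom_pts)
```

Passing paired left and right boundary points instead gives the average and central widths in the same way.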
Disc-Height Index
To avoid individual differences, the midpoint of the endplate and the angle of the vertebral body are marked, and then the measuring line is produced based on the noted point. The formula for calculating the disc-height index (DHI) is shown in Eq. (9).
$$DHI_i = \frac{2\,DH_i}{VH_i + VH_{i+1}} \tag{9}$$
where $DH_i$ is the height of the $i$th lumbar IVD, and $VH_i$ and $VH_{i+1}$ are the heights of the $i$th and $(i+1)$th vertebral bodies, respectively.
Canal Diameter
The algorithm involves several steps, including preprocessing the image to enhance the canal region, identifying key boundary points, and then computing the canal width at various points. It identifies the top-right, bottom-right, and mid-right points, then calculates the width at each horizontal slice between the top and bottom points. Equations (10) and (11) show the calculation of average and smallest canal width respectively.
| 10 |
| 11 |
where i denotes the canal position at the ith IVD, ranging from 1 to 6 (T12-L1 to L5-S1); j denotes the coordinate dimension of a corner point (values 0 and 1); k indexes all possible points from 1 to n; LCP and RCP denote the left and right corner points; and MLP and MRP stand for the mid-left and mid-right points. ACW denotes the average canal diameter and CDW denotes the center point canal diameter.
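The slice-by-slice width computation can be sketched on a binary canal mask. This is a simplified illustration under stated assumptions: the canal region at one disc level is given as a boolean array, and the width of a row is measured from its leftmost to its rightmost canal pixel. The function name `canal_widths` and the toy mask are hypothetical.

```python
import numpy as np

def canal_widths(mask):
    """Return (average width, smallest width) in pixels for a binary canal mask."""
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        raise ValueError("mask contains no canal pixels")
    widths = []
    # Visit every horizontal slice between the top and bottom canal rows
    for r in range(rows[0], rows[-1] + 1):
        cols = np.flatnonzero(mask[r])
        if cols.size:  # skip rows without canal pixels
            widths.append(cols[-1] - cols[0] + 1)
    widths = np.asarray(widths, dtype=float)
    return float(widths.mean()), float(widths.min())

# Toy mask: three canal rows of width 6, 4, and 5 pixels
demo = np.zeros((5, 8), dtype=bool)
demo[1, 1:7] = True
demo[2, 2:6] = True
demo[3, 2:7] = True
print(canal_widths(demo))  # (5.0, 4.0)
```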
Signal Intensity
Since variations in signal intensity and height may indicate water-content loss in spinal IVDs, signal intensity is an important criterion when evaluating the health and pathology of spinal discs. The primary objective of calculating the signal-intensity peak deviation from the center (∆SI) in our work is to characterize water-content loss in the IVD. The algorithm applies Gaussian blurring to the input image to reduce noise and enhance the regions of interest, which is crucial for accurate pixel value extraction. The pixel value histograms are then analyzed to identify the most common pixel value (the mode) and the first and second peaks. Equation (12) shows the calculation of the signal intensity difference.
| 12 |
where i denotes the location of the ith IVD, and the two intensity terms denote the signal-intensity levels of the first and second peaks of the IVD histogram, respectively. Examples of measured vertebral and IVD quantitative data are shown in Tables 5 and 6, respectively.
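The histogram step behind ∆SI can be sketched as below. Several choices here are assumptions made for illustration rather than details from the source: the input is taken to be 8-bit pixel values from an already blurred IVD region, the histogram is additionally smoothed with a small Gaussian kernel to suppress spurious peaks, the two tallest local maxima are taken as the peaks, and ∆SI is reported as the higher peak intensity minus the lower one. The function name is hypothetical.

```python
import numpy as np

def signal_intensity_difference(pixels, sigma=2.0):
    """Return (first_peak, second_peak, delta_si) for 8-bit IVD pixel values."""
    values = np.asarray(pixels).ravel().astype(int)
    if values.size == 0 or values.min() < 0 or values.max() > 255:
        raise ValueError("expected non-empty 8-bit intensities")
    hist = np.bincount(values, minlength=256).astype(float)
    # Gaussian smoothing of the histogram (an assumption of this sketch)
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    hist = np.convolve(hist, kernel / kernel.sum(), mode="same")
    # Local maxima: strictly above the left bin, not below the right bin
    padded = np.concatenate(([-1.0], hist, [-1.0]))
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    peaks = np.flatnonzero(is_peak & (hist > 0))
    if peaks.size < 2:
        raise ValueError("histogram has fewer than two peaks")
    # Keep the two tallest peaks, then order them by intensity
    top_two = np.sort(peaks[np.argsort(hist[peaks])[-2:]])
    first, second = int(top_two[0]), int(top_two[1])
    return first, second, second - first

# Synthetic region with two intensity populations
result = signal_intensity_difference([60] * 300 + [120] * 200)
print(result)  # (60, 120, 60)
```

A single-peaked histogram raises an error in this sketch; how the original algorithm treats that case is not stated in the text.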
Table 5.
Sample vertebral quantitative measurements
| S.No | Vertebrae | Height | Width |
|---|---|---|---|
| 1 | L1 | 36.66 | 44.23 |
| 2 | L2 | 35.87 | 46.01 |
| 3 | L3 | 39.07 | 51.44 |
| 4 | L4 | 35.14 | 50.79 |
| 5 | L5 | 34.83 | 46.01 |
Table 6.
Sample IVD quantitative measurements
| Disc | ADH | CDH | DHI | ADW | CDW | ACW | SCW | ∆SI |
|---|---|---|---|---|---|---|---|---|
| L1-L2 | 14.68 | 14.68 | 0.38 | 18.82 | 18.5 | 32.00 | 32.00 | 68 |
| L2-L3 | 13.34 | 13.34 | 0.41 | 24.52 | 24.19 | 34.00 | 33.00 | 34 |
| L3-L4 | 16.00 | 16.00 | 0.39 | 15.00 | 15.13 | 28.00 | 28.00 | 68 |
| L4-L5 | 15.00 | 15.00 | 0.25 | 17.00 | 17.12 | 27.00 | 27.00 | 35 |
| L5-S1 | 23.09 | 23.09 | 0.44 | 19.92 | 20.62 | 40.13 | 36.00 | 24 |
Mean Squared Error (MSE)
The Mean Squared Error (MSE) was used to evaluate the accuracy of our model’s geometrical parameter measurements. This metric quantifies the average squared difference between predicted values and ground truth measurements, where lower MSE values signify greater precision. The calculated MSE values for the measurements were as follows (Tables 7 and 8).
Table 7.
MSE of vertebral quantitative measured data
| S.No | Vertebrae | Height MSE (mm²) | Width MSE (mm²) |
|---|---|---|---|
| 1 | L1 | 0.4 | 1.2 |
| 2 | L2 | 1.0 | 0.4 |
| 3 | L3 | 1.21 | 0.8 |
| 4 | L4 | 0.81 | 0.9 |
| 5 | L5 | 0.36 | 1.4 |
Table 8.
MSE of IVD quantitative measured data
| Disc | ADH | CDH | DHI | ADW | CDW | ACW | SCW |
|---|---|---|---|---|---|---|---|
| L1-L2 | 1.0 mm² | 0.8 mm² | 1.2 mm² | 1.1 mm² | 0.9 mm² | 1.0 mm² | 0.5 mm² |
| L2-L3 | 0.8 mm² | 0.7 mm² | 1.21 mm² | 1.21 mm² | 1.1 mm² | 0.2 mm² | 0.4 mm² |
| L3-L4 | 1.2 mm² | 0.8 mm² | 0.81 mm² | 1.4 mm² | 0.7 mm² | 0.71 mm² | 0.5 mm² |
| L4-L5 | 1.06 mm² | 0.6 mm² | 0.36 mm² | 1.5 mm² | 1.6 mm² | 1.35 mm² | 0.6 mm² |
| L5-S1 | 0.9 mm² | 1.0 mm² | 1.0 mm² | 0.8 mm² | 1.4 mm² | 0.60 mm² | 0.7 mm² |
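For reference, the MSE metric itself can be computed as in the short sketch below. The measurement values are hypothetical and are not taken from the dataset; the function name is illustrative.

```python
import numpy as np

def mean_squared_error(predicted, ground_truth):
    """Average squared difference between predicted and reference measurements."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    if predicted.shape != ground_truth.shape or predicted.size == 0:
        raise ValueError("inputs must be non-empty and have the same shape")
    return float(np.mean((predicted - ground_truth) ** 2))

# Hypothetical automated vs. manual heights for three scans, in mm
mse = mean_squared_error([36.0, 35.0, 39.0], [35.0, 36.0, 38.0])
print(mse)  # 1.0
```

When the measurements are in millimetres, the resulting MSE is in mm², matching the unit used in Tables 7 and 8.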
Conclusion
This work is approved by the Indian Council of Medical Research (ICMR) and funded by DHR-MRU. In this work, we have undertaken the development of a fully automated system for the quantitative analysis of lumbar spine MRIs. It addresses the significant clinical challenge of variability and subjectivity in manual segmentation and manual measurement of geometrical features by providing a consistent and objective approach to lumbar spine analysis. The primary focus has been on automating the segmentation of lumbar spine structures and the feature extraction, followed by the measurement of the geometrical features of the lumbar spine. Using a transfer learning strategy, we have developed a robust semantic segmentation network to accurately identify IVD-related regions in T2W MRIs. The segmentation components of the project have been successfully implemented, demonstrating the capability of the automated system to accurately identify and quantify spinal structures. In the quantitation phase, an enhanced histogram approach and automated computation techniques are implemented to validate the geometric data and signal intensity of IVDs and vertebral bodies. Automating the quantitative analysis of the lumbar spine has significant clinical relevance, primarily in reducing the reliance on radiologists for routine evaluations. This technology enhances efficiency and throughput, streamlining the evaluation process and allowing radiologists to focus on complex cases.
Clinical practice relies on spinal geometrical parameters, including disc height, vertebral alignment, and canal width, to diagnose conditions such as degenerative disc disease, spinal stenosis, and spondylolisthesis. Decreased intervertebral disc height is a typical indicator of disc degeneration, while spinal canal narrowing indicates stenosis. Comparing the automated measurements with the established diagnostic thresholds that radiologists use shows the model's potential for clinical application.
Future work will focus on expanding the dataset and refining predictive models to further support clinical decision-making regarding surgical and non-surgical interventions for spinal disease.
Author Contributions
Purushottam Kumar (Conceptualizing the research idea, Designing the study methodology, collecting data, analyzing data, interpreting results, writing the manuscript, revising drafts).
Dr. Suyash Singh (Supervision, Conceptualizing the research idea, providing critical feedback, securing funding).
Dr. Bunil Kumar Balabantaray (Supervision, Conceptualizing the research idea, providing critical feedback).
Dr. Rajashree Nayak (Providing critical feedback, revising drafts and performing statistical analysis).
Funding
Approved by Indian Council of Medical Research (ICMR) and funded by Department of Health Research—Multidisciplinary Research Unit (DHR-MRU).
Data Availability
The authors confirm that data is available even in raw radiology forms and clinical formats.
Declarations
Ethics Approval
Ethical approval was taken from the Institute Ethical Committee AIIMS Raebareli.
Consent to Participate
Individual’s consent was taken.
Consent to Publish
Taken.
Competing interests
None.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Samartzis D, Karppinen J, Mok F, Fong DY, Luk KD, Cheung KM. A population-based study of juvenile disc degeneration and its association with overweight and obesity, low back pain, and diminished functional status. J Bone Joint Surg Am. 2011;93(7):662-670. doi:10.2106/JBJS.I.01568
- 2. GBD 2017 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017 [published correction appears in Lancet. 2019 Jun 22;393(10190):e44. doi:10.1016/S0140-6736(19)31047-5]. Lancet. 2018;392(10159):1789–1858. doi:10.1016/S0140-6736(18)32279-7
- 3. Hoy D, Brooks P, Blyth F, Buchbinder R. The Epidemiology of low back pain. Best Pract Res Clin Rheumatol. 2010;24(6):769-781. doi:10.1016/j.berh.2010.10.002
- 4. Modic MT, Steinberg PM, Ross JS, Masaryk TJ, Carter JR. Degenerative disk disease: assessment of changes in vertebral body marrow with MR imaging. Radiology. 1988;166(1 Pt 1):193-199. doi:10.1148/radiology.166.1.3336678
- 5. Koes BW, van Tulder MW, Thomas S. Diagnosis and treatment of low back pain. BMJ. 2006;332(7555):1430-1434. doi:10.1136/bmj.332.7555.1430
- 6. Pearce RH, Thompson JP, Bebault GM, Flak B. Magnetic resonance imaging reflects the chemical changes of aging degeneration in the human intervertebral disk. J Rheumatol Suppl. 1991;27:42-43.
- 7. Boden SD, Davis DO, Dina TS, Patronas NJ, Wiesel SW. Abnormal magnetic-resonance scans of the lumbar spine in asymptomatic subjects. A prospective investigation. J Bone Joint Surg Am. 1990;72(3):403–408.
- 8. Pfirrmann CW, Metzdorf A, Zanetti M, Hodler J, Boos N. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine (Phila Pa 1976). 2001;26(17):1873–1878. doi:10.1097/00007632-200109010-00011
- 9. Panjabi MM, Goel V, Oxland T, et al. Human lumbar vertebrae. Quantitative three-dimensional anatomy. Spine (Phila Pa 1976). 1992;17(3):299–306. doi:10.1097/00007632-199203000-00010
- 10. Schultz, A. H. (1960). Vertebral column and thorax (Vol. 4). Karger Medical and Scientific Publishers.
- 11. Waldenberg C, Hebelka H, Brisby H, Lagerstrand KM. Differences in IVD characteristics between low back pain patients and controls associated with HIZ as revealed with quantitative MRI. PLoS One. 2019;14(8):e0220952. Published 2019 Aug 22. doi:10.1371/journal.pone.0220952
- 12. Neubert A, Fripp J, Engstrom C, et al. Three-dimensional morphological and signal intensity features for detection of intervertebral disc degeneration from magnetic resonance images. J Am Med Inform Assoc. 2013;20(6):1082-1090. doi:10.1136/amiajnl-2012-001547
- 13. Pope, M. H., Hanley, E. N., Matteri, R. E., Wilder, D. G., & Frymoyer, J. W. (1977). Measurement of intervertebral disc space height. Spine, 2(4), 282-286.
- 14. Abbati, G., Bauer, S., Winklhofer, S., Schüffler, P. J., Held, U., Burgstaller, J. M., ... & Buhmann, J. M. (2017). MRI-based surgical planning for lumbar spinal stenosis. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11–13, 2017, Proceedings, Part III 20 (pp. 116–124). Springer International Publishing.
- 15. Fan G, Li Y, Wang D, et al. Automatic segmentation of dura for quantitative analysis of lumbar stenosis: A deep learning study with 518 CT myelograms. J Appl Clin Med Phys. 2024;25(7):e14378. doi:10.1002/acm2.14378
- 16. Lu, J. T., Pedemonte, S., Bizzo, B., Doyle, S., Andriole, K. P., Michalski, M. H., ... & Pomerantz, S. R. (2018, November). Deep spine: automated lumbar vertebral segmentation, disc-level designation, and spinal stenosis grading using deep learning. In Machine Learning for Healthcare Conference (pp. 403–419). PMLR.
- 17. Liang YW, Fang YT, Lin TC, et al. The Quantitative Evaluation of Automatic Segmentation in Lumbar Magnetic Resonance Images. Neurospine. 2024;21(2):665-675. doi:10.14245/ns.2448060.030
- 18. Shastry, P., Sonawane, B., Mohan, K., Kumarasami, N., SP, K., Venkatesh, K. P., ... & Sivasailam, K. (2025). AI and Deep Learning for Automated Segmentation and Quantitative Measurement of Spinal Structures in MRI. arXiv preprint arXiv:2503.11281.
- 19. Tascini, G., & Zingaretti, P. (1993, October). Automatic quantitative analysis of lumbar bone radiographs. In 1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference (pp. 1722–1726). IEEE.
- 20. Liu S, Guo C, Zhao Y, et al. A machine learning based quantification system for automated diagnosis of lumbar spondylolisthesis on spinal X-rays. Heliyon. 2024;10(17):e37418. Published 2024 Sep 4. doi:10.1016/j.heliyon.2024.e37418
- 21. Akeda K, Hasegawa T, Togo Y, et al. Quantitative Analysis of Lumbar Disc Bulging in Patients with Lumbar Spinal Stenosis: Implication for Surgical Outcomes of Decompression Surgery. J Clin Med. 2023;12(19):6172. Published 2023 Sep 24. doi:10.3390/jcm12196172
- 22. Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157-173.
- 23. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18 (pp. 234–241). Springer International Publishing.
- 24. Ruan, J., Li, J., & Xiang, S. (2024). Vm-unet: Vision mamba unet for medical image segmentation. arXiv preprint arXiv:2402.02491.
- 25. Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H. R., & Xu, D. (2021, September). Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In International MICCAI Brainlesion Workshop (pp. 272–284). Cham: Springer International Publishing.
- 26. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
- 27. Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 801–818).
- 28. Zhang Z, Sabuncu MR. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. Adv Neural Inf Process Syst. 2018;32:8792-8802.