Abstract
To develop an automated grading model for rectocele (RC) based on radiomics and to evaluate its efficacy, this study retrospectively analyzed 9,392 magnetic resonance imaging (MRI) images from 222 patients who underwent dynamic magnetic resonance defecography (DMRD) between August 2021 and June 2023. The analysis focused on defecation-phase DMRD images, because this phase provides critical information for assessing RC. Patients were randomly divided into a training cohort (70%), used to build the model, and a test cohort (30%), reserved for evaluating its performance. First, two independent radiologists graded RC severity using the RC MRI grading criteria. Two additional radiologists then independently delineated the regions of interest (ROIs) from which radiomic features were extracted. The features were reduced in dimension to retain only the most relevant information, and a machine learning model was developed using a Support Vector Machine (SVM). Finally, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate the classification performance of the model. Using defecation-phase images, the model achieved an AUC (macro/micro) of 0.794/0.824 and a micro-averaged accuracy of 0.754. A radiomics model built on DMRD defecation-phase images is well suited for grading RC and may help clinicians diagnose and treat the disease.
Keywords: Rectocele, Radiomics, Grading, Dynamic magnetic resonance defecography, Magnetic resonance imaging
Subject terms: Diseases, Gastroenterology, Health care, Health occupations, Medical research, Risk factors, Signs and symptoms, Mathematics and computing
Background
Rectocele (RC) is a condition in which the anterior wall of the rectum protrudes through the rectovaginal septum, forming a bulge in the posterior wall of the vagina. Although many women with isolated RC are asymptomatic, symptoms can develop as the condition progresses, including obstructed defecation, difficulty with straining during bowel movements, constipation, and fecal incontinence1. When these symptoms become severe, they can markedly diminish a woman's quality of life, making timely diagnosis and treatment critical. The reported prevalence of RC among women ranges from 12.9 to 18.6%, with an average annual incidence of 5.7 cases per 100 women2. As the population ages, the number of women affected by RC is expected to increase3. Without accurate diagnosis and effective treatment, RC can severely impair quality of life and productivity. Magnetic resonance defecography (MRD) has emerged as a valuable diagnostic tool for assessing RC. This non-invasive technique offers multiplanar imaging, excellent soft-tissue resolution, and no ionizing radiation4. MRD is particularly useful because it not only provides objective measurements of RC size but also captures the dynamic emptying process, offering insight into the functional impact of the condition5. On magnetic resonance imaging (MRI), RC is graded from measurements taken during the phase of maximum strain: a measuring tool is used to determine how far the anterior rectal wall bulges beyond its expected position, that is, beyond a line extended from the anterior wall of the anal canal. RC is categorized into three grades: small (protrusion < 2 cm), moderate (protrusion 2–4 cm), and large (protrusion > 4 cm)6. Accurate grading is critical for clinical decision-making and treatment planning.
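The grading rule reduces to a threshold lookup on the measured protrusion depth. The sketch below is illustrative only: `grade_rectocele` is a hypothetical helper, it assumes the depth has already been measured in centimetres, it adds the "normal" (no protrusion) category defined later in Methods, and it assigns a depth of exactly 2 cm or 4 cm to grade II because the criteria do not say how boundary values are handled.

```python
def grade_rectocele(protrusion_cm: float) -> str:
    """Map a measured protrusion depth (cm) to the MRI grade used in this study.

    Hypothetical helper. Assumption: a depth of exactly 2 cm or 4 cm is
    assigned to RC II, because the criteria only state < 2, 2-4 and > 4 cm.
    """
    if protrusion_cm < 0:
        raise ValueError("protrusion depth must be non-negative")
    if protrusion_cm == 0:
        return "normal"   # no protrusion
    if protrusion_cm < 2.0:
        return "RC I"     # < 2 cm
    if protrusion_cm <= 4.0:
        return "RC II"    # 2-4 cm
    return "RC III"       # > 4 cm


for depth in (0.0, 1.5, 3.2, 4.8):
    print(depth, grade_rectocele(depth))
```

In an automated pipeline this lookup would sit after a (manual or automated) depth measurement; the radiomics model described in this paper instead predicts the grade directly from image features.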
However, this grading process relies on manual measurements that are time-consuming and labor-intensive. Furthermore, the results are often affected by differences in examination technique and operator skill, leading to inconsistencies in diagnosis and treatment planning. Quantitative imaging methods that grade RC accurately are therefore urgently needed. Radiomics, a quantitative image analysis technique, extracts detailed features from routine medical images and analyzes them to build diagnostic and prognostic models7. This approach has shown promising results in various medical fields, such as cancer diagnosis, prognosis prediction, and classification of pelvic organ prolapse (POP)8. Although radiomics has been widely used in other domains, automatic grading of RC has not yet been studied. This study aims to develop and validate a radiomics model for grading RC. Using dynamic magnetic resonance defecography (DMRD) images acquired during the defecation phase, the model provides a quantitative, time-saving image evaluation method that enables automatic RC grading.
Methods
Patients
The study included patients who underwent DMRD between August 2021 and June 2023 at the Second Hospital of Jilin University in Changchun, China. Ethical approval, in accordance with the tenets of the Declaration of Helsinki, was obtained from the Ethics Committee of the Second Hospital of Jilin University. Because the study was retrospective, the Medical Ethics Committee waived the requirement for informed consent. The inclusion criteria were: (I) DMRD performed at our hospital; (II) complete imaging and clinical data. The exclusion criteria were: (I) history of pelvic malignancy; (II) cases in which the region of interest (ROI) could not be adequately displayed; (III) poor-quality MRI images that could not be evaluated, such as those with motion artifacts, blurring, or discontinuity. Figure 1 shows the workflow of patient enrollment and distribution.
Fig. 1.
The workflow of patient enrollment and distribution.
MRI image acquisition, pre-processing and radiological evaluation
The latest DMRD images, stored in DICOM format, were downloaded from the picture archiving and communication system of the Second Hospital of Jilin University. Because the hospital uses MRI scanners from several vendors, including Siemens and Philips, image quality, slice thickness, and voxel size varied between patients. To improve consistency, the research team applied the N4BiasFieldCorrection algorithm in SimpleITK (version 4.11), a widely used tool for correcting bias fields in MRI scans, to all images. Correcting this low-frequency intensity inhomogeneity reduces grey-level variations that arise from differences in scanner settings and patient positioning, supporting more accurate subsequent analysis. In addition, to address differences in image resolution and dimensions, all MRI images were resampled to a voxel size of 1 × 1 × 1 mm³, which the team judged optimal for precise ROI segmentation. The severity of RC on MRI was graded by two experienced pelvic floor radiologists (Radiologists A and B), with 4 and 8 years of diagnostic experience, respectively. Both radiologists independently graded RC on the following scale: normal group, no protrusion; RC I degree, protrusion < 2 cm; RC II degree, protrusion 2–4 cm; RC III degree, protrusion > 4 cm. To minimize bias, no demographic or clinical information (such as patient age, medical history, or symptoms) was provided to the radiologists during their evaluation of the MRI images. When their assessments differed, a structured discussion was held to reach a consensus.
This consensus process was intended to ensure the reliability and accuracy of the final grades.
ROI segmentation and radiomics feature extraction
In this study, the ROI, defined as the rectum, was manually delineated by two experienced radiologists (Fig. 2): Reader C, with two years of diagnostic experience, and Reader D, with five years. Both used ITK-SNAP (version 4.11) to delineate the rectal region precisely on the MRI images (Fig. 3). An in-house extraction program based on PyRadiomics (https://github.com/Radiomics/pyradiomics) was used to extract handcrafted radiomics features. Before extraction, all MRI images were resampled to an isotropic voxel size of 1 × 1 × 1 mm³ using nearest-neighbor interpolation in SimpleITK to standardize spatial resolution across cases. The extracted features fell into four main groups, each describing a different aspect of the image data. The first group, first-order features, describes the distribution of voxel intensities within the ROI without considering spatial relationships between voxels. These features are derived from the histogram of intensity values in the ROI and include the mean, standard deviation, variance, maximum, median, range, and kurtosis, among others. The second group, shape-based features, describes the geometric properties of the ROI. These descriptors were derived from the binary ROI mask and include volume, surface area, sphericity, compactness, and elongation, which reflect the 3D morphology of the delineated rectum. The third group, texture features, describes the spatial arrangement of voxel intensities in the ROI and emphasizes spatial heterogeneity, including grey-level granularity, variation, and image roughness.
These features were calculated from several texture matrices, namely the gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), gray level dependence matrix (GLDM), and neighboring gray tone difference matrix (NGTDM), capturing textural complexity within the ROI from multiple perspectives. Finally, the fourth group, wavelet-based features, provides multi-resolution descriptors derived from the wavelet transform of the original image. Specifically, the original images were decomposed into eight wavelet sub-bands using a coiflet-1 filter, and radiomics features were extracted from each sub-band, increasing sensitivity to both high- and low-frequency spatial information. All extraction parameters, including resampling settings and filter configurations, were recorded in a YAML parameter file to ensure reproducibility across experiments.
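The extraction settings described above can be expressed as a PyRadiomics parameter dictionary, which `RadiomicsFeatureExtractor` accepts in place of a YAML file path. This is a hypothetical reconstruction, not the study's actual parameter file: the bin width (the PyRadiomics default of 25) and the mask label value of 1 are assumptions, since neither is reported in the text.

```python
# Hypothetical PyRadiomics parameter dictionary mirroring the settings described
# above; binWidth and label are assumptions, not reported values.
PARAMS = {
    "setting": {
        "resampledPixelSpacing": [1, 1, 1],      # isotropic 1 mm voxels
        "interpolator": "sitkNearestNeighbor",   # interpolation stated in the text
        "label": 1,                              # mask value of the rectal ROI (assumed)
        "binWidth": 25,                          # assumption: PyRadiomics default
        "wavelet": "coif1",                      # coiflet-1 -> 8 sub-bands for 3D images
    },
    "imageType": {"Original": {}, "Wavelet": {}},
    # An empty list enables every feature in that class.
    "featureClass": {name: [] for name in
                     ("firstorder", "shape", "glcm", "glrlm", "glszm", "gldm", "ngtdm")},
}


def extract_features(image_path: str, mask_path: str, params: dict = PARAMS) -> dict:
    """Run PyRadiomics on one image/mask pair and drop the diagnostic entries."""
    from radiomics import featureextractor   # imported lazily; requires pyradiomics
    extractor = featureextractor.RadiomicsFeatureExtractor(params)
    result = extractor.execute(image_path, mask_path)
    return {k: v for k, v in result.items() if not k.startswith("diagnostics_")}
```

Keeping the dictionary in code and serializing it (for example with PyYAML's `yaml.safe_dump`) is one way to keep the YAML file and the extraction run in sync.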
Fig. 2.
ROIs are manually outlined on MRI images, and radiomics features quantifying intensity, shape, and texture are extracted from them. The features are then screened, and SVM models are used to assess the grading performance of the selected radiomics features. Model performance is assessed using ROC curves.
Fig. 3.
Schematic of the manual ROI segmentation process.
Radiomics feature selection and signature construction
Analysis of variance and chi-square tests were applied to the features, and only those with p < 0.05 were retained. To reduce redundancy, Spearman's rank correlation coefficients were calculated among the features that demonstrated high repeatability; when two features were strongly correlated (correlation coefficient > 0.9), only one was retained. The least absolute shrinkage and selection operator (LASSO) regression model was then used to construct a radiomics signature. LASSO shrinks the coefficients of less relevant features to zero, so that only the most informative features enter the final model. The optimal regularization parameter (λ) was determined by 10-fold cross-validation, selecting the λ value that minimized the cross-validation error. Features with non-zero coefficients after LASSO regression were used to construct the final radiomics signature. A radiomics score was computed for each patient as a linear combination of the retained features, each weighted by its LASSO regression coefficient. LASSO modeling was performed with the Python scikit-learn package.
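The selection cascade above (z-score, ANOVA filter, Spearman redundancy filter, 10-fold LASSO, radiomics score) can be sketched with scikit-learn and SciPy. This is a minimal sketch under stated assumptions, not the authors' script: the helper names are hypothetical, only the ANOVA half of the univariate screen is shown, the greedy "keep the first of each correlated pair" rule is our choice, and the grade is treated as a numeric target for `LassoCV`, whose `alpha` plays the role of λ.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def anova_filter(X, y, alpha=0.05):
    """Indices of features whose one-way ANOVA p-value across grades is < alpha."""
    _, pvals = f_classif(X, y)
    return np.flatnonzero(pvals < alpha)


def spearman_filter(X, threshold=0.9):
    """Greedily keep a feature only if |Spearman rho| <= threshold with every kept feature."""
    ranks = rankdata(X, axis=0)               # Spearman rho = Pearson r computed on ranks
    rho = np.corrcoef(ranks, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(rho[j, k]) <= threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def select_features_and_score(X, y, seed=0):
    """Return the selected column indices and a radiomics score per case."""
    Xz = StandardScaler().fit_transform(X)    # z-score: mean 0, SD 1 per feature
    idx = anova_filter(Xz, y)
    idx = idx[spearman_filter(Xz[:, idx])]
    lasso = LassoCV(cv=10, random_state=seed).fit(Xz[:, idx], y)  # alpha = lambda minimizing CV MSE
    nonzero = lasso.coef_ != 0
    selected = idx[nonzero]
    rad_score = Xz[:, selected] @ lasso.coef_[nonzero] + lasso.intercept_
    return selected, rad_score


# Synthetic stand-in (not study data): 222 cases, 40 features, 4 grades.
X, y = make_classification(n_samples=222, n_features=40, n_informative=6, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
selected, rad_scores = select_features_and_score(X, y)
print(len(selected), rad_scores[:3])
```

Fitting the scaler and filters on the training cohort only, and then applying them unchanged to the test cohort, avoids leaking test information into feature selection.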
Development and evaluation of models
The patients were randomly divided into a training cohort and a test cohort in a 7:3 ratio; the training cohort was used to build the model, and the test cohort was reserved for independent testing. An SVM classifier with a radial basis function kernel, implemented with the scikit-learn library in Python, was used to develop the radiomics model. Before model training, all radiomics features were standardized using z-score normalization (mean = 0, standard deviation = 1), which prevents SVM performance from being biased by feature scale. The performance of the radiomics model was then evaluated systematically in the test cohort using accuracy, precision, recall, F1 score, and the receiver operating characteristic (ROC) curve. The area under the curve (AUC) was calculated to quantify the model's ability to distinguish between classes, with higher values indicating better discriminative performance. In addition, a confusion matrix was constructed to visualize classification performance, showing the numbers of correctly and incorrectly classified cases for each class. All evaluations were conducted with the scikit-learn library to ensure consistency and reproducibility.
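The training and evaluation procedure can be sketched as a scikit-learn pipeline. This is a minimal sketch on synthetic data, not the study model: stratifying the split by grade, fitting the scaler inside the pipeline (so it sees training data only), and the random seeds are our assumptions. Note that `accuracy` here is the plain multi-class proportion of correctly graded cases, which is not the same quantity as a micro-averaged one-vs-rest accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import SVC


def train_and_evaluate(X, y, seed=0):
    """7:3 split, z-score + RBF-SVM pipeline, and multi-class evaluation on the test set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)   # stratification is an assumption
    model = make_pipeline(StandardScaler(),                    # scaler fitted on training data only
                          SVC(kernel="rbf", probability=True, random_state=seed))
    model.fit(X_tr, y_tr)
    classes = np.unique(y_tr)
    proba = model.predict_proba(X_te)          # columns follow the sorted class labels
    y_pred = model.predict(X_te)
    y_bin = label_binarize(y_te, classes=classes)
    return {
        "n_train": len(y_tr),
        "n_test": len(y_te),
        "accuracy": accuracy_score(y_te, y_pred),   # share of test cases given the correct class
        "auc_macro": roc_auc_score(y_te, proba, multi_class="ovr", average="macro"),
        "auc_micro": roc_auc_score(y_bin, proba, average="micro"),
        "per_class_auc": roc_auc_score(y_bin, proba, average=None),
        "confusion": confusion_matrix(y_te, y_pred, labels=classes),
    }


# Synthetic stand-in: 222 cases, 20 features, 4 classes (not the study data).
X, y = make_classification(n_samples=222, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
results = train_and_evaluate(X, y)
print(results["n_train"], results["n_test"], round(results["auc_micro"], 3))
```

With 222 cases, a 0.3 test fraction yields 155 training and 67 test cases, the same cohort sizes reported in Results.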
Statistical analysis
Baseline clinical characteristics were compared with t-tests, chi-square tests, or Fisher's exact tests in SPSS (version 25.0, IBM). T-tests were used for continuous variables with homogeneous variances, which are reported as mean (standard deviation). Categorical variables were compared with chi-square tests or, when expected cell counts were small, Fisher's exact tests. A two-tailed p value < 0.05 was considered statistically significant. Python (version 3.13.0; http://www.python.org) was used for Spearman's rank correlation analysis, z-score normalization, LASSO regression analysis, and ROC curve plotting.
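The choice between the tests can be sketched with SciPy, although the study itself used SPSS. This is an illustrative sketch: the helper names are hypothetical, the "any expected count < 5" switch to Fisher's exact test is a common convention assumed here, and the function is restricted to 2 × 2 tables. The first table reuses Table 1 counts (training cohort, normal vs RC I by sex) purely as an example, and the age values are made up.

```python
from scipy.stats import chi2_contingency, fisher_exact, ttest_ind


def compare_continuous(group_a, group_b):
    """Student's t-test assuming equal variances (homoscedasticity)."""
    return ttest_ind(group_a, group_b, equal_var=True).pvalue


def compare_2x2(table, min_expected=5.0):
    """Chi-square test, or Fisher's exact test if any expected count < min_expected."""
    chi2, p, dof, expected = chi2_contingency(table)
    if (expected < min_expected).any():
        _, p = fisher_exact(table)
        return "fisher", p
    return "chi-square", p


# Rows = normal, RC I; columns = male, female (training cohort counts from Table 1).
print(compare_2x2([[21, 31], [8, 49]]))
print(compare_2x2([[3, 1], [1, 3]]))          # small counts -> Fisher's exact test
print(compare_continuous([51.0, 48.2, 60.1, 45.5], [54.8, 57.3, 49.9, 62.0]))  # made-up ages
```

Table 1 compares four grade groups at once, so reproducing its p-values would need R × C tables (chi-square) or a one-way test for age rather than this 2 × 2 sketch.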
Results
Demographic characteristics of patients
After 25 patients were excluded because of image-quality problems, the final dataset comprised 9392 defecation-phase DMRD images from 222 patients (43 men and 179 women). The training cohort comprised 155 patients, whose data were used to develop the radiomics model, and the test cohort comprised 67 patients, whose data were reserved for independent evaluation of model performance. Within each cohort, age did not differ significantly across the normal and RC grade groups (p = 0.329 in the training cohort and p = 0.181 in the test cohort), whereas sex distribution did (p < 0.01 in both cohorts), with men represented only in the normal and RC I degree groups. Table 1 summarizes the demographic characteristics of the patients, including age and sex distribution.
Table 1.
Baseline characteristics of the study population.
Characteristic | Training (N = 155): normal group | Training: RC I degree | Training: RC II degree | Training: RC III degree | Training: p-value | Testing (N = 67): normal group | Testing: RC I degree | Testing: RC II degree | Testing: RC III degree | Testing: p-value |
---|---|---|---|---|---|---|---|---|---|---|
Age (years)a | 51.48 (15.05) | 54.79 (12.74) | 54.38 (10.16) | 52.25 (7.68) | 0.329 | 47.92 (15.91) | 57.23 (12.08) | 52.47 (10.76) | 59.00 (0.00) | 0.181 |
Sexb | | | | | < 0.01 | | | | | < 0.01 |
Male | 21 (40.39) | 8 (14.04) | 0 | 0 | | 13 (52.00) | 1 (4.55) | 0 | 0 | |
Female | 31 (59.62) | 49 (85.97) | 42 (100.00) | 4 (100.00) | | 12 (48.00) | 21 (95.46) | 19 (100.00) | 1 (100.00) | |
aData are means, with standard deviations in parentheses.
bData are numbers of patients, with percentages in parentheses.
Feature selection and construction of radiomics signature
In total, 1743 radiomics features were extracted: 342 first-order, 14 shape-based, and 1387 texture features. The texture features comprised 418 GLCM, 304 GLRLM, 266 GLDM, 304 GLSZM, and 95 neighboring gray tone difference matrix (NGTDM) features. Z-score standardization was applied to rescale each feature to a mean of 0 and a standard deviation of 1, and 4 features were excluded as redundant at this step. Analysis of variance and chi-square tests were then applied to the remaining features; only features with p < 0.05 were kept, and 1347 features were excluded. Among features with high repeatability, Spearman's rank correlation coefficients were calculated, and for any pair with a correlation coefficient greater than 0.9, only one feature was retained; 262 features were excluded in this step. The LASSO regression model was then applied to the discovery data set to construct the signature (Fig. 4). Depending on the regularization weight λ, LASSO shrinks all regression coefficients towards zero and sets the coefficients of many irrelevant features exactly to zero; 110 features were excluded in this step. The remaining 20 features with non-zero coefficients (6 first-order, 6 GLSZM, 4 GLCM, 1 shape, 1 NGTDM, 1 GLDM, and 1 GLRLM) were combined into a radiomics signature (Fig. 4). A radiomics score was then obtained for each patient as a linear combination of the retained features weighted by their model coefficients. Figure 5 shows the non-zero coefficient values of the final selected features.
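The feature counts reported above form a consistent cascade, which can be checked with a few lines of arithmetic. All numbers are taken from the text; only the variable names are ours.

```python
# Feature counts reported in the text (extraction and each selection step).
extracted = {"first-order": 342, "shape": 14, "GLCM": 418, "GLRLM": 304,
             "GLDM": 266, "GLSZM": 304, "NGTDM": 95}
removed = [("z-score step", 4), ("ANOVA / chi-square, p >= 0.05", 1347),
           ("Spearman |rho| > 0.9", 262), ("LASSO zero coefficient", 110)]
retained = {"first-order": 6, "GLSZM": 6, "GLCM": 4, "shape": 1,
            "NGTDM": 1, "GLDM": 1, "GLRLM": 1}

remaining = sum(extracted.values())
trace = [("extracted", remaining)]
for step, n in removed:
    remaining -= n
    trace.append((step, remaining))

for step, n in trace:
    print(f"{step:32s} {n:5d}")
```

The cascade runs 1743 → 1739 → 392 → 130 → 20, matching the 20 retained features itemized by class.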
Fig. 4.
Radiomics feature selection with the LASSO algorithm. From left to right: feature coefficient profiles and ten-fold cross-validated mean squared error.
Fig. 5.
The histogram of the radiomics score based on the selected features of ROI.
Performance of radiomic models
To identify the optimal model, the SVM was compared with Logistic Regression (LR), Random Forest (RF), Extremely Randomized Trees (Extra Trees), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) classifiers. Table 2 shows the diagnostic performance of each model in the training and test cohorts. The SVM achieved the highest accuracy (micro) (0.754) and AUC (micro) (0.824) in the test cohort and was therefore chosen as the final model. In evaluating these multi-class models, "micro" and "macro" refer to different ways of averaging performance metrics across classes. Macro-averaging computes the metric independently for each class and then averages across classes, treating all classes equally regardless of their prevalence in the dataset. Micro-averaging pools the true positives, false positives, and false negatives of all classes and computes the metric from these aggregated counts, so classes with more samples carry more weight. The radiomics model performed well in the training cohort, with AUC values of 0.879 (macro) and 0.887 (micro), indicating good discrimination during model training. In the test cohort, the model achieved an accuracy (micro) of 0.754. Because this value is micro-averaged over the one-vs-rest decisions for the four classes, it should not be read directly as the proportion of patients assigned the correct grade. The micro precision, recall, and F1 score in the test cohort were 0.505, 0.836, and 0.629, respectively.
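The difference between the two averaging schemes is easiest to see with one large and one small class. The counts below are hypothetical, chosen only to illustrate how micro-averaging is pulled towards the larger class.

```python
def macro_micro_precision(tp, fp):
    """Macro: mean of per-class precisions. Micro: precision of the pooled counts."""
    per_class = [t / (t + f) for t, f in zip(tp, fp)]
    macro = sum(per_class) / len(per_class)
    micro = sum(tp) / (sum(tp) + sum(fp))
    return macro, micro


# Hypothetical counts: a large class (40 TP, 10 FP) and a small class (5 TP, 5 FP).
macro, micro = macro_micro_precision(tp=[40, 5], fp=[10, 5])
print(f"macro = {macro:.2f}, micro = {micro:.2f}")   # macro = 0.65, micro = 0.75
```

The per-class precisions are 0.80 and 0.50; macro-averaging weights them equally, whereas micro-averaging is dominated by the larger class. This matters here because the RC III degree group is very small.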
The confusion matrices for the training and test cohorts show how cases were distributed across the predicted categories (Fig. 6). ROC curve analysis in the test cohort further characterized the model's performance across the four categories, from the normal group to the different degrees of rectocele (Fig. 7). The AUC values were 0.766 for the normal group and 0.773, 0.813, and 0.781 for RC I, II, and III degree, respectively. These results indicate that the radiomics model distinguished each category, from normal to the different RC grades, from the others with consistently good performance.
Table 2.
Overall performance of the six models in the training and testing cohorts.
Model | Accuracy (micro) | AUC (micro) | 95% CI (micro) | Sensitivity (micro) | Specificity (micro) | PPV (micro) | NPV (micro) | Precision (micro) | Recall (micro) | F1 (micro) | Threshold (micro) | Cohort |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SVM | 0.818 | 0.887 | 0.860–0.914 | 0.819 | 0.817 | 0.599 | 0.931 | 0.599 | 0.819 | 0.692 | 0.352 | Train |
SVM | 0.754 | 0.824 | 0.767–0.880 | 0.836 | 0.726 | 0.505 | 0.930 | 0.505 | 0.836 | 0.629 | 0.315 | Test |
LR | 0.721 | 0.811 | 0.770–0.851 | 0.858 | 0.675 | 0.468 | 0.935 | 0.468 | 0.858 | 0.606 | 0.207 | Train |
LR | 0.746 | 0.768 | 0.699–0.837 | 0.791 | 0.731 | 0.495 | 0.913 | 0.495 | 0.791 | 0.609 | 0.233 | Test |
Random Forest | 0.702 | 0.820 | 0.784–0.856 | 0.819 | 0.662 | 0.447 | 0.917 | 0.447 | 0.819 | 0.579 | 0.276 | Train |
Random Forest | 0.649 | 0.728 | 0.665–0.791 | 0.746 | 0.617 | 0.394 | 0.879 | 0.394 | 0.746 | 0.515 | 0.246 | Test |
XGBoost | 0.839 | 0.944 | 0.926–0.962 | 0.942 | 0.804 | 0.616 | 0.977 | 0.616 | 0.942 | 0.745 | 0.263 | Train |
XGBoost | 0.631 | 0.716 | 0.650–0.782 | 0.866 | 0.552 | 0.392 | 0.925 | 0.392 | 0.866 | 0.540 | 0.193 | Test |
LightGBM | 0.844 | 0.896 | 0.868–0.923 | 0.774 | 0.867 | 0.659 | 0.920 | 0.659 | 0.774 | 0.712 | 0.359 | Train |
LightGBM | 0.612 | 0.716 | 0.650–0.782 | 0.821 | 0.542 | 0.374 | 0.901 | 0.374 | 0.821 | 0.514 | 0.256 | Test |
Extra Trees | 0.740 | 0.813 | 0.778–0.848 | 0.710 | 0.751 | 0.487 | 0.886 | 0.487 | 0.710 | 0.577 | 0.328 | Train |
Extra Trees | 0.496 | 0.669 | 0.601–0.738 | 0.940 | 0.348 | 0.325 | 0.946 | 0.325 | 0.940 | 0.483 | 0.222 | Test |
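Table 2 lists identical PPV and precision columns, and identical sensitivity and recall columns, which suggests that the micro metrics pool the four one-vs-rest binary decisions at the listed threshold. Under that assumption each case contributes one positive and three negative labels, so the pooled positive fraction is exactly 1/4, and accuracy, PPV, NPV, and F1 follow from sensitivity and specificity alone. The hypothetical helper below performs that calculation; it reproduces the SVM rows to within rounding, which supports reading "Accuracy (micro)" as a pooled one-vs-rest figure rather than the proportion of correctly graded patients.

```python
def pooled_ovr_metrics(sensitivity, specificity, n_classes=4):
    """Pooled (micro) one-vs-rest metrics implied by sensitivity and specificity.

    In one-vs-rest binarization each case yields one positive and n_classes - 1
    negative labels, so the pooled positive fraction is exactly 1 / n_classes.
    """
    prev = 1.0 / n_classes
    tp, fn = sensitivity * prev, (1 - sensitivity) * prev
    tn, fp = specificity * (1 - prev), (1 - specificity) * (1 - prev)
    ppv = tp / (tp + fp)
    return {
        "accuracy": tp + tn,
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# SVM rows of Table 2: (sensitivity, specificity) -> compare with the reported values.
for cohort, sens, spec in [("Train", 0.819, 0.817), ("Test", 0.836, 0.726)]:
    m = pooled_ovr_metrics(sens, spec)
    print(cohort, {k: round(v, 3) for k, v in m.items()})
```

The same identity, accuracy = (sensitivity + 3 × specificity) / 4, also holds for the other rows of the table, for example Extra Trees in the test cohort: (0.940 + 3 × 0.348) / 4 = 0.496.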
Fig. 6.
Confusion matrices of the radiomics model for the training (left) and testing (right) cohorts. Each cell gives the number of cases for the corresponding combination of true and predicted class.
Fig. 7.
Four-class ROC curves of the radiomics model for the training (left) and testing (right) cohorts. The micro-average and macro-average ROC curves, shown as dashed lines, indicate the overall discriminability of the four-class classification.
Discussion
This study extracted radiomics features from DMRD images obtained during the defecation phase and used them to build a radiomics model that automatically grades RC severity. The model employed an SVM classifier, a machine learning technique suited to high-dimensional data with non-linear relationships. On an independent test cohort, the model achieved an accuracy (micro) of 0.754 and an AUC (micro) of 0.824, indicating that it classified RC severity levels effectively. Four-category ROC analysis confirmed the model's ability to differentiate between severity categories, with AUCs of 0.766 for the normal group, 0.773 for RC I degree, 0.813 for RC II degree, and 0.781 for RC III degree. These consistent AUC values show that performance was stable across all severity categories. The findings of this study indicate that the radiomics model is effective in diagnosing and grading RC, and its ability to assign RC severity to distinct categories highlights its potential as a valuable tool for clinicians.
As life expectancy continues to rise, the incidence of POP is expected to increase. Among the various types of POP, RC is a common posterior compartment prolapse characterized by forward protrusion of the anterior rectal wall9. RC can cause debilitating symptoms such as vaginal pressure, a vaginal mass, sexual dysfunction, and constipation, which severely affect women's quality of life and highlight the need for timely and accurate diagnosis. Effective treatment depends on identifying the severity of RC and choosing a treatment protocol appropriate to its grade. Without careful preoperative assessment of the pelvic floor and proper identification and grading of pelvic floor disorders, there is a significant risk of surgical failure and reoperation10. Dynamic MRI has become a cornerstone in the assessment of pelvic floor disorders because it visualizes pelvic organ movement, pelvic floor muscle weakness, POP, and defects in supporting structures. As a non-invasive technique that uses no ionizing radiation, it offers a broad and detailed view of pelvic anatomy and function11, making it an optimal choice for assessing RC and other pelvic floor disorders. Since its introduction by Yang et al. in 1991, MRD has come to be regarded as the most effective imaging technique for evaluating pelvic floor disorders12. On MRI, RC is identified as a protrusion of the anterior rectal wall beyond a line extended from the anterior wall of the anal canal during maximum abdominal pressure or defecation6. RC severity is graded by the extent of this protrusion: grade 1, < 2 cm beyond the extended line; grade 2, 2–4 cm; grade 3, > 4 cm13. The current grading method relies heavily on manual measurements, which are both time-consuming and subjective.
These limitations introduce inter-observer variability, which may affect the consistency and accuracy of diagnoses and makes reproducible grading difficult for radiologists. This underscores the need for an objective and consistent grading system that can streamline the process and improve reliability.
Computer-aided diagnostic models have gained considerable traction in medical imaging in recent years and play an important role in assisting radiologists with disease diagnosis. Radiomics in particular has emerged as a powerful methodology for extracting quantitative features from images, opening the door to automated grading systems for conditions such as RC and providing clinicians with valuable information for diagnosis and treatment planning. Several studies have explored computer-aided techniques for diagnosing and classifying POP. Costa et al. developed an automated classification system for RC and bladder cystocele using defecography data from 21 patients14. The system combined a non-rigid registration method based on a variational model with k-means clustering for classification and achieved an accuracy and F1 score of 0.86, demonstrating its potential to assist physicians. However, the small sample size posed a risk of overfitting and limits the generalizability of the findings; by comparison, our study included 9392 MRI images from 222 patients. Robinson et al. constructed a feed-forward artificial neural network (ANN) that used clinical and historical data to differentiate between patients with POP and those with good pelvic support15. The best-performing ANN achieved 94% accuracy, 85% sensitivity, and 90% specificity. These results underscore the potential of ANN models for diagnostic support, although they relied on clinical information rather than imaging data. Onal et al. developed SVM-based binary classification models using demographic data, medical history, and MRI-based features16. These models achieved high accuracy (> 90%) for anterior prolapse and good accuracy (80–90%) for apical and posterior prolapse. Their work used an automated pelvic floor measurement model to identify ROIs on MRI scans but did not address specific POP staging or radiomic feature extraction.
Consistent with Onal et al., we constructed a radiomics model using an SVM classifier and achieved effective classification of RC. SVM is a machine learning technique based on the principle of structural risk minimization and can identify subtle patterns in complex data, particularly small-sample, high-dimensional, and non-linear medical data. Through a kernel function, SVM implicitly maps the input data into a higher-dimensional feature space in which patterns and relationships between data points are easier to identify17,18. In this transformed space, SVM searches for the optimal decision boundary, the hyperplane that separates data points of different classes with the maximum margin. This allows the model to classify data points even when they cannot be separated linearly in the original feature space. The hyperplane is the central concept in SVM classification19: it acts as the decision boundary that separates data points into positive and negative classes20. This approach allows SVM to classify data effectively even in complex, high-dimensional feature spaces, making it valuable in medical imaging and diagnostics, where subtle differences in data can have significant clinical implications. Wang et al. implemented a convolutional neural network (CNN)-based multi-label classification system to diagnose anterior vaginal wall prolapse, posterior vaginal wall prolapse, and uterine prolapse21. However, that system only diagnosed the type of prolapse and did not grade its severity, which is needed to guide treatment decisions. This study addresses this limitation by targeting automatic grading of RC, since accurate grading is critical for clinicians to tailor treatment to different levels of RC severity. Although deep learning techniques such as CNNs are often considered state-of-the-art for image classification, their complex internal structures make them less interpretable22.
Clinicians often struggle to understand how deep learning models arrive at specific decisions, which limits their clinical adoption. Radiomics, by contrast, offers a more transparent approach based on meaningful and traceable image features that provide actionable information for diagnosis and treatment planning23. In our study, outlining the ROI along the entire rectal margin allowed radiomics features to be extracted with higher precision while excluding irrelevant image information, which may have contributed to the model's accuracy and diagnostic performance.
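The kernel idea described above can be illustrated with a toy problem that no straight line can solve: two concentric rings. This is a generic scikit-learn illustration on synthetic data, not part of the study pipeline; the ring parameters are arbitrary choices. The RBF kernel K(x, x') = exp(-γ‖x − x'‖²) used by the study's SVM lets the classifier find a separating hyperplane in the implicit feature space, whereas a linear kernel cannot.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any straight line in the original 2D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
# RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2) implicitly maps into a
# higher-dimensional space where a separating hyperplane exists.
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

Training accuracy is used here only to show separability; in practice kernel width and the regularization parameter C would be tuned by cross-validation.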
In clinical practice, the diagnosis and grading of RC often depend on manual assessment through physical examination and image interpretation, which is subject to interobserver variability and diagnostic delay. By providing an automated, objective classification based on imaging features, our model offers a standardized decision-support tool that could help clinicians identify RC grades more rapidly and consistently. Specifically, the model's high sensitivity (83.6%) and high negative predictive value (93.0%) suggest that it could be particularly useful as a screening aid, flagging cases for more detailed clinical evaluation while minimizing missed diagnoses. In terms of integration, the model could be deployed as part of the imaging analysis software within hospital information systems, where it would automatically provide probability scores and grade predictions alongside routine imaging outputs without adding a significant workflow burden. Such integration would allow clinicians to retain decision-making authority while benefiting from machine-learning-assisted triage. Nevertheless, we acknowledge the model's limitations, including its moderate specificity (72.6%) and precision (50.5%), and its weaker performance on rare higher-grade cases (e.g., grade III RC) owing to the limited sample size. Future work includes expanding the dataset to improve generalizability across all grades, particularly for severe cases, and prospective validation in real-world clinical settings. Overall, our model offers an objective, reproducible, and rapid preliminary classification tool that complements expert clinical judgment, with the aim of improving diagnostic efficiency and consistency.
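The screening argument above rests on standard confusion-matrix definitions. The sketch below writes them out and adds Bayes' rule, which shows how predictive values depend on the prevalence of the positive class. It is a hypothetical illustration: the function names are our own. Treating the reported figures as a single binary (one-vs-rest) split is also an assumption, because this section does not state how the metrics were averaged across RC grades.

```python
def binary_metrics(tp, fp, tn, fn):
    """Screening metrics for one binary (e.g. one-vs-rest) confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # recall: share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases that are correctly cleared
        "precision": tp / (tp + fp),    # positive predictive value (PPV)
        "npv": tn / (tn + fn),          # negative predictive value
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by Bayes' rule for a given positive-class prevalence."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=8, fp=2, tn=18, fn=2))
    # Reported sensitivity/specificity with an ASSUMED 25% positive-class prevalence.
    ppv, npv = predictive_values(0.836, 0.726, 0.25)
    print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.3f}")
```

With a hypothetical positive-class prevalence of 25%, the reported sensitivity and specificity imply a PPV of about 0.504 and an NPV of about 0.930. This shows how a minority positive class allows moderate precision and a high NPV to coexist, which is why such a model is better suited to ruling cases out than to confirming them.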
This study has several limitations. First, it is a retrospective analysis, and all images were obtained from a single hospital, which may introduce selection bias. This single-center design may limit the heterogeneity of patient demographics and imaging protocols, thereby reducing the external validity of the findings. Without external datasets for validation, the findings may not be fully representative of broader patient populations. Future studies should therefore include multi-center data collection and external validation cohorts to improve generalizability and reduce selection bias. Second, ROIs were delineated manually on every frame, a process that is inherently laborious and prone to inter-observer variability, because different radiologists may delineate the same ROI differently, which can affect the consistency and reproducibility of the results. Such variability in manual segmentation can propagate into downstream radiomic feature extraction and model training, introducing systematic imaging bias. Automatic segmentation methods should be investigated, as they could substantially improve efficiency and reduce the errors associated with manual delineation. Third, MRI acquisition parameters such as scanner type, field strength, slice thickness, and contrast protocol may have been inconsistent, which can affect feature robustness. Without standardized imaging protocols or post-hoc harmonization techniques, radiomics features may reflect acquisition artifacts rather than true biological differences. Finally, this study focused on classification and did not explore the relationship between the model's classification results and post-operative outcomes. 
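Inter-observer variability in manual delineation is commonly quantified with the Dice similarity coefficient between two readers' masks. The sketch below is a hypothetical pure-Python version for 2-D binary masks. This section does not report Dice values for the study, and the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are our assumptions.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2 * |A intersect B| / (|A| + |B|) for two equally shaped binary masks."""
    if len(mask_a) != len(mask_b) or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    overlap = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            a, b = bool(a), bool(b)
            size_a += a
            size_b += b
            if a and b:
                overlap += 1
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks count as full agreement
    return 2.0 * overlap / (size_a + size_b)


if __name__ == "__main__":
    # Hypothetical ROI masks drawn by two readers on the same frame.
    reader_1 = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]
    reader_2 = [[0, 1, 1],
                [0, 1, 0],
                [0, 1, 0]]
    print(dice_coefficient(reader_1, reader_2))  # 2 * 3 / (4 + 4) = 0.75
```

For DMRD series, such a score could be computed per frame and summarised across frames, and the same comparison could serve as a benchmark for any automatic segmentation method.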
Understanding how the predictions of the radiomics model correlate with surgical or therapeutic outcomes is crucial for validating its clinical relevance. Future research should link radiomics-based grading with actual clinical endpoints, such as symptom improvement or surgical success, to confirm its predictive value in patient care.
Conclusions
Our results show that an MRI-based radiomics model can grade RC automatically. Combining an SVM classifier with features from defecation-phase images supports accurate and consistent RC grading, which is crucial for clinical judgment and treatment planning.
Acknowledgements
We are grateful to the individuals who provided support and advice regarding the provision of data.
Author contributions
All authors participated in collecting data, writing the manuscript, and reviewing the literature. W.L. analysed the data statistically and created the tables and figures. Z.Z., R.Q., J.L. and S.W. collected the survey data. W.L., J.L. and Z.Z. critically and linguistically revised the manuscript. W.L., R.Q. and M.W. contributed to the revision and preparation of the manuscript. W.L., J.L. and M.W. conceived and supervised the conduct of the study. All authors read and approved the final manuscript.
Data availability
All data generated or analysed during this study are included in this published article.
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This retrospective study was approved by the Medical Ethics Committee of the Second Hospital of Jilin University (approval number: (2024) Annual Review No. (421)).
Accordance statement
This study was conducted in accordance with the tenets of the Declaration of Helsinki.
Informed consent
Informed consent was waived by the Medical Ethics Committee of the Second Hospital of Jilin University.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Zimmermann, E. F., Hayes, R. S., Daniels, I. R., Smart, N. J. & Warwick, A. M. Transperineal rectocele repair: a systematic review. ANZ J. Surg. 87, 773–779 (2017).
- 2. Karram, M. & Maher, C. Surgery for posterior vaginal wall prolapse. Int. Urogynecol. J. 24, 1835–1841 (2013).
- 3. Cundiff, G. W. & Fenner, D. Evaluation and treatment of women with rectocele: focus on associated defecatory and sexual dysfunction. Obstet. Gynecol. 104, 1403–1421 (2004).
- 4. Kanmaniraja, D. et al. MR defecography review. Abdom. Radiol. 46, 1334–1350 (2021).
- 5. Mortele, K. J. & Fairhurst, J. Dynamic MR defecography of the posterior compartment: indications, techniques and MRI features. Eur. J. Radiol. 61, 462–472 (2007).
- 6. Maccioni, F. Functional disorders of the ano-rectal compartment of the pelvic floor: clinical and diagnostic value of dynamic MRI. Abdom. Imaging 38, 930–951 (2013).
- 7. Gardin, I. et al. Radiomics: principles and radiotherapy applications. Crit. Rev. Oncol. Hematol. 138, 44–50 (2019).
- 8. Liu, Z. et al. The applications of radiomics in precision diagnosis and treatment of oncology: opportunities and challenges. Theranostics 9, 1303–1322 (2019).
- 9. Segal, J. L. & Karram, M. M. Evaluation and management of rectoceles. Curr. Opin. Urol. 12, 345–352 (2002).
- 10. Safir, M. H., Gousse, A. E., Rovner, E. S., Ginsberg, D. A. & Raz, S. 4-Defect repair of grade 4 cystocele. J. Urol. 161, 587–594 (1999).
- 11. Rodríguez, L. V. & Raz, S. Diagnostic imaging of pelvic floor dysfunction. Curr. Opin. Urol. 11, 423–428 (2001).
- 12. Yang, A., Mostwin, J. L., Rosenshein, N. B. & Zerhouni, E. A. Pelvic floor descent in women: dynamic evaluation with fast MR imaging and cinematic display. Radiology 179, 25–33 (1991).
- 13. Reiner, C. S. & Weishaupt, D. Dynamic pelvic floor imaging: MRI techniques and imaging parameters. Abdom. Imaging 38, 903–911 (2013).
- 14. Costa, C. L., Macedo, T. A. A. & Barcelos, C. A. Z. Pre-diagnosis of pelvic floor disorders-based image registration and clustering. IEEE, 572–577 (2019).
- 15. Robinson, C. J., Swift, S., Johnson, D. D. & Almeida, J. S. Prediction of pelvic organ prolapse using an artificial neural network. Am. J. Obstet. Gynecol. 199, 193.e1–193.e6 (2008).
- 16. Onal, S. et al. Quantitative assessment of new MRI-based measurements to differentiate low and high stages of pelvic organ prolapse using support vector machines. Int. Urogynecol. J. 26, 707–713 (2015).
- 17. Li, G. et al. Diagnosis of renal diseases based on machine learning methods using ultrasound images. Curr. Med. Imaging 17, 425–432 (2021).
- 18. Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2014).
- 19. Rodríguez-Pérez, R. & Bajorath, J. Evolution of support vector machine and regression modeling in chemoinformatics and drug discovery. J. Comput. Aided Mol. Des. 36, 355–362 (2022).
- 20. Chen, J. et al. Ultrasound-based radiomics for the classification of Henoch-Schönlein purpura nephritis in children. Ultrason. Imaging 46, 110–120 (2024).
- 21. Wang, X. et al. Multi-label classification of pelvic organ prolapse using stress magnetic resonance imaging with deep learning. Int. Urogynecol. J. 33, 2869–2877 (2022).
- 22. Li, W. et al. Nomogram model based on radiomics signatures and age to assist in the diagnosis of knee osteoarthritis. Exp. Gerontol. 171, 112031 (2023).
- 23. Gillies, R. J., Kinahan, P. E. & Hricak, H. Radiomics: images are more than pictures, they are data. Radiology 278, 563–577 (2016).