BMJ Open. 2025 Aug 22;15(8):e097575. doi: 10.1136/bmjopen-2024-097575

Predictive model integrating deep learning and clinical features based on ultrasound imaging data for surgical intervention in intussusception in children younger than 8 months

Yu-feng Qian, Jin-jin Zhou, San-li Shi, Wan-liang Guo
PMCID: PMC12374633  PMID: 40846341

Abstract

Objectives

The objective of this study was to identify risk factors for enema reduction failure and to establish a combined model that integrates deep learning (DL) features and clinical features for predicting surgical intervention in intussusception in children younger than 8 months of age.

Design

A retrospective study with a prospective validation cohort of intussusception.

Setting and data

The retrospective data were collected from two hospitals in south east China between January 2017 and December 2022. The prospective data were collected between January 2023 and July 2024.

Participants

A total of 415 intussusception cases in patients younger than 8 months were included in the study.

Methods

A total of 280 cases collected from Centre 1 were randomly divided at a 7:3 ratio into the training cohort (n=196) and the internal validation cohort (n=84). The 85 cases collected from Centre 2 were designated as the external validation cohort. Pretrained DL networks were used to extract deep transfer learning features, and least absolute shrinkage and selection operator regression was used to select the features with non-zero coefficients. The clinical features were screened by univariate and multivariate logistic regression analyses. We constructed a combined model that integrated the two selected types of features, along with individual clinical and DL models for comparison. Additionally, the combined model was validated in a prospective cohort (n=50) collected from Centre 1.

Results

In the internal and external validation cohorts, the combined model (area under curve (AUC): 0.911 and 0.871, respectively) demonstrated better performance for predicting surgical intervention in intussusception in children younger than 8 months of age than the clinical model (AUC: 0.776 and 0.740, respectively) and the DL model (AUC: 0.828 and 0.793, respectively). In the prospective validation cohort, the combined model also demonstrated impressive performance with an AUC of 0.890.

Conclusion

The combined model, integrating DL and clinical features, demonstrated stable predictive accuracy, suggesting its potential for improving clinical therapeutic strategies for intussusception.

Keywords: Paediatric radiology, Paediatric colorectal surgery, Artificial Intelligence


STRENGTHS AND LIMITATIONS OF THIS STUDY.

  • This was a multicentre study that included data from two tertiary paediatric hospitals in China.

  • The combined model was prospectively validated, enhancing its credibility.

  • The proposed models demonstrated practical value in predicting surgical intervention in paediatric intussusception and enhancing clinical therapeutic strategies.

  • The dataset of this study remained relatively small. A larger dataset would enhance the robustness and generalisability of our findings.

Introduction

Intussusception is a common cause of acute abdomen in infants, primarily in the first 2 years of life, with the highest incidence in those aged 4–10 months.1 2 It is defined as the invagination of one part of the intestine into another.3,5 Hydrostatic or air enema reduction under fluoroscopic or ultrasonographic guidance is generally the preferred treatment, with overall success rates ranging from 42% to 95%.6 Despite advancements and refinements in enema reduction techniques, some children still require surgical intervention. Surgical reduction is reserved for cases in which nonsurgical methods have failed and for cases presenting with complications such as bowel necrosis and peritonitis.7 8 Identifying patients at a high risk of failure of non-operative reduction is crucial for optimising management strategies. Early recognition allows the next step to be planned in advance and the parents to be informed about the possibility of unsuccessful reduction, thereby facilitating timely surgical intervention. In recent years, an increasing number of studies have focused on younger paediatric patients.9 10 They have indicated that failure of non-operative reduction is more likely to occur in younger children, especially in those under 12 months, although a standardised age threshold has not been established. Shekherdimian et al highlighted that the failure rate of enema reduction in patients under 6 months could be as high as 78.4%, with 51.7% of the patients requiring bowel resection.11 In this context, it is imperative to identify risk factors for enema reduction failure in younger children and to establish a predictive model for identifying cases likely to require surgical intervention, aiming to decrease delays and improve salvage of the intussuscepted bowel.

Recently, deep convolutional neural networks have demonstrated strong performance in computer vision and have found applications in medical imaging.12,14 They represent an emerging technique in which features are automatically extracted from imaging data within the hidden layers of neural networks.15,17 This yields quantitative and high-throughput features from medical images such as CT, MRI and ultrasound. These features are then employed to develop diagnostic and predictive models by various machine learning and deep learning (DL) methods.18 Compared with prediction models constructed from clinical data alone, DL radiomics combined with clinical parameters can integrate clinical information with network features, so that each source of information complements the other.19,21 We believe that DL networks can extract features that are not observable or are easily overlooked by the human eye, thereby assisting in determining whether a child with intussusception requires surgical intervention.

In this study, we extracted the deep radiomics features from ultrasound images for intussusception in children younger than 8 months. Then, we proposed three models, namely, a clinical model, a DL model and a combined model that integrates DL radiomics features and clinical features for predicting surgical intervention in paediatric intussusception.

Materials and methods

Patients

This multicentre study, comprising both retrospective and prospective components, used data from two hospitals in south east China, namely, Children’s Hospital of Soochow University (Centre 1) and Affiliated Changzhou Children’s Hospital of Nantong University (Centre 2).

The subjects were diagnosed with acute intussusception via abdominal ultrasound conducted during their hospital visit. Then, those confirmed with intussusception were treated with air enema reduction. Patients who did not achieve successful reduction of intussusception after three enema reduction attempts required surgical treatment. The interval between the three enema reduction attempts was 1 hour if the patient was stable.

Between January 2017 and December 2022, in Centre 1, a total of 4818 consecutive patients were diagnosed with acute intussusception, of which 183 underwent surgery after failed air enema reduction. The patients were divided into the successful air enema group and the operation group based on the results of the air enema reduction and surgery. The overall failure rate of air enemas was 3.8%. However, further analysis revealed that the failure rate of air enemas varied with age, generally increasing as the child’s age decreased. More information is shown in online supplemental table 1 and figure 1. In patients younger than 8 months, the failure rate of air enemas was 23.7%, while in those older than 8 months, it was only 2.4%. The difference between the two age groups was statistically significant (p<0.001). Therefore, we focused on the children with intussusception younger than 8 months. This age threshold was not only supported by clinical observations and prior literature suggesting distinct anatomical and physiological characteristics in younger infants but also validated through exploratory analyses using multiple age cut-offs (6, 8, 10 and 12 months), with 8 months yielding the most statistically significant separation in air enema failure rates.

The inclusion criteria were as follows: (1) age <8 months, (2) both abdominal ultrasound examination and air enema reduction performed and (3) availability of images and clinical data. The exclusion criteria were as follows: (1) history of previous intestinal surgery and (2) inadequate ultrasound image quality that cannot be used for further analysis.

We retrospectively enrolled 390 patients younger than 8 months with acute intussusception across the two centres between January 2017 and December 2022. At Centre 1, 280 of the 299 patients met the inclusion criteria. These patients were then randomly divided at a 7:3 ratio into the training cohort (n=196) and the internal validation cohort (n=84). At Centre 2, a total of 85 of the 91 patients met the criteria and were designated as the external validation cohort. Additionally, we enrolled 50 patients between January 2023 and July 2024 from Centre 1 as the prospective validation cohort. Among these patients, 12 cases (approximately 3% of the dataset) had missing values in one or more clinical features. To ensure data integrity and minimise bias potentially introduced by imputation, these cases were excluded from all analyses. Preliminary assessment of the missing data pattern showed no significant differences between cases with missing data and those with complete records in key clinical variables. Therefore, the missing data were assumed to be missing at random, and complete case analysis was considered appropriate for this study.
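The 7:3 random division of the 280 eligible Centre 1 cases can be illustrated with a minimal sketch. The case identifiers, the seed and the helper name `split_cohort` are hypothetical, and the source does not state whether the split was stratified by outcome; this sketch performs a plain unstratified shuffle.

```python
import random


def split_cohort(case_ids, train_fraction=0.7, seed=42):
    """Randomly split case identifiers into training and validation lists."""
    shuffled = list(case_ids)
    # A seeded generator makes the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 280 eligible Centre 1 cases.
centre1_cases = [f"C1-{i:03d}" for i in range(1, 281)]
train_ids, internal_val_ids = split_cohort(centre1_cases)
print(len(train_ids), len(internal_val_ids))  # 196 84
```

With 280 cases, a 0.7 fraction gives the cohort sizes reported above (196 and 84).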

We included all available and eligible cases within the study period to maximise the robustness and generalisability of the model. This pragmatic approach was chosen to reflect real-world clinical practice and ensure the diversity of patient characteristics. Figure 1 shows the patient recruitment workflow.

Figure 1. Flowchart of patient recruitment. Centre 1, Children’s Hospital of Soochow University; Centre 2, Affiliated Changzhou Children’s Hospital of Nantong University.

Ultrasound examination and image acquisition

All of the enrolled patients underwent an abdominal ultrasound examination prior to air enema reduction or surgery. The ultrasound machines included TOSHIBA Aplio400, VINNO 80, Canon Aplio400, Siemens Acuson S1000 and GE HealthCare E20. During the ultrasound scanning process, both a linear array probe and a convex array probe were used to conduct a thorough scan of the entire abdominal cavity. Longitudinal and transverse multisectional scans were conducted upon identifying any anomalies. Once the lesions were distinctly visualised, the location of the mass was recorded, the diameter of the ‘concentric circle’ sign was measured and the mesenteric lymph nodes in the lesions were observed. The greyscale images of intussusception were routinely stored in Picture Archiving and Communication Systems.

A static greyscale image of the most extensive long-axis cross-section for intussusception was selected to evaluate the characteristics of intussusception based on the following ultrasound features: nested position, mass size, pathologic lead points (PLPs), enlarged lymph nodes and hydrops abdominis. The ultrasound images were independently reviewed by two radiologists (J-jZ, who has 5 years of diagnostic experience and Y-fQ, who has 8 years of diagnostic experience). The inter-rater agreement between them was assessed using Cohen’s kappa coefficient, which was 0.87, demonstrating excellent consistency. Detailed results are presented in online supplemental table 2. If there was a discrepancy between the interpretations of the two radiologists, the final decision was made by a third radiologist (W-lG, who has more than 15 years of experience in ultrasound imaging).
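Cohen's kappa, used above to quantify agreement between the two readers, compares the observed agreement with the agreement expected by chance. The following is a minimal sketch on toy ratings; the ratings are invented for illustration and are not the study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must label the same non-empty set of cases.")
    n = len(ratings_a)
    # Proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example: presence (1) or absence (0) of a feature as read by two raters.
reader_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reader_2 = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(reader_1, reader_2)
print(round(kappa, 2))  # 0.8
```

In the toy data the readers agree on 9 of 10 cases and chance agreement is 0.5, so kappa is (0.9 − 0.5)/(1 − 0.5) = 0.8.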

Clinical characteristics and ultrasound features of the patients

We referred to clinical data along with conventional ultrasound features collectively as the clinical features. The patients’ clinical data were obtained from regular clinical records. A total of 16 clinical features were retrieved, namely, age, gender, abdominal pain, paroxysms of crying, vomiting, bloody stools, abdominal mass, fever, diarrhoea, nested position, mass size, PLPs, enlarged lymph nodes, abdominal and pelvic effusion, air enema pressure and duration of symptoms.

In this context, nested position was defined as the right abdomen or left abdomen depending on whether the mass was located on the right or on the left of the spine, respectively. Enlarged lymph nodes were defined as those with a short-axis diameter >15 mm. Mass size was defined as the average of the long and short diameters in the maximum section of ultrasound. The duration of symptoms was categorised as either exceeding 24 hours or not exceeding 24 hours.

Region of interest segmentation

In this study, all of the reviewed ultrasound images of the enrolled patients were stored in Digital Imaging and Communications in Medicine (DICOM) format and then converted to Neuroimaging Informatics Technology Initiative (NII) format. Two radiologists (Y-fQ and W-lG), who were blinded to the results, used ITK-SNAP software (V.3.8.0, http://www.itksnap.org/) to manually segment regions of interest (ROIs). Each mass was outlined along its margins, and a bounding rectangle around the most extensive long-axis cross-section of the mass was used to crop the maximum ROI. The obtained images were used for subsequent model development. The interobserver and intraobserver agreements were evaluated using the intraclass correlation coefficient (ICC). A total of 70 patients were randomly selected for repeated ROI segmentation by one of the radiologists (Y-fQ) after a 3-week interval. An ICC >0.7 was considered an indicator of good consistency, and features meeting this threshold were retained for subsequent analysis.

Deep learning feature extraction and feature compression

Transfer learning from pretrained convolutional neural networks was applied to extract DL features. In this study, all implementations were performed using the Onekey AI platform, which provides standardised DL pipelines based on the PyTorch framework. We selected ResNet-50 as our pretrained model (torchvision implementation), which had originally been trained on the ImageNet Large Scale Visual Recognition Challenge 2012 dataset. We selected the largest cross-sectional view from the ultrasound images and normalised the intensity values to a range of [−1, 1] using min–max transformation. Each cropped subregion image was then resized to 224×224 pixels using nearest-neighbour interpolation. These processed images served as the input for the model. After removing the final classification layer, 2048-dimensional DL feature vectors were extracted from the global average pooling layer of ResNet-50. To ensure a balanced representation of the features, principal component analysis (PCA) was subsequently applied for dimensionality reduction (using scikit-learn’s PCA with n_components=0.95). We standardised all of the features using the z-score method and then screened them with Spearman’s rank correlation coefficient to evaluate the interrelationship between the features, aiming to ensure their robustness. For any pair of features with a correlation coefficient >0.9, only one of the two features was retained. Finally, we applied the least absolute shrinkage and selection operator (LASSO) logistic regression (LR) algorithm, combined with 10-fold cross-validation for penalty parameter tuning, to select the features with non-zero coefficients from the training cohort. The selected features and optimal parameters were then applied to the validation cohorts for model evaluation.
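The correlation screening step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Onekey AI implementation: the source does not say which member of a highly correlated pair was kept, so the sketch greedily keeps the first one encountered, and the feature names and values are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |rho| <= threshold against every feature already kept."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Invented feature columns: f2 is a monotone function of f1, f3 is not.
features = {
    "f1": [0.1, 0.4, 0.3, 0.9, 0.7],
    "f2": [1.0, 4.1, 2.9, 9.2, 7.5],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(features)
print(kept)  # ['f1', 'f3']
```

Here f2 is dropped because its rank correlation with f1 is 1, while f3 (rho = −0.7 against f1) stays.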

Construction of predictive models

After performing LASSO feature selection, we identified the ultimate retained features for DL models. Based on these features, we developed a DL predictive model with various machine learning algorithms, including LR, support vector machine, k-nearest neighbour, random forest, ExtraTrees, eXtreme Gradient Boosting and light gradient boosting machine. Traditional machine learning classifiers were implemented using the scikit-learn library. Hyperparameter tuning was performed using 10-fold cross-validation on the training set. Detailed hyperparameters are presented in online supplemental table 3.

Furthermore, we conducted univariate analysis to compare the clinical features between the two groups in the training cohort. The variables that showed significant differences (p<0.05) were further evaluated using multivariate analysis. For every independent factor, ORs with 95% CIs were determined to estimate the relative risk. The selected variables were then used to construct a clinical model employing the same machine-learning algorithms.
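The relationship between a logistic regression coefficient and its OR with 95% CI can be sketched as below. The sketch assumes a Wald interval (z=1.96), which the source does not state explicitly; the function names are hypothetical. As a check, it recovers the coefficient and standard error implied by the multivariate OR reported for bloody stools in table 2 and converts them back.

```python
import math


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Convert a logistic regression coefficient into an OR with a Wald CI."""
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


def coefficient_from_or(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Recover the coefficient and its standard error from a reported OR and CI."""
    beta = math.log(odds_ratio)
    standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return beta, standard_error


# Multivariate OR reported for bloody stools in table 2: 2.291 (1.041 to 5.041).
beta, se = coefficient_from_or(2.291, 1.041, 5.041)
odds_ratio, ci_lower, ci_upper = odds_ratio_with_ci(beta, se)
print(f"{odds_ratio:.3f} ({ci_lower:.3f} to {ci_upper:.3f})")
```

An OR above 1 with a CI that excludes 1 corresponds to a coefficient whose interval excludes 0, which is why such variables are reported as independent factors.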

Finally, we constructed a combined model, which integrated the selected DL and clinical features, applying the same machine learning algorithms. We assessed the predictive performance of these models by comparing the area under the receiver operating characteristic curve (AUC-ROC), accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). Ultimately, the optimal models were obtained. Figure 2 shows the entire process of model construction.
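The AUC used to compare the models equals the probability that a randomly chosen surgical case receives a higher predicted score than a randomly chosen non-surgical case. A minimal pairwise sketch follows; the labels and scores are toy values, and a production analysis would normally use a library routine instead.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute the AUC.")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = surgical intervention, 0 = successful enema reduction.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889
```

In the toy data, 8 of the 9 positive/negative pairs are ordered correctly, giving 8/9.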

Figure 2. Workflow of model development. LASSO, least absolute shrinkage and selection operator; PCA, principal component analysis; ROC, receiver operating characteristic; ROI, region of interest.

Statistical analysis

SPSS software (V.27.0) and Python (V.3.9.0, http://www.python.org) were used for statistical analyses. The normality of continuous variables was assessed using the Shapiro-Wilk test. Variables conforming to normal distribution were presented as mean±SD and compared by Student’s t-test. For continuous variables not meeting the normality assumption, non-parametric tests were applied. Categorical variables were expressed as numbers (percentages) and were analysed using the χ2 test.

The AUC, along with accuracy, sensitivity, specificity, PPV and NPV, was compared between the predictive models. DeLong’s test was employed to compare the AUC values. Calibration curves were used to assess the agreement between the observed and predicted outcomes. The clinical utility of the prediction models was assessed using decision curve analysis (DCA) to calculate the net benefit across different threshold probabilities. The Hosmer-Lemeshow test was applied to examine the consistency between the expected and observed probabilities in the prediction model. All of the calculations were performed with 95% CIs, and statistical significance was considered when p value was <0.05 in a two-sided analysis.
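The net benefit that DCA plots at each threshold probability can be sketched with the standard formula: true positives per patient minus false positives per patient weighted by the odds of the threshold. The data below are toy values and the function names are hypothetical.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted risk is at least the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1.")
    n = len(labels)
    flagged = [(y, p) for y, p in zip(labels, probabilities) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y == 1)
    false_pos = sum(1 for y, _ in flagged if y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: 1 = surgical intervention, 0 = successful enema reduction.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
probabilities = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.05]
model_nb = net_benefit(labels, probabilities, threshold=0.5)
treat_all_nb = net_benefit_treat_all(labels, threshold=0.5)
print(model_nb, treat_all_nb)  # 0.125 -0.25
```

A decision curve repeats this calculation over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).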

Patient and public involvement

Patients and the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Results

Patient population and clinical characteristics

A total of 365 patients younger than 8 months of age with intussusception were enrolled for model construction. This cohort comprised 196 patients in the training cohort, 84 patients in the internal validation cohort and 85 patients in the external validation cohort, and the rates of surgical reduction were 25.0% (49/196), 26.2% (22/84) and 24.7% (21/85), respectively. Table 1 summarises the clinical and ultrasonic characteristics of these cohorts. There were no significant differences between the training and two validation cohorts.

Table 1. Clinical and ultrasound characteristics in the training, internal validation, external validation and prospective cohorts.

| Characteristics | Training cohort (retrospective, n=196) | Internal validation cohort (retrospective, n=84) | External validation cohort (retrospective, n=85) | P value (retrospective cohorts) | Prospective cohort (n=50) |
|---|---|---|---|---|---|
| Age, mean±SD, months | 6.33±1.55 | 6.35±1.61 | 6.55±1.29 | 0.739 | 6.86±1.30 |
| Pressure, mean±SD, kPa | 12.03±0.73 | 11.99±0.96 | 12.06±0.92 | 0.622 | 12.01±0.85 |
| Size, mean±SD, mm | 30.62±3.66 | 30.38±3.93 | 30.30±5.20 | 0.770 | 30.75±3.60 |
| Gender, No. (%) | | | | 0.322 | |
|  Female | 77 (39.29) | 26 (30.95) | 35 (41.18) | | 18 (36.00) |
|  Male | 119 (60.71) | 58 (69.05) | 50 (58.82) | | 32 (64.00) |
| Abdominal pain | | | | 0.830 | |
|  Negative | 118 (60.20) | 53 (63.10) | 54 (63.53) | | 33 (66.00) |
|  Positive | 78 (39.80) | 31 (36.90) | 31 (36.47) | | 17 (34.00) |
| Paroxysms of crying | | | | 0.600 | |
|  Negative | 89 (45.41) | 35 (41.67) | 42 (49.41) | | 33 (66.00) |
|  Positive | 107 (54.59) | 49 (58.33) | 43 (50.59) | | 17 (34.00) |
| Vomit | | | | 0.165 | |
|  Negative | 54 (27.55) | 20 (23.81) | 31 (36.47) | | 10 (20.00) |
|  Positive | 142 (72.45) | 64 (76.19) | 54 (63.53) | | 40 (80.00) |
| Bloody stools | | | | 0.363 | |
|  Negative | 139 (70.92) | 58 (69.05) | 53 (62.35) | | 32 (64.00) |
|  Positive | 57 (29.08) | 26 (30.95) | 32 (37.65) | | 18 (36.00) |
| Abdominal mass | | | | 0.391 | |
|  Negative | 160 (81.63) | 70 (83.33) | 75 (88.24) | | 49 (98.00) |
|  Positive | 36 (18.37) | 14 (16.67) | 10 (11.76) | | 1 (2.00) |
| Fever | | | | 0.098 | |
|  Negative | 182 (92.86) | 71 (84.52) | 76 (89.41) | | 44 (88.00) |
|  Positive | 14 (7.14) | 13 (15.48) | 9 (10.59) | | 6 (12.00) |
| Diarrhoea | | | | 0.437 | |
|  Negative | 187 (95.41) | 77 (91.67) | 79 (92.94) | | 38 (76.00) |
|  Positive | 9 (4.59) | 7 (8.33) | 6 (7.06) | | 12 (24.00) |
| Nested position | | | | 0.489 | |
|  Right | 173 (88.27) | 78 (92.86) | 75 (88.24) | | 41 (82.00) |
|  Left | 23 (11.73) | 6 (7.14) | 10 (11.76) | | 9 (18.00) |
| Lymphadenectasis | | | | 0.676 | |
|  Negative | 170 (86.73) | 76 (90.48) | 75 (88.24) | | 48 (96.00) |
|  Positive | 26 (13.27) | 8 (9.52) | 10 (11.76) | | 2 (4.00) |
| Abdominal and pelvic effusion | | | | 0.550 | |
|  Negative | 162 (82.65) | 70 (83.33) | 66 (77.65) | | 34 (68.00) |
|  Positive | 34 (17.35) | 14 (16.67) | 19 (22.35) | | 16 (32.00) |
| Symptom duration | | | | 0.061 | |
|  <24 hours | 128 (65.31) | 62 (73.81) | 48 (56.47) | | 36 (72.00) |
|  ≥24 hours | 68 (34.69) | 22 (26.19) | 37 (43.53) | | 14 (28.00) |
| PLPs | | | | 0.555 | |
|  Negative | 183 (93.37) | 81 (96.43) | 79 (92.94) | | 48 (96.00) |
|  Positive | 13 (6.63) | 3 (3.57) | 6 (7.06) | | 2 (4.00) |

Numerical data are presented as mean±SD, and categorical data as n (%).

PLPs, pathologic lead points.

The univariate analysis revealed that vomiting, bloody stools, nested position, mass size, PLPs, abdominal and pelvic effusion, and duration of symptoms were associated with surgical intervention for intussusception in the training cohort. Then, these characteristics were imported to multivariate LR analysis. The multivariate LR analysis revealed nested position, mass size, PLPs, bloody stools, abdominal and pelvic effusion, and duration of symptoms (all p<0.05) as the independent significant factors associated with surgical intervention for intussusception (table 2).

Table 2. Univariate and multivariable logistic regression analyses for selecting clinical features of model development.

| Variable | Univariate OR (95% CI) | Univariate P value | Multivariate OR (95% CI) | Multivariate P value |
|---|---|---|---|---|
| Gender | 1.875 (0.931 to 3.778) | 0.079 | | |
| Age (months) | 0.895 (0.73 to 1.098) | 0.288 | | |
| Abdominal pain | 0.588 (0.295 to 1.173) | 0.132 | | |
| Vomit | 2.035 (1.032 to 4.012) | 0.040* | 1.429 (0.582 to 3.508) | 0.436 |
| Paroxysms of crying | 1.428 (0.668 to 3.052) | 0.357 | | |
| Bloody stools | 2.940 (1.49 to 5.801) | 0.002* | 2.291 (1.041 to 5.041) | 0.039* |
| Abdominal mass | 0.425 (0.155 to 1.163) | 0.096 | | |
| Fever | 1.218 (0.364 to 4.073) | 0.749 | | |
| Diarrhoea | 0.362 (0.044 to 2.97) | 0.344 | | |
| Nested position | 6.133 (2.454 to 15.327) | <0.001* | 3.509 (1.131 to 10.883) | 0.030* |
| Air enema pressure | 1.228 (0.785 to 1.922) | 0.368 | | |
| Size (mm) | 1.147 (1.046 to 1.259) | 0.004* | 1.184 (1.062 to 1.319) | 0.002* |
| Enlarged lymph nodes | 1.398 (0.566 to 3.453) | 0.467 | | |
| Abdominal and pelvic effusion | 4.062 (1.871 to 8.823) | <0.001* | 3.171 (1.203 to 8.355) | 0.020* |
| Symptom duration | 2.519 (1.298 to 4.891) | 0.006* | 3.251 (1.498 to 7.056) | 0.003* |
| PLPs | 5.541 (1.72 to 17.857) | 0.004* | 4.031 (1.031 to 15.76) | 0.045* |

*Represents p<0.05.

PLPs, pathologic lead points.

Feature extraction and selection for model construction

A total of 2048 DL features were extracted from segmentations. After the downscaling of DL features and LASSO LR, 13 features with non-zero coefficients were retained for the construction of the DL model. The coefficients and mean SEs of LASSO regression, resulting from the 10-fold validation, and the coefficients of these retained DL features are depicted in online supplemental figure 2. Based on the univariate and multivariate LR analyses, six clinical features were initially selected for the construction of the clinical model. The construction of the combined model was based on the six clinical features and 13 DL features. Among the 19 features, the DL features showed higher correlation coefficients compared with the clinical features.

Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to visualise the network to explore the interpretability of the DL model. Grad-CAM generates a coarse localisation map that highlights important regions relevant to the classification target (online supplemental figure 3).

Performance and comparisons of three predictive models

The ROC curves for the predictive models created with different common machine learning algorithms are displayed in online supplemental figure 4, with the three optimal predictive models illustrated in online supplemental figure 5. Additional metrics are included in online supplemental table 4. In the internal validation cohort, the combined model achieved the highest performance for predicting surgical intervention in paediatric intussusception with an AUC of 0.911 (95% CI 0.8385 to 0.9826), accuracy of 0.893, sensitivity of 0.773, specificity of 0.935, PPV of 0.810 and NPV of 0.921, while the clinical model achieved an AUC of 0.776 (95% CI 0.6580 to 0.8933) and the DL model achieved an AUC of 0.828 (95% CI 0.7259 to 0.9295). In the external validation cohort, the combined model achieved the highest AUC of 0.871 (95% CI 0.7663 to 0.9762), accuracy of 0.906, sensitivity of 0.905, specificity of 0.906, PPV of 0.760 and NPV of 0.967. The clinical model and the DL model achieved lower AUCs, namely, 0.740 (95% CI 0.6199 to 0.8600) and 0.793 (95% CI 0.6797 to 0.9066), respectively. Furthermore, the combined model matched or exceeded the clinical and DL models in terms of accuracy, sensitivity, specificity, PPV and NPV in the internal and external validation cohorts; the only tie was sensitivity in the external validation cohort, where the clinical model also reached 0.905. The quantitative comparison of these models is summarised in table 3, and ROC curves are illustrated in figure 3.

Table 3. Performance comparison of the clinical model, the DL model and the combined model for predicting surgical intervention in paediatric intussusception in children younger than 8 months.

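| Cohort | Metric | Clinical | DL | Combined |
|---|---|---|---|---|
| Training cohort | AUC (95% CI) | 0.790 (0.7115 to 0.8680) | 0.895 (0.8442 to 0.9464) | 0.930 (0.8890 to 0.9717) |
| Training cohort | ACC | 0.770 | 0.832 | 0.847 |
| Training cohort | SEN | 0.653 | 0.735 | 0.898 |
| Training cohort | SPE | 0.810 | 0.864 | 0.830 |
| Training cohort | PPV | 0.533 | 0.643 | 0.638 |
| Training cohort | NPV | 0.875 | 0.907 | 0.961 |
| Internal validation cohort | AUC (95% CI) | 0.776 (0.6580 to 0.8933) | 0.828 (0.7259 to 0.9295) | 0.911 (0.8385 to 0.9826) |
| Internal validation cohort | ACC | 0.750 | 0.774 | 0.893 |
| Internal validation cohort | SEN | 0.591 | 0.727 | 0.773 |
| Internal validation cohort | SPE | 0.806 | 0.790 | 0.935 |
| Internal validation cohort | PPV | 0.520 | 0.552 | 0.810 |
| Internal validation cohort | NPV | 0.847 | 0.891 | 0.921 |
| External validation cohort | AUC (95% CI) | 0.740 (0.6199 to 0.8600) | 0.793 (0.6797 to 0.9066) | 0.871 (0.7663 to 0.9762) |
| External validation cohort | ACC | 0.588 | 0.753 | 0.906 |
| External validation cohort | SEN | 0.905 | 0.524 | 0.905 |
| External validation cohort | SPE | 0.484 | 0.828 | 0.906 |
| External validation cohort | PPV | 0.365 | 0.500 | 0.760 |
| External validation cohort | NPV | 0.939 | 0.841 | 0.967 |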

ACC, accuracy; AUC, area under the receiver operating characteristic curve; DL, deep learning; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
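The accuracy, sensitivity, specificity, PPV and NPV in table 3 all follow from a 2×2 confusion matrix. The sketch below shows the definitions. The counts are not reported in the source; they are an assumption, back-calculated from the internal validation cohort size (n=84, 22 surgical cases) and the rates reported for the combined model.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "SEN": tp / (tp + fn),  # sensitivity: surgical cases correctly flagged
        "SPE": tn / (tn + fp),  # specificity: non-surgical cases correctly cleared
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Assumed counts (not from the source), consistent with 22 surgical and
# 62 non-surgical cases in the internal validation cohort.
internal_combined = classification_metrics(tp=17, fp=4, tn=58, fn=5)
for name, value in internal_combined.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the five metrics round to 0.893, 0.773, 0.935, 0.810 and 0.921, matching the combined-model values reported for the internal validation cohort.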

Figure 3. Receiver operating characteristic curves of the clinical, DL and combined models in the training (a), internal validation (b) and external validation (c) cohorts, respectively. AUC, area under curve; DL, deep learning.

We applied the DeLong test to further evaluate the models’ effectiveness (online supplemental table 5). In the internal validation cohort, the combined model significantly outperformed the clinical model (p=0.011), while no significant difference was found between the combined model and the DL model (p=0.107). In the external validation cohort, the combined model surpassed both the clinical model (p=0.005) and the DL model (p=0.049), indicating that the combined model exhibited the highest predictive accuracy.

In addition, the calibration curves for the combined model closely followed the ideal line, reflecting strong consistency between the predicted and actual outcomes in surgical intervention for intussusception in children younger than 8 months. The Hosmer-Lemeshow test results, with p values provided in online supplemental table 5, also indicated that the combined model was better calibrated than the other models in both the internal and external validation cohorts. The calibration curves for every cohort are displayed in figure 4.

Figure 4. The calibration curves of the clinical model, DL model and combined model in the training (a), internal validation (b) and external validation (c) cohorts. DL, deep learning.

DCA was also employed to assess the clinical utility and benefits of predicting surgical intervention in paediatric intussusception across various models. As illustrated in figure 5, the DCA results demonstrated that the combined model offered the most substantial net benefit for predicting surgical intervention, thus outperforming both the clinical model and the DL model in the training, internal validation and external validation cohorts.

Figure 5. DCA for the clinical model, DL model and combined model in the training (a), internal validation (b) and external validation (c) cohorts. DCA, decision curve analysis; DL, deep learning.

Validation on a prospective cohort

We also conducted a prospective validation to further evaluate the reliability and applicability of the combined model for predicting surgical intervention in paediatric intussusception. This phase of research involved the enrolment of a new cohort of patients from Centre 1, between January 2023 and July 2024. Importantly, this cohort was selected independently of the original training and validation datasets used in our prior analyses to ensure that the results of this validation were not biased by earlier data.

In the prospective validation cohort (n=50), the combined model demonstrated strong performance, with an AUC of 0.890 (95% CI 0.7545 to 1.0000), accuracy of 0.900, PPV of 0.636, NPV of 0.974 and F1 score of 0.737 (online supplemental table 6). The ROC curve and predicted sample scores are shown in online supplemental figures 6 and 7.

The results from the prospective validation demonstrated that the predictive model not only maintained a high level of accuracy but also exhibited consistency in different patient populations. This robustness signifies that the model has a strong potential to make reliable predictions regarding the need for surgical intervention in intussusception in children younger than 8 months.

Discussion

In this study, we developed and validated a combined model that integrates DL radiomics features with clinical features to predict the need for surgical intervention in intussusception in children younger than 8 months. The combined model achieved a high AUC of 0.911 in the internal validation cohort, outperforming the individual models based solely on clinical features (AUC=0.776) or DL features (AUC=0.828). In addition, the combined model demonstrated strong performance in prospective validation, achieving an AUC of 0.890, further underscoring its high accuracy and robustness.

To our knowledge, this is the first study to combine clinical and DL features to predict surgical intervention in paediatric intussusception. Previous studies have reported various risk factors for surgery in paediatric patients with intussusception, often yielding incongruent results.22,24 Several studies have shown that clinicopathological risk factors for surgical intervention in paediatric intussusception include age, duration of abdominal pain, presence of bloody stool, C-reactive protein levels, trapped fluid and nested position.20 However, there is still no standardised set of criteria to guide clinical decisions. In the present study, based on the univariate and multivariate LR analyses, we found that bloody stools, left-sided intussusception, larger mass, PLPs, hydrops abdominis and longer duration of symptoms were associated with an increased probability of surgical intervention. Fragoso et al25 and Somme et al26 reported similar findings and pointed to symptom duration of 48 hours or more as a risk factor for failure of hydrostatic reduction in intussusception. In contrast, Lim et al27 did not find any relationship between symptom duration or clinical symptoms such as blood in the stool and vomiting and the success of ultrasound-guided water enema. Similarly, Liu et al28 found no significant difference in the success rate of ultrasound-guided hydrostatic enema between children with a disease duration of more than 48 hours and those with a shorter duration. However, Khorana et al17 reported symptom duration exceeding 72 hours as a predictor of failed nonsurgical reduction. Our findings support the association between longer symptom duration and an increased risk of failed non-operative reduction. Prior studies have also emphasised the role of age in reduction failure, indicating that children under 1 year of age are at an increased risk for failure of reduction. Shekherdimian et al11 reported that individuals younger than 6 months of age experienced a high operative rate. Chua et al10 reported that the success rate of air enema reduction was lower in patients younger than 3 months. These findings are consistent with our study, which shows that children younger than 8 months have a higher likelihood of undergoing surgical intervention; however, no significant difference was found among age groups under 8 months of age.

In addition, earlier studies have noted that the presence of interloop fluid and free fluid on ultrasound indicates a potentially increased risk of surgery. Kim et al9 reported that, among ultrasound features, ascites, left-sided intussusception and trapped fluid were linked to enema reduction failure, with success rates dropping below 50% when these signs were present. In the present study, ascites and left-sided intussusception were also identified as risk factors for failure of enema reduction. Furthermore, we found that the presence of a larger mass and PLPs significantly contributed to the failure of enema reduction.

Clinical risk factors alone are insufficiently accurate for predicting the need for surgical intervention in intussusception. A significant portion of children with intussusception present without the classic triad, and risk factors detected by ultrasound might be more helpful than clinical features for predicting the reduction outcome and informing the surgeon's preoperative assessment.29,31 Liu et al20 developed a nomogram based on clinical risk factors to predict surgical intervention in paediatric patients with intussusception, which achieved an AUC of 0.757. In the present study, we built a machine learning model using clinical risk factors combined with deep radiomics features from ultrasound images of intussusception. Our results indicated that the combined model exhibited the best performance, with an AUC of 0.911 in the internal validation cohort. This approach provided complementary insights beyond image features, effectively compensating for their limitations and enhancing the model’s reliability in predicting surgical interventions for paediatric intussusception. In addition, the present study specifically addressed the age group that experiences the highest rates of enema failure in these cases, enabling a more focused analysis. Notably, our model demonstrated strong performance in both internal and external validation sets, and additional validation on prospective data supported its robustness and predictive accuracy.
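For readers less familiar with the metric used in these comparisons, an AUC can be read as the probability that a randomly chosen case requiring surgery receives a higher predicted risk than a randomly chosen case with successful enema reduction. The following is a minimal Python sketch of that pairwise interpretation. The function name `auc`, the labels and both score lists are hypothetical and purely illustrative; they are not study data and do not reproduce the models described here.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 = surgical intervention, 0 = successful enema reduction.
    scores: predicted risk from any model (higher = surgery more likely).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    # Count (surgical, non-surgical) pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
clinical_scores = [0.70, 0.40, 0.55, 0.60, 0.30, 0.20, 0.45, 0.10]
combined_scores = [0.85, 0.60, 0.70, 0.65, 0.25, 0.15, 0.30, 0.05]

auc_clinical = auc(labels, clinical_scores)
auc_combined = auc(labels, combined_scores)
print(f"clinical: {auc_clinical:.3f}, combined: {auc_combined:.3f}")
```

In this toy example the second score list ranks more surgical cases above non-surgical ones, so its AUC is higher; the same reading applies when comparing AUCs of different models evaluated on the same cohort.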

The present study had several limitations. First, although this was a multicentre study and prospective testing was conducted, the dataset remained relatively small. Moreover, the prospective validation cohort was collected from the same hospital as the training and internal validation cohorts, which may limit the external validity of the model when applied to other centres or populations. A larger dataset involving more hospitals across different cities and countries would enhance the robustness and generalisability of our findings. Second, given the ultrasound data we used, we only employed two-dimensional features from the maximal ROI slice of the mass. However, the two-dimensional segmentation may lack some information to fully describe the features of the entire lesion. Third, we did not analyse the effect of specific intussusception types on the risk of surgical intervention for intussusception. Fourth, although standardised preprocessing and consistent feature extraction methods were applied to reduce variability across different ultrasound machines, we acknowledge that residual device-specific differences may still exist. A dedicated analysis of the impact of device-related variation on model performance was beyond the scope of the present study, but we recognise its importance and plan to investigate this in future work. Overall, while our findings contribute to predicting surgical intervention in paediatric intussusception, addressing these limitations in future studies will be essential for refining our models and enhancing clinical applicability.

Conclusion

Our study identified six significant risk factors associated with enema reduction failure in children younger than 8 months of age with intussusception: (1) bloody stools, (2) left-sided intussusception, (3) larger mass diameter, (4) presence of PLPs, (5) abdominal and pelvic effusion and (6) prolonged symptom duration (>24 hours). Based on these findings, we developed and validated an ultrasound-based combined model integrating clinical risk factors and DL features to predict surgical intervention in intussusception in children younger than 8 months of age. The combined model yielded a stable predictive accuracy, and it may have the potential to improve clinical therapeutic strategies for intussusception.

Supplementary material

online supplemental file 1
bmjopen-15-8-s001.docx (2.6MB, docx)
DOI: 10.1136/bmjopen-2024-097575

Footnotes

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Prepublication history and additional supplemental material for this paper are available online. To view these files, please visit the journal online (https://doi.org/10.1136/bmjopen-2024-097575).

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient consent for publication: Consent obtained from parent(s)/guardian(s).

Ethics approval: This study involves human participants. This study was approved by the Ethics Committee of the Children’s Hospital of Soochow University (Approval No. 2025CS131), which provided centralised ethics oversight for both the Children’s Hospital of Soochow University and the Affiliated Changzhou Children’s Hospital of Nantong University. Informed consent was waived for retrospectively collected data, and written informed consent for prospective participants was obtained from the parents or legal guardians of the children enrolled in the study.

Patient and public involvement: Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Data availability free text: The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Data availability statement

No data are available.

References

  • 1. Kelley-Quon LI, Arthur LG, Williams RF, et al. Management of intussusception in children: A systematic review. J Pediatr Surg. 2021;56:587–96. doi: 10.1016/j.jpedsurg.2020.09.055.
  • 2. Desai R, Curns AT, Patel MM, et al. Trends in intussusception-associated deaths among US infants from 1979-2007. J Pediatr. 2012;160:456–60. doi: 10.1016/j.jpeds.2011.08.012.
  • 3. Marsicovetere P, Ivatury SJ, White B, et al. Intestinal Intussusception: Etiology, Diagnosis, and Treatment. Clin Colon Rectal Surg. 2017;30:30–9. doi: 10.1055/s-0036-1593429.
  • 4. Padilla BE, Moses W. Lower Gastrointestinal Bleeding & Intussusception. Surg Clin North Am. 2017;97:173–88. doi: 10.1016/j.suc.2016.08.015.
  • 5. Xie X, Wu Y, Wang Q, et al. Risk factors for recurrence of intussusception in pediatric patients: A retrospective study. J Pediatr Surg. 2018;53:2307–11. doi: 10.1016/j.jpedsurg.2018.03.023.
  • 6. Daneman A, Navarro O. Intussusception. Pediatr Radiol. 2004;34:97–108. doi: 10.1007/s00247-003-1082-7.
  • 7. Nataraja RM, Khoo S, Ditchfield M, et al. Establishing content validity and fidelity of a novel paediatric intussusception air enema reduction simulator. ANZ J Surg. 2019;89:1133–7. doi: 10.1111/ans.14747.
  • 8. Vo A, Levin TL, Taragin B, et al. Management of Intussusception in the Pediatric Emergency Department: Risk Factors for Recurrence. Pediatr Emerg Care. 2020;36:e185–8. doi: 10.1097/PEC.0000000000001382.
  • 9. Kim PH, Hwang J, Yoon HM, et al. Predictors of failed enema reduction in children with intussusception: a systematic review and meta-analysis. Eur Radiol. 2021;31:8081–97. doi: 10.1007/s00330-021-07935-5.
  • 10. Chua JHY, Chui CH, Jacobsen AS. Role of surgery in the era of highly successful air enema reduction of intussusception. Asian J Surg. 2006;29:267–73. doi: 10.1016/S1015-9584(09)60101-9.
  • 11. Shekherdimian S, Lee SL. Management of pediatric intussusception in general hospitals: diagnosis, treatment, and differences based on age. World J Pediatr. 2011;7:70–3. doi: 10.1007/s12519-011-0249-9.
  • 12. Gao R, Zhao S, Aishanjiang K, et al. Deep learning for differential diagnosis of malignant hepatic tumors based on multi-phase contrast-enhanced CT and clinical data. J Hematol Oncol. 2021;14:154. doi: 10.1186/s13045-021-01167-2.
  • 13. Wei W, Ma Q, Feng H, et al. Deep learning radiomics for prediction of axillary lymph node metastasis in patients with clinical stage T1–2 breast cancer. Quant Imaging Med Surg. 2023;13:4995–5011. doi: 10.21037/qims-22-1257.
  • 14. Feng L, Liu Z, Li C, et al. Development and validation of a radiopathomics model to predict pathological complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a multicentre observational study. Lancet Digit Health. 2022;4:e8–17. doi: 10.1016/S2589-7500(21)00215-6.
  • 15. Shin H-C, Roth HR, Gao M, et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Trans Med Imaging. 2016;35:1285–98. doi: 10.1109/TMI.2016.2528162.
  • 16. Wang K, Lu X, Zhou H, et al. Deep learning Radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut. 2019;68:729–41. doi: 10.1136/gutjnl-2018-316204.
  • 17. Khorana J, Singhavejsakul J, Ukarapol N, et al. Prognostic indicators for failed nonsurgical reduction of intussusception. Ther Clin Risk Manag. 2016;12:1231–7. doi: 10.2147/TCRM.S109785.
  • 18. Parasher G, Wong M, Rawat M. Evolving role of artificial intelligence in gastrointestinal endoscopy. WJG. 2020;26:7287–98. doi: 10.3748/wjg.v26.i46.7287.
  • 19. Zheng X, Yao Z, Huang Y, et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun. 2020;11. doi: 10.1038/s41467-020-15027-z.
  • 20. Liu J, Wang Y, Jiang Z, et al. Developing a Nomogram for Predicting Surgical Intervention in Pediatric Intussusception After Pneumatic Reduction: A Multicenter Study from China. Ther Clin Risk Manag. 2024;20:313–23. doi: 10.2147/TCRM.S463086.
  • 21. Xie Y, Zhang J, Xia Y, et al. Fusing texture, shape and deep model-learned information at decision level for automated classification of lung nodules on chest CT. Information Fusion. 2018;42:102–10. doi: 10.1016/j.inffus.2017.10.005.
  • 22. Fike FB, Mortellaro VE, Holcomb GW, et al. Predictors of failed enema reduction in childhood intussusception. J Pediatr Surg. 2012;47:925–7. doi: 10.1016/j.jpedsurg.2012.01.047.
  • 23. McDermott VG, Taylor T, Mackenzie S, et al. Pneumatic reduction of intussusception: clinical experience and factors affecting outcome. Clin Radiol. 1994;49:30–4. doi: 10.1016/s0009-9260(05)82910-1.
  • 24. van den Ende ED, Allema JH, Hazebroek FWJ, et al. Success with hydrostatic reduction of intussusception in relation to duration of symptoms. Arch Dis Child. 2005;90:1071–2. doi: 10.1136/adc.2004.066332.
  • 25. Fragoso AC, Campos M, Tavares C, et al. Pneumatic reduction of childhood intussusception. Is prediction of failure important? J Pediatr Surg. 2007;42:1504–8. doi: 10.1016/j.jpedsurg.2007.04.013.
  • 26. Somme S, To T, Langer JC. Factors determining the need for operative reduction in children with intussusception: a population-based study. J Pediatr Surg. 2006;41:1014–9. doi: 10.1016/j.jpedsurg.2005.12.047.
  • 27. Lim RZM, Lee T, Ng JYZ, et al. Factors associated with ultrasound-guided water enema reduction for pediatric intussusception in resource-limited setting: potential predictive role of thrombocytosis and anemia. J Pediatr Surg. 2018;53:2312–7. doi: 10.1016/j.jpedsurg.2018.01.004.
  • 28. Liu ST, Li YF, Wu QY, et al. Is enema reduction in pediatric intussusception with a history of over 48 h safe: A retrospective cohort study. Am J Emerg Med. 2023;68:33–7. doi: 10.1016/j.ajem.2023.02.027.
  • 29. Zhang Y, Dong Q, Li S-X, et al. Clinical and Ultrasonographic Features of Secondary Intussusception in Children. Eur Radiol. 2016;26:4329–38. doi: 10.1007/s00330-016-4299-1.
  • 30. Lin X-K, Xia Q-Z, Huang X-Z, et al. Clinical characteristics of intussusception secondary to pathologic lead points in children: a single-center experience with 65 cases. Pediatr Surg Int. 2017;33:793–7. doi: 10.1007/s00383-017-4101-8.
  • 31. Banapour P, Sydorak RM, Shaul D. Surgical approach to intussusception in older children: influence of lead points. J Pediatr Surg. 2015;50:647–50. doi: 10.1016/j.jpedsurg.2014.09.078.

    Articles from BMJ Open are provided here courtesy of BMJ Publishing Group
