Radiology: Imaging Cancer. 2026 Feb 20;8(2):e250041. doi: 10.1148/rycan.250041

Pediatric Personalized Deep Learning Models for Segmentation of Hepatoblastoma at CT and MRI

Gourav Modanwal 1, Saurabh Kumar 1, Vidya Viswanathan 1, Cara E Morin 2,3, Mitchell A Rees 4, Judy H Squires 5, Elizabeth R Tang 6, Howard M Katzenstein 7, Alexander J Towbin 2,3, Anant Madabhushi 1,8,#, Gary R Schooler 2,3,#
PMCID: PMC13036680  PMID: 41718532

Abstract

Purpose

To evaluate the generalizability of adult-trained models for hepatoblastoma segmentation to pediatric patients and to develop two deep learning (DL) models, MPCT and MPMRI, specifically trained on pediatric contrast-enhanced CT and T2-weighted MRI scans, respectively.

Materials and Methods

Imaging data from the multicenter Children’s Oncology Group AHEP0731 trial (NCT00980460; May 2008–July 2018) were analyzed. DL models employing the three-dimensional U-Net architecture were trained using DCT-Train and DMRI-Train. These models were evaluated on DCT-Val and DMRI-Val using the Dice similarity coefficient (DSC), and model segmentations were compared with manual segmentations from three annotators (R1, R2, and R3), their consensus (Rc), and adult-trained model (MACT) segmentations. Volume percentage error analysis was performed to evaluate segmentation precision.

Results

A total of 104 participants (mean age ± SD, 28.2 months ± 30.5; 64 male; DCT-Train = 56, DCT-Val = 48) were included in the CT dataset and 123 (31.5 months ± 38.4; 87 male; DMRI-Train = 50, DMRI-Val = 73) in the MRI dataset. MPCT achieved good agreement with consensus segmentation (DSC = 0.86 [95% CI: 0.80, 0.91]) and exhibited higher agreement than MACT with R1 (0.83 vs 0.55), R2 (0.85 vs 0.55), R3 (0.84 vs 0.54), and Rc (0.86 vs 0.55) segmentations. Volume percentage error analysis revealed that MPCT achieved segmentation results on par with or better than those of a novice annotator (R3) in high-precision scenarios. MPMRI also achieved a DSC of 0.86, demonstrating good agreement with Rc.

Conclusion

The pediatric-trained DL-based models outperformed adult-trained models for accurate segmentation of pediatric hepatoblastoma.

Keywords: Pediatrics, Deep Learning, Liver, MR-Imaging, Abdomen/GI, Algorithm Development

ClinicalTrials.gov NCT00980460

Supplemental material is available for this article.

© The Author(s) 2026. Published by the Radiological Society of North America under a CC BY 4.0 license.



Summary

Deep learning–based hepatoblastoma segmentation models trained on pediatric CT and MRI scans (MPCT and MPMRI) demonstrated good agreement with human annotators and their consensus. MPCT outperformed the adult CT-trained model (MACT).

Key Points

  • Two deep learning–based hepatoblastoma segmentation models, MPCT and MPMRI, were trained on pediatric contrast-enhanced CT and T2-weighted MRI scans, respectively.

  • MPCT demonstrated strong agreement with two expert and one novice human annotator and their consensus (Rc) segmentation (Dice similarity coefficients [DSCs]: 0.83, 0.85, 0.84, and 0.86, respectively); MPMRI also demonstrated good agreement with Rc (DSC = 0.86).

  • Agreement between MPCT and Rc (DSC = 0.86) was greater than that between Rc and an adult CT-trained model (DSC = 0.55) (P ≤ .0001).

Introduction

Hepatoblastoma is the most common hepatic malignancy of childhood, yet it is a rare disease with an incidence of 2.3 cases per million children per year (1). The rarity of the disease is compounded by its different pathologic types and subtypes, and because it occasionally occurs in patients with underlying liver or genetic diseases. Most commonly hepatoblastoma has been categorized pathologically as either epithelial or mixed epithelial-mesenchymal (2). However, as our understanding of the disease has grown, the number of histologic subtypes has expanded. Currently, pathologists recognize nine histologic subtypes of epithelial hepatoblastoma and two types of mixed epithelial-mesenchymal hepatoblastoma (3). Identification of some tumor subtypes may help guide therapeutic decision-making (4,5). Currently, hepatoblastoma pathologic subtype analysis of tumors relies on histologic assessment often occurring in limited tumor tissue obtained at biopsy at diagnosis (6). However, a few radiomics tools (7,8) have shown potential in identifying distinct tumor subtypes through imaging analysis, which may complement biopsy in ascertaining the diagnosis and better predict tumor heterogeneity. Radiomics has the potential to increase diagnostic accuracy, improve disease monitoring, and optimize treatment planning (9).

The first step of any radiomics analysis (10,11) is to identify regions of interest in the images for subsequent analysis using artificial intelligence (AI) frameworks. Segmenting these regions of interest, especially for complex lesions such as liver tumors, has traditionally been a time-consuming process that requires substantial manual input and a high level of subject matter expertise (12). Differences in subject matter expertise among manual human annotators can lead to interrater variability (13), which can introduce potential inconsistencies (14) in the AI analysis. Automating this segmentation task using AI can improve efficiency in the image analysis workflow and reduce variability in tumor segmentations.

The field of AI has rapidly expanded its applications in radiology, offering potential for improved diagnosis and prognosis (15). However, a concerning disparity exists in pediatric cancer imaging. Most research efforts have concentrated on adults (16), leaving the pediatric population underrepresented and resulting in inequitable access to these technologies (17). For adults, multiple techniques have been described, including the employment of deep learning (DL) methods for liver and liver tumor segmentation (16,18). These models have not been validated in children, and evidence suggests that machine learning models trained on data from adult samples may not necessarily translate effectively to the pediatric population (19–21). To our knowledge, there is no empirical evidence yet on whether adult-trained models perform adequately for pediatric liver tumor segmentation. In this work, we hypothesized that models developed using pediatric data would perform similarly to human annotators and outperform an adult-based model. Therefore, we aimed to evaluate the generalizability of adult-trained models and develop two DL-based segmentation models explicitly tailored for pediatric patients: one trained on contrast-enhanced CT scans (MPCT) and another on T2-weighted MRI scans (MPMRI). We further compared their performance with human annotators and quantified segmentation precision and bias using volume percentage error (VPE) analysis. An overview of the study’s workflow is illustrated in Figure 1.

Figure 1:

Overview of the workflow for training and evaluating nnU-Net–based deep learning models for pediatric hepatoblastoma segmentation on contrast-enhanced CT (left) and T2-weighted MRI (right), assessed using Dice similarity coefficient and volume percentage error against manual annotations.

Image shows an overview of the study workflow for developing and evaluating deep learning models trained on pediatric contrast-enhanced CT images (MPCT; left) and T2-weighted MRI scans (MPMRI; right) for pediatric hepatoblastoma segmentation. Both segmentation models were trained using the nnU-Net framework. The models were evaluated by calculating the Dice similarity coefficient (DSC) and compared with manual segmentations from three annotators (R1, R2, and R3) and their consensus (Rc). Additionally, volume percentage error (VPE) analysis was conducted to assess segmentation performance. 3D = three-dimensional.

Materials and Methods

This secondary analysis used de-identified imaging from the multicenter, prospective Children’s Oncology Group AHEP0731 clinical trial (NCT00980460; May 2008–July 2018) (22). Institutional review board approval and informed consent were obtained at participating sites for the parent trial; the present analysis of de-identified data required no additional review or exemption per local policy and was conducted in accordance with the Health Insurance Portability and Accountability Act.

This study was supported in part by funding from the St. Baldrick’s Foundation and the National Cancer Institute under multiple award numbers. This study did not receive direct support from any industry or commercial organization specific to the research described. All authors had full control over the data collection, analysis, interpretation, and the decision to submit the manuscript for publication. Although some authors have industry affiliations, none of these relationships influenced the conduct or reporting of this study.

Study Dataset

The imaging dataset included 226 unique participants with 358 CT scans and 144 MRI scans acquired at prespecified study time points at 105 unique health care centers in the United States and Canada. A portion of this anonymized dataset is publicly available via The Cancer Imaging Archive (https://www.cancerimagingarchive.net/collection/ahep0731/). Some participants included in this analysis were reported previously in a study (4) that focused on clinical outcomes of the AHEP0731 trial; the current study focuses exclusively on the development and validation of DL-based hepatoblastoma segmentation models. A primary study objective was the use of imaging in guiding surgical planning, particularly through the use of the pretreatment extent of tumor (PRETEXT) staging system. PRETEXT classifies liver tumor extent before treatment based on the number of uninvolved liver sections. The liver is divided into four anatomic sections (left lateral, left medial, right anterior, and right posterior) based on Couinaud segmentation. PRETEXT groups range from I (three uninvolved sections) to IV (all four sections involved), and this system is routinely used in pediatric liver cancer trials to inform treatment and surgical strategies. PRETEXT staging data were available for all cases in this cohort.

We randomly selected participants from a convenience sample of AHEP0731 enrollees who met all of the following: pathologically confirmed hepatoblastoma and baseline abdominal imaging at diagnosis (portal venous phase contrast-enhanced CT or T2-weighted fat-suppressed MRI). Participants who did not meet each of these criteria were not eligible for inclusion. No further exclusion criteria were applied beyond those established in the original trial protocol.

The CT dataset (DCT, n = 104) was split into a training set (DCT-Train, n = 56) and a validation set (DCT-Val, n = 48), both consisting of randomly selected portal venous phase CT scans for the development and evaluation of the MPCT model. Similarly, the MRI dataset (DMRI, n = 123) was also split into a training set (DMRI-Train, n = 50) and a validation set (DMRI-Val, n = 73), both consisting of T2-weighted fat-suppressed MRI series, and these sets were used for the development and evaluation of the MPMRI model. DCT-Val and DMRI-Val were held-out evaluation sets used only for final performance estimation; no hyperparameter tuning was performed on these sets. All imaging was considered standard of care, and locally defined imaging protocols were used.

Manual Segmentation by Human Annotators

Three annotators (R1 [A.J.T.], R2 [G.R.S.], and R3 [V.V.]) independently performed manual segmentation on tumors blinded to each other’s segmentation results. R1 and R2, both board-certified pediatric radiologists with a certificate of added qualification, had 17 and 14 years of experience, respectively. Both R1 and R2 had expertise in pediatric liver tumor imaging and had served as expert radiology reviewers for Children’s Oncology Group Liver Tumor Trials. R3 was considered a novice. R3 was a recent medical school graduate in radiology training at the time of study, serving as an imaging research fellow. Manual segmentation was performed using ITK-SNAP version 4.0 (Penn Image Computing & Science Laboratory) and 3D Slicer version 5.8.1 (Brigham and Women’s Hospital).

For both CT and MRI datasets, the training sets (DCT-Train and DMRI-Train) were annotated by R2, who manually constructed three-dimensional tumor segmentations for all imaging examinations. In contrast, for the validation sets (DCT-Val and DMRI-Val), manual three-dimensional tumor segmentations were independently generated by all three annotators (R1, R2, and R3). Consensus segmentation (Rc) was obtained by pixel-level majority vote: a pixel was labeled as part of the tumor if at least two of the three annotators marked it as tumor. This approach reduces individual biases and variability, yields a more consistent reference standard segmentation than any individual annotator, and serves as a strong benchmark for evaluating DL models.
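The two-of-three majority rule described above can be illustrated with a minimal sketch. It uses only the Python standard library and treats each mask as a flattened list of 0/1 labels; the function name `majority_consensus` and the toy masks are hypothetical and are not taken from the study code.

```python
def majority_consensus(masks, min_votes=2):
    """Voxelwise vote over equally sized binary masks (flattened lists of 0/1)."""
    if not masks:
        raise ValueError("at least one mask is required")
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    # A voxel is labeled tumor (1) when at least `min_votes` annotators marked it.
    return [1 if sum(votes) >= min_votes else 0 for votes in zip(*masks)]


# Toy example: three annotators, six voxels (hypothetical values).
r1 = [1, 1, 0, 0, 1, 0]
r2 = [1, 0, 0, 1, 1, 0]
r3 = [1, 1, 0, 0, 0, 0]
rc = majority_consensus([r1, r2, r3])
print(rc)
```

In practice the same rule would be applied to three-dimensional label arrays rather than flat lists; the logic per voxel is unchanged.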

Data Preparation

Images were provided as Digital Imaging and Communications in Medicine files and were converted to the Neuroimaging Informatics Technology Initiative format for preprocessing, training, and evaluation. All images were resampled to a consistent voxel spacing (median of training set) using third-order spline interpolation for image data and nearest-neighbor interpolation for segmentation masks. Preprocessing was performed to standardize image intensities and improve generalizability across imaging sites, both within and outside of the dataset. For CT, we confined the pixel intensities to a soft tissue window of 0–350 HU (23). For MRI, we first performed bias correction (24) to reduce the low-frequency, multiplicative bias field, which can cause uneven intensity distributions across the image. This step improves the uniformity of tissue signal intensities and thereby the accuracy of subsequent image analyses. Furthermore, given the qualitative nature of MRI, we standardized the liver region (including the tumor) in our images with respect to a template image and liver mask (25). To create a liver mask for standardization, we trained an nnU-Net model on 20 three-dimensional images of the liver and the corresponding masks from the Combined Healthy Abdominal Organ Segmentation dataset (26) and used this model to segment the liver region in our dataset. Using the segmented liver and tumor regions, we applied histogram standardization techniques (25) to normalize the MRI scans based on the template image and mask within our dataset. Inference-time ablation results evaluating the contribution of individual preprocessing steps are provided in Table S1.
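As an illustration of the CT intensity step, the sketch below clips Hounsfield unit values to the 0–350 HU window stated above. The rescaling to the range [0, 1] is an assumption added for illustration (the text specifies only the window), and the function name `apply_ct_window` is hypothetical.

```python
def apply_ct_window(hu_values, lower=0.0, upper=350.0):
    """Clip Hounsfield units to [lower, upper], then rescale to [0, 1].

    The rescaling is an illustrative assumption; only the clipping window
    (0-350 HU) is taken from the text.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    span = upper - lower
    return [(min(max(v, lower), upper) - lower) / span for v in hu_values]


# Hypothetical voxel intensities: air, fat, water-like tissue, contrast, bone.
voxels = [-1000.0, -50.0, 0.0, 70.0, 175.0, 350.0, 1200.0]
windowed = apply_ct_window(voxels)
print(windowed)
```

Values below 0 HU collapse to the lower bound and values above 350 HU to the upper bound, so air and dense bone no longer dominate the intensity range seen by the network.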

DL Model Development and Evaluation

The model architecture followed the standard three-dimensional U-Net configuration as implemented in the nnU-Net framework (27). It employs a symmetric five-stage encoder-decoder structure with long-range skip connections that preserve spatial context across resolution levels (Fig 2). Each stage comprises two 3 × 3 × 3 convolutions followed by instance normalization and leaky rectified linear unit activation, supporting stable training and efficient feature extraction. Downsampling and upsampling are performed using strided and transposed convolutions, respectively, enabling hierarchical feature learning. Channel depth increases progressively from 32 to 320, enhancing the network’s capacity to model complex tumor morphologies. At the bottleneck, anisotropic kernel configurations (1 × 2 × 2) are used to account for the lower through-plane resolution. The three-dimensional U-Net architecture was trained de novo (ie, without any pretraining) to segment the tumor using the nnU-Net framework. After preprocessing, we trained the models using the training set. The generalized Dice loss (28) was used as the optimization objective because it accounts for class imbalance by weighting classes inversely proportional to their volume. We set the initial learning rate to 1 × 10⁻² and applied a weight decay of 3 × 10⁻⁵. The learning rate was reduced when validation performance plateaued, ensuring the model converged effectively while minimizing unnecessary training. Standard augmentations, including spatial transformations (random rotations up to ±30°, scaling within 0.7–1.4×, elastic deformations), intensity-based adjustments (gamma correction, brightness augmentation), and mirroring across axes, were applied. A total of 5000 epochs were used for MPCT and 1000 epochs for MPMRI. Additionally, for CT scans only, we compared the performance of MPCT with that of MACT, a publicly available liver tumor segmentation model trained on adult data (29).
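For readers unfamiliar with the generalized Dice loss, a commonly cited form is given below. The text states only that classes are weighted inversely to their volume; the squared-inverse weight shown here is the usual published choice and is an assumption about this study, as are the symbols: r_{ln} is the reference label and p_{ln} the predicted probability for class l at voxel n.

```latex
\mathrm{GDL} \;=\; 1 \;-\; 2\,
\frac{\sum_{l} w_l \sum_{n} r_{ln}\, p_{ln}}
     {\sum_{l} w_l \sum_{n} \left( r_{ln} + p_{ln} \right)},
\qquad
w_l \;=\; \frac{1}{\left( \sum_{n} r_{ln} \right)^{2}}
```

Because w_l grows as the class volume shrinks, a small tumor contributes to the loss on a scale comparable to the much larger background class.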

Figure 2:

Diagram of the three-dimensional U-Net architecture used for tumor segmentation, showing an encoder–decoder structure with skip connections, convolutional blocks, and a final softmax layer for voxelwise classification.

Schematic of the three-dimensional U-Net architecture used for tumor segmentation. The network adopts an encoder-decoder design with skip connections that preserve spatial context. Each block consists of 3 × 3 × 3 convolutions followed by instance normalization (IN) and leaky ReLU (rectified linear unit) activation. Downsampling is performed using strided convolutions, and upsampling is achieved through transposed convolutions. Skip connections from the encoder to decoder help retain fine-grained details. The final layer applies a 1 × 1 × 1 convolution with softmax to generate voxelwise class probabilities. HB mask = hepatoblastoma tumor segmentation.

Statistical Analysis

We used the Dice similarity coefficient (DSC), calculated as the spatial overlap between binary segmentation masks (1 = tumor, 0 = nontumor) from the automated and reference standard tumor segmentations, to assess agreement: greater than 0.9 (excellent), 0.80–0.89 (good), 0.70–0.79 (fair), and less than 0.70 (poor). DSC was computed at the image level, and each image contributed equally. Beyond evaluating DSC agreement with the reference standard segmentation, we aimed to better understand the model’s performance and potential segmentation precision and bias by analyzing predicted volume (PV) and true volume (TV) obtained with respect to Rc. We acknowledge that models, like human decision-making, are prone to errors, but certain levels of inaccuracy may be acceptable depending on the specific task or application (30). By evaluating accuracy within acceptable limits defined by error tolerance thresholds, we can gauge model reliability and suitability for practical use in different tasks. We conducted a VPE analysis to assess accuracy across various error tolerance thresholds (very low: 1%, low: 5%, moderate: 10%, and high: 15%). Specifically, we measured the percentage of cases where the model’s volume error (e) (31) fell within predefined error tolerance thresholds (eg, e < 1%, e < 5%, etc), assessing how often the predictions and annotations stayed within acceptable limits. This analysis provides a quantitative measure of the ability of models to estimate tumor volumes accurately and is essential for understanding the clinical applicability of the models, where precise volume estimation is critical for reliable outcomes.
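The two evaluation measures in this paragraph can be sketched as follows. The DSC follows its standard definition; the volume error formula (absolute volume difference as a percentage of the true volume) is an assumption, because the text cites reference 31 without restating the formula. All function names and the toy values are hypothetical.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 1)
    total = sum(pred) + sum(ref)
    # Assumed convention: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def volume_percentage_error(pred_volume, true_volume):
    """Assumed definition: |PV - TV| / TV, expressed in percent."""
    if true_volume <= 0:
        raise ValueError("true volume must be positive")
    return abs(pred_volume - true_volume) / true_volume * 100.0


def fraction_within_tolerance(errors, tolerances=(1.0, 5.0, 10.0, 15.0)):
    """Share of cases whose error is strictly below each tolerance (percent)."""
    n = len(errors)
    return {t: sum(e < t for e in errors) / n for t in tolerances}


pred = [1, 1, 1, 0, 0, 0, 1, 0]
ref = [1, 1, 0, 0, 0, 1, 1, 0]
dsc = dice_coefficient(pred, ref)

# Hypothetical (predicted, true) tumor volumes in mL for four cases.
volumes = [(99.5, 100.0), (96.0, 100.0), (108.0, 100.0), (80.0, 100.0)]
errors = [volume_percentage_error(pv, tv) for pv, tv in volumes]
summary = fraction_within_tolerance(errors)
print(dsc, summary)
```

The dictionary returned by `fraction_within_tolerance` corresponds to one column of a tolerance table: each entry is the proportion of cases whose volume error falls below that threshold.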

We also analyzed the PV/TV ratio to understand model tendency toward undersegmentation (missed tumor regions) or oversegmentation (including irrelevant background pixels). The consensus segmentation (Rc), derived from majority agreement among three annotators, served as the reference standard for TV estimation. A PV/TV ratio greater than 1 indicated potential oversegmentation, whereas a ratio less than or equal to 1 suggested undersegmentation; we report the percentage of cases in each category. The median PV/TV ratio provided an overall measure of segmentation tendency. Statistical significance of the deviation of PV/TV from unity was assessed using the one-sample Wilcoxon signed rank test.
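A minimal sketch of the PV/TV summary is shown below, using only the standard library. The function name `segmentation_tendency` and the volumes are hypothetical.

```python
from statistics import median


def segmentation_tendency(pred_volumes, true_volumes):
    """Summarize PV/TV ratios: counts above and at-or-below 1, plus the median."""
    ratios = [pv / tv for pv, tv in zip(pred_volumes, true_volumes)]
    n_over = sum(r > 1.0 for r in ratios)
    return {
        "n_over": n_over,                 # PV/TV > 1: potential oversegmentation
        "n_under": len(ratios) - n_over,  # PV/TV <= 1: potential undersegmentation
        "median_ratio": median(ratios),
    }


# Hypothetical predicted and true volumes (mL) for five cases.
pv = [90.0, 105.0, 97.0, 88.0, 101.0]
tv = [100.0, 100.0, 100.0, 100.0, 100.0]
result = segmentation_tendency(pv, tv)
print(result)
```

The significance test against unity is not reproduced here. One plausible way to run it, given that SciPy is among the packages listed, would be to pass the differences (ratio minus 1) to `scipy.stats.wilcoxon`; this is an assumption about the workflow rather than a description of the authors' code.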

We estimated voxelwise model uncertainty by computing the Shannon entropy of the softmax output at inference. Higher entropy values correspond to greater predictive uncertainty. Entropy maps were generated for qualitative assessment, and mean entropy values were calculated per case to explore the relationship between uncertainty and segmentation performance. Additionally, we computed mean entropy within two tumor subregions: the core, defined as the eroded interior of the tumor mask using a two-dimensional disc-shaped structuring element (radius = 10 pixels), and the boundary, defined as the outer rim of the tumor mask (original tumor mask minus the eroded core), capturing edge regions prone to segmentation uncertainty. These subregion-level metrics allowed us to assess spatial patterns of uncertainty.
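The per-voxel entropy calculation can be sketched as below. The logarithm base is not stated in the text, so the natural logarithm used here is an assumption; the function names are hypothetical, and the core and boundary erosion step is not reproduced.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy of one voxel's softmax probabilities (natural log assumed)."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(softmax_per_voxel):
    """Mean entropy over a collection of voxels (eg, a core or boundary region)."""
    entropies = [shannon_entropy(p) for p in softmax_per_voxel]
    return sum(entropies) / len(entropies)


confident = [0.99, 0.01]  # hypothetical voxel deep inside the tumor
uncertain = [0.5, 0.5]    # hypothetical voxel on the tumor boundary
h_conf = shannon_entropy(confident)
h_unc = shannon_entropy(uncertain)
region_mean = mean_entropy([confident, uncertain])
print(h_conf, h_unc, region_mean)
```

For a two-class output the entropy peaks at ln 2 when both classes are equally likely, which is why boundary voxels, where the model hesitates between tumor and background, tend to score higher than core voxels.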

We used Python (version 3.10.3; https://www.python.org/downloads/release/python-3103/) with the nnU-Net (version 1.7.1; https://github.com/MIC-DKFZ/nnUNet), PyTorch (version 1.10; https://pytorch.org/get-started/previous-versions/), SimpleITK (version 2.4.0; https://simpleitk.readthedocs.io/en/v2.4.0/gettingStarted.html), and SciPy (version 1.10.1; https://docs.scipy.org/doc/scipy-1.10.1/) packages for preprocessing, model training, and statistical analyses. All reported P values were two-sided, and a P value less than .05 was considered statistically significant. The 95% CIs were computed using standard normal approximation methods.
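One common reading of a normal approximation CI for a mean DSC is the mean plus or minus 1.96 standard errors. The sketch below assumes that reading; the exact procedure used in the study is not specified, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def normal_approx_ci(values, z=1.96):
    """Mean with a normal-approximation CI: mean +/- z * SD / sqrt(n)."""
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))  # sample SD (n - 1)
    return m, m - half_width, m + half_width


# Hypothetical per-case DSC values.
dsc_scores = [0.80, 0.85, 0.90, 0.95]
m, lo, hi = normal_approx_ci(dsc_scores)
print(round(m, 3), round(lo, 3), round(hi, 3))
```

With skewed or bounded metrics such as the DSC, this interval is only approximate for small samples.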

Results

Dataset Characteristics

The participant characteristics for the CT datasets (DCT-Train and DCT-Val) and the MRI datasets (DMRI-Train and DMRI-Val) are summarized in Table 1. For the CT datasets, the 104 participants (40 female, 64 male) had a mean age of 28.2 months ± 30.5 (range, 2.0–170.0 months). PRETEXT grouping was as follows: two of 104 (1.9%) in group I, 30 of 104 (28.8%) in group II, 45 of 104 (43.3%) in group III, and 27 of 104 (26.0%) in group IV. For the MRI datasets, the 123 participants (35 female, 87 male, one missing) had a mean age of 31.5 months ± 38.4 (range, 0.0–189.0 months). PRETEXT distribution in the MRI cohort, with data missing for one participant, was group I, five of 122 (4.1%); group II, 36 of 122 (29.5%); group III, 64 of 122 (52.5%); and group IV, 17 of 122 (13.9%).

Table 1:

Summary of Participant Characteristics for the CT and MRI Datasets

| Characteristic | CT: Overall (n = 104) | CT: Training Set (n = 56) | CT: Validation Set (n = 48) | MRI: Overall (n = 123) | MRI: Training Set (n = 50) | MRI: Validation Set (n = 73) |
|---|---|---|---|---|---|---|
| Age (mo)* | 28.2 ± 30.5 | 32.1 ± 35.4 | 23.7 ± 23.1 | 31.5 ± 38.4 | 31.5 ± 41.4 | 31.6 ± 36.5 |
| Sex | | | | | | |
|  Female* | 40 (38.5) | 23 (41.1) | 17 (35.4) | 35 (28.7) | 16 (32.0) | 19 (26.4) |
|  Male | 64 (61.5) | 33 (58.9) | 31 (64.6) | 87 (71.3) | 34 (68.0) | 53 (73.6) |
| PRETEXT grouping | | | | | | |
|  Group I* | 2 (1.9) | 1 (1.8) | 1 (2.1) | 5 (4.1) | 3 (6.0) | 2 (2.8) |
|  Group II | 30 (28.8) | 11 (19.6) | 19 (39.6) | 36 (29.5) | 13 (26.0) | 23 (31.9) |
|  Group III | 45 (43.3) | 29 (51.8) | 16 (33.3) | 64 (52.5) | 26 (52.0) | 38 (52.8) |
|  Group IV | 27 (26.0) | 15 (26.8) | 12 (25.0) | 17 (13.9) | 8 (16.0) | 9 (12.5) |

Note.—Data are presented as means ± SDs or numbers with percentages in parentheses. PRETEXT = pretreatment extent of disease.

* Data are missing from one MRI case.

Examples of the DL-based automated segmentations (MPCT, MACT, and MPMRI) and manual segmentations (R1, R2, R3) are shown in Figure 3. This figure highlights cases where MPCT achieves the highest, median, and lowest DSC scores compared with Rc. Interrater DSC agreement among reference standard (ie, annotators [R1, R2, R3]), consensus segmentation (ie, Rc), and DL-based models (ie, MPCT and MACT) is presented in Table 2. The corresponding results for MRI segmentations (MPMRI) are shown in Table 3.

Figure 3:

Representative examples of hepatoblastoma segmentation results with the best, median, and worst Dice similarity coefficient scores for (A) CT and (B) MRI scans.

Example images of hepatoblastoma segmentation for the cases with the best, median, and worst DSC score on (A) CT and (B) MRI scans.

Table 2:

Interrater Agreement between Annotators’ Manual Assessments and Deep Learning Model Segmentations on CT Images

| | Rc (Consensus) | R1 (Expert) | R2 (Expert) | R3 (Novice) | MPCT | MACT |
|---|---|---|---|---|---|---|
| Rc (consensus) | 1 | 0.94 (0.93, 0.96) | 0.95 (0.94, 0.96) | 0.94 (0.92, 0.95) | 0.86 (0.80, 0.91) | 0.55 (0.46, 0.65) |
| R1 (expert) | 0.94 (0.93, 0.96) | 1 | 0.89 (0.87, 0.91) | 0.88 (0.85, 0.90) | 0.83 (0.78, 0.89) | 0.55 (0.46, 0.64) |
| R2 (expert) | 0.95 (0.94, 0.96) | 0.89 (0.87, 0.91) | 1 | 0.88 (0.86, 0.90) | 0.85 (0.79, 0.90) | 0.55 (0.46, 0.64) |
| R3 (novice) | 0.94 (0.92, 0.95) | 0.88 (0.85, 0.90) | 0.88 (0.86, 0.90) | 1 | 0.84 (0.79, 0.89) | 0.54 (0.45, 0.65) |
| MPCT | 0.86 (0.80, 0.91) | 0.83 (0.78, 0.89) | 0.85 (0.79, 0.90) | 0.84 (0.79, 0.89) | 1 | 0.58 (0.48, 0.67) |
| MACT | 0.55 (0.46, 0.65) | 0.55 (0.46, 0.64) | 0.55 (0.46, 0.64) | 0.54 (0.45, 0.65) | 0.58 (0.48, 0.67) | 1 |

Note.—Data are presented as Dice similarity coefficients with 95% CIs in parentheses. MPCT and MPMRI are the models trained on pediatric contrast-enhanced CT and T2-weighted MRI scans, respectively; MACT is a publicly available liver tumor segmentation model trained on adult CT data.

Table 3:

Interrater Agreement between Annotators’ Manual Assessments and Deep Learning Model Segmentations on MRI Scans

| | Rc (Consensus) | R1 (Expert) | R2 (Expert) | R3 (Novice) | MPMRI |
|---|---|---|---|---|---|
| Rc (consensus) | 1 | 0.95 (0.94, 0.96) | 0.95 (0.94, 0.96) | 0.94 (0.92, 0.95) | 0.86 (0.80, 0.91) |
| R1 (expert) | 0.95 (0.94, 0.96) | 1 | 0.90 (0.88, 0.91) | 0.88 (0.86, 0.90) | 0.83 (0.78, 0.87) |
| R2 (expert) | 0.95 (0.94, 0.96) | 0.90 (0.88, 0.91) | 1 | 0.88 (0.86, 0.90) | 0.85 (0.79, 0.90) |
| R3 (novice) | 0.94 (0.92, 0.95) | 0.88 (0.86, 0.90) | 0.88 (0.86, 0.90) | 1 | 0.84 (0.79, 0.89) |
| MPMRI | 0.86 (0.80, 0.91) | 0.83 (0.78, 0.87) | 0.85 (0.79, 0.90) | 0.84 (0.79, 0.89) | 1 |

Note.—Data are presented as Dice similarity coefficients with 95% CIs in parentheses. MPMRI is a deep learning model trained on pediatric T2-weighted MRI scans.

CT Model Performance and VPE Analysis

The agreement between MPCT and Rc was good, with a DSC of 0.86 (95% CI: 0.80, 0.91). Similarly, MPCT exhibited good agreement with individual annotators, achieving a DSC of 0.83 (95% CI: 0.78, 0.89) with R1, 0.85 (95% CI: 0.79, 0.90) with R2, and 0.84 (95% CI: 0.79, 0.89) with R3. In comparison to MACT, MPCT consistently achieved higher DSCs across all annotator comparisons. Specifically, the DSC for agreement between MPCT and Rc, 0.86 (95% CI: 0.80, 0.91), was significantly (P < .0001) greater than that for agreement between MACT and Rc (0.55; 95% CI: 0.46, 0.65). Consistent differences were observed in the DSC agreement with R1 (MPCT: 0.83 vs MACT: 0.55; P < .001), R2 (MPCT: 0.85 vs MACT: 0.55; P < .001), and R3 (MPCT: 0.84 vs MACT: 0.54; P < .001). Additionally, the DSC for agreement between MACT and MPCT was 0.58 (95% CI: 0.48, 0.67), indicating poor agreement.

The VPE analysis presented in Table 4 reveals that the expert annotators (R1 and R2) generally achieved higher accuracy than the novice annotator (R3) and the DL models (MACT and MPCT) across all predefined error tolerances. Specifically, R1 excelled at very low error tolerance threshold (ie, very precise segmentation), achieving segmentation within 1% error for seven of 48 (14.58%) cases and within 5% error for 33 of 48 (68.75%) cases. R2 and R3 demonstrated better performance at high error tolerance threshold (15%), with R2 achieving segmentation within 15% error for 44 of 48 (91.67%) cases and R3 for 43 of 48 (89.58%) cases. Among the DL models, MPCT matched or outperformed the novice annotator (R3) in precise segmentation, achieving segmentation within 1% error for five of 48 (10.42%) cases and within 5% error for 28 of 48 (58.33%) cases, compared with R3’s 26 of 48 (54.17%). MPCT demonstrated a reasonable performance, particularly at moderate (e < 10%: 36 of 48 [75%]) and high (e < 15%: 40 of 48 [83.33%]) tolerance threshold, but MACT underperformed across all error tolerance thresholds.

Table 4:

Volume Percentage Error Analysis of Liver Tumor Segmentation Models Trained on CT and MRI Scans

| Modality | Error Tolerance | R1 (Expert) | R2 (Expert) | R3 (Novice) | MACT | MPCT or MPMRI |
|---|---|---|---|---|---|---|
| CT (n = 48) | Very low (e < 1%) | 7 (14.58) | 5 (10.42) | 5 (10.42) | 0 (0.00) | 5 (10.42) |
| | Low (e < 5%) | 33 (68.75) | 33 (68.75) | 26 (54.17) | 0 (0.00) | 28 (58.33) |
| | Moderate (e < 10%) | 40 (83.33) | 38 (79.17) | 40 (83.33) | 1 (2.08) | 36 (75.00) |
| | High (e < 15%) | 42 (87.50) | 44 (91.67) | 43 (89.58) | 4 (8.33) | 40 (83.33) |
| MRI (n = 73) | Very low (e < 1%) | 15 (20.55) | 8 (10.96) | 7 (9.59) | NA | 6 (8.22) |
| | Low (e < 5%) | 50 (68.49) | 52 (71.23) | 43 (58.90) | NA | 28 (38.36) |
| | Moderate (e < 10%) | 63 (86.30) | 62 (84.93) | 58 (79.45) | NA | 45 (61.64) |
| | High (e < 15%) | 68 (93.15) | 67 (91.78) | 64 (87.67) | NA | 53 (72.60) |

Note.—Data are numbers with percentages in parentheses. Error tolerance levels and corresponding accuracy are shown for expert annotators (R1, R2), a novice annotator (R3), and deep learning models (MACT and MPCT or MPMRI) compared with consensus segmentation. Accuracy is defined as the percentage of cases where segmentation errors are below specified error tolerance levels (e < 1%, e < 5%, e < 10%, e < 15%). MPCT and MPMRI are models trained on pediatric contrast-enhanced CT and T2-weighted MRI scans, respectively; MACT is a publicly available liver tumor segmentation model trained on adult CT data. NA = not applicable.

MRI Model Performance and VPE Analysis

DSCs for MRI segmentations are presented in Table 3. MPMRI demonstrated good agreement with different levels of expertise and their consensus. The DSC for agreement between MPMRI and Rc was 0.86 (95% CI: 0.80, 0.91). In comparisons of individual annotators, the DSCs were as follows: 0.83 (95% CI: 0.78, 0.87) for R1, 0.85 (95% CI: 0.79, 0.90) for R2, and 0.84 (95% CI: 0.79, 0.89) for R3.

The VPE analysis (Table 4) demonstrated that the expert annotators (R1 and R2) consistently achieved superior accuracy across all predefined error tolerance thresholds when compared with the novice annotator (R3) and the DL model (MPMRI). Specifically, R1 segmented within 1% error for 15 of 73 (20.55%) cases and within 5% error for 50 of 73 (68.49%) cases. R2 segmented within 1% error for eight of 73 (10.96%) cases and within 5% error for 52 of 73 (71.23%) cases. Both experts maintained high accuracy at higher error margins, with R1 segmenting within 15% error for 68 of 73 (93.15%) cases and R2 for 67 of 73 (91.78%) cases. In comparison, the novice annotator (R3) had slightly lower accuracy, segmenting within 1% error for seven of 73 (9.59%) cases and within 5% error for 43 of 73 (58.90%) cases. The DL model (MPMRI) performed close to R3 at the 1% threshold, segmenting within 1% error for six of 73 (8.22%) cases, but segmented within 5% error for only 28 of 73 (38.36%) cases, a notable drop in accuracy compared with the experts. At the moderate (10%) and high (15%) tolerance thresholds, MPMRI showed moderate performance, with 45 of 73 (61.64%) and 53 of 73 (72.60%) cases falling within these errors, respectively; its accuracy remained lower than that of both expert annotators and the novice annotator (R3) at every threshold.

Under- and Oversegmentation Tendencies

Trends toward under- and oversegmentation relative to Rc are summarized in Table 5. In the CT analysis, annotators showed a mix of under- and oversegmentation: R1 leaned toward undersegmentation, with 31 of 48 (64.6%) cases having a PV/TV less than or equal to 1, whereas R2 and R3 exhibited slight oversegmentation, with 32 of 48 (66.7%) and 26 of 48 (54.2%) cases having a PV/TV greater than 1; the median PV/TV was 1.02 and 1.01, respectively. By contrast, the DL models, particularly MACT, demonstrated a marked tendency toward severe undersegmentation: only one of 48 (2.1%) cases had a PV/TV greater than 1, with a median PV/TV of 0.56; this shift toward undersegmentation was statistically significant (P < .05). The undersegmentation tendency persisted for MPCT, though less pronounced (median PV/TV = 0.97). For MRI, R1 and R3 leaned toward oversegmentation—38 of 73 (52.1%) cases and 46 of 73 (63.0%) cases with a PV/TV greater than 1—whereas R2 tended toward undersegmentation—40 of 73 (54.8%) cases with a PV/TV less than or equal to 1. The DL model MPMRI showed a modest undersegmentation tendency, with 44 of 73 (60.3%) cases having a PV/TV less than or equal to 1 and a median PV/TV of 0.98.

Table 5:

Analysis of Under- and Oversegmentation Tendency

| Modality | Metric | Expert 1 | Expert 2 | Expert 3 | MACT | MPCT or MPMRI |
|---|---|---|---|---|---|---|
| CT (n = 48) | PV/TV > 1 | 17 (35.42) | 32 (66.67) | 26 (54.17) | 1 (2.08) | 15 (31.25) |
| | PV/TV ≤ 1 | 31 (64.58) | 16 (33.33) | 22 (45.83) | 47 (97.92) | 33 (68.75) |
| | Median PV/TV | 0.99 | 1.02 | 1.01 | 0.56 | 0.97 |
| | P value | .1066 | .2684 | .947 | 4.97 × 10⁻¹⁴ | .0021 |
| | Tendency | Undersegmentation | Oversegmentation | Oversegmentation | Undersegmentation | Undersegmentation |
| MRI (n = 73) | PV/TV > 1 | 38 (52.05) | 33 (45.21) | 46 (63.01) | | 29 (39.73) |
| | PV/TV ≤ 1 | 35 (47.95) | 40 (54.79) | 27 (36.99) | | 44 (60.27) |
| | Median PV/TV | 1 | 0.99 | 1.02 | | 0.98 |
| | P value | .33 | .8452 | .1312 | | .0526 |
| | Tendency | Oversegmentation | Undersegmentation | Oversegmentation | | Undersegmentation |

Note.—Data are numbers with percentages in parentheses for PV/TV > 1 and PV/TV ≤ 1. The statistical significance of differences between PV and TV was assessed using the one-sample WSRT. MPCT and MPMRI are models trained on pediatric contrast-enhanced CT and T2-weighted MRI scans, respectively; MACT is a publicly available liver tumor segmentation model trained on adult CT data. PV = predicted volume, TV = true volume, WSRT = Wilcoxon signed rank test.
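The quantities in Table 5 can be sketched as follows. This is a simplified illustration, not the study's analysis code. It assumes the tendency label follows the majority of cases, and it replaces the exact test with a normal approximation of the Wilcoxon signed rank test applied to the paired differences PV − TV, without tie or continuity correction. All function names and volumes are hypothetical.

```python
import math
from statistics import median


def segmentation_tendency(predicted, reference):
    """Summarize PV/TV ratios: counts above and at-or-below 1, median, and a label."""
    ratios = [p / t for p, t in zip(predicted, reference)]
    over = sum(1 for r in ratios if r > 1)
    under = len(ratios) - over
    # Assumption: the tendency label follows the majority of cases.
    label = "Oversegmentation" if over > under else "Undersegmentation"
    return {"over": over, "under": under, "median": median(ratios), "tendency": label}


def wilcoxon_signed_rank(differences):
    """Two-sided Wilcoxon signed rank test via the normal approximation.

    Zero differences are dropped and tied absolute values share their average
    rank. No tie or continuity correction is applied, so the P value is
    approximate, especially for small samples.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")
    ordered = sorted(nonzero, key=abs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return w_plus, p_value


# Hypothetical volumes in mL (illustrative only, not study data).
true_volumes = [100, 200, 150, 300, 250, 120, 180, 90, 400, 220]
pred_volumes = [95, 188, 151, 270, 240, 110, 171, 92, 380, 207]

summary = segmentation_tendency(pred_volumes, true_volumes)
w_plus, p_value = wilcoxon_signed_rank([p - t for p, t in zip(pred_volumes, true_volumes)])
print(summary["tendency"], summary["over"], summary["under"])
print(f"W+ = {w_plus}, approximate P = {p_value:.4f}")
```

For real analyses, an established implementation such as `scipy.stats.wilcoxon` is preferable, since it handles exact P values for small samples.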

Entropy-based Uncertainty Analysis

Entropy analysis revealed that uncertainty was primarily concentrated along tumor boundaries (Fig S1). Quantitatively, mean entropy in the boundary region was higher than in the core across both imaging modalities (Fig S2). For CT, the median of the mean entropy across test cases was 0.05 in the core (IQR, 0.02–0.08) and 0.27 in the boundary region (IQR, 0.23–0.30). For MRI, the corresponding values were 0.01 for the core (IQR, 0.002–0.039) and 0.13 for the boundary (IQR, 0.09–0.16).
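The core-versus-boundary comparison can be illustrated with a small sketch. It assumes the uncertainty measure is the binary Shannon entropy (in bits) of the predicted foreground probability, and it defines the boundary as foreground pixels with at least one background or out-of-image 4-neighbor. Both choices, the function names, and the toy probability map are assumptions for illustration; the study's exact region definitions may differ.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a foreground probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def core_and_boundary_entropy(prob_map, threshold=0.5):
    """Mean entropy over core and boundary pixels of the thresholded 2D mask."""
    rows, cols = len(prob_map), len(prob_map[0])
    mask = [[p >= threshold for p in row] for row in prob_map]

    def is_boundary(r, c):
        # A foreground pixel is on the boundary if any 4-neighbor is
        # background or lies outside the image.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                return True
        return False

    core, boundary = [], []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                target = boundary if is_boundary(r, c) else core
                target.append(binary_entropy(prob_map[r][c]))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(core), mean(boundary)


# Toy 5 x 5 probability map: confident center, less confident rim.
toy_map = [
    [0.05, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.70, 0.70, 0.70, 0.05],
    [0.05, 0.70, 0.99, 0.70, 0.05],
    [0.05, 0.70, 0.70, 0.70, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.05],
]
core_entropy, boundary_entropy = core_and_boundary_entropy(toy_map)
print(f"core: {core_entropy:.3f}, boundary: {boundary_entropy:.3f}")
```

In this toy example the single confident center pixel forms the core and the eight rim pixels form the boundary, so the boundary entropy is higher, mirroring the direction of the pattern reported above.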

Discussion

Accurate segmentation of pediatric liver tumors is a critical first step in many analyses, including those employing AI techniques for effective diagnosis, treatment planning, and monitoring for treatment response (10). However, most existing AI research (16,18) has focused predominantly on adult samples. Because of the distinct anatomic and pathologic characteristics of pediatric patients, adult-trained models have failed to generalize effectively to this population. In this work, we developed two segmentation models, MPCT and MPMRI, tailored for pediatric hepatoblastoma. MPCT was trained on contrast-enhanced CT images, and MPMRI was trained on T2-weighted fat-suppressed MRI scans, both sourced from the Children’s Oncology Group AHEP0731 trial. We evaluated these models in terms of their ability to provide efficient and reliable segmentation solutions for pediatric hepatoblastomas. Furthermore, we conducted a VPE analysis to assess the precision of tumor segmentation for both DL models and manual annotators relative to the consensus of the three annotators. This analysis provided an important measure of clinical reliability of segmentation across varying levels of error tolerance.

The MPCT and MPMRI segmentation models performed well compared with human annotators. Both models achieved high DSCs, ranging from 0.83 to 0.86, indicating good agreement between the predicted segmentations and the human annotator segmentations and their consensus segmentation. The MPCT model also outperformed the adult liver tumor segmentation model (MACT) across all annotator comparisons, underscoring its superior consistency and reliability for pediatric cases. To our knowledge, this work is the first to demonstrate that liver tumor segmentation models trained on adult data do not generalize well to the pediatric population. These results are consistent with other work (19,21,32,33) showing that models trained on adult data do not perform at the same level when a similar task is performed using pediatric data. Specific to liver tumors, the reasons for this performance gap likely include the substantial differences in patient size, liver size, the average size of the liver tumor compared with the liver parenchyma, and the tumor imaging characteristics, including how hepatoblastoma tends to have a heterogeneous appearance with variable internal composition that can include calcification, hemorrhage, or small cystic spaces (17,34). These findings, along with evidence from other studies, emphasize the need for developing models specifically tailored to pediatric patients to achieve higher accuracy and better performance.
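For readers less familiar with the metric, the DSC between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened masks. The function name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something taken from the study.

```python
def dice_similarity(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy flattened masks (illustrative only): the model misses one consensus voxel.
model_mask = [0, 1, 1, 1, 0, 0]
consensus_mask = [0, 1, 1, 1, 1, 0]
print(f"DSC = {dice_similarity(model_mask, consensus_mask):.2f}")
```

Because the DSC rewards overlap relative to the combined size of both masks, a model that undersegments a large tumor can still score well, which is one reason the volume-based analyses complement it.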

Our pediatric segmentation models also compare favorably with existing liver tumor segmentation models developed for adult patients. In earlier work, CC-DenseUNet (35) was proposed for CT-based liver tumor segmentation, achieving a DSC of 0.74. Another work using an attention-based model (36) for multiphase CT images reported a DSC of approximately 0.78. Arulappan et al (37) developed an asymmetric dilated convolutional encoder-decoder network for tumor segmentation in adult CT images, reaching an average DSC of 0.76. More recently, UNet++ was applied for liver tumor segmentation at MRI, yielding a lower DSC of 0.61 (38). Wesdorp et al (39) achieved a DSC of 0.86 for colorectal liver metastasis segmentation using an externally validated CT dataset. MULLET, a transformer-based model designed for multiphase contrast-enhanced CT, demonstrated comparable segmentation performance (DSC ≈ 0.78) and has been successfully deployed in clinical settings. Most recently, an AI-assisted platform (40) for hepatocellular carcinoma detection in CT achieved a DSC of 0.88 in an external clinical validation study. Although adult-trained models span a variety of liver tumor types and imaging modalities, our pediatric-specific models for hepatoblastoma achieved DSCs of 0.86 with both CT and MRI. This performance is comparable to, and in some cases exceeds, adult-focused approaches, despite the added challenge of working with a more heterogeneous and underrepresented pediatric tumor type. These results underscore our models’ ability to handle the unique complexities of pediatric liver tumors more effectively.

Our study also sheds light on the comparisons between novice and expert human annotators generating pediatric liver tumor segmentations. The results indicate that novice annotator (R3) segmentations had good agreement with those of expert annotators (R1 and R2), with DSCs of 0.88 for both CT and MRI segmentation tasks. However, the novice annotator showed lower performance than expert annotators when evaluated at very low (e < 1%) and low (e < 5%) error tolerance thresholds but had similar performance to expert annotators at moderate (e < 10%) and high (e < 15%) error tolerance thresholds. To our knowledge, this study provides one of the first comparative analyses of novice versus expert performance in hepatoblastoma segmentation. The data suggest that novice annotators do not perform at the same level as experts, especially in high-precision tasks (ie, low error tolerance), and would likely benefit from oversight by those with subject matter expertise.

The VPE analyses highlight the inherent variability between the way human annotators and AI models define tumor boundaries. The expert annotators had slightly higher overall accuracy than the novice annotator and the MPCT and MPMRI models. This finding likely reflects the extensive experience of expert annotators and the refined techniques necessary to determine the most likely tumor margin, a task that is often complex. Not surprisingly, at higher error tolerance thresholds, the distinctions in segmentation precision became less noticeable. This is because a larger error tolerance threshold (eg, 10%) treats both small (eg, e = 1%) and large (eg, e = 9%) errors as acceptable, making differences in precision harder to see.

The annotators in our study showed tendencies toward both oversegmentation and undersegmentation, with inconsistencies observed among annotators and for the same annotator across different modalities (Table 5). Specifically, R1 and R2 exhibited a mix of undersegmentation and oversegmentation across modalities, whereas R3 consistently leaned toward oversegmentation; none of the annotators showed a significant bias (P > .05). This result is not surprising and highlights the inherent variability and complexity in defining tumor boundaries. In contrast, the DL segmentation models (MPCT and MPMRI) demonstrated a tendency toward undersegmentation in our tests, with the difference being significant for MPCT. Although there are general benefits and drawbacks to both over- and undersegmentation depending on the specific task, models that undersegment may be more likely to include only true tumor within the defined region of interest, which may benefit AI analyses (eg, hepatoblastoma subtype identification). However, undersegmentation may come at the cost of excluding tumor regions, some of which may represent the most active or aggressive portions of the tumor (41). This exclusion is particularly concerning when considering growth at the tumor periphery, which may impact subsequent radiomics-based analysis. Undersegmentation also results in an underestimation of the true tumor volume, which may be of consequence if such a metric is used to longitudinally track tumors and monitor response to therapy.

The spatial distribution of uncertainty observed in entropy analysis suggests that both DL models (MPCT and MPMRI) exhibited higher confidence in the tumor cores and relatively higher uncertainty in boundary regions, where segmentation ambiguity is expected. These boundary areas often align with regions of known interobserver variability, reinforcing the interpretability and reliability of the models’ uncertainty output in clinical settings.

The ability of our pediatric-specific DL models to accurately segment hepatoblastoma carries meaningful clinical value. Accurate tumor delineation is essential for estimating tumor volume, which informs surgical planning, and enables precise tracking of tumor response during and after treatment. Reliable segmentation also facilitates radiomics analyses by providing consistent regions of interest for quantitative feature extraction, potentially aiding in subtype classification and prognostication (7,8). By reducing manual segmentation burden and interobserver variability, these models can streamline workflow in clinical and research settings, enhance reproducibility, and support more standardized care across institutions.

For real-world use, the proposed pediatric-specific segmentation models can be integrated into clinical workflows through picture archiving and communication systems or as part of dedicated AI workstations. These models can automatically generate tumor segmentations shortly after image acquisition, allowing radiologists to review and adjust results, thus reducing manual segmentation workload and saving time. Additionally, the outputs can serve as input for radiomics pipelines or tumor response monitoring tools, enabling early and objective decision-making. Successful clinical integration will require external validation across institutions, additional trial datasets, and deployment within secure, regulatory-compliant environments that support seamless interaction with existing imaging infrastructure.

Despite the promising results, our study had limitations. First, hepatoblastoma is a rare tumor. Our training and validation datasets are drawn from the Children’s Oncology Group AHEP0731 trial (NCT00980460), the largest and most diverse hepatoblastoma imaging dataset currently available, providing the best possible data foundation for constructing and validating DL models. As such, we derived data from one of the largest prospective studies ever performed, but our study includes only 104 CT and 123 MRI scans, a relatively small dataset for AI model construction. Second, the data are heterogeneous. The imaging studies included in the trial were performed at 105 unique imaging centers, each using their own institutional imaging protocols. Although this heterogeneity may impact the performance of the model, we believe that it also enhances the generalizability of our results. Third, the data used to train the models are now 6–16 years old, with the first scans in our dataset acquired in 2008 and the most recent acquired in 2018. It is not clear how advances in scanner hardware, acquisition techniques, and image reconstruction models could impact model performance. Thus, our AI segmentation models should be validated using more recently acquired scans. Fourth, the focus of our study on hepatoblastoma may limit model generalizability to other pediatric or adult liver tumors. Although the segmentation results are promising, their impact on subsequent radiomics analysis and the ability to predict disease outcomes remains unclear.

Future work should focus on external validation of these models using more recent clinical trial imaging data to assess performance on modern scanners and protocols. Broadening the scope to include segmentation of other pediatric liver tumors, such as hepatocellular carcinoma, fibrolamellar carcinoma, or metastatic lesions, could increase clinical utility. Additionally, it would be informative to explore how DL models, through guided learning with real-time feedback, can help novice annotators gain experience. In parallel, it is important to investigate how segmentation accuracy impacts radiomic feature extraction and predictive modeling.

In conclusion, we demonstrated that pediatric-derived AI models can segment hepatoblastoma using contrast-enhanced CT and T2-weighted fat-suppressed MRI with near-human accuracy. The MPCT model outperformed the adult liver tumor segmentation model MACT, underscoring its superior consistency and reliability for pediatric cases. This study enables the integration of these segmentation models into clinical workflows, increasing the accuracy and efficiency of hepatoblastoma tumor delineation. By providing reliable segmentation, these models may support more precise treatment planning and monitoring and introduce opportunities for further assessment of more recent liver tumor trial data. Additionally, the reduction in manual segmentation efforts and interannotator variability facilitates streamlined radiologic assessments, empowering clinicians to make faster, data-driven decisions. Our work should serve as a foundation for the application of the current CT and MRI segmentation models to datasets containing patients with newer imaging studies and pediatric patients with other types of liver tumors. Our findings can aid in the development of new workflows and tools to quantitatively assess hepatoblastoma. Finally, our study findings emphasize the importance of developing and validating DL models specifically tailored for pediatric populations.

Supplemental Files

Table S1, Figures S1-S2
rycan250041suppa1.pdf (434.6KB, pdf)
Conflicts of Interest
rycan250041coi.zip (598.1KB, zip)

A.M. and G.R.S. are co–senior authors.

Funding: Supported by funds from the St Baldrick’s Foundation, and the National Cancer Institute (R01CA268287A1, U01CA269181, U10CA180886, U10CA180899, R01CA26820701A1, R01CA249992-01A1, R01CA202752-01A1, R01CA208236-01A1, R01CA216579-01A1, R01CA220581-01A1, R01CA257612-01A1, 1U01CA239055-01, 1U01CA248226-01, 1U54CA254566-01), National Heart, Lung, and Blood Institute (1R01HL15127701A1, R01HL15807101A1), National Institute of Biomedical Imaging and Bioengineering (1R43EB028736-01), VA Merit Review Award (IBX004121A) from the United States Department of Veterans Affairs Biomedical Laboratory Research and Development Service, the Office of the Assistant Secretary of Defense for Health Affairs, through the Breast Cancer Research Program (W81XWH-19-1-0668), the Prostate Cancer Research Program (W81XWH-20-1-0851), the Lung Cancer Research Program (W81XWH-18-1-0440, W81XWH-20-1-0595), the Peer Reviewed Cancer Research Program (W81XWH-18-1-0404, W81XWH-21-1-0345, W81XWH-21-1-0160), the Kidney Precision Medicine Project (KPMP) Glue Grant, and sponsored research agreements from Bristol Myers-Squibb and AstraZeneca. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, the U.S. Department of Veterans Affairs, the Department of Defense, or the United States Government.

Data sharing: Data generated or analyzed during the study are available from the corresponding author by request.

Abbreviations:

AI = artificial intelligence
DL = deep learning
DSC = Dice similarity coefficient
PRETEXT = pretreatment extent of disease
PV = predicted volume
TV = true volume
VPE = volume percentage error

Acknowledgments

We sincerely acknowledge Rohan N. Dhamdhere for his valuable contributions to this study.

Disclosures of conflicts of interest

Please see ICMJE form(s) for author conflicts of interest. These have been provided as supplemental materials.

References

  • 1. PDQ Pediatric Treatment Editorial Board. Childhood Liver Cancer Treatment (PDQ®): Health Professional Version. In: PDQ Cancer Information Summaries. Bethesda (MD): National Cancer Institute (US); 2002. http://www.ncbi.nlm.nih.gov/books/NBK65790/. Accessed July 19, 2024.
  • 2. Jeong SU, Kang HJ. Recent updates on the classification of hepatoblastoma according to the International Pediatric Liver Tumors Consensus. J Liver Cancer 2022;22(1):23–29.
  • 3. López-Terrada D, Alaggio R, de Dávila MT, et al; Children’s Oncology Group Liver Tumor Committee. Towards an international pediatric liver tumor consensus classification: proceedings of the Los Angeles COG liver tumors symposium. Mod Pathol 2014;27(3):472–491.
  • 4. Vasudevan SA, Meyers RL, Finegold MJ, et al. Outcomes of children with well-differentiated fetal hepatoblastoma treated with surgery only: Report from Children’s Oncology Group Trial, AHEP0731. J Pediatr Surg 2022;57(10):251–256.
  • 5. Trobaugh-Lotrario AD, Maibach R, Aronson DC, et al. Outcomes of Patients Treated for Hepatoblastoma with Low Alpha-Fetoprotein and/or Small Cell Undifferentiated Histology: A Report from the Children’s Hepatic Tumors International Collaboration (CHIC). Cancers (Basel) 2023;15(2):467.
  • 6. Weldon CB, Madenci AL, Tiao GM, et al. Evaluation of the Diagnostic Biopsy Approach for Children with Hepatoblastoma: A Report from the Children’s Oncology Group AHEP 0731 Liver Tumor Committee. J Pediatr Surg 2020;55(4):655–659.
  • 7. Zhao L, Shi L, Huang SG, et al. Identification and validation of radiomic features from computed tomography for preoperative classification of neuroblastic tumors in children. BMC Pediatr 2023;23(1):262.
  • 8. Nie P, Wang N, Pang J, et al. CT-Based Radiomics Nomogram: A Potential Tool for Differentiating Hepatocellular Adenoma From Hepatocellular Carcinoma in the Noncirrhotic Liver. Acad Radiol 2021;28(6):799–807.
  • 9. Madhogarhia R, Haldar D, Bagheri S, et al. Radiomics and radiogenomics in pediatric neuro-oncology: A review. Neurooncol Adv 2022;4(1):vdac083.
  • 10. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—"how-to" guide and critical reflection. Insights Imaging 2020;11(1):91.
  • 11. Modanwal G, Dhamdhere R, Khera A, et al. QuLF-CT: A Radiomics-Based Tool for Quantification of Liver Fat Fraction on Cardiac CT. JACC Adv 2024;3(9):101175.
  • 12. Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in medical images. Nat Commun 2024;15(1):654.
  • 13. Yang F, Zamzmi G, Angara S, et al. Assessing Inter-Annotator Agreement for Medical Image Segmentation. IEEE Access 2023;11:21300–21312.
  • 14. Haarburger C, Müller-Franzes G, Weninger L, Kuhl C, Truhn D, Merhof D. Radiomics feature reproducibility under inter-rater variability in segmentations of CT images. Sci Rep 2020;10(1):12688. [Published correction appears in Sci Rep 2021;11(1):22670.]
  • 15. Bera K, Braman N, Gupta A, Velcheti V, Madabhushi A. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat Rev Clin Oncol 2022;19(2):132–146.
  • 16. Modanwal G, Al-Kindi S, Walker J, et al. Deep-learning-based hepatic fat assessment (DeHFt) on non-contrast chest CT and its association with disease severity in COVID-19 infections: A multi-site retrospective study. EBioMedicine 2022;85:104315.
  • 17. Sammer MBK, Akbari YS, Barth RA, et al. Use of Artificial Intelligence in Radiology: Impact on Pediatric Patients, a White Paper From the ACR Pediatric AI Workgroup. J Am Coll Radiol 2023;20(8):730–737.
  • 18. Gul S, Khan MS, Bibi A, Khandakar A, Ayari MA, Chowdhury MEH. Deep learning techniques for liver and liver tumor segmentation: A review. Comput Biol Med 2022;147:105620.
  • 19. Hardie RC, Trout AT, Dillman JR, Narayanan BN, Tanimoto AA. Performance Analysis in Children of Traditional and Deep Learning CT Lung Nodule Computer-Aided Detection Systems Trained on Adults. AJR Am J Roentgenol 2024;222(2):e2330345.
  • 20. Rajaraman S, Yang F, Zamzmi G, Xue Z, Antani S. Can deep adult lung segmentation models generalize to the pediatric population? Expert Syst Appl 2023;229(Pt A):120531.
  • 21. Kumar K, Yeo AU, McIntosh L, Kron T, Wheeler G, Franich RD. Deep Learning Auto-Segmentation Network for Pediatric Computed Tomography Data Sets: Can We Extrapolate From Adults? Int J Radiat Oncol Biol Phys 2024;119(4):1297–1306.
  • 22. National Cancer Institute (NCI). Risk-Based Therapy in Treating Younger Patients With Newly Diagnosed Liver Cancer. 2009. https://clinicaltrials.gov/study/NCT00980460.
  • 23. Pomerantz SM, White CS, Krebs TL, et al. Liver and Bone Window Settings for Soft-Copy Interpretation of Chest and Abdominal CT. AJR Am J Roentgenol 2000;174(2):311–314.
  • 24. Tustison NJ, Avants BB, Cook PA, et al. N4ITK: Improved N3 Bias Correction. IEEE Trans Med Imaging 2010;29(6):1310–1320.
  • 25. Nyúl LG, Udupa JK. On standardizing the MR image intensity scale. Magn Reson Med 1999;42(6):1072–1081.
  • 26. Kavur AE, Gezer NS, Barış M, et al. CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation. Med Image Anal 2021;69:101950.
  • 27. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18(2):203–211.
  • 28. Sudre CH, Li W, Vercauteren T, Ourselin S, Cardoso MJ. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. arXiv 2017. Preprint posted online July 11, 2017; https://arxiv.org/abs/1707.03237.
  • 29. Bilic P, Christ P, Li HB, et al. The Liver Tumor Segmentation Benchmark (LiTS). Med Image Anal 2023;84:102680.
  • 30. Modanwal G, Sarawadekar K. A robust wrist point detection algorithm using geometric features. Pattern Recognit Lett 2018;110:72–78.
  • 31. Lyu J, Bartlett PF, Nasrallah FA, Tang X. Toward hippocampal volume measures on ultra-high field magnetic resonance imaging: a comprehensive comparison study between deep learning and conventional approaches. Front Neurosci 2023;17:1238646.
  • 32. Liu CY, Valanarasu JMJ, Gonzalez C, Langlotz C, Ng A, Gatidis S. Unlocking Robust Segmentation Across All Age Groups via Continual Learning. arXiv 2024. Preprint posted online April 19, 2024; https://arxiv.org/abs/2404.13185.
  • 33. Salman R, Nguyen HN, Sher AC, Hallam K, Seghers VJ, Sammer MBK. Diagnostic performance of artificial intelligence for pediatric pulmonary nodule detection on chest computed tomography: comparison of simulated lower radiation doses. Eur J Pediatr 2023;182(11):5159–5165.
  • 34. Sammer MBK, Sher AC, Towbin AJ. Ensuring Adequate Development and Appropriate Use of Artificial Intelligence in Pediatric Medical Imaging. AJR Am J Roentgenol 2022;218(1):182–183.
  • 35. Li Q, Song H, Zhang W, et al. CC-DenseUNet: Densely Connected U-Net with Criss-Cross Attention for Liver and Tumor Segmentation in CT Volumes. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2021;966–971. https://ieeexplore.ieee.org/document/9669823. Accessed May 7, 2025.
  • 36. Xu Y, Cai M, Lin L, et al. PA-ResSeg: A phase attention residual network for liver tumor segmentation from multiphase CT images. Med Phys 2021;48(7):3752–3766.
  • 37. Arulappan A, Thankaraj ABR. Liver tumor segmentation using a new asymmetrical dilated convolutional semantic segmentation network in CT images. Int J Imaging Syst Tech 2022;32(3):815–830.
  • 38. Wang J, Peng Y, Jing S, Han L, Li T, Luo J. A deep-learning approach for segmentation of liver tumors in magnetic resonance imaging using UNet++. BMC Cancer 2023;23(1):1060.
  • 39. Wesdorp NJ, Zeeuw JM, Postma SCJ, et al. Deep learning models for automatic tumor segmentation and total tumor volume assessment in patients with colorectal liver metastases. Eur Radiol Exp 2023;7(1):75.
  • 40. Shan R, Pei C, Fan Q, et al. Artificial intelligence-assisted platform performs high detection ability of hepatocellular carcinoma in CT images: an external clinical validation study. BMC Cancer 2025;25(1):154.
  • 41. Qian H, Huang Y, Xu L, Fu H, Lu B. Role of peritumoral tissue analysis in predicting characteristics of hepatocellular carcinoma using ultrasound-based radiomics. Sci Rep 2024;14(1):11538.

Articles from Radiology: Imaging Cancer are provided here courtesy of Radiological Society of North America
