Front Oncol. 2021 Nov 18;11:772663. doi: 10.3389/fonc.2021.772663

Table 3.

Characteristics for machine-learning studies on autosegmentation.

Author, year of publication Study population HN subsite Imaging modality Textural and dosimetric parameters ROI(s) Tested ML algorithm(s) Statistical findings and model performance
Brunenberg et al., 2020 (68) 58 pts Mixed CT PGs, SMGs, thyroid, buccal mucosa, extended OC, pharynx constrictors, cricopharyngeal inlet, supraglottic area, MNDB, BS Commercially available DL model; external validation The best performance was reached for the MNDB (DSC 0.90; HD95 3.6 mm); agreement was moderate for the aerodigestive tract with the exception of the OC. The largest variations were in the cranial and/or caudal directions (binned measurements).
Ma et al., 2019 (69) 90 pts NPC CT and MR GTVs CNNs Both M-CNN and C-CNN performed better on MR than on CT. C-CNN outperformed M-CNN in both CT (higher mean Sn, DSC, and ASSD; comparable mean PPV) and MR applications (higher mean PPV, DSC, and ASSD; comparable mean Sn)
Vandewinckele et al., 2019 (58) 9 pts Mixed CT Cochlea, BS, upper esophagus, glottis area, MNDB, OC, PGs, inferior, medial and superior PCMs, SC, SMGs, supraglottic Lar CNN The longitudinal CNN improved segmentation results in terms of DSC compared with DIR for 6/13 considered OARs. The longitudinal approach outperformed the cross-sectional one in terms of both DSC and ASSD for 6 organs (BS, upper esophagus, OC, PGs, medial PCM, and SMGs)
Hänsch et al., 2018 (63) 254 pts, 254 R PGs, 253 L PGs Mixed CT Ipsi- and contralateral PGs DL U-net The 3 ANNs showed comparable performance for training and internal validation sets (DSC ≈0.83). The 2-D ensemble and 3-D U-net showed satisfactory performance when externally validated (AUC and DSC: 0.865 and 0.880, respectively; 2-D U-net omitted)
Mocnik et al., 2018 (62) 44 pts Not specified CT and MR PGs CNN The multimodal CNN (CT + MR) compared favorably with the single-modality CNN (CT only) in 80.6% of cases. Overall, DSC values were 78.8 and 76.5, respectively. Both multi- and single-modality CNNs showed satisfactory registration performance
Nikolov et al., 2018 (60) 486 pts, 838 CT scans for training, test and internal validation; 46 pts and 45 CT scans for external validation Mixed CT Brain, BS, L and R cochlea, L and R LG, L and R Lens, L and R Lung, MNDB, L and R ON, L and R Orbit, L and R PGs, SC, L and R SMG 3D U-Net The segmentation algorithm showed good generalizability across different datasets and has the potential to improve segmentation efficiency. For 19/21 ROIs, performance metrics (surface and volumetric DSC) were comparable with those of experienced radiographers; accuracy was lower for the brainstem and R lens
Ren et al., 2018 (70) 48 pts Not specified CT Chiasm, L and R ON 3D-CNNs The proposed segmentation method outperformed the one developed by the MICCAI 2015 challenge winner for all the considered ROIs (DSC chiasm: 0.58 ± 0.17 vs. 0.38; DSC ONs 0.71 ± 0.08 vs. 0.68)
Tong et al., 2018 (61) 32 pts Not specified CT L and R PGs, BS, Chiasm, L and R ONs, MNDB, L and R SMG FCNN with and without SRM Accuracy and robustness of the model improved for all considered ROIs when shape priors were incorporated through the SRM. Segmentation results were satisfactory, with DSC values ranging from 0.583 for the chiasm to 0.937 for the MNDB. Average time for segmenting the whole structure set was 9.5 s
Zhu et al., 2018 (59) 271 CT scans Not specified CT BS, Chiasm, MNDB, L and R ON, L and R PG, L and R SMG Implemented 3D U-Net (AnatomyNet) The AnatomyNet allowed for an average improvement in segmentation performance of 3.3% (DSC) as compared with previously published data of the MICCAI 2015 challenge. Segmentation time was 0.12 s for the whole structure set.
Doshi et al., 2017 (53) 10 pts/102 MR slices Mixed MR GTVs FCLSM PLCSF showed a good performance vs the consensus manual outline (DSC: 0.79, RAD: 39.5%, MHD: 2.15, PCC: 0.89, p < 0.05) and outperformed 2 Ncut and MS clustering algorithms (the former being less accurate for small lesions and for low-contrast regions and more computationally demanding, the latter leading to more frequent over-segmentation)
Ibragimov et al., 2017 (64) 50 pts Not specified CT SC, MNDB, PGs, SMGs, Lar, Phar, R and L EB, R and L ON, optic chiasm CNN-MRF Model performance was satisfactory for almost all considered OARs (DSC values as follows: SC 87 ± 3.2; MNDB 89.5 ± 3.6; PGs 77.3 ± 5.8; SMGs 71.4 ± 11.6; Lar 85.6 ± 4.2; Phar 69.3 ± 6.3; EBs 88.0 ± 3.2; ONs 62.2 ± 7.2; optic chiasm 37.4 ± 13.4)
Liang et al., 2017 (55) 185 pts NPC CT BS, R and L EB, R and L lens, Lar, R and L MNDB, OC, R and L MAS, SC, R and L PG, R and L T-M, R and L ON CNNs (ODS-net) ODS-net showed satisfactory Sn and Sp for most OARs (range: 0.997–1.000 and 0.983–0.999, respectively), with DSC >0.85 when compared with manually segmented contours. ODS-net outperformed a competing FCNN (p < 0.001 for all organs). Delineation was also faster with ODS-net than with the FCNN (average time: 30 vs. 52 s, respectively)
Men et al., 2017 (55) 230 pts NPC CT GTV-T, GTV-N, CTV DDNN DDNN generated accurate segmentations for GTV-T and CTV (ground truth: manual segmentation), with DSC of 0.809 and 0.826, respectively. Performance for GTV-N was less satisfactory (DSC: 0.623). DDNN outperformed a competing model (VGG-16) for all the analyzed segmentations
Stefano et al., 2017 (72) 4 phantom experiments + 18 pts/40 lesions Mixed PET GTVs RW Both the K-RW and the AW-RW compared favorably with previously developed methods in delineating complex-shaped lesions; accuracy in phantom studies was satisfactory
Wang et al., 2017 (56) 111 pts Mixed CT Cochlea, BS, upper esophagus, glottis area, MNDB, OC, PGs, inferior, medial and superior PCMs, SC, SMGs, supraglottic Lar 3D U-Net The model showed satisfactory performance for most of the 9 considered ROIs; when compared with other models, it ranked first in 5/9 cases (L and R PG, L and R ON, L SMG), and second in 4/9 cases
Beichel et al., 2016 (52) 59 pts/230 lesions Mixed PET GTVs Semiautomated segmentation (LOGISMOS) Segmentation accuracy measured by the DSC was comparable for semiautomated and manual segmentation (DSC: 0.766 and 0.764, respectively)
Yang et al., 2014 (65) 15 pts/30 PGs/57 MRs Mixed MR Ipsi- and contralateral PGs SVM Average DSC between automated and manual contours was 91.1% ± 1.6% for the L PG and 90.5% ± 2.4% for the R PG. Performance was slightly better for the L PG, also when assessed by the averaged maximum and average surface distances
Cheng G et al., 2013 (66) 5 pts, 10 PGs NPC MR Ipsi- and contralateral PGs SVM Mean DSC between automated and physician’s PG contours was 0.853 (range: 0.818–0.891)
Qazi et al., 2011 (67) 25 pts Not specified CT I MNDB, BS, L and R PG, L and R SMG, L and R node level IB, L and R node levels II–IV Atlas-based segmentation As compared with manual delineations by an expert, the automated segmentation framework showed high accuracy, with DSC of 0.93 for the MNDB, 0.83 for the PGs, 0.83 for the SMGs, and 0.74 for nodal levels
Chen et al., 2010 (54) 15 pts/15 neck nodal levels Mixed CT II, III, and IV neck nodal levels ASM The ASM outperformed the atlas-based method (ground truth: manually segmented contours), with higher DSC (10.7%) and lower mean and median surface errors (−13.6% and −12.0%, respectively)
Yu et al., 2009 (73) 10 pts/10 GTV-T and 19 GTV-N Mixed PET and CT I GTVs KNN The feature-based classifier showed better performance than other delineation methods (e.g., standardized uptake value threshold of 2.5, 50% of maximal intensity, and signal-to-background ratio)

2D/3D, 2/3-dimensional; ANN, Artificial Neural Network; ASM, active shape model; ASSD, average symmetric surface distance; AW-RW, K-RW algorithm with adaptive probability threshold; BS, brainstem; CNN, convolutional neural network; C-CNN, combined CNN; CT, computed tomography; CTV, clinical target volume; D, dosimetric; DDNN, deep deconvolutional neural network; DIR, deformable image registration; DL, deep learning; DSC, Dice Similarity Coefficient; EB, eyeball; FCLSM, modified fuzzy c-means clustering integrated with the level set method; FCNN, fully convolutional neural network; GTV-N, nodal-gross tumor volume; GTV-T, tumor-gross tumor volume; HD, Hausdorff distance; I, imaging; KNN, k-nearest neighbors; K-RW, RW algorithm with K-means; L, left; Lar, larynx; LG, lacrimal gland; LOGISMOS, layered optimal graph image segmentation of multiple objects and surfaces; M-CNN, multimodality convolutional neural network; MHD, modified Hausdorff distance; MICCAI, Medical Image Computing and Computer Assisted Intervention; MNDB, mandible; MR, magnetic resonance; MRF, Markov random field; MAS, mastoid; MS, mean shift; Ncut, normalized cut; NPC, nasopharyngeal carcinoma; OAR, organ at risk; OC, oral cavity; ODS-net, organs at risk detection and segmentation network; ON, optic nerve; p, p-value; PCC, Pearson correlation coefficient; PCM, pharyngeal constrictor muscles; PET, positron emission tomography; PG, parotid gland; Phar, pharynx; PLCSF, pharyngeal and laryngeal cancer segmentation framework; PPV, positive predictive value; pt, patient; R, right; RAD, relative area difference; ROI, region of interest; RW, random walker; SC, spinal cord; s, second; SMG, submandibular gland; Sn, sensitivity; Sp, specificity; SRM, shape representation model; SVM, support vector machine; VGG-16, visual geometry group-16.
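Nearly every study in the table reports the Dice Similarity Coefficient (DSC) as its primary overlap metric. As a minimal illustrative sketch (not code from any of the cited studies; the toy masks are hypothetical), the DSC between an automated and a manual binary segmentation mask can be computed as 2|A ∩ B| / (|A| + |B|):

```python
import numpy as np

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks.

    DSC = 2|A ∩ B| / (|A| + |B|): 1.0 is perfect overlap, 0.0 is none.
    """
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)

# Toy 2-D example: a 4x4 "manual" mask and an "automated" mask
# shifted by one column (hypothetical data, for illustration only).
manual = np.zeros((4, 4), dtype=bool)
manual[1:3, 0:2] = True   # |A| = 4 voxels
auto = np.zeros((4, 4), dtype=bool)
auto[1:3, 1:3] = True     # |B| = 4 voxels, overlap = 2
print(dice_coefficient(manual, auto))  # → 0.5
```

The same formula applies unchanged to 3-D volumes, which is how the volumetric DSC values in the table are obtained; surface-based metrics such as HD95 and ASSD additionally require boundary extraction and distance computation.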