2020 Nov 30;3(11):e2027426. doi: 10.1001/jamanetworkopen.2020.27426

Table 1. Autosegmentation Performance on 3 Head and Neck Data Sets.

Values are Dice score, mean (SD), except in the agreement rows.

| Data set | Brainstem | Mandible | Spinal cord | Globe, left | Globe, right | Parotid, left | Parotid, right | SMG, left | SMG, right |
|---|---|---|---|---|---|---|---|---|---|
| IOV-10^a | | | | | | | | | |
| Annotator 1 | 89.3 (4.2) | 98.6 (1.0) | 92.9 (1.5) | 96.4 (0.9) | 96.5 (1.1) | 92.7 (3.5) | 92.7 (3.5) | 92.3 (3.4) | 92.3 (2.6) |
| Annotator 2 | 91.8 (2.0) | 98.5 (0.5) | 91.8 (2.3) | 95.6 (1.3) | 96.7 (1.1) | 91.1 (4.3) | 91.2 (3.7) | 91.3 (4.7) | 91.3 (5.4) |
| Annotator 3 | 89.6 (2.7) | 96.9 (1.0) | 81.9 (7.3) | 96.5 (0.8) | 95.7 (1.0) | 88.2 (3.8) | 90.1 (2.8) | 91.6 (2.8) | 90.3 (8.0) |
| Ensemble | 88.5 (2.0) | 97.0 (1.0) | 87.7 (3.6) | 94.8 (1.0) | 94.5 (1.9) | 88.5 (2.3) | 87.8 (4.1) | 87.0 (2.9) | 85.1 (5.3) |
| Agreement between annotators, κ | 0.831 | 0.971 | 0.836 | 0.927 | 0.939 | 0.838 | 0.845 | 0.848 | 0.836 |
| Agreement between annotators and model | 0.806 | 0.966 | 0.844 | 0.917 | 0.931 | 0.852 | 0.825 | 0.803 | 0.794 |
| Main data set, ensemble^b | 85.0 (3.7) | 95.7 (2.3) | 84.0 (3.8) | 92.9 (1.6) | 93.1 (1.5) | 87.9 (3.8) | 87.8 (4.3) | 87.5 (2.3) | 86.7 (3.5) |
| External data set, ensemble^c | 84.9 (6.8) | 93.8 (2.5) | 80.3 (7.7) | 92.7 (3.6) | 93.3 (1.4) | 84.3 (4.6) | 84.5 (4.3) | 83.3 (9.1) | 78.2 (21.1) |
| External data set,^c Nikolov et al^15 | 79.1 (9.6) | 93.8 (1.6) | 80.0 (7.8) | 91.5 (2.1) | 92.1 (1.9) | 83.2 (5.4) | 84.0 (3.7) | 80.3 (7.8) | 76.0 (16.5) |
| External data set, radiographer^c | 89.5 (2.2) | 93.9 (2.3) | 84.0 (4.8) | 92.9 (1.9) | 93.0 (1.7) | 86.7 (3.5) | 87.0 (3.1) | 83.3 (19.7) | 74.9 (30.2) |

Abbreviations: IOV, interobserver variability; SMG, submandibular glands.

^a IOV-10 data set included 10 images. In the IOV study, a subset of the main data set was annotated multiple times by 2 radiation oncologists and a trained reader. Later, the proposed model was compared against each human expert. The statistical agreement between annotators and the model was measured with Fleiss κ values.

^b Main data set included 20 images.

^c External data set included 26 images. For the external data set, the reference ground truth contours were delineated by an expert head and neck oncologist, and IOV between clinical experts was measured by comparing the reference contours with those produced by an experienced radiographer.^15
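
For readers who want to see how the two kinds of values in Table 1 are typically computed, the following is a minimal sketch and not the authors' implementation. It assumes each contour is a flattened binary mask (1 = inside the structure, 0 = outside) and, for Fleiss κ, that each voxel is treated as one subject rated by every annotator with two categories; the table notes do not state how voxels were sampled. The function names `dice_percent` and `fleiss_kappa` are hypothetical.

```python
from typing import Sequence


def dice_percent(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice overlap of two flattened binary masks, on a 0-100 scale."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Convention chosen for this sketch: two empty masks agree fully.
        return 100.0
    return 100.0 * 2 * overlap / (size_a + size_b)


def fleiss_kappa(masks: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa with voxels as subjects and two categories (0 or 1).

    Assumption of this sketch: every rater labels every voxel.
    """
    n_raters = len(masks)
    if n_raters < 2:
        raise ValueError("need at least 2 masks")
    n_voxels = len(masks[0])
    if n_voxels == 0 or any(len(m) != n_voxels for m in masks):
        raise ValueError("masks must be non-empty and of equal length")

    # Number of raters who marked each voxel as foreground.
    fg_counts = [sum(1 for m in masks if m[i]) for i in range(n_voxels)]

    # Observed agreement: share of agreeing rater pairs, averaged over voxels.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(
        (k * (k - 1) + (n_raters - k) * (n_raters - k - 1)) / pairs
        for k in fg_counts
    ) / n_voxels

    # Chance agreement from the overall foreground/background proportions.
    p_fg = sum(fg_counts) / (n_raters * n_voxels)
    p_exp = p_fg ** 2 + (1 - p_fg) ** 2
    if p_exp == 1.0:
        # All ratings fall in one category, so kappa is undefined.
        return float("nan")
    return (p_obs - p_exp) / (1 - p_exp)


# Toy example with three hypothetical annotators and six voxels.
annotator_1 = [1, 1, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 0, 0]
annotator_3 = [1, 1, 1, 0, 0, 0]

print(dice_percent(annotator_1, annotator_2))  # 80.0
print(round(fleiss_kappa([annotator_1, annotator_2, annotator_3]), 3))  # 0.775
```

In the toy example the first two annotators disagree on one voxel, which gives a Dice score of 80.0 for that pair and a Fleiss κ of 0.775 across all three masks. Real contours would be 3-dimensional volumes, which can be flattened before being passed to these functions.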