Table 1. Autosegmentation Performance on 3 Head and Neck Data Sets.
Dice score, mean (SD)

Data set | Brainstem | Mandible | Spinal cord | Globe, left | Globe, right | Parotid, left | Parotid, right | SMG, left | SMG, right |
---|---|---|---|---|---|---|---|---|---|
IOV-10<sup>a</sup> | | | | | | | | | |
Annotator 1 | 89.3 (4.2) | 98.6 (1.0) | 92.9 (1.5) | 96.4 (0.9) | 96.5 (1.1) | 92.7 (3.5) | 92.7 (3.5) | 92.3 (3.4) | 92.3 (2.6) |
Annotator 2 | 91.8 (2.0) | 98.5 (0.5) | 91.8 (2.3) | 95.6 (1.3) | 96.7 (1.1) | 91.1 (4.3) | 91.2 (3.7) | 91.3 (4.7) | 91.3 (5.4) |
Annotator 3 | 89.6 (2.7) | 96.9 (1.0) | 81.9 (7.3) | 96.5 (0.8) | 95.7 (1.0) | 88.2 (3.8) | 90.1 (2.8) | 91.6 (2.8) | 90.3 (8.0) |
Ensemble | 88.5 (2.0) | 97.0 (1.0) | 87.7 (3.6) | 94.8 (1.0) | 94.5 (1.9) | 88.5 (2.3) | 87.8 (4.1) | 87.0 (2.9) | 85.1 (5.3) |
Agreement between annotators, κ | 0.831 | 0.971 | 0.836 | 0.927 | 0.939 | 0.838 | 0.845 | 0.848 | 0.836 |
Agreement between annotators and model, κ | 0.806 | 0.966 | 0.844 | 0.917 | 0.931 | 0.852 | 0.825 | 0.803 | 0.794 |
Main data set, ensemble<sup>b</sup> | 85.0 (3.7) | 95.7 (2.3) | 84.0 (3.8) | 92.9 (1.6) | 93.1 (1.5) | 87.9 (3.8) | 87.8 (4.3) | 87.5 (2.3) | 86.7 (3.5) |
External data set, ensemble<sup>c</sup> | 84.9 (6.8) | 93.8 (2.5) | 80.3 (7.7) | 92.7 (3.6) | 93.3 (1.4) | 84.3 (4.6) | 84.5 (4.3) | 83.3 (9.1) | 78.2 (21.1) |
External data set,<sup>c</sup> Nikolov et al<sup>15</sup> | 79.1 (9.6) | 93.8 (1.6) | 80.0 (7.8) | 91.5 (2.1) | 92.1 (1.9) | 83.2 (5.4) | 84.0 (3.7) | 80.3 (7.8) | 76.0 (16.5) |
External data set, radiographer<sup>c</sup> | 89.5 (2.2) | 93.9 (2.3) | 84.0 (4.8) | 92.9 (1.9) | 93.0 (1.7) | 86.7 (3.5) | 87.0 (3.1) | 83.3 (19.7) | 74.9 (30.2) |
Abbreviations: IOV, interobserver variability; SMG, submandibular glands.
<sup>a</sup> The IOV-10 data set included 10 images. In the IOV study, a subset of the main data set was annotated multiple times by 2 radiation oncologists and a trained reader. Later, the proposed model was compared against each human expert. The statistical agreement between annotators and model was measured with Fleiss κ values.
<sup>b</sup> The main data set included 20 images.
<sup>c</sup> The external data set included 26 images. For the external data set, the reference ground truth contours were delineated by an expert head and neck oncologist, and IOV between clinical experts was measured by comparing the reference contours with those produced by an experienced radiographer.<sup>15</sup>
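
For readers who want to reproduce figures of this kind, the two metrics reported in the table can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each contour has been rasterized to a flat sequence of 0/1 voxel labels, that Fleiss κ is computed per voxel with each annotator (or the model) treated as one rater, and that the table's Dice values are on a 0 to 100 scale. The function names `dice_score` and `fleiss_kappa` are hypothetical.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)


def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa, where ratings[i] lists the label each rater gave to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # counts[i][j]: how many raters assigned item i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement: mean over items of the per-item pairwise agreement
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement from the overall share of ratings in each category
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return 1.0  # assumed convention: all ratings fall in a single category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with 4 voxels
model = [1, 1, 0, 0]
expert = [1, 0, 0, 0]
dice_percent = 100 * dice_score(model, expert)  # 2*1 / (2+1), scaled to 0-100

# Three raters labelling the same 4 voxels
voxel_ratings = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
kappa = fleiss_kappa(voxel_ratings)
```

A mean (SD) entry like those in the table would then come from computing `dice_score` once per image and summarizing across images, for example with `statistics.mean` and `statistics.stdev`.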