2019 Jun;40(6):938–945. doi: 10.3174/ajnr.A6077

Table 2:

Comparison of performance metrics of segmentations for different CNN models^a

Model	Dice	Precision	Sensitivity
LOWB	6.5 (0.3–20.9)	5.7 (0.3–32.7)	8.5 (0.3–28.5)
ADC^b	56.4 (27.1–75.4)	59.4 (22.3–78.4)	58.2 (32.7–78.9)
DWI	72.3 (46.2–82.5)	73.0 (38.3–88.1)	84.0 (62.4–90.8)
ADC+LOWB	76.5 (51.9–86.1)	78.1 (47.2–88.8)	79.2 (66.6–89.7)
DWI+LOWB	76.7 (58.4–85.4)	79.4 (52.0–89.8)	83.0 (64.8–90.6)
DWI+ADC	79.0 (57.1–86.4)	79.0 (62.1–90.5)	82.6 (68.4–91.4)
DWI+ADC+LOWB	78.9 (56.2–86.2)	77.4 (55.0–89.8)	83.4 (71.3–91.8)
E2 (DWI+ADC)	82.0 (62.9–88.1)	82.0 (65.1–92.6)^b	84.1 (71.0–92.6)
E3 (DWI+ADC+LOWB)	82.2 (64.9–88.9)	83.2 (67.7–93.3)	83.9 (71.9–92.4)

^a All metrics are denoted in percentages as median (IQR). Of the nonensemble models, significant differences in Dice, precision, and sensitivity were found (P < .001). The ensemble models, E2 and E3, were superior to all other models (P < .001).

^b Excludes 1 subject with an automatically segmented lesion volume of zero because precision is undefined in this circumstance.
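
The sketch below illustrates how the three metrics in the table are commonly computed from a pair of binary masks, and why precision has no value when the automatic segmentation is empty. It is not the authors' code. It assumes the standard voxelwise definitions (Dice = 2TP / (2TP + FP + FN), precision = TP / (TP + FP), sensitivity = TP / (TP + FN)). The function names `overlap_metrics` and `median_iqr` are hypothetical, and the inclusive quantile method is an assumption, since the table does not state how the IQR was derived.

```python
from statistics import quantiles
from typing import List, Optional, Sequence, Tuple

Metric = Optional[float]


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Tuple[Metric, Metric, Metric]:
    """Return (dice, precision, sensitivity) in percent for two flattened binary masks.

    A metric is None when its denominator is zero, i.e. it is undefined.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")

    # Voxelwise confusion counts (true negatives are not needed).
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    def pct(num: int, den: int) -> Metric:
        return 100.0 * num / den if den else None

    dice = pct(2 * tp, 2 * tp + fp + fn)
    precision = pct(tp, tp + fp)      # undefined if nothing was segmented
    sensitivity = pct(tp, tp + fn)    # undefined if the reference lesion is empty
    return dice, precision, sensitivity


def median_iqr(values: Sequence[Metric]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3), skipping subjects whose metric is undefined.

    Assumption: inclusive quantile method; needs at least two defined values.
    """
    defined: List[float] = [v for v in values if v is not None]
    q1, q2, q3 = quantiles(defined, n=4, method="inclusive")
    return q2, q1, q3


if __name__ == "__main__":
    truth = [1, 0, 1, 0]
    print(overlap_metrics([1, 1, 0, 0], truth))  # partial overlap
    print(overlap_metrics([0, 0, 0, 0], truth))  # empty automatic segmentation
```

With an empty automatic segmentation, TP + FP is zero, so precision is returned as `None`, while Dice and sensitivity remain defined (both zero) as long as the reference lesion is not empty. This is consistent with the footnote excluding that subject from the precision column only.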