Table 2.
Metrics | Dataset | DeepMedic | DAGMNet_CH3 | DAGMNet_CH2 | UNet_CH3 | UNet_CH2 | FCN_CH3 | FCN_CH2 |
---|---|---|---|---|---|---|---|---|
Dice score | Testing (n = 459) | 0.74 (0.17); 0.79 | 0.76 (0.16); 0.81 | 0.75 (0.17); 0.80 | 0.75 (0.18); 0.81 | 0.74 (0.20); 0.80 | 0.68 (0.20); 0.72 | 0.66 (0.20); 0.71 |
Dice score | STIR 2 (n = 140) | 0.76 (0.18); 0.82 | 0.75 (0.21); 0.82 | 0.75 (0.21); 0.81 | 0.73 (0.24); 0.82 | 0.73 (0.24); 0.82 | 0.70 (0.22); 0.75 | 0.68 (0.24); 0.75 |
Dice score | STIR 1 (n = 140) | 0.55 (0.27); 0.60 | 0.51 (0.30); 0.59 | 0.48 (0.32); 0.58 | 0.49 (0.31); 0.59 | 0.48 (0.32); 0.58 | 0.49 (0.28); 0.55 | 0.44 (0.30); 0.46 |
Dice score | Testing L (n = 163) | 0.85 (0.09); 0.87 | 0.83 (0.10); 0.86 | 0.84 (0.09); 0.86 | 0.85 (0.09); 0.88 | 0.84 (0.10); 0.87 | 0.81 (0.10); 0.84 | 0.80 (0.11); 0.83 |
Dice score | STIR 2 L (n = 76) | 0.84 (0.13); 0.88 | 0.81 (0.18); 0.88 | 0.82 (0.16); 0.89 | 0.81 (0.20); 0.89 | 0.81 (0.21); 0.88 | 0.79 (0.18); 0.84 | 0.77 (0.21); 0.86 |
Dice score | STIR 1 L (n = 50) | 0.67 (0.25); 0.78 | 0.64 (0.28); 0.77 | 0.64 (0.29); 0.79 | 0.59 (0.30); 0.72 | 0.62 (0.30); 0.76 | 0.61 (0.28); 0.74 | 0.59 (0.30); 0.73 |
Dice score | Testing M (n = 144) | 0.74 (0.13); 0.76 | 0.75 (0.14); 0.80 | 0.74 (0.14); 0.77 | 0.76 (0.14); 0.79 | 0.74 (0.16); 0.79 | 0.67 (0.15); 0.71 | 0.66 (0.15); 0.68 |
Dice score | STIR 2 M (n = 43) | 0.73 (0.13); 0.77 | 0.75 (0.13); 0.77 | 0.75 (0.15); 0.78 | 0.72 (0.20); 0.78 | 0.71 (0.22); 0.77 | 0.66 (0.16); 0.70 | 0.66 (0.16); 0.70 |
Dice score | STIR 1 M (n = 51) | 0.53 (0.24); 0.59 | 0.49 (0.28); 0.59 | 0.43 (0.30); 0.50 | 0.45 (0.31); 0.57 | 0.42 (0.30); 0.37 | 0.47 (0.24); 0.53 | 0.39 (0.26); 0.42 |
Dice score | Testing S (n = 152) | 0.63 (0.18); 0.67 | 0.68 (0.19); 0.73* | 0.66 (0.22); 0.72 | 0.65 (0.22); 0.72 | 0.62 (0.25); 0.69 | 0.54 (0.22); 0.58 | 0.51 (0.22); 0.56 |
Dice score | STIR 2 S (n = 21) | 0.52 (0.21); 0.56 | 0.51 (0.25); 0.55 | 0.48 (0.27); 0.51 | 0.49 (0.28); 0.55 | 0.53 (0.26); 0.62 | 0.45 (0.24); 0.48 | 0.40 (0.22); 0.42 |
Dice score | STIR 1 S (n = 39) | 0.43 (0.25); 0.52 | 0.37 (0.29); 0.48 | 0.34 (0.31); 0.29 | 0.41 (0.31); 0.48 | 0.38 (0.32); 0.47 | 0.37 (0.25); 0.42 | 0.32 (0.27); 0.38 |
Precision | Testing (n = 459) | 0.76 (0.21); 0.82 | 0.83 (0.17); 0.88* | 0.81 (0.18); 0.87 | 0.80 (0.18); 0.86 | 0.81 (0.19); 0.87 | 0.70 (0.22); 0.75 | 0.68 (0.22); 0.73 |
Precision | STIR 2 (n = 140) | 0.75 (0.19); 0.79 | 0.80 (0.20); 0.87* | 0.78 (0.20); 0.85 | 0.78 (0.21); 0.84 | 0.80 (0.19); 0.85 | 0.72 (0.20); 0.78 | 0.73 (0.20); 0.78 |
Precision | STIR 1 (n = 140) | 0.62 (0.26); 0.67 | 0.62 (0.31); 0.72 | 0.55 (0.33); 0.64 | 0.65 (0.31); 0.77 | 0.66 (0.32); 0.78 | 0.57 (0.28); 0.65 | 0.57 (0.33); 0.69 |
Sensitivity | Testing (n = 459) | 0.78 (0.17); 0.83* | 0.73 (0.19); 0.77 | 0.74 (0.21); 0.79 | 0.76 (0.21); 0.83 | 0.71 (0.23); 0.78 | 0.71 (0.21); 0.77 | 0.69 (0.23); 0.76 |
Sensitivity | STIR 2 (n = 140) | 0.82 (0.21); 0.91* | 0.76 (0.24); 0.85 | 0.78 (0.25); 0.87 | 0.76 (0.28); 0.90 | 0.75 (0.28); 0.88 | 0.74 (0.26); 0.85 | 0.72 (0.27); 0.82 |
Sensitivity | STIR 1 (n = 140) | 0.59 (0.32); 0.65 | 0.52 (0.33); 0.62 | 0.53 (0.37); 0.65 | 0.48 (0.35); 0.53 | 0.46 (0.35); 0.53 | 0.52 (0.32); 0.61 | 0.43 (0.33); 0.41 |
Subject detection rate | Testing (n = 459) | 1.00 (0.05); [0.99, 1.00] | 0.99 (0.08); [0.99, 1.00] | 0.98 (0.12); [0.97, 1.00] | 0.99 (0.11); [0.98, 1.00] | 0.98 (0.15); [0.96, 0.99] | 0.98 (0.13); [0.97, 0.99] | 0.97 (0.17); [0.96, 0.99] |
Subject detection rate | STIR 2 (n = 140) | 0.99 (0.08); [0.98, 1.01] | 0.98 (0.14); [0.95, 1.00] | 0.99 (0.12); [0.97, 1.01] | 0.98 (0.14); [0.95, 1.00] | 0.97 (0.17); [0.94, 1.00] | 0.98 (0.14); [0.95, 1.00] | 0.99 (0.12); [0.97, 1.01] |
Subject detection rate | STIR 1 (n = 140) | 0.96 (0.20); [0.92, 0.99] | 0.90 (0.30); [0.85, 0.95] | 0.84 (0.36); [0.78, 0.90] | 0.87 (0.33); [0.82, 0.93] | 0.85 (0.36); [0.79, 0.91] | 0.91 (0.29); [0.86, 0.96] | 0.85 (0.36); [0.79, 0.91] |
Spearman correlation of dice and lesion volume size | Testing (n = 459) | 0.62 [0.57, 0.68] | 0.44 [0.37, 0.51] | 0.48 [0.41, 0.55] | 0.53 [0.46, 0.59] | 0.53 [0.46, 0.59] | 0.63 [0.57, 0.68] | 0.65 [0.59, 0.70] |
Spearman correlation of dice and lesion volume size | STIR 2 (n = 140) | 0.68 [0.58, 0.76] | 0.49 [0.36, 0.61] | 0.54 [0.41, 0.65] | 0.55 [0.42, 0.65] | 0.51 [0.37, 0.62] | 0.60 [0.48, 0.69] | 0.59 [0.48, 0.69] |
Spearman correlation of dice and lesion volume size | STIR 1 (n = 140) | 0.42 [0.28, 0.55] | 0.42 [0.27, 0.55] | 0.42 [0.27, 0.55] | 0.30 [0.14, 0.44] | 0.36 [0.21, 0.50] | 0.44 [0.29, 0.56] | 0.42 [0.28, 0.55] |
Spearman correlation of dice and lesion DWI contrast | Testing (n = 459) | 0.60 [0.54, 0.66] | 0.65 [0.59, 0.70] | 0.61 [0.55, 0.66] | 0.62 [0.56, 0.68] | 0.64 [0.59, 0.69] | 0.64 [0.59, 0.69] | 0.65 [0.59, 0.70] |
Spearman correlation of dice and lesion DWI contrast | STIR 2 (n = 140) | 0.45 [0.31, 0.57] | 0.57 [0.44, 0.67] | 0.54 [0.41, 0.65] | 0.54 [0.41, 0.65] | 0.55 [0.42, 0.65] | 0.52 [0.38, 0.63] | 0.56 [0.43, 0.66] |
Spearman correlation of dice and lesion DWI contrast | STIR 1 (n = 140) | 0.52 [0.38, 0.63] | 0.56 [0.43, 0.66] | 0.41 [0.26, 0.54] | 0.51 [0.37, 0.62] | 0.40 [0.25, 0.53] | 0.45 [0.30, 0.57] | 0.42 [0.28, 0.55] |
Spearman correlation of dice and lesion ADC contrast | Testing (n = 459) | −0.33 [−0.41, −0.24] | −0.48 [−0.55, −0.41] | −0.47 [−0.53, −0.39] | −0.44 [−0.51, −0.36] | −0.46 [−0.53, −0.38] | −0.41 [−0.48, −0.33] | −0.40 [−0.48, −0.32] |
Spearman correlation of dice and lesion ADC contrast | STIR 2 (n = 140) | −0.31 [−0.45, −0.15] | −0.37 [−0.51, −0.22] | −0.36 [−0.50, −0.21] | −0.40 [−0.53, −0.25] | −0.42 [−0.55, −0.28] | −0.38 [−0.51, −0.23] | −0.41 [−0.54, −0.26] |
Spearman correlation of dice and lesion ADC contrast | STIR 1 (n = 140) | −0.24 [−0.39, −0.08]+ | −0.30 [−0.44, −0.14] | −0.27 [−0.42, −0.11]+ | −0.29 [−0.44, −0.13] | −0.30 [−0.44, −0.14] | −0.13 [−0.29, 0.03]+ | −0.20 [−0.35, −0.03]+ |
Spearman correlation of lesion and predict volume size | Testing (n = 459) | 0.97 [0.96, 0.97] | 0.97 [0.97, 0.98] | 0.97 [0.96, 0.97] | 0.97 [0.96, 0.98] | 0.97 [0.96, 0.97] | 0.97 [0.96, 0.97] | 0.97 [0.96, 0.97] |
Spearman correlation of lesion and predict volume size | STIR 2 (n = 140) | 0.97 [0.96, 0.98] | 0.96 [0.94, 0.97] | 0.96 [0.94, 0.97] | 0.93 [0.90, 0.95] | 0.89 [0.86, 0.92] | 0.95 [0.93, 0.96] | 0.94 [0.91, 0.96] |
Spearman correlation of lesion and predict volume size | STIR 1 (n = 140) | 0.87 [0.83, 0.91] | 0.84 [0.79, 0.89] | 0.80 [0.73, 0.85] | 0.81 [0.74, 0.86] | 0.79 [0.72, 0.85] | 0.84 [0.78, 0.88] | 0.79 [0.72, 0.85] |
Spearman correlation of lesion and predict DWI contrast | Testing (n = 459) | 0.87 [0.85, 0.89] | 0.89 [0.86, 0.90] | 0.88 [0.86, 0.90] | 0.87 [0.84, 0.89] | 0.88 [0.86, 0.90] | 0.86 [0.83, 0.88] | 0.85 [0.82, 0.87] |
Spearman correlation of lesion and predict DWI contrast | STIR 2 (n = 140) | 0.83 [0.77, 0.88] | 0.81 [0.74, 0.86] | 0.85 [0.80, 0.89] | 0.87 [0.82, 0.90] | 0.88 [0.84, 0.91] | 0.83 [0.77, 0.87] | 0.82 [0.76, 0.87] |
Spearman correlation of lesion and predict DWI contrast | STIR 1 (n = 140) | 0.61 [0.50, 0.71] | 0.70 [0.61, 0.78] | 0.59 [0.47, 0.69] | 0.69 [0.58, 0.77] | 0.74 [0.65, 0.81] | 0.59 [0.48, 0.69] | 0.50 [0.36, 0.62] |
Spearman correlation of lesion and predict ADC contrast | Testing (n = 459) | 0.77 [0.74, 0.81] | 0.84 [0.81, 0.86] | 0.83 [0.80, 0.86] | 0.82 [0.79, 0.85] | 0.83 [0.80, 0.86] | 0.80 [0.77, 0.83] | 0.80 [0.76, 0.83] |
Spearman correlation of lesion and predict ADC contrast | STIR 2 (n = 140) | 0.85 [0.80, 0.89] | 0.86 [0.81, 0.90] | 0.91 [0.87, 0.93] | 0.87 [0.82, 0.90] | 0.93 [0.90, 0.95] | 0.90 [0.86, 0.93] | 0.84 [0.79, 0.88] |
Spearman correlation of lesion and predict ADC contrast | STIR 1 (n = 140) | 0.51 [0.38, 0.63] | 0.58 [0.46, 0.68] | 0.52 [0.38, 0.63] | 0.55 [0.42, 0.66] | 0.57 [0.44, 0.67] | 0.53 [0.39, 0.64] | 0.47 [0.32, 0.59] |
Median of false positives | Not visible (n = 499) | 14 | 0 | 0 | 0 | 0 | 0 | 0 |
Number of subjects whose FP > 10 voxels | Not visible (n = 499) | 275 | 132 | 78 | 55 | 36 | 182 | 88 |
False positive subject detection rate | Not visible (n = 499) | 0.55 (0.50); [0.51, 0.59]* | 0.26 (0.44); [0.23, 0.30] | 0.16 (0.36); [0.12, 0.19] | 0.11 (0.31); [0.08, 0.14] | 0.07 (0.26); [0.05, 0.09] | 0.36 (0.48); [0.32, 0.41] | 0.18 (0.38); [0.14, 0.21] |
False positive subject detection rate (retrospect evaluation) | Not visible (n = 499) | 0.53 (0.50); [0.48, 0.57]* | 0.24 (0.43); [0.21, 0.28] | 0.14 (0.35); [0.11, 0.17] | 0.10 (0.30); [0.07, 0.12] | 0.06 (0.24); [0.04, 0.08] | 0.34 (0.47); [0.30, 0.38] | 0.15 (0.36); [0.12, 0.18] |
Number of trainable parameters | All | 24.5 M | 10.7 M | 10.7 M | 10.0 M | 10.0 M | 10.1 M | 10.1 M |
CPU inference time in seconds | Testing (n = 459) | 85.68 | 30.10 (0.52) | 29.09 (0.46) | 19.30 (0.34) | 18.71 (0.37) | 7.15 (0.52) | 6.80 (0.55) |
GPU inference time in seconds | Testing (n = 459) | 14.97 | 5.91 (0.44) | 4.82 (0.30) | 3.78 (0.18) | 3.59 (0.18) | 2.40 (0.18) | 2.26 (0.18) |
Metrics (dice, precision, sensitivity) are represented as “mean (standard deviation); median”; subject detection rate is represented as “mean (standard deviation); [95% CI]”. The correlations are shown as “correlation coefficient [95% CI]”. “+” indicates no significant correlation (P value > 1E−3); all the other correlations were significant with P value ≤ 1E−3. In the Dataset column, L = large; M = moderate; S = small lesion groups. A statistically significant difference between DAGMNet_CH3 and DeepMedic is labeled by “*”.
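As a reading aid for the overlap metrics, the following is a minimal sketch of how a per-subject Dice score, precision, and sensitivity could be computed from binary masks and then summarized in the “mean (standard deviation); median” format used in the table. It is not the authors' evaluation code. The function names `overlap_metrics` and `summarize` are hypothetical, and the sketch assumes flattened 0/1 masks, the sample standard deviation, and a Dice score of 1.0 when both masks are empty; the source states none of these choices.

```python
from statistics import mean, median, stdev


def overlap_metrics(truth, pred):
    """Return (dice, precision, sensitivity) for two flat binary masks.

    truth and pred are equal-length sequences of 0/1 voxel labels
    (an assumed representation; real masks would be 3D volumes).
    """
    tp = sum(1 for t, p in zip(truth, pred) if t and p)      # true positives
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)  # false positives
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)  # false negatives

    # Assumed convention: two empty masks count as perfect agreement.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, sensitivity


def summarize(values):
    """Format per-subject values as 'mean (SD); median'.

    Uses the sample standard deviation (an assumption) and needs
    at least two values.
    """
    return f"{mean(values):.2f} ({stdev(values):.2f}); {median(values):.2f}"


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0]
    print(overlap_metrics(truth, pred))
    print(summarize([0.5, 0.7, 0.9]))
```

With this toy pair of masks there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.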