2020 Jun 11;39(8):2638–2652. doi: 10.1109/TMI.2020.3001810

TABLE II. Scan Level Segmentation Performance.

The Harbin dataset

| Methods | Dice | Recall | Worst-case (dice, failures) |
|---|---|---|---|
| Ours | **0.783 ± 0.080** | 0.776 ± 0.072 | **(0.577, 0)** |
| 2D U-net | 0.565 ± 0.275 | 0.625 ± 0.292 | (0.097, 13) |
| H-DUNet | 0.597 ± 0.104 | **0.802 ± 0.058** | (0.124, 3) |
| MPUnet | 0.449 ± 0.206 | 0.448 ± 0.190 | (0.000, 31) |
| 3D U-net | 0.621 ± 0.112 | 0.702 ± 0.111 | (0.032, 12) |
| 3D V-net | 0.641 ± 0.187 | 0.769 ± 0.123 | (0.049, 8) |
| Ours* | **0.783 ± 0.080** | 0.776 ± 0.072 | |
| 2D U-net* | 0.593 ± 0.273 | 0.636 ± 0.291 | |
| H-DUNet* | 0.605 ± 0.102 | **0.803 ± 0.058** | |
| MPUnet* | 0.559 ± 0.165 | 0.496 ± 0.155 | |
| 3D U-net* | 0.658 ± 0.105 | 0.707 ± 0.109 | |
| 3D V-net* | 0.667 ± 0.182 | 0.770 ± 0.123 | |
| Ours# | **0.802 ± 0.072** | 0.794 ± 0.068 | **(0.656, 0)** |
| 2D U-net# | 0.617 ± 0.189 | 0.653 ± 0.202 | (0.201, 0) |
| H-DUNet# | 0.643 ± 0.095 | **0.823 ± 0.042** | (0.377, 0) |
| MPUnet# | 0.543 ± 0.118 | 0.566 ± 0.095 | (0.236, 0) |
| 3D U-net# | 0.706 ± 0.084 | 0.779 ± 0.75 | (0.334, 0) |
| 3D V-net# | 0.708 ± 0.100 | 0.788 ± 0.71 | (0.385, 0) |
| Ours@ | **0.903 ± 0.037** | 0.898 ± 0.032 | **(0.728, 0)** |
| 2D U-net@ | 0.767 ± 0.169 | 0.787 ± 0.163 | (0.224, 0) |
| H-DUNet@ | 0.820 ± 0.053 | **0.904 ± 0.021** | (0.477, 0) |
| MPUnet@ | 0.683 ± 0.138 | 0.660 ± 0.095 | (0.254, 0) |
| 3D U-net@ | 0.826 ± 0.084 | 0.849 ± 0.077 | (0.424, 0) |
| 3D V-net@ | 0.855 ± 0.055 | 0.887 ± 0.050 | (0.503, 0) |

The best performer under each criterion is in bold. Performance is reported as average ± standard deviation. The last column gives the worst-case dice and the number of failure cases (scans with dice below 0.2). All methods were trained with the same 200% data augmentation. Rows marked with * report performance after excluding the failure cases. Rows marked with # report performance when training and testing on the highest-quality subset of our dataset: the CT scans acquired on the same machine with the highest signal-to-noise ratio, as visually confirmed by radiologists. Rows marked with @ report performance under a less stringent evaluation criterion: a predicted infection point counts as a true positive as long as it lies within 2 pixels of a true infection point. This criterion makes the prediction task much easier, especially for early-stage patients.
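
To make the criteria in this caption concrete, the following is a minimal Python sketch of how the scan-level scores could be computed. It is not the authors' evaluation code. The function names (`dice_recall`, `summarize`), the representation of a mask as a set of (row, col) pixels, the use of Euclidean distance for "within 2 pixels", the symmetric treatment of recall under the relaxed criterion, and the population standard deviation are all assumptions made for illustration. Only the 0.2 failure threshold and the 2-pixel tolerance come from the caption.

```python
from statistics import mean, pstdev

FAILURE_THRESHOLD = 0.2  # from the caption: a scan with dice below 0.2 is a failure case


def _near(point, others, tol):
    """True if some pixel in `others` lies within `tol` pixels of `point`.

    Assumption: distance is Euclidean; the caption does not name a metric.
    """
    r, c = point
    return any(
        (r + dr, c + dc) in others
        for dr in range(-tol, tol + 1)
        for dc in range(-tol, tol + 1)
        if dr * dr + dc * dc <= tol * tol
    )


def dice_recall(pred, truth, tol=0):
    """Return (dice, recall) for two sets of (row, col) infection pixels.

    tol=0 is the strict overlap criterion; tol=2 mimics the relaxed '@' rows.
    Dice is computed as the harmonic mean of precision and recall, which
    equals 2|P∩T| / (|P| + |T|) when tol=0.
    """
    if not pred or not truth:
        return 0.0, 0.0  # assumed convention for empty masks
    precision = sum(_near(p, truth, tol) for p in pred) / len(pred)
    recall = sum(_near(t, pred, tol) for t in truth) / len(truth)
    if precision + recall == 0:
        return 0.0, recall
    return 2 * precision * recall / (precision + recall), recall


def summarize(dice_scores):
    """Aggregate per-scan dice scores in the style of the table columns."""
    kept = [d for d in dice_scores if d >= FAILURE_THRESHOLD]
    return {
        "mean": mean(dice_scores),
        "std": pstdev(dice_scores),  # assumption: population standard deviation
        "worst_case": (min(dice_scores), len(dice_scores) - len(kept)),
        "mean_excluding_failures": mean(kept) if kept else None,  # '*' rows
    }


# Toy example: a 2x2 lesion predicted two pixels too far to the right.
truth = {(5, 5), (5, 6), (6, 5), (6, 6)}
pred = {(5, 7), (5, 8), (6, 7), (6, 8)}
strict = dice_recall(pred, truth)          # (0.0, 0.0): no pixel overlaps
relaxed = dice_recall(pred, truth, tol=2)  # (1.0, 1.0): every pixel is within 2 pixels
```

The toy example shows why the relaxed criterion is much easier for small lesions: a two-pixel shift takes a 2x2 prediction from perfect to zero under strict overlap, yet leaves the relaxed score unchanged.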