TABLE II. Scan-Level Segmentation Performance on the Harbin Dataset.
| Methods | Dice | Recall | Worst case (dice, #failures) |
|---|---|---|---|
| Ours | **0.783 ± 0.080** | 0.776 ± 0.072 | **(0.577, 0)** |
| 2D U-net | 0.565 ± 0.275 | 0.625 ± 0.292 | (0.097, 13) |
| H-DUNet | 0.597 ± 0.104 | **0.802 ± 0.058** | (0.124, 3) |
| MPUnet | 0.449 ± 0.206 | 0.448 ± 0.190 | (0.000, 31) |
| 3D U-net | 0.621 ± 0.112 | 0.702 ± 0.111 | (0.032, 12) |
| 3D V-net | 0.641 ± 0.187 | 0.769 ± 0.123 | (0.049, 8) |
| Ours* | **0.783 ± 0.080** | 0.776 ± 0.072 | – |
| 2D U-net* | 0.593 ± 0.273 | 0.636 ± 0.291 | – |
| H-DUNet* | 0.605 ± 0.102 | **0.803 ± 0.058** | – |
| MPUnet* | 0.559 ± 0.165 | 0.496 ± 0.155 | – |
| 3D U-net* | 0.658 ± 0.105 | 0.707 ± 0.109 | – |
| 3D V-net* | 0.667 ± 0.182 | 0.770 ± 0.123 | – |
| Ours# | **0.802 ± 0.072** | 0.794 ± 0.068 | **(0.656, 0)** |
| 2D U-net# | 0.617 ± 0.189 | 0.653 ± 0.202 | (0.201, 0) |
| H-DUNet# | 0.643 ± 0.095 | **0.823 ± 0.042** | (0.377, 0) |
| MPUnet# | 0.543 ± 0.118 | 0.566 ± 0.095 | (0.236, 0) |
| 3D U-net# | 0.706 ± 0.084 | 0.779 ± 0.075 | (0.334, 0) |
| 3D V-net# | 0.708 ± 0.100 | 0.788 ± 0.071 | (0.385, 0) |
| Ours@ | **0.903 ± 0.037** | 0.898 ± 0.032 | **(0.728, 0)** |
| 2D U-net@ | 0.767 ± 0.169 | 0.787 ± 0.163 | (0.224, 0) |
| H-DUNet@ | 0.820 ± 0.053 | **0.904 ± 0.021** | (0.477, 0) |
| MPUnet@ | 0.683 ± 0.138 | 0.660 ± 0.095 | (0.254, 0) |
| 3D U-net@ | 0.826 ± 0.084 | 0.849 ± 0.077 | (0.424, 0) |
| 3D V-net@ | 0.855 ± 0.055 | 0.887 ± 0.050 | (0.503, 0) |
The best performer under each criterion is shown in bold. Performance is reported as mean ± standard deviation. The last column lists the worst-case dice and the number of failure cases (defined as a dice below 0.2). The same 200% data augmentation was applied when training all methods. Rows marked with * report each method's performance with the failure cases excluded. Rows marked with # report performance when training and testing on the highest-quality subset of our dataset, i.e., the CT scans from the single machine with the highest signal-to-noise ratio, as visually confirmed by radiologists. Rows marked with @ report performance under a less stringent evaluation criterion: a predicted infection point is counted as a true positive as long as it lies within 2 pixels of a true infection point. This criterion makes the prediction task considerably easier, especially for early-stage patients.
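For concreteness, the minimal Python sketch below illustrates how these scan-level metrics could be computed from binary masks. It is not the paper's implementation: the function names, the use of scipy's binary_dilation to realize the 2-pixel tolerance of the @ rows, and the symmetric form of the relaxed dice are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def dice_recall(pred, gt, eps=1e-8):
    """Standard per-scan dice and recall for binary infection masks."""
    tp = np.logical_and(pred, gt).sum()
    dice = 2.0 * tp / (pred.sum() + gt.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return dice, recall

def relaxed_dice_recall(pred, gt, tol=2, eps=1e-8):
    """Relaxed ('@') variant (our sketch): a predicted point counts as a
    true positive if it lies within `tol` pixels of a true infection point
    (and vice versa for recall), implemented here by dilating the masks."""
    gt_near = binary_dilation(gt, iterations=tol)      # within tol px of GT
    pred_near = binary_dilation(pred, iterations=tol)  # within tol px of a prediction
    tp_pred = np.logical_and(pred, gt_near).sum()      # predictions matched to GT
    tp_gt = np.logical_and(gt, pred_near).sum()        # GT matched to predictions
    dice = (tp_pred + tp_gt) / (pred.sum() + gt.sum() + eps)  # equals standard dice at tol=0
    recall = tp_gt / (gt.sum() + eps)
    return dice, recall

def summarize(dices, fail_thresh=0.2):
    """Mean, std, worst-case dice, and number of failure cases (dice < 0.2)."""
    dices = np.asarray(dices, dtype=float)
    return dices.mean(), dices.std(), dices.min(), int((dices < fail_thresh).sum())
```

With iterations=tol and the default cross-shaped structuring element, binary_dilation approximates a 2-pixel city-block neighborhood; a Euclidean tolerance would instead threshold a distance transform of each mask.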