
Table 3:

Our Models Genesis achieve the best or comparable performance on five distinct medical target tasks, compared with six self-supervised learning approaches (revised in 3D) and three competing publicly available (fully) supervised pre-trained 3D models. For ease of comparison, we report the AUC score for the two classification tasks (i.e., NCC and ECC) and the IoU score for the three segmentation tasks (i.e., NCS, LCS, and BMS). All results in the table, reported as the mean and standard deviation (mean±s.d.) across ten trials, are evaluated using our dataset splitting, elaborated in Sec. 3.2. For every target task, we further performed an independent two-sample t-test between the best result (bolded) and each of the others, and highlighted entries in blue when the difference is not statistically significant at the p = 0.05 level. The footnotes compare our results with the state-of-the-art performance for each target task, using the evaluation metrics and data adopted by the corresponding competitions.
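The selection procedure described above (ten trials per method, mean±s.d., and an independent two-sample t-test against the best method at the p = 0.05 level) can be reproduced with standard tooling. The snippet below is a minimal sketch of that procedure, not the authors' released code; the per-trial scores are synthetic placeholders generated from the reported mean and s.d., and SciPy's ttest_ind is assumed as the test implementation.

```python
# Minimal sketch of the comparison protocol described in the caption -- NOT the authors'
# released code. Per-trial scores below are synthetic placeholders drawn from a normal
# distribution; in the paper they would come from ten independent training runs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Hypothetical per-trial AUC scores (%) for one classification target task.
trials = {
    "Genesis Chest CT (ours)": rng.normal(98.34, 0.44, size=10),
    "I3D":                     rng.normal(98.26, 0.27, size=10),
    "MedicalNet":              rng.normal(95.80, 0.49, size=10),
}

# The "best" method is the one with the highest mean across the ten trials.
best_name = max(trials, key=lambda name: trials[name].mean())
best_scores = trials[best_name]
print(f"Best: {best_name}  {best_scores.mean():.2f}±{best_scores.std(ddof=1):.2f}")

# Independent two-sample t-test between the best method and every other method
# (pooled-variance Student's t-test; the paper does not specify the variant).
# Differences with p >= 0.05 are treated as not statistically significant,
# i.e., the blue-highlighted entries in the table.
for name, scores in trials.items():
    if name == best_name:
        continue
    _, p_value = stats.ttest_ind(best_scores, scores)
    verdict = "not significant" if p_value >= 0.05 else "significant"
    print(f"{name}: {scores.mean():.2f}±{scores.std(ddof=1):.2f}  p={p_value:.4f} ({verdict})")
```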

Pre-training       | Approach                                            | NCC¹ (%)   | NCS² (%)   | ECC³ (%)   | LCS⁴ (%)   | BMS⁵ (%)
-------------------|-----------------------------------------------------|------------|------------|------------|------------|-----------
No                 | Random with Uniform Init                            | 94.74±1.97 | 75.48±0.43 | 80.36±3.58 | 78.68±4.23 | 60.79±1.60
No                 | Random with Xavier Init (Glorot and Bengio, 2010)   | 94.25±5.07 | 74.05±1.97 | 79.99±8.06 | 77.82±3.87 | 58.52±2.61
No                 | Random with MSRA Init (He et al., 2015)             | 96.03±1.82 | 76.44±0.45 | 78.24±3.60 | 79.76±5.43 | 63.00±1.73
(Fully) supervised | I3D (Carreira and Zisserman, 2017)                  | 98.26±0.27 | 71.58±0.55 | 80.55±1.11 | 70.65±4.26 | 67.83±0.75
(Fully) supervised | NiftyNet (Gibson et al., 2018b)                     | 94.14±4.57 | 52.98±2.05 | 77.33±8.05 | 83.23±1.05 | 60.78±1.60
(Fully) supervised | MedicalNet (Chen et al., 2019b)                     | 95.80±0.49 | 75.68±0.32 | 86.43±1.44 | 85.52±0.58 | 66.09±1.35
Self-supervised    | De-noising (Vincent et al., 2010)                   | 95.92±1.83 | 73.99±0.62 | 85.14±3.02 | 84.36±0.96 | 57.83±1.57
Self-supervised    | In-painting (Pathak et al., 2016)                   | 91.46±2.97 | 76.02±0.55 | 79.79±3.55 | 81.36±4.83 | 61.38±3.84
Self-supervised    | Jigsaw (Noroozi and Favaro, 2016)                   | 95.47±1.24 | 70.90±1.55 | 81.79±1.04 | 82.04±1.26 | 63.33±1.11
Self-supervised    | DeepCluster (Caron et al., 2018)                    | 97.22±0.55 | 74.95±0.46 | 84.82±0.62 | 82.66±1.00 | 65.96±0.85
Self-supervised    | Patch shuffling (Chen et al., 2019a)                | 91.93±2.32 | 75.74±0.51 | 82.15±3.30 | 82.82±2.35 | 52.95±6.92
Self-supervised    | Rubik’s Cube (Zhuang et al., 2019)                  | 96.24±1.27 | 72.87±0.16 | 80.49±4.64 | 75.59±0.20 | 62.75±1.93
Self-supervised    | Genesis Chest CT (ours)                             | 98.34±0.44 | 77.62±0.64 | 87.20±2.87 | 85.10±2.15 | 67.96±1.29
1. The winner of LUNA (2016) holds an official score of 0.968, vs. 0.971 (ours).
2. Wu et al. (2018) hold a Dice of 74.05%, vs. 75.86%±0.90% (ours). (The table reports IoU; see the note following these footnotes.)
3. Zhou et al. (2017) hold an AUC of 87.06%, vs. 87.20%±2.87% (ours).
4. The winner of LiTS (2017), with post-processing, holds a Dice of 96.60%, vs. 93.19%±0.46% (ours, without post-processing).
5. Only MRI FLAIR images are utilized for segmenting brain tumors, so the results are not submitted to BraTS 2018.
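Note on metrics: the table reports IoU for the segmentation tasks, while footnotes 2 and 4 quote Dice scores. For a single binary mask the two are related by Dice = 2·IoU / (1 + IoU), although this identity does not commute with averaging across cases, so a mean IoU cannot be converted directly into a mean Dice. The snippet below is only an illustration of the per-mask relation, not part of the paper's evaluation pipeline.

```python
# Illustration only: per-mask relation between IoU (Jaccard) and Dice for binary masks.
# This is not part of the paper's evaluation code.
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """Compute IoU and Dice for two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = intersection / union
    dice = 2 * intersection / (pred.sum() + target.sum())
    return float(iou), float(dice)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
iou, dice = iou_and_dice(pred, target)
assert abs(dice - 2 * iou / (1 + iou)) < 1e-12  # Dice = 2*IoU / (1 + IoU) per mask
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```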

Genesis Chest CT is slightly outperformed by MedicalNet on LCS because the latter was pre-trained with (full) supervision on the LiTS dataset.