Table 2.
Image captioning results of our method and prior methods on the MSCOCO Karpathy test split after CIDEr-D score optimization. A dash (-) marks a score that is not available.
Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr-D | SPICE |
---|---|---|---|---|---|---|---|---|
LSTM [20] | - | - | - | 31.9 | 25.5 | 54.3 | 106.3 | - |
SCST [30] | - | - | - | 34.2 | 26.7 | 55.7 | 114.0 | - |
RFNet [40] | 79.1 | 63.1 | 48.4 | 36.5 | 27.7 | 57.3 | 121.9 | 21.2 |
UpDown [22] | 79.8 | - | - | 36.3 | 27.7 | 56.9 | 120.1 | 21.4 |
Cai et al. [42] | 80.0 | 64.3 | 49.6 | 37.5 | 28.2 | 58.2 | 126.0 | 21.8 |
UpDown+RD [43] | 80.0 | - | - | 37.8 | 28.2 | 57.9 | 125.3 | - |
UpDown+STAM [41] | 80.2 | 64.4 | 49.7 | 37.7 | 28.2 | 58.1 | 125.9 | 21.7 |
UpDown+LAT [44] | 80.4 | - | - | 37.7 | 28.4 | 58.3 | 127.1 | 22.0 |
VRAtt-Soft [45] | 80.2 | 63.3 | 48.7 | 37.3 | 28.4 | 61.4 | 121.8 | 21.8 |
UpDown+MA [46] | 80.2 | - | - | 37.5 | 28.4 | 58.2 | 125.4 | 22.0 |
Ours: PW | 80.4 | 65.1 | 50.8 | 39.1 | 28.7 | 58.7 | 127.6 | 22.2 |
Ours: CW | 80.6 | 65.2 | 50.9 | 39.1 | 28.7 | 58.8 | 127.2 | 22.1 |