Table 7. Performance comparison with state-of-the-art methods.
| Method | Parameters (million) | GT Proposals B@3 | GT Proposals B@4 | GT Proposals M | Learned Proposals B@3 | Learned Proposals B@4 | Learned Proposals M |
|---|---|---|---|---|---|---|---|
| Rahman et al. [24] | – | 3.04 | 1.46 | 7.23 | 1.85 | 0.90 | 4.93 |
| Iashin et al. [20] | – | 4.12 | 1.81 | 10.09 | 2.31 | 0.92 | 6.80 |
| iPerceive DVC [41] | – | 5.23 | 2.34 | 11.77 | 2.59 | 1.07 | 7.29 |
| BMT [43] | 54.92 | 4.63 | 1.99 | 10.90 | 3.84 | **1.88** | 8.44 |
| Iashin et al. [20] | 149.7 | 5.83 | 2.86 | 11.72 | 2.60 | 1.07 | 7.31 |
| iPerceive DVC [41] | 158.37 | **6.13** | **2.98** | **12.27** | 2.93 | 1.29 | 7.87 |
| Lu et al. [44] | – | 6.04 | 2.78 | 11.79 | 3.01 | 1.31 | 7.34 |
| CMCR | 67.34 | 4.69 | 2.19 | 11.08 | **3.98** | 1.84 | **8.93** |
† Only single visual modal data is used.
‡ Cross-modal data is used. The best result in each column is highlighted. B@3 and B@4 denote BLEU@3 and BLEU@4; M denotes METEOR.
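The per-column ranking in the learned-proposal setting can be checked with a small script. This is a sketch under stated assumptions: the scores are transcribed by hand from Table 7, the CMCR row is assumed to carry 67.34M parameters together with the scores 3.98 / 1.84 / 8.93, and the names `Row` and `best_by` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One row of Table 7, learned-proposal columns only (hypothetical helper)."""
    method: str
    params_m: Optional[float]  # None where the table shows "–"
    b3: float                  # BLEU@3
    b4: float                  # BLEU@4
    meteor: float              # METEOR


# Transcribed from Table 7. Assumption: CMCR has 67.34M parameters and
# scores 3.98 / 1.84 / 8.93, as reconstructed from the flattened table.
ROWS = [
    Row("Rahman et al. [24]", None, 1.85, 0.90, 4.93),
    Row("Iashin et al. [20]", None, 2.31, 0.92, 6.80),
    Row("iPerceive DVC [41]", None, 2.59, 1.07, 7.29),
    Row("BMT [43]", 54.92, 3.84, 1.88, 8.44),
    Row("Iashin et al. [20]", 149.7, 2.60, 1.07, 7.31),
    Row("iPerceive DVC [41]", 158.37, 2.93, 1.29, 7.87),
    Row("Lu et al. [44]", None, 3.01, 1.31, 7.34),
    Row("CMCR", 67.34, 3.98, 1.84, 8.93),
]


def best_by(metric: str) -> Row:
    """Return the row with the highest value for the given metric field."""
    return max(ROWS, key=lambda r: getattr(r, metric))


if __name__ == "__main__":
    for metric in ("b3", "b4", "meteor"):
        top = best_by(metric)
        print(f"{metric}: {top.method} ({getattr(top, metric)})")

    # Parameter count of CMCR relative to the largest reported model.
    cmcr = next(r for r in ROWS if r.method == "CMCR")
    largest = max((r for r in ROWS if r.params_m is not None),
                  key=lambda r: r.params_m)
    ratio = cmcr.params_m / largest.params_m
    print(f"CMCR uses {ratio:.1%} of the parameters of {largest.method}")
```

Under these assumptions, CMCR leads on BLEU@3 and METEOR with learned proposals while BMT [43] keeps a narrow lead on BLEU@4, and CMCR uses roughly 42.5% of the parameters of the 158.37M iPerceive DVC model.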