2023 Feb 24:1–18. Online ahead of print. doi: 10.1007/s40747-023-00998-5

Table 7.

Performance comparison with state-of-the-art methods

Method	Parameters (million)	GT Proposals B@3	GT Proposals B@4	GT Proposals M	Learned Proposals B@3	Learned Proposals B@4	Learned Proposals M
Rahman et al. [24]	–	3.04	1.46	7.23	1.85	0.90	4.93
Iashin et al. [20] $*$	–	4.12	1.81	10.09	2.31	0.92	6.80
iPerceive DVC [41] $*$	–	5.23	2.34	11.77	2.59	1.07	7.29
BMT [43]	54.92	4.63	1.99	10.90	3.84	1.88	8.44
Iashin et al. [20] $^{†}$	149.7	5.83	2.86	11.72	2.60	1.07	7.31
iPerceive DVC [41] $^{†}$	158.37	6.13	2.98	12.27	2.93	1.29	7.87
Lu et al. [44] $*$	–	6.04	2.78	11.79	3.01	1.31	7.34
CM	$53.55$	4.69	2.19	11.08	3.98	1.84	8.93
CMCR	67.34	$6.78$	$3.13$	$12.98$	$4.27$	$2.06$	$10.09$

“$*$”: Only single-modality (visual) data is used.

“$^{†}$”: Cross-modal data is used. The best results are highlighted.