PLoS One. 2022 Sep 12;17(9):e0274300. doi: 10.1371/journal.pone.0274300

Table 1. Comparison of our models with other published baselines.

| Model | Batch size | Detecting modality | Loss | Optimizer | Learning rate | Acc., dev set (n = 500)* | AUROC, dev set (n = 500)* | Acc., test set (n = 1000)* | AUROC, test set (n = 1000)* |
|---|---|---|---|---|---|---|---|---|---|
| Image-Grid | 32 | Image | Cross entropy | AdamW | 1e-5 | 0.500±0.045 (0.436–0.536) | 0.516±0.027 (0.478–0.543) | 0.511±0.023 (0.478–0.526) | 0.514±0.018 (0.498–0.530) |
| Image-Region | 32 | Image | Cross entropy | AdamW | 5e-5 | 0.513±0.032 (0.484–0.548) | 0.549±0.030 (0.508–0.579) | 0.531±0.023 (0.502–0.558) | 0.561±0.039 (0.526–0.617) |
| Text BERT | 64 | Text | Cross entropy | AdamW | 5e-5 | 0.569±0.020 (0.548–0.588) | 0.625±0.047 (0.579–0.669) | 0.586±0.024 (0.556–0.612) | 0.639±0.006 (0.633–0.645) |
| Late Fusion | 32 | Image&Text | Cross entropy | AdamW | 5e-5 | 0.589±0.031 (0.544–0.612) | 0.641±0.040 (0.613–0.700) | 0.619±0.011 (0.608–0.630) | 0.679±0.018 (0.665–0.705) |
| ConcatBERT | 32 | Image&Text | Cross entropy | AdamW | 1e-5 | 0.576±0.038 (0.540–0.616) | 0.645±0.012 (0.629–0.655) | 0.622±0.023 (0.588–0.636) | 0.682±0.017 (0.659–0.696) |
| MMBT-Grid | 32 | Image&Text | Cross entropy | AdamW | 1e-5 | 0.603±0.042 (0.544–0.644) | 0.672±0.018 (0.654–0.696) | 0.631±0.014 (0.616–0.650) | 0.694±0.006 (0.687–0.700) |
| MMBT-Region | 32 | Image&Text | Cross entropy | AdamW | 5e-5 | 0.605±0.059 (0.524–0.652) | 0.649±0.067 (0.585–0.722) | 0.642±0.032 (0.608–0.672) | 0.690±0.046 (0.646–0.735) |
| ViLBERT | 32 | Image&Text | Cross entropy | AdamW | 1e-5 | 0.633±0.020 (0.612–0.656) | 0.717±0.035 (0.677–0.747) | 0.659±0.007 (0.652–0.668) | 0.732±0.015 (0.716–0.753) |
| Visual BERT | 32 | Image&Text | Cross entropy | AdamW | 5e-5 | 0.638±0.023 (0.612–0.668) | 0.722±0.010 (0.711–0.732) | 0.664±0.013 (0.656–0.684) | 0.748±0.011 (0.732–0.757) |
| ViLBERT CC | 32 | Image&Text | Cross entropy | AdamW | 1e-5 | 0.656±0.009 (0.648–0.668) | 0.730±0.035 (0.691–0.773) | 0.664±0.009 (0.652–0.674) | 0.739±0.016 (0.724–0.757) |
| Visual BERT COCO | 32 | Image&Text | Cross entropy | AdamW | 5e-5 | 0.648±0.032 (0.608–0.676) | 0.732±0.017 (0.711–0.752) | 0.664±0.020 (0.646–0.692) | 0.737±0.025 (0.711–0.770) |
| OSCAR+FC | 50 | Image&Tag&Text | Cross entropy | AdamW | 5e-6 | 0.666±0.038 (0.626–0.706) | 0.758±0.042 (0.703–0.803) | 0.677±0.010 (0.664–0.689) | 0.762±0.016 (0.749–0.786) |
| OSCAR+RF | 50 | Image&Tag&Text | Cross entropy | AdamW | 5e-6 | 0.667±0.034 (0.618–0.698) | 0.759±0.014 (0.745–0.777) | 0.684±0.002 (0.682–0.686) | 0.768±0.021 (0.737–0.784) |

Footnotes: Acc., accuracy; AUROC, area under the receiver operating characteristic curve.

*Mean ± standard error, with the range in parentheses, calculated from evaluations of four final models.
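The summary statistic in each cell (mean ± standard error of the mean over four evaluation runs, with the observed range) can be sketched as below. The four scores used here are illustrative placeholders, not values from the paper:

```python
from statistics import mean, stdev
from math import sqrt

def summarize_runs(scores):
    """Summarize a metric (e.g. accuracy or AUROC) over repeated runs:
    mean, standard error of the mean, and (min, max) range."""
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))  # SEM = sample SD / sqrt(n)
    return m, se, (min(scores), max(scores))

# Four hypothetical dev-set accuracy runs (placeholder values)
m, se, (lo, hi) = summarize_runs([0.436, 0.490, 0.538, 0.536])
print(f"{m:.3f}±{se:.3f} ({lo:.3f}–{hi:.3f})")
```

Note that with n = 4, the ±SEM interval is narrower than the full range, which is why both are reported in the table.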