Neural Computing and Applications. 2021 Jul 22;34(23):20449–20461. doi: 10.1007/s00521-021-06276-0

Table 9.

Resulting metrics for testing the models on collection 3 ('text') for the label 'fake' (comparison between the architectures bert-large-uncased_TDE, bert-large-uncased-whole-word-masking_TDE, roberta-large_TDE, and roberta-large-openai-detector_TDE)

| Metric              | bert-large-uncased_TDE | bert-large-uncased-whole-word-masking_TDE | roberta-large_TDE | roberta-large-openai-detector_TDE |
|---------------------|------------------------|-------------------------------------------|-------------------|-----------------------------------|
| True positive (TP)  | 1007                   | 1005                                      | 1008              | 1011                              |
| True negative (TN)  | 1108                   | 1105                                      | 1109              | 1104                              |
| False positive (FP) | 12                     | 15                                        | 11                | 16                                |
| False negative (FN) | 13                     | 15                                        | 12                | 9                                 |
| Precision           | 0.9882                 | 0.9853                                    | 0.9892            | 0.9844                            |
| Recall              | 0.9864                 | 0.9853                                    | 0.9882            | 0.9912                            |
| F1-score            | 0.9877                 | 0.9853                                    | 0.9887            | 0.9878                            |
| Accuracy            | 98.83%                 | 98.60%                                    | 98.92%            | 98.83%                            |
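For reference, a minimal sketch (not part of the original article) of how the derived metrics in Table 9 follow from the confusion-matrix counts; the counts used here are the roberta-large_TDE column, for which the standard formulas reproduce the reported values exactly:

```python
# Confusion-matrix counts for the roberta-large_TDE column of Table 9.
tp, tn, fp, fn = 1008, 1109, 11, 12

precision = tp / (tp + fp)                    # 0.9892
recall = tp / (tp + fn)                       # 0.9882
f1 = 2 * precision * recall / (precision + recall)  # 0.9887
accuracy = (tp + tn) / (tp + tn + fp + fn)    # 0.9892 -> 98.92%

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.2%}")
```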