2021 Apr 6;7:e443. doi: 10.7717/peerj-cs.443

Table 5. Multi-dataset classifiers.

Instances where training and test data belong to the same language model are highlighted (bold).

Training data   Test data
                s        xl       s-k      xl-k     GPT3     Grover   GPT2-un  GPT2-k   GPT2     All
Accuracies
GPT2-un         0.827    0.726    0.508    0.497    0.473    0.458    0.817    0.500    0.636    0.602
GPT2-k          0.323    0.430    0.921    0.839    0.515    0.602    0.381    0.871    0.607    0.616
GPT2            0.767    0.726    0.866    0.682    0.512    0.590    0.773    0.777    0.785    0.725
All             0.809    0.690    0.880    0.772    0.510    0.643    0.760    0.824    0.782    0.755
AUC
GPT2-un         0.940    0.834    0.410    0.398    0.470    0.517    0.897    0.401    0.590    0.560
GPT2-k          0.216    0.320    0.969    0.920    0.530    0.512    0.273    0.942    0.592    0.625
GPT2            0.932    0.800    0.940    0.829    0.566    0.593    0.877    0.881    0.865    0.787
All             0.907    0.754    0.940    0.863    0.586    0.685    0.837    0.900    0.859    0.824
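
Several cells in the table report an AUC well below 0.5 (for example 0.216 for the GPT2-k classifier tested on s). The sketch below may help in reading such values. It computes accuracy and a pairwise (rank-based) AUC on a small made-up score list. The labels, scores, the 0.5 decision threshold, and the choice of class 1 as the positive class are all assumptions for illustration, not data or settings from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a classifier whose scores are mostly inverted
# relative to the true labels, as can happen under a distribution shift.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.3, 0.6, 0.4, 0.7, 0.8]

print("accuracy:", accuracy(labels, scores))
print("AUC:", auc(labels, scores))

# Reversing the ranking turns an AUC of a into 1 - a (when there are no ties).
flipped = [1 - s for s in scores]
print("AUC with inverted scores:", auc(labels, flipped))
```

An AUC below 0.5 means the scores rank the two classes in the wrong order more often than not, so the classifier still carries signal, only with the opposite sign from what the test set requires.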