2021 Apr 6;7:e443. doi: 10.7717/peerj-cs.443

Table A4. Multi-dataset classifiers, no Q.

Instances where training and test data belong to the same language model are highlighted (bold).

Rows give the training data and columns the test data.

Training data    s       xl      s-k     xl-k    GPT3    Grover  GPT2-un GPT2-k  GPT2    All

Accuracy
GPT2-un          0.890   0.771   0.466   0.451   0.458   0.537   0.830   0.457   0.645   0.600
GPT2-k           0.470   0.469   0.905   0.834   0.622   0.650   0.471   0.869   0.670   0.653
GPT2             0.846   0.718   0.862   0.784   0.580   0.598   0.781   0.823   0.805   0.744
All              0.855   0.721   0.867   0.780   0.714   0.688   0.785   0.825   0.808   0.770

AUC
GPT2-un          0.962   0.859   0.291   0.271   0.444   0.450   0.909   0.277   0.594   0.558
GPT2-k           0.197   0.293   0.968   0.917   0.757   0.703   0.245   0.942   0.594   0.628
GPT2             0.934   0.803   0.942   0.864   0.681   0.599   0.867   0.901   0.887   0.818
All              0.938   0.808   0.942   0.856   0.755   0.746   0.871   0.898   0.888   0.856
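The table reports two metrics for each train/test pair, and they can diverge sharply. For example, the GPT2-k classifier tested on s reaches an accuracy of 0.470 but an AUC of only 0.197. The following is a minimal Python sketch of both metrics for a binary detector that illustrates this effect. It assumes labels 1 = machine-generated and 0 = human-written, and a fixed 0.5 decision threshold. The function names and the toy scores are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label.

    The 0.5 default threshold is an assumption for illustration.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: every score falls below the threshold, so all
# predictions are 0, and the positives are ranked *below* the negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]

acc = accuracy(labels, scores)                         # half the labels are 0
auc = roc_auc(labels, scores)                          # ranking fully inverted
auc_flipped = roc_auc(labels, [-s for s in scores])    # negating scores gives 1 - AUC

print(f"accuracy={acc:.3f} auc={auc:.3f} flipped_auc={auc_flipped:.3f}")
```

In this toy case, accuracy sits at chance level (0.5) because the detector labels everything as human-written, while the AUC of 0.0 shows that its ranking is completely reversed. Negating the scores would give 1 - AUC. AUC values well below 0.5 in the table therefore suggest a consistently inverted ranking on that test set, rather than merely random guessing.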