PeerJ Comput Sci. 2021 Apr 6;7:e443. doi: 10.7717/peerj-cs.443

Table 3. Single-dataset classifiers.

Accuracy (Acc.) and AUC scores of classifiers trained on generations from one language model (rows) and evaluated on generations from each language model (columns). Along the diagonal (bold), training and test data belong to the same language model.

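Rows: training data. Columns: test data.

| Training data | s Acc. | s AUC | xl Acc. | xl AUC | s-k Acc. | s-k AUC | xl-k Acc. | xl-k AUC | GPT3 Acc. | GPT3 AUC | Grover Acc. | Grover AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s | **0.897** | **0.964** | 0.728 | 0.838 | 0.487 | 0.302 | 0.471 | 0.290 | 0.475 | 0.474 | 0.479 | 0.454 |
| xl | 0.740 | 0.937 | **0.759** | **0.836** | 0.504 | 0.434 | 0.489 | 0.382 | 0.468 | 0.423 | 0.516 | 0.485 |
| s-k | 0.338 | 0.247 | 0.445 | 0.328 | **0.927** | **0.975** | 0.808 | 0.924 | 0.537 | 0.769 | 0.502 | 0.671 |
| xl-k | 0.292 | 0.223 | 0.382 | 0.322 | 0.908 | 0.967 | **0.858** | **0.932** | 0.535 | 0.545 | 0.503 | 0.514 |
| GPT3 | 0.436 | 0.234 | 0.452 | 0.316 | 0.736 | 0.821 | 0.658 | 0.749 | **0.779** | **0.859** | 0.589 | 0.654 |
| Grover | 0.333 | 0.285 | 0.439 | 0.422 | 0.662 | 0.785 | 0.643 | 0.738 | 0.537 | 0.552 | **0.692** | **0.767** |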
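
To make the diagonal versus off-diagonal contrast concrete, the following minimal sketch averages the accuracy values of Table 3 separately for same-model pairs (diagonal) and cross-model pairs (off-diagonal). The accuracy numbers are copied from the table. The variable names and the helper `in_vs_cross` are illustrative assumptions and are not taken from the paper.

```python
# Accuracy values from Table 3 (rows: training data, columns: test data).
# Names below are hypothetical; only the numbers come from the table.
labels = ["s", "xl", "s-k", "xl-k", "GPT3", "Grover"]
acc = [
    [0.897, 0.728, 0.487, 0.471, 0.475, 0.479],  # trained on s
    [0.740, 0.759, 0.504, 0.489, 0.468, 0.516],  # trained on xl
    [0.338, 0.445, 0.927, 0.808, 0.537, 0.502],  # trained on s-k
    [0.292, 0.382, 0.908, 0.858, 0.535, 0.503],  # trained on xl-k
    [0.436, 0.452, 0.736, 0.658, 0.779, 0.589],  # trained on GPT3
    [0.333, 0.439, 0.662, 0.643, 0.537, 0.692],  # trained on Grover
]


def in_vs_cross(matrix):
    """Return (mean of diagonal cells, mean of off-diagonal cells)."""
    n = len(matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    off_diagonal = [
        matrix[i][j] for i in range(n) for j in range(n) if i != j
    ]
    return sum(diagonal) / len(diagonal), sum(off_diagonal) / len(off_diagonal)


in_acc, cross_acc = in_vs_cross(acc)
print(f"mean same-model accuracy:  {in_acc:.3f}")
print(f"mean cross-model accuracy: {cross_acc:.3f}")

# Per-classifier view: same-model accuracy next to the best accuracy
# reached on any other model's generations.
for i, name in enumerate(labels):
    best_other = max(acc[i][j] for j in range(len(labels)) if j != i)
    print(f"{name:>6}: same-model {acc[i][i]:.3f}, best cross-model {best_other:.3f}")
```

Averaging all off-diagonal cells together is a simplification chosen for this sketch: it mixes near transfers (for example s-k to xl-k) with far ones (for example xl-k to s), so the per-classifier loop is the more informative view.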