Table A4. Multi-dataset classifiers, no Q.
Training data (rows) / Test data (columns) | s | xl | s-k | xl-k | GPT3 | Grover | GPT2-un | GPT2-k | GPT2 | All |
---|---|---|---|---|---|---|---|---|---|---|
Accuracies | ||||||||||
GPT2-un | 0.890 | 0.771 | 0.466 | 0.451 | 0.458 | 0.537 | 0.830 | 0.457 | 0.645 | 0.600 |
GPT2-k | 0.470 | 0.469 | 0.905 | 0.834 | 0.622 | 0.650 | 0.471 | 0.869 | 0.670 | 0.653 |
GPT2 | 0.846 | 0.718 | 0.862 | 0.784 | 0.580 | 0.598 | 0.781 | 0.823 | 0.805 | 0.744 |
All | 0.855 | 0.721 | 0.867 | 0.780 | 0.714 | 0.688 | 0.785 | 0.825 | 0.808 | 0.770 |
AUC | ||||||||||
GPT2-un | 0.962 | 0.859 | 0.291 | 0.271 | 0.444 | 0.450 | 0.909 | 0.277 | 0.594 | 0.558 |
GPT2-k | 0.197 | 0.293 | 0.968 | 0.917 | 0.757 | 0.703 | 0.245 | 0.942 | 0.594 | 0.628 |
GPT2 | 0.934 | 0.803 | 0.942 | 0.864 | 0.681 | 0.599 | 0.867 | 0.901 | 0.887 | 0.818 |
All | 0.938 | 0.808 | 0.942 | 0.856 | 0.755 | 0.746 | 0.871 | 0.898 | 0.888 | 0.856 |