PeerJ Comput Sci. 2021 Apr 6;7:e443. doi: 10.7717/peerj-cs.443

Table 5. Multi-dataset classifiers.

Instances where training and test data belong to the same language model are highlighted (bold).

Training data   Test data
                s        xl       s-k      xl-k     GPT3     Grover   GPT2-un  GPT2-k   GPT2     All
Accuracies
GPT2-un         0.827    0.726    0.508    0.497    0.473    0.458    0.817    0.500    0.636    0.602
GPT2-k          0.323    0.430    0.921    0.839    0.515    0.602    0.381    0.871    0.607    0.616
GPT2            0.767    0.726    0.866    0.682    0.512    0.590    0.773    0.777    0.785    0.725
All             0.809    0.690    0.880    0.772    0.510    0.643    0.760    0.824    0.782    0.755
AUC
GPT2-un         0.940    0.834    0.410    0.398    0.470    0.517    0.897    0.401    0.590    0.560
GPT2-k          0.216    0.320    0.969    0.920    0.530    0.512    0.273    0.942    0.592    0.625
GPT2            0.932    0.800    0.940    0.829    0.566    0.593    0.877    0.881    0.865    0.787
All             0.907    0.754    0.940    0.863    0.586    0.685    0.837    0.900    0.859    0.824
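
The table reports two metrics per train/test pair, and they can diverge: an AUC well below 0.5 (for example 0.216 for the GPT2-k classifier on the s test set) suggests that the classifier ranks the two classes in the wrong order on that data, rather than guessing at random. The following is a minimal sketch of how the two metrics are commonly computed from classifier scores. It is not the paper's evaluation code; the label convention (1 = machine-generated, 0 = human-written), the 0.5 threshold, and the toy scores are assumptions made only for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (assumed convention: 1 = machine-generated, 0 = human-written).
labels = [1, 1, 1, 0, 0, 0]
in_domain_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]   # mostly correct ranking
inverted_scores = [0.2, 0.3, 0.4, 0.6, 0.7, 0.9]    # ranking fully reversed

print(accuracy(labels, in_domain_scores), auc(labels, in_domain_scores))
print(accuracy(labels, inverted_scores), auc(labels, inverted_scores))
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores order the examples, which is one reason the two blocks of the table do not always move together.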