PeerJ Comput Sci. 2021 Apr 6;7:e443. doi: 10.7717/peerj-cs.443

Table A4. Multi-dataset classifiers, no Q.

Instances where training and test data belong to the same language model are highlighted (bold).

Training data	Test data
	s	xl	s-k	xl-k	GPT3	Grover	GPT2-un	GPT2-k	GPT2	All
	Accuracies
GPT2-un	0.890	0.771	0.466	0.451	0.458	0.537	0.830	0.457	0.645	0.600
GPT2-k	0.470	0.469	0.905	0.834	0.622	0.650	0.471	0.869	0.670	0.653
GPT2	0.846	0.718	0.862	0.784	0.580	0.598	0.781	0.823	0.805	0.744
All	0.855	0.721	0.867	0.780	0.714	0.688	0.785	0.825	0.808	0.770
	AUC
GPT2-un	0.962	0.859	0.291	0.271	0.444	0.450	0.909	0.277	0.594	0.558
GPT2-k	0.197	0.293	0.968	0.917	0.757	0.703	0.245	0.942	0.594	0.628
GPT2	0.934	0.803	0.942	0.864	0.681	0.599	0.867	0.901	0.887	0.818
All	0.938	0.808	0.942	0.856	0.755	0.746	0.871	0.898	0.888	0.856