PeerJ Comput Sci. 2024 Apr 30;10:e1999. doi: 10.7717/peerj-cs.1999

Table 6. Performance of models.

Krippendorff’s alpha (α) performance of models averaged over datasets and prompts, best results in bold. N total = 11,880.

Model	α (CI), n per model = 1980
GPT-4	.78 (.76, .81)
GPT-3.5-turbo	.62 (.59, .65)
Davinci-003	.47 (.45, .50)
Flan-T5-XXL	.45 (.42, .47)
Davinci-002	.41 (.38, .44)
Command-XL	.32 (.29, .35)
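
For readers unfamiliar with the metric, the following is a minimal sketch of how Krippendorff's α can be computed for nominal labels, using the standard coincidence-matrix formulation. It is an illustration only, not the authors' code: the function name `krippendorff_alpha_nominal` and the toy labels are hypothetical, the nominal distance metric is an assumption (the table does not state which level of measurement was used), and the confidence intervals reported above are not reproduced here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that different
    raters (e.g. a model and a human annotator) gave to one item.
    Use None for a missing rating. (Hypothetical helper for illustration.)
    """
    # Coincidence matrix: each ordered pair of labels from different raters
    # within one item contributes 1 / (m - 1), where m is the number of
    # ratings that item received.
    coincidences = Counter()
    for labels in units:
        present = [label for label in labels if label is not None]
        m = len(present)
        if m < 2:
            continue  # items with fewer than two ratings carry no information
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    # Observed and expected disagreement under the nominal metric
    # (distance is 1 when labels differ, 0 when they match).
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")

    return 1.0 - observed / expected


# Toy example: four items, each labelled by two raters.
toy_units = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
alpha = krippendorff_alpha_nominal(toy_units)
print(round(alpha, 3))
```

An α of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the values in the table should be read.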