Table 4.
Fleiss’ κ (95% CI)

| | Between Four Human Raters | | | Human Raters and ChatGPT3.5 | | | Human Raters and ChatGPT4.0 | | |
|---|---|---|---|---|---|---|---|---|---|
| | All Cases | Text | No Text | All Cases | Text | No Text | All Cases | Text | No Text |
| Overall | .646 | .577 | .702 | .320 | .272 | .355 | .523 | .482 | .546 |
| | (.610–.682) | (.522–.631) | (.654–.750) | (.294–.346) | (.233–.310) | (.320–.391) | (.496–.551) | (.441–.524) | (.508–.583) |
| Level 1 (resuscitation) | .696 | .488 | .828 | .182 | .067 | .294 | .565 | .322 | .750 |
| | (.639–.752) | (.409–.568) | (.748–.908) | (.138–.226) | (.006–.129) | (.232–.356) | (.522–.609) | (.261–.384) | (.688–.812) |
| Level 2 (emergent) | .710 | .671 | .743 | .281 | .282 | .256 | .600 | .565 | .610 |
| | (.654–.767) | (.591–.750) | (.663–.823) | (.238–.325) | (.221–.343) | (.194–.318) | (.557–.644) | (.503–.626) | (.548–.672) |
| Level 3 (urgent) | .593 | .539 | .649 | .359 | .333 | .386 | .443 | .440 | .446 |
| | (.537–.650) | (.459–.618) | (.569–.729) | (.316–.403) | (.272–.394) | (.324–.448) | (.400–.487) | (.378–.501) | (.384–.508) |
| Level 4 (less urgent) | .616 | .505 | .685 | .429 | .358 | .467 | .592 | .468 | .559 |
| | (.560–.673) | (.426–.584) | (.605–.765) | (.385–.473) | (.296–.419) | (.405–.529) | (.485–.573) | (.406–.529) | (.497–.621) |
| Level 5 (non-urgent) | .660 | .330 | .708 | .492 | .247 | .526 | .464 | .247 | .481 |
| | (.604–.717) | (.251–.409) | (.628–.788) | (.449–.536) | (.186–.308) | (.464–.588) | (.421–.508) | (.186–.308) | (.419–.543) |
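
For readers unfamiliar with the statistic reported in Table 4, the following is a minimal sketch of the standard overall Fleiss' kappa calculation. It is not the authors' analysis code: the function name `fleiss_kappa` and the toy ratings are illustrative assumptions, and the sketch computes neither the per-level values nor the confidence intervals shown in the table.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Overall Fleiss' kappa.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters (at least two).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per case")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must have the same number of ratings")

    n_categories = len(counts[0])
    total_ratings = n_cases * n_raters

    # Proportion of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]

    # Observed agreement per case: share of rater pairs that agree.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data (NOT from the study): 4 raters, 5 triage levels.
# Each row is one case; each column counts raters choosing levels 1..5.
toy_counts = [
    [4, 0, 0, 0, 0],
    [0, 3, 1, 0, 0],
    [0, 1, 3, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 1, 3],
]
print(round(fleiss_kappa(toy_counts), 3))
```

The input is a cases-by-categories count matrix rather than a raters-by-cases matrix, because Fleiss' kappa does not require the same individuals to rate every case, only the same number of ratings per case.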