Table 1.
Comparison of the distribution of scores between the two experts for all ChatGPT responses collected (n = 100)
| Expert 2 (rows) vs. Expert 1 (columns) | 1 point | 2 points | 3 points | 4 points | 5 points |
|---|---|---|---|---|---|
| 1 point | 0 | 0 | 0 | 0 | 0 |
| 2 points | 0 | 0 | 0 | 0 | 0 |
| 3 points | 0 | 5 | 18 | 14 | 9 |
| 4 points | 0 | 0 | 3 | 6 | 15 |
| 5 points | 0 | 0 | 1 | 12 | 17 |
Scores were defined as follows: 1 point, “Irrelevant response/no response”; 2 points, “Relevant response with major inaccuracies and potential for harm”; 3 points, “Relevant response with major inaccuracies and no potential for harm”; 4 points, “Relevant response with minor inaccuracies and no potential for harm”; 5 points, “Relevant response without any inaccuracies”.
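
As an illustration of how the cross-tabulation in Table 1 can be read, the following minimal sketch computes the marginal totals for each expert, the proportion of responses on which both experts gave the same score, and an unweighted Cohen's kappa. The choice of Cohen's kappa is an assumption made here for illustration; the table itself does not state which agreement statistic, if any, was used. The names `TABLE_1` and `agreement_summary` are hypothetical.

```python
# Counts transcribed from Table 1.
# Rows: Expert 2 scores (1 to 5 points); columns: Expert 1 scores (1 to 5 points).
TABLE_1 = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 5, 18, 14, 9],
    [0, 0, 3, 6, 15],
    [0, 0, 1, 12, 17],
]


def agreement_summary(table):
    """Summarize agreement between two raters from a square contingency table.

    Illustrative only: unweighted Cohen's kappa is an assumed choice of
    statistic, not one named in the source table.
    """
    k = len(table)
    n = sum(sum(row) for row in table)

    # Marginal totals: how often each rater used each score.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: share of responses on the main diagonal.
    observed = sum(table[i][i] for i in range(k)) / n

    # Agreement expected by chance, given the two sets of marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2

    kappa = (observed - expected) / (1 - expected)

    return {
        "n": n,
        "row_totals": row_totals,
        "col_totals": col_totals,
        "observed": observed,
        "expected": expected,
        "kappa": kappa,
    }


summary = agreement_summary(TABLE_1)
print(summary)
```

Run on the counts above, the diagonal cells (18 + 6 + 17) show that the two experts assigned identical scores to 41 of the 100 responses. Expert 2 never scored below 3 points, whereas Expert 1 gave 2 points to 5 responses.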