2024 Jul 18;60(2):121–133. doi: 10.1111/jre.13323

TABLE 2.

(Top) Descriptive statistics of periodontal residents' performance by year of training. (Middle) Section sub‐analysis: performance of each large language model by exam section, with the p values comparing the differences in performance. (Bottom) Performance of the artificial intelligence models on the most difficult periodontal in‐service exam questions.

| Group | Measure | 2020 (368) | 2021 (337) | 2022 (305) | 2023 (302) | 2020–2023 |
|---|---|---|---|---|---|---|
| PGY‐1 Residents | N | 174 (33.14%) | 158 (33.05%) | 182 (32.21%) | 192 (36.16%) | 706 |
| PGY‐1 Residents | Avg Score | 214.86 ± 29.94 (58.39% ± 29.94) | 216.04 ± 33.43 (64.06% ± 33.43) | 203.13 ± 28.76 (66.57% ± 28.76) | 193.18 ± 34.56 (63.92% ± 34.56) | 206.43 ± 32.84 (63.48% ± 31.67) |
| PGY‐2 Residents | N | 182 (34.67%) | 170 (35.56%) | 203 (35.92%) | 198 (37.29%) | 753 |
| PGY‐2 Residents | Avg Score | 222.58 ± 30.17 (60.53% ± 30.17) | 229.51 ± 33.04 (68.02% ± 33.04) | 216.57 ± 28.97 (71.00% ± 28.97) | 206.82 ± 33.27 (68.45% ± 33.27) | 218.86 ± 31.11 (66.25% ± 31.61) |
| PGY‐3 Residents | N | 169 (32.19%) | 150 (31.38%) | 180 (31.85%) | 141 (26.55%) | 640 |
| PGY‐3 Residents | Avg Score | 229.07 ± 28.75 (62.22% ± 28.75) | 232.61 ± 35.31 (68.99% ± 35.31) | 219.29 ± 28.81 (71.84% ± 28.81) | 223.91 ± 28.95 (74.18% ± 28.95) | 224.32 ± 30.32 (69.06% ± 30.45) |
| All Residents | N | 525 | 479 | 565 | 531 | 2375 |
| All Residents | Avg Score | 222.11 ± 30.14 (60.35% ± 30.14) | 226.03 ± 34.57 (67.04% ± 34.57) | 213.11 ± 29.63 (69.81% ± 29.63) | 206.43 ± 34.76 (68.35% ± 34.76) | 214.62 ± 32.71 (66.39% ± 32.77) |
| Google Gemini | Score (%) | 260 (70.65%) | 247 (73.29%) | 231 (75.73%) | 218 (72.18%) | 956 (72.86%) |
| GPT‐3.5 | Score (%) | 230 (62.5%) | 230 (68.24%) | 213 (69.83%) | 179 (59.27%) | 852 (64.93%) |
| GPT‐4 | Score (%) | 290 (78.80%) | 266 (78.93%) | 247 (80.98%) | 241 (79.80%) | 1044 (79.57%) |
| GPT‐3.5 vs. Bard | p value | <0.01 b | 0.15 | 0.1 | <0.001 b | <0.001 b |
| GPT‐4 vs. Bard | p value | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| GPT‐4 vs. GPT‐3.5 | p value | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |

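The percentages in the language-model rows can be checked with simple arithmetic. The following is a minimal sketch, assuming that the number in parentheses after each exam year (368, 337, 305, 302) is the number of scored questions on that exam; the table does not state this explicitly. The names `QUESTIONS`, `GPT4_CORRECT`, and `percent_correct` are illustrative only.

```python
# Assumption: the number in parentheses after each exam year is the
# number of scored questions on that exam (not stated in the table).
QUESTIONS = {2020: 368, 2021: 337, 2022: 305, 2023: 302}

# Correct-answer counts copied from the GPT-4 row of the table.
GPT4_CORRECT = {2020: 290, 2021: 266, 2022: 247, 2023: 241}


def percent_correct(correct: float, total: int) -> float:
    """Return correct/total as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Per-year accuracy, then pooled accuracy over all four exams.
per_year = {
    year: percent_correct(GPT4_CORRECT[year], QUESTIONS[year])
    for year in QUESTIONS
}
overall = percent_correct(sum(GPT4_CORRECT.values()), sum(QUESTIONS.values()))

for year, pct in per_year.items():
    print(f"{year}: {pct:.2f}%")
print(f"2020-2023: {overall:.2f}%")
```

Under that reading, the GPT‐4 percentages come out as 78.80, 78.93, 80.98, and 79.80, with 79.57 for the pooled 2020–2023 column, which agrees with the tabulated values.
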
| Section | GPT‐4 score | Gemini score | GPT‐3.5 score | p value: GPT‐4 vs. Gemini | p value: GPT‐4 vs. GPT‐3.5 | p value: GPT‐3.5 vs. Gemini |
|---|---|---|---|---|---|---|
| Embryology and Anatomy | 80 (83.33%) | 77 (80.2%) | 67 (69.8%) | .02 | <.01 | .09 |
| Biostatistics, Experimental Design, and Data Analysis | 14 (93.33%) | 11 (73.3%) | 13 (86.7%) | .58 | .26 | .37 |
| Biochemistry‐Physiology | 114 (93.44%) | 103 (84.4%) | 104 (85.2%) | <.001 | <.001 | .85 |
| Microbiology and Immunology | 101 (88.59%) | 92 (80.7%) | 94 (82.5%) | <.001 | <.001 | .73 |
| Periodontal Etiology and Pathology | 109 (78.41%) | 97 (69.8%) | 83 (59.7%) | <.001 | <.001 | .07 |
| Pharmacology and Therapeutics | 131 (91.60%) | 123 (86.0%) | 118 (82.5%) | <.001 | <.001 | .41 |
| Diagnosis | 77 (70%) b | 66 (60.0%) | 61 (55.5%) | .32 | .01 | .49 |
| Treatment Planning and Prognosis | 79 (70.53%) | 69 (61.6%) | 42 (37.5%) | <.001 | .03 | <.001 |
| Therapy | 206 (69.12%) | 184 (61.7%) | 140 (47.0%) | <.001 | <.001 | <.001 |
| Oral Pathology/Oral Medicine | 148 (90.79%) | 134 (82.2%) | 130 (79.8%) | <.001 | <.001 | .57 |

Difficult Questions

| Model | Correct | Total | Percentage | p‐Value |
|---|---|---|---|---|
| GPT‐4 | 80 | 127 | 62.99% | .02 a, .09 b |
| GPT‐3.5 | 69 | 127 | 54.33% | .02 a, .70 c |
| Gemini | 73 | 127 | 57.48% | .09 b, .70 c |
| Residents | 52 | 127 | 40.52% | |

Abbreviations: Avg, Average; N, Number of residents who participated in the exam. Bold values indicate a statistically significant p value.

a: p value for chi‐square test, comparing GPT‐4 versus GPT‐3.5.

b: p value for chi‐square test, comparing GPT‐4 versus Bard.

c: p value for chi‐square test, comparing GPT‐3.5 versus Bard.
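The footnotes name a chi‐square test for the pairwise model comparisons. As an illustration of how such a comparison can be set up, here is a minimal sketch of a Pearson chi‐square test on a 2×2 table of correct and incorrect answers, using the GPT‐4 and GPT‐3.5 counts from the difficult‐questions rows. It assumes an unpaired comparison with no continuity correction; the table does not specify the exact test configuration, so no claim is made that this reproduces the tabulated p values. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction).

    Compares the proportion of correct answers between two models,
    treating their answers as independent samples (an assumption).
    Returns (chi-square statistic, two-sided p value).
    """
    observed = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the hypothesis of equal accuracy.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts from the difficult-questions rows: GPT-4 80/127, GPT-3.5 69/127.
chi2, p = chi_square_2x2(80, 127, 69, 127)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A paired test such as McNemar's would need per‐question agreement between the two models, which the table does not report, so only the unpaired form is sketched here.
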