2021 Feb 26;479(7):1598–1612. doi: 10.1097/CORR.0000000000001685

Table 7.

Accuracy and sensitivity of AI performance compared with human labels for Genant fracture grading in the lumbar spine on the testing dataset, estimated with the bootstrapping method

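| Degree of lumbar fractures | Accuracy: mean, % | Accuracy: 95% CI, % | Accuracy: p value | Sensitivity: mean, % | Sensitivity: 95% CI, % | Sensitivity: p value |
| --- | --- | --- | --- | --- | --- | --- |
| Grade 1 | 84 | 83.55-84.23 | < 0.001 | 78 | 77.39-78.53 | < 0.001 |
| Grade 2 | 95 | 94.74-95.14 | < 0.001 | 99 | 98.41-100.0 | < 0.001 |
| Grade 3 | 94 | 93.50-93.97 | < 0.001 | 97 | 97.12-97.57 | < 0.001 |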

A total of 141 fractured lumbar vertebrae were included in the test dataset; the fractured lumbar vertebrae included Grade 1 (n = 50), Grade 2 (n = 54), and Grade 3 (n = 37) fractures.
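
The table reports means and 95% CIs obtained with a bootstrapping method, but this excerpt does not describe the exact resampling procedure. The following is only a minimal sketch of one common variant, the percentile bootstrap, for accuracy and sensitivity on binary labels. The function names (`bootstrap_ci`, `accuracy`, `sensitivity`), the number of resamples, and the example labels are all hypothetical and are not taken from the study; the numbers it produces are not expected to match the table.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True-positive rate; returns None if the sample has no positives."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return None
    return sum(p == 1 for p in preds_for_positives) / len(preds_for_positives)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample cases with replacement, recompute the
    metric on each resample, and return (mean, lower, upper)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:  # skip resamples where the metric is undefined
            stats.append(value)
    if not stats:
        raise ValueError("metric was undefined on every resample")
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return sum(stats) / len(stats), lower, upper


if __name__ == "__main__":
    # Hypothetical labels for illustration only (1 = grade present, 0 = absent).
    y_true = [1] * 40 + [0] * 60
    y_pred = [1] * 32 + [0] * 8 + [0] * 57 + [1] * 3
    for name, metric in (("accuracy", accuracy), ("sensitivity", sensitivity)):
        mean, lower, upper = bootstrap_ci(y_true, y_pred, metric)
        print(f"{name}: mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With test sets of a few dozen cases per grade, a percentile interval of this kind is typically several percentage points wide, so the choice of bootstrap variant and of the quantity being resampled materially affects the reported interval.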