Author manuscript; available in PMC: 2023 Jun 10.
Published in final edited form as: Radiology. 2019 Jul 9;292(3):695–701. doi: 10.1148/radiol.2019181343

Table 2:

Comparison of the deep learning algorithm, consensus of three ACR TI-RADS committee expert readers, and nine radiologists on the test set of 99 nodules. FNA = fine-needle aspiration, ROC AUC = area under the receiver operating characteristic curve, SD = standard deviation.

| Reader | FNA sensitivity | FNA specificity | Follow-up sensitivity | Follow-up specificity | ROC AUC | Years of experience |
|---|---|---|---|---|---|---|
| Deep learning | 13/15 (87%) | 44/84 (52%) | 14/15 (93%) | 32/84 (38%) | 0.87 | |
| Expert consensus | 13/15 (87%) | 43/84 (51%) | 15/15 (100%) | 34/84 (40%) | 0.91 | 26–32 |
| Reader 1 | 14/15 (93%) | 40/84 (48%) | 15/15 (100%) | 28/84 (33%) | 0.91 | 20–25 |
| Reader 2 | 13/15 (87%) | 24/84 (29%) | 15/15 (100%) | 14/84 (17%) | 0.76 | 20 |
| Reader 3 | 12/15 (80%) | 40/84 (48%) | 15/15 (100%) | 27/84 (32%) | 0.85 | 13 |
| Reader 4 | 12/15 (80%) | 40/84 (48%) | 15/15 (100%) | 28/84 (33%) | 0.83 | 13 |
| Reader 5 | 11/15 (73%) | 49/84 (57%) | 14/15 (93%) | 34/84 (40%) | 0.78 | 3 |
| Reader 6 | 11/15 (73%) | 59/84 (70%) | 13/15 (87%) | 51/84 (61%) | 0.85 | 32 |
| Reader 7 | 12/15 (80%) | 42/84 (50%) | 15/15 (100%) | 33/84 (39%) | 0.81 | 4 |
| Reader 8 | 13/15 (87%) | 32/84 (38%) | 14/15 (93%) | 19/84 (23%) | 0.79 | 32 |
| Reader 9 | 14/15 (93%) | 37/84 (44%) | 15/15 (100%) | 26/84 (31%) | 0.83 | 20 |
| Readers 1–9 mean (SD) | 83% (7.5%) | 48% (11.7%) | 97% (4.8%) | 34% (12.4%) | 0.82 (0.05) | 17 (10) |
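
The summary row for Readers 1–9 can be checked against the per-reader counts. The sketch below is illustrative only and recomputes the FNA sensitivity and specificity summary from the fractions listed in Table 2. It rests on two assumptions: the denominators 15 and 84 are read as the numbers of malignant and benign nodules, and the SD is taken as the sample standard deviation (n − 1 denominator), which the table does not state. The names `FNA_COUNTS` and `summarize` are hypothetical.

```python
from statistics import mean, stdev

# FNA-recommendation numerators for Readers 1-9, copied from Table 2.
# Each pair is (sensitivity numerator, specificity numerator).
FNA_COUNTS = [
    (14, 40), (13, 24), (12, 40), (12, 40), (11, 49),
    (11, 59), (12, 42), (13, 32), (14, 37),
]
N_MALIGNANT = 15  # sensitivity denominator in Table 2 (assumed: malignant nodules)
N_BENIGN = 84     # specificity denominator in Table 2 (assumed: benign nodules)


def summarize(fractions):
    """Return (mean, sample SD) of the given fractions, in percent.

    Uses the sample standard deviation (n - 1); this is an assumption,
    since the table does not say which SD definition was used.
    """
    percentages = [100 * f for f in fractions]
    return mean(percentages), stdev(percentages)


sensitivities = [num / N_MALIGNANT for num, _ in FNA_COUNTS]
specificities = [num / N_BENIGN for _, num in FNA_COUNTS]

sens_mean, sens_sd = summarize(sensitivities)
spec_mean, spec_sd = summarize(specificities)

print(f"FNA sensitivity: {sens_mean:.1f}% (SD {sens_sd:.1f}%)")
print(f"FNA specificity: {spec_mean:.1f}% (SD {spec_sd:.1f}%)")
```

The same function could be applied to the follow-up columns by swapping in their numerators.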