Table 5. Performance metrics for deep learning models in predicting low vision prognosis.

Bolded values reflect the best value for that performance metric. Values in parentheses are 95% confidence intervals. The Threshold column gives the classification cutoff that was chosen to generate the F1 score, precision, and recall. PPV = Positive Predictive Value. NPV = Negative Predictive Value.

Model	AUROC	AUPRC	F1	Sensitivity (Recall)	Specificity	PPV (Precision)	NPV	Accuracy	Threshold
(A) Structured Model	0.80 (0.75–0.85)	0.73 (0.64–0.81)	0.70 (0.61–0.74)	0.73 (0.62–0.78)	0.76 (0.68–0.81)	0.68 (0.57–0.74)	0.80 (0.72–0.84)	0.73 (0.68–0.78)	0.40
(B) CNN Word Embedding Text Model	0.82 (0.77–0.87)	0.78 (0.70–0.84)	0.67 (0.60–0.74)	0.67 (0.60–0.76)	0.77 (0.70–0.83)	0.67 (0.59–0.75)	0.77 (0.71–0.83)	0.74 (0.68–0.78)	0.45
(C) CUI One-Hot Text Model	0.71 (0.65–0.76)	0.62 (0.53–0.70)	0.64 (0.57–0.70)	0.79 (0.71–0.86)	0.52 (0.45–0.59)	0.53 (0.46–0.60)	0.78 (0.70–0.85)	0.66 (0.57–0.68)	0.35
(D) CUI Cui2vec Text Model	0.66 (0.59–0.72)	0.52 (0.43–0.62)	0.58 (0.51–0.64)	0.80 (0.72–0.87)	0.42 (0.35–0.49)	0.45 (0.38–0.52)	0.77 (0.69–0.85)	0.65 (0.50–0.62)	0.20
(E) A+B Combined Model	0.82 (0.76–0.87)	0.79 (0.72–0.85)	0.69 (0.63–0.75)	0.79 (0.71–0.86)	0.66 (0.58–0.72)	0.61 (0.54–0.69)	0.82 (0.75–0.88)	0.74 (0.66–0.76)	0.35
(F) A+C Combined Model	0.79 (0.73–0.83)	0.73 (0.64–0.80)	0.63 (0.56–0.70)	0.64 (0.56–0.73)	0.73 (0.67–0.80)	0.63 (0.54–0.71)	0.75 (0.68–0.81)	0.73 (0.64–0.75)	0.40
(G) A+D Combined Model	0.79 (0.73–0.84)	0.75 (0.68–0.82)	0.67 (0.59–0.74)	0.66 (0.57–0.75)	0.80 (0.74–0.86)	0.67 (0.58–0.76)	0.80 (0.74–0.85)	0.77 (0.70–0.80)	0.45
(H) A+D + FC Word Embedding Model	0.79 (0.73–0.84)	0.75 (0.67–0.82)	0.66 (0.59–0.73)	0.66 (0.57–0.75)	0.79 (0.73–0.85)	0.66 (0.57–0.74)	0.80 (0.74–0.85)	0.76 (0.69–0.79)	0.45
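
The threshold-dependent columns in the table (F1, sensitivity, specificity, PPV, NPV, accuracy) all follow from a confusion matrix built at a chosen probability cutoff. The sketch below illustrates that relationship on made-up toy data; it is not the authors' evaluation code. The function name `threshold_metrics`, the toy labels and probabilities, and the convention that a probability greater than or equal to the threshold counts as a positive prediction are all assumptions made for illustration.

```python
import math


def threshold_metrics(y_true, y_prob, threshold):
    """Compute confusion-matrix metrics for binary labels at one cutoff.

    Assumption: a predicted probability >= threshold is a positive call.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def safe_div(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else math.nan

    return {
        "sensitivity": safe_div(tp, tp + fn),   # recall
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),           # precision
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.9, 0.6, 0.3, 0.2, 0.5, 0.1, 0.42, 0.41]

metrics = threshold_metrics(toy_labels, toy_probs, threshold=0.40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Lowering the cutoff generally moves cases from the negative to the positive side, which tends to raise sensitivity and lower specificity. That trade-off is consistent with the pattern in the table, where the models with the lowest thresholds (for example, model D at 0.20) show high sensitivity and low specificity. AUROC and AUPRC are not tied to a single cutoff, so they are not computed here.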