Table 3.

| Category | Method | F1 | Sensitivity | Specificity | Precision | NPV | Accuracy | AUROC (95% CI) | P value† | AUPRC (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| Rule-based | Exact Match | 0.728 | 0.586 | 0.980 | 0.962 | 0.730 | 0.796 | – | – | – |
| | Augmented Match | 0.869 | 0.857 | 0.883 | 0.882 | 0.858 | 0.870 | – | – | – |
| | Guo et al. (single)¹* | 0.816 | 0.723 | 0.952 | 0.936 | 0.777 | 0.838 | – | – | – |
| | Guo et al. (combined)²* | 0.843 | 0.766 | 0.951 | 0.939 | 0.804 | 0.859 | – | – | – |
| Machine Learning | Random Forest | 0.892 | 0.832 | 0.976 | 0.972 | 0.860 | 0.904 | 0.903 (0.879, 0.926) | 0.002 | 0.942 (0.925, 0.959) |
| | Support Vector Machine | 0.886 | 0.808 | 0.993 | 0.991 | 0.844 | 0.900 | 0.901 (0.877, 0.924) | <0.001 | 0.944 (0.930, 0.955) |
| | Linear Regression | 0.882 | 0.799 | 0.994 | 0.991 | 0.837 | 0.896 | 0.898 (0.874, 0.921) | <0.001 | 0.947 (0.932, 0.959) |
| | XGBoost | 0.892 | 0.828 | 0.978 | 0.975 | 0.858 | 0.903 | 0.900 (0.876, 0.923) | 0.002 | 0.946 (0.927, 0.963) |
| Deep Learning | ClinicalBERT_TGD | 0.917 | 0.854 | 0.983 | 0.980 | 0.865 | 0.912 | 0.923 (0.902, 0.945) | – | 0.958 (0.945, 0.973) |

† P values compare the AUROC of ClinicalBERT_TGD with that of each machine learning baseline, using the two-sided DeLong test.

¹ The best single-rule algorithm was based on ≥2 diagnosis codes and ≥1 keyword.

² The best combined rule was satisfied if either the gender field indicated transgender or ≥1 diagnosis code plus ≥1 TGD keyword was present.

\* The codes and keywords are listed in the paper by Guo et al. [17].
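The six threshold-dependent metrics in Table 3 (F1, sensitivity, specificity, precision, NPV, accuracy) all derive from a single 2×2 confusion matrix. The following is a minimal Python sketch of those standard definitions; the names `BinaryMetrics` and `metrics_from_counts`, the choice of TGD as the positive class, and the example counts are assumptions for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Threshold-dependent metrics of the kind reported in Table 3."""
    f1: float
    sensitivity: float
    specificity: float
    precision: float
    npv: float
    accuracy: float


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Derive the metrics from a 2x2 confusion matrix (positive class assumed to be TGD)."""
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean of precision and recall
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return BinaryMetrics(f1, sensitivity, specificity, precision, npv, accuracy)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the study.
    print(metrics_from_counts(tp=80, fp=10, tn=90, fn=20))
```

Because accuracy equals sensitivity × prevalence + specificity × (1 − prevalence), it always lies between sensitivity and specificity, which gives a quick sanity check when reading a row of the table. The sketch does not guard against empty margins (for example, no predicted positives), which would raise `ZeroDivisionError`.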
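The † footnote refers to the two-sided DeLong test, which compares two AUROCs measured on the same cases while accounting for their correlation. Below is a minimal, unoptimized pure-Python sketch of that test, assuming binary labels coded 0/1 and two score lists aligned with the labels; `delong_test` and its helpers are hypothetical and are not the authors' implementation.

```python
import math
from statistics import mean, variance


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case outscores the negative, 0.5 on a tie."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _placements(pos, neg):
    """Return (AUROC, V10, V01): DeLong structural components for one model."""
    v10 = [mean(_psi(x, y) for y in neg) for x in pos]  # one value per positive case
    v01 = [mean(_psi(x, y) for x in pos) for y in neg]  # one value per negative case
    return mean(v10), v10, v01


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUROCs on the same cases.

    Returns (auc_a, auc_b, z, p_value).
    """
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    auc_a, v10_a, v01_a = _placements([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _placements([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    # Var(AUC_a - AUC_b) = Var(V10_a - V10_b)/m + Var(V01_a - V01_b)/n,
    # algebraically equal to Var_a + Var_b - 2*Cov_ab but never negative.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var_diff = variance(d10) / len(pos) + variance(d01) / len(neg)
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    a = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
    b = [0.6, 0.4, 0.7, 0.3, 0.5, 0.8, 0.2, 0.1]
    print(delong_test(y, a, b))
```

This version builds the full positive-by-negative comparison, so its cost grows with the product of the class sizes; midrank-based implementations of DeLong's method avoid that on larger test sets. If both models produce identical placement values on every case, the variance is zero and the division raises `ZeroDivisionError`.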
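Footnotes 1 and 2 describe Boolean phenotyping rules. The sketch below expresses them in Python under several assumptions: the record fields (`gender_field`, `tgd_code_count`, `tgd_keyword_count`) are hypothetical, the actual code and keyword lists are those in Guo et al. [17] and are not reproduced here, the "and" in footnote 1 is read as requiring both conditions, and "gender field indicates transgender" is approximated by a case-insensitive substring match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientSummary:
    """Hypothetical per-patient summary; field names are illustrative only."""
    gender_field: str       # raw value of the EHR gender field (assumed free text)
    tgd_code_count: int     # number of matching diagnosis codes (code list from Guo et al.)
    tgd_keyword_count: int  # number of matching TGD keywords in the notes


def single_rule(p: PatientSummary) -> bool:
    # Footnote 1, read as a conjunction: >=2 diagnosis codes AND >=1 keyword.
    return p.tgd_code_count >= 2 and p.tgd_keyword_count >= 1


def combined_rule(p: PatientSummary) -> bool:
    # Footnote 2: gender field indicates transgender, OR >=1 code plus >=1 keyword.
    # Assumption: a case-insensitive substring match stands in for "indicates transgender".
    gender_flag = "transgender" in p.gender_field.lower()
    return gender_flag or (p.tgd_code_count >= 1 and p.tgd_keyword_count >= 1)


if __name__ == "__main__":
    # Hypothetical record for illustration only.
    rec = PatientSummary(gender_field="Transgender female", tgd_code_count=0, tgd_keyword_count=0)
    print(single_rule(rec), combined_rule(rec))
```

The disjunction in `combined_rule` is what lets it flag patients that `single_rule` misses, at the cost of relying on how consistently the gender field is populated; a real implementation would need the field's actual coding rather than a substring match.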