Author manuscript; available in PMC: 2024 Nov 1.

Published in final edited form as: J Biomed Inform. 2023 Sep 29;147:104507. doi: 10.1016/j.jbi.2023.104507

Table 4.

Sub-analysis of patients with missing structured sex and gender demographics in Dataset I

Approach	Method	F1	Sensitivity	Specificity	Precision	NPV	Accuracy	AUROC (95% CI)	P values^†	AUPRC (95% CI)
Rule-based	Exact Match	0.254	0.983	0.852	0.770	0.391	0.777	–	–	–
	Augmented Match	0.658	0.908	0.756	0.860	0.703	0.833	–	–	–
	Guo et al. (single)	0.766	0.674	0.951	0.887	0.837	0.851	–	–	–
	Guo et al. (combined)	0.788	0.706	0.951	0.892	0.850	0.862	–	–	–

Machine Learning	Random Forest	0.901	0.837	0.957	0.977	0.728	0.874	0.896 (0.868, 0.925)	<0.001	0.964 (0.952, 0.974)
	Support Vector Machine	0.900	0.827	0.979	0.988	0.721	0.874	0.905 (0.880, 0.928)	<0.001	0.970 (0.959, 0.977)
	Linear Regression	0.889	0.811	0.971	0.984	0.701	0.861	0.890 (0.864, 0.917)	<0.001	0.962 (0.951, 0.972)
	XGBoost	0.901	0.837	0.957	0.977	0.728	0.874	0.897 (0.870, 0.922)	<0.001	0.964 (0.948, 0.977)

Deep Learning	ClinicalBERT_TGD	0.923	0.906	0.975	0.940	0.960	0.954	0.944 (0.913, 0.967)	–	0.941 (0.912, 0.965)

^† P values compare the AUROC of ClinicalBERT_TGD with that of each machine learning baseline, using the two-sided DeLong test.

Guo et al. (single): the best single-rule algorithm required ≥2 diagnosis codes and ≥1 keyword.

Guo et al. (combined): the best combined rule required either a gender field indicating transgender, or ≥1 diagnosis code plus ≥1 TGD keyword.
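
The two rule definitions in the footnotes above can be read as simple boolean tests on per-patient counts. The following is a minimal sketch of that reading, not the implementation used by Guo et al. or by the authors. The `PatientRecord` class, its field names, and the substring check used to decide that the gender field "indicates transgender" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary; all field names are illustrative."""
    gender_field: str = ""      # structured gender value; empty when missing
    n_diagnosis_codes: int = 0  # number of TGD-related diagnosis codes
    n_keyword_hits: int = 0     # number of TGD keyword matches in the notes


def single_rule(p: PatientRecord) -> bool:
    """Single rule as stated in the footnote: >=2 diagnosis codes AND >=1 keyword."""
    return p.n_diagnosis_codes >= 2 and p.n_keyword_hits >= 1


def combined_rule(p: PatientRecord) -> bool:
    """Combined rule as stated in the footnote: gender field indicates
    transgender, OR (>=1 diagnosis code AND >=1 TGD keyword)."""
    # Assumption: a case-insensitive substring match stands in for whatever
    # coding the real structured gender field uses.
    gender_flag = "transgender" in p.gender_field.lower()
    code_and_keyword = p.n_diagnosis_codes >= 1 and p.n_keyword_hits >= 1
    return gender_flag or code_and_keyword


# Example: one diagnosis code, one keyword hit, no structured gender value.
example = PatientRecord(gender_field="", n_diagnosis_codes=1, n_keyword_hits=1)
print(single_rule(example), combined_rule(example))
```

In this sketch, a record with an empty `gender_field` can satisfy the combined rule only through its second clause, which is the situation the sub-analysis of patients with missing structured demographics describes.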