Author manuscript; available in PMC: 2024 Nov 1.

Published in final edited form as: J Biomed Inform. 2023 Sep 29;147:104507. doi: 10.1016/j.jbi.2023.104507

Table 3.

Performance of TGD identification algorithms on Dataset I (development set)

		F1	Sensitivity	Specificity	Precision	NPV	Accuracy	AUROC (95% CI)	P values^†	AUPRC (95% CI)
Rule-based	Exact Match	0.586	0.980	0.962	0.730	0.728	0.796	–	–	–
	Augmented Match	0.857	0.883	0.882	0.858	0.869	0.870	–	–	–
	Guo et al. (single)^1*	0.816	0.723	0.952	0.936	0.777	0.838	–	–	–
	Guo et al. (combined)^2*	0.843	0.766	0.951	0.939	0.804	0.859	–	–	–

Machine Learning	Random Forest	0.892	0.832	0.976	0.972	0.860	0.904	0.903 (0.879, 0.926)	0.002	0.942 (0.925, 0.959)
	Support Vector Machine	0.886	0.808	0.993	0.991	0.844	0.900	0.901 (0.877, 0.924)	<0.001	0.944 (0.930, 0.955)
	Linear Regression	0.882	0.799	0.994	0.991	0.837	0.896	0.898 (0.874, 0.921)	<0.001	0.947 (0.932, 0.959)
	XGBoost	0.892	0.828	0.978	0.975	0.858	0.903	0.900 (0.876, 0.923)	0.002	0.946 (0.927, 0.963)
Deep Learning	ClinicalBERT_TGD	0.917	0.854	0.983	0.980	0.865	0.912	0.923 (0.902, 0.945)	–	0.958 (0.945, 0.973)

^† P values were calculated to compare the AUROC between ClinicalBERT_TGD and the other machine learning baselines using the two-sided DeLong test.

^1 Best single-rule algorithm was based on ≥2 diagnosis codes and ≥1 keyword(s).

^2 Best combined rule was either the gender field indicating transgender, or ≥1 diagnosis code(s) plus ≥1 TGD keyword(s).

^* Codes and keywords can be found in the paper by Guo et al. [17].
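The two Guo et al. rules described in the footnotes can be sketched as simple boolean checks. This is a minimal illustration, not the authors' implementation: the `PatientSummary` class and its field names are hypothetical, the counts are assumed to be precomputed per patient, and the combined rule is read as "gender field, or (diagnosis code and keyword)". The actual code and keyword lists are those given by Guo et al. [17] and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary; field names are illustrative only."""
    n_diagnosis_codes: int          # count of TGD-related diagnosis codes (assumed precomputed)
    n_keyword_hits: int             # count of TGD keyword matches in notes (assumed precomputed)
    gender_field_transgender: bool  # structured gender field indicates transgender


def best_single_rule(p: PatientSummary) -> bool:
    """Single rule: at least 2 diagnosis codes and at least 1 keyword."""
    return p.n_diagnosis_codes >= 2 and p.n_keyword_hits >= 1


def best_combined_rule(p: PatientSummary) -> bool:
    """Combined rule: gender field indicates transgender,
    or at least 1 diagnosis code plus at least 1 keyword."""
    return p.gender_field_transgender or (
        p.n_diagnosis_codes >= 1 and p.n_keyword_hits >= 1
    )


# Example: one code and one keyword satisfies the combined rule only.
example = PatientSummary(n_diagnosis_codes=1, n_keyword_hits=1,
                         gender_field_transgender=False)
single_flag = best_single_rule(example)
combined_flag = best_combined_rule(example)
```

Because the combined rule accepts a weaker code threshold and an additional structured signal, it flags every patient the single rule flags plus others, which is in line with its higher sensitivity in the table (0.766 versus 0.723).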