Author manuscript; available in PMC: 2024 Nov 1.
Published in final edited form as: J Biomed Inform. 2023 Sep 29;147:104507. doi: 10.1016/j.jbi.2023.104507

Table 2.

Summary of Datasets I and II for model development and evaluation

Dataset I (N = 3150) comprises clinician-confirmed TGD patients and non-TGD patients filtered by keyword search. Dataset II (N = 200) comprises TGD and non-TGD patients identified by chart review. Values are n (%) unless noted otherwise.

| Characteristic | Dataset I: clinician-confirmed TGD patients (N = 1575) | Dataset I: non-TGD patients filtered by keyword search (N = 1575) | Dataset II: TGD patients by chart review (N = 180) | Dataset II: non-TGD patients by chart review (N = 20) |
|---|---|---|---|---|
| Age in years, mean (SD) | 35.94 (16.04) | 60.92 (18.0) | 34.52 (15.48) | 57.85 (20.27) |
| Race, n (%) | | | | |
| Asian | 77 (4.89) | 37 (2.35) | 8 (4.44) | 1 (5.0) |
| Black | 116 (7.37) | 84 (5.33) | 12 (6.67) | 2 (10.0) |
| More than one race | 50 (3.17) | 6 (0.38) | 6 (3.33) | 0 (0.0) |
| Other | 177 (11.24) | 116 (7.37) | 24 (13.33) | 2 (10.0) |
| White | 1155 (73.33) | 1332 (84.57) | 130 (72.22) | 15 (75.0) |
| Ethnicity, n (%) | | | | |
| Hispanic | 22 (1.40) | 41 (2.60) | 9 (5.0) | 1 (5.0) |
| Non-Hispanic | 1351 (85.78) | 1321 (83.87) | 146 (81.11) | 15 (75.0) |
| Other | 202 (12.83) | 213 (13.52) | 25 (13.89) | 4 (20.0) |
| Patients with keywords, n (%) | | | | |
| Diagnoses | 957 (60.76) | 0 | 103 (57.22) | 0 |
| Procedures | 422 (26.8) | 0 | 10 (5.56) | 3 (15.0) |
| Clinical notes | 1402 (89.02) | 0 | 172 (95.56) | 17 (85.0) |
| Patients with missing gender fields, n (%) | 884 (56.13) | 691 (43.87) | 84 (46.67) | 15 (75.0) |
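
As a rough illustration of how the n (%) cells can be read, the sketch below assumes each percentage is the count divided by the N of its own column. That assumption is consistent with the values shown but is not stated in the table itself. The names `pct` and `COLUMN_N` are hypothetical, and the sketch always prints two decimals, whereas the table sometimes shows one.

```python
def pct(count: int, column_n: int) -> str:
    """Format a count as 'n (%)', using the column total as the denominator."""
    if column_n <= 0:
        raise ValueError("column_n must be positive")
    return f"{count} ({100 * count / column_n:.2f})"


# Column sizes as given in the table header (keys are hypothetical labels).
COLUMN_N = {"I_TGD": 1575, "I_nonTGD": 1575, "II_TGD": 180, "II_nonTGD": 20}

# Counts of White patients per column, copied from the Race rows of the table.
white = {"I_TGD": 1155, "I_nonTGD": 1332, "II_TGD": 130, "II_nonTGD": 15}

# Recompute the 'n (%)' cell for each column.
cells = {col: pct(n, COLUMN_N[col]) for col, n in white.items()}
print(cells)
```

The same helper can be used to cross-check any other row, for example that the counts in a category group sum to the column N.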