2025 Aug 21;13:e72938. doi: 10.2196/72938

Table 1. Patient characteristics for the most important features in the machine learning models, according to the validated dataset (n=27,561). The validated dataset is a dataset in which the EHRsᵃ and the CTDsᵇ agree.

| Characteristics                             | Values        |
|---------------------------------------------|---------------|
| Age (years), mean (SD)                      | 32 (5)        |
| BMI (kg/m²), mean (SD)                      | 26.2 (5.3)    |
| Systolic blood pressure (mm Hg), mean (SD)  | 111 (11)      |
| Diastolic blood pressure (mm Hg), mean (SD) | 67 (8)        |
| Parity, mean (SD)                           | 0.9 (1.1)     |
| Ethnic origin, n (%)                        |               |
|   Caucasian                                 | 24,180 (87.8) |
|   South East Asian                          | 1360 (4.9)    |
|   Black African                             | 554 (2.0)     |
|   Asian                                     | 489 (1.8)     |
|   Middle Eastern                            | 154 (0.6)     |
|   Latin American                            | 26 (0.1)      |
|   Mixed                                     | 10 (0.1)      |
|   Other                                     | 788 (3)       |
| Occupation skill level (ISCOᶜ), n (%)       |               |
|   Level 0 (unemployed)                      | 5254 (19)     |
|   Level 1 (elementary occupations)          | 369 (1.3)     |
|   Level 2 (clerical and service)            | 4404 (15.9)   |
|   Level 3 (technicians and associates)      | 2389 (8.6)    |
|   Level 4 (professionals and managers)      | 15,145 (55.1) |
| Family history of diabetes mellitus, n (%)  | 6407 (23.3)   |
| History of GDMᵈ, n (%)                      | 1078 (3.9)    |
| Other endocrine problems, n (%)             | 5854 (21.4)   |
| Prevalence of GDM, n (%)                    | 3188 (11.7)   |

ᵃ EHR: electronic health record.
ᵇ CTD: clinical team database.
ᶜ ISCO: International Standard Classification of Occupations.
ᵈ GDM: gestational diabetes mellitus.