2021 May 20;4:86. doi: 10.1038/s41746-021-00455-y

Table 2.

Descriptive analysis of the cohorts.

| Characteristic | Pretraining | DHF-Cerner | PaCa-Cerner | PaCa-Truven |
|---|---|---|---|---|
| Cohort size (n) | 28,490,650 | 672,647 | 29,405 | 42,721 |
| Percent of patients with the eventᵃ | 15% | 14% | 0.07% | 0.06% |
| Average age at last/index encounter (years) | 41 | 61 | 65 | 63 |
| Gender: male (%) | 45% | 47% | 45% | 48% |
| Race: White (%) | 68% | 72% | 77% | NA |
| Race: African American (%) | 15% | 16% | 13% | NA |
| Race: Asian/Pacific Islander (%) | 2% | 2% | 2% | NA |
| Race: African American (%) | 2% | 2% | 1% | NA |
| Average number of visits per patient | 8 | 17 | 7 | 19 |
| Average number of codes per patient | 15 | 33 | 14 | 18 |
| Vocabulary size | 82,603 | 26,427 | 13,071 | 7,002 |
| ICD-10 codes (%) | 33.8% | 13.3% | 20.7% | 0% |

ᵃFor pretraining, the event is a prolonged hospitalization (stay longer than 7 days). For DHF-Cerner, it is the development of heart failure in diabetic patients. For PaCa-Cerner and PaCa-Truven, it is a diagnosis of pancreatic cancer, and the percentage is computed relative to the total population of the respective dataset rather than the cohort.
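The descriptive rows above can be reproduced from patient-level records. What follows is a minimal sketch, not the authors' pipeline. It makes several assumptions. The `Patient`/`Visit` structures and the `ICD10:` code prefix are hypothetical. "Codes per patient" counts every code occurrence, and "ICD-10 codes (%)" is taken as the share of the vocabulary. The event is the pretraining label from footnote a (any stay longer than 7 days). The optional `population_size` argument mimics the PaCa rows, whose percentages use the whole dataset population as the denominator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Visit:
    length_of_stay_days: int
    codes: List[str]  # codes recorded at this visit (hypothetical format)


@dataclass
class Patient:
    visits: List[Visit]


def has_prolonged_stay(p: Patient, threshold_days: int = 7) -> bool:
    """Pretraining event label: any hospitalization longer than threshold_days."""
    return any(v.length_of_stay_days > threshold_days for v in p.visits)


def is_icd10(code: str) -> bool:
    # Hypothetical convention: ICD-10 codes carry an "ICD10:" prefix.
    return code.startswith("ICD10:")


def describe(cohort: List[Patient], population_size: Optional[int] = None) -> dict:
    """Compute Table 2-style descriptive statistics for one cohort."""
    n = len(cohort)
    n_event = sum(has_prolonged_stay(p) for p in cohort)
    # Event percentage uses the cohort size unless a dataset population is given.
    denom = population_size if population_size is not None else n
    n_visits = sum(len(p.visits) for p in cohort)
    # Assumption: every code occurrence counts, not only unique codes.
    n_codes = sum(len(v.codes) for p in cohort for v in p.visits)
    vocab = {c for p in cohort for v in p.visits for c in v.codes}
    return {
        "cohort_size": n,
        "event_pct": 100 * n_event / denom,
        "avg_visits": n_visits / n,
        "avg_codes": n_codes / n,
        "vocab_size": len(vocab),
        # Assumption: ICD-10 share is measured over the distinct vocabulary.
        "icd10_pct": 100 * sum(is_icd10(c) for c in vocab) / len(vocab),
    }


cohort = [
    Patient([Visit(3, ["ICD10:E11.9", "ICD9:250.00"]), Visit(9, ["ICD10:I50.9"])]),
    Patient([Visit(1, ["ICD9:250.00"])]),
]
stats = describe(cohort)
stats_pop = describe(cohort, population_size=4)
```

In this toy cohort, only the first patient has a stay longer than 7 days. The event rate is therefore 50% of the cohort, or 25% when a hypothetical dataset population of 4 is used as the denominator, as in the PaCa rows.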