2019 Oct 8;27(1):99–108. doi: 10.1093/jamia/ocz161

Table 1.

Summary statistics of the EMR datasets

| | SD Dataset | CSD Dataset |
|---|---|---|
| Patients | 2 246 444 | 1 045 634 |
| Number of ICD-9 codes | 944 | 854 |
| Age distribution, % | | |
| 0-17 y | 21 | 17 |
| 18-44 y | 32 | 29 |
| 45-64 y | 24 | 26 |
| >64 y | 23 | 28 |
| Male/female, % | 47/53 | 47/53 |
| Codes per patient | 8.11 | 14.76 |
| Patients per code | 19 298 | 18 080 |

CSD: clean Synthetic Derivative; EMR: electronic medical record; ICD-9: International Classification of Diseases-Ninth Revision; SD: Synthetic Derivative.
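
As an illustration of how the last two rows of Table 1 relate to each other, the following is a minimal Python sketch that derives "codes per patient" and "patients per code" from a list of (patient, ICD-9 code) pairs. It assumes both figures are means over distinct patient-code pairs; the source does not state the exact definition, and the function name `summarize` and the toy records are hypothetical.

```python
from collections import defaultdict


def summarize(pairs):
    """Summarize (patient_id, icd9_code) pairs; repeated pairs count once.

    Assumption: "codes per patient" and "patients per code" are means over
    distinct patient-code pairs. The paper may define them differently.
    """
    codes_by_patient = defaultdict(set)
    patients_by_code = defaultdict(set)
    for patient_id, code in pairs:
        codes_by_patient[patient_id].add(code)
        patients_by_code[code].add(patient_id)

    n_patients = len(codes_by_patient)
    n_codes = len(patients_by_code)
    # Total number of distinct (patient, code) pairs.
    n_pairs = sum(len(codes) for codes in codes_by_patient.values())

    return {
        "patients": n_patients,
        "codes": n_codes,
        "codes_per_patient": n_pairs / n_patients,
        "patients_per_code": n_pairs / n_codes,
    }


# Hypothetical toy records, not taken from the SD or CSD datasets.
toy = [
    ("p1", "250.00"), ("p1", "401.9"), ("p1", "401.9"),  # repeated pair
    ("p2", "401.9"),
    ("p3", "401.9"), ("p3", "250.00"), ("p3", "272.4"),
]
stats = summarize(toy)
print(stats)
```

Under this definition, patients × codes per patient equals codes × patients per code, since both count the same set of pairs. The values in Table 1 are approximately consistent with that identity: for the SD dataset, 2 246 444 × 8.11 / 944 ≈ 19 299, against the reported 19 298.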