2023 May 8;29(5):1113–1122. doi: 10.1038/s41591-023-02332-5

Fig. 2. Characteristics of the Danish and US-VA patient registries.

a, Distributions of age at pancreatic cancer diagnosis in the two cohorts. b,c, The Danish (DK) dataset has a longer median disease-trajectory length but a lower median number of disease codes per patient than the US-VA dataset. The ML process, applied independently to each dataset, therefore has to cope with very different distributions of trajectory length and disease-code density. Color level indicates the number of patients in a given bin. d,e, Background check on the distribution of disease codes in the clinical records: prevalence of known risk factors in cancer versus non-cancer patients in the DK (d) and US-VA (e) datasets, counting whether a disease code occurred at least once in a patient’s history before their pancreatic cancer code (cancer) or 2 years before the end of data (no cancer).
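
The counting rule behind panels d and e can be illustrated with a minimal sketch. It is not the authors' code. The record layout (`cancer_date`, `events`), the function name `prevalence`, the 730-day approximation of "2 years", the reading of the non-cancer cutoff as "end of data minus 2 years", and the example codes and dates are all assumptions made here for illustration.

```python
from datetime import date, timedelta


def prevalence(patients, risk_code, end_of_data, gap_days=730):
    """Return (cancer, non-cancer) prevalence of one risk-factor code.

    A patient counts as having the code if it occurs at least once
    before a cutoff date:
      - cancer patients: the date of their pancreatic cancer code;
      - non-cancer patients: end_of_data minus gap_days (assumed reading
        of "2 years before the end of data").
    """
    counts = {True: [0, 0], False: [0, 0]}  # is_cancer -> [with_code, total]
    for p in patients:
        is_cancer = p["cancer_date"] is not None
        if is_cancer:
            cutoff = p["cancer_date"]
        else:
            cutoff = end_of_data - timedelta(days=gap_days)
        # "At least once": a single qualifying occurrence is enough.
        has_code = any(
            code == risk_code and when < cutoff for code, when in p["events"]
        )
        counts[is_cancer][0] += int(has_code)
        counts[is_cancer][1] += 1
    return tuple(
        with_code / total if total else float("nan")
        for with_code, total in (counts[True], counts[False])
    )


# Hypothetical toy records; codes and dates are illustrative only.
patients = [
    {"cancer_date": date(2015, 6, 1), "events": [("E11", date(2012, 1, 1))]},
    {"cancer_date": date(2016, 3, 1), "events": [("E11", date(2017, 1, 1))]},
    {"cancer_date": None, "events": [("E11", date(2010, 5, 5))]},
    {"cancer_date": None, "events": [("E11", date(2017, 12, 1))]},
    {"cancer_date": None, "events": [("K86", date(2011, 1, 1))]},
    {"cancer_date": None, "events": []},
]

cancer_prev, non_cancer_prev = prevalence(patients, "E11", date(2018, 12, 31))
```

In this toy data, the second cancer patient is not counted because the code appears after diagnosis, and the second non-cancer patient is not counted because the code falls inside the final 2-year window.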