2024 Jan 19;14:1731. doi: 10.1038/s41598-024-51938-3

Table 3.

Results from clustering of diagnosis data.

| Cluster | Cluster size: number of patients | Cluster size: proportion out of all patients* | Proportion out of data set: RWD | Proportion out of data set: RCT | Cluster breakdown: RWD | Cluster breakdown: RCT | Most prevalent diagnosis codes** |
|---|---|---|---|---|---|---|---|
| Patients without cluster assignment | 5133 | 18.1% | 20.5% | 8.9% | 90.2% | 9.8% | |
| 0 | 366 | 1.3% | 0.0% | 6.4% | 1.1% | 98.9% | Chronic renal failure (N18), Vascular dementia (F01), Non-insulin-dependent diabetes mellitus (E11), Depressive episode (F32), Essential (primary) hypertension (I10), Obesity (E66), Disorders of lipoprotein metabolism and other lipidaemias (E78), Chronic ischaemic heart disease (I25), Sleep disorders (G47), Other anaemias (D64) |
| 1 | 562 | 2.0% | 0.0% | 9.8% | 1.4% | 98.6% | Non-insulin-dependent diabetes mellitus (E11), Chronic renal failure (N18), Unspecified diabetes mellitus (E14), Other specified diabetes mellitus (E13), Glomerular disorders in diseases classified elsewhere (N08), Essential (primary) hypertension (I10), Disorders of lipoprotein metabolism and other lipidaemias (E78), Obesity (E66), Chronic ischaemic heart disease (I25), Other cataract (H26) |
| 2 | 1622 | 5.7% | 7.2% | 0.0% | 100.0% | 0.0% | No diagnoses with prevalence > 20% |
| 3 | 453 | 1.6% | 0.2% | 7.1% | 10.8% | 89.2% | Chronic renal failure (N18), Non-insulin-dependent diabetes mellitus (E11), Essential (primary) hypertension (I10) |
| 4 | 17,383 | 61.5% | 59.9% | 67.8% | 78.0% | 22.0% | Essential (primary) hypertension (I10), Non-insulin-dependent diabetes mellitus (E11), Chronic renal failure (N18), Disorders of lipoprotein metabolism and other lipidaemias (E78), Heart failure (I50), Chronic ischaemic heart disease (I25), Pneumonia, organism unspecified (J18) |
| 5 | 271 | 0.1% | 1.2% | 0.0% | 100.0% | 0.0% | No diagnoses with prevalence > 20% |
| 6 | 264 | 0.1% | 1.2% | 0.0% | 100.0% | 0.0% | Other diseases of urinary system (N39), Other soft tissue disorders (M79) |
| 7 | 399 | 1.4% | 1.8% | 0.0% | 100.0% | 0.0% | Pneumonia, organism unspecified (J18), Acute myocardial infarction (I21) |
| 8 | 1164 | 4.1% | 5.1% | 0.0% | 100.0% | 0.0% | Pneumonia, organism unspecified (J18), Heart failure (I50) |
| 9 | 386 | 1.4% | 1.7% | 0.0% | 100.0% | 0.0% | Heart failure (I50), Pneumonia, organism unspecified (J18) |
| 10 | 281 | 0.1% | 1.2% | 0.0% | 100.0% | 0.0% | Heart failure (I50), Pneumonia, organism unspecified (J18), Nonrheumatic aortic valve disorders (I35), Acute myocardial infarction (I21), Other chronic obstructive pulmonary disease (J44) |

*All patients in the data set used for clustering of diagnosis data.
**Of the 65 diagnosis codes used for VAE model training, those with at least 20% prevalence in the cluster are listed in descending order of prevalence.
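The selection rule in the second footnote (keep codes with at least 20% prevalence within a cluster, ordered from most to least prevalent) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `prevalent_codes`, the per-patient set representation, and the toy cluster are all assumptions made for the example.

```python
from collections import Counter


def prevalent_codes(patient_codes, threshold=0.20):
    """Return (code, prevalence) pairs with prevalence >= threshold.

    patient_codes: list with one collection of diagnosis codes per patient
    in a single cluster (hypothetical input format).
    """
    n_patients = len(patient_codes)
    if n_patients == 0:
        return []
    # Count each code at most once per patient.
    counts = Counter(code for codes in patient_codes for code in set(codes))
    result = [
        (code, count / n_patients)
        for code, count in counts.items()
        if count / n_patients >= threshold
    ]
    # Descending prevalence; ties broken alphabetically for stable output.
    result.sort(key=lambda item: (-item[1], item[0]))
    return result


# Hypothetical toy cluster of five patients (not data from the study).
toy_cluster = [
    {"I10", "E11"},
    {"I10", "N18"},
    {"I10", "E11", "E78"},
    {"I10"},
    {"E11"},
]
top = prevalent_codes(toy_cluster)
print(top)
```

In the toy cluster, I10 appears in four of five patients and E11 in three, so they lead the list; E78 and N18 each sit exactly at the 20% threshold and are still included because the rule is "at least 20%".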

"Proportion out of data set" describes how the RWD and RCT patients are distributed across the clusters. "Cluster breakdown" shows what proportion of the patients in each cluster come from the RWD and RCT sets.
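The two quantities differ only in the denominator: "proportion out of data set" divides a cell count by the size of the source data set, while "cluster breakdown" divides the same count by the size of the cluster. A minimal sketch, assuming each patient is represented as a hypothetical `(source, cluster)` pair; the function name and the toy counts are illustrative and not taken from the study.

```python
from collections import Counter


def cluster_proportions(assignments):
    """Compute both normalizations from (source, cluster) pairs.

    Returns two dicts keyed by (source, cluster):
      - out_of_data_set: cell count / total patients from that source
      - breakdown:       cell count / total patients in that cluster
    """
    pair_counts = Counter(assignments)
    source_totals = Counter()
    cluster_totals = Counter()
    for (source, cluster), n in pair_counts.items():
        source_totals[source] += n
        cluster_totals[cluster] += n
    out_of_data_set = {
        key: n / source_totals[key[0]] for key, n in pair_counts.items()
    }
    breakdown = {
        key: n / cluster_totals[key[1]] for key, n in pair_counts.items()
    }
    return out_of_data_set, breakdown


# Hypothetical toy assignments: 8 RWD patients and 4 RCT patients.
toy = [("RWD", "A")] * 6 + [("RWD", "B")] * 2 + [("RCT", "A")] * 4
out_of_data_set, breakdown = cluster_proportions(toy)

for key in sorted(out_of_data_set):
    print(key, f"{out_of_data_set[key]:.1%}", f"{breakdown[key]:.1%}")
```

Here 6 of the 8 RWD patients fall in cluster A (75% out of data set), but those 6 make up only 60% of cluster A's 10 patients (cluster breakdown). Within one source, the "out of data set" values sum to 100% across clusters; within one cluster, the breakdown values sum to 100% across sources.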