[Preprint]. 2024 Aug 8:rs.3.rs-4798448. [Version 1] doi: 10.21203/rs.3.rs-4798448/v1

Table 1:

Disease class size before and after PCA-based data compression. Composite disease clusters (latent factors) were retained only if their explained variance exceeded that of equivalent noise-based latent factors. PCA was applied separately to each disease class.

| Disease Category | Number of Phecodes | Number of Latent Factors | Total Explained Variance of Latent Factors (%) |
|---|---|---|---|
| Circulatory System | 155 | 4 | 50.1 |
| Congenital Anomalies | 54 | 1 | 38.3 |
| Dermatological | 94 | 4 | 45.3 |
| Digestive | 155 | 6 | 48.3 |
| Endocrine/Metabolic | 146 | 2 | 51.5 |
| Genitourinary | 156 | 10 | 52.1 |
| Hematopoietic | 54 | 3 | 64.4 |
| Infectious Diseases | 55 | 2 | 44.7 |
| Injuries and Poisonings | 125 | 2 | 21.6 |
| Mental Disorders | 70 | 4 | 64.5 |
| Musculoskeletal | 120 | 2 | 36.0 |
| Neoplasms | 132 | 3 | 39.9 |
| Neurological | 78 | 5 | 53.5 |
| Pregnancy Complications | 44 | 1 | 22.9 |
| Respiratory | 75 | 6 | 64.0 |
| Sense Organs | 113 | 1 | 23.0 |
| Symptoms | 36 | 2 | 46.0 |
| Total | 1,662 | 58 | 45.1 (mean) |
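The retention rule in the caption resembles Horn's parallel analysis. The following is a minimal NumPy sketch of that idea, written under an assumption: the "equivalent noise-based latent factors" are taken to come from PCA on copies of the data in which each column (phecode) is shuffled independently. Leading components are kept for as long as each one explains more variance than the average noise component at the same rank. The source does not say how its noise factors were generated, nor whether it used a mean or a percentile threshold. The function names, permutation count, and seed are all hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, in descending order."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to PC variances
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def n_components_parallel_analysis(X, n_permutations=20, seed=0):
    """Hypothetical helper: count leading PCs whose explained variance beats column-permuted noise.

    Returns (k, total explained variance ratio of the k retained components).
    """
    rng = np.random.default_rng(seed)
    observed = explained_variance_ratio(X)
    null = np.empty((n_permutations, observed.size))
    for i in range(n_permutations):
        # Shuffle each column independently: marginals (and total variance) are kept,
        # but correlations between columns are destroyed
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = explained_variance_ratio(Xp)
    threshold = null.mean(axis=0)  # assumption: mean of the noise spectrum, not a percentile
    exceeds = observed > threshold
    # Keep the leading run of components that beat the noise benchmark
    k = exceeds.size if exceeds.all() else int(np.argmin(exceeds))
    return k, float(observed[:k].sum())

# Usage on simulated data with two latent factors, each driving five columns
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 2))
X = np.hstack([
    z[:, [0]] + 0.5 * rng.normal(size=(500, 5)),
    z[:, [1]] + 0.5 * rng.normal(size=(500, 5)),
])
k, ev = n_components_parallel_analysis(X)
print(k, round(ev, 2))
```

Permuting columns leaves the total variance unchanged, so comparing explained-variance *ratios* is equivalent to comparing raw component variances. In the table, the retained factors' combined ratio (expressed as a percentage) corresponds to the last column.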
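The Total row can be checked directly against the category rows. The short Python sketch below (the variable names are illustrative) confirms that the phecode and latent-factor totals are plain sums. It also shows that 45.1% is the unweighted mean of the 17 per-category explained-variance values rather than a mean weighted by the number of phecodes.

```python
# Rows of Table 1: (category, phecodes, latent factors, explained variance %)
TABLE_1 = [
    ("Circulatory System", 155, 4, 50.1),
    ("Congenital Anomalies", 54, 1, 38.3),
    ("Dermatological", 94, 4, 45.3),
    ("Digestive", 155, 6, 48.3),
    ("Endocrine/Metabolic", 146, 2, 51.5),
    ("Genitourinary", 156, 10, 52.1),
    ("Hematopoietic", 54, 3, 64.4),
    ("Infectious Diseases", 55, 2, 44.7),
    ("Injuries and Poisonings", 125, 2, 21.6),
    ("Mental Disorders", 70, 4, 64.5),
    ("Musculoskeletal", 120, 2, 36.0),
    ("Neoplasms", 132, 3, 39.9),
    ("Neurological", 78, 5, 53.5),
    ("Pregnancy Complications", 44, 1, 22.9),
    ("Respiratory", 75, 6, 64.0),
    ("Sense Organs", 113, 1, 23.0),
    ("Symptoms", 36, 2, 46.0),
]

total_phecodes = sum(row[1] for row in TABLE_1)
total_factors = sum(row[2] for row in TABLE_1)
# Unweighted mean across categories (each category counts once)
mean_ev = sum(row[3] for row in TABLE_1) / len(TABLE_1)
print(total_phecodes, total_factors, round(mean_ev, 1))  # 1662 58 45.1
```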