Table IV. Performance Comparison On Transferring Pre-Trained Representations To Risk Prediction Task.
Task | Subset sample size (% of full training set) | No. (%) of positive | Hi-BEHRT without pretraining, AUROC | Hi-BEHRT without pretraining, AUPRC | Hi-BEHRT with pretraining, AUROC | Hi-BEHRT with pretraining, AUPRC |
---|---|---|---|---|---|---|
HF | 13,827 (1%) | 633 (4.5) | 0.84 | 0.19 | 0.86 | 0.23 |
HF | 69,136 (5%) | 3,317 (4.8) | 0.86 | 0.23 | 0.88 | 0.26 |
Diabetes | 13,201 (1%) | 643 (4.9) | 0.70 | 0.13 | 0.73 | 0.16 |
Diabetes | 66,003 (5%) | 3,223 (4.9) | 0.79 | 0.22 | 0.79 | 0.22 |
CKD | 13,228 (1%) | 1,224 (9.3) | 0.73 | 0.22 | 0.76 | 0.26 |
CKD | 66,140 (5%) | 6,110 (9.2) | 0.77 | 0.26 | 0.80 | 0.33 |
Stroke | 12,114 (1%) | 1,579 (13.0) | 0.65 | 0.21 | 0.67 | 0.23 |
Stroke | 60,568 (5%) | 7,827 (13.0) | 0.68 | 0.24 | 0.76 | 0.41 |
"% of positive" is the percentage of positive cases in the subset. For example, the 1% HF subset contains 633 positive cases among 13,827 patients, which is around 4.5%.
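
The arithmetic in this note can be checked with a minimal sketch. The helper name `positive_rate` is an assumption made for illustration and does not come from the paper; the two counts are taken from the HF 1% row of the table. The unrounded ratio comes out slightly above the quoted value, which is consistent with the note's "around".

```python
def positive_rate(n_positive: int, n_total: int) -> float:
    """Return the share of positive cases as a percentage of the subset."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_positive / n_total


# Counts from the HF 1% subset row of Table IV.
hf_rate = positive_rate(633, 13_827)
print(f"HF 1% subset positive rate: {hf_rate:.2f}%")
```

The same call can be repeated for any other row by substituting that row's positive count and subset size.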