2022 Oct 26;12:17917. doi: 10.1038/s41598-022-23011-4

Table 2.

Continuous update of patient datasets and improvement of XGBoost base models for stroke risk prediction in the simulated learning health system.

Dataset  Patients  Variables  Metric     XGBoost  RF     SVM    KNN
pt30k    30 K      58         Recall     0.792    0.708  0.711  0.626
                              Precision  0.898    0.920  0.938  0.906
                              AUC        0.881    0.844  0.848  0.802
                              Accuracy   0.925    0.911  0.915  0.889
pt60k    60 K      83         Recall     0.827    0.746  0.787  0.640
                              Precision  0.922    0.949  0.955  0.936
                              AUC        0.901    0.866  0.887  0.813
                              Accuracy   0.938    0.925  0.936  0.898
pt90k    90 K      104        Recall     0.843    0.760  0.807  0.646
                              Precision  0.932    0.958  0.961  0.944
                              AUC        0.911    0.875  0.898  0.817
                              Accuracy   0.945    0.931  0.943  0.901
pt120k   120 K     117        Recall     0.892    0.838  0.871  0.698
                              Precision  0.964    0.982  0.976  0.960
                              AUC        0.940    0.916  0.932  0.844
                              Accuracy   0.964    0.955  0.962  0.916
pt150k   150 K     124        Recall     0.908    0.855  0.888  0.720
                              Precision  0.964    0.987  0.977  0.957
                              AUC        0.948    0.925  0.940  0.855
                              Accuracy   0.969    0.961  0.967  0.922

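Initial dataset: 30 K patients, followed by 4 data updates, each adding another 30 K patients (150 K patients in total). At each data update, the XGBoost base model was also compared with random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN) base models. AUC, area under the receiver operating characteristic curve.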
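As a quick consistency check on the table and its footnote, the Python sketch below transcribes the reported values and recomputes a few derived quantities. The names `TABLE2`, `MODELS`, `METRICS`, `cumulative_sizes`, `f1`, and `best_model` are hypothetical and introduced here only for illustration. F1 is not reported in Table 2; it is derived here as the harmonic mean of the listed precision and recall. Nothing in this sketch re-trains or reproduces the authors' models.

```python
MODELS = ("XGBoost", "RF", "SVM", "KNN")
METRICS = ("Recall", "Precision", "AUC", "Accuracy")

# Values transcribed from Table 2; each metric list follows MODELS order.
TABLE2 = {
    "pt30k": {"patients": 30_000, "variables": 58,
              "Recall": [0.792, 0.708, 0.711, 0.626],
              "Precision": [0.898, 0.920, 0.938, 0.906],
              "AUC": [0.881, 0.844, 0.848, 0.802],
              "Accuracy": [0.925, 0.911, 0.915, 0.889]},
    "pt60k": {"patients": 60_000, "variables": 83,
              "Recall": [0.827, 0.746, 0.787, 0.640],
              "Precision": [0.922, 0.949, 0.955, 0.936],
              "AUC": [0.901, 0.866, 0.887, 0.813],
              "Accuracy": [0.938, 0.925, 0.936, 0.898]},
    "pt90k": {"patients": 90_000, "variables": 104,
              "Recall": [0.843, 0.760, 0.807, 0.646],
              "Precision": [0.932, 0.958, 0.961, 0.944],
              "AUC": [0.911, 0.875, 0.898, 0.817],
              "Accuracy": [0.945, 0.931, 0.943, 0.901]},
    "pt120k": {"patients": 120_000, "variables": 117,
               "Recall": [0.892, 0.838, 0.871, 0.698],
               "Precision": [0.964, 0.982, 0.976, 0.960],
               "AUC": [0.940, 0.916, 0.932, 0.844],
               "Accuracy": [0.964, 0.955, 0.962, 0.916]},
    "pt150k": {"patients": 150_000, "variables": 124,
               "Recall": [0.908, 0.855, 0.888, 0.720],
               "Precision": [0.964, 0.987, 0.977, 0.957],
               "AUC": [0.948, 0.925, 0.940, 0.855],
               "Accuracy": [0.969, 0.961, 0.967, 0.922]},
}

# Update schedule stated in the table footnote.
INITIAL_PATIENTS = 30_000
UPDATE_SIZE = 30_000
N_UPDATES = 4


def cumulative_sizes(initial, step, n_updates):
    """Dataset size after the initial load and after each of n_updates additions."""
    return [initial + k * step for k in range(n_updates + 1)]


def f1(precision, recall):
    """Harmonic mean of precision and recall (derived; not reported in Table 2)."""
    return 2 * precision * recall / (precision + recall)


def best_model(dataset, metric):
    """Name of the model with the highest value of `metric` on `dataset`."""
    values = TABLE2[dataset][metric]
    return MODELS[values.index(max(values))]


if __name__ == "__main__":
    sizes = cumulative_sizes(INITIAL_PATIENTS, UPDATE_SIZE, N_UPDATES)
    # The footnote's schedule should reproduce the Patients column.
    assert sizes == [row["patients"] for row in TABLE2.values()]
    for name, row in TABLE2.items():
        leaders = {metric: best_model(name, metric) for metric in METRICS}
        f1_scores = {m: round(f1(p, r), 3)
                     for m, p, r in zip(MODELS, row["Precision"], row["Recall"])}
        print(name, leaders, f1_scores)
```

Run as a script, it confirms that 30 K patients plus four 30 K updates yields the pt30k through pt150k sizes, and it shows that XGBoost has the highest recall, AUC, and accuracy at every update, while SVM (pt30k to pt90k) or RF (pt120k, pt150k) has the highest precision.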