Table 2.
Continuous update of patient datasets and improvement of XGBoost base models for stroke risk prediction in the simulated learning health system.
| Dataset/metrics | Patients | Variables | XGBoost | RF | SVM | KNN |
|---|---|---|---|---|---|---|
| pt30k | 30 K | 58 | | | | |
| Recall | | | 0.792 | 0.708 | 0.711 | 0.626 |
| Precision | | | 0.898 | 0.920 | 0.938 | 0.906 |
| AUC | | | 0.881 | 0.844 | 0.848 | 0.802 |
| Accuracy | | | 0.925 | 0.911 | 0.915 | 0.889 |
| pt60k | 60 K | 83 | | | | |
| Recall | | | 0.827 | 0.746 | 0.787 | 0.640 |
| Precision | | | 0.922 | 0.949 | 0.955 | 0.936 |
| AUC | | | 0.901 | 0.866 | 0.887 | 0.813 |
| Accuracy | | | 0.938 | 0.925 | 0.936 | 0.898 |
| pt90k | 90 K | 104 | | | | |
| Recall | | | 0.843 | 0.760 | 0.807 | 0.646 |
| Precision | | | 0.932 | 0.958 | 0.961 | 0.944 |
| AUC | | | 0.911 | 0.875 | 0.898 | 0.817 |
| Accuracy | | | 0.945 | 0.931 | 0.943 | 0.901 |
| pt120k | 120 K | 117 | | | | |
| Recall | | | 0.892 | 0.838 | 0.871 | 0.698 |
| Precision | | | 0.964 | 0.982 | 0.976 | 0.960 |
| AUC | | | 0.940 | 0.916 | 0.932 | 0.844 |
| Accuracy | | | 0.964 | 0.955 | 0.962 | 0.916 |
| pt150k | 150 K | 124 | | | | |
| Recall | | | 0.908 | 0.855 | 0.888 | 0.720 |
| Precision | | | 0.964 | 0.987 | 0.977 | 0.957 |
| AUC | | | 0.948 | 0.925 | 0.940 | 0.855 |
| Accuracy | | | 0.969 | 0.961 | 0.967 | 0.922 |
Initial dataset: 30 K patients, followed by 4 data updates, each adding 30 K patients. At each data update, the XGBoost base model was also compared with RF, SVM, and KNN base models.
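
For readers who want to reproduce the style of evaluation summarized in the table, the following is a minimal sketch, not the authors' code. It shows the cumulative dataset sizes implied by the update schedule and how recall, precision, and accuracy are conventionally computed from binary predictions. The function names `cumulative_sizes` and `classification_metrics` and the label encoding (1 = stroke, 0 = no stroke) are assumptions made for illustration. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def cumulative_sizes(initial=30_000, batch=30_000, updates=4):
    """Dataset size after the initial load and after each update.

    Hypothetical helper: the defaults mirror the schedule described in the
    table note (30 K initial patients, 4 updates of 30 K patients each).
    """
    return [initial + i * batch for i in range(updates + 1)]


def classification_metrics(y_true, y_pred):
    """Recall, precision, and accuracy for binary labels.

    Assumes 1 marks the positive class (stroke) and 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is absent.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


if __name__ == "__main__":
    print(cumulative_sizes())
    # Toy labels only; these are not data from the study.
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1]))
```

In a full pipeline, each model would be retrained on the cumulative dataset at every update and scored on held-out patients with a function like `classification_metrics`. The split strategy and hyperparameters are not given in this table, so they are left out here.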