2022 Oct 26;12:17917. doi: 10.1038/s41598-022-23011-4

Table 2.

Continuous update of patient datasets and improvement of XGBoost base models for stroke risk prediction in the simulated learning health system.

Dataset/metrics	Patients	Variables	XGBoost	RF	SVM	KNN
pt30k	30 K	58
Recall			0.792	0.708	0.711	0.626
Precision			0.898	0.920	0.938	0.906
AUC			0.881	0.844	0.848	0.802
Accuracy			0.925	0.911	0.915	0.889
pt60k	60 K	83
Recall			0.827	0.746	0.787	0.640
Precision			0.922	0.949	0.955	0.936
AUC			0.901	0.866	0.887	0.813
Accuracy			0.938	0.925	0.936	0.898
pt90k	90 K	104
Recall			0.843	0.760	0.807	0.646
Precision			0.932	0.958	0.961	0.944
AUC			0.911	0.875	0.898	0.817
Accuracy			0.945	0.931	0.943	0.901
pt120k	120 K	117
Recall			0.892	0.838	0.871	0.698
Precision			0.964	0.982	0.976	0.960
AUC			0.940	0.916	0.932	0.844
Accuracy			0.964	0.955	0.962	0.916
pt150k	150 K	124
Recall			0.908	0.855	0.888	0.720
Precision			0.964	0.987	0.977	0.957
AUC			0.948	0.925	0.940	0.855
Accuracy			0.969	0.961	0.967	0.922

Initial dataset: 30 K patients, followed by 4 data updates, each adding 30 K patients. At the time of each data update, the XGBoost base model was also compared with random forest (RF), support vector machine (SVM) and k-nearest neighbors (KNN) base models.
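
The update schedule and the XGBoost column above can be re-expressed as a short Python sketch. This is only an illustration under stated assumptions: the variable and function names are hypothetical, the recall and precision values are copied from the table, and the F1 score is a derived quantity (the harmonic mean of precision and recall) that the table itself does not report.

```python
# Illustrative sketch only; names are hypothetical, not from the paper.
# Recall/precision values are copied from the XGBoost column of Table 2.

INITIAL_PATIENTS = 30_000   # initial dataset size
UPDATE_SIZE = 30_000        # patients added per data update
NUM_UPDATES = 4             # number of data updates

xgb_metrics = {
    "pt30k":  {"recall": 0.792, "precision": 0.898},
    "pt60k":  {"recall": 0.827, "precision": 0.922},
    "pt90k":  {"recall": 0.843, "precision": 0.932},
    "pt120k": {"recall": 0.892, "precision": 0.964},
    "pt150k": {"recall": 0.908, "precision": 0.964},
}


def cumulative_sizes(initial, step, updates):
    """Dataset size after the initial load and after each update."""
    return [initial + i * step for i in range(updates + 1)]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (not reported in the table)."""
    return 2 * precision * recall / (precision + recall)


sizes = cumulative_sizes(INITIAL_PATIENTS, UPDATE_SIZE, NUM_UPDATES)

# Change in XGBoost recall between the initial and the final dataset.
recall_gain = round(
    xgb_metrics["pt150k"]["recall"] - xgb_metrics["pt30k"]["recall"], 3
)

# Derived F1 per dataset, rounded to three decimals like the table.
f1_by_dataset = {
    name: round(f1_score(m["precision"], m["recall"]), 3)
    for name, m in xgb_metrics.items()
}

for (name, m), n in zip(xgb_metrics.items(), sizes):
    print(f"{name}: {n} patients, recall={m['recall']}, F1={f1_by_dataset[name]}")
print(f"Recall gain from pt30k to pt150k: {recall_gain}")
```

The same pattern could be applied to the RF, SVM and KNN columns by adding their values to a similar dictionary.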