2022 Jun 2;2:871630. doi: 10.3389/fepid.2022.871630

Table 3.

Performance of the models based on derivation set variations compared with the reference model in Dutch primary care EHR data (n = 89,491).

		Derivation set characteristics**			Performance metrics**
Data preparation challenge	Derivation set variation description	Sample size (range)	Percentage events (range)	Median follow-up time (days; range)	C-statistic (95% CI)	Calibration curve intercept (95% CI)	Calibration curve slope (95% CI)
Reference derivation set*	NA	62,644 (62,557–62,730)	7.5 (7.5–7.6)	2,912 (2,904–2,920)	0.67 (0.67–0.67)	0.00 (−0.01 to 0.00)	1.00 (1.00–1.01)
Run-in variations	2 years run-in	58,168 (58,098–58,236)	7.0 (7.0–7.1)	2,832 (2,832–2,832)	0.67 (0.67–0.67)	0.00 (−0.01 to 0.00)	1.00 (0.99–1.00)
	3 years run-in	54,958 (54,884–55,031)	6.4 (6.4–6.5)	2,833 (2,833–2,833)	0.67 (0.67–0.67)	0.02 (0.01 to 0.03)	1.02 (1.01–1.03)
Variations in outcome definition	ATC (excl. ASA) or ICPC	63,376 (63,301–63,448)	5.1 (5.1–5.2)	2,933 (2,925–2,940)	0.67 (0.67–0.67)	−0.40 (−0.41 to −0.40)	0.67 (0.66–0.67)
	ATC only	63,518 (63,436–63,597)	7.5 (7.4–7.5)	2,916 (2,909–2,922)	0.68 (0.68–0.68)	−0.01 (−0.02 to 0.00)	0.99 (0.99–1.00)
	ATC (excl. ASA) only	64,739 (64,662–64,819)	4.6 (4.5–4.6)	2,968 (2,956–2,979)	0.68 (0.68–0.68)	−0.52 (−0.53 to −0.51)	0.59 (0.59–0.60)
	ICPC only	64,089 (63,998–64,180)	3.4 (3.3–3.4)	3,025 (3,010–3,040)	0.66 (0.66–0.66)	−0.84 (−0.85 to −0.83)	0.43 (0.43–0.44)
Missing data method variations	Complete Case	7,601 (7,573–7,629)	11.4 (11.3–11.5)	2,425 (2,409–2,442)	0.62 (0.62–0.62)	0.53 (0.51 to 0.54)	1.69 (1.67–1.71)
	Mean imputation	62,548 (62,478–62,618)	7.5 (7.5–7.6)	2,910 (2,901–2,918)	0.66 (0.66–0.66)	0.01 (0.00 to 0.02)	1.01 (1.00–1.02)

ASA, acetylsalicylic acid; ICPC, International Classification of Primary Care diagnosis codes; ATC, Anatomical Therapeutic Chemical medication codes.

*The reference derivation and validation sets are defined by a 1-year run-in, imputation using MICE, and an outcome definition based on ICPC or ATC codes (including aspirin).

**Derivation set characteristics and performance metrics are given as the average across 50 bootstrap samples.
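
As an illustration of how a "mean (range)" entry such as those in the derivation set columns could be produced, the following minimal sketch resamples a binary event indicator with replacement and summarises the percentage of events across 50 bootstrap samples. It is not the authors' code: the toy data, the function names (`bootstrap_event_percentages`, `summarize`), and the reading of "range" as minimum to maximum are assumptions made for this example only.

```python
import random
import statistics


def summarize(values, decimals=1):
    """Format one metric measured on each bootstrap sample as 'mean (min–max)'.

    Assumption: 'range' is taken to mean minimum to maximum.
    """
    mean = statistics.fmean(values)
    low, high = min(values), max(values)
    return f"{mean:,.{decimals}f} ({low:,.{decimals}f}–{high:,.{decimals}f})"


def bootstrap_event_percentages(events, n_boot=50, seed=0):
    """Resample a 0/1 event indicator with replacement n_boot times and
    return the percentage of events observed in each bootstrap sample."""
    rng = random.Random(seed)
    n = len(events)
    percentages = []
    for _ in range(n_boot):
        sample = rng.choices(events, k=n)  # same size as the original data
        percentages.append(100 * sum(sample) / n)
    return percentages


# Hypothetical toy cohort: 1,000 patients, 75 of whom experience the event.
events = [1] * 75 + [0] * 925
pcts = bootstrap_event_percentages(events)
print(summarize(pcts))
```

The same `summarize` helper could be applied to any per-sample quantity (sample size, median follow-up time) once it has been computed on each bootstrap sample; the 95% confidence intervals reported for the performance metrics would need a separate calculation that is not shown here.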