2021 Mar 29;19:80. doi: 10.1186/s12916-021-01953-2

Table 4.

Model performances across SYSU1, SYSU2, SZPH, and TCGA testing sets

Metric / Cohort	LUAD	LUSC	SCLC	PTB	OP	NL	Macro-avg
Precision
SYSU1	0.80	0.75	1.00	0.89	1.00	1.00	0.91
SYSU2	0.85	0.88	0.79	0.80	0.88	0.96	0.86
SZPH^a	0.97	0.84	0.94	–	–	1.00	0.94
TCGA^b	0.82	0.70	–	–	–	1.00	0.84
Macro-avg	0.86	0.79	0.91	0.85	0.94	0.99*	0.89
Recall
SYSU1	1.00	0.75	0.77	0.80	0.60	0.93	0.81
SYSU2	0.84	0.72	0.94	0.93	0.84	0.95	0.87
SZPH^a	0.93	0.97	0.67	–	–	0.91	0.87
TCGA^b	0.68	0.94	–	–	–	0.78	0.80
Macro-avg	0.86	0.85	0.79	0.87	0.72	0.89*	0.84
F1-score
SYSU1	0.89	0.75	0.87	0.84	0.75	0.96	0.84
SYSU2	0.85	0.79	0.86	0.86	0.86	0.95	0.86
SZPH^a	0.95	0.90	0.78	–	–	0.95	0.90
TCGA^b	0.74	0.80	–	–	–	0.88	0.80
Macro-avg	0.86	0.81	0.84	0.85	0.81	0.94*	0.85

^aFor the SZPH dataset, no PTB or OP WSIs were available

^bFor the TCGA dataset, only LUAD, LUSC, and NL WSIs were available

*Maximum Macro-avg value across the datasets of different diseases

Bold font: Maximum value of specific metrics across different data cohorts
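
The Macro-avg column appears to be the unweighted mean of the per-class values that are available for each cohort, with classes marked "–" left out. A minimal sketch of that calculation follows, using the Precision rows from the table. The function name `macro_avg` and the use of `None` for missing classes are illustrative assumptions, not taken from the paper, and the results should agree with the reported column only up to rounding.

```python
from statistics import mean
from typing import Optional, Sequence


def macro_avg(values: Sequence[Optional[float]]) -> float:
    """Unweighted mean over the classes that have a reported value.

    Hypothetical helper: None marks a class with no slides in the cohort.
    """
    available = [v for v in values if v is not None]
    if not available:
        raise ValueError("no values to average")
    return mean(available)


# Precision per class (LUAD, LUSC, SCLC, PTB, OP, NL), copied from the table.
precision = {
    "SYSU1": [0.80, 0.75, 1.00, 0.89, 1.00, 1.00],
    "SYSU2": [0.85, 0.88, 0.79, 0.80, 0.88, 0.96],
    "SZPH": [0.97, 0.84, 0.94, None, None, 1.00],
    "TCGA": [0.82, 0.70, None, None, None, 1.00],
}

for cohort, row in precision.items():
    # Two decimals, to match the precision used in the table.
    print(cohort, round(macro_avg(row), 2))
```

The same helper applied to a column (for example, LUAD precision across the four cohorts) gives the values in the Macro-avg rows, again assuming a plain unweighted mean.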