2023 Jul 6;36(7):1107–1120. doi: 10.1021/acs.chemrestox.3c00086

Table 8. Model Performance Comparison between Single-Task and Multi-Task Models on the External Test Set.

Endpoint	Method	Pseudolabel tasks	MCC	F1 score	Recall	Precision	Specificity
Overall	Single-task NN	-	0.48	0.54	0.68	0.45	0.88
Overall	Multi-task NN	-	0.50	0.56	0.71	0.47	0.88
Overall	Multi-task NN	All pseudolabels	0.53	0.59	0.74	0.48	0.89
Overall	Multi-task NN	Subset pseudolabels	0.56	0.61	0.82	0.48	0.88

Membrane potential	Single-task NN	-	0.57	0.62	0.66	0.59	0.94
Membrane potential	Multi-task NN	-	0.54	0.60	0.68	0.54	0.92
Membrane potential	Multi-task NN	All pseudolabels	0.56	0.61	0.73	0.53	0.90
Membrane potential	Multi-task NN	Subset pseudolabels	0.56	0.62	0.73	0.53	0.91
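
The five metrics reported in Table 8 are all derived from the counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names `binary_metrics` and `f1_from_precision_recall` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute MCC, F1, recall, precision and specificity from
    confusion-matrix counts (standard textbook definitions).

    Assumes at least one actual positive, one actual negative and one
    predicted positive, so the ratio denominators are non-zero.
    """
    recall = tp / (tp + fn)            # true positive rate (sensitivity)
    precision = tp / (tp + fp)         # positive predictive value
    specificity = tn / (tn + fp)       # true negative rate
    f1 = 2 * precision * recall / (precision + recall)

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "MCC": mcc,
        "F1 score": f1,
        "Recall": recall,
        "Precision": precision,
        "Specificity": specificity,
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in binary_metrics(tp=8, fp=2, tn=85, fn=5).items():
        print(f"{name}: {value:.2f}")
```

Because F1 depends only on precision and recall, the tabulated values can be cross-checked against each other: the single-task overall row (precision 0.45, recall 0.68) gives F1 ≈ 0.54, which matches the reported value. Small deviations of about 0.01 in other rows are expected, since the tabulated precision and recall are themselves rounded to two decimals.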