Table 8. Model Performance Comparison between Single-Task and Multi-Task Models on the External Test Set.
| Endpoint | Method | Pseudolabel tasks | MCC | F1 score | Recall | Precision | Specificity |
|---|---|---|---|---|---|---|---|
| Overall | Single-task NN | - | 0.48 | 0.54 | 0.68 | 0.45 | 0.88 |
| Overall | Multi-task NN | - | 0.50 | 0.56 | 0.71 | 0.47 | 0.88 |
| Overall | Multi-task NN | All pseudolabels | 0.53 | 0.59 | 0.74 | 0.48 | 0.89 |
| Overall | Multi-task NN | Subset pseudolabels | 0.56 | 0.61 | 0.82 | 0.48 | 0.88 |
| Membrane potential | Single-task NN | - | 0.57 | 0.62 | 0.66 | 0.59 | 0.94 |
| Membrane potential | Multi-task NN | - | 0.54 | 0.60 | 0.68 | 0.54 | 0.92 |
| Membrane potential | Multi-task NN | All pseudolabels | 0.56 | 0.61 | 0.73 | 0.53 | 0.90 |
| Membrane potential | Multi-task NN | Subset pseudolabels | 0.56 | 0.62 | 0.73 | 0.53 | 0.91 |
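
For readers who want to reproduce the columns of Table 8, the sketch below computes the five reported metrics from the counts of a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study. The zero-denominator fallback of 0.0 is an assumption made here, not a documented choice of the authors.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute MCC, F1, recall, precision, and specificity from
    confusion-matrix counts (hypothetical helper, standard definitions)."""
    # Recall (sensitivity): fraction of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Specificity: fraction of true negatives that were found.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient; set to 0.0 when undefined (assumption).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "mcc": mcc,
        "f1": f1,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not the study's data.
    metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Because MCC uses all four cells of the confusion matrix, it is less sensitive to class imbalance than F1, which ignores true negatives. This may explain why the two columns do not always rank the methods identically.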