2023 Dec 9;39(12):btad743. doi: 10.1093/bioinformatics/btad743

Table 1.

Effect of input features.^a

Input features	TPP2 AUROC	TPP2 AP	TPP3 AUROC	TPP3 AP
1. $α β$ (CDR3)	0.830 $\pm$ 0.000	0.574 $\pm$ 0.000	0.513 $\pm$ 0.008	0.179 $\pm$ 0.003
2. $α β$ (CDR3) + VJ	0.891 $\pm$ 0.000	0.665 $\pm$ 0.001	0.548 $\pm$ 0.007	0.192 $\pm$ 0.004
3. $α β$ (CDR3) + MHC	0.837 $\pm$ 0.000	0.583 $\pm$ 0.000	0.611 $\pm$ 0.002	0.243 $\pm$ 0.002
4. $α β$ (CDR3) + VJ + MHC	0.897 $\pm$ 0.000	0.676 $\pm$ 0.000	0.692 $\pm$ 0.007	0.289 $\pm$ 0.006
5. $α β$ (long)	0.888 $\pm$ 0.000	0.663 $\pm$ 0.000	0.528 $\pm$ 0.008	0.191 $\pm$ 0.004
6. $α β$ (long) + MHC	0.893 $\pm$ 0.000	0.674 $\pm$ 0.000	0.682 $\pm$ 0.010	0.284 $\pm$ 0.007
7. $α β$ (long) + VJ + MHC	0.906 $\pm$ 0.000	0.698 $\pm$ 0.000	0.691 $\pm$ 0.008	0.291 $\pm$ 0.005
8. $α β$ (long) + VJ + MHC [ $D_{α β, α, β}$ ]	0.906 $\pm$ 0.000	0.691 $\pm$ 0.001	0.693 $\pm$ 0.008	0.294 $\pm$ 0.007

^a The model was trained on $D_{α β, β}$ using different subsets of the input features. Here, CDR3 and long in parentheses denote the context used for the ProtBERT embeddings, and VJ and MHC indicate whether the respective categorical features were used. We also evaluated the model on $D_{α β, α, β}$, which additionally contains data points that have only the $α$ chain but not the $β$ chain (row 8). Reported values are the mean over five 10-fold cross-validation runs together with the standard error. The values corresponding to the best-performing configurations are bolded.
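
The "mean $\pm$ standard error" entries can be illustrated with a minimal sketch. It assumes that each cross-validation run yields one score per metric and that the standard error is the sample standard deviation of those run-level scores divided by the square root of the number of runs; the source does not state the exact formula. The function name `mean_and_standard_error` and the example scores are hypothetical placeholders, not values from the study.

```python
import math
import statistics


def mean_and_standard_error(run_scores):
    """Return (mean, standard error of the mean) for per-run scores.

    Assumption: the standard error is the sample standard deviation
    (n - 1 denominator) divided by sqrt(n). The paper may use a
    different convention.
    """
    n = len(run_scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(run_scores)
    sem = statistics.stdev(run_scores) / math.sqrt(n)
    return mean, sem


# Hypothetical AUROC values from five cross-validation runs
# (illustrative placeholders only, not taken from the table).
example_auroc = [0.690, 0.700, 0.680, 0.710, 0.695]
mean, sem = mean_and_standard_error(example_auroc)
print(f"{mean:.3f} ± {sem:.3f}")
```

With only five runs the standard error is itself a rough estimate, so small differences between rows that fall within one or two standard errors should be read cautiously.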