Table 1.
MHC associated with peptides in each dataset.
MHC | PDB (all) | PDB (ML) | Expt | Dash | 10X | Atlas | NewVdj |
---|---|---|---|---|---|---|---|
A01 | 2 | 6 | | | 27 (25) | | |
A02 | 71 | 211 | 98 (98) | 6,561 (5,754) | 1,679 (1,422) | 303 (302) | 5,510 (4,812) |
A03 | | | | | 3,377 (2,922) | | 145 (126) |
A11 | 2 | 5 | | | 1,908 (1,673) | | |
A24 | 5 | 15 | | | 287 (239) | 4 (4) | |
A25 | | | | | | | 145 (126) |
A68 | | | | | | | 435 (253) |
B07 | 2 | | | | 34 (29) | | 290 (254) |
B08 | 4 | 10 | | | 273 (254) | 55 (55) | |
B27 | 2 | 7 | | | | 1 (1) | |
B35 | 16 | 33 | 28 (28) | | 2 (2) | 67 (14) | 145 (126) |
B37 | 1 | 1 | | | | | |
B38 | | | | | | | 290 (127) |
B44 | 3 | 11 | | | | 6 (6) | 145 (127) |
B51 | 1 | 2 | | | | | |
B57 | 1 | 5 | | | | | |
DQ | 10 | | | | | 34 (0) | |
DR | 14 | | | | | 9 (0) | |
E | 3 | 7 | | | | 7 (7) | |
H-2D | 11 | 40 | | 6,561 (5,434) | | | |
H-2K | 7 | 13 | | 8,749 (7,423) | | 4 (4) | |
H-2L | 12 | 38 | | | | | |
IA | 5 | | | | | 1 (0) | |
IE | 7 | | | | | 20 (0) | |
Total | 179 | 404 | 126 (126) | 21,871 (18,611) | 7,587 (6,566) | 511 (393) | 7,105 (5,951) |
For each set analysed, the number of complexes with each MHC gene is shown. PDB (all) is used in the first section of the paper to analyse contacts, whilst PDB (ML) is the set used for the supervised learning. The numbers for PDB (ML), Expt, Dash, Atlas, and NewVdj represent the starting set of sequences submitted to the pipeline after removing duplicates; this set includes both binding and non-binding complexes, as well as sequences that could not be modelled by TCRpMHCmodels. The numbers in brackets give the number of structures for which prediction was successful.