2021 Sep 8;12:730908. doi: 10.3389/fphys.2021.730908

Table 1.

MHC associated with peptides in each dataset.

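| MHC | PDB (all) | PDB (ML) | Expt | Dash | 10X | Atlas | NewVdj |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A01 | 2 | 6 | | | 27 (25) | | |
| A02 | 71 | 211 | 98 (98) | 6,561 (5,754) | 1,679 (1,422) | 303 (302) | 5,510 (4,812) |
| A03 | | | | | 3,377 (2,922) | | 145 (126) |
| A11 | 2 | 5 | | | 1,908 (1,673) | | |
| A24 | 5 | 15 | | | 287 (239) | 4 (4) | |
| A25 | | | | | | | 145 (126) |
| A68 | | | | | | | 435 (253) |
| B07 | 2 | | | | 34 (29) | | 290 (254) |
| B08 | 4 | 10 | | | 273 (254) | 55 (55) | |
| B27 | 2 | 7 | | | | 1 (1) | |
| B35 | 16 | 33 | 28 (28) | | 2 (2) | 67 (14) | 145 (126) |
| B37 | 1 | 1 | | | | | |
| B38 | | | | | | | 290 (127) |
| B44 | 3 | 11 | | | | 6 (6) | 145 (127) |
| B51 | 1 | 2 | | | | | |
| B57 | 1 | 5 | | | | | |
| DQ | 10 | | | | | 34 (0) | |
| DR | 14 | | | | | 9 (0) | |
| E | 3 | 7 | | | | 7 (7) | |
| H-2D | 11 | 40 | | 6,561 (5,434) | | | |
| H-2K | 7 | 13 | | 8,749 (7,423) | | 4 (4) | |
| H-2L | 12 | 38 | | | | | |
| IA | 5 | | | | | 1 (0) | |
| IE | 7 | | | | | 20 (0) | |
| Total | 179 | 404 | 126 (126) | 21,871 (18,611) | 7,587 (6,566) | 511 (393) | 7,105 (5,951) |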

For each dataset analysed, the number of complexes with each MHC gene is shown. PDB (all) is used in the first section of the paper to analyse contacts, whilst PDB (ML) is the set used for supervised learning. The numbers for PDB (ML), Expt, Dash, Atlas, and NewVdj are the starting sets of sequences submitted to the pipeline after removing duplicates; they include both binding and non-binding complexes, as well as sequences that could not be modelled by TCRpMHCmodels. The numbers in brackets are the numbers of structures for which prediction was successful.
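The "submitted (successful)" cells can be turned into a modelling success fraction per dataset. The following is a minimal sketch, not part of the original pipeline: the names `TOTALS`, `parse_cell`, and `success_rate` are hypothetical, and the values are copied from the Total row of Table 1.

```python
import re

# Total row of Table 1, copied as "submitted (successful)" strings.
# The dictionary name and layout are illustrative assumptions.
TOTALS = {
    "Expt": "126 (126)",
    "Dash": "21,871 (18,611)",
    "10X": "7,587 (6,566)",
    "Atlas": "511 (393)",
    "NewVdj": "7,105 (5,951)",
}

# Matches cells such as "6,561 (5,754)": a count, then a count in brackets.
CELL = re.compile(r"^\s*([\d,]+)\s*\(([\d,]+)\)\s*$")


def parse_cell(cell):
    """Split 'N (M)' into (submitted, successful) integers."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    submitted, successful = (int(g.replace(",", "")) for g in match.groups())
    return submitted, successful


def success_rate(cell):
    """Fraction of submitted sequences for which a structure was predicted."""
    submitted, successful = parse_cell(cell)
    return successful / submitted


if __name__ == "__main__":
    for name, cell in TOTALS.items():
        print(f"{name}: {success_rate(cell):.1%}")
```

The same two functions apply unchanged to any individual MHC row, since every bracketed cell in the table follows the same format.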