Skip to main content
. 2022 Nov 24;13:7238. doi: 10.1038/s41467-022-34904-3

Fig. 6. HLA prediction model built on AlphaPeptDeep functionalities.

Fig. 6

a The pipeline with the HLA prediction model to extract potential HLA peptides from the proteome databases. The HLA model is a binary classifier that predicts if a given sequence is a potentially presented HLA sequence. b Our HLA prediction model boosts the number of identified HLA-I peptides compared to other tools. Cell line HLA data from RA957 with sequence lengths from 8 to 14 were used. The top bar plots show the number of identified unique sequences of HLA allele types for each search method. The bottom bar plots the relative frequency of these HLA allele types. ‘Trash’ means the peptides cannot be assigned to any HLA allele types by MixMHCpred at 5% rank level. ‘AlphaPeptDeep lib’ (red) refers to the library predicted by the sample-specific HLA model and our MS2 and RT models. The bars represent DDA data analyzed by MaxQuant, and the DIA data analyzed by DIA-Umpire, or PEAKS-Online including de novo sequencing. AlphaPeptDeep with transfer learning for the sample-specific HLA library clearly outperforms these. Training on the sample-specific peptides without transfer learning obtained similar number of identified HLA peptides (“no pretrain” in (b)). The results of omitting the dominant HLA-A*68:01 (A6801) HLA type in the pan-model and using transfer learning with including 1000 or 100 of these peptides identified by direct-DIA from the data are shown in the last two bars of the A6801 type (see arrows in the panel). RT retention time, CCS collision cross section, MS2 intensities of fragment spectra, HLA human leukocyte antigen, DDA data-dependent acquisition, DIA data-independent acquisition.