2021 Jun 1;9(6):579. doi: 10.3390/vaccines9060579

Table 2.

Machine learning methods to predict vaccine immunogenicity and efficacy. Different machine learning algorithms can be used. The quality of each model must be evaluated, and several metrics can assess a model's performance. Examples are accuracy (the number of correct predictions divided by the total number of input data), the area under the receiver operating characteristic curve (AUROC), and, for regression models, the root mean squared error. The appropriate metric depends on the machine learning method. (Ab, antibody; ClaNC, classification to nearest centroid; DAMIP, discriminant analysis via mixed integer programming; HAI, hemagglutination inhibition; CHMI, controlled human malaria infection; ICS, intracellular cytokine staining; LN, lymph node; PCA, principal component analysis; PnPS, pneumococcal polysaccharide; * accuracy unless otherwise stated).

| Vaccine | Vaccinees | Predicted Responses | Predictors | Machine Learning Method | Performance * | Reference |
|---|---|---|---|---|---|---|
| Yellow fever vaccine (YF-17D) | Healthy adults | Magnitude of the activated CD8+ T-cell and neutralizing Ab responses | Early blood transcriptional signatures | ClaNC and DAMIP | Up to 90% and 100%, respectively | [52] |
| Seasonal trivalent inactivated influenza vaccine (TIV) | Patients aged 50–89 years with multiple chronic medical conditions | Magnitude of the plasma HAI Ab response | Baseline signatures among 26 continuous or categorical input variables, incl. previous vaccination, low-grade chronic inflammation, chronic infections and blood cell counts | Neural networks (multilayer perceptron (MLP), radial basis function network (RBFN) and probabilistic neural network (PNN)) and logistic regression | 72.5% average hit rate across 10 samples | [184] |
| Seasonal TIV | Healthy adults | Magnitude of the plasma HAI Ab response | Early blood transcriptional signatures | DAMIP | Up to 90% | [185] |
| Seasonal TIV | Healthy adults, incl. young (20–30 years) and older (60–89 years) subjects | Magnitude of the plasma HAI Ab response | Baseline blood transcriptional, cytokine and cell population signatures | Logistic regression | 84% | [178] |
| Seasonal TIV and pandemic H1N1 (pH1N1) vaccine | Healthy adults | Magnitude of the Ab response | Baseline HAI titer, blood cell population, transcript and pathway signatures | Diagonal linear discriminant analysis (for cell frequency data, and for combined cell frequency and pathway status); or partial least squares (to reduce data dimensionality given the large number of genes) followed by linear discriminant analysis (PLS-LDA) for transcript data alone | AUROC 0.86 | [60] |
| Seasonal TIV over 5 seasons | Adults, incl. elderly (>65 years) | Magnitude of the plasma HAI Ab response | Early blood transcriptional signatures | DAMIP and artificial neural network classifier | >80% | [10] |
| Seasonal TIV | Healthy adults (50–74 years) | Magnitude of the B-cell ELISPOT and plasma HAI Ab responses | Early blood cell composition, mRNA-Seq and DNA methylation signatures | Ensemble learner (incl. generalized linear models and recursive partitioning and regression trees) and random forest models | AUROC 0.64–0.79 | [186] |
| Seasonal TIV | Healthy adults | Magnitude of the plasma HAI Ab response | Baseline HAI titer and blood transcriptional signatures | Gaussian mixture model (GMM) | R² = 0.64 for the correlation between observed and predicted data | [187] |
| Seasonal TIV | Healthy adults | Magnitude of the Ab response | Early blood transcriptional signatures | Logistic multiple network-constrained regression | 69% | [188] |
| Seasonal TIV over 8 seasons | Healthy adults | Magnitude of the specific Ab response | Baseline blood cell population signatures | 128 machine learning algorithms suitable for classification, run with Sequential Iterative Modeling “OverNight” (SIMON), incl. diagonal discriminant analysis, partial least squares, linear discriminant analysis, logic regression, neural network and random forest | AUROC up to 0.92 | [179] |
| Seasonal TIV given transcutaneously, intradermally or intramuscularly | Healthy adults | Magnitude of the specific CD8+ T-cell and Ab responses | Early blood transcriptional and serum cytokine signatures | Logistic regression | AUROC 0.93–0.96 | [189] |
| Seasonal TIV and 23-valent pneumococcal polysaccharide vaccine | Older patients (>65 years) with chronic kidney disease, with or without dialysis | Magnitude of the HAI Ab and anti-PnPS IgG responses | Baseline signatures among 30 continuous or categorical input variables, incl. previous vaccinations, low-grade chronic inflammation, chronic infections and blood cell counts | Multivariable linear regression model | p < 0.05 | [190] |
| RTS,S malaria vaccine | Healthy adults | Protection against CHMI | Early blood transcriptional signatures | DAMIP | >80% | [181] |
| Candidate malaria vaccine composed of self-assembling protein nanoparticles presenting the malarial circumsporozoite protein (CSP), adjuvanted with one of three liposomal formulations: liposome plus alum, liposome plus QS21, or both | Rhesus macaques | Adjuvant condition | Vaccine-induced immune response signatures among many variables, incl. serology, FluoroSpot and ICS from blood, liver, LN and spleen | Random forest followed by linear regression analysis | 92% | [32] |
| Live attenuated varicella zoster virus (VZV) vaccine | Healthy adults, incl. younger (25–40 years) and older (60–79 years) subjects | Magnitude of the specific T-cell and IgG responses | Early blood transcriptional, metabolite cluster, cytokine and cell population signatures | Multivariate regression model (partial least squares) | p < 0.05 | [180] |
| Monovalent oral polio vaccine type 3 (mOPV3) | Infants aged 6–11 months | Seroconversion or shedding of vaccine virus as a marker of vaccine “take” | Baseline enteric pathogen, blood cell population and plasma cytokine signatures | Random forest | 58% | [191] |
| Two distinct live attenuated tularemia vaccines administered by scarification | Healthy humans | Magnitude of the specific Ab and activated CD4 and CD8 T-cell responses | Early blood transcriptional signatures | Logistic regression | 26% mean misclassification error | [39] |
| rVSV-ZEBOV | Healthy adults | Magnitude of the Ab response | Early blood transcriptional, plasma cytokine and cell population signatures | Sparse partial least squares followed by multivariable linear regression | Leave-one-out root square residuals of 0.77, explaining 55% of the variability | [12] |
| DNA/rAd5 HIV-1 preventive candidate vaccine | Healthy adults | HIV infection | Magnitude and quality of CD4 and CD8 T-cell responses | PCA followed by a Cox proportional hazards regression model, and logistic regression with lasso | AUROC up to 0.75 | [192] |
| Seven preventive HIV-1 vaccine regimens (incl. DNA, NYVAC, ALVAC, MVA, AIDSVAX) | Healthy adults | Magnitude of long-term immune responses | Baseline demographic variables and peak immune responses | Regularized random forest and linear regression models | R = 0.91 for the correlation between observed and predicted data | [193] |
| 41 different vaccine vectors, all expressing the same antigen | Mice | Quality of late T-cell responses | Early dendritic cell transcriptome | Random forest | Up to 98% | [194] |
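The caption lists three common performance metrics: accuracy, AUROC and root mean squared error. The sketch below computes them in plain Python using only the standard library. The function names and the example data are hypothetical and do not come from any of the cited studies. AUROC is computed through its equivalence with the Mann-Whitney U statistic. That statistic is the probability that a randomly chosen responder (label 1) receives a higher score than a randomly chosen non-responder (label 0).

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Number of correct predictions divided by the total number of inputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all (positive, negative) pairs, how often the positive
    is scored higher; tied scores count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical responder (1) / non-responder (0) labels and model scores
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.35, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    print(f"AUROC    = {auroc(y_true, scores):.3f}")
    # Hypothetical observed vs predicted titers for a regression model
    print(f"RMSE     = {rmse([2.0, 3.0, 4.0], [2.5, 3.0, 3.0]):.3f}")
```

Accuracy depends on the chosen score cut-off, while AUROC summarizes ranking quality over all cut-offs. This helps explain why several studies in the table report AUROC rather than a single accuracy figure.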
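Some entries report a mean misclassification error [39] or a leave-one-out estimate [12]. The sketch below estimates a leave-one-out misclassification error for a plain nearest-centroid classifier, loosely in the spirit of the ClaNC entry [52]. It is a simplification under stated assumptions. The published ClaNC method also performs class-specific feature selection, which this sketch omits. It uses unweighted Euclidean distance, and the function names, labels and two-feature data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, List[float]]:
    """Compute the mean feature vector (centroid) of each class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        for j, value in enumerate(x):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def loo_misclassification_error(X: Sequence[Vector], y: Sequence[str]) -> float:
    """Leave-one-out estimate: hold out each subject once, refit, and count errors."""
    if len(X) != len(y) or len(X) < 2:
        raise ValueError("need at least two labelled samples")
    errors = 0
    for i in range(len(X)):
        X_train = [x for j, x in enumerate(X) if j != i]
        y_train = [label for j, label in enumerate(y) if j != i]
        centroids = fit_centroids(X_train, y_train)
        if predict(centroids, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


if __name__ == "__main__":
    # Hypothetical standardized values of two early transcripts per vaccinee,
    # labelled as high or low responders
    X = [[2.0, 1.8], [2.2, 2.1], [1.9, 2.3], [0.1, 0.2], [-0.2, 0.0], [0.3, -0.1]]
    y = ["high", "high", "high", "low", "low", "low"]
    print("LOO misclassification error:", loo_misclassification_error(X, y))
    print("New subject [2.0, 2.0] ->", predict(fit_centroids(X, y), [2.0, 2.0]))
```

Leave-one-out refits the model once per subject. That is affordable for the small cohorts typical of these studies and uses every subject for both training and evaluation.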