2021 Dec 20;25(1):103668. doi: 10.1016/j.isci.2021.103668

Table 1.

Cross-validation of various input k-mer sequences

| Model | Test set | S5F (5-mer) | DeepSHM (5-mer) | DeepSHM (9-mer) | DeepSHM (15-mer) | DeepSHM (21-mer) |
|---|---|---|---|---|---|---|
| Substitution rate | IGHV1 | 0.52 | 0.57 | 0.57 | 0.56 | 0.56 |
| | IGHV3 | 0.51 | 0.54 | 0.55 | 0.55 | 0.54 |
| | IGHV4 | 0.54 | 0.57 | 0.57 | 0.58 | 0.57 |
| | IGHV2, 5, 6, 7 | 0.53 | 0.55 | 0.56 | 0.56 | 0.55 |
| | Avg correlation | 0.52 | 0.56 | 0.56 | 0.56 | 0.55 |
| | Best - S5F | NA | 0.04 | 0.04 | 0.04 | 0.03 |
| | Mean - S5F | NA | 0.02 | 0.03 | 0.01 | −0.01 |
| | p-value | NA | 3.52E-15 | 3.46E-17 | 3.78E-9 | 0.52 |
| Mutation frequency | IGHV1 | 0.69 | 0.74 | 0.79 | 0.82 | 0.82 |
| | IGHV3 | 0.68 | 0.74 | 0.79 | 0.8 | 0.79 |
| | IGHV4 | 0.69 | 0.74 | 0.79 | 0.84 | 0.84 |
| | IGHV2, 5, 6, 7 | 0.70 | 0.69 | 0.76 | 0.78 | 0.77 |
| | Avg correlation | 0.69 | 0.73 | 0.78 | 0.81 | 0.80 |
| | Best - S5F | NA | 0.04 | 0.09 | 0.12 | 0.11 |
| | Mean - S5F | NA | 0.03 | 0.07 | 0.09 | 0.09 |
| | p-value | NA | 2.76E-15 | 2.31E-17 | 2.31E-17 | 2.31E-17 |
| Weighted substitution (substitution rate) | IGHV1 | 0.52 | 0.55 | 0.53 | 0.55 | 0.53 |
| | IGHV3 | 0.51 | 0.52 | 0.53 | 0.52 | 0.51 |
| | IGHV4 | 0.54 | 0.55 | 0.54 | 0.57 | 0.54 |
| | IGHV2, 5, 6, 7 | 0.53 | 0.53 | 0.54 | 0.55 | 0.53 |
| | Avg correlation | 0.52 | 0.54 | 0.54 | 0.55 | 0.53 |
| | Best - S5F | NA | 0.02 | 0.02 | 0.03 | 0.01 |
| | Mean - S5F | NA | −0.06 | −0.08 | −0.11 | −0.17 |
| | p-value | NA | 4.19E-14 | 2.91E-16 | 6.82E-17 | 2.31E-17 |
| Weighted substitution (mutation frequency) | IGHV1 | 0.69 | 0.74 | 0.78 | 0.80 | 0.81 |
| | IGHV3 | 0.68 | 0.73 | 0.78 | 0.80 | 0.78 |
| | IGHV4 | 0.69 | 0.74 | 0.78 | 0.80 | 0.82 |
| | IGHV2, 5, 6, 7 | 0.70 | 0.70 | 0.75 | 0.77 | 0.78 |
| | Avg correlation | 0.69 | 0.73 | 0.77 | 0.79 | 0.80 |
| | Best - S5F | NA | 0.04 | 0.08 | 0.10 | 0.11 |
| | Mean - S5F | NA | 0 | 0.04 | 0.04 | 0 |
| | p-value | NA | 0.001 | 2.33E-8 | 1.49E-7 | 0.25 |

Correlations of models trained repeatedly with different random seeds (but the same hyperparameters) had small standard deviations, below 0.01 in all cases. p-values are from a Wilcoxon signed-rank test comparing the results of the repeated training runs for each model with the accuracy of the corresponding S5F model. p-values were corrected for multiple comparisons using the Benjamini-Hochberg procedure.