Fig. 6. The effect of protein descriptors and bioactivity types on Q.E.D model accuracy.
The bars show Pearson correlations between the measured and Q.E.D model-predicted pKd values, calculated over the 394 Round 2 compound-kinase pairs, for different (a) protein kernels and (b) training bioactivity data types. The total number of training bioactivity data points is given in parentheses. The original, submitted Q.E.D model, which uses the full amino acid sequence-based protein kernel and Kd, Ki, and EC50 bioactivities in the training dataset, is marked in red. No other changes were introduced to the submitted Q.E.D model, which is an ensemble of regressors with different regularization hyperparameter values and eight compound kernels, where each regressor is built upon the same protein kernel based on full amino acid sequences. The protein kernel and training bioactivity type used in the baseline model are marked in boldface. The numbers inside the bars are Benjamini–Hochberg adjusted two-sided P values calculated with the Pearson and Filon test for comparing the correlation of the submitted Q.E.D model with that of each of its re-trained variants. Because the two correlations under comparison are calculated on the same set of data points and share one variable (measured pKd), the statistical test takes into account the dependence between the pKd values predicted by the submitted Q.E.D model and those predicted by the new model variant. Significant P values (adjusted P < 0.05) are written in boldface. Source data are provided as a Source Data file54.
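The comparison described in the caption can be illustrated with a minimal sketch. It assumes the commonly cited form of the Pearson and Filon z statistic for two dependent correlations that share one variable, followed by a standard Benjamini–Hochberg step-up adjustment. The function names (`pearson`, `pearson_filon_test`, `benjamini_hochberg`) and the toy input vectors are hypothetical and are not taken from the authors' code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_filon_test(measured, pred_a, pred_b):
    """Two-sided z test comparing cor(measured, pred_a) with cor(measured, pred_b).

    Both correlations share the variable `measured`, so the correlation
    between the two prediction vectors enters the variance term.
    Assumes pred_a and pred_b are not perfectly correlated (the
    denominator would then be zero).
    """
    n = len(measured)
    r_ja = pearson(measured, pred_a)
    r_jb = pearson(measured, pred_b)
    r_ab = pearson(pred_a, pred_b)  # dependence between the two models
    k = (r_ab * (1 - r_ja**2 - r_jb**2)
         - 0.5 * r_ja * r_jb * (1 - r_ja**2 - r_jb**2 - r_ab**2))
    denom = math.sqrt((1 - r_ja**2) ** 2 + (1 - r_jb**2) ** 2 - 2 * k)
    z = math.sqrt(n) * (r_ja - r_jb) / denom
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (made-up numbers, not the study data).
measured = [1, 2, 3, 4, 5, 6, 7, 8]
submitted = [1, 3, 2, 4, 6, 5, 7, 8]
variant = [2, 1, 4, 3, 7, 5, 8, 6]
z, p = pearson_filon_test(measured, submitted, variant)
adjusted = benjamini_hochberg([p, 0.01, 0.04])
```

In this sketch, one raw P value would be computed per re-trained variant against the submitted model, and the full list of raw P values would then be adjusted together.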
