Author manuscript; available in PMC: 2019 Jan 24.
Published in final edited form as: Cell Syst. 2017 Dec 6;6(1):116–124.e3. doi: 10.1016/j.cels.2017.11.003

Figure 2. Protein-specific gradient boosting models can accurately predict variant effect scores.


We trained a model for each protein using a randomly selected 80% of the data, with the remaining 20% reserved for testing. (A) A radar plot of Pearson’s correlation coefficients between observed and predicted variant effect scores illustrates protein-specific model performance on both training data (dark red) and testing data (light red). The Pab1 RRM domain-specific model accurately predicts the effects of variants withheld from training (Pearson’s R > 0.75) and was used to predict the 197 missing variant effect scores. (B) The completed Pab1 RRM domain sequence-function map is shown for positions 126–200. Each mutagenized position is a column, and each amino acid substitution is a row. Wild-type-like variants are colored dark blue, and inactive variants are colored light blue. Predicted effects are denoted by black borders.
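The evaluation described in this caption (a random 80/20 split, followed by Pearson's correlation between observed and predicted scores on each subset) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the records are synthetic, the helper names `split_80_20`, `pearson_r`, and `fit_position_means` are hypothetical, and a per-position mean is used as a stand-in for the gradient boosting model so that the example needs only the standard library.

```python
import math
import random


def split_80_20(records, seed=0):
    """Shuffle records and return (train, test) holding 80% / 20% of the data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def fit_position_means(train):
    """Stand-in model (NOT gradient boosting): mean training score per position."""
    sums, counts = {}, {}
    for position, _aa, score in train:
        sums[position] = sums.get(position, 0.0) + score
        counts[position] = counts.get(position, 0) + 1
    overall = sum(score for _, _, score in train) / len(train)
    means = {p: sums[p] / counts[p] for p in sums}
    # Fall back to the overall mean for positions absent from training.
    return lambda position: means.get(position, overall)


# Synthetic (position, amino acid, score) records; the values are illustrative
# and are not taken from the study.
rng = random.Random(1)
records = [
    (pos, aa, (pos % 5) / 5 + rng.gauss(0, 0.1))
    for pos in range(126, 201)
    for aa in "ACDEFGHIKLMNPQRSTVWY"
]

train, test = split_80_20(records)
predict = fit_position_means(train)

# Compare observed and predicted scores separately on each subset.
r_train = pearson_r([s for _, _, s in train], [predict(p) for p, _, _ in train])
r_test = pearson_r([s for _, _, s in test], [predict(p) for p, _, _ in test])
print(f"train R = {r_train:.2f}, test R = {r_test:.2f}")
```

Reporting the correlation on both subsets, as panel A does, makes overfitting visible: a training R far above the testing R would suggest the model has memorized the training variants instead of generalizing to withheld ones.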