2020 Jun 16;11:3061. doi: 10.1038/s41467-020-16961-8

Fig. 6. Prediction of frameshifting potential and efficiency.

a Outline of the process of generating features and building and testing predictive models. b, c Prediction scores on held-out test data for the indicated PRF event(s) or the entire library using different feature sets: area under the receiver operating characteristic curve (ROC AUC) for classification (b) and Pearson correlation coefficients for regression (c). Feature sets: slip, nucleotide identities in the canonical XXXYYYZ slippery site; tai, tAI scores around the slippery site; aa, amino acid class (nonpolar, polar, or charged) around the slippery site; dg, MFE of downstream regions; sec, predicted pairedness of downstream positions. d ROC curve showing the performance of a classifier trained on 80% of all designed library variants passing filtering and tested on the remaining 20%. e For the test set (20% of all designed library variants of all −1 PRF events (left) or HIV-1 (right) passing filtering), the measured % GFP fluorescence is plotted against the prediction of a model trained on the remaining 80% of library variants of all −1 PRF events (left) or HIV-1 (right) passing filtering; the Pearson correlation coefficient and the associated two-tailed p-value are reported. f Prediction scores (ROC AUC) on held-out test data (20%) for the indicated PRF event (columns), using the full set of features and a model trained on the training data (80%) from the indicated PRF event (rows).
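The evaluation scheme named in the caption (an 80%/20% split, ROC AUC for classification, Pearson correlation for regression) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `train_test_split`, `roc_auc`, and `pearson_r` are hypothetical, the split is a plain random shuffle, and the values in the usage lines are toy numbers rather than measurements from the library.

```python
import math
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy usage: 100 hypothetical variant indices split 80/20.
train, test = train_test_split(range(100))

# Toy classifier scores (panels b, d, f report ROC AUC).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Toy measured vs. predicted values (panels c, e report Pearson r).
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The pairwise definition of ROC AUC is quadratic in the number of samples; it is used here only because it makes the meaning of the score explicit. A rank-based implementation from an established statistics library would be the usual choice for a library-scale test set.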