2021 Apr 12;12:2165. doi: 10.1038/s41467-021-22489-2

Fig. 7. Quantitative model predicts editing level by combining complex RNA sequence and structure features.


a Structure features annotated by bpRNA and included in the featurization of RNA variants. b High-level feature groups used as input to the XGBoost analysis. u1 = structural element immediately upstream (5′) of the editing site; u2 = structural element upstream of u1; site = structural element within which the editing site is found; d1 = structural element downstream (3′) of the editing site; d2 = structural element downstream of d1; d3 = structural element downstream of d2. The definition of each feature is listed in Supplementary Data 1. c Illustration of a putative model for binding of the NEIL1 RNA to ADAR1. The ADAR1 deaminase domain (silver) is modeled from ADAR2 by Phyre2. The dsRNA-binding domains (pink) are modeled in one possible conformation, as described in the “Methods”. The editing site mismatch (also considered a 1:1 internal loop) on NEIL1 is shown in red, and the edited A is shown as space-filled. The upstream (purple and light purple) and downstream (yellow, orange, and light orange) structural elements immediately adjacent to the editing site are colored as shown in b. d XGBoost editing level predictions for variants of NEIL1 (orange), TTYH2 (purple), and AJUBA (green) within the test split (15% random split of positions). R² is a measure of the percentage of variance explained. Spearman R indicates the correlation between observed and predicted editing values. Error bands (in gray) show the 95% pointwise confidence bound for the mean predicted value, using linear smoothing. e SHAP annotation of feature contributions for the NEIL1 test split variant with the highest observed editing level. Features with positive SHAP values (which drive the prediction above the dataset base value) are indicated in pink; features with negative SHAP values (which drive the prediction below the dataset base value) are indicated in blue. Base value refers to the mean predicted editing level across the test split. Output value refers to the XGBoost prediction on this example.
The four features with the highest absolute SHAP values are shown. f SHAP annotation of feature contributions for the NEIL1 test split variant with the lowest observed editing level. g SHAP values for the 20 most important features driving XGBoost editing level predictions on the test split for NEIL1, TTYH2, and AJUBA. Each dot indicates a variant in the test split and the dot color shows the SHAP value from high (red) to low (blue). Features (y-axis) are ranked from top (most significant) to bottom (least significant) by predictive importance.
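The quantities named in panels d to g can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline, and it rests on several assumptions: R² is taken as the coefficient of determination (1 − SS_res/SS_tot), the output value of a SHAP annotation is taken as the base value plus the sum of per-feature SHAP values (the usual additivity property of SHAP), and "predictive importance" in panel g is taken as the mean absolute SHAP value across variants. All function names are hypothetical.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R^2)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(observed, predicted):
    """Spearman R: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(observed), average_ranks(predicted))


def shap_output(base_value, shap_values):
    """SHAP additivity: output value = base value + sum of feature contributions."""
    return base_value + sum(shap_values.values())


def top_features(shap_values, k=4):
    """Names of the k features with the largest absolute SHAP value (panels e, f)."""
    return sorted(shap_values, key=lambda f: abs(shap_values[f]), reverse=True)[:k]


def rank_by_mean_abs_shap(per_variant_shap):
    """Order features by mean |SHAP| across variants (assumed ranking for panel g)."""
    features = list(per_variant_shap[0])
    importance = {f: mean(abs(v[f]) for v in per_variant_shap) for f in features}
    return sorted(importance, key=importance.get, reverse=True)
```

Here `shap_values` is a plain dictionary mapping a feature name to its SHAP value for one variant, and `per_variant_shap` is a list of such dictionaries that share the same keys. The actual feature names are those defined in Supplementary Data 1.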