Figure 2.

Bar plot of random forest classification performance and feature importance for each reverse transcriptase (RT). Classification performance is given as the area under the receiver operating characteristic (ROC) curve (AUC). Colour coding indicates the importance of each feature for the classification. Data were averaged over triplicates. Jump = jump rate. C, T, G = mismatch components, which add up to 100%. Mismatch = mismatch rate. Arrest = arrest rate. Percentages give the feature importance from the random forest analysis, defined as the mean loss in classification accuracy when the values of the respective feature are permuted. (See also Supplementary Figure S9 for additional information on TGIRT and HIV-RT.)
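The permutation-based importance described in the caption can be illustrated with a minimal sketch. It assumes a fixed, already trained classifier and a labelled evaluation set. The classifier here is a hypothetical one-rule stand-in (`toy_predict`), not the random forest used for the figure, and the feature values are invented for illustration only. For one feature column, the values are shuffled across rows, accuracy is recomputed, and the drop from the baseline accuracy is averaged over several shuffles. A feature the model ignores scores zero, because shuffling it cannot change any prediction.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature: int, n_repeats: int = 20, seed: int = 0) -> float:
    """Mean drop in accuracy when column `feature` is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Copy the rows and overwrite only the chosen feature with shuffled values.
        X_perm = [list(row) for row in X]
        for row, value in zip(X_perm, column):
            row[feature] = value
        drops.append(baseline - accuracy(predict, X_perm, y))
    return mean(drops)


# Hypothetical stand-in classifier: uses only the first feature ("mismatch").
def toy_predict(row: Row) -> int:
    return int(row[0] > 0.5)


# Invented example data: columns are [mismatch, arrest]; labels are 1 or 0.
FEATURES = ["mismatch", "arrest"]
X = [
    [0.92, 0.10], [0.81, 0.40], [0.77, 0.05], [0.66, 0.30], [0.58, 0.22],
    [0.12, 0.35], [0.25, 0.08], [0.31, 0.41], [0.40, 0.15], [0.05, 0.27],
]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

importances = {name: permutation_importance(toy_predict, X, y, i) for i, name in enumerate(FEATURES)}
print(importances)
```

Multiplying a mean accuracy drop by 100 expresses it as a percentage. Whether the figure's percentages are raw drops or additionally normalised across features is not stated in the caption, so this sketch makes no claim either way.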