eLife. 2022 May 16;11:e78944. doi: 10.7554/eLife.78944

Figure 6. Random forest classifiers identify polymerase II (Pol II) pause loci across deletion strains, with different feature importance values across deletion strains.

(A) Heatmap of the mean AUC for the random forest classifier trained on 75% of loci and tested on the remaining 25% of loci within each deletion strain. Deletion strains are hierarchically clustered along the x-axis. (B) Heatmap of the AUC values from random forest classifiers trained on all pauses from one deletion strain (y-axis) and tested on the unique pauses observed in another deletion strain (x-axis). Both axes are hierarchically clustered to reveal similarities in AUC values across deletion strains. Tiles where the training and testing strains are the same are colored by the AUC obtained when 75% of the pauses in that deletion strain are used for training and the remaining 25% for testing, as reported in (A).
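The 75%/25% evaluation scheme and the AUC metric described in this legend can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' pipeline: the function names `split_75_25` and `roc_auc`, the fixed random seed, and the toy labels and scores are all assumptions introduced here, and the random forest itself is omitted.

```python
import random


def split_75_25(items, seed=0):
    """Shuffle items and split them into 75% training and 25% testing sets.

    The seed is a hypothetical choice made only to keep the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.75 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive (label 1) scores higher than a randomly chosen negative
    (label 0). Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = pause locus, 0 = non-pause locus.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

For a cross-strain tile as in (B), the same `roc_auc` call would be applied to scores from a classifier trained on one strain and evaluated on loci from another, with no within-strain split.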

Figure 6—figure supplement 1. Random forest classifiers can predict polymerase II pause loci across deletion strains, with different feature importance values across deletion strains.

(A) Correlation between the number of reproducible pauses identified in each deletion strain and the AUC measurements for random forest classifiers trained on the full set of features. The variation among deletion strain AUC measurements is not fully explained by the number of reproducible pauses identified in each deletion strain, as measured by Pearson correlation. (B) Heatmap illustrating feature importance for each feature, across all deletion strains. Deletion strains are hierarchically clustered along the x-axis, in the same order as in Figure 6A. (C–E) ROC curves and corresponding AUC values for random forest models trained on cdc39 (C), dst1∆ (D), and ubp8∆ (E), respectively.
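The Pearson correlation referred to in (A) can be computed as sketched below. This is an illustrative sketch only: the function name `pearson_r` is an assumption, and no values from the paper are used. For a simple linear relationship, the square of r gives the fraction of variance in AUC accounted for by pause count, which is one way to read "not fully explained".

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, `xs` would hold the number of reproducible pauses per deletion strain and `ys` the matching AUC values, one pair per strain.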