Figure 9.

Visualization of the 500 globally most important features (fingerprint bits), as calculated for the optimal random forest classifier, for the 1090 substances in the training set. The features are arranged so that their global importance (mean decrease in impurity, MDI) decreases from left to right, with “on” bits shown in black. The color bar on the y-axis shows group membership using the same colors as in Figure 2; the group number is shown on the right. Some groups are robustly identified by a small number of globally less important bits that are nevertheless consistently “on” for all group members, as is the case for groups 47, 52, and 70 (see the fingerprint bits in the red box). The MDI global feature importance is also shown in Figures S5 and S6 in the Supporting Information.
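
The column ordering described in this caption can be illustrated with a minimal sketch. It assumes scikit-learn, whose `feature_importances_` attribute reports impurity-based (MDI) importances. The function name `top_mdi_bits`, the forest settings, and the random stand-in data are hypothetical and are not the fingerprints, groups, or optimized classifier used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def top_mdi_bits(X, y, n_top=500, random_state=0):
    """Return indices of the n_top bits, ordered by decreasing MDI importance."""
    # Hypothetical forest settings; the optimized hyperparameters are not assumed here.
    clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
    clf.fit(X, y)
    importances = clf.feature_importances_  # impurity-based (MDI) importances
    order = np.argsort(importances)[::-1][:n_top]  # most important bit first
    return order, importances


# Random stand-in data: 60 "substances", 64 fingerprint bits, 3 groups (assumption).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 64))
y = rng.integers(0, 3, size=60)

order, importances = top_mdi_bits(X, y, n_top=10)

# Rows sorted by group membership, columns by decreasing global importance.
row_order = np.argsort(y, kind="stable")
matrix = X[row_order][:, order]
```

Rendering `matrix` as a binary image (for example with matplotlib's `imshow` and a grayscale colormap) would give a layout of the kind shown here, with “on” bits as dark cells.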