2018 Nov 2;4(11):1520–1530. doi: 10.1021/acscentsci.8b00507

Figure 5.

Notional comparison of cross-validation splitting algorithms. The first four vertical panels from the left depict simple examples of random split, stratified split, time split, and scaffold split. The rightmost panel depicts a toy example of the agglomerative split proposed in this work. Both scaffold split and agglomerative split place similar data points in the same group, so that held-out data are dissimilar to the training data and the split gives a more stringent test of the network's ability to generalize to new data. Scaffold split uses the algorithm introduced by Bemis and Murcko (ref 47) to group ligands by their common molecular frameworks (scaffolds). The agglomerative split uses hierarchical agglomerative clustering to group ligand–protein systems according to the pairwise sequence or structural similarity of their proteins. This figure is adapted from ref (3) with permission from the Royal Society of Chemistry.
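The agglomerative split can be illustrated with a minimal pure-Python sketch. Several parts of it are assumptions and do not come from this caption: single-linkage clustering, a precomputed protein distance matrix (for example, 1 minus pairwise sequence identity), a hypothetical cutoff `threshold`, and a greedy largest-cluster-first rule for filling the train/valid/test folds. The function names and the toy data are also hypothetical. The sketch shows the idea only and is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def agglomerative_clusters(dist: Sequence[Sequence[float]],
                           threshold: float) -> List[List[int]]:
    """Single-linkage hierarchical agglomerative clustering (illustrative).

    Starts with every protein in its own cluster and repeatedly merges the
    two closest clusters until no pair is within `threshold`. `dist` is a
    symmetric matrix of pairwise distances (assumed: 1 - sequence identity).
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # even the closest pair is too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def agglomerative_split(dist: Sequence[Sequence[float]], threshold: float,
                        frac_train: float = 0.8, frac_valid: float = 0.1
                        ) -> Tuple[List[int], List[int], List[int]]:
    """Assign whole clusters to train/valid/test (hypothetical fill rule).

    Clusters are visited largest first; each goes to train if it fits,
    otherwise to valid if it fits, otherwise to test. A cluster is never
    divided, so similar proteins cannot straddle two folds.
    """
    n = len(dist)
    clusters = sorted(agglomerative_clusters(dist, threshold),
                      key=len, reverse=True)
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for cluster in clusters:
        if len(train) + len(cluster) <= frac_train * n:
            train.extend(cluster)
        elif len(valid) + len(cluster) <= frac_valid * n:
            valid.extend(cluster)
        else:
            test.extend(cluster)
    return sorted(train), sorted(valid), sorted(test)


# Hypothetical toy distances: proteins 0-2 form one family, 3-4 another,
# and 5 is an outlier unlike all the others.
toy_dist = [
    [0.0, 0.1, 0.1, 0.8, 0.8, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.8, 0.9],
    [0.1, 0.1, 0.0, 0.8, 0.8, 0.9],
    [0.8, 0.8, 0.8, 0.0, 0.2, 0.9],
    [0.8, 0.8, 0.8, 0.2, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.9, 0.0],
]

print(agglomerative_clusters(toy_dist, threshold=0.3))
# → [[0, 1, 2], [3, 4], [5]]
print(agglomerative_split(toy_dist, threshold=0.3,
                          frac_train=0.5, frac_valid=0.35))
# → ([0, 1, 2], [3, 4], [5])
```

In this toy example, family 3-4 lands entirely in the validation fold and protein 5 entirely in the test fold. No held-out protein therefore has a close relative in the training fold, which is the property the caption attributes to grouped splits. The nested loop is simple but scales poorly. A practical pipeline would more likely call `scipy.cluster.hierarchy.linkage` followed by `fcluster` with a distance criterion.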