Figure 1.
Schematic representation of traditional model benchmarking (A) and of the methodology employed to compare the impact of different negative data sampling methods on model performance (B). (A) Models 1, 2 and 3 (colourful hexagons) were trained on data sets A, B and C (colourful rectangles), respectively. Each data set consisted of a positive sample (blue rectangles) and a negative sample generated by an appropriate negative sampling method (white ovals). In the evaluation process, the models were compared only on benchmark set C, which was built with the same method as training set C, thereby introducing some bias in favour of Model 3 in the benchmark analysis. (B) Architectures (white parallelograms) were developed based on published models and represent the algorithm with all its parameters involved in the machine learning cycle. Each architecture was trained on the same positive data set (white rectangle) combined with a negative sample generated by one of the 11 negative sampling methods (white ovals); the negative sampling was repeated five times to verify repeatability. The training and benchmark samples are indicated as blue and red rectangles, respectively. The models (orange hexagons) represent instances of architectures trained on given data sets and were validated on each benchmark sample. The model performance results are indicated as white clouds.
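The evaluation scheme in panel B can be sketched as a nested loop over architectures, negative sampling methods and repeats, with every trained model scored on every benchmark set. The sketch below is illustrative only: the architecture and method names, the functions `build_dataset`, `train`, `evaluate` and `cross_evaluate`, and the toy random scores are all hypothetical placeholders, and the use of a single benchmark set per sampling method is an assumption rather than something stated in the caption.

```python
import random
from itertools import product

# Hypothetical placeholders; the names are illustrative, not taken from the study.
ARCHITECTURES = ["arch_1", "arch_2"]
SAMPLING_METHODS = ["method_a", "method_b", "method_c"]  # the study used 11 methods
N_REPEATS = 5  # negative sampling is repeated five times per method


def build_dataset(positives, method, seed):
    """Pair a positive set with toy 'negatives' drawn for one sampling method."""
    rng = random.Random(f"{method}-{seed}")
    negatives = [rng.random() for _ in positives]
    return {"method": method, "seed": seed, "pos": positives, "neg": negatives}


def train(architecture, dataset):
    """Toy 'model': only records the architecture and the data it was trained on."""
    return {
        "architecture": architecture,
        "trained_on": dataset["method"],
        "seed": dataset["seed"],
    }


def evaluate(model, benchmark):
    """Toy score in [0, 1); a real study would compute a performance metric here."""
    key = (
        f"{model['architecture']}-{model['trained_on']}-"
        f"{model['seed']}-{benchmark['method']}"
    )
    return random.Random(key).random()


def cross_evaluate(positives_train, positives_bench):
    # Assumption: one benchmark set per negative sampling method.
    benchmarks = {
        m: build_dataset(positives_bench, m, seed=0) for m in SAMPLING_METHODS
    }
    results = []
    for arch, method, rep in product(
        ARCHITECTURES, SAMPLING_METHODS, range(N_REPEATS)
    ):
        model = train(arch, build_dataset(positives_train, method, seed=rep))
        # Unlike panel A, each model is scored on every benchmark set,
        # not only on the one built with its own sampling method.
        for bench_method, bench in benchmarks.items():
            score = evaluate(model, bench)
            results.append((arch, method, rep, bench_method, score))
    return results
```

Calling `cross_evaluate([1, 2, 3], [4, 5])` returns one row per (architecture, training method, repeat, benchmark method) combination, which is the grid of results that the white clouds in panel B stand for.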
