Figure - PMC

2022 Aug 21;23(5):bbac343. doi: 10.1093/bib/bbac343

© The Author(s) 2022. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Schematic representation of traditional model benchmarking (A) and of the methodology used to compare the impact of different negative data sampling methods on model performance (B). (A) Models 1, 2 and 3 (colourful hexagons) were trained on data sets A, B and C (colourful rectangles), respectively. Each data set combined a positive sample (blue rectangles) with a negative sample generated by the corresponding negative sampling method (white ovals). In the evaluation, the models were compared only on benchmark set C, built with the same method as training set C, which biases the benchmark analysis in favour of Model 3. (B) Architectures (white parallelograms) were developed based on published models; each represents an algorithm together with all the parameters involved in the machine learning cycle. Each architecture was trained on the same positive data set (white rectangle) combined with a negative sample generated by one of the 11 negative sampling methods (white ovals); each negative sample was generated five times to verify repeatability. Training and benchmark samples are shown as blue and red rectangles, respectively. The models (orange hexagons) are instances of architectures trained on given data sets, and each was validated on every benchmark sample. Model performance results are shown as white clouds.
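
The cross-evaluation in panel B can be read as a grid over architectures, negative sampling methods, repeats and benchmark samples. The following is a minimal sketch of that bookkeeping only, not the authors' code. It assumes one benchmark sample per negative sampling method, and the architecture and method names are hypothetical placeholders; only the counts of 11 methods and five repeats come from the caption.

```python
from itertools import product

N_REPEATS = 5  # from the caption: each negative sample is generated five times


def cross_evaluation_grid(architectures, sampling_methods, n_repeats=N_REPEATS):
    """Yield one (architecture, train_method, repeat, benchmark_method) tuple
    per evaluation in the panel B scheme."""
    for arch, train_method, repeat in product(
        architectures, sampling_methods, range(n_repeats)
    ):
        # One model = one architecture trained on the shared positive set
        # plus one negative sample drawn with `train_method`.
        for bench_method in sampling_methods:
            # Assumption: one benchmark sample exists per sampling method,
            # and every model is scored on all of them.
            yield arch, train_method, repeat, bench_method


def same_method_only(grid):
    """Keep only evaluations where the benchmark was built with the same
    sampling method as the training set (the biased case shown in panel A)."""
    return [row for row in grid if row[1] == row[3]]


# Hypothetical placeholder names; the caption does not list them.
architectures = ["arch_1", "arch_2"]
sampling_methods = [f"method_{i}" for i in range(1, 12)]  # 11 methods

grid = list(cross_evaluation_grid(architectures, sampling_methods))
matched = same_method_only(grid)
print(len(grid), len(matched))
```

Separating the matched rows from the rest makes the contrast between the panels explicit: panel A reports only evaluations of the matched kind, while panel B also scores each model on benchmark samples built with the other methods.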