2024 Apr 18;27(5):109770. doi: 10.1016/j.isci.2024.109770

Figure 4.

The influence of cluster-based filtering on model performance

Models were grouped into four categories according to how the dataset was configured: Original, Trainset-only, Testset-only, and Clustered. Statistical significance was assessed with paired t-tests.

(A) Comparison of the PPVs (left), AUROCs (middle), and AUPRs (right) of VitTCR under the different dataset configurations. Five iterations of 5-fold cross-validation were conducted for the three configurations. Each dot represents one fold replicate, and dark gray lines connect the dots belonging to the same iteration.
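As a rough illustration of how a paired t-test applies to repeated cross-validation, the sketch below pairs the scores of two configurations by fold replicate (same iteration, same fold) and computes the paired t statistic from the per-replicate differences. It assumes both configurations were scored on matching replicates, which the caption implies but does not spell out; the function name `paired_t_test` and the toy scores are hypothetical and are not values from the figure.

```python
import math
from statistics import mean, stdev


def paired_t_test(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    Assumption: scores_a[i] and scores_b[i] come from the same fold
    replicate (same CV iteration, same fold) under two configurations.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired one-to-one")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    # The test works on within-pair differences, not on the raw scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


N_ITERATIONS, N_FOLDS = 5, 5  # five iterations of 5-fold CV -> 25 replicates

if __name__ == "__main__":
    # Toy placeholder scores (hypothetical), NOT results from the figure.
    config_x = [0.80 + 0.01 * (i % 5) for i in range(N_ITERATIONS * N_FOLDS)]
    config_y = [s - 0.02 + 0.001 * (i % 3) for i, s in enumerate(config_x)]
    t, df = paired_t_test(config_x, config_y)
    print(f"t = {t:.2f}, df = {df}")
```

With 25 paired replicates the statistic has 24 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(a, b)` computes the same statistic together with a two-sided p-value.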

(B) Distribution of percentile values for epitopes in the dataset before (Before, left panel) and after (After, right panel) filtering. Each point in the violin plots corresponds to an individual epitope, and the values shown are medians.