2022 Sep 26;12:16047. doi: 10.1038/s41598-022-19608-4

Table 2.

Overview of the number of protein sequences in the training, validation and test sets of the seven downstream protein structural prediction tasks.

Dataset          SS3     SS8     BUR     ASA     PPI   EPI   HPR
Training set     8,803   8,803   8,803   8,803   324   179   2,949
Validation set   1,102   1,102   1,102   1,102    81    45     738
Test set         1,102   1,102   1,102   1,102   137    56   1,230

The prediction tasks for secondary structure in three (SS3) and eight (SS8) classes, buried residues (BUR) and absolute solvent accessibility (ASA) are based on the same dataset, which is why their split sizes are identical. We additionally included the prediction tasks for protein-protein interaction interfaces (PPI), epitope interfaces (EPI) and hydrophobic patches (HPR).
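
As a reading aid, the split proportions implied by the table can be worked out with a few lines of Python. This is only an illustrative sketch: the counts are copied from Table 2, while the names `SPLITS` and `split_fractions` are hypothetical and do not come from the paper or its code.

```python
# Number of protein sequences per task, copied from Table 2 as
# (training, validation, test). The dictionary name is illustrative only.
SPLITS = {
    "SS3": (8803, 1102, 1102),
    "SS8": (8803, 1102, 1102),
    "BUR": (8803, 1102, 1102),
    "ASA": (8803, 1102, 1102),
    "PPI": (324, 81, 137),
    "EPI": (179, 45, 56),
    "HPR": (2949, 738, 1230),
}


def split_fractions(train, val, test):
    """Return each split's share of the task's total, rounded to 3 decimals."""
    total = train + val + test
    return tuple(round(n / total, 3) for n in (train, val, test))


# Print the total number of sequences and the split shares for every task.
for task, sizes in SPLITS.items():
    print(task, sum(sizes), split_fractions(*sizes))
```

Under these counts, the four tasks sharing one dataset come out at roughly 80/10/10, whereas PPI, EPI and HPR reserve a larger share of sequences for the test set.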