Table 2.
Number of protein sequences in the training, validation, and test sets for each of the seven downstream protein structural prediction tasks.
| Dataset | SS3 | SS8 | BUR | ASA | PPI | EPI | HPR |
|---|---|---|---|---|---|---|---|
| Training set | 8 803 | 8 803 | 8 803 | 8 803 | 324 | 179 | 2 949 |
| Validation set | 1 102 | 1 102 | 1 102 | 1 102 | 81 | 45 | 738 |
| Test set | 1 102 | 1 102 | 1 102 | 1 102 | 137 | 56 | 1 230 |
The prediction tasks for secondary structure in three (SS3) and eight (SS8) classes, buried residues (BUR), and absolute solvent accessibility (ASA) are based on the same dataset and therefore have identical split sizes. In addition, we included prediction tasks for protein-protein interaction interfaces (PPI), epitope interfaces (EPI), and hydrophobic patches (HPR).
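To make the relative split sizes easier to compare across tasks, the counts in Table 2 can be converted to fractions. The following is a minimal sketch: the counts are copied from the table, while the names `SPLIT_COUNTS` and `split_fractions` are hypothetical helpers introduced only for illustration and are not part of the original work.

```python
# Sequence counts copied from Table 2, ordered as (training, validation, test).
# SPLIT_COUNTS and split_fractions are illustrative names, not from the paper.
SPLIT_COUNTS = {
    "SS3": (8803, 1102, 1102),
    "SS8": (8803, 1102, 1102),
    "BUR": (8803, 1102, 1102),
    "ASA": (8803, 1102, 1102),
    "PPI": (324, 81, 137),
    "EPI": (179, 45, 56),
    "HPR": (2949, 738, 1230),
}


def split_fractions(counts):
    """Return each split's share of the total number of sequences."""
    total = sum(counts)
    return tuple(c / total for c in counts)


if __name__ == "__main__":
    for task, counts in SPLIT_COUNTS.items():
        train, val, test = split_fractions(counts)
        print(
            f"{task}: total={sum(counts)} "
            f"train={train:.1%} val={val:.1%} test={test:.1%}"
        )
```

With these counts, the four tasks that share a dataset come out close to an 80/10/10 split, whereas PPI and HPR are close to 60/15/25 and EPI is roughly 64/16/20.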