Table 5.
Performance of KmPred on the test set after removing test samples by sequence similarity to the training set.
| Test-set filtering (sequence identity to training set) | MSE | R² | Pearson correlation | Spearman correlation | Test set size (n) |
|---|---|---|---|---|---|
| No test samples removed | 0.62 | 0.55 | 0.74 | 0.73 | 2342 |
| No identical sequences (100% identity) shared between training and test sets | 0.76 | 0.47 | 0.69 | 0.68 | 2292 |
| No test sequence shares >90% identity with any training sequence | 0.75 | 0.47 | 0.69 | 0.68 | 2292 |
| No test sequence shares >50% identity with any training sequence | 0.76 | 0.47 | 0.68 | 0.68 | 2293 |
| No test sequence shares >10% identity with any training sequence | 1.38 | 0.26 | 0.58 | 0.59 | 41 |
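The four metrics in Table 5 can be computed on a similarity-filtered test subset with a small sketch like the one below. It is illustrative only and assumes that each test sample already has a precomputed maximum percent identity to the training set (the hypothetical `max_identity` list, e.g. obtained with a pairwise alignment tool) and that a sample is kept when that value is strictly below the cutoff; the source does not state the alignment method or the exact cutoff convention. R² is computed as 1 − SS_res/SS_tot, and Spearman correlation as the Pearson correlation of ranks with ties averaged. The helper names `evaluate_subset` and `rank` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def evaluate_subset(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    max_identity: Sequence[float],
    threshold: float,
) -> Dict[str, float]:
    """Keep test samples whose max % identity to training is < threshold
    (assumed convention), then score the remaining predictions."""
    keep = [i for i, s in enumerate(max_identity) if s < threshold]
    t = [y_true[i] for i in keep]
    p = [y_pred[i] for i in keep]
    return {
        "MSE": mse(t, p),
        "R2": r2(t, p),
        "Pearson": pearson(t, p),
        "Spearman": spearman(t, p),
        "n": len(keep),
    }
```

Calling `evaluate_subset` once per cutoff (100, 90, 50 and 10) would yield one row per filtering level, in the same layout as the table above.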