Table 4: Using a tokenized inhibitor encoding does not diminish model performance.
Although Junk SMILES strings carry no information about chemical structure, they do not degrade model performance when the “Standard Split” is used, which indicates the extent to which the apparent performance of published models may rely on information leakage.
| Dataset Splitting Technique | Dataset | CI | MSE | Pearson R |
|---|---|---|---|---|
| Junk SMILES | Anastassiadis | 0.768 | 0.166 | 0.794 |
| | Christmann-Franck | 0.794 | 0.316 | 0.817 |
| | Davis | 0.894 | 0.181 | 0.856 |
| | Elkins | 0.768 | 0.145 | 0.670 |