[Preprint]. 2023 Sep 6:2023.09.04.556234. [Version 1] doi: 10.1101/2023.09.04.556234

Table 4: Using a tokenized inhibitor encoding does not diminish model performance.

Despite lacking any information about chemical structure, Junk SMILES strings do not lead to worse model performance when the “Standard Split” is used, highlighting the extent to which the apparent performance of published models may rely on information leakage.

Dataset Splitting Technique   Dataset             CI      MSE     Pearson R
Junk SMILES                   Anastassiadis       0.768   0.166   0.794
                              Christmann-Franck   0.794   0.316   0.817
                              Davis               0.894   0.181   0.856
                              Elkins              0.768   0.145   0.670
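The table reports three metrics: concordance index (CI), mean squared error (MSE), and Pearson R. The sketch below shows one common way to compute them for affinity regression. It assumes the CI definition widely used in drug-target affinity benchmarks: among pairs with different true affinities, the fraction whose predictions are ordered the same way, with tied predictions counted as 0.5. It is an illustration, not the authors' evaluation code, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs ordered the same way by predictions.

    A pair is comparable only if its true affinities differ; a tied
    prediction on a comparable pair counts as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # equal true affinities carry no ordering information
        comparable += 1
        # Orient the pair so that `hi` has the larger true affinity.
        hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
        if y_pred[hi] > y_pred[lo]:
            concordant += 1.0
        elif y_pred[hi] == y_pred[lo]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between true and predicted affinities."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (sd_t * sd_p)
```

With the toy (hypothetical) values `y_true = [5.0, 6.0, 7.0, 8.0]` and `y_pred = [5.5, 5.8, 7.2, 7.1]`, five of the six pairs are ordered correctly, so the CI is 5/6. Note that CI depends only on ranking, whereas MSE also penalizes calibration errors; this is why the two can disagree, as for the Christmann-Franck row, where both CI and MSE are comparatively high.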
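Why can structure-free strings perform this well? Under a random split of (inhibitor, kinase) pairs, almost every test inhibitor also appears in the training set, so a model can treat the string as a compound identifier and memorize per-compound affinity trends. The sketch below illustrates this. It assumes that "Junk SMILES" means replacing each inhibitor's SMILES with a random string that is unique per compound; the exact construction used in the paper is not described in this excerpt. It also contrasts the random pair split with a cold-inhibitor split, a common leakage-free alternative that is not necessarily one of the paper's splits. All function names are hypothetical.

```python
import random


def junk_smiles(compounds, length=20, seed=0):
    """Map each compound to a unique random string with no chemical meaning."""
    rng = random.Random(seed)
    alphabet = "CNOcno()=#123"  # SMILES-like characters, arranged at random
    mapping, used = {}, set()
    for compound in compounds:
        while True:
            s = "".join(rng.choice(alphabet) for _ in range(length))
            if s not in used:  # keep strings unique so each acts as an ID
                break
        used.add(s)
        mapping[compound] = s
    return mapping


def random_pair_split(pairs, test_frac=0.2, seed=0):
    """Random split over (inhibitor, kinase) pairs, ignoring compound identity."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def inhibitor_split(pairs, test_frac=0.2, seed=0):
    """Cold-inhibitor split: all pairs of a held-out inhibitor go to test."""
    rng = random.Random(seed)
    inhibitors = sorted({inh for inh, _ in pairs})
    rng.shuffle(inhibitors)
    held_out = set(inhibitors[: max(1, int(len(inhibitors) * test_frac))])
    train = [p for p in pairs if p[0] not in held_out]
    test = [p for p in pairs if p[0] in held_out]
    return train, test


def seen_fraction(train, test):
    """Fraction of test inhibitors that also occur in the training set."""
    train_inhibitors = {inh for inh, _ in train}
    test_inhibitors = {inh for inh, _ in test}
    return len(test_inhibitors & train_inhibitors) / len(test_inhibitors)


inhibitors = [f"inh{i}" for i in range(30)]
kinases = [f"kin{k}" for k in range(50)]
ids = junk_smiles(inhibitors)
pairs = [(ids[i], k) for i in inhibitors for k in kinases]
print(seen_fraction(*random_pair_split(pairs)))  # every test inhibitor already seen
print(seen_fraction(*inhibitor_split(pairs)))  # 0.0: no test inhibitor seen
```

On this toy grid of 30 inhibitors by 50 kinases, the random pair split leaves every test inhibitor visible during training, while the cold-inhibitor split leaves none. Only the latter forces a model to generalize from chemical structure, which an encoding like Junk SMILES cannot support.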