. 2018 Dec 10;10:60. doi: 10.1186/s13321-018-0315-6

Table 2.

Data cleaning results on the dataset from Obach et al. [17], the SIN list from ChemSec [18] and the EPISuite™ solubility dataset

| Category | Obach et al. [17] | SIN list [18] | EPISuite™ |
|---|---|---|---|
| Total | 668 | 913 | 5761 |
| Maintain (w/duplicates) | 607 | 256 | 4635 |
| Maintain (w/o duplicates) (H reliability) | 514 | 163 | 3536 |
| Maintain (w/o duplicates) (M reliability) | 85 | 68 | 850 |
| Manual check | 47 | 115 | 639 |
| Rejected (mixtures) | 0 | 23 | 9 |
| Rejected (inorganic or unusual elements) | 2 | 194 | 96 |
| Rejected (missing/ambiguous) | 12 | 394 | 395 |
| Maintain (manual check) (w/duplicates) | 652 | 335 | 5014 |
| Maintain (manual check) (w/o duplicates) (H reliability) | 515 | 163 | 3554 |
| Maintain (manual check) (w/o duplicates) (M reliability) | 128 | 127 | 1171 |
| Rejected (manual check failed) | 2 | 36 | 260 |

Numbers refer to results before and after the manual check procedure; rows labelled "(manual check)" give the counts after it.
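The relation between the "before" and "after" rows can be illustrated with a short Python sketch. It uses only the with-duplicates counts from the table and checks that the records maintained automatically plus those sent to manual check equal the records maintained after the manual check plus those that failed it. This balance is an observation drawn from the figures above, not a statement made by the authors, and the dictionary keys and function name are hypothetical labels chosen for the example.

```python
# Counts copied from Table 2 (with-duplicates rows only).
# Key names are illustrative labels, not identifiers from the paper.
TABLE = {
    "Obach et al.": {"maintain": 607, "manual_check": 47,
                     "maintain_after": 652, "rejected_after": 2},
    "SIN list": {"maintain": 256, "manual_check": 115,
                 "maintain_after": 335, "rejected_after": 36},
    "EPISuite": {"maintain": 4635, "manual_check": 639,
                 "maintain_after": 5014, "rejected_after": 260},
}


def manual_check_balances(row):
    """Return True if every record sent to manual check is accounted for."""
    before = row["maintain"] + row["manual_check"]
    after = row["maintain_after"] + row["rejected_after"]
    return before == after


for name, row in TABLE.items():
    # Records recovered by the manual check (passed and added to "maintain").
    recovered = row["maintain_after"] - row["maintain"]
    print(f"{name}: balanced={manual_check_balances(row)}, recovered={recovered}")
```

Under these assumptions, the difference between the two "Maintain (w/duplicates)" rows gives the number of records that passed the manual check for each dataset.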