Table 3.
Classification report for the external (hold-out) set with optimal fingerprint and model parameterization. Rows are sorted by descending F1-score of the RF model. The SI contains visualizations of the classification reports for both models and for the training and test datasets (Figures S1-S4).
| group number and name | precision (kNN) | precision (RF) | recall (kNN) | recall (RF) | F1-score (kNN) | F1-score (RF) | support |
|---|---|---|---|---|---|---|---|
| (85) (tetrahydro)furan primary alcohol derivatives and their oxidation products | 0.833 | 1.000 | 1.000 | 1.000 | 0.909 | 1.000 | 5 |
| (84) 1,2-ethanediols and their carbonates | 0.750 | 1.000 | 1.000 | 1.000 | 0.857 | 1.000 | 3 |
| (80) Aliphatic nitriles | 1.000 | 1.000 | 0.900 | 1.000 | 0.947 | 1.000 | 10 |
| (79) Aliphatic primary amides | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 6 |
| (78) Alkyl aryl and cyclic diaryl esters of phosphoric acid | 0.667 | 1.000 | 1.000 | 1.000 | 0.800 | 1.000 | 2 |
| (73) Aralkylaldehydes | 1.000 | 1.000 | 0.800 | 1.000 | 0.889 | 1.000 | 5 |
| (72) Aromatic nitriles | 0.857 | 1.000 | 0.857 | 1.000 | 0.857 | 1.000 | 7 |
| (64) Brominated cycloalkanes, alcohols, phosphates, triazine triones, diphenyl ethers and diphenyl alkyls (flame retardants related substances) | 0.833 | 1.000 | 0.833 | 1.000 | 0.833 | 1.000 | 6 |
| (57) Dialkyl (and diaryl) dithiophosphates (DDP) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 2 |
| (53) Dihydropurinedione derivatives | 0.750 | 1.000 | 1.000 | 1.000 | 0.857 | 1.000 | 3 |
| (52) Ditriazine stilbenedisulfonic acid dyes (optical brighteners) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 4 |
| (47) Glycidyl ethers and esters | 1.000 | 1.000 | 0.750 | 1.000 | 0.857 | 1.000 | 8 |
| (42) Linear aliphatic ketones | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 3 |
| (41) Linear and branched alpha-beta unsaturated ketones | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 3 |
| (31) Ortho-phthalates | 0.714 | 1.000 | 0.714 | 1.000 | 0.714 | 1.000 | 7 |
| (26) Polyol amines | 1.000 | 1.000 | 0.750 | 1.000 | 0.857 | 1.000 | 4 |
| (24) Salicylate esters | 0.833 | 1.000 | 1.000 | 1.000 | 0.909 | 1.000 | 5 |
| (23) Salicylic acid, its salts and alkylated derivatives | 0.500 | 1.000 | 0.333 | 1.000 | 0.400 | 1.000 | 3 |
| (18) Thioureas | 0.667 | 1.000 | 1.000 | 1.000 | 0.800 | 1.000 | 2 |
| (16) Vinylbenzene derivatives | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 3 |
| (14) acrylate and methacrylate amines | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 3 |
| (11) chlorinated aromatic hydrocarbons | 0.833 | 1.000 | 1.000 | 1.000 | 0.909 | 1.000 | 10 |
| (8) imidazoles | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 5 |
| (3) tetrahydroxymethyl and tetraalkyl phosphonium salts | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 2 |
| (17) Unsubstituted and linear aliphatic-substituted cyclic ketones | 0.917 | 0.917 | 1.000 | 1.000 | 0.957 | 0.957 | 11 |
| (13) aralkylamines | 0.818 | 0.917 | 0.818 | 1.000 | 0.818 | 0.957 | 11 |
| (6) primary aliphatic diamines and their salts | 0.900 | 1.000 | 1.000 | 0.889 | 0.947 | 0.941 | 9 |
| (58) Cyclic ethers | 0.857 | 0.875 | 0.857 | 1.000 | 0.857 | 0.933 | 7 |
| (32) Organic phosphonic acids, salts and esters | 1.000 | 1.000 | 0.500 | 0.875 | 0.667 | 0.933 | 8 |
| (9) hydrocarbyl siloxanes | 1.000 | 1.000 | 0.833 | 0.833 | 0.909 | 0.909 | 6 |
| (81) Acyl glycinates and sarcosinates | 0.800 | 0.800 | 1.000 | 1.000 | 0.889 | 0.889 | 4 |
| (43) Isophthalates, Terephthalates and Trimellitates | 0.500 | 0.800 | 1.000 | 1.000 | 0.667 | 0.889 | 4 |
| (21) Simple manganese compounds | 0.600 | 1.000 | 0.600 | 0.800 | 0.600 | 0.889 | 5 |
| (12) aromatic ethers | 0.800 | 0.800 | 1.000 | 1.000 | 0.889 | 0.889 | 4 |
| (5) simple vanadium compounds | 1.000 | 1.000 | 0.600 | 0.800 | 0.750 | 0.889 | 5 |
| (−) miscellaneous chemistry | 0.906 | 0.865 | 0.806 | 0.889 | 0.853 | 0.877 | 36 |
| (22) Simple Lithium compounds | 0.857 | 0.778 | 0.857 | 1.000 | 0.857 | 0.875 | 7 |
| (65) Branched/cyclic dialiphatic ethers (excluding alpha,beta-unsaturated ethers) | 1.000 | 1.000 | 1.000 | 0.750 | 1.000 | 0.857 | 4 |
| (50) Esters from linear saturated dicarboxylic acids and branched aliphatic alcohols | 0.600 | 1.000 | 0.750 | 0.750 | 0.667 | 0.857 | 4 |
| (37) Molybdenum and its simple compounds | 0.600 | 0.750 | 1.000 | 1.000 | 0.750 | 0.857 | 3 |
| (27) Polycarboxylic acid monoamines, hydroxy derivatives and their salts with monovalent cations | 1.000 | 1.000 | 0.750 | 0.750 | 0.857 | 0.857 | 4 |
| (7) nitroalkanes | 0.857 | 0.750 | 1.000 | 1.000 | 0.923 | 0.857 | 6 |
| (75) Alpha-chloro aliphatic carboxylate derivatives | 0.583 | 0.727 | 0.875 | 1.000 | 0.700 | 0.842 | 8 |
| (66) Branched carboxylic acids and its salts | 0.750 | 0.727 | 0.750 | 1.000 | 0.750 | 0.842 | 8 |
| (71) Benzoates | 0.800 | 0.667 | 1.000 | 1.000 | 0.889 | 0.800 | 4 |
| (49) Ethoxylated < C6 alcohols (other than methanol and ethanol); ethoxylated aromatic alcohols | 1.000 | 0.667 | 1.000 | 1.000 | 1.000 | 0.800 | 2 |
| (28) Phthalic anhydrides and hydrogenated phthalic anhydrides | 0.857 | 0.714 | 1.000 | 0.833 | 0.923 | 0.769 | 6 |
| (70) Bisphenol A (BPA) derivatives | 0.667 | 0.750 | 0.500 | 0.750 | 0.571 | 0.750 | 4 |
| (51) Esters from branched or non-aromatic cyclic dicarboxylic acids and aliphatic alcohols | 0.750 | 1.000 | 0.600 | 0.600 | 0.667 | 0.750 | 5 |
| (29) Paraben acid, salts and esters | 0.750 | 0.750 | 0.750 | 0.750 | 0.750 | 0.750 | 4 |
| (62) Caesium compounds | 0.200 | 1.000 | 0.500 | 0.500 | 0.286 | 0.667 | 2 |
| (1) thioxanthenones | 1.000 | 1.000 | 1.000 | 0.500 | 1.000 | 0.667 | 2 |
| (82) Acyl derivatives from alpha-amino acids other than glutamic acid, glycine or sarcosine | 1.000 | 1.000 | 0.667 | 0.333 | 0.800 | 0.500 | 3 |
| (38) Miscellaneous bisphenols | 0.500 | 1.000 | 0.333 | 0.333 | 0.400 | 0.500 | 3 |
| (15) Zirconium and its simple inorganic compounds | 1.000 | 0.500 | 0.250 | 0.250 | 0.400 | 0.333 | 4 |
| (59) Cyclic acetals from aldehydes | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 2 |
| (36) Mono-, di-phenyl phosphite derivatives | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 2 |
| accuracy (kNN, then RF) | 0.8312 | 0.9026 | | | | | |
| macro avg | 0.8089 | 0.8904 | 0.8164 | 0.8629 | 0.7963 | 0.8611 | 308 |
| weighted avg | 0.8448 | 0.9031 | 0.8312 | 0.9026 | 0.8246 | 0.8924 | 308 |
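The relationship between the columns can be checked with a short sketch. It assumes the standard definitions: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over groups, and the weighted average uses the support as weight. The function names are illustrative and are not taken from the authors' code. The three rows are copied from the RF columns of the table, so the averages computed here cover only that subset and are not expected to reproduce the "macro avg" and "weighted avg" rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over groups: every group counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over groups, each weighted by its support (number of samples)."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# (precision, recall, support) for three rows of the table, RF columns.
rows = {
    "(62) Caesium compounds": (1.000, 0.500, 2),
    "(15) Zirconium and its simple inorganic compounds": (0.500, 0.250, 4),
    "(-) miscellaneous chemistry": (0.865, 0.889, 36),
}

f1_by_group = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}
supports = [s for (_, _, s) in rows.values()]

macro_f1 = macro_average(list(f1_by_group.values()))
weighted_f1 = weighted_average(list(f1_by_group.values()), supports)

for name, value in f1_by_group.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"macro F1 (subset) = {macro_f1:.3f}")
print(f"weighted F1 (subset) = {weighted_f1:.3f}")
```

Because the weighted average follows the support, large groups such as "miscellaneous chemistry" (36 compounds) dominate it, whereas the macro average is pulled down equally by each small, poorly predicted group. This is consistent with the weighted averages in the table being higher than the macro averages for both models.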