Two-class real-random classification
The descriptor values of the TS are permuted to generate a mirror TS of random data. The real and permuted matrices are then merged, and a classification model is built to distinguish real values from random ones. (17,24)
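A minimal sketch of the real-random scheme, assuming numeric descriptors in a NumPy array. The source does not name the classifier, so a k-nearest-neighbor majority vote stands in for it here. The function name `real_random_ad`, the parameters `k` and `seed`, and the rule that a compound classified as "random" falls outside the AD are all assumptions for illustration.

```python
import numpy as np


def real_random_ad(X_train, X_new, k=5, seed=0):
    """Return True for new compounds classified as 'real' (assumed inside the AD)."""
    rng = np.random.default_rng(seed)
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))

    # Mirror TS: permute each descriptor column independently. Per-descriptor
    # value distributions are kept, but correlations between descriptors are lost.
    X_mirror = np.column_stack([rng.permutation(col) for col in X_train.T])

    # Merge the two matrices and label real rows 1, random rows 0.
    X_all = np.vstack([X_train, X_mirror])
    y_all = np.concatenate([np.ones(len(X_train)), np.zeros(len(X_mirror))])

    # Stand-in classifier (assumption): majority vote of the k nearest merged rows.
    dist = np.linalg.norm(X_new[:, None, :] - X_all[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return y_all[nearest].mean(axis=1) > 0.5


# Usage: two perfectly correlated descriptors in the TS.
t = np.linspace(0.0, 1.0, 200)
X_ts = np.column_stack([t, t])
inside = real_random_ad(X_ts, [[0.5, 0.5], [0.9, 0.1]])
print(inside)
```

A query that respects the correlation seen in the TS lands among real rows, while one that breaks it lands among the permuted rows. A nearest-neighbor vote is used because independent column permutation preserves each descriptor's marginal distribution, so only a classifier sensitive to joint structure can separate the two classes.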
Leverage
Based on the leverage (h_i) of each compound, calculated from the TS descriptor matrix. New compounds whose leverage exceeds the threshold are considered outside the AD. (25,26)
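The leverage of a compound is its diagonal entry in the hat matrix, h_i = x_i^T (X^T X)^-1 x_i, where X is the TS descriptor matrix and x_i the compound's descriptor vector. This sketch assumes an intercept column is prepended to the descriptors and, when no threshold is passed, uses the commonly cited warning leverage h* = 3(p + 1)/n (p descriptors, n TS compounds). Neither choice is stated in the source, and `leverage_ad` is a hypothetical name.

```python
import numpy as np


def leverage_ad(X_train, X_new, threshold=None):
    """Return (leverages, inside_mask) for the new compounds."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    # Assumption: prepend a column of ones for the model intercept.
    X = np.column_stack([np.ones(len(X_train)), X_train])
    Xn = np.column_stack([np.ones(len(X_new)), X_new])

    # h_i = x_i^T (X^T X)^-1 x_i for each row x_i of Xn.
    xtx_inv = np.linalg.pinv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", Xn, xtx_inv, Xn)

    if threshold is None:
        n, p_plus_1 = X.shape  # p descriptors + 1 intercept column
        threshold = 3.0 * p_plus_1 / n  # assumed warning leverage h*
    return h, h <= threshold


# Usage: one descriptor, 20 TS compounds, so h* = 3 * 2 / 20 = 0.3.
X_ts = np.arange(20, dtype=float).reshape(-1, 1)
h, inside = leverage_ad(X_ts, [[9.5], [100.0]])
print(h, inside)
```

With a single descriptor the formula reduces to 1/n + (x - mean)^2 / Sxx, so the query at the TS mean gets the minimum leverage 1/20 = 0.05, while the extrapolated query at 100 lies far above h*.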
PCA (threshold: mean ± 3 SD)
The first two principal components (PCs) of the TS descriptors are calculated, and for each PC the acceptable range is set to the mean ± 3 standard deviations. If the PC values of a new compound fall outside this range, its prediction is considered unreliable. (23)
PCA (threshold: 0.5-0.95 percentile)
Same as the method above, but the thresholds are set at the 0.5th and 0.95th percentiles of the distribution of the TS compounds. (23)
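The two PCA variants differ only in how the range of each PC is derived, so one sketch can cover both. It computes the principal axes by SVD of the mean-centered TS matrix (autoscaling, often applied first, is left out because the source does not mention it) and requires every retained PC score of a new compound to lie within the TS range. The function name `pca_ad`, the `rule` switch, and the `lower`/`upper` percentile arguments (on a 0-100 scale, as `numpy.percentile` expects) are hypothetical; the cut-offs are left to the caller.

```python
import numpy as np


def pca_ad(X_train, X_new, n_pc=2, rule="mean_sd", lower=None, upper=None):
    """Return True for new compounds whose first n_pc PC scores lie in the TS range."""
    X = np.asarray(X_train, dtype=float)
    Xn = np.atleast_2d(np.asarray(X_new, dtype=float))
    mu = X.mean(axis=0)

    # Principal axes from the SVD of the mean-centered TS matrix.
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    W = vt[:n_pc].T
    scores_ts = (X - mu) @ W
    scores_new = (Xn - mu) @ W

    if rule == "mean_sd":
        m, s = scores_ts.mean(axis=0), scores_ts.std(axis=0, ddof=1)
        lo, hi = m - 3 * s, m + 3 * s
    elif rule == "percentile":
        if lower is None or upper is None:
            raise ValueError("percentile rule needs lower and upper")
        lo = np.percentile(scores_ts, lower, axis=0)
        hi = np.percentile(scores_ts, upper, axis=0)
    else:
        raise ValueError(f"unknown rule: {rule!r}")

    # Inside the AD only if every retained PC score is within its range.
    return np.all((scores_new >= lo) & (scores_new <= hi), axis=1)


# Usage: three descriptors with very different spreads.
rng = np.random.default_rng(0)
X_ts = rng.normal(size=(100, 3)) * [10.0, 5.0, 0.1]
queries = [X_ts.mean(axis=0), X_ts.mean(axis=0) + [1000.0, 0.0, 0.0]]
print(pca_ad(X_ts, queries))
print(pca_ad(X_ts, queries, rule="percentile", lower=5, upper=95))
```

The new scores are projected with the same axes `W` as the TS, so the arbitrary sign of each singular vector does not affect the comparison.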
Nearest neighbor distance
The average Euclidean distance between all pairs of TS compounds is calculated. If the distance of a VS compound from its nearest neighbor in the TS is greater than a given threshold, the compound is outside the AD. (27,28)
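A minimal pure-Python sketch, assuming compounds are given as descriptor tuples. The source does not state how the threshold is derived from the average pairwise distance, so this version uses that average itself as the default threshold; `nn_distance_ad` and its arguments are hypothetical.

```python
import math
from itertools import combinations


def nn_distance_ad(train, new, threshold=None):
    """Return (threshold, inside flags) for the compounds in `new`."""
    if threshold is None:
        # Assumed default: the mean Euclidean distance over all TS pairs.
        pairs = list(combinations(train, 2))
        threshold = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    # A compound is outside the AD if its nearest TS neighbor is too far away.
    flags = [min(math.dist(x, t) for t in train) <= threshold for x in new]
    return threshold, flags


ts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
thr, flags = nn_distance_ad(ts, [(0.5, 0.5), (5.0, 5.0)])
print(round(thr, 3), flags)  # → 1.138 [True, False]
```

For the unit square the average pairwise distance is (4 + 2√2)/6 ≈ 1.138; the center point's nearest neighbor is about 0.707 away, while (5, 5) is about 5.66 from its nearest TS compound.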
Atom-centered fragment (ACF)
All ACFs (each consisting of a central non-hydrogen atom and all atoms bonded to it) are calculated for the TS. A test compound is considered within the AD if every ACF obtained from its decomposition is among the ACFs identified in the TS. (29–31)
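The ACF check reduces to a subset test on fragment sets. In this sketch a molecule is a hypothetical toy graph (a list of element symbols plus a list of bonded atom-index pairs, hydrogens omitted), and each fragment is encoded as the central atom's element together with the sorted elements of its neighbors. Bond orders and charges are ignored, so this is coarser than a real ACF scheme; in practice the molecular graphs would come from a cheminformatics toolkit.

```python
from collections import defaultdict


def atom_centered_fragments(atoms, bonds):
    """Set of (central element, sorted neighbor elements) for non-H atoms."""
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(atoms[j])
        neighbors[j].append(atoms[i])
    return {
        (el, tuple(sorted(neighbors[i])))
        for i, el in enumerate(atoms)
        if el != "H"
    }


def acf_in_domain(train_mols, test_mol):
    """True if every ACF of test_mol also occurs somewhere in the TS."""
    ts_acfs = set()
    for atoms, bonds in train_mols:
        ts_acfs |= atom_centered_fragments(atoms, bonds)
    return atom_centered_fragments(*test_mol) <= ts_acfs


# Heavy-atom graphs (hydrogens omitted).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propane = (["C", "C", "C"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
methylamine = (["C", "N"], [(0, 1)])

print(acf_in_domain([ethanol, propane], propanol))     # → True
print(acf_in_domain([ethanol, propane], methylamine))  # → False
```

Propanol is not itself in this toy TS, yet each of its fragments occurs in ethanol or propane, so it is accepted; methylamine contains a C-N environment never seen in the TS.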
Fingerprint
The average Tanimoto similarity (based on PubChem fingerprints) between each test compound and the TS compounds is determined. If the average similarity is lower than 0.1, the compound is outside the AD. (23)
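A sketch of the decision rule, assuming each fingerprint is represented as a Python set of the indices of its "on" bits; generating the PubChem fingerprints themselves is outside its scope. Tanimoto similarity is |A ∩ B| / |A ∪ B|. The convention of returning 0.0 for two empty fingerprints and the name `fingerprint_ad` are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    # Assumed convention: two all-zero fingerprints get similarity 0.0.
    return len(a & b) / union if union else 0.0


def fingerprint_ad(train_fps, test_fp, threshold=0.1):
    """Return (average similarity to the TS, inside flag)."""
    avg = sum(tanimoto(test_fp, fp) for fp in train_fps) / len(train_fps)
    return avg, avg >= threshold


ts_fps = [{1, 2, 3}, {2, 3, 4}]
print(fingerprint_ad(ts_fps, {1, 2}))    # average ≈ 0.458, inside
print(fingerprint_ad(ts_fps, {10, 11}))  # → (0.0, False)
```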