Table III.
Methods Chosen for Defining the AD, Brief Description and Reference
Method | Description |
---|---|
Two-class real-random classification | The descriptors of a mirror TS are permuted, the real and permuted matrices are merged, and a classification model is built to distinguish real values from random ones. (17,24) |
Leverage | Based on the leverage (h_i) calculated for each compound. New compounds whose leverage exceeds the h_i threshold are considered outside the AD. (25,26) |
PCA (threshold: mean ± 3*SD) | The first two principal components (PCs) of the TS descriptors are calculated, and a threshold equal to mean ± 3*standard deviation is set for each PC. If the PC values of a new compound fall outside the established range, the prediction is considered unreliable. (23) |
PCA (threshold: 0.5-0.95 percentile) | Same as the method above, but the thresholds are set at the 0.5th and 0.95th percentiles of the distribution of the TS compounds. (23) |
Nearest neighbor distance | Based on the average Euclidean distance between all pairs of TS compounds. If the distance of a VS compound from its nearest neighbor in the TS is greater than a given threshold, the compound is outside the AD. (27,28) |
Atom centered fragment (ACF) | All ACFs of the TS are calculated (an ACF is a central non-hydrogen atom together with all atoms bonded to it). A test compound is considered within the AD if every ACF obtained by its decomposition is among the ACFs identified in the TS. (29–31) |
Fingerprint | The average similarity (Tanimoto coefficient based on PubChem fingerprints) of each test compound to the TS is determined. If the average similarity is lower than 0.1, the compound is outside the AD. (23) |
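
To make the descriptor-based and fingerprint-based checks in Table III concrete, the following is a minimal NumPy sketch, not the implementation used in the cited works. All function names are hypothetical. It assumes descriptors are already scaled numeric matrices (rows are compounds), that fingerprints are plain 0/1 arrays rather than actual PubChem fingerprints, that the nearest-neighbor threshold is taken as the mean pairwise TS distance, and that the leverage threshold is supplied by the user (the table does not state its value).

```python
import numpy as np

def leverage(X_train, X_new):
    """Leverage h_i = x_i (X^T X)^-1 x_i^T of each new compound relative to the TS."""
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)  # pseudo-inverse for numerical safety
    return np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)

def pca_within_range(X_train, X_new, n_sd=3.0):
    """True where both of the first two PC scores lie within mean +/- n_sd * SD of the TS."""
    mu = X_train.mean(axis=0)
    # Principal axes from the SVD of the centred TS matrix.
    _, _, vt = np.linalg.svd(X_train - mu, full_matrices=False)
    axes = vt[:2].T
    s_train = (X_train - mu) @ axes
    s_new = (X_new - mu) @ axes
    center, sd = s_train.mean(axis=0), s_train.std(axis=0)
    lower, upper = center - n_sd * sd, center + n_sd * sd
    return np.all((s_new >= lower) & (s_new <= upper), axis=1)

def mean_pairwise_distance(X_train):
    """Average Euclidean distance over all distinct pairs of TS compounds."""
    d = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=2)
    return d[np.triu_indices(len(X_train), k=1)].mean()

def nn_within_threshold(X_train, X_new, threshold):
    """True where the distance to the nearest TS compound does not exceed the threshold."""
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    return d.min(axis=1) <= threshold

def mean_tanimoto(fp_train, fp_new):
    """Average Tanimoto similarity of each new binary fingerprint to all TS fingerprints."""
    fp_train = np.asarray(fp_train, dtype=int)
    fp_new = np.asarray(fp_new, dtype=int)
    common = fp_new @ fp_train.T                       # bits set in both fingerprints
    total = fp_new.sum(axis=1)[:, None] + fp_train.sum(axis=1)[None, :]
    union = total - common
    sim = common / np.maximum(union, 1)                # two empty fingerprints give 0
    return sim.mean(axis=1)

# Toy data (assumption): a 3x3 grid as the TS, one central and one distant new compound.
X_ts = np.array([[x, y] for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)])
X_query = np.array([[0.0, 0.0], [10.0, 10.0]])

h_threshold = 3 * X_ts.shape[1] / X_ts.shape[0]        # assumed 3p/n convention
print("outside AD (leverage):", leverage(X_ts, X_query) > h_threshold)
print("inside AD (PCA range):", pca_within_range(X_ts, X_query))
print("inside AD (nearest neighbor):",
      nn_within_threshold(X_ts, X_query, mean_pairwise_distance(X_ts)))
print("outside AD (fingerprint):",
      mean_tanimoto([[1, 1, 0, 0], [1, 0, 1, 0]], [[1, 1, 0, 0], [0, 0, 0, 1]]) < 0.1)
```

The percentile-based PCA variant would replace the mean ± SD bounds with `np.percentile` bounds computed on the TS scores. The two-class and ACF methods are omitted because they depend on a classifier and a fragment generator that the table does not specify.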