2018 Dec 27;36(2):28. doi: 10.1007/s11095-018-2561-8

Table III.

Methods Chosen for Defining the AD, Brief Description and Reference

| Method | Description | Reference |
| --- | --- | --- |
| Two-class real-random classification | The descriptors are permuted on a mirror copy of the TS; the real and permuted matrices are then merged, and a classification model is built to distinguish real values from random ones. | (17,24) |
| Leverage | Based on calculation of the leverage (h_i). New compounds whose leverage is above the h_i threshold are considered outside the AD. | (25,26) |
| PCA (threshold: mean ± 3*SD) | After calculation of the first two PCs of the TS descriptors, a threshold is set for each PC equal to mean ± 3*standard deviation. If the PC values of a new compound fall outside the established range, the prediction is considered unreliable. | (23) |
| PCA (threshold: 0.5-0.95 percentile) | Same as the PCA method above, but the thresholds are set at the 0.5th and 0.95th percentiles of the distribution of TS compounds. | (23) |
| Nearest neighbor distance | Based on calculation of the average Euclidean distance between all pairs of TS compounds. If the distance of a VS compound from its nearest neighbor in the TS is greater than a given threshold, the compound is outside the AD. | (27,28) |
| Atom centered fragment (ACF) | All ACFs of the TS are calculated (an ACF is a central non-hydrogen atom with all atoms bonded to it). A test compound is considered within the AD if every ACF obtained by its decomposition is among the ACFs identified in the TS. | (29–31) |
| Fingerprint | The average similarity (Tanimoto, based on PubChem fingerprints) of each test compound to the TS is determined. If the average similarity is lower than 0.1, the compound is outside the AD. | (23) |
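
Three of the descriptor-based checks in Table III (leverage, PCA with mean ± 3*SD, and nearest neighbor distance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names are hypothetical, the descriptors are used unscaled, and two cut-offs that the table does not state are assumptions: the leverage threshold is taken as the commonly used warning value 3(p + 1)/n, and the nearest neighbor threshold is taken as the average pairwise TS distance multiplied by a hypothetical `factor`.

```python
import numpy as np


def leverage_in_domain(X_train, X_query, threshold=None):
    """Leverage check: h_i = x_i^T (X^T X)^-1 x_i, X being the TS matrix."""
    n, p = X_train.shape
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    h = np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)
    if threshold is None:
        # Assumption: conventional warning leverage, not stated in the table.
        threshold = 3.0 * (p + 1) / n
    return h <= threshold


def pca_in_domain(X_train, X_query, n_sd=3.0):
    """PCA check: scores on the first two PCs must lie within mean +/- n_sd*SD."""
    mean = X_train.mean(axis=0)
    # Principal axes from the SVD of the centered TS descriptor matrix.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    axes = vt[:2].T
    scores_train = (X_train - mean) @ axes
    scores_query = (X_query - mean) @ axes
    mu = scores_train.mean(axis=0)
    sd = scores_train.std(axis=0, ddof=1)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return np.all((scores_query >= lower) & (scores_query <= upper), axis=1)


def nearest_neighbor_in_domain(X_train, X_query, factor=1.0):
    """Nearest neighbor check against the average pairwise TS distance."""
    diff = X_train[:, None, :] - X_train[None, :, :]
    d_train = np.sqrt((diff ** 2).sum(axis=-1))
    upper_triangle = np.triu_indices(len(X_train), k=1)
    # Assumption: threshold = factor * mean pairwise Euclidean distance in the TS.
    threshold = factor * d_train[upper_triangle].mean()
    diff_q = X_query[:, None, :] - X_train[None, :, :]
    d_query = np.sqrt((diff_q ** 2).sum(axis=-1))
    return d_query.min(axis=1) <= threshold


# Toy data: a 5 x 5 grid of two descriptors as TS, plus two query compounds.
train = np.array([[x, y] for x in range(-2, 3) for y in range(-2, 3)], dtype=float)
query = np.array([[0.5, 0.5], [10.0, 10.0]])

for check in (leverage_in_domain, pca_in_domain, nearest_neighbor_in_domain):
    print(check.__name__, check(train, query))
```

Each function returns one boolean per query compound, with `True` meaning the compound falls inside the AD under that criterion. In the toy data the first query compound lies inside the training grid and the second lies far outside it, so the three checks can be compared on an easy case before being applied to real descriptor matrices.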