. 2015 Mar;11(1):21–31. doi: 10.2174/1573409911666150414145937

Table 4.

List of detected QSAR shortcomings and pitfalls that have been reported in the literature [3].

Pitfalls Comments
Small sample and limited chemical variability There is no ideal sample size for QSAR, but it is clear that results become more representative with larger samples. Here, the number of compounds is small (n=22) and the chemical variety is very limited. It basically consists of ester (-O-CO-), ether (-O-), nitro (-NO2) and chloro groups, all of which occupy different positions on the phenyl ring of the molecules.
Composition of training and test sets With a small series, the distribution of compounds between the two sets may not be representative in terms of activity and chemical variability.
Meaningless descriptor selections pKa is a common QSAR descriptor that correlates highly with biological activity, which is not the case for the dipole moment (DM, dipole). DM takes on different values with changing conformations. When DM is calculated for molecules artificially held planar, it is loaded onto the Z-axis only, while even small torsional changes are reflected by huge changes in DM values.
Non-constant coefficients and constants The calculated pKa values do not match literature reports. pKa takes different values in different programs; moreover, each program considers different ionized forms, which makes it difficult to select a value for QSAR equations.
Depending on the software, DM takes on different values due to normalization of input data.
Certain descriptors, like molar refractivity (MR), are very similar in magnitude in most programs. In contrast, polar surface area (PSA) should change with the conformation of the molecule, which implies that PSA is a 3D descriptor [44]. Other programs, however, calculate a “flat” PSA based on 2D data (atom connectivities and radii).
Starting geometries for 3D-QSAR Although the active conformation is not necessarily identical to the observed crystal structure, and no NTZ-PFOR complex has been solved, two pieces of information were taken into account to assess the active conformation of the NTZ scaffold: (1) its crystallographic record deposited in the CCDC [47] and (2) the final pose of NTZ docked into the ligand binding site of the cofactor TPP-PFOR complex. Both geometries turned out to be practically the same; see Supplementary Material (Fig. SD2-C).
Errors of descriptor calculations (acidity, dissociation) The experimental acidity value of NTZ is reported as pKa≈6 [32] for the conjugate acid / neutral thiazole system ([B-H+] / [B]), which corresponds to approx. 90% neutral species under physiological conditions. The calculated value, pKa≈8 [76], however, inverts the cationic/nonionic proportions (10% neutral species). With no experimental value at hand, the (wrong) cationic forms would have been taken as input for the QSAR and docking studies.
Lipole-dipole collinearity The Lipole algorithm is derived from the dipole moment equation (DM = q*r, q = atomic partial charge, r = VDW atomic radius), with atomic lipophilicity values replacing the atomic partial charges. Despite the different scales and units (charges versus lipophilic fragments, same VDW radii), the identical calculation protocol generates collinearity.
Linearity hypothesis The a priori assumption of linearity might be the main drawback in QSAR studies. Since data sampling is not complete, because no scientist would seek to explore the weaker, less active or more toxic data segments, it is often not clear if linearity is a first principle of nature or just appears due to insufficient data spread. Outliers and activity cliffs are first signs of nonlinear relationships between independent variables and response (biological activity, dependent variable).
Ligand-based alignment (LBA) The X-ray (crystal) conformation of NTZ may not be the biologically active conformation. The hitherto unsolved structure of the NTZ-PFOR complex is a disadvantage for higher-dimensional QSAR, where reliable conformational data are required. Results based on 2D descriptors (connectivities, drawings, SMILES, etc.) need no such information, whereas ligands can be superposed on their more rigid substructure or common scaffold (LBA).
Multiple solutions We generated different equations based on different conformations and methodologies. It is not clear whether modeling based on the NTZ X-ray conformation reflects realistic molecular geometries for binding site interaction, because the NTZ-liganded binding site complex has not been elucidated. From the pKa value of NTZ (here: 6.2, located on the exothiazolic amide N), it can be inferred that all molecules treated here are active in their anionic form. A new round of QSAR equation generation, based on descriptors calculated for the anionic compounds (without H at the exothiazolic amide N, same training set), gave smaller R2 values than those obtained with X-ray data. Ideally, the anionic form directly related to biological activity should be considered, because a small structural change can be reflected in huge differences in descriptor magnitudes. This last QSAR equation, generated with ionic compounds (pIC50 = -2.36 + 2.28 pKa + 0.17 DM + 0.62 Lipole; R2Train=0.74, Q2=0.65, r2m=0.37, n=18; R2Test=0.75, Q2F1=0.75, Q2F2=0.75, Q2F3=0.75, r2m=0.68, n=4), may be seen as a poorly predictive equation, but it better reflects the biological behavior of our molecules.
Prodrugs and active metabolites Some publications describe NTZ as a prodrug, although biological activity of NTZ itself has also been reported. Nevertheless, upon hydrolysis of the acetyl group, the metabolite TIZ shows comparable antiprotozoal potency.
Incompatible concepts and contradictions (chance correlation) Sometimes, linear equations in 2D QSAR include conformation-dependent descriptors even though spatial information about the structural requirements of the ligands and the binding site remains unknown. Hence, conformation-dependent descriptors contribute to the “rules” governing the relations between structures and activities without any reason to be present in the equation except chance correlation: “… because the relevant features only appear in molecules that also contain the wrong features” [3].
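
The species percentages quoted in the row on errors of descriptor calculations can be checked with the Henderson-Hasselbalch relation for a conjugate acid / neutral base pair ([B-H+] / [B]). The following is a minimal sketch, not part of the original study. The function name `neutral_fraction` and the constant `ASSUMED_PH` are introduced here for illustration only. The table says only "physiological conditions", so the pH of 7.0 is an assumption, chosen because it reproduces the approximate 90% and 10% figures.

```python
def neutral_fraction(pka: float, ph: float) -> float:
    """Fraction of a base present as neutral B in the BH+ / B equilibrium."""
    # Henderson-Hasselbalch: [BH+] / [B] = 10 ** (pKa - pH)
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumption: the table does not state the pH behind "physiological conditions".
ASSUMED_PH = 7.0

# pKa values as quoted in the table: experimental ~6 [32], calculated ~8 [76].
for label, pka in (("experimental", 6.0), ("calculated", 8.0)):
    f = neutral_fraction(pka, ASSUMED_PH)
    print(f"{label} pKa={pka}: {100 * f:.0f}% neutral, {100 * (1 - f):.0f}% cationic")
```

At the assumed pH of 7.0 this gives about 91% neutral species for pKa 6 and about 9% for pKa 8, in line with the approximate figures in the table. At pH 7.4 the same relation gives about 96% and 20%, so the inversion between mostly neutral and mostly cationic species holds either way.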