Table 1.
Model Description | Y variable | Number of LVs | R2X | Q2Y | Significance (P value) |
---|---|---|---|---|---|
A. PLS | ln(U-Cd) | 3 | 0.251 | 0.237 | < 0.01 |
B. PLS (current smokers excluded) | ln(U-Cd) | 5 | 0.308 | 0.330 | < 0.001 |
C. PLS (past and current smokers excluded) | ln(U-Cd) | 1 | 0.0729 | 0.142 | < 0.001 |
D. PLS | sex | 3 | 0.241 | 0.104 | > 0.05 |
E. PLS | age | 2 | 0.216 | 0.224 | < 0.001 |
F. PLS | ln(U-NAG) | 1 | 0.054 | 0.162 | < 0.001 |
G. PLS-DA | Smoking history^a | 2 | 0.194 | 0.185 | < 0.01 |

^a Smoking history was defined as either 1 = never smoked and past smoker (n = 106) or 2 = current smoker (n = 20); one individual did not complete the lifestyle questionnaire.

Spectra that exhibited signs of bacterial contamination, analgesics or ethanol were excluded from these analyses. All variables were mean-centred and scaled to unit variance. NMR data were reduced to 1,127 data points at a resolution of δ 0.01. Sample numbers for PLS models: A, D, E and F, n = 127; B, n = 106; C, n = 79. PLS-DA (model G): n = 126. The number of latent variables in each model was auto-fitted in SIMCA-P+. All models were assessed for validity by Y variable permutation analysis (1,000 permutations; see additional file 1, Figure S4). Scores scatter plots for each multivariate model can also be found in additional file 1 (Figure S5).

ln(U-Cd), natural logarithm of urinary cadmium; ln(U-NAG), natural logarithm of urinary N-acetyl-β-D-glucosaminidase; LV, latent variable; n, sample number; PLS, partial least squares; PLS-DA, partial least squares discriminant analysis. R2X is the proportion of variance in the X matrix (i.e. spectral NMR data) described by the PLS model. Q2Y, the "cross-validated goodness-of-fit", is the ability of the PLS model to predict the Y-score (ln(U-Cd), sex, age, ln(U-NAG) or smoking status) of a novel sample.
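The quantities named in the footnote can be illustrated with a minimal Python sketch. It is not the SIMCA-P+ implementation: the function names are hypothetical, unit-variance scaling is assumed to use the sample standard deviation, Q2 is written in one common form (1 - PRESS/TSS, with predictions taken from cross-validation), and the permutation P value is assumed to be the fraction of permuted-Y models that score at least as well as the real model. The software used for Table 1 may compute these differently.

```python
from statistics import mean, stdev


def autoscale(column):
    """Mean-centre one variable and scale it to unit variance.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(column)
    s = stdev(column)
    return [(v - m) / s for v in column]


def q2(y_observed, y_cv_predicted):
    """Q2 = 1 - PRESS / TSS.

    y_cv_predicted must hold predictions made for each sample while that
    sample was left out of model fitting (cross-validation).
    """
    y_bar = mean(y_observed)
    press = sum((y - p) ** 2 for y, p in zip(y_observed, y_cv_predicted))
    tss = sum((y - y_bar) ** 2 for y in y_observed)
    return 1.0 - press / tss


def permutation_p_value(q2_real, q2_permuted):
    """Empirical P value from a Y-permutation test (hypothetical helper).

    Counts permuted-Y models whose Q2 is at least as high as the real one;
    the +1 terms keep the estimate away from exactly zero.
    """
    n_at_least = sum(1 for q in q2_permuted if q >= q2_real)
    return (n_at_least + 1) / (len(q2_permuted) + 1)
```

Under these assumptions, a model whose Q2 exceeds that of all 1,000 permuted models would receive an empirical P value of 1/1001, i.e. just under 0.001.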