2021 Nov 9;61(3):1299–1317. doi: 10.1007/s00394-021-02692-z

Table 1.

Statistics of computed PCA models illustrating the effect of transformation and scaling methods on the datasets considering individual metabolites and sums of metabolites belonging to the same aglycone compound

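Individual metabolites

| Data pre-treatment: transformation | Data pre-treatment: mean centering + scaling | Model quality: R2X (cum) | Model quality: Q2 (cum) | Pattern* |
|---|---|---|---|---|
| None | None | 0.965 | 0.781 | C |
| None | Centering | 0.935 | 0.681 | P |
| None | Centering + UV | 0.621 | 0.247 | C |
| None | Centering + Pareto | 0.818 | 0.551 | P |
| Log | None | 0.862 | 0.622 | P |
| Log | Centering | 0.704 | 0.303 | R |
| Log | Centering + UV | 0.622 | 0.230 | R |
| Log | Centering + Pareto | 0.648 | 0.257 | R |
| Power | None | 0.991 | 0.527 | P |
| Power | Centering | 0.990 | 0.511 | P |
| Power | Centering + UV | 0.621 | -0.070 | C |
| Power | Centering + Pareto | 0.926 | 0.560 | P |
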
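Sums of metabolites

| Data pre-treatment: transformation | Data pre-treatment: mean centering + scaling | Model quality: R2X (cum) | Model quality: Q2 (cum) | Pattern* |
|---|---|---|---|---|
| None | None | 0.999 | 0.282 | C |
| None | Centering | 0.998 | 0.094 | C |
| None | Centering + UV | 0.715 | -0.139 | C |
| None | Centering + Pareto | 0.949 | 0.076 | C |
| Log | None | 0.933 | 0.396 | C |
| Log | Centering | 0.728 | -0.210 | C |
| Log | Centering + UV | 0.705 | -0.210 | C |
| Log | Centering + Pareto | 0.704 | -0.210 | C |
| Power | None | 1.00 | 0.063 | C |
| Power | Centering | 1.00 | 0.045 | C |
| Power | Centering + UV | 0.720 | -0.210 | C |
| Power | Centering + Pareto | 0.998 | -0.020 | C |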

R2X (cum) and Q2 (cum) represent, respectively, the cumulative model fit (explained variation) and the cumulative predictive ability; higher values indicate a better model. Abbreviations: UV, unit variance. Centering + UV is also known as autoscaling.

* “Pattern” stands for “pattern of metabolite distribution”: P, data distribution on the basis of the phase II metabolism; C, distribution on the basis of the colonic metabolism; R, random distribution (no biological explanation)
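
The pre-treatment options compared in Table 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions because the table does not specify them: the function names `pretreat` and `r2x_cum` are hypothetical, the log transformation is taken as log10(x + 1), the power transformation is taken as a square root, and R2X (cum) is computed as the fraction of the sum of squares of the pre-treated matrix captured by the first components of a singular value decomposition. Q2 (cum) requires cross-validation and is not reproduced here.

```python
import numpy as np


def pretreat(X, transform="none", scaling="none"):
    """Transform, then optionally center and scale, each column (metabolite) of X.

    Rows are samples, columns are variables.
    """
    X = np.asarray(X, dtype=float)

    if transform == "log":
        X = np.log10(X + 1.0)  # assumption: offset of 1 so zeros stay finite
    elif transform == "power":
        X = np.sqrt(X)  # assumption: square-root power transformation
    elif transform != "none":
        raise ValueError(f"unknown transform: {transform}")

    if scaling == "none":
        return X

    centered = X - X.mean(axis=0)
    if scaling == "centering":
        return centered

    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant columns unscaled
    if scaling == "uv":
        # Centering + UV (autoscaling): divide by the standard deviation.
        return centered / sd
    if scaling == "pareto":
        # Centering + Pareto: divide by the square root of the standard deviation.
        return centered / np.sqrt(sd)
    raise ValueError(f"unknown scaling: {scaling}")


def r2x_cum(X, n_components):
    """Fraction of the sum of squares of X captured by the first components."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[:n_components] ** 2) / np.sum(s ** 2))


# Illustrative data only (4 samples x 3 variables); not taken from the study.
X_demo = np.array(
    [
        [1.0, 10.0, 100.0],
        [2.0, 30.0, 150.0],
        [4.0, 20.0, 400.0],
        [8.0, 50.0, 250.0],
    ]
)

for transform in ("none", "log", "power"):
    for scaling in ("none", "centering", "uv", "pareto"):
        Xp = pretreat(X_demo, transform=transform, scaling=scaling)
        print(transform, scaling, round(r2x_cum(Xp, 2), 3))
```

UV scaling gives every variable the same variance, so low-abundance metabolites weigh as much as abundant ones, while Pareto scaling only partly reduces the dominance of high-variance variables. This is one plausible reason the same dataset yields different fit values and distribution patterns across the rows of the table.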