2021 Nov 9;61(3):1299–1317. doi: 10.1007/s00394-021-02692-z

Table 1.

Statistics of computed PCA models illustrating the effect of transformation and scaling methods on the datasets considering individual metabolites and sums of metabolites belonging to the same aglycone compound

Individual metabolites
Data pre-treatment		Model quality parameters		
Transformation	Mean centering + scaling	R²X (cum)	Q² (cum)	Pattern*
None	None	0.965	0.781	C
	Centering	0.935	0.681	P
	Centering + UV	0.621	0.247	C
	Centering + Pareto	0.818	0.551	P
Log	None	0.862	0.622	P
	Centering	0.704	0.303	R
	Centering + UV	0.622	0.230	R
	Centering + Pareto	0.648	0.257	R
Power	None	0.991	0.527	P
	Centering	0.990	0.511	P
	Centering + UV	0.621	−0.070	C
	Centering + Pareto	0.926	0.560	P
Sums of metabolites
Data pre-treatment		Model quality parameters		
Transformation	Mean centering + scaling	R²X (cum)	Q² (cum)	Pattern*
None	None	0.999	0.282	C
	Centering	0.998	0.094	C
	Centering + UV	0.715	−0.139	C
	Centering + Pareto	0.949	0.076	C
Log	None	0.933	0.396	C
	Centering	0.728	−0.210	C
	Centering + UV	0.705	−0.210	C
	Centering + Pareto	0.704	−0.210	C
Power	None	1.00	0.063	C
	Centering	1.00	0.045	C
	Centering + UV	0.720	−0.210	C
	Centering + Pareto	0.998	−0.020	C

R²X (cum) and Q² (cum) represent, respectively, the model fit (the cumulative fraction of variation in X explained by the model) and the predictive ability (estimated by cross-validation). The higher these values, the better the model. Abbreviation: UV, unit variance. Centering + UV is also known as autoscaling.

* “Pattern” stands for “pattern of metabolite distribution”: P, distribution based on phase II metabolism; C, distribution based on colonic metabolism; R, random distribution (no biological explanation).
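
The pre-treatments compared in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and several details are assumptions made for illustration: the data are synthetic, the function names (`transform`, `scale`, `r2x_cum`) are hypothetical, the power transformation is taken to be a square root, the log transformation uses a +1 offset, and two components are retained. Only R²X (cum) is computed; Q² (cum) requires a cross-validation scheme and is left out.

```python
import numpy as np


def transform(X, method="none"):
    """Transform a samples x variables matrix before scaling."""
    X = np.asarray(X, dtype=float)
    if method == "log":
        return np.log10(X + 1.0)  # assumption: +1 offset to tolerate zeros
    if method == "power":
        return np.sqrt(X)         # assumption: square root as the power transform
    return X                      # "none"


def scale(X, method="none"):
    """Mean-center each variable (column) and optionally scale it.

    "center": subtract the column mean
    "uv":     center, then divide by the standard deviation (autoscaling)
    "pareto": center, then divide by the square root of the standard deviation
    Columns with zero variance are not handled in this sketch.
    """
    if method == "none":
        return X
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    if method == "uv":
        return centered / sd
    if method == "pareto":
        return centered / np.sqrt(sd)
    return centered


def r2x_cum(X, n_components):
    """Fraction of the total sum of squares of X captured by the first components."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    explained = np.sum(singular_values[:n_components] ** 2)
    return float(explained / np.sum(singular_values ** 2))


# Synthetic, strictly positive data: 20 samples x 8 variables (not the study data).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=1.0, size=(20, 8))

for t in ("none", "log", "power"):
    for s in ("none", "center", "uv", "pareto"):
        Z = scale(transform(X, t), s)
        print(f"{t:>5} {s:>6}  R2X(cum, 2 PCs) = {r2x_cum(Z, 2):.3f}")
```

Without centering, the first component mostly captures the overall mean level of the data, which is one reason uncentered models can show a very high R²X (cum) without being more informative.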