| Class | Method | Formula | Unit | Goal | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| I | Centering | | O | Focus on the differences and not the similarities in the data | Remove the offset from the data | When data is heteroscedastic, the effect of this pretreatment method is not always sufficient |
| II | Autoscaling | | (-) | Compare metabolites based on correlations | All metabolites become equally important | Inflation of the measurement errors |
| II | Range scaling | | (-) | Compare metabolites relative to the biological response range | All metabolites become equally important. Scaling is related to biology | Inflation of the measurement errors and sensitive to outliers |
| II | Pareto scaling | | O | Reduce the relative importance of large values, but keep data structure partially intact | Stays closer to the original measurement than autoscaling | Sensitive to large fold changes |
| II | Vast scaling | | (-) | Focus on the metabolites that show small fluctuations | Aims for robustness, can use prior group knowledge | Not suited for large induced variation without group structure |
| II | Level scaling | | (-) | Focus on relative response | Suited for identification of e.g. biomarkers | Inflation of the measurement errors |
| III | Log transformation | | Log O | Correct for heteroscedasticity, pseudo scaling. Make multiplicative models additive | Reduce heteroscedasticity, multiplicative effects become additive | Difficulties with values with large relative standard deviation and zeros |
| III | Power transformation | | √O | Correct for heteroscedasticity, pseudo scaling | Reduce heteroscedasticity, no problems with small values | Choice for square root is arbitrary |
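
The formula cells of the table did not survive, so the following is only a minimal sketch of how these pretreatments are commonly defined, not a restatement of the original formulas. It assumes each function receives the measurements of a single metabolite across all samples, that the standard deviation is the sample standard deviation (n − 1 denominator), that the log transformation uses base 10, and that transformed data are centered afterwards. Reading "O" as the original unit and "(-)" as dimensionless is likewise an assumption. All function names are hypothetical.

```python
import math
import statistics


def center(x):
    """Class I: subtract the mean, removing the offset."""
    m = statistics.mean(x)
    return [v - m for v in x]


def autoscale(x):
    """Divide centered values by the standard deviation (unit variance)."""
    s = statistics.stdev(x)
    return [v / s for v in center(x)]


def range_scale(x):
    """Divide centered values by the observed range (max - min)."""
    r = max(x) - min(x)
    return [v / r for v in center(x)]


def pareto_scale(x):
    """Divide centered values by the square root of the standard deviation."""
    s = statistics.stdev(x)
    return [v / math.sqrt(s) for v in center(x)]


def vast_scale(x):
    """Autoscale, then weight by mean / standard deviation, which
    favours metabolites with small relative fluctuations."""
    m = statistics.mean(x)
    s = statistics.stdev(x)
    return [v / s * (m / s) for v in center(x)]


def level_scale(x):
    """Divide centered values by the mean (relative response)."""
    m = statistics.mean(x)
    return [v / m for v in center(x)]


def log_transform(x):
    """Class III: log10, then center. Raises ValueError for zeros or
    negative values, the difficulty noted in the table."""
    return center([math.log10(v) for v in x])


def power_transform(x):
    """Class III: square root, then center. Zeros are handled."""
    return center([math.sqrt(v) for v in x])


# One metabolite measured in four samples (made-up illustrative values).
example = [2.0, 4.0, 6.0, 8.0]
print(center(example))  # [-3.0, -1.0, 1.0, 3.0]
```

Every method in class II starts from the centered values and differs only in the scaling factor, which is why the unit becomes dimensionless whenever that factor carries the original unit (standard deviation, range, mean) and stays in the original unit for Pareto scaling in this reading of the table.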