2006 Jun 8;7:142. doi: 10.1186/1471-2164-7-142

Table 1.

Overview of the pretreatment methods used in this study. The Unit column states the unit of the data after pretreatment: O represents the original unit, and (-) represents dimensionless data. The mean is estimated as $\bar{x}_i = \frac{1}{J}\sum_{j=1}^{J} x_{ij}$ and the standard deviation is estimated as $s_i = \sqrt{\frac{\sum_{j=1}^{J}\left(x_{ij} - \bar{x}_i\right)^2}{J-1}}$. $\tilde{x}$ and $\hat{x}$ represent the data after different pretreatment steps.

| Class | Method | Formula | Unit | Goal | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| I | Centering | $\tilde{x}_{ij} = x_{ij} - \bar{x}_i$ | O | Focus on the differences and not the similarities in the data | Remove the offset from the data | When data is heteroscedastic, the effect of this pretreatment method is not always sufficient |
| II | Autoscaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$ | (-) | Compare metabolites based on correlations | All metabolites become equally important | Inflation of the measurement errors |
| II | Range scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{x_{i,\max} - x_{i,\min}}$ | (-) | Compare metabolites relative to the biological response range | All metabolites become equally important. Scaling is related to biology | Inflation of the measurement errors and sensitive to outliers |
| II | Pareto scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$ | O | Reduce the relative importance of large values, but keep data structure partially intact | Stays closer to the original measurement than autoscaling | Sensitive to large fold changes |
| II | Vast scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i} \cdot \frac{\bar{x}_i}{s_i}$ | (-) | Focus on the metabolites that show small fluctuations | Aims for robustness, can use prior group knowledge | Not suited for large induced variation without group structure |
| II | Level scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\bar{x}_i}$ | (-) | Focus on relative response | Suited for identification of e.g. biomarkers | Inflation of the measurement errors |
| III | Log transformation | $\tilde{x}_{ij} = {}^{10}\log\left(x_{ij}\right)$; $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | Log O | Correct for heteroscedasticity, pseudo scaling. Make multiplicative models additive | Reduce heteroscedasticity, multiplicative effects become additive | Difficulties with values with large relative standard deviation and zeros |
| III | Power transformation | $\tilde{x}_{ij} = \sqrt{x_{ij}}$; $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | √O | Correct for heteroscedasticity, pseudo scaling | Reduce heteroscedasticity, no problems with small values | Choice for square root is arbitrary |
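
The formulas in Table 1 can be illustrated with a short sketch. This is not the authors' implementation: the function name `pretreat`, the method labels, and the example values are hypothetical, chosen only for illustration. It assumes one metabolite's measurements $x_{i1}, \ldots, x_{iJ}$ are passed as a list of positive numbers, and it uses the sample standard deviation with a $J-1$ denominator, as in the table caption.

```python
import math
from statistics import mean, stdev


def pretreat(row, method):
    """Pretreat one metabolite's measurements x_i1..x_iJ (hypothetical helper).

    `method` is one of: "centering", "autoscaling", "range", "pareto",
    "vast", "level", "log", "power".
    """
    # Class III: transform first, then center the transformed values.
    if method in ("log", "power"):
        if method == "log":
            # log10 is undefined for zeros and negative values
            transformed = [math.log10(x) for x in row]
        else:
            transformed = [math.sqrt(x) for x in row]
        t_mean = mean(transformed)
        return [t - t_mean for t in transformed]

    # Class I and II: all start from the mean-centered values.
    m = mean(row)
    centered = [x - m for x in row]
    if method == "centering":
        return centered

    s = stdev(row)  # sample standard deviation (J - 1 denominator)
    if method == "autoscaling":
        return [c / s for c in centered]
    if method == "range":
        spread = max(row) - min(row)
        return [c / spread for c in centered]
    if method == "pareto":
        return [c / math.sqrt(s) for c in centered]
    if method == "vast":
        # autoscaled value multiplied by mean / standard deviation
        return [(c / s) * (m / s) for c in centered]
    if method == "level":
        return [c / m for c in centered]
    raise ValueError(f"unknown method: {method}")


# Example with made-up values: mean 4, standard deviation 2, range 4.
example = [2.0, 4.0, 6.0]
for name in ("centering", "autoscaling", "range", "pareto", "vast", "level"):
    print(name, pretreat(example, name))
```

For a full data matrix, the same function would be applied to each metabolite row in turn. The log transformation fails on zeros, which matches the disadvantage listed in the table.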