. 2019 May 16;18(8):1683–1699. doi: 10.1074/mcp.RA118.001169

Table I. Manipulation methods, comprising 3 transformation, 18 pretreatment (2 centering, 4 scaling, and 12 normalization), and 7 imputation methods, together with their purposes and/or the corresponding statistical/biological assumptions. All manipulation methods are abbreviated using a three-letter code. VSN is a transformation method integrated with a subsequent normalization technique; it determines data-dependent transformation parameters through its built-in transformation (47, 79).

| Class | Abb. | Manipulation Method | Purpose/Assumption of the Manipulation Method |
|---|---|---|---|
| Transformation | BOX | Box-Cox Transformation | Making asymmetric data fulfill the normality assumption in a regression model by converting the protein abundances into a more symmetric distribution (80). |
| Transformation | LOG | Log Transformation | Converting the distribution of ratios of protein abundance values into a more symmetric (almost normal) distribution and minimizing the effect of proteins with extreme abundance (39). |
| Transformation | VSN | Variance Stabilization Normalization | Having a built-in transformation (47) and making individual observations more directly comparable (81); assuming that most of the proteins across different samples are not differentially expressed (81). |
| Pretreatment: Centering | MEC | Mean-Centering | Converting all the intensities to fluctuations around zero instead of around the mean of the protein intensities; assuming all proteins are equally important (35). |
| Pretreatment: Centering | MDC | Median-Centering | Converting all the intensities to fluctuations around zero instead of around the median of the protein intensities; assuming all proteins are equally important (35). |
| Pretreatment: Scaling | ATO | Auto Scaling | Adjusting each protein abundance for systematic variance using the standard deviation of each protein across all samples as the scaling factor (82); assuming all proteins are equally important (35). |
| Pretreatment: Scaling | PAR | Pareto Scaling | Scaling each protein abundance for systematic variance using the square root of the standard deviation of each protein across all samples as the scaling factor (83); assuming all proteins are equally important (35). |
| Pretreatment: Scaling | VAS | Vast Scaling | Adjusting each protein abundance for systematic variance using the coefficient of variation of each protein across all samples as the scaling factor (84); assuming all proteins are equally important (35). |
| Pretreatment: Scaling | RAN | Range Scaling | Scaling each protein abundance for systematic variance using the abundance range of each protein across all samples as the scaling factor (85); assuming all proteins are equally important (35). |
| Pretreatment: Normalization | MEA | Mean Normalization | Ensuring that the protein abundance values from all studied samples are directly comparable with each other (86); assuming the mean level of the protein abundance is constant for all samples (27). |
| Pretreatment: Normalization | MED | Median Normalization | Making the protein intensities from all individual samples directly comparable with each other (27, 86); assuming the median level of the protein abundance is constant for all samples (36). |
| Pretreatment: Normalization | MAD | Median Absolute Deviation | Ensuring the comparability of protein intensities among all samples (86); assuming the median level of the protein abundance and the spread of abundances are the same in all samples (37). |
| Pretreatment: Normalization | TIC | Total Ion Current | Making the protein intensities from all samples directly comparable with each other (86); assuming the total area under the protein abundance curve is constant among samples (38). |
| Pretreatment: Normalization | CYC | Cyclic Loess | Assuming that the intensities of the vast majority of the proteins are unchanged between control and case groups (33, 87) and that the systematic bias is nonlinearly dependent on the protein abundances (27). |
| Pretreatment: Normalization | LIN | Linear Baseline Scaling | Assuming that the abundances of the majority of the proteins in samples are unchanged under the studied condition (33, 88) and that the systematic bias is linearly dependent on the protein intensities (39). |
| Pretreatment: Normalization | RLR | Robust Linear Regression | Assuming that the intensities of the majority of the proteins are unchanged between control and case groups (87, 88) and that the systematic bias is linearly dependent on the magnitude of protein abundances (27). |
| Pretreatment: Normalization | LOW | Locally Weighted Scatterplot Smoothing | Assuming that the abundances of the majority of the proteins are unchanged under the studied circumstances (40, 87) and that the systematic bias is nonlinearly dependent on the protein intensities (40). |
| Pretreatment: Normalization | EIG | EigenMS | Overcoming the problems caused by the heterogeneity in the protein intensities of studied samples (43, 89); requiring no assumption about the relative strength of signals due to each source of variation (43). |
| Pretreatment: Normalization | PQN | Probabilistic Quotient Normalization | Ensuring the comparability of protein intensities among all samples (86); assuming that the majority of the protein intensities do not vary for the studied classes (41). |
| Pretreatment: Normalization | QUA | Quantile Normalization | Making the protein intensities from all samples directly comparable with each other (86); assuming that the majority of protein intensity signals are unchanged among samples (40). |
| Pretreatment: Normalization | TMM | Trimmed Mean of M Values | Ensuring that the protein abundance values from all studied samples are directly comparable with each other (86); assuming the majority of proteins are not differentially expressed between control and case groups (42). |
| Imputation | BAK | Background Imputation | Assuming that protein values are missing because their concentrations in the sample are too small to be detected during the MS run (27). |
| Imputation | BPC | Bayesian Principal Component Imputation | Imputing based on the variational Bayesian framework that does not force orthogonality between the principal components (90). |
| Imputation | CEN | Censored Imputation | Imputing missing values with the lowest intensity value in the dataset, assuming that protein values are missing because they fall below the detection capacity (27). |
| Imputation | KNN | K-nearest Neighbor Imputation | Finding the k most similar proteins (k-nearest neighbors) and using a weighted average over these k proteins to estimate the missing protein values (27, 91). |
| Imputation | LLS | Local Least Squares Imputation | Representing a studied protein that has missing values as a linear combination of a number of proteins similar to this particular protein (92). |
| Imputation | SVD | Singular Value Decomposition | Applying singular value decomposition to the data to obtain sets of mutually orthogonal expression patterns of all proteins in the data (91). |
| Imputation | ZER | Zero Imputation | Imputing the missing intensities of the studied proteins by directly replacing these missing values with zeros (27). |
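
As a rough illustration of a few of the simpler entries in Table I, the following NumPy sketch shows how the centering (MEC, MDC), scaling (ATO, PAR, VAS, RAN), MED, PQN, and ZER methods could be written. It is not the implementation used in the study. It assumes an intensity matrix `X` with samples in rows and proteins in columns and missing values stored as NaN; all function names are hypothetical. The scaling formulas follow the commonly used textbook definitions, and the MED and PQN variants shown (division by the per-sample median; a reference profile taken as the per-protein median) are one simplified choice among several in the literature.

```python
import numpy as np

def mean_center(X):
    """MEC: subtract each protein's mean so values fluctuate around zero."""
    return X - np.nanmean(X, axis=0)

def median_center(X):
    """MDC: subtract each protein's median instead of its mean."""
    return X - np.nanmedian(X, axis=0)

def scale(X, method="auto"):
    """ATO / PAR / VAS / RAN: mean-center, then divide by a per-protein factor."""
    mean = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0, ddof=1)  # sample standard deviation per protein
    centered = X - mean
    if method == "auto":      # scaling factor: standard deviation
        return centered / sd
    if method == "pareto":    # scaling factor: square root of the standard deviation
        return centered / np.sqrt(sd)
    if method == "vast":      # auto scaling divided by the coefficient of variation (sd / mean)
        return centered / sd * (mean / sd)
    if method == "range":     # scaling factor: abundance range (max - min)
        return centered / (np.nanmax(X, axis=0) - np.nanmin(X, axis=0))
    raise ValueError(f"unknown scaling method: {method}")

def median_normalize(X):
    """MED (assumed variant): divide each sample by its own median intensity."""
    return X / np.nanmedian(X, axis=1, keepdims=True)

def pqn_normalize(X):
    """PQN (simplified): divide each sample by its median quotient to a reference."""
    reference = np.nanmedian(X, axis=0)            # reference profile, one value per protein
    quotients = X / reference                      # per-protein quotients for each sample
    factors = np.nanmedian(quotients, axis=1, keepdims=True)
    return X / factors

def zero_impute(X):
    """ZER: replace every missing value with zero."""
    return np.where(np.isnan(X), 0.0, X)

# Toy data: 3 samples x 3 proteins, one missing value.
X = np.array([[10.0, 200.0, 30.0],
              [20.0, 400.0, 60.0],
              [40.0, 800.0, np.nan]])

centered = mean_center(X)
auto_scaled = scale(X, method="auto")
normalized = pqn_normalize(X)
imputed = zero_impute(X)
```

In this sketch the centering and scaling functions operate per protein (column-wise), whereas the normalization functions operate per sample (row-wise), which mirrors the distinction the table draws between making proteins equally important and making samples directly comparable.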