Transformation |
BOX |
Box-Cox Transformation |
Making asymmetric data fulfill the normality assumption in a regression model by converting the protein abundances into a more symmetric distribution (80). |
|
LOG |
Log Transformation |
Converting the distribution of ratios of protein abundance values into a more symmetric (almost normal) distribution and minimizing the effect of proteins with extreme abundance (39). |
|
VSN |
Variance Stabilization Normalization |
Having a built-in transformation (47) and making individual observations more directly comparable (81); assuming that most of the proteins across different samples are not differentially expressed (81). |
Pretreatment |
|
|
|
Centering
|
MEC |
Mean-Centering |
Converting all the intensities to fluctuations around zero instead of around the mean of the protein intensities; assuming all proteins are equally important (35). |
|
MDC |
Median-Centering |
Converting all the intensities to fluctuations around zero instead of around the median of the protein intensities; assuming all proteins are equally important (35). |
Scaling
|
ATO |
Auto Scaling |
Adjusting each protein abundance for systematic variance using the standard deviation of each protein across all samples as the scaling factor (82); assuming all proteins are equally important (35). |
|
PAR |
Pareto Scaling |
Scaling each protein abundance for systematic variance using the square root of the standard deviation of each protein across all samples as the scaling factor (83); assuming all proteins are equally important (35). |
|
VAS |
Vast Scaling |
Adjusting each protein abundance for systematic variance using the coefficient of variation of each protein across all samples as the scaling factor (84); assuming all proteins are equally important (35). |
|
RAN |
Range Scaling |
Scaling each protein abundance for systematic variance using the abundance range of each protein across all samples as the scaling factor (85); assuming all proteins are equally important (35). |
Normalization
|
MEA |
Mean Normalization |
Making the protein abundance values from all studied samples directly comparable with each other (86); assuming the mean level of the protein abundance is constant for all samples (27). |
|
MED |
Median Normalization |
Making the protein intensities from all individual samples directly comparable with each other (27, 86); assuming the median level of the protein abundance is constant for all samples (36). |
|
MAD |
Median Absolute Deviation |
Ensuring the comparability of protein intensities among all samples (86); assuming the median level of the protein abundance and the spread of abundances are the same in all samples (37). |
|
TIC |
Total Ion Current |
Making the protein intensities from all samples directly comparable with each other (86); assuming the total area under the protein abundance curve is constant among samples (38). |
|
CYC |
Cyclic Loess |
Assuming that the intensities of the vast majority of the proteins are not changed in control and case groups (33, 87) and the systematic bias is nonlinearly dependent on the protein abundances (27). |
|
LIN |
Linear Baseline Scaling |
Assuming that the abundances of the majority of the proteins in samples are unchanged under the studied condition (33, 88) and the systematic bias is linearly dependent on the protein intensities (39). |
|
RLR |
Robust Linear Regression |
Assuming that the intensities of the majority of the proteins are not changed in control and case groups (87, 88) and the systematic bias is linearly dependent on the magnitude of protein abundances (27). |
|
LOW |
Locally Weighted Scatterplot Smoothing |
Assuming that the abundances of the majority of the proteins are unchanged under the studied circumstances (40, 87) and the systematic bias is nonlinearly dependent on the protein intensities (40). |
|
EIG |
EigenMS |
Overcoming the problems caused by the heterogeneity in the protein intensities of studied samples (43, 89); requiring no assumption about the relative strength of signals due to each source of variation (43). |
|
PQN |
Probabilistic Quotient Normalization |
Ensuring the comparability of protein intensities among all samples (86); assuming that the majority of the protein intensities do not vary for the studied classes (41). |
|
QUA |
Quantile Normalization |
Making the protein intensities from all samples directly comparable with each other (86); assuming that the majority of protein intensity signals are unchanged among samples (40). |
|
TMM |
Trimmed Mean of M Values |
Making the protein abundance values from all studied samples directly comparable with each other (86); assuming the majority of proteins are not differentially expressed between control and case groups (42). |
Imputation |
BAK |
Background Imputation |
Assuming that the protein values are missing because the proteins have low concentrations in the sample and thus cannot be detected during the MS run (27). |
|
BPC |
Bayesian Principal Component Imputation |
Imputing based on the variational Bayesian framework that does not force orthogonality between the principal components (90). |
|
CEN |
Censored Imputation |
Imputing missing values with the lowest intensity value in the dataset, assuming that protein values are missing because they fall below the detection capacity (27). |
|
KNN |
K-nearest Neighbor Imputation |
Finding k most similar proteins (k-nearest neighbors) and using a weighted average over these k proteins to estimate the missing protein values (27, 91). |
|
LLS |
Local Least Squares Imputation |
Representing a studied protein that has missing values as a linear combination of a number of proteins similar to this particular protein (92). |
|
SVD |
Singular Value Decomposition |
Applying singular value decomposition to the data to obtain sets of mutually orthogonal expression patterns of all proteins in the data, which are then used to estimate the missing values (91). |
|
ZER |
Zero Imputation |
Imputing the missing intensities of the studied proteins by directly replacing these missing values with zeros (27). |