Robustness | A measure of how easily outlier values distort results |
Unbalanced | Describes unequal group sizes or missing values; methods that assume balanced groups will give misleading results |
Positive skew | Asymmetric distribution of data with more small than large values, common in flow cytometry and many other biological measures |
Data pre-processing | Pre-processing aims to normalize the data distribution (i.e. make it bell-shaped) by transforming all values according to one or several defined mathematical equations |
Centering and scaling | Differences in cell counts do not per se reflect biological importance; centering and scaling therefore reduce the stark differences in cell numbers between cell populations so that different populations can be compared. Both are vital for multivariate statistical methods, as results will otherwise be dominated by the cells with the highest counts or the highest noise |
Data contaminations | Denotes all kinds of problematic values in the data, such as sample outliers, single value outliers, or missing values |
Outlier | A value so different from the rest that it may reflect, for example, an analytical error |
Univariate or multivariate | Univariate methods investigate each measured variable on its own (e.g. analyzing only CD3+ T cells irrespective of the 15 other cell populations), whereas multivariate methods analyze several or all measured variables at once (e.g. all 16 cell populations) |
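
The terms robustness, positive skew, and centering and scaling can be illustrated with a minimal Python sketch. All numbers below are hypothetical and not taken from any data set. The log10 transform is only one assumed example of a pre-processing equation, and the helper names `log_transform` and `center_and_scale` are introduced here purely for illustration.

```python
import math
import statistics

# Hypothetical counts for one cell population across six samples;
# the last value is an artificial outlier (illustration only).
counts = [120.0, 135.0, 128.0, 140.0, 131.0, 2500.0]

# Robustness: the mean is pulled towards the outlier,
# whereas the median stays close to the bulk of the values.
mean_count = statistics.mean(counts)
median_count = statistics.median(counts)
print("mean:", mean_count, "median:", median_count)


def log_transform(values):
    """Log10-transform strictly positive values to reduce positive skew."""
    return [math.log10(v) for v in values]


def center_and_scale(values):
    """Subtract the mean (centering) and divide by the sample SD (scaling)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


# Two hypothetical populations with the same relative pattern
# but a 1000-fold difference in absolute cell counts.
abundant = [52000.0, 48000.0, 61000.0, 55000.0]
rare = [52.0, 48.0, 61.0, 55.0]

scaled_abundant = center_and_scale(log_transform(abundant))
scaled_rare = center_and_scale(log_transform(rare))

# After pre-processing, both populations are on a comparable scale,
# so neither dominates a multivariate analysis through count size alone.
print("abundant:", [round(v, 3) for v in scaled_abundant])
print("rare:    ", [round(v, 3) for v in scaled_rare])
```

In this sketch the two populations become numerically indistinguishable after transformation, centering, and scaling, because they differ only by a constant factor. Real data would of course retain population-specific variation.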