TABLE 3.
List of data transformation and feature scaling techniques prior to dimensionality reduction.
Type | Advantages | Limitations | Technique | References |
---|---|---|---|---|
Normalization | Identifies and removes systematic variability. Increases the learning speed. | Less effective when the data contain many outliers. | Quantile | Larsen et al. (2014); Smyth and Speed (2003); Schmidt et al. (2004) |
Normalization | | | Loess | Franks et al. (2018); Karthik and Sudha (2021); Larsen et al. (2014); Huang et al. (2018); Bolstad et al. (2003); Doran et al. (2007) |
Data transformation | Reduces the variance and skewness of the data distribution. | Data do not always approximate the log-normal distribution. | Log transformation | Pirooznia et al. (2008); Pan et al. (2002); Doran et al. (2007) |
Standardization | Ensures feature distributions have mean = 0 and standard deviation = 1. Applicable to datasets with many outliers. | Less effective when the data distribution is not Gaussian, or when the standard deviation is very small. | z-score | Peterson and Coleman (2008); Cheadle et al. (2003); De Guia et al. (2019); Chandrasekhar et al. (2011); Pan et al. (2002) |
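
To make three of the techniques in Table 3 concrete, the following is a minimal NumPy sketch of log transformation, quantile normalization, and z-score standardization. It is illustrative only and is not taken from any of the cited works. The function names, the `pseudocount` and `eps` parameters, and the data layout (rows are samples, columns are features) are assumptions introduced here. Ties in quantile normalization are broken by position rather than averaged, which is a simplification. Loess normalization is not sketched because it requires a local regression fit.

```python
import numpy as np


def log_transform(x, pseudocount=1.0):
    """Return log2(x + pseudocount); the pseudocount (an assumption) avoids log(0)."""
    return np.log2(np.asarray(x, dtype=float) + pseudocount)


def quantile_normalize(x):
    """Give every sample (row) the same empirical distribution.

    The shared reference distribution is the mean, across samples, of the
    sorted values at each rank. Ties are broken by position (simplification).
    """
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own row (0 = smallest).
    order = np.argsort(x, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")
    # Mean of the k-th smallest value over all samples, for each rank k.
    reference = np.sort(x, axis=1).mean(axis=0)
    # Replace each value by the reference value at its rank.
    return reference[ranks]


def zscore(x, eps=1e-12):
    """Standardize each feature (column) to mean 0 and standard deviation 1.

    Features whose standard deviation is below eps are only centered,
    which avoids dividing by (almost) zero.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    safe_std = np.where(std < eps, 1.0, std)
    return (x - mean) / safe_std


# Hypothetical expression matrix: 3 samples (rows) x 4 features (columns).
expression = np.array([
    [120.0, 15.0, 900.0, 3.0],
    [200.0, 10.0, 400.0, 8.0],
    [90.0, 30.0, 650.0, 1.0],
])

# One possible ordering of the steps before dimensionality reduction.
scaled = zscore(quantile_normalize(log_transform(expression)))
```

The ordering in the last line (log transformation, then quantile normalization, then z-score) is one plausible choice rather than a prescribed pipeline; which steps are appropriate depends on the data and on the downstream dimensionality reduction method.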