Box 2 | Dimensionality reduction |
The high dimensionality of transcriptomes and other biological data (e.g. proteomes and epigenomes) poses a challenge for visualization as well as for selecting informative features for clustering and classification. Dimensionality-reduction approaches aim to find a smaller number of features that adequately represent the original high-dimensional data in a lower-dimensional space. Principal component analysis (PCA) is the most commonly used dimensionality-reduction method. Despite its utility, PCA captures only linear relationships, whereas non-linear relationships are inherent in many biological applications. Several non-linear dimensionality-reduction techniques have been proposed, e.g. Isomap (Tenenbaum et al. 2000); see Lee and Verleysen (2005) for an extensive review. The t-distributed stochastic neighbor embedding (t-SNE) method (Maaten and Hinton 2008) has been widely used to visualize biological data in two dimensions by preserving the local relationships between the data points in the high-dimensional space and, to a lesser extent, the global ones (Saadatpour et al. 2015). |
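As an illustration of the linear case described in Box 2, the following is a minimal sketch of PCA computed through a singular value decomposition with NumPy. The function name `pca`, the synthetic 100 x 50 "expression" matrix, and the choice of two latent factors are assumptions made for this example only; they are not taken from any of the cited works.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X (samples x features) onto the top principal components.

    Returns the low-dimensional sample coordinates and the fraction of the
    total variance explained by each retained component.
    """
    # Center each feature so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: the rows of Vt are the principal axes, S the singular values.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample coordinates along the first n_components principal axes.
    scores = U[:, :n_components] * S[:n_components]
    # Squared singular values are proportional to the variance per component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Synthetic data (hypothetical): 100 samples, 50 features driven by
# two hidden linear factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(100, 50))

scores, explained = pca(X, n_components=2)
print(scores.shape)      # one 2-D coordinate per sample
print(explained.sum())   # share of variance kept by two components
```

Because the synthetic data are generated from two linear factors, two components retain nearly all of the variance. Data with non-linear structure would generally not compress this well under PCA, which is the motivation for methods such as Isomap and t-SNE; an implementation of the latter is available, for example, as `sklearn.manifold.TSNE` in scikit-learn.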