Table 2. Dimensionality reduction techniques employed and their implementations.
Technique | Implementation |
---|---|
Principal Components Analysis (PCA) [35] | Scikit-learn [45] |
Multidimensional Scaling (MDS) | Scikit-learn [45] |
Student’s t-distributed Stochastic Neighbour Embedding (t-SNE) | According to original publication [36] |
Three different dimensionality reduction techniques were employed; these methods provide a means to interpret the approximate structure of data in extremely high dimensional space (such as physicochemical space) on a two dimensional page. PCA locates the lower dimensional hyperplane of highest variance in a hyperspace and projects the data onto that hyperplane. MDS attempts to preserve the pairwise distances between points in high dimensional space in the lower dimensional space. Student’s t-distributed Stochastic Neighbour Embedding also employs distance based scaling, but imposes statistical distributions on these distances; it has been asserted [36] that it outperforms other methods at locating structure in high dimensional data, whilst avoiding overcrowding the centre of the low dimensional space with data points.
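For illustration, the sketch below shows how two dimensional projections of a descriptor matrix might be obtained with scikit-learn [45]; it is not the authors' exact pipeline. In particular, the paper uses the original t-SNE implementation [36], whereas sklearn.manifold.TSNE is used here purely to keep the example self-contained, and the matrix X with its dimensions is a hypothetical placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE

# Hypothetical data: 200 compounds described by 50 physicochemical descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# PCA: project onto the two orthogonal directions of highest variance
pca_coords = PCA(n_components=2).fit_transform(X)

# MDS: find a 2-D embedding whose pairwise distances approximate those
# between points in the original high dimensional space
mds_coords = MDS(n_components=2, random_state=0).fit_transform(X)

# t-SNE: match neighbour probabilities (Gaussian in high dimensions,
# Student's t in the 2-D embedding), which discourages crowding at the centre
tsne_coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```

Each of the three resulting coordinate arrays has shape (200, 2) and can be plotted directly to visualise the approximate structure of the descriptor space.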