PLOS Computational Biology. 2022 Dec 20;18(12):e1010764. doi: 10.1371/journal.pcbi.1010764

Improved visualization of high-dimensional data using the distance-of-distance transformation

Jinke Liu 1,2,*, Martin Vinck 1,2
Editor: Emma Claire Robinson
PMCID: PMC9812310  PMID: 36538561

Abstract

Dimensionality reduction tools like t-SNE and UMAP are widely used for high-dimensional data analysis. For instance, these tools are applied in biology to describe spiking patterns of neuronal populations or the genetic profiles of different cell types. Here, we show that when data include noise points that are randomly scattered within a high-dimensional space, a “scattering noise problem” occurs in the low-dimensional embedding where noise points overlap with the cluster points. We show that a simple transformation of the original distance matrix by computing a distance between neighbor distances alleviates this problem and identifies the noise points as a separate cluster. We apply this technique to high-dimensional neuronal spike sequences, as well as the representations of natural images by convolutional neural network units, and find an improvement in the constructed low-dimensional embedding. Thus, we present an improved dimensionality reduction technique for high-dimensional data containing noise points.

Author summary

Biological datasets are often high-dimensional, e.g. the genetic profile of cells or the firing pattern of neural populations. Dimensionality reduction methods like t-SNE are commonly used to represent high-dimensional data in a low-dimensional embedding space. The visualization helps us to identify underlying clustering patterns and sheds light on the information hidden within the data. We show that in situations where scattering noise points exist, clustering patterns in the data tend to be heavily distorted. Here, we show that a distance-of-distance (DoD) transformation of the dissimilarity matrix between data points effectively removes the influence of scattering noise. This neighborhood-based transformation is most effective when the dimensionality of the dataset is high. We show that this technique improves the low-dimensional embedding for several high-dimensional datasets, such as the convolutional neural network representation of natural images or the neuronal population representation of visual stimuli.


This is a PLOS Computational Biology Methods paper.

Introduction

A major goal of data science is to extract patterns from high-dimensional data containing multiple features. It is typically necessary to construct a low-dimensional representation of high-dimensional data for the purpose of visualization, noise reduction, or feature extraction. In fields such as biology, where high-dimensional data sets are common, dimensionality reduction approaches are widely adopted. For instance, in neuroscience, dimensionality reduction techniques have been used to study the way in which neuronal populations represent motor and visual information [1–3]. It is also a standard approach to studying the genetic profiles of different cell types [4–7]. Dimensionality reduction techniques based on embeddings, including t-SNE [8, 9] and UMAP [10], have been developed to represent high-dimensional data with only two or three components. The principle underlying these techniques is to treat data points as particles that are attracted to their neighbors and repelled by distant data points. Despite their usefulness, algorithms like t-SNE have known inherent limitations, such as sensitivity to hyper-parameters like perplexity and difficulty capturing global structure in the data, especially when there are many clusters [11]. Therefore, it is important to optimize the pre-processing of the data and the application of low-dimensional embedding techniques [7].

Here we show another problem with methods like t-SNE, namely that their performance strongly deteriorates when the data set contains many noise points. We show that the low-dimensional embedding space can become crowded due to the presence of noise points. The basic mechanism is that noise points repel each other and therefore start overlapping with clusters, even though the noise points have large distances to the clusters. As a result, meaningful patterns in the data can be masked. To our knowledge, there exists no simple solution to this “scattering noise problem”. Although clustering techniques like HDBSCAN can be used to identify noise points [12], these techniques do not solve the scattering noise problem in terms of low-dimensional visualization. Furthermore, although in some situations PCA may help to denoise the data, PCA can also remove important information from high-dimensional data sets. As we will show, PCA does not, in general, effectively solve the scattering noise problem.

Here, we present a simple technique to solve the scattering noise problem. We show that the scattering noise problem for high-dimensional datasets can be effectively alleviated with a transformation of the distance matrix. We call this the distance-of-distance (DoD) transformation, because it considers the differences between distances within a certain neighborhood of the data points. We apply the DoD transformation both to electrophysiological recordings of neurons in the mouse visual cortex during the presentation of drifting grating stimuli and to representations of natural image patches by convolutional neural network units. We demonstrate that in both cases, the DoD transformation facilitates the separation between noise points and cluster points in the low-dimensional embedding space.

Materials and methods

Simulation

We generated high-dimensional cluster points and noise points by randomly sampling from multivariate Gaussian distributions with a standard deviation of 0.1. We drew points within one cluster from the same Gaussian distribution, while noise points were independently distributed. We then calculated the distance matrix between all pairs of data points using either the Euclidean or the city-block metric. To find the low-dimensional embeddings, we applied the t-SNE algorithm with perplexity values ranging from 5 to 50 and initializations with different random seeds. We used the open-source t-SNE implementation from scikit-learn (version 0.23.2). In some settings, we adopted PCA initialization combined with PCA preprocessing (S3 Fig).
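
A minimal sketch of this simulation setup (our illustration, not the authors' released code; the parameter values follow the example in Fig 1, where the noise is drawn uniformly from the same space):

```python
# Sketch of the simulation: Gaussian clusters plus scattered noise,
# a precomputed distance matrix, and t-SNE on that matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
D, n_clusters, n_per_cluster, n_noise = 50, 5, 20, 200

# Cluster points: one Gaussian per cluster (sd = 0.1) around random centers.
centers = rng.uniform(0, 1, size=(n_clusters, D))
clusters = np.concatenate(
    [c + rng.normal(0, 0.1, size=(n_per_cluster, D)) for c in centers])
# Noise points: independently scattered in the same space (uniform here).
noise = rng.uniform(0, 1, size=(n_noise, D))
X = np.concatenate([clusters, noise])
labels = np.concatenate([np.repeat(np.arange(n_clusters), n_per_cluster),
                         np.full(n_noise, -1)])  # -1 marks noise

# Pairwise distances: 'euclidean' (L2) or 'cityblock' (L1).
dist = squareform(pdist(X, metric="euclidean"))

# t-SNE on the precomputed distance matrix.
emb = TSNE(metric="precomputed", init="random",
           perplexity=30, random_state=0).fit_transform(dist)
```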

To analyze how dimensionality and the number of points influenced the performance of the DoD transformation, we simulated cluster points from Gaussian distributions and noise points randomly distributed in a hyper-cube. Next, to compute the cluster-to-cluster distance, we considered all pairs of points, each from a different cluster, and calculated the average distance over all such pairs. To compute the cluster-to-noise distance, we considered all pairs of points in which one is a noise point and the other a cluster point, and calculated the average distance over all such pairs. To compute the noise-to-noise distance, we considered all pairs of noise points and calculated the average distance over all such pairs. Next, we applied the DoD transformation to the original pairwise distance matrix D, yielding the distance-of-distance matrix F. We measured the shrinkage of all three types of distances (cluster-to-cluster, cluster-to-noise, and noise-to-noise) in two ways: either as the absolute shrinkage $\Delta = \bar{d} - \bar{f}$, or as the fraction $\bar{f}/\bar{d}$. A larger delta or a smaller fraction therefore indicates a larger shrinkage. To measure clustering performance, we adopted the commonly used Adjusted Rand Index (ARI).
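
A sketch of this shrinkage measurement, assuming `dist` and `labels` from the simulation sketch above and a transformed matrix `dod` (produced, e.g., by the DoD sketch following Eq (1) in Results):

```python
# Average distances before/after the DoD transformation for the three
# pair types; self-pairs on the diagonal contribute negligibly to the
# noise-to-noise mean and are ignored here for brevity.
import numpy as np

def mean_pair_distance(mat, idx_a, idx_b):
    # Mean distance over all pairs with one point from each index set.
    return mat[np.ix_(idx_a, idx_b)].mean()

noise_idx = np.where(labels == -1)[0]
c0, c1 = np.where(labels == 0)[0], np.where(labels == 1)[0]

pairs = {"cluster-to-cluster": (c0, c1),
         "cluster-to-noise": (c0, noise_idx),
         "noise-to-noise": (noise_idx, noise_idx)}
for name, (a, b) in pairs.items():
    d_bar = mean_pair_distance(dist, a, b)   # mean of D entries
    f_bar = mean_pair_distance(dod, a, b)    # mean of F entries
    print(f"{name}: delta={d_bar - f_bar:.3f}, fraction={f_bar / d_bar:.3f}")
```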

K nearest neighbor classifier

We used a K nearest neighbor (KNN) classifier to measure the performance of the DoD transformation on noise-free datasets. We chose the optimal parameter K of the KNN classifier based on the cross-validated classification score. With the optimal parameter, we then built KNN models on both the original distance matrix and the distance matrix after the DoD transformation, and compared the 5-fold cross-validated scores of the classifiers.
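
A sketch of this comparison with scikit-learn, assuming square distance matrices `dist` (original) and `dod` (after the DoD transformation) and class labels `labels` (names carried over from the sketches above):

```python
# KNN on precomputed distance matrices; scikit-learn slices pairwise
# matrices correctly during cross-validation for precomputed metrics.
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Select K by cross-validated accuracy on the original distances.
search = GridSearchCV(KNeighborsClassifier(metric="precomputed"),
                      {"n_neighbors": list(range(1, 21))}, cv=5)
search.fit(dist, labels)
best_k = search.best_params_["n_neighbors"]

# Compare 5-fold cross-validated scores before and after the transformation.
for name, mat in [("original", dist), ("DoD", dod)]:
    knn = KNeighborsClassifier(n_neighbors=best_k, metric="precomputed")
    print(name, cross_val_score(knn, mat, labels, cv=5).mean())
```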

Neural data analysis

We analyzed neural data from area V1 obtained via electrophysiological Neuropixels recordings (Allen Institute, [13]). The drifting grating visual stimulus consists of a full-field sinusoidal grating that moves in a direction perpendicular to the orientation of the grating. The spatial and temporal frequency of the drifting grating stimulus was not considered in our study. In the public dataset provided by the Allen Institute, the drifting grating stimulus moves in 8 different directions, shown to the animal in a random order (S7(A) Fig). Example raster plots were taken from the drifting grating responses of session 754829445. We selected visual neurons with a high signal-to-noise ratio (snr ≥ 0.3), yielding a total of 191 neurons. We applied the SPOTDisClust algorithm [14] to the population spiking patterns within 100 ms after stimulus onset and used the output SPOTDis matrix for the subsequent t-SNE analysis. See S7 Fig for further details.

Image data analysis

Images were obtained from the ImageNet data set. We used the pretrained VGG16 [15] as the convolutional neural network model. We cropped the original images to create image patches matching the input size expected by VGG16. For each image patch, we extracted the responses of artificial neurons in the fully connected layer (fc6) as its representation; the dimensionality of this feature vector is 4096. Code is available on GitHub (see Data Availability).
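
A sketch of this feature extraction with torchvision (a plausible implementation, not necessarily the authors'; the image path is a placeholder, and the layer indices follow the standard torchvision VGG16):

```python
# Extract fc6 activations (4096-d) from a pretrained VGG16.
import torch
from torchvision import models, transforms
from PIL import Image

# Older torchvision versions use models.vgg16(pretrained=True) instead.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
# classifier[0] is the first fully connected layer (fc6, 4096 units).
fc6 = torch.nn.Sequential(model.features, model.avgpool,
                          torch.nn.Flatten(), model.classifier[0])

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])])

img = preprocess(Image.open("patch.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = fc6(img)  # shape (1, 4096): the image-patch representation
```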

Results

Simulation

There are various techniques to construct a low-dimensional embedding of high-dimensional data, such as t-SNE [8, 9] and UMAP [10]. These unsupervised techniques are commonly used to visualize the results of clustering and to study the geometry of high-dimensional data. For some applications, however, part of the data set may consist of noise points that are randomly scattered in a high-dimensional space. For example, the activity pattern of a high-dimensional neuronal ensemble might show consistent clustering when the neural response is driven by specific stimuli, but could otherwise exhibit random behavior during spontaneous activity. When a dataset contains many noise points, the t-SNE and UMAP embeddings exhibit a typical “scattering noise problem” (Fig 1C): the noise points tend to be spread uniformly in the low-dimensional embedding space and are located near the clusters, despite the fact that the noise points are well separated from the cluster points. This scattering noise problem occurs because the noise points have, on average, a large distance between themselves, which causes them to repel each other. Thus, noise points can end up near or in a cluster region and can effectively mask the clusters that are present in the data set.

Here, we develop a technique to address the scattering noise problem using a transformation of the distance matrix. We will show that the performance of low-dimensional embedding techniques is often improved by such a transformation. We start from a scenario in which there are clusters, but also noise points scattered in a high-dimensional space. Consider the distance of a noise point P to its nearest neighbors. In a high-dimensional space, we expect another noise point Q to have distances to the nearest neighbors of P that are similar to those of P itself. In other words, even though the distance between two noise points P and Q can be large, their distances to their respective nearest neighbors may in fact be very similar. This is simply because scattered points lack any particular clustering pattern, which makes every such point almost identical to the others in terms of its neighboring distance structure. This observation led us to compute the differences between distances (i.e., the distance-of-distance), such that the scattering nature of noise points is better captured after the transformation. Specifically, for each pair of data points, we consider the joint set of K neighbors of the two points, and compute the distance-of-distance w.r.t. this set of neighbor points.

Fig 1. Scattering noise problem in t-SNE and its alleviation through the DoD transformation.

Fig 1

In this simulation, we generated data points for five clusters and one noise cluster. For each of the five clusters, we sampled 20 points from different multivariate Gaussian distributions in a high-dimensional space (D = 50). Another 200 noise points were sampled uniformly from the same space. A: The original Euclidean (L2) distance matrix between the data points. Data points were ordered by clusters. B: The dissimilarity matrix after the DoD transformation. We computed the distance-of-distances for all pairs of points w.r.t. their nearest 10 neighbors. Red bar on the left indicates cluster points and black bar indicates scattering noise points. C: t-SNE visualization based on the original dissimilarity matrix. D: t-SNE visualization based on the dissimilarity matrix after the DoD transformation. Points from different clusters are labelled in different colors, noise points are labelled in grey.

Mathematically speaking, given a dataset $X \in \mathbb{R}^{N \times D}$ with N samples and D features, the distance matrix $D \in \mathbb{R}^{N \times N}$ is constructed with either the L1 or the L2 distance metric, where $d_{i,j} = \|X_i - X_j\|_1$ or $\|X_i - X_j\|_2$. For any two data points $X_i, X_j \in \mathbb{R}^D$, we find their sets of K nearest neighbors, I and J, respectively. We then transform the original distance matrix D into a new dissimilarity matrix F, taking as the new distance the average absolute difference between the two points' distances to the selected neighborhood:

$$f_{i,j} = \frac{1}{2K}\left(\sum_{n \in I} |d_{n,i} - d_{n,j}| + \sum_{m \in J} |d_{m,i} - d_{m,j}|\right) \qquad (1)$$

Subsequently, the distance-of-distance matrix F is substituted for the original distance matrix in the t-SNE algorithm. Applying this transformation, we should obtain a high similarity between noise points, while retaining the distances from noise points to the clusters and the distances between the clusters. Moreover, we expect the method to be relatively insensitive to the choice of neighborhood size K: as long as K is kept smaller than the cluster size, the average density of the cluster neighborhood should not change much, and the shrinkage of noise-cluster and noise-noise distances should in general apply (S1 Fig). We illustrate this behavior with an example (Fig 1C), where, in the t-SNE embedding, noise points are located near or in the cluster regions. After the DoD transformation, the noise points attract each other and no longer scatter randomly across the low-dimensional embedding (Fig 1D). Consequently, compared with standard t-SNE, the noise points form a separate cluster that is isolated from the cluster points, thereby providing a better match with the cluster labels. Even in simulations with fewer noise points than cluster points, the DoD transformation remained robust (S4 Fig). Furthermore, the DoD transformation helps to separate the noise cloud from the clusters regardless of which distance metric is used to construct the original distance matrix D: the results obtained with the L2 metric are almost identical to those obtained with the L1 metric. Moreover, in situations where the noise labels are unknown, we can infer their identity based on the DoD transformation; comparing the magnitude of the distance changes provides an automated way of denoising the data (S2 Fig).
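
A direct NumPy sketch of Eq (1) (our illustration, not the authors' released implementation; see the GitHub repository in Data Availability for that):

```python
# Distance-of-distance (DoD) transformation of a square distance matrix.
import numpy as np

def dod_transform(dist, k=10):
    """Apply Eq (1) to an N x N distance matrix `dist`, returning F."""
    n = dist.shape[0]
    # k nearest neighbors of each point (column 0 is the point itself).
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]
    f = np.zeros_like(dist)
    for i in range(n):
        for j in range(i + 1, n):
            # Joint neighborhood: the k neighbors of i and the k of j.
            nbrs = np.concatenate([nn[i], nn[j]])
            # Eq (1): average absolute difference between the two points'
            # distances to the joint neighborhood (mean over 2K terms).
            f[i, j] = f[j, i] = np.mean(np.abs(dist[nbrs, i]
                                               - dist[nbrs, j]))
    return f

dod = dod_transform(dist, k=10)  # feed to TSNE(metric="precomputed")
```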

Intuitively, we expect the effectiveness of the DoD transformation to depend on two factors, namely the dimensionality of the data set and the number of data points. When the number of noise points is relatively large compared to the dimensionality of the data, we expect the DoD transformation to have a minor effect, because two noise points will show relatively dissimilar distances to their respective neighbors. For example, on a one-dimensional line, if we take two out of many random noise points, then each noise point will have a nearby neighbor, and the distance-of-distances will be similar to the original distance. However, when the dimensionality of the data is relatively high compared to the number of noise points, we expect the DoD transformation to make a large difference compared to the original distance. To investigate this, we performed several simulations with varying numbers of cluster and noise points and varying dimensionality. In these simulations, we drew noise points and cluster centers from a Gaussian distribution in a D-dimensional space (with a diagonal covariance matrix). Cluster points were generated from Gaussian distributions around the cluster centers. We observed that when the dimensionality of the feature space was low, the DoD transformation had minor effects on the low-dimensional embedding and on the distance matrix. However, when the dimensionality of the feature space was relatively high, the DoD transformation was highly effective in separating the cluster points from the noise points (Fig 2).

Fig 2. Influence of dimensionality and number of noise points on the performance of the DoD transformation.

Fig 2

In these simulations, 20 cluster points were sampled from each of five different multivariate Gaussian distributions. The dimensionality was, from left to right, 5, 20, 20, and 50. The number of noise points was, from left to right, 200, 200, 2000, and 2000. Distance-of-distances were computed w.r.t. 5 neighbor points. A: The original dissimilarity matrix. B: The dissimilarity matrix after the DoD transformation. Red bar on the left indicates cluster points and black bar indicates scattering noise points. C: t-SNE visualization based on the original dissimilarity matrix. D: t-SNE visualization based on the dissimilarity matrix after the DoD transformation. Points from different clusters are labelled in different colors and noise points are labelled in grey.

Other techniques, such as principal component analysis (PCA), are commonly used in combination with t-SNE. We showed that the scattering noise problem cannot be solved simply by using PCA for either initialization or preprocessing (S3 Fig). Furthermore, perplexity is a parameter of the t-SNE algorithm that controls how near a point needs to be in order to be considered a neighbor of a given point. We showed that tuning the perplexity alone also does not solve the scattering noise problem (S5 Fig).

Another degree of freedom in the DoD transformation is the parameter K, which controls the neighborhood size. We showed that our method is generally robust to the choice of K, even when K is slightly larger than the cluster size; unsurprisingly, however, the method fails when K approaches the total number of points in the data set (S1 Fig). In situations where there is no scattering noise, applying the DoD transformation to purely clustered data introduced very little distortion (S6(A) Fig). Even when one cluster had a lower density than the others, the DoD transformation did not significantly change the geometry of the low-dimensional manifolds (S6(B) Fig).

Theoretical analysis

The simulations shown in Fig 2 suggest that the DoD transformation is most effective when the dimensionality of the feature space is relatively high. In this section, we will formalize the notion of the DoD transformation and explain why it improves the performance of low-dimensional embedding techniques such as t-SNE.

Suppose that a data set consists of N points in a D-dimensional feature space, so that each data point can be represented by a D-dimensional vector. Furthermore, suppose that the noise points are uniformly scattered in a D-dimensional hyper-cube and that there are several clusters whose centers are also uniformly scattered. To simplify our analysis, we will assume that the clusters have infinite density; in other words, points that belong to the same cluster have a mutual distance of zero. We will use the normalized L1 norm (i.e., the Manhattan distance divided by the dimensionality) to measure the distances between any pair of data points. Since the distance is normalized by dimensionality, it remains finite for a finite volume as D → ∞. The analysis can be generalized to the L2 norm (Euclidean distance metric), but we use the L1 norm to make the analytical derivation easier. Suppose we have M clusters $C_m$ and a set of noise points Σ. Consider a cluster point $j \in C_m$ and a noise point $\sigma \in \Sigma$ at distance $d_{j,\sigma}$. Let $j^*$ denote the first nearest neighbor of j, so that $d_{j,j^*}$ is the L1 distance between point j and its first nearest neighbor. We consider the simple case of a joint neighborhood of size 2. Applying the DoD transformation, the distance-of-distance $f_{j,\sigma}$ between a cluster point and a noise point can be expressed as

$$f_{j,\sigma} = \frac{1}{2}|d_{j,j^*} - d_{\sigma,j^*}| + \frac{1}{2}|d_{j,\sigma^*} - d_{\sigma,\sigma^*}|. \qquad (2)$$

Because we assumed that the cluster is infinitely dense, the equalities $d_{j,j^*} = 0$ and $d_{\sigma,j^*} = d_{\sigma,j}$ hold. Hence,

$$f_{j,\sigma} = \frac{1}{2}d_{\sigma,j} + \frac{1}{2}|d_{j,\sigma^*} - d_{\sigma,\sigma^*}|. \qquad (3)$$

If there are in total N points scattered uniformly in a unit volume, the average distance of a point to its first nearest neighbor can be approximated [16] as

$$E\{d_{\sigma,\sigma^*}\} \approx \frac{1}{3}\left(\frac{1}{N}\right)^{1/D}, \qquad (4)$$

where the factor $\frac{1}{3}$ accounts for the fact that we compute the normalized L1 distance. Note that as $D \to \infty$, $d_{\sigma,\sigma^*} \to \frac{1}{3}$, the expected (normalized) L1 distance between any two points uniformly distributed in a hyper-cube.

Because $\sigma^*$ is another random noise point, we have $E\{d_{j,\sigma^*}\} \approx d_{j,\sigma}$. Furthermore, $d_{j,\sigma}$ will be larger than $d_{\sigma,\sigma^*}$, because $\sigma^*$ is the first neighbor of σ (assuming that the first neighbor of the noise point is not a cluster point). Therefore, we can simplify the expected distance-of-distance $f_{j,\sigma}$ as

$$E\{f_{j,\sigma}\} \approx \frac{1}{2}\left(2d_{j,\sigma} - \frac{1}{3}\left(\frac{1}{N}\right)^{1/D}\right). \qquad (5)$$

Now consider two noise points $\sigma, \epsilon \in \Sigma$ together with their nearest neighbors $\sigma^*, \epsilon^*$, and apply the same argument. We have

$$f_{\sigma,\epsilon} = \frac{1}{2}|d_{\sigma,\epsilon^*} - d_{\epsilon,\epsilon^*}| + \frac{1}{2}|d_{\epsilon,\sigma^*} - d_{\sigma,\sigma^*}|. \qquad (6)$$

Note that $E\{d_{\sigma,\epsilon^*}\} \approx d_{\sigma,\epsilon}$ and that $E\{d_{\epsilon,\sigma^*}\} \approx d_{\epsilon,\sigma}$. Hence the expected transformed distance can be expressed as

$$E\{f_{\sigma,\epsilon}\} \approx \frac{1}{2}\left(2d_{\sigma,\epsilon} - \frac{2}{3}\left(\frac{1}{N}\right)^{1/D}\right). \qquad (7)$$

Finally, the transformed distance between two points from different clusters, or from the same cluster, is identical to the original distance, given the assumption that each cluster is infinitely dense.

Thus, compared to the original distance, the distance-of-distance between two noise points decreases more strongly than that between a cluster point and a noise point, by the amount $\frac{1}{6}(1/N)^{1/D}$, while the distance between two cluster points is preserved. It can further be seen that when $D \to \infty$, $(1/N)^{1/D} \to 1$ and therefore $f_{\sigma,\epsilon} \to 0$. Hence, as D approaches infinity, we obtain the asymptotes $E\{f_{\sigma,\epsilon}\} \to 0$ and $E\{f_{j,\sigma}\} \to 1/6$.

Therefore, in very high-dimensional spaces, the DoD transformation preserves the geometrical distances between cluster points, pushes the noise points together, and largely preserves the distances between the clusters and the noise points. For a low-dimensional embedding technique, this means that the noise points will now be attracted to each other and repelled by the clusters. Suppose that we want the difference in distance shrinkage between cluster-to-noise and noise-to-noise pairs to be greater than a threshold θ. The resulting inequality shows a linear dependence on D but a logarithmic dependence on N:

$$\frac{1}{6}\left(\frac{1}{N}\right)^{1/D} > \theta \;\Longrightarrow\; \log N < -D \log 6\theta. \qquad (8)$$
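
As a quick numerical sanity check of the nearest-neighbor approximation in Eq (4) under these idealized assumptions (uniform noise in a unit hyper-cube, normalized L1 distance; parameter values are ours for illustration, and the approximation is coarse, so only rough agreement should be expected):

```python
# Compare the empirical mean first-nearest-neighbor distance of uniform
# noise points with the (1/3)(1/N)^(1/D) approximation of Eq (4).
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
N = 500
for D in (2, 10, 50):
    pts = rng.uniform(0, 1, size=(N, D))
    d = squareform(pdist(pts, metric="cityblock")) / D  # normalized L1
    np.fill_diagonal(d, np.inf)                # exclude self-distances
    empirical = d.min(axis=1).mean()           # mean 1st-NN distance
    theory = (1 / 3) * (1 / N) ** (1 / D)      # Eq (4)
    print(f"D={D}: empirical={empirical:.3f}, theory={theory:.3f}")
```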

Dimensionality and number of points influence distance-of-distances

We performed further simulations to support these theoretical analyses. In the first simulation, we examined how the DoD transformation affects the distance between noise and cluster points. The theoretical analysis above predicts that as the dimensionality D grows, the distance between pairs of noise points should show a relatively strong decrease, whereas the distance between a cluster point and a noise point should show a relatively small decrease compared to the original distance. To test this, we generated data using Gaussian mixture models. We then examined how the dimensionality and the number of noise points affect the distance-of-distances between a cluster point and a noise point, between two noise points, and between two points belonging to different clusters. As predicted, the DoD transformation shrinks the distances between data points. Furthermore, the average shrinkage of the distance between two noise points was larger than that between a cluster point and a noise point. In addition, the shrinkage of distances increased as a function of the dimensionality D and decreased as a function of the number of points N (Fig 3).

Fig 3. Influence of dimensionality and number of points on distance-of-distances.

Fig 3

A-B: Influence of dimensionality D on distance-of-distances. A: Absolute shrinkage of the Euclidean distance (i.e., original distance minus the distance-of-distance) as a function of dimensionality D. B: Fraction of the original distance (distance-of-distance divided by original distance) as a function of dimensionality D. The distance between cluster points is unaffected by the DoD transformation (because the clusters had infinite density). Because of the DoD transformation, the distance between noise points shrinks more than the distance between cluster and noise points. As a result, noise points become relatively more similar to each other than to cluster points. C-D: Influence of the number of points N on distance-of-distances. C: Absolute shrinkage of distance as a function of the number of points N. D: Fraction of the original distance as a function of the number of points N. As the number of data points increases, the noise-to-noise and cluster-to-noise distance-of-distances become more similar to each other. The dimensionality in this example was D = 20. The number of neighbors w.r.t. which distance-of-distances were computed was 10. Error bars indicate the standard deviation across simulations with different initialization settings.

DoD transformation improves clustering

Next, we examined whether the DoD transformation improves clustering performance on the t-SNE embedding. To study this, we generated high-dimensional data from Gaussian distributions with different numbers of data points and different dimensionalities. We then created low-dimensional embeddings and used the K-means algorithm to identify clusters in the t-SNE embeddings. To measure the clustering performance, we compared the true cluster labels with the inferred cluster labels using the Adjusted Rand Index (ARI), as sketched below. Fig 4A and 4B show how the DoD transformation improves the clustering performance and the t-SNE embedding. We found that the DoD transformation strongly alleviated the scattering noise problem in the low-dimensional embedding and improved the clustering performance, as measured by ARI. As the dimensionality D of the data increased, the ARI score strongly increased. Conversely, the ARI score decreased as a function of the number of noise points (Fig 4C). As predicted by our theoretical analyses, clustering performance showed an approximately linear dependence on D and a logarithmic dependence on N, indicated by a diagonal line in the heat map of ARI changes. This analysis shows that the DoD transformation improves clustering performance in the presence of scattering noise points, especially for large D and a small number of data points N.
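
A sketch of this evaluation, assuming 2D embeddings `emb` (original) and `emb_dod` (after the DoD transformation) together with ground-truth `labels` (names carried over from the earlier sketches; noise points form their own class):

```python
# K-means on the 2D t-SNE embeddings, scored against ground truth
# with the Adjusted Rand Index (ARI).
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

n_clusters = 6  # e.g., five Gaussian clusters plus one noise cluster
for name, e in [("original", emb), ("DoD", emb_dod)]:
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(e)
    print(name, "ARI:", adjusted_rand_score(labels, pred))
```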

Fig 4. DoD transformation improves K-means clustering, with a linear dependence on dimensionality and a logarithmic dependence on the number of data points.

Fig 4

A: Example of K-means clustering. We sampled 20 cluster points from each of five different multivariate Gaussian distributions with dimensionality D = 20. Another 500 noise points were sampled uniformly from the same space. Left, t-SNE visualization. Right, K-means clustering based on the t-SNE embeddings. Colors correspond to the original labels. The ARI score of the clustering was 0.17. B: The DoD transformation improves the clustering. Left, t-SNE visualization of the distance-of-distance matrix. Right, K-means clustering. The ARI score of the clustering was 0.95. C: Effect of the DoD transformation on the ARI score, as a function of the number of data points and the dimensionality. Data points were sampled with varying numbers of noise points and dimensionalities D. The dimensionality D varied linearly from 10 to 20. The number of noise points N varied exponentially from 256 to 4096 with a base of 2. Left, the ARI score of K-means clustering on the t-SNE embeddings of the original distance matrix. Middle, the ARI score of K-means clustering on the 2D t-SNE embeddings of the distance-of-distance matrix. Right, the improvement in ARI score, i.e., the difference between the left and middle matrices, which shows the predicted linear dependence on D and logarithmic dependence on N.

Application of DoD transformation to the representation of drifting gratings by mouse visual cortex

We then applied the DoD transformation to high-dimensional empirical data. As a first application, we applied the DoD transformation to the problem of unsupervised detection of spiking sequences in high-dimensional neural data. Previously, techniques have been developed for unsupervised detection of high-dimensional spiking sequence patterns using a distance measure between spike trains based on optimal transport (SPOTDis) [14, 17]. By definition, the SPOTDis measure only considers the temporal relationships between spike trains and is invariant to a scaling of the firing rate [14]. We used this technique to analyze visual cortical data from the Allen Institute Brain Observatory [13]. We wondered whether drifting grating stimuli moving in opposite directions would be represented by different temporal spiking sequences (Fig 5A). The drifting grating stimulus consists of a full-field sinusoidal grating that moves in a direction perpendicular to the orientation of the grating; in the public dataset provided by the Allen Institute, the stimulus moves in 8 different directions (S7 Fig). Furthermore, we also wondered whether the neural representations of these stimuli would be similar to the population vectors of spontaneous activity during the inter-stimulus interval (Fig 5B). This was motivated by previous studies suggesting a relationship between spontaneous and stimulus-driven or task-evoked neural activity [18–21]. For each trial, we analyzed the responses in the first 100 ms after stimulus onset and then used the SPOTDis method to compute the pairwise distances between spiking patterns (Fig 5C, Left). Using the SPOTDis distance matrix, the standard t-SNE algorithm revealed a separation of neuronal spiking patterns responding to drifting grating stimuli of different orientations. However, with standard t-SNE, epochs of spontaneous activity were located in the same region of the low-dimensional embedding as the stimulus-evoked responses (Fig 5D, Left). This visualization seems to suggest a relatively high similarity between spontaneous and stimulus-driven activity, and some form of replay or preplay of the different stimulus patterns in the inter-stimulus period. Alternatively, the similarity might have been a consequence of the scattering noise problem described above. Consistent with the latter interpretation, we found that the DoD transformation separated the spontaneous activity epochs from the stimulus-evoked epochs. After the DoD transformation, the low-dimensional embedding contained a region for the spontaneous activity that was clearly separated from the stimulus clusters (Fig 5D, Right). This indicates that population vectors during spontaneous activity and activity evoked by drifting gratings are clearly distinguishable, and that the DoD transformation effectively identifies this separation. It also accords with previous findings in rodents that spontaneous activity lives in a space orthogonal to that of the evoked visual response [22]. In addition, we now observed a clearer separation between the different stimuli. Using a KNN classifier to quantify the classification of stimulus orientations, we found that the performance score improved after the DoD transformation (S8 Fig). Moreover, in the absence of scattering noise, we showed that applying the DoD transformation did not distort the continuous structure in the data (S9 and S10 Figs).

However, we observed that the neural responses to gratings moving in different directions were not separated either before or after the DoD transformation. This lack of direction separation is due to the fact that the majority of neurons recorded in primary visual cortex have orientation tuning but not direction tuning (S7(B) Fig). To summarize, we demonstrated that the DoD transformation solves the scattering noise problem that exists in real neural data.

Fig 5. DoD transformation improves clustering of neural spiking sequences.

Fig 5

A: Example raster plot of spiking patterns in response to drifting gratings of eight different directions. We analyzed the spiking events in the time window of 100 ms after stimulus onset. B: Example raster plot of spontaneous neural activity. We analyzed the spiking events in 100 ms time windows during intertrial intervals (number of neurons: 191). C: Pairwise distances between all spiking patterns. Left, original SPOTDis matrix between spiking patterns. SPOTDis is a distance measure that compares the similarity of spiking patterns based on the optimal transport distance (i.e., the minimum energy needed to transform one spiking pattern into another). The total number of stimulus-driven trials was 630. Right, distance matrix after the DoD transformation. D: Low-dimensional embeddings of all spiking patterns. Left, 2D t-SNE embedding of the SPOTDis matrix between spiking patterns. Right, 2D t-SNE embedding of the distance-of-SPOTDis matrix. Drifting grating trials with different directions are labelled in different colors as in (A). Spontaneous activity is colored in grey. Trials with missing labels are colored in black.

Application of DoD transformation to the representation of natural images by convolutional neural network

Next, we analyzed the high-dimensional representations of natural images by VGG16, a common convolutional neural network used for object recognition [15]. Through transformations of the input across multiple convolutional layers, VGG16 represents an image in the fully connected layer as a high-dimensional feature vector of length 4096; a linear classifier can be built on these deeper-layer feature representations to decode object identity. We defined our feature vector as the activations of artificial neurons in the fully connected layer of a pre-trained VGG16 network, and computed t-SNE embeddings based on these feature vectors of image patches. We first analyzed t-SNE embeddings for image patches containing 8 out of the 1000 object classes in the ImageNet data set. The t-SNE embeddings showed clear clustering of the different object classes (Fig 6A). We then randomly selected 250 image patches from the remaining 992 classes, taking only one image from each distinct class; we therefore expected these images to scatter in the t-SNE embedding and mask the clusters, i.e., to create the scattering noise problem. Indeed, we observed that the t-SNE embedding coordinates of these randomly chosen image patches overlapped with the clustered image patches (Fig 6B). Because VGG16 representations live in a high-dimensional space, we predicted that the DoD transformation should separate these noise-like scattered image patches from the clustered images. Indeed, the DoD transformation relocated their activation patterns to a separate region of the low-dimensional embedding, while preserving the geometry of the object relationships compared to the original t-SNE (Fig 6C). Using the KNN algorithm, we quantified how the classification accuracy of the clustered points changed after the DoD transformation, calculating distance matrices from either the high-dimensional neural network representations or the low-dimensional t-SNE embeddings. After the DoD transformation, the cross-validated KNN accuracy increased from 84.3% to 91.2% on the high-dimensional neural representations and from 85.9% to 90.1% on the low-dimensional embeddings. Moreover, applying the DoD transformation to data without scattering noise did not distort the clustering patterns (S11 Fig). We also showed that the scattering noise problem can occur even when the clustered points come from a different image data set (S12 Fig). Therefore, when clustered data are masked by scattering noise, the DoD transformation better separates the two.

Fig 6. DoD transformation on natural image patch representations by convolutional neural network units.

Fig 6

A: Low-dimensional manifold for images from 8 different classes. Distances were computed based on the high-dimensional feature vectors in the fully connected layer of the VGG16 network. B: Low-dimensional manifold for data including both images from the chosen classes and random images. C: The DoD transformation separates clustered images from randomly scattered images. Left, dissimilarity matrix. Middle, 2D t-SNE embedding without labeling. Right, 2D t-SNE embedding with class labeling. The 8 classes are labelled in different colors and scattered images are labelled in grey.

Discussion

We have presented a technique to improve the performance of low-dimensional embedding techniques like t-SNE in the presence of scattering noise points. Such a situation can be common for high-dimensional empirical data in which clusters are sparse and part of the data points represent noise, as is often the case in biological data. For example, if we were to observe brain activity for several hours, neural activity may form clear patterns only for a fraction of the time, e.g., when neurons are activated by an external stimulus in the receptive field. Importantly, the DoD transformation yields comparable performance when the data contain only true clusters. Moreover, as we showed, the DoD transformation confers benefits especially when the dimensionality of the data is high. For high-dimensional data, embedding techniques like t-SNE have advantages over techniques like PCA, because they can create faithful low-dimensional embeddings even for data lying on a non-linear manifold, and do not restrict the analysis to a few components representing only a fraction of the total variance. This technique can therefore be useful for analyzing the geometry of neural representations, because it yields low-dimensional coordinates that can then be related to behavioral or stimulus parameters.

A disadvantage of the DoD transformation is its computational cost, because it requires computing the entire N × N distance matrix first. The runtime of the transformation thus grows quadratically with the number of data points (S13 Fig). By contrast, efficient algorithms exist for computing t-SNE based on neighborhood distances, avoiding an N × N computational complexity. Another consideration is the hyperparameter K, the total number of neighbors used to compute the DoD transformation. Although we observed strong improvements with relatively small neighborhood sizes, increasing the neighborhood size beyond the cluster size may lead to distortions (S1 Fig).

In conclusion, we have presented a simple and theoretically motivated transformation of the distance matrix by computing distance-of-distances, which improves clustering of high-dimensional data in the presence of noise points, and have provided several applications to neural networks and biological data where this technique was useful and led to more accurate conclusions.

Supporting information

S1 Text. Influence of neighborhood size.

(PDF)

S2 Text. Unsupervised noise detection.

(PDF)

S3 Text. PCA preprocessing.

(PDF)

S4 Text. Fewer noise points than cluster points.

(PDF)

S5 Text. Influence of perplexity.

(PDF)

S6 Text. Distortion in noise-free situations.

(PDF)

S7 Text. Allen Institute Brain Observatory electrophysiological recordings.

(PDF)

S8 Text. Improvement of classification.

(PDF)

S9 Text. Distortion of real neural data by DoD transformation.

(PDF)

S10 Text. Distortion of real convolutional neural network data by DoD transformation.

(PDF)

S11 Text. Application of DoD transformation to CNN representation of images from different data sets.

(PDF)

S12 Text. Runtime of DoD transformation.

(PDF)

S1 Fig. Effect of neighborhood size K.

A: 5 clusters, each with 20 points; 200 scattering noise points; dimensionality of 50. Original embedding (left) and DoD transformation with neighborhood sizes of 5, 20, 50, and 100. B: 50 clusters, each with 20 points; 1000 scattering noise points; dimensionality of 50. Original embedding (left) and DoD transformation with neighborhood sizes of 10, 50, 1000, and 2000.

(PNG)

S2 Fig. Inference of noise points based on neighborhood overlap.

A: Distribution of overlap rate of neighborhood identity before and after the DoD transformation. B: Points with smaller overlap rate (< 65%) were identified as scattering noise points (black).

(PNG)

S3 Fig. PCA is limited in terms of solving the scattering noise problem.

50 clusters, each with 20 points; 1000 scattering noise points; dimensionality of 50; PCA preprocessing uses the first 10 principal components; DoD transformation uses neighborhood size of 10.

(PNG)

S4 Fig. DoD transformation keeps its performance when there are fewer noise points.

50 clusters, each with 20 points; 500 noise points; dimensionality of 50; DoD transformation with a neighborhood size of 5. Left, original t-SNE visualization. Right, t-SNE visualization with DoD transformation.

(PNG)

S5 Fig. Larger perplexity of the t-SNE algorithm does not solve the scattering noise problem.

50 clusters, each with 20 points; 1000 noise points; dimensionality of 50; DoD transformation with a neighborhood size of 5; perplexity values are 5, 50, 100, 500, 1000 from left to right. A: t-SNE on original distance matrix. B: t-SNE on distance matrix after DoD transformation.

(PNG)

S6 Fig. DoD transformation does not distort the clustering.

A: 5 clusters, each with 20 points; dimensionality of 50; DoD transformation with neighborhood sizes ranging from 5 to 30. B: As in A, but with all clusters generated from multivariate Gaussian distributions, one of which has a larger standard deviation (0.5) than the rest (0.2).

(PNG)

S7 Fig. Neural spiking data from Allen Institute.

A: Illustration of recording session of drifting grating visual stimulus. B: Direction tuning curves of recorded units in session 754829445.

(PNG)

S8 Fig. DoD transformation improves KNN classification of neural spiking sequences.

For both the distances in the high-dimensional space and the distances in the low-dimensional embedding, the cross-validated KNN classification scores are higher after DoD transformation.

(PNG)

S9 Fig. DoD transformation keeps the ring structure of neural representations of grating stimulus.

Each data point represents the population firing pattern in a given trial (n = 3200). Trials with different stimulus orientations are labelled in different colors.

(PNG)

S10 Fig. Neural manifolds were better separated by DoD transformation.

Each data point represents the population spiking pattern in a given trial (n = 600). Trials with different stimulus orientations are labelled in different colors.

(PNG)

S11 Fig. Object manifolds were not distorted by the DoD transformation.

2D t-SNE embeddings for 8 random ImageNet classes maintain clustering patterns after the DoD transformation.

(PNG)

S12 Fig. DoD transformation on sketch images represented by convolutional neural network units.

A: Low-dimensional manifold for 634 sketch patches from 5 different classes. Distances were computed based on the high-dimensional vectors in the fully connected layer of the pretrained AlexNet network. B: Low-dimensional manifold for data including both sketch patches and 50 random ImageNet patches. C: DoD transformation separates ImageNet patches from sketch patches. Left, Euclidean distance matrix. Right, 2D t-SNE embeddings. Sketch images of different classes are labelled in different colors. ImageNet patches are labelled in grey.

(PNG)

S13 Fig. Runtime of the DoD transformation.

(PNG)

Data Availability

The paper uses publicly available electrophysiological data from the Allen Institute Brain Observatory 1.1 (https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html). The images are taken from the publicly available ImageNet dataset (https://www.image-net.org/) and the Sketchy dataset (https://sketchy.eye.gatech.edu/). All code is shared in a GitHub repository at https://github.com/Jinke-Liu/Distance-of-Distance-tSNE.

Funding Statement

This project was supported by a BMBF Grant to M.V. (Computational Life Sciences, project BINDA, 031L0167). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Gallego JA, Perich MG, Miller LE, Solla SA. Neural manifolds for the control of movement. Neuron. 2017;94(5):978–984. doi: 10.1016/j.neuron.2017.05.025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Cunningham JP, Byron MY. Dimensionality reduction for large-scale neural recordings. Nature neuroscience. 2014;17(11):1500–1509. doi: 10.1038/nn.3776 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Stringer C, Pachitariu M, Steinmetz N, Carandini M, Harris KD. High-dimensional geometry of population responses in visual cortex. Nature. 2019; p. 1. doi: 10.1038/s41586-019-1346-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Amir EaD, Davis KL, Tadmor MD, Simonds EF, Levine JH, Bendall SC, et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nature biotechnology. 2013;31(6):545–552. doi: 10.1038/nbt.2594 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Li W, Cerise JE, Yang Y, Han H. Application of t-SNE to human genetic data. Journal of bioinformatics and computational biology. 2017;15(04):1750017. doi: 10.1142/S0219720017500172 [DOI] [PubMed] [Google Scholar]
  • 6. Becht E, McInnes L, Healy J, Dutertre CA, Kwok IW, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature biotechnology. 2019;37(1):38–44. doi: 10.1038/nbt.4314 [DOI] [PubMed] [Google Scholar]
  • 7. Kobak D, Berens P. The art of using t-SNE for single-cell transcriptomics. Nature communications. 2019;10(1):1–14. doi: 10.1038/s41467-019-13056-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Maaten Lvd, Hinton G. Visualizing data using t-SNE. Journal of machine learning research. 2008;9(Nov):2579–2605. [Google Scholar]
  • 9. Hinton GE, Roweis ST. Stochastic neighbor embedding. In: Advances in neural information processing systems; 2003. p. 857–864. [Google Scholar]
  • 10. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018.
  • 11. Wattenberg M, Viégas F, Johnson I. How to use t-SNE effectively. Distill. 2016;1(10):e2. doi: 10.23915/distill.00002 [DOI] [Google Scholar]
  • 12.Campello RJ, Moulavi D, Sander J. Density-based clustering based on hierarchical density estimates. In: Pacific-Asia conference on knowledge discovery and data mining. Springer; 2013. p. 160–172.
  • 13. Siegle JH, Jia X, Durand S, Gale S, Bennett C, Graddis N, et al. A survey of spiking activity reveals a functional hierarchy of mouse corticothalamic visual areas. bioRxiv. 2019; p. 805010. [Google Scholar]
  • 14. Grossberger L, Battaglia FP, Vinck M. Unsupervised clustering of temporal patterns in high-dimensional neuronal ensembles using a novel dissimilarity measure. PLoS computational biology. 2018;14(7):e1006283. doi: 10.1371/journal.pcbi.1006283 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
  • 16. Bhattacharyya P, Chakrabarti BK. The mean distance to the nth neighbour in a uniform distribution of random points: an application of probability theory. European Journal of Physics. 2008;29(3):639. doi: 10.1088/0143-0807/29/3/023 [DOI] [Google Scholar]
  • 17. Sotomayor-Gomez B, Battaglia FP, Vinck M. A geometry of spike sequences: Fast, unsupervised discovery of high-dimensional neural spiking patterns based on optimal transport theory. bioRxiv. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Berkes P, Orbán G, Lengyel M, Fiser J. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science. 2011;331(6013):83–87. doi: 10.1126/science.1195870 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Chaudhuri R, Gerçek B, Pandey B, Peyrache A, Fiete I. The intrinsic attractor manifold and population dynamics of a canonical cognitive circuit across waking and sleep. Nature neuroscience. 2019;22(9):1512–1520. doi: 10.1038/s41593-019-0460-x [DOI] [PubMed] [Google Scholar]
  • 20. Nikolić D, Häusler S, Singer W, Maass W. Distributed fading memory for stimulus properties in the primary visual cortex. PLoS biology. 2009;7(12):e1000260. doi: 10.1371/journal.pbio.1000260 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Luczak A, Barthó P, Harris KD. Spontaneous events outline the realm of possible sensory responses in neocortical populations. Neuron. 2009;62(3):413–425. doi: 10.1016/j.neuron.2009.03.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Stringer C, Pachitariu M, Steinmetz N, Reddy CB, Carandini M, Harris KD. Spontaneous behaviors drive multidimensional, brainwide activity. Science. 2019;364(6437):eaav7893. doi: 10.1126/science.aav7893 [DOI] [PMC free article] [PubMed] [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010764.r001

Decision Letter 0

Thomas Serre, Emma Claire Robinson

12 Dec 2021

Dear Prof. Dr. Vinck,

Thank you very much for submitting your manuscript "Noise-robust low-dimensional embedding using distance-of-distance transformation" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

In particular, reviewer 3 raises major concerns that certain features of t-SNE that are traditionally used to address the same problem may have accidentally been turned off or not implemented. It is therefore imperative that the authors address this concern both in the manuscript and, ideally, through the open release of their code. Both reviewer 2 and reviewer 3 raised major concerns over the clarity of the paper and over transparent benchmarking of the method against t-SNE in more real-world situations, with more testing of the impact of changing different parameters.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Emma Claire Robinson

Associate Editor

PLOS Computational Biology

Thomas Serre

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper by Liu & Vinck suggests a method to transform a NxN pairwise distance matrix such that "noise" points get smaller distances between each other. As a result, when the transformed distance matrix is used for low-dimensional visualization, such as t-SNE, all noise points get assembled into one "island". This can be convenient, as it allows one to visually separate noise points. The authors apply their method to spike train recordings and show that it helps distinguish evoked activity from spontaneous activity.

I found the method interesting and the experimental demonstrations convincing, and the paper can be a good fit to PLoS Comp Bio. At the same time, I believe major revision is needed to improve the presentation clarity.

MAJOR ISSUES

* Intuition paragraph (lines 44-50) is very unclear. Sentence in lines 44-46 -- why is that true? Should this be intuitive? It's written as if this sentence should be self-evident, but I think it isn't. Line 47: "to other noise points" -- should this be "to neighbouring noise points"? Is this statement about _all_ noise points or only _neighbouring_ noise points? Line 50: this verbal formulation can be unclear without a formula, I suggest to move the formula and exact definition of your new distance from the Methods here.

* How do the results depend on K? This is never shown. Consider example in Figure 1 (which is very impressive by the way). What happens if you use K smaller than 10? K larger than 10? K equal to the sample size? This needs to be shown. I am actually unsure whether K=n (where n is sample size) will work or will fail. This is very important, and should be explained in the intuition paragraph (see above).

* Are there any downsides of using the transformation? This is never discussed, but should be. In particular, what happens with the noise-free dataset if it is transformed? The authors suggest that nothing much would change, but they need to show direct evidence/quantification. What if the noise-free dataset has some continuous structures? What if one of the clusters has low density (but is still well isolated from the other clusters)? Can you take some real-world noise-free data and show t-SNE embeddings before/after the transformation?

* Continuing previous point: this can be done in Figure 5. If you apply your transformation to the data in panel A, do t-SNE/UMAP become worse or not? You could show it. You could also quantify it by computing kNN classification accuracy, not only in the t-SNE/UMAP embedding, but also directly based on distances (before t-SNE).

* Figure 1 and subsequent simulation Figures (e.g. Figure 2) always have many more noise points than non-noise points. Will the algorithm work as well if noise points are only a small fraction of the total sample size? Please show that.

* line 92: can variance be approximated as well? Using the variance could strengthen the analysis.

* line 188: "clearer separation" -- is this some t-SNE effect or a real effect in the distances matrix? To answer that, you could use kNN classification accuracy, or do before/after t-SNE embeddings of only evoked response points.

* Most of the Methods (sections 4.1-4.2) is textbook description of t-SNE. Frankly, it can be removed, or condensed to a small pargaraph.

* Section 4.3 should be moved to Results; it basically consists of one formula. The notation is very confusing! D is used for dimensionality and also to denote the distance matrix. k was called K before. Why two notations for the set of nearest neighbors? You don't need to define nearest neighbors, so formula 18 is unnecessary. \mathcal{N} is never defined. In equation 19, do you divide by 2k or by the size of I \cup J? The sum should be over the union of I and J, which is not what {I,J} denotes (see the formula sketch after this list).

* Can the distance matrix after transformation be used for clustering?

* Can your method somehow infer which of the points actually corresponded to noise? Take Figure 1: imagine I don't know that grey points are noise. Can the algorithm identify that?
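
For reference, a hedged LaTeX reconstruction of equation 19 as this review reads it, with I and J denoting the K-nearest-neighbor sets of points i and j (illustrative notation, not necessarily the manuscript's final form):

\[
  D^{\mathrm{DoD}}_{ij} \;=\; \frac{1}{2K} \sum_{k \in I \cup J} \bigl| D_{ik} - D_{jk} \bigr|
\]

Since I and J may share members, |I \cup J| <= 2K, so the alternative prefactor 1/|I \cup J| would normalize by the actual number of summands (the same point resurfaces under Decision Letter 2 below).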

MINOR ISSUES

* Ref [4] is a strange choice in line 8. This paper does use t-SNE but it's a very minor application there, while scRNA-seq literature has a lot of papers that use t-SNE/UMAP more prominently, to visualize much more complex datasets.

* Introduction is too short, sloppy in places, and does not give a literature overview:

a) What is "density-based" in line 20? How is t-SNE density-based?

b) line 22: "maximally preserving pairwise distances" -- sloppy. t-SNE does not aim to preserve distances at all.

c) "crowding problem" -- unfortunate terminology, as the original 2008 t-SNE paper uses the same term "crowding problem" to refer to something else!

d) there are too few references. Are there any related papers at all? That deal with noise in the data? If not in the context of dimensionality reduction, then maybe clustering? If you cannot find any relevant research whatsoever, at least say so.

* line 58: "data poits" -> "noise data points"?

* Figure 2: add column titles indicating dimensionality.

* line 82: K was used for the number of neighbors; now it's the number of clusters -- confusing

* line 111: "while preserving distance between the clusters and the noise points" -- that's a sloppy formulation; these distances are not exactly preserved.

* Figure 3: what are error bars?

* Figure 3: unclear what cluster variance was used for the GMM simulation -- it should be some realistic data, not infinitely dense.

* Figure 6, panel B: y-axis ticks are cropped

* Figure 6, panel C: could you label "signal" and "noise" parts of the distance matrix?

* Figure 6, panel D: what are the black points?

* line 204: please mention some memory and runtime requirements. How long does it take, and for what sample size?

* line 206: "further work is needed..." -- is this at all feasible? Sounds rather unfeasible to me.

* line 210: "distortions" -- this needs to be shown, see my comment above about choice of K

* Section 4.4: what t-SNE implementation was used? Give version. And UMAP?

* line 271: "different initialization" -- what initialization?

* line 273: "small variance" -- this is insufficient level of detail. Please describe all your experiments EXACTLY. Make it clear what refers to what experiment (which figure) exactly.

Reviewer #2: I have attached my review as a Word document.

Reviewer #3: In this paper, Liu and Vinck describe an interesting strategy for obtaining more informative 2D embeddings of high-dimensional data. The strategy is simple: convert the distance matrix typically used by embedding methods into a distance-of-distances matrix. This appears to be very useful on two real-world datasets, and the simulated data also drive the point home. I have two major concerns about this manuscript.

Major points:

1) There is an automatic rescaling of distances in t-SNE/UMAP based on the number of neighbors each point has, via the perplexity parameter. This should take care of the relative density problem that this paper deals with. I would like to make sure that this automatic rescaling was enabled, and that the authors always determined a separate sigma_i for each point. However, the code was not shared, so I cannot check this directly. Furthermore, there are many Euclidean distances presented to us throughout the manuscript, which makes me think the distances were not re-estimated. Moreover, the theoretical analyses indeed do not consider the rescaling of distances implicit in t-SNE/UMAP.

If sigma_i was indeed estimated directly for each point as it should be, then I would like to know why the authors think that was not enough. If it was estimated, then I think a lot of the visualizations should change to show the transformed similarity matrices instead of the Euclidean distances, since the transformed similarity matrices are the actual data that t-SNE uses. The theoretical analysis itself probably has to be redone to introduce the sigma_i's in section 2.1. Confusingly, sigma takes a different meaning there, so that variable name has to change too.

I am not sure how you are running t-SNE, but if you're passing a similarity matrix directly to an open implementation of t-SNE, then you would have needed to do the point-by-point rescaling yourself (see the sketch after these major points). There is far too little information in the Methods about what you did, so please add all those details there as well. There are some other important parameters in t-SNE and UMAP; you need to specify those in your Methods section too.

2) The problem with tSNE and UMAP is that there are many tricks for making them work better, and even when you do all the tricks right, there are many aspects of the embedding that seem arbitrary. Some computational researchers reject this outright, claiming that tSNE and UMAP are bad representations of the data and are just there to make "pretty pictures". I wouldn't go that far; I think there's some inherent use for these visualizations, and there are some consistent ways to get good embeddings (see The Art of tSNE). If the authors can compare their tricks to these more standard and consistent tSNE methods, then I think the paper can be a useful contribution to our bag of tricks for tSNE and UMAP.

In particular, the authors have to show that the crowding problem persists when the tSNE is run in this more standard way: reduce dimensions by PCA, initialize tSNE with PCA, use a perplexity of approximately N/100 where N is the number of points. They should also show that the problem persists with different tSNE parameters, such as higher or lower perplexity. Also, UMAP has its own set of parameters that need to be explored, and all the parameter settings have to be specified in the methods.
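
As a brief sketch of the precomputed-distance route raised in point 1, using scikit-learn's t-SNE (the data and parameter values are placeholders, not the authors' settings): even with a precomputed matrix, the per-point bandwidth sigma_i is still calibrated internally against the chosen perplexity.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))   # placeholder high-dimensional data
D = squareform(pdist(X))         # n x n Euclidean distance matrix

# metric="precomputed" makes t-SNE consume the matrix directly; init must
# be "random" because PCA initialization needs the raw coordinates.
emb = TSNE(metric="precomputed", init="random", perplexity=30,
           random_state=0).fit_transform(D)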

Minor points:

1) Heatmaps in all images should be scaled in a more reasonable way (except 4c, which is fine). Right now they all just look red. Perhaps scale between 1% and 99% saturation (see the sketch after these minor points)?

2) The analysis in 2.5 is interesting, but you should mention that the orthogonality of stimulus and spontaneous activity was shown for rodents in Stringer et al., Science 2019. The activity patterns literally live in different linear subspaces.

3) Also with respect to 2.5, a comment should be made about the lack of separation between patterns for opposite directions of motion. We know there is direction selectivity in V1, but embedding algorithms have to throw some information away, and the direction selectivity seems to indeed disappear.

4) You should add the information about dimensions and number of points to the figure so it's clear without carefully reading the legend. Or at least separate the first two columns from the last two columns so the reader doesn't think it's a progression of 4 settings for one parameter like I originally thought.

5) Line 228: this is just not true. Choosing the perplexity is one of the most important decisions when running t-SNE, as well as a few other things like initialization with PCA and how many PCs to keep from the data (see The Art of TSNE).
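
A possible implementation of the percentile saturation suggested in minor point 1 (a matplotlib sketch on a synthetic matrix; the 1%/99% limits follow the comment above):

import numpy as np
import matplotlib.pyplot as plt

D = np.random.rand(100, 100) ** 4        # placeholder matrix with a heavy tail
vmin, vmax = np.percentile(D, [1, 99])   # clip color scale at the 1st/99th percentiles
plt.imshow(D, vmin=vmin, vmax=vmax)
plt.colorbar()
plt.show()

Clipping the limits this way keeps a few extreme distances from saturating the colormap.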

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #2: No: They have stated that the code and data are available on GitHub, but the link has not been provided.

Reviewer #3: No: They haven't shared the code yet.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dmitry Kobak

Reviewer #2: Yes: Alex Diaz-Papkovich

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachment

Submitted filename: Review.docx

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010764.r003

Decision Letter 1

Thomas Serre, Emma Claire Robinson

28 Sep 2022

Dear Prof. Dr. Vinck,

Thank you very much for submitting your manuscript "Improved visualization of high-dimensional data using the distance-of-distance transformation" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Emma Claire Robinson

Academic Editor

PLOS Computational Biology

Thomas Serre

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The revision is a MASSIVE improvement, with all raised points adequately addressed. I wish I saw such a thorough and comprehensive revision more often! I only have minor comments left.

MINOR COMMENTS

Fig S1A -- to illustrate the point better, you could remove K=10 and K=15 from the top row (K=5 and K=20 are nearly identical anyway) and instead add K=50 and K=100, which would show how the performance degrades for larger K.

Fig S1A -- replace "original" -> "original embedding"? The word "original" suggests to me that this panel shows the original 2D data which is of course not the case as the generated data are higher-dimensional

Fig S6 -- "K" instead of "number of neighbors"? That's how it was in Fig S1

Fig S6 -- corr coefficients can be rounded, e.g. to two or three digits; currently the precision looks odd

Fig S7 -- here I am confused: are the simulated data 2D or higher-dimensional? If the data are higher-dimensional, then I don't understand why the "original" t-SNE shows smaller density for the lower-left cluster. t-SNE adjusts the kernel width to reach a given perplexity and typically does not preserve the cluster density.

"introduces very limited distortions and preserve" --> "... preserves"

Fig S12 -- red dots in the matrix look really weird and confusing. If it's simply all values above the top colorbar limit, then I would suggest coloring them dark blue (the same shade as at the top end of the colorbar).

"In situations where there are less noise than the number of clusters" -- cumbersome formulation. "... where the number of noise points is smaller than the number of non-noise points"?

Fig S2 -- what was the K here? Same as for DoD computation?

"(e.g. distances among correlation matrices, or optimal transport distances over spiking patterns [13].)" -- remove the period

line 69: unclear why you use the L1 distance here and not L2. Same about line 72. Would it work with the L2 distance in either the first, or the second, or both places? It could be nice to show that empirically, e.g. for the situation in Figure 1

Fig 4C -- it is confusing to me that N grows from top to bottom and not from bottom to top...

line 222 -- please state here what the dimensionality is (4096?)

Reviewer #2: The authors have addressed the points from my previous review adequately. The article is well presented and interesting; however, it will need to be proofread for spelling and grammar errors.

-Inconsistencies in writing "neighbour" versus "neighbor", "DoD" versus "Distance-of-Distance", etc

-Author summary: typo with "hihg-dimensional". I believe it should also be "low-dimensional" instead of "low dimensional" to be consistent

-The paragraph at 173 is unclear and needs to be rewritten. It has several grammar and spelling errors: "less noise points than..." instead of "fewer noise points than...", "our method are" instead of "our method is", "one cluster has low density than" instead of "one cluster has lower density than"

-381: should be "requires computation of the..."

-384: should be "an NxN..."

-393: I believe the authors mean to convey that their method provided improved or more accurate conclusions rather than simply "qualitatively different" conclusions (at least that's how I see it!)

Reviewer #3: The authors have addressed all my comments in a satisfactory manner. Thank you.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: No: Please make the entire code available on GitHub

Reviewer #2: Yes

Reviewer #3: Yes

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dmitry Kobak

Reviewer #2: Yes: Alex Diaz-Papkovich

Reviewer #3: No


References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010764.r005

Decision Letter 2

Thomas Serre, Emma Claire Robinson

10 Nov 2022

Dear Prof. Dr. Vinck,

Thank you very much for submitting your manuscript "Improved visualization of high-dimensional data using the distance-of-distance transformation" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Emma Claire Robinson

Academic Editor

PLOS Computational Biology

Thomas Serre

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thanks again to the authors for addressing my comments, and congratulations on very nice work. I recommend acceptance. I only noticed one small issue now:

* In Formula 1, I don't understand why the fraction before the sum is 1/2K, as the size of the I \cup J union may be smaller than 2K. Should 2K in that formula be replaced by |I \cup J|?

Reviewer #2: My comments have been adequately addressed.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dmitry Kobak

Reviewer #2: Yes: Alex Diaz-Papkovich


PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010764.r007

Decision Letter 3

Thomas Serre, Emma Claire Robinson

28 Nov 2022

Dear Prof. Dr. Vinck,

We are pleased to inform you that your manuscript 'Improved visualization of high-dimensional data using the distance-of-distance transformation' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Emma Claire Robinson

Academic Editor

PLOS Computational Biology

Thomas Serre

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thanks! I recommend acceptance.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: None

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dmitry Kobak

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010764.r008

Acceptance letter

Thomas Serre, Emma Claire Robinson

14 Dec 2022

PCOMPBIOL-D-21-01871R3

Improved visualization of high-dimensional data using the distance-of-distance transformation

Dear Dr Liu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Influence of neighborhood size.

    (PDF)

    S2 Text. Unsupervised noise detection.

    (PDF)

    S3 Text. PCA preprocessing.

    (PDF)

    S4 Text. Fewer noise points than cluster points.

    (PDF)

    S5 Text. Influence of perplexity.

    (PDF)

    S6 Text. Distortion in noise-free situations.

    (PDF)

    S7 Text. Allen Institute Brain Observatory electrophysiological recordings.

    (PDF)

    S8 Text. Improvement of classification.

    (PDF)

    S9 Text. Distortion of real neural data by DoD transformation.

    (PDF)

    S10 Text. Distortion of real convolutional neural network data by DoD transformation.

    (PDF)

    S11 Text. Application of DoD transformation to CNN representation of images from different data sets.

    (PDF)

    S12 Text. Runtime of DoD transformation.

    (PDF)

    S1 Fig. Effect of neighborhood size K.

    A: 5 clusters, each with 20 points; 200 scattering noise points; dimensionality of 50; original embedding (left) and DoD transformation with a neighborhood size of 5, 20, 50 and 100. B: 50 clusters, each with 20 points; 1000 scattering noise points; dimensionality of 50; original embedding (left) and DoD transformation with a neighborhood size of 10, 50, 1000 and 2000.

    (PNG)

    S2 Fig. Inference of noise points based on neighborhood overlap.

    A: Distribution of the overlap rate of neighborhood identity before and after the DoD transformation. B: Points with an overlap rate below 65% were identified as scattering noise points (black); a sketch of this criterion follows below.

    (PNG)
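
    A hedged sketch of this neighborhood-overlap criterion (the function name and K are illustrative; the 65% threshold follows the caption):

    import numpy as np

    def noise_by_overlap(D, D2, K=10, threshold=0.65):
        # K nearest neighbors of each point before (D) and after (D2) the DoD.
        nearest = lambda M: np.argsort(M, axis=1)[:, 1:K + 1]
        before, after = nearest(D), nearest(D2)
        # Fraction of each point's neighbors retained after the transformation.
        overlap = np.array([len(np.intersect1d(b, a)) / K
                            for b, a in zip(before, after)])
        return overlap < threshold  # True marks putative scattering noise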

    S3 Fig. PCA is limited in solving the scattering noise problem.

    50 clusters, each with 20 points; 1000 scattering noise points; dimensionality of 50; PCA preprocessing uses the first 10 principal components; the DoD transformation uses a neighborhood size of 10.

    (PNG)

    S4 Fig. The DoD transformation maintains its performance when there are fewer noise points.

    50 clusters, each with 20 points; 500 noise points; dimensionality of 50; DoD transformation with a neighborhood size of 5. Left, original t-SNE visualization. Right, t-SNE visualization with DoD transformation.

    (PNG)

    S5 Fig. A larger perplexity of the t-SNE algorithm does not solve the scattering noise problem.

    50 clusters, each with 20 points; 1000 noise points; dimensionality of 50; DoD transformation with a neighborhood size of 5; perplexity values are 5, 50, 100, 500, 1000 from left to right. A: t-SNE on original distance matrix. B: t-SNE on distance matrix after DoD transformation.

    (PNG)

    S6 Fig. DoD transformation does not distort the clustering.

    A: 5 clusters, each with 20 points; dimensionality of 50; DoD transformation with neighborhood sizes ranging from 5 to 30. B: 5 clusters, each with 20 points; dimensionality of 50; DoD transformation with neighborhood sizes ranging from 5 to 30. All clusters were generated from multivariate Gaussian distributions, and one of them has a larger standard deviation (0.5) than the rest (0.2).

    (PNG)

    S7 Fig. Neural spiking data from Allen Institute.

    A: Illustration of recording session of drifting grating visual stimulus. B: Direction tuning curves of recorded units in session 754829445.

    (PNG)

    S8 Fig. DoD transformation improves KNN classification of neural spiking sequences.

    For both the distances in the high-dimensional space and the distances in the low-dimensional embedding, the cross-validated KNN classification scores are higher after DoD transformation.

    (PNG)

    S9 Fig. DoD transformation keeps the ring structure of neural representations of grating stimulus.

    Each data point represents the population firing pattern in a given trial (n = 3200). Trials with different stimulus orientations are labelled in different colors.

    (PNG)

    S10 Fig. Neural manifolds were better separated by DoD transformation.

    Each data point represents the population spiking pattern in a given trial (n = 600). Trials with different stimulus orientations are labelled in different colors.

    (PNG)

    S11 Fig. Object manifolds were not distorted by the DoD transformation.

    2D t-SNE embeddings for 8 random ImageNet classes maintain clustering patterns after the DoD transformation.

    (PNG)

    S12 Fig. DoD transformation on sketch images represented by convolutional neural network units.

    A: Low-dimensional manifold for 634 sketch patches from 5 different classes. Distances were computed based on the high-dimensional vectors in the fully connected layer of the pretrained AlexNet network. B: Low-dimensional manifold for data including both sketch patches and 50 random ImageNet patches. C: The DoD transformation separates ImageNet patches from sketch patches. Left, Euclidean distance matrix. Right, 2D t-SNE embeddings. Sketch images of different classes are labelled in different colors. ImageNet patches are labelled in grey.

    (PNG)

    S13 Fig. Runtime of the DoD transformation.

    (PNG)

    Attachment

    Submitted filename: Review.docx

    Attachment

    Submitted filename: Response to reviewers.pdf

    Attachment

    Submitted filename: response_to_reviewers.pdf

    Attachment

    Submitted filename: letter_to_reviewer.pdf

    Data Availability Statement

    The paper uses publicly available electrophysiological data from the Allen Institute Brain Observatory 1.1 (https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html). The images are taken from the publicly available ImageNet dataset (https://www.image-net.org/) and the Sketch dataset (https://sketchy.eye.gatech.edu/). All code is shared in a GitHub repository at https://github.com/Jinke-Liu/Distance-of-Distance-tSNE.

