2019 Nov 20;9:17133. doi: 10.1038/s41598-019-53549-9

Figure 2.

Intrinsic dimension (ID) estimation is possible in the extreme undersampled regime for arbitrarily large ID with the Full Correlation Integral (FCI) estimator, in the case of linearly embedded and slightly curved manifolds, possibly non-uniformly sampled and with noise. (Top left) We show the density of neighbours ρ of preprocessed (centered and normalized) data (number of samples N = 500) extracted from {0, 1}^d linearly embedded in D = 60 dimensions (D_{d,60}), for d = 5, 15, 30. We are able to efficiently extract the correct ID even though this is a highly non-uniformly sampled dataset, whose ρ displays manifold-dependent features (in this case, step-like patterns). Moreover, we observe that, as d increases, the density of neighbours of this dataset quickly converges to our functional form. It is worth noting that the whole functional form (Eq. 2) is needed for the fit; in fact, a local fit of the slope of ρ at half-height would result in an incorrect ID estimate. (Bottom left) We show the density of neighbours ρ of preprocessed data (N = 500) extracted uniformly from {0, 1}^d and [0, 1]^d, and from ℝ^d with a multivariate Gaussian distribution, for d = 15 and linearly embedded in D = 60 dimensions. All plot lines are compatible with the same functional form (Eq. 2), pointing to an intriguing manifestation of “universality” for high-dimensional data. (Center) To highlight the predictive power of the FCI method over a broad spectrum of dimensionalities (ranging from d = 4 to d = 200), we show the estimated ID versus the number of sample points N for the hypercube {0, 1}^d linearly embedded in D = 500 dimensions. Error bars are computed over 10 samples for each pair (N, d). (Top right) We assess quantitatively the predictive power of the FCI method by computing the average relative error |(d_est − d)/d| (over 20 random instances) of the estimated ID in the range 5 ≤ d ≤ 1000, 5 ≤ N ≤ 1000.
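The synthetic setup in the left panels can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that "linearly embedded" means mapping the d-dimensional samples into a random d-dimensional subspace of ℝ^D, that "normalized" means rescaling every centered sample to unit norm, and that the density of neighbours ρ(r) is the empirical correlation integral (the fraction of point pairs at distance at most r). The function names are hypothetical.

```python
import numpy as np


def embedded_hypercube(n, d, big_d, rng):
    """Sample n vertices of {0,1}^d and embed them linearly in R^big_d (needs big_d >= d)."""
    x = rng.integers(0, 2, size=(n, d)).astype(float)
    # Assumption: the embedding is a random orthonormal frame, taken from the
    # QR decomposition of a Gaussian matrix (q has shape (big_d, d)).
    q, _ = np.linalg.qr(rng.standard_normal((big_d, d)))
    return x @ q.T


def preprocess(x):
    """Center the data, then rescale every sample to unit Euclidean norm."""
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def correlation_integral(x, radii):
    """Fraction of distinct point pairs whose distance is <= r, for each r in radii."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    upper = np.triu_indices(x.shape[0], k=1)          # each pair counted once
    dist = np.sort(np.sqrt(np.clip(d2[upper], 0.0, None)))
    return np.searchsorted(dist, radii, side="right") / dist.size


rng = np.random.default_rng(0)
data = preprocess(embedded_hypercube(500, 15, 60, rng))
radii = np.linspace(0.0, 2.1, 50)
rho = correlation_integral(data, radii)
```

Plotting `rho` against `radii` for several values of d gives curves of the kind described for the left panels; the fit to the functional form of Eq. 2 is not reproduced here.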
We observe that at N ~ 100 the error is of the order of 1%, almost independently of the ID, and that ID estimation is possible even in the extreme undersampled regime N < d. (Bottom right) The FCI method estimates the correct ID even when the data are corrupted by noise. Here we consider a linearly embedded hypercube dataset (d = 40, D = 60) and add to it 60-dimensional Gaussian noise of standard deviation σ. We observe a sharp transition in the estimated ID between the regime in which the noise is a perturbation (σ ≲ 0.1 and d_est = 40) and the regime in which the noise covers the signal (σ ≳ 0.2 and d_est = 60).
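The two quantities used in the right panels, the average relative error and the additive noise, are simple to write down. The sketch below is only illustrative: the function names are hypothetical, the noise is assumed to be isotropic (independent in each of the D coordinates), and the ID estimator itself is not implemented.

```python
import numpy as np


def add_gaussian_noise(x, sigma, rng):
    """Add independent Gaussian noise of standard deviation sigma to every coordinate."""
    return x + sigma * rng.standard_normal(x.shape)


def mean_relative_error(d_est, d):
    """Average of |(d_est - d) / d| over a collection of estimates of the same true d."""
    d_est = np.asarray(d_est, dtype=float)
    return float(np.mean(np.abs((d_est - d) / d)))


# Example with made-up estimates for a true ID of 40 (not results from the paper).
example_error = mean_relative_error([38, 42], 40)
```

In the noise experiment, the output of `add_gaussian_noise` would be passed to the estimator for a range of σ values, and the estimated ID plotted against σ.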