2019 Nov 20;9:17133. doi: 10.1038/s41598-019-53549-9

© The Author(s) 2019

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Intrinsic Dimension (ID) estimation is possible in the extreme undersampled regime for arbitrarily large ID with the Full Correlation Integral (FCI) estimator, in the case of linearly embedded and slightly curved manifolds, possibly non-uniformly sampled and with noise. (Top left) We show the density of neighbours ρ of preprocessed (centered and normalized) data (number of samples N = 500) extracted from {0, 1}^d linearly embedded in D = 60 dimensions ( $D_{d, 60}$ ), for d = (5, 15, 30). We are able to efficiently extract the correct ID even though this is a highly non-uniformly sampled dataset, whose ρ displays manifold-dependent features (in this case, step-like patterns). Moreover, we observe that, as we increase d, the density of neighbours of this dataset quickly converges to our functional form. It is worth noting that the whole functional form (Eq. 2) is needed for the fit; in fact, a local fit of the slope of ρ at half-height would result in an incorrect ID estimate. (Bottom left) We show the density of neighbours ρ of preprocessed data (N = 500) extracted uniformly from {0, 1}^d, from [0, 1]^d, and from $ℝ^{d}$ with a multivariate Gaussian distribution, for d = 15 and linearly embedded in D = 60 dimensions. All curves are compatible with the same functional form (Eq. 2), pointing to an intriguing manifestation of “universality” for high-dimensional data. (Center) To highlight the predictive power of the FCI method over a broad spectrum of dimensionalities (ranging from d = 4 to d = 200), we show the estimated ID versus the number of sample points N for the linearly embedded hypercube $ℋ_{d,500}$ . Error bars are computed over 10 samples for each pair (N, d). (Top right) We assess quantitatively the predictive power of the FCI method by computing the average relative error |(d_est − d)/d| (over 20 random instances) of the estimated ID in the range 5 ≤ d ≤ 1000, 5 ≤ N ≤ 1000. 
We observe that at $N \sim 100$ the error is of the order of 1%, almost independently of the ID, and that ID estimation is possible even in the extreme undersampled regime N < d. (Bottom right) The FCI method estimates the correct ID even when the data are corrupted by noise. Here we consider a linearly embedded hypercube dataset $ℋ_{40,60}$ and add to it 60-dimensional Gaussian noise of standard deviation σ. We observe a sharp transition in the estimated ID between the regime in which the noise is a perturbation ( $σ ≲ 0.1$ and d_est = 40) and the regime in which the noise covers the signal ( $σ ≳ 0.2$ and d_est = 60).
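
As an illustration of the quantities named in this caption, the following is a minimal NumPy sketch of how a dataset like the linearly embedded hypercube could be generated, preprocessed, and turned into an empirical density of neighbours ρ(r). It is not the authors' code. The function names are hypothetical, the random orthonormal map is an assumed choice of linear embedding, and "centered and normalized" is assumed to mean subtracting the sample mean and rescaling each sample to unit norm. The fit to the functional form of Eq. 2, which is what actually yields the ID estimate, is not reproduced here because Eq. 2 is not stated in this caption.

```python
import numpy as np

def hypercube_embedded(n, d, D, rng):
    """Sample n vertices of {0,1}^d and map them linearly into R^D (D >= d).

    Assumption: the embedding is a random map with orthonormal columns.
    """
    x = rng.integers(0, 2, size=(n, d)).astype(float)
    q, _ = np.linalg.qr(rng.standard_normal((D, d)))  # q has shape (D, d)
    return x @ q.T

def preprocess(x):
    """Center the data, then rescale each sample to unit norm (assumed convention)."""
    xc = x - x.mean(axis=0)
    norms = np.linalg.norm(xc, axis=1, keepdims=True)
    return xc / np.where(norms > 0, norms, 1.0)

def neighbour_density(x, radii):
    """Fraction of distinct sample pairs at Euclidean distance <= r, for each r."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    iu = np.triu_indices(len(x), k=1)          # each pair counted once
    dist = np.sort(np.sqrt(np.clip(d2[iu], 0.0, None)))
    return np.searchsorted(dist, radii, side="right") / dist.size

def relative_error(d_est, d):
    """Relative error |(d_est - d)/d| of an ID estimate."""
    return abs((d_est - d) / d)

rng = np.random.default_rng(0)
N, d, D = 500, 15, 60
raw = hypercube_embedded(N, d, D, rng)
x = preprocess(raw)
radii = np.linspace(0.0, 2.0, 101)   # unit-norm points are at most distance 2 apart
rho = neighbour_density(x, radii)

# Noisy variant: isotropic Gaussian noise of standard deviation sigma in all D dimensions.
sigma = 0.1
x_noisy = preprocess(raw + sigma * rng.standard_normal(raw.shape))
rho_noisy = neighbour_density(x_noisy, radii)
```

Because the preprocessed points lie on the unit sphere, ρ(r) rises from near 0 to 1 on the interval [0, 2]. An ID estimate would then come from fitting the full curve to Eq. 2 rather than from a local slope.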