2024 Feb 21;7:217. doi: 10.1038/s42003-024-05869-4

Fig. 4. Large number of observations required to obtain good weight estimates.

a, b Realistic example in which the true between-set correlation was set to r_true = 0.3. Estimated weights are close to the assumed true (population) weights, provided the sample size is large enough. b For PLS, even more observations were necessary. c, d Weight stability, i.e. the average cosine similarity between weights across all pairs formed from 100 repetitions, increases towards 1 (identical weights) with more observations. For PLS, weight stability can be high even with few observations. The true between-set correlation was set to r_true = 0.3. Each of the 100 dashed lines represents a different covariance matrix with different assumed weight vectors. The solid line shows the average across the dashed lines. e, f PC1 similarity was stronger for PLS (f) than for CCA (e), including for datasets with varying numbers of features and varying true between-set correlations r_true. Shown is PC1 similarity across synthetic datasets with varying numbers of features, expressed relative to the expected PC1 similarity of a randomly chosen vector with dimension matched to each synthetic dataset. Shaded areas denote 95% confidence intervals across 6 feature space dimensionalities, 10 covariance matrices and 100 draws of collections of observations with the indicated sample size (x-axis) from the multivariate normal distribution associated with these covariance matrices.
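
The weight stability measure in panels c, d (average cosine similarity over all pairs of weight vectors from repeated estimations) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names `weight_stability` and `noisy_estimates` are hypothetical, the sign-invariant option is an assumption (CCA and PLS weights are only defined up to sign), and the toy data are noisy copies of a made-up weight vector, not the output of an actual CCA or PLS fit.

```python
from itertools import combinations

import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def weight_stability(weights, sign_invariant=True):
    """Average cosine similarity across all pairs of rows in `weights`.

    `weights` has shape (n_repetitions, n_features); each row is the weight
    vector estimated in one repetition. With `sign_invariant=True` (an
    assumption of this sketch) the absolute value of each pairwise
    similarity is used, so that w and -w count as identical.
    """
    weights = np.asarray(weights, dtype=float)
    sims = []
    for i, j in combinations(range(weights.shape[0]), 2):
        s = cosine_similarity(weights[i], weights[j])
        sims.append(abs(s) if sign_invariant else s)
    return float(np.mean(sims))


def noisy_estimates(w_true, n_obs, n_repetitions, rng):
    """Toy stand-in for repeated estimation (hypothetical, not CCA/PLS).

    Each "estimate" is the true vector plus Gaussian noise whose standard
    deviation shrinks as 1 / sqrt(n_obs).
    """
    noise = rng.normal(scale=1.0 / np.sqrt(n_obs),
                       size=(n_repetitions, w_true.size))
    return w_true + noise


rng = np.random.default_rng(0)
w_true = rng.normal(size=10)
w_true /= np.linalg.norm(w_true)  # unit-length "population" weights

stability_small = weight_stability(noisy_estimates(w_true, 20, 100, rng))
stability_large = weight_stability(noisy_estimates(w_true, 2000, 100, rng))
print(f"stability with 20 observations:   {stability_small:.3f}")
print(f"stability with 2000 observations: {stability_large:.3f}")
```

In this toy setting, stability approaches 1 as the number of observations grows, which mirrors the qualitative trend described for panels c, d. It does not reproduce the PLS-specific effect noted in the caption, where stability can be high even with few observations.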