2022 Aug 29;12:14683. doi: 10.1038/s41598-022-14395-4

Table 1.

A summary of the main findings of the twelve test cases studied here.

Test case Main findings
The near-perfect case of dimensionality reduction

The observed distances in a PC plot do not reflect the distances between the samples

Even sample size changes do not affect the topography of the outcomes for same-size populations

Different sample sizes

Changing sample sizes creates alternative results

A priori knowledge is vital to interpreting PCA results. Without it, interpreting PCA plots leads to nonsensical conclusions

One admixed population

The proportion of explained variance by the PCs is not biologically meaningful and is not a measure of PCA accuracy

Clines like the “Ancestral North Indians” (ANI) and “Ancestral South Indians” (ASI) are artifacts of the PCA scheme

PCA results do not reflect genetic or biological distances

Admixture levels and direction cannot be inferred from PCA

PCA schemes can be manipulated to support ethnocentric claims, as with the case of Ashkenazic Jews (AJs)

Experimenters can use PCA to produce near-endless conflicting and absurd historical scenarios, all mathematically correct but biologically incorrect

Two- or three-way admixed population (Supplementary Text 2.1)

PCA outcomes may appear, in part, meaningful based on a priori knowledge but are biologically meaningless and contradictory otherwise

Multiple admixed population

Alternating reference populations creates alternative results

Including multiple admixed populations does not improve PCA accuracy

PCA schemes can be manipulated to support origin or genetic distinctiveness claims

The case of multiple admixed populations without “unmixed” populations

Including multiple admixed populations without “unmixed” ones does not improve PCA accuracy

Although PCA is a deterministic process, it behaves unexpectedly: minor variations in the input can lead to an ensemble of different outputs that appear stochastic. Consequently, PCA results are irreproducible

Pairwise comparisons (Supplementary Text 2.2)

PCA can lead to erroneous conclusions concerning clustering, identity, and distance cross-dimensionally

PCA clustering and distances are unpredictable and unreliable for studying the relationships between populations

Case–control matching and GWAS

Analyzing reference populations whose ancestries are mismatched relative to the unknown samples biases the ancestry inference of the latter

PCA exhibits a high error rate when used to create genetically homogeneous clusters

Analyzing higher PCs decreases the size of the homogeneous clusters and increases the size of the non-homogeneous ones

In a case–control genetic association study, PCA-adjusted results had more false positives, fewer true positives, and weaker p-values than unadjusted results

“Exploring” PC plots yields no insight. The sole purpose of “exploration” is to allow experimenters to select their favorable solution based on their a priori knowledge

Projections

PCA projections are unreliable and misleading, with correct outcomes indistinguishable from incorrect ones

Ancient DNA

Projecting ancient populations onto modern ones allows the experimenter to choose favorable results

Authors typically omit the amount of variance explained by the primary PCs because it is minuscule

Marker choice

Analyzing different markers creates alternative results

Ancestry informative markers (AIMs) are more robust to noise and errors when studying the population structure

Inferring a personal ancestry

Using PCA to infer individual ancestry is unreliable and misleading

Using PCA, experimenters can easily generate desired patterns to support personal ancestral claims
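
As a hedged illustration of the sample-size findings summarized above, the following Python sketch runs PCA on four hypothetical populations placed at fixed points in a three-dimensional space, with red, green, and blue each exactly one unit from black. The population coordinates, the sample sizes, and the helper names `pca_centroids` and `dist` are assumptions made for this example; they are not the study's data or code. The sketch only shows how changing one sample size can alter the distances seen in a two-dimensional PC plot while the underlying points stay fixed.

```python
import numpy as np

# Hypothetical populations: fixed points in a 3-D space (illustrative only).
POPS = {
    "red": np.array([1.0, 0.0, 0.0]),
    "green": np.array([0.0, 1.0, 0.0]),
    "blue": np.array([0.0, 0.0, 1.0]),
    "black": np.array([0.0, 0.0, 0.0]),
}


def pca_centroids(sizes, n_components=2):
    """Run PCA on the pooled samples; return PC coordinates per population
    and the fraction of variance explained by the retained PCs."""
    # Stack sizes[name] identical copies of each population's point.
    X = np.vstack([np.tile(POPS[name], (sizes[name], 1)) for name in POPS])
    mean = X.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:n_components]
    coords = {name: (point - mean) @ axes.T for name, point in POPS.items()}
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return coords, explained


def dist(coords, a, b):
    """Euclidean distance between two populations in the PC plot."""
    return float(np.linalg.norm(coords[a] - coords[b]))


# Equal sample sizes versus an enlarged red sample (assumed sizes).
equal, equal_var = pca_centroids({name: 10 for name in POPS})
skewed, skewed_var = pca_centroids(
    {"red": 50, "green": 10, "blue": 10, "black": 10}
)

for label, coords, var in [("equal", equal, equal_var),
                           ("skewed", skewed, skewed_var)]:
    print(label,
          "explained:", round(var, 3),
          "black-red:", round(dist(coords, "black", "red"), 3),
          "black-green:", round(dist(coords, "black", "green"), 3))
```

In this toy setting, equal sample sizes place black equidistant from red and green in the PC1-PC2 plane, at a distance shorter than the true one unit, even though the first two PCs account for most of the variance. Enlarging only the red sample makes the plotted black-red and black-green distances differ, although the true distances are unchanged.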