Table 1.
Test case | Main findings |
---|---|
The near-perfect case of dimensionality reduction | The observed distances in a PC plot do not reflect the distances between the samples. For same-size populations, changing all sample sizes evenly does not affect the topography of the outcomes |
Different sample sizes | Changing sample sizes creates alternative results. A priori knowledge is vital to interpreting PCA results. Without it, interpreting PCA plots leads to nonsensical conclusions |
One admixed population | The proportion of variance explained by the PCs is not biologically meaningful and is not a measure of PCA accuracy. Clines like the “Ancestral North Indians” (ANI) and “Ancestral South Indians” (ASI) are artifacts of the PCA scheme. PCA results do not reflect genetic or biological distances. Admixture levels and direction cannot be inferred from PCA. PCA schemes can be manipulated to support ethnocentric claims, as in the case of Ashkenazic Jews (AJs). Experimenters can use PCA to produce near-endless conflicting and absurd historical scenarios, all mathematically correct but biologically incorrect |
Two- or three-way admixed populations (Supplementary Text 2.1) | PCA outcomes may appear partly meaningful given a priori knowledge but are otherwise biologically meaningless and contradictory |
Multiple admixed populations | Alternating the reference populations creates alternative results. Including multiple admixed populations does not improve PCA accuracy. PCA schemes can be manipulated to support claims of origin or genetic distinctiveness |
The case of multiple admixed populations without “unmixed” populations | Including multiple admixed populations without “unmixed” ones does not improve PCA accuracy. Although PCA is a deterministic process, it behaves unexpectedly: minor variations can lead to an ensemble of different outputs that appear stochastic. Consequently, PCA results are irreproducible |
Pairwise comparisons (Supplementary Text 2.2) | PCA can lead to erroneous conclusions about clustering, identity, and distance across dimensions. PCA clustering and distances are unpredictable and unreliable for studying the relationships between populations |
Case–control matching and GWAS | Analyzing reference populations whose ancestries are mismatched relative to the unknown samples biases the ancestry inference of the latter. PCA exhibits a high error rate when used to create genetically homogeneous clusters. Analyzing higher PCs decreases the size of the homogeneous clusters and increases the size of the non-homogeneous ones. In a case–control study of genetic association, PCA-adjusted results had more false positives, fewer true positives, and weaker p-values than unadjusted results. “Exploring” PC plots yields no insight; the sole purpose of “exploration” is to allow experimenters to select their favored solution based on their a priori knowledge |
Projections | PCA projections are unreliable and misleading, with correct outcomes indistinguishable from incorrect ones |
Ancient DNA | Projecting ancient populations onto modern ones allows the experimenter to choose favorable results. Authors typically omit the amount of variance explained by the primary PCs because it is minuscule |
Marker choice | Analyzing different markers creates alternative results. Ancestry-informative markers (AIMs) are more robust to noise and errors when studying population structure |
Inferring personal ancestry | Using PCA to infer individual ancestry is unreliable and misleading. Using PCA, experimenters can easily generate desired patterns to support personal ancestral claims |
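The sample-size effect listed under “Different sample sizes” can be illustrated with a minimal toy sketch. It is not the study's own pipeline and assumes a deliberately simple setup: three hypothetical populations, A, B, and C, each made of identical samples placed at one-hot vectors so that every pair is exactly √2 apart; PCA computed with NumPy's SVD of the mean-centered data; and only PC1 inspected. The helper `pc1_gaps` is hypothetical. Because PCA centers on the pooled mean and ranks directions by pooled variance, reweighting the populations changes which direction becomes PC1 even though the true distances never change.

```python
import numpy as np

# Hypothetical toy data (not the study's data): three "populations" at the
# corners of an equilateral triangle, every pair exactly sqrt(2) apart.
CENTROIDS = {"A": np.array([1.0, 0.0, 0.0]),
             "B": np.array([0.0, 1.0, 0.0]),
             "C": np.array([0.0, 0.0, 1.0])}


def pc1_gaps(sizes):
    """Hypothetical helper: |PC1 gap| between each pair of population means.

    `sizes` maps population name -> number of identical samples.
    """
    labels = np.array([name for name, n in sizes.items() for _ in range(n)])
    X = np.array([CENTROIDS[name] for name in labels])
    Xc = X - X.mean(axis=0)                  # PCA centers on the pooled mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]                      # sample coordinates on PC1
    means = {name: scores[labels == name].mean() for name in sizes}
    # Absolute gaps, because the sign of a principal axis is arbitrary.
    return {pair: abs(means[pair[0]] - means[pair[1]])
            for pair in [("A", "B"), ("A", "C"), ("B", "C")]}


for sizes in ({"A": 10, "B": 1, "C": 1}, {"A": 1, "B": 10, "C": 10}):
    gaps = pc1_gaps(sizes)
    print(sizes, {f"{a}-{b}": round(g, 3) for (a, b), g in gaps.items()})
```

With A over-sampled (10:1:1), B and C coincide on PC1 and A stands apart; with B and C over-sampled (1:10:10), PC1 separates B from C and places A midway between them. In the original space, all three pairs remain exactly √2 apart in both runs.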