2022 Apr 18;377(1852):20200410. doi: 10.1098/rstb.2020.0410

Figure 1.

Ancestry in admixed populations varies at multiple genetic scales, with variance both among individuals and within individual genomes. We show examples of global and local ancestry inferred from phased 1000 Genomes Project data for populations of the Americas and the Caribbean. Global ancestry was estimated by unsupervised ADMIXTURE analysis, with additional populations of European (Iberians (IBS) and Tuscans in Italy (TSI)) and West African (Esan (ESN), Mandinka (GWD), Mende (MSL) and Yoruba (YRI)) ancestry included for reference. We show (a) population-level and (b) individual-level estimates of global ancestry for the Mexican ancestry (MXL), Peruvian (PEL), Colombian (CLM), Puerto Rican (PUR), African ancestry (ASW) and Barbadian (ACB) populations; barplots of these K = 3 estimates were made with pong [6]. (c) Local ancestry inferred by RFMix [7] for two example individuals, HG01149 (CLM) and NA19776 (MXL), who have similar global ancestry proportions.
For these analyses, we retained only SNPs marked ‘PASS’ and removed all individuals listed as having a relative of up to third degree in the 1000 Genomes Project phase 3 pedigree file, leaving 998 individuals for analysis. We then removed SNPs with more than 5% missingness or a minor allele frequency below 1% across all populations, and SNPs deviating from Hardy–Weinberg equilibrium (p < 10⁻⁶) within populations. For the ADMIXTURE analyses, we additionally pruned SNPs in linkage disequilibrium (PLINK command --indep-pairwise 50 10 0.1), which left 698 408 SNPs for analysis. We ran ADMIXTURE unsupervised for K = 3 with default settings and a random seed; pong identified a single mode across 30 replicates. To estimate local ancestry, we used the phased genotype dataset filtered for missingness, minor allele frequency and Hardy–Weinberg equilibrium.
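The SNP filters and pruning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used PLINK): genotypes are assumed to be coded as alternate-allele counts (0, 1, 2, with None for a missing call), the Hardy–Weinberg check is a simple 1-df chi-square approximation rather than an exact test, and the pruning is a simplified greedy pass over windows of 50 SNPs advanced by 10 that drops the later SNP of any pair with r² > 0.1, which need not match PLINK's choice of which SNP to remove. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

# Assumed coding: alternate-allele count 0/1/2 per individual, None = missing call.
Genotype = Optional[int]


def missing_rate(snp: Sequence[Genotype]) -> float:
    """Fraction of individuals with a missing call at this SNP."""
    return sum(g is None for g in snp) / len(snp)


def minor_allele_frequency(snp: Sequence[Genotype]) -> float:
    """Minor allele frequency from the non-missing calls."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def hwe_pvalue(snp: Sequence[Genotype]) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium (approximation, not exact test)."""
    called = [g for g in snp if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    obs = [called.count(0), called.count(1), called.count(2)]
    p = (2 * obs[0] + obs[1]) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    exp = [n * p * p, 2 * n * p * q, n * q * q]
    if min(exp) == 0:
        return 1.0  # monomorphic SNP: the test is undefined
    chisq = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square with 1 df


def passes_qc(snp, pops, max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6) -> bool:
    """Missingness and MAF checked across all samples; HWE checked within each population."""
    if missing_rate(snp) > max_missing or minor_allele_frequency(snp) < min_maf:
        return False
    for pop in set(pops):
        sub = [g for g, lab in zip(snp, pops) if lab == pop]
        if hwe_pvalue(sub) < min_hwe_p:
            return False
    return True


def r_squared(a, b) -> float:
    """Squared correlation of allele counts over individuals called at both SNPs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, window=50, step=10, max_r2=0.1) -> List[int]:
    """Greedy sliding-window pruning; returns indices of retained SNPs."""
    kept = [True] * len(snps)
    for start in range(0, len(snps), step):
        idx = [i for i in range(start, min(start + window, len(snps))) if kept[i]]
        for pos, i in enumerate(idx):
            if not kept[i]:
                continue
            for j in idx[pos + 1:]:
                # Simplification: drop the later SNP of any correlated pair.
                if kept[j] and r_squared(snps[i], snps[j]) > max_r2:
                    kept[j] = False
    return [i for i, k in enumerate(kept) if k]
```

With its defaults, `ld_prune` uses the same window (50 SNPs), step (10) and r² threshold (0.1) as `--indep-pairwise 50 10 0.1`, with the window counted in SNPs; `passes_qc` applies the missingness and MAF thresholds across all samples but the Hardy–Weinberg test within each population label, mirroring the caption.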
We designated individuals with more than 99% global West African (AFR), Amerindigenous (AMR) or European (EUR) ancestry, as estimated by our ADMIXTURE analysis, as reference groups for those respective ancestries. We ran RFMix v. 2.03 on the target Colombian and Mexican ancestry individuals using the HapMap GRCh37 genetic map lifted over to GRCh38, a maximum of two expectation-maximization iterations and otherwise default parameters. (Online version in colour.)
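Selecting the reference individuals from ADMIXTURE output can be sketched as below, assuming the standard .Q output (one row per individual, K whitespace-separated ancestry proportions, in the same order as the input samples). ADMIXTURE clusters are unlabeled, so the mapping from cluster index to AFR, AMR and EUR is an assumption supplied by the analyst, for example by checking which cluster the YRI or IBS samples load on. The result is written as a two-column, tab-separated sample map (sample ID, reference population), which is our understanding of the layout RFMix v2 expects for reference samples; check the RFMix documentation before use. The function names and sample IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def parse_q(text: str) -> List[List[float]]:
    """Parse ADMIXTURE .Q content: one row per individual, K proportions per row."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def select_references(
    sample_ids: Sequence[str],
    q_rows: Sequence[Sequence[float]],
    cluster_labels: Sequence[str],  # assumed mapping: cluster index -> ancestry label
    threshold: float = 0.99,
) -> Dict[str, List[str]]:
    """Assign each individual with an ancestry component above the threshold
    to that component's reference group."""
    refs: Dict[str, List[str]] = {label: [] for label in cluster_labels}
    for sid, row in zip(sample_ids, q_rows):
        for k, frac in enumerate(row):
            if frac > threshold:  # proportions sum to 1, so at most one can exceed 0.99
                refs[cluster_labels[k]].append(sid)
    return refs


def sample_map(refs: Dict[str, List[str]]) -> str:
    """Two-column, tab-separated sample map: sample ID, reference population."""
    rows = [f"{sid}\t{label}" for label, ids in refs.items() for sid in ids]
    return "\n".join(rows) + "\n"
```

Admixed individuals such as those in CLM or MXL fall below the threshold for every component and are therefore left out of the reference groups, which is what allows them to be analysed as targets.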