Summary
The fate of hunting and gathering populations following the rise of agriculture and pastoralism remains a topic of debate in the study of human prehistory. Studies of ancient and modern genomes have found that autochthonous groups were largely replaced by expanding farmer populations with varying levels of gene flow, a characterization that is influenced by the almost universal focus on the European Neolithic1–5. We sought to understand the demographic impact of an ongoing shift to farming in Southwest Ethiopia, one of the last regions in Africa to experience such shifts6. Importantly, Southwest Ethiopia is home to several of the world’s remaining hunter-gatherer groups, including the Chabu people, who are currently transitioning away from their traditional mode of subsistence7. We generated genome-wide data from the Chabu and four neighboring populations, the Majang, Shekkacho, Bench, and Sheko, to characterize their genetic ancestry and estimate their effective population sizes throughout the last 60 generations. We show that the Chabu are a distinct population closely related to ancient peoples who occupied Southwest Ethiopia >4,500 years ago. Furthermore, the Chabu are undergoing a severe population bottleneck which began approximately 1,400 years ago. In studying eleven Eastern African populations, we find evidence for divergent demographic trajectories among hunter-gatherer-descendant groups. Our results illustrate that although foragers respond to encroaching agriculture and pastoralism with multiple strategies, including cultural adoption of agropastoralism, gene flow and economic specialization, they often face population decline.
Keywords: Neolithic transition, hunter-gatherers, agriculture, Southwest Ethiopia, Eastern Africa
eTOC
Using new genotype data, Gopalan et al. show that the Chabu people of Southwest Ethiopia are closely related to ancient people who lived in the region prior to the rise of farming. The Chabu population has declined sharply over the past 1,400 years. However, this trend is not universal among Ethiopian hunter-gatherer descendants.
Results
To test hypotheses pertaining to the impact of foraging-to-farming transitions, we estimated genetic ancestry, relative genetic isolation, and the timing and magnitude of demographic fluctuations in Eastern African populations. Our investigation focuses on the Chabu (their preferred ethnonym7,8, but also referred to in the literature as the ‘Sabue’, ‘Sabu’ and ‘Shabo’9–11), a group of transitioning hunter-gatherers who inhabit the Southwestern Ethiopian highland forests that straddle the borders between the Oromia Regional State, Gambella Regional State, and the Southern Nations, Nationalities, and Peoples’ Region (SNNPR)7. We generated genome-wide data from the Chabu (n=83) and neighboring Majang, Shekkacho, Bench, and Sheko groups (n=49, 45, 48, 50) at 1.7 million single nucleotide polymorphisms (SNPs). We combined this dataset with published genotypes from additional groups from across Eastern, Central and Western Africa, as well as the Near East (Table S1). Importantly, we also include genomic data from Bayira, a 4,500-year-old individual found in Mota Cave in the nearby Gamo Highlands who lived well before any evidence of agriculture or pastoralism in the region12,13, as well as additional ancient genomes from Eastern Africa, the Levant, and Anatolia.
Inferring Chabu origins through patterns of genome-wide relatedness
To characterize the genetic relationships between the Chabu and other African and Near Eastern populations, we first estimated their global ancestry. We performed unsupervised clustering of autosomal SNPs (i.e. ADMIXTURE; STAR Methods), varying ‘K’, the hypothesized number of ancestral source populations, from 2 to 12 (Figure S1)14. We focus on the pattern that arises at K=7, where global ancestry patterns in Southwest Ethiopia are represented by genetic components that we identify here by the population or linguistic/geographic group that carries its highest frequencies (Figure 1). The Chabu and their near neighbors are primarily characterized by differences in their frequencies of five components: ‘Bayira-majority’, ‘Chabu-majority’, ‘Nilo-Saharan’ (NS), ‘East African Afro-Asiatic’ (EAAA), and ‘Near Eastern’. Importantly, Bayira-majority and Chabu-majority components are genetically similar (Fst = 0.05) and jointly represent the ancestry of Southwest Ethiopian hunter-gatherers.
Figure 1. Global Ancestry Proportions of Individuals Inferred from Unsupervised Clustering of Genotype Data.
A) Each color corresponds to one of the K=7 hypothesized genetic components, and each vertical bar represents one individual genome (Bayira bar is widened for visualization). Population labels include linguistic codes in brackets; Afro-Asiatic (AA), Niger-Congo (NC), Nilo-Saharan (NS), linguistic isolates (I). Within Afro-Asiatic speakers, we further differentiate between Chadic (Ch), Cushitic (Cu), Egyptian (E), Omotic (O), and Semitic (S) speakers. B–F) The geographic distributions of five of these ancestry components are depicted, with the intensity of the color corresponding to the mean population proportion of the respective ancestry. Each component is labeled below the map. G) The effective migration surface, inferred from the rate of decay of genetic similarity across geographic space, is depicted. Cool colors correspond to effective migration corridors, while warm colors correspond to effective migration barriers. See also Figures S1 and S2 and Table S1.
At low values of K (3–5), the ‘Bayira-majority’ and ‘Chabu-majority’ ancestries form a single component that is distributed widely across Eastern Africa. At K=7, the Chabu separate out and are modelled as carrying over 90% of their own ‘Chabu-majority’ ancestry. This component is also present at appreciable frequencies in Bayira (9%), the neighboring Majang (38%), and nearby Nilo-Saharan populations (5–10%) (Figure 1A). The Bayira-majority component is found at highest frequency in the extant Aari Blacksmiths and Cultivators, who were previously found to be Bayira’s closest relatives12, as well as the Bench and Sheko (Figure 1B). The Majang and Gumuz, a western Ethiopian population, also carry this component at substantial frequencies. More generally, other Ethiopian populations are distinguished by their relative proportions of EAAA and Near Eastern (NE) components (Figure 1E, F). The Chabu carry neither EAAA nor NE ancestry.
Taken together, these results suggest that the Chabu are primarily descended from ancient Southwest Ethiopian hunter-gatherer groups; a hypothesis of secondary adoption of hunting-and-gathering is not supported. On the basis of these findings, we also consider the Majang, Gumuz, Aari Blacksmiths and Cultivators, Bench, and Sheko to be probable ‘hunter-gatherer descendants’, even though they do not currently practice this subsistence strategy. The Majang and Gumuz are similar to each other and have genetic affinities with both Nilo-Saharan and hunter-gatherer groups (Figures 1, 2). While both groups are primarily small-scale farmers today, ethnographic studies show that they exhibit characteristic features of hunter-gatherer societies, such as high degrees of egalitarianism and reciprocity, and likely hunted and gathered regularly in the recent past15,16. The Aari Blacksmiths and Cultivators, Bench, and Sheko form another group of populations that show the strongest genetic affinities to Bayira (Figures 1A, 2B), suggesting that they are also direct descendants of the ancestral forager populations to which Bayira belonged12.
Figure 2. Population Structure and F3 Outgroup Estimates for Modern and Ancient Individuals.
A) A principal component analysis (PCA) of genotype data from modern populations from Eastern Africa and the Near East, with ancient DNA samples superimposed, for PC1 and PC2. Bayira falls close to the Chabu cluster, which anchors the second PC. B) As in panel A, but for PC1 and PC3. In the first and third PCs, the Chabu lie near other modern and ancient hunter-gatherers. F3 outgroup tests for X, listed in each row, for shared drift with C) ancient Bayira or D) the Chabu relative to the Yoruba. E) The Anuak, a Nilo-Saharan-speaking Ethiopian group, have higher F3 outgroup statistics with the Majang and Gumuz than with the Dinka or Shilluk, despite having a high proportion of NS ancestry (Figure 1A). The F3 outgroup statistic indicates that, among extant populations, Bayira is most closely related to the Aari Blacksmiths, Aari Cultivators, Bench, and Sheko, followed by the Chabu; conversely, the Chabu carry the most shared drift with their Majang neighbors, followed by Bayira.
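As a rough illustration of what an outgroup F3 statistic measures, the sketch below computes the naive form f3(O; A, B), the mean over SNPs of (o - a)(o - b), where o, a, and b are allele frequencies in the outgroup and the two test populations. It is a minimal sketch, not the ADMIXTOOLS implementation: the function name and all allele frequencies are made up for illustration, and the sample-size corrections and standard errors that a real analysis needs are omitted.

```python
from statistics import mean


def outgroup_f3(freq_outgroup, freq_a, freq_b):
    """Naive outgroup f3(O; A, B): mean over SNPs of (o - a) * (o - b)."""
    if not (len(freq_outgroup) == len(freq_a) == len(freq_b)):
        raise ValueError("frequency lists must have equal length")
    return mean((o - a) * (o - b)
                for o, a, b in zip(freq_outgroup, freq_a, freq_b))


# Made-up allele frequencies at five SNPs, for illustration only.
outgroup = [0.10, 0.50, 0.80, 0.30, 0.60]
target = [0.40, 0.20, 0.50, 0.70, 0.90]
close_pop = [0.45, 0.25, 0.45, 0.65, 0.85]    # drifted together with target
distant_pop = [0.15, 0.45, 0.75, 0.35, 0.65]  # stayed near the outgroup

f3_close = outgroup_f3(outgroup, target, close_pop)
f3_distant = outgroup_f3(outgroup, target, distant_pop)

# A larger value means more drift shared since the split from the outgroup.
print(f3_close > f3_distant)
```

In this toy case the population whose frequencies moved away from the outgroup in the same direction as the target yields the larger statistic, which is the sense in which higher outgroup F3 values indicate closer relatedness.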
Investigating and dating episodes of gene flow
Based on our unsupervised clustering analysis, nearly all Ethiopian populations carry multiple distinct ancestries (Figure 1). Furthermore, some of these groups (e.g., the Wolayta) show substantial intra-population variance in their ancestry components and/or a relatively broad distribution in PC space, suggesting recent gene flow (Figures 1A, 2A). Using F3 admixture statistics and a linkage disequilibrium (LD) decay-based method, we found additional evidence of gene flow in many Southwest Ethiopian populations, and were able to date some of these instances to within the last 125 generations (Table S2)17,18.
Genetic signatures of isolation in the Chabu
We inferred spatial population structure using genetic data to estimate the effective migration surface (EEMS) of humans across Eastern Africa19. This analysis reveals corridors of and barriers to gene flow that closely correspond to the geographic distribution of ancestral components (Figure 1G). Some of these barriers also correspond to major geographic features such as deserts, high elevation areas, and bodies of water. However, other features that might be expected to have been migration barriers, such as the Nubian Desert and northeastern Ethiopian Highlands, seem not to have impeded historical gene flow to the same extent (Figure S2). We also find that areas of low migration tend to lie along the boundaries between major African language families, while high migration corridors lie within them (Figure S2C). Together, these results emphasize the close association between geography and language in determining gene flow between groups20,21.
The Chabu lie directly in the center of a language contact area with negative effective migration rates, indicating their relative isolation from neighboring groups. We further quantified the effects of recent and historical genetic isolation by analyzing runs of homozygosity (RoH). Compared to their neighbors, the Chabu carry a much larger proportion of their genome in RoH (Figure 3A). As the Chabu practice clan exogamy and have no cultural tradition of close relative marriage, their elevated levels of homozygosity relative to their neighbors are likely driven by demographic pressure7. We compared the Chabu to the Batwa, Biaka, Mbuti, Hadza, and Sandawe. Only the Hadza of Tanzania, who were previously shown to carry the highest levels of RoH among Africans, exceeded the Chabu in cumulative RoH (Figure S3)22.
Figure 3. Distributions of the Total Amount of the Genome in Runs of Homozygosity (RoH) in Southwest Ethiopian Populations.
RoH are represented by colored violins for A) all RoH segments and B) separate RoH size classes. The white point represents the median value of the distribution, and the black rectangle represents values between the lower and upper quartiles. The thin black ‘whiskers’ extend to data points that lie within 1.5 times the interquartile range below or above the lower and upper quartiles, respectively. The Chabu and Aari Blacksmiths showed significantly elevated total RoH in only the longest class, suggesting that these populations’ genomic signatures of isolation and demographic decline are a result of relatively recent events. See also Figure S3.
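The per-individual quantities summarized in these violins can be sketched as a simple tally of RoH segment lengths by size class. This is a minimal sketch under stated assumptions: the function name, the individual labels, the segment lengths, and the two class boundaries (1 Mb and 4 Mb) are all hypothetical and are not the boundaries used in this study.

```python
from collections import defaultdict


def total_roh_by_class(segments, short_max_mb, medium_max_mb):
    """Sum RoH length (Mb) per individual within short/medium/long classes."""
    totals = defaultdict(lambda: {"short": 0.0, "medium": 0.0, "long": 0.0})
    for individual, length_mb in segments:
        if length_mb <= short_max_mb:
            size_class = "short"
        elif length_mb <= medium_max_mb:
            size_class = "medium"
        else:
            size_class = "long"
        totals[individual][size_class] += length_mb
    return dict(totals)


# Made-up (individual, segment length in Mb) pairs, for illustration only.
segments = [
    ("ind1", 0.5), ("ind1", 2.0), ("ind1", 8.0), ("ind1", 6.0),
    ("ind2", 0.25), ("ind2", 1.5),
]

# Hypothetical class boundaries: short <= 1 Mb < medium <= 4 Mb < long.
per_class = total_roh_by_class(segments, short_max_mb=1.0, medium_max_mb=4.0)
cumulative = {ind: sum(classes.values()) for ind, classes in per_class.items()}
print(per_class)
print(cumulative)
```

An excess confined to the longest class, as in the hypothetical `ind1`, is the pattern read above as evidence of relatively recent events, because long homozygous tracts have had little time to be broken up by recombination.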
Closely related populations experienced divergent demographic trajectories
Given these indications of recent demographic pressure on some hunter-gatherer groups, we sought to more precisely estimate their historical effective population sizes (Ne). We used a non-parametric method that leverages the distribution of segments that are shared identical-by-descent (IBD) across pairs of individuals23. This allowed us to estimate Ne at each generation from 4 to 60 generations ago (ga). We evaluated the robustness of IBDNe to small sample sizes, gene flow, and SNP ascertainment by performing coalescent simulations with msprime (STAR Methods)24. Briefly, we performed 10 simulation replicates for each demographic history, testing the effects of sample size, SNP density, and gene flow on Ne estimation. We found that, in the absence of gene flow, a true decline in Ne could be robustly inferred with as few as 20 samples (Figure S4). However, for the same sample size, constant and growing populations were often incorrectly estimated; constant Ne was estimated to be substantially increasing in 40% of replicates, while increasing Ne was estimated to be holding steady or fluctuating in 10% and 20% of replicates, respectively. This discrepancy resolved when the sample size was increased to 50 (Figure S4).
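To give some intuition for why long IBD segments are informative about the recent window analyzed here, the sketch below uses a standard textbook approximation rather than anything specific to IBDNe: a segment inherited from a common ancestor g generations ago is separated by 2g meioses, so its length is roughly exponentially distributed with mean 50/g cM. The function name is hypothetical, and the 4 cM threshold is the segment-length cutoff reported for the Ne estimates.

```python
import math


def prob_segment_longer_than(generations, threshold_cm):
    """P(length > threshold) under an exponential model with mean 50/g cM."""
    # 2g meioses separate the two haplotypes; 100 cM make up one Morgan.
    rate_per_cm = 2.0 * generations / 100.0
    return math.exp(-rate_per_cm * threshold_cm)


# Probability that a segment from an ancestor g generations back exceeds 4 cM.
for g in (4, 20, 60, 200):
    print(g, prob_segment_longer_than(g, 4.0))
```

Under this approximation the chance of exceeding 4 cM falls off steeply with g, so segments above the threshold mostly trace to recent common ancestors and carry little information about much older generations.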
Among Southwest Ethiopian populations, we found that the Ne of the Chabu, Majang, Bench, Sheko, and Aari Blacksmiths has declined in the recent past, while the Ne of the Aari Cultivators, Gumuz, Wolayta, and Shekkacho has increased (Figure 4). The decline in the Chabu and Aari Blacksmiths starting approximately 50 ga is consistent with the RoH results, but similar patterns of decline in the Majang, Bench, and Sheko were not suggested by our RoH analyses (Figure 3B). Two other hunter-gatherer-descendant groups in Tanzania, the Hadza and Sandawe, have both experienced net declines in Ne over the past 60 generations (Figure 4); the Hadza decline is consistent with RoH patterns (Figure S3). We caution that estimates from sample sizes below 50 are sensitive to gene flow (STAR Methods); while the Aari Blacksmiths lacked evidence of recent gene flow (Table S2), additional analysis of the Sandawe is warranted. Further details are contextualized below (see Discussion).
Figure 4. Divergent Demographic Trajectories for Eastern and Central African Populations over the Past 2,000 Years.
Historical effective population sizes (Ne), from 4 to 60 generations ago, were inferred from distributions of identical-by-descent segments >4 cM among pairs of individuals. Shown from the upper left are the Chabu, the Majang and Gumuz, the Aari populations, the Bench and Sheko, the Shekkacho and Wolayta, and the Hadza and Sandawe hunter-gatherers of Tanzania. Filled circles represent the estimated Ne at a given generation. Colored ribbons indicate bootstrapped confidence intervals around these estimates. Note that the y-axis scale changes across panels and is on a log scale for the Shekkacho/Wolayta panel.
Discussion
Previous genetic research on the Neolithic transition has largely focused on Europe, especially with the advent of high-throughput ancient autosomal DNA5. However, ancient DNA studies that attempt to characterize the transition often represent hunter-gatherers by single individuals or aggregate samples over millennia2,25,26. These studies are therefore limited in what they can infer regarding the processes of transition. Furthermore, patterns observed in Europe may not pertain to innovations and diffusions of agriculture and/or pastoralism in Africa27–30. Here, by studying extant agriculturalist and hunter-gatherer East African populations, including 276 new samples from 5 populations, we evaluate the mechanisms underlying the spread of farming in this part of the world.
As farmers expand into a geographic region, hunter-gatherer groups already living there ultimately become either replaced by farmers (i.e. local extirpation) or persist alongside them. Much work has highlighted the prevalence of the former outcome; however, we are interested in how hunter-gatherer populations, and their genetic descendants, persist in the midst of major cultural shifts. We interpret our results in the context of ethnographic and archeological evidence to consider the following mechanisms by which present-day populations with hunter-gatherer genetic ancestry might have adjusted to encroachment by farmers: A) reduce their geographic range; B) move to an ecological region that is marginal for farming; C) adopt different cultural (including subsistence) practices; D) enter into an economic-symbolic exchange relationship10,16,31–36. This list of responses is not exhaustive, nor are they mutually exclusive; the history of any particular group may have involved multiple different responses at various times37. Importantly, genetic data can give insights into the extent to which population size change and gene flow were associated with these different responses.
Our analyses demonstrate that the Chabu descend from a population with genetic affinities to Bayira, an individual who lived well before any evidence of intensive farming in the region. The Chabu say they are the original inhabitants of the forests they currently occupy, a claim their nearest neighbors generally support7. However, the lack of recognition of Chabu land claims and the migration of farmers from other parts of Ethiopia facing land shortages have resulted in the loss of traditional Chabu forests to development projects7. We hypothesize that this documented loss of Chabu land over the past two decades is a continuation of a centuries-long trend (Response A, above). Specifically, by analyzing population-level genomic patterns, we estimate that the Chabu have experienced a precipitous decline from approximately Ne=6,000 beginning ~40 ga to Ne=200 four generations ago (Figure 4). Current estimates of the Chabu census size range between 1,700 and 2,500 individuals7. These findings are not at odds with limited gene flow between the Chabu and Nilotic groups 30–43 ga (Figures 1, 2; Table S2) or previous findings of deep shared ancestry with the geographically distant Hadza and Sandawe11. Rather, they suggest that increased hunter-gatherer isolation and population decline are relatively recent trends coinciding with the expansion of agriculturalists and pastoralists across Africa, which disrupted once-widespread hunter-gatherer networks11,29,38,39. Within just the last decade, ethnographic data show the Chabu are experiencing greater assimilation, with an increasing proportion of Chabu men preferring to take a Majang, Shekkacho, or Amhara spouse7.
A severe population bottleneck had been previously reported in the Hadza hunter-gatherers of Tanzania, who speak a linguistic isolate22. Our analyses support this observation, and find a decline in Hadza Ne from 3,500 to 160, accelerating between 15 and 25 ga (Figure 4). Today, the Hadza live around Lake Eyasi, an area unsuitable for cultivation or pastoralism, which may explain their continued persistence as hunter-gatherers (Responses A, B)40. The Sandawe, close neighbors of the Hadza who speak a distinct language isolate, also show an overall decline in Ne over the past 60 generations (Figure 4). This group of former hunter-gatherers is known to have transitioned to agro-pastoralism in the last 500 years (Response C)41.
Interestingly, the Majang and Gumuz exhibit divergent Ne trajectories despite being highly genetically similar and both current practitioners of small-scale cultivation. The Majang Ne has steadily declined since 50 ga, by about 85% from Ne=5,000, while the Gumuz Ne has apparently nearly doubled from 3,500 over the same period (Figure 4). We hypothesize that the Majang, like the Sandawe, who transitioned to cultivation well after their Ne had already declined significantly, were ‘late adopters’ of horticulture (Response C). Historical and ethnographic accounts indicate that, beginning a century ago, the Gumuz were forced to migrate to increasingly inhospitable lands due to pressure from Afro-Asiatic-speaking farming neighbors (Response B)42. Prior to this, however, we find that the Gumuz Ne was robust, perhaps because of ecological differences or earlier adoption of cultivation relative to the Majang.
We also observe opposite demographic trends in the closely related Aari Blacksmiths and Aari Cultivators. Previous studies have shown that these two groups diverged within the last 4,500 years, and are both probable descendants of a Bayira-like hunter-gatherer population12,43. Our results support earlier findings of a recent bottleneck in the Aari Blacksmiths, and more precisely estimate the timing and magnitude of this decline (Figure 4)43. At the same time, we find that the Ne of the Aari Cultivators follows a ‘U-shape’ decline and recovery (Figure 4). Today, the Aari Blacksmiths are a marginalized group of craftspeople who neighbor the Aari Cultivators and the Wolayta, with whom they engage in mutual economic exchange (Response D)43. Archeological evidence for blacksmithing, today considered a marginal occupational activity in southern Ethiopia (as are foraging and eating wild foods), appeared in nearby regions between 1,000 and 3,000 years ago44. There is also recent evidence from Southwest Ethiopia that the Manja hunter-gatherers shifted to charcoal production, and that marginal occupational groups were lower castes associated with marriage prohibitions45,46. We hypothesize that the divergence of the two Aari populations within the last 4,500 years followed the differential adoption of cultural practices (i.e. blacksmithing versus farming; Response C), and that the subsequent social marginalization of the Blacksmiths influenced their divergent patterns of Ne over the past 60 generations.
The Bench and Sheko are genetically indistinguishable from the Aari (Figures 1A, 2A), suggesting that the majority of their genetic ancestry also derives from Bayira-like hunter-gatherers43. Both groups are currently farmers (Response C), but, unlike the Aari Cultivators, they appear to have experienced net declines in population size over the last few millennia. Our simulations suggest that the apparent extreme jumps in Ne to highs of 93,000 and 35,000 that precede the declines in the Bench and Sheko, respectively, may actually be an artifact of high gene flow (Figures 4, S4). We find strong evidence for recent admixture with EAAA cultivator groups in both the Bench and Sheko (Table S2). By contrast, the F3 and LD-based tests indicate that if there was gene flow into the Aari from Afro-Asiatic cultivators, it occurred ~100 ga, which is unlikely to affect our IBD-based estimates of Ne (Table S2, Figure S4). Overall, the Aari Blacksmiths, Aari Cultivators, Bench, and Sheko exhibit evidence for Response C with qualitatively similar levels of gene flow from incoming EAAA groups (Figure 1A). Despite this, their demographic trajectories are heterogeneous.
Conclusion
In this work, we characterize nuanced and varied hunter-gatherer responses to recent cultural and demographic changes associated with the spread of agriculture and pastoralism in Eastern Africa. While a shift to agricultural subsistence has been linked to increases in Ne47–49, we show that this is not a universal outcome. Furthermore, we observe declining Ne in populations that appear to resist cultural change. Continued ethnographic and genetic work in collaboration with the Chabu and other marginalized groups is likely to provide valuable insights into the interactions between farmers and hunter-gatherers, the drivers of major cultural transitions over long periods of coexistence, and the reasons behind the divergence of demographic histories in genetically and culturally similar groups.
STAR Methods
Resource Availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Brenna Henn (bmhenn@ucdavis.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
Genotype data generated for this study are deposited at dbGaP at phs001123.v2.p2. This paper also analyzes existing, publicly available data. The accession numbers or DOIs for these datasets are listed in the key resources table. Additional plots and original code have been deposited at Zenodo and are publicly available as of the date of publication. The DOI is listed in the key resources table. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
KEY RESOURCES TABLE.
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Critical Commercial Assays | ||
Infinium Multi-Ethnic Global-8 Kit | Illumina | WG-316 |
Deposited Data | ||
Genetic data from Bayira (Mota) | 12 | DOI: 10.1126/science.aad2879 |
Genetic data from Sandawe and Hadza people | 22 | DOI: 10.1073/pnas.1017511108 |
Genetic data from Bari, Bataheen, Beni-Amer, Beri, Copts, Danagla, Dinka, Gemar, Hadendowa, Halfawieen, Hausa, Ja'alin, Mahas, Misseriya, Nuba, Nuer, Shaigiya, and Shilluk people | 52 | DOI: 10.1371/journal.pgen.1006976 |
Genetic data from Mbuti people | 57 | DOI: 10.1038/nature18964 |
Genetic data from Aari Blacksmith, Aari Cultivator, Afar, Anuak, Tigray, Amhara, Ethiopian Somali, Gumuz, Oromo, and Wolayta people | 52 | DOI: 10.1016/j.ajhg.2012.05.015 |
Genetic data from Amhara, Egyptian, Ethiopian Somali, Gumuz, Oromo, and Wolayta people | 53 | DOI: 10.1016/j.ajhg.2015.04.019 |
Genetic data from Bakiga and Batwa people | 55 | DOI: 10.1073/pnas.1402875111 |
Genetic data from ancient African individuals | Allen Ancient DNA Resource (v. 44.3) | https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data |
Genetic data from Bench, Chabu, Majang, Shekkacho, and Sheko people | This study, deposited on dbGaP | phs001123.v2.p2 |
Software and Algorithms | ||
Custom scripts | This study, deposited on Zenodo | DOI: 10.5281/zenodo.5911732 |
ADMIXTOOLS | 18 | https://github.com/DReichLab/AdmixTools |
ADMIXTURE | 14 | https://dalexander.github.io/admixture/download.html |
ALDER | 17 | https://github.com/joepickrell/malder/tree/master/MALDER |
a-LoCoH, implemented in adehabitatHR (R package) | 74 | https://cran.r-project.org/web/packages/adehabitatHR/index.html |
EEMS | 19 | https://github.com/dipetkov/eems |
GARLIC | 76 | https://github.com/szpiech/garlic |
GenomeStudio v2.0.3 | NA | https://support.illumina.com/array/array_software/genomestudio/downloads.html |
hap-ibd v1.0 | 79 | https://github.com/browning-lab/hap-ibd |
IBDNe (ibdne.04Sep15.e78) | 23 | https://faculty.washington.edu/browning/ibdne.html |
maptools (R package) | The Comprehensive R Archive Network | https://cran.r-project.org/web/packages/maptools/index.html |
msprime | 24 | https://github.com/tskit-dev/msprime |
PLINK 1.9 | 60 | https://www.cog-genomics.org/plink/ |
PONDEROSA | 58 | https://github.com/williamscole/PONDEROSA |
R | The R Project for Statistical Computing | https://www.r-project.org/ |
RColorBrewer (R package) | The Comprehensive R Archive Network | https://cran.r-project.org/web/packages/RColorBrewer/index.html |
SHAPEIT2 | 78 | https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html |
smartpca | 64 | https://github.com/DReichLab/EIG |
Spatial ancestry plotting functions | Ryan Raaum 53 | DOI: 10.1534/genetics.116.187369 |
vegan (R package) | The Comprehensive R Archive Network | https://cran.r-project.org/web/packages/vegan/index.html |
zCall | 50 | https://github.com/jigold/zCall |
Experimental Model and Subject Details
Sample collection
Samples from the Chabu and the Majang and Shekkacho were collected by REWB in May 2013, using Oragene•DISCOVER (OGR-500) kits for the Chabu and generic 5 ml tubes with Norgen preservation solution for the other two groups. Additional Chabu individuals, as well as Bench and Sheko individuals, were collected in October 2019 with OGR-500 kits. Ethiopian samples were collected after months of ethnographic research by Samuel Dira and BSH, as part of a larger formal collaborative research and capacity-building relationship between the Departments of Anthropology at Hawassa University, Ethiopia (HU) and Washington State University (WSU). The collaboration involves training several HU faculty in the WSU PhD program and cooperative participation in research projects in Southwestern Ethiopia. Prior approvals for the project were obtained from the leadership of each group being sampled, from the School of Behavioral Sciences at Hawassa University (#BS/502/05), and from the Majang Zone Council of the Gambella Regional State (#901/Majang Zone 1). Ethical approval for human subjects research was obtained from the Institutional Review Board of Washington State University under proposals #12972 and #13134. IRB approval was obtained from UC Davis #1445036–1 (July 2019) for additional sampling.
We aimed to sample 50 individuals per population per field season. Samples were obtained opportunistically within each group from the general population present in public or semi-public spaces such as village centers and municipal buildings. No participants were excluded from sampling a priori except those under 18 years of age. In recruiting participants, we relied on local informants and community leaders and experts for their aid. Informed consent was obtained from each participant after reading or hearing the approved text translated into their local language and providing their signature, or a fingerprint in lieu of a signature for non-literate participants.
Ethnographic interviews and return of results
Among the Chabu, interviews were largely conducted in the Chabu language. One author (ZHG) has moderate Chabu language skills and conducted many field interviews in Chabu (for simple demographics). A translator who spoke Chabu, Majang, and English assisted. In some cases, interviews were conducted in Majang. For all other ethnicities, interviews were conducted through a translator in the local language. For the purpose of analyses, individuals were classified as Chabu if they self-reported as Chabu and said they had at least one Chabu parent. Six Chabu whom we interviewed reported a non-Chabu (Majang) parent (7% of parents) and 10 reported a non-Chabu (Majang) grandparent (2% of grandparents). The majority of Chabu individuals spoke the Chabu language. Payment was provided to all participants whether or not they identified as Chabu. Chabu data collection was performed in two locations where, as far as the authors are aware, most Chabu reside7. One is a predominantly Chabu village deep in the forest and the other is a multi-ethnic frontier town with government presence.
Results presented here, including general population genetics results for the Chabu in the context of neighboring groups, Ethiopia, and globally, were returned and discussed with members of the Chabu community in October 2019. In an effort to increase community attendance, we arranged in advance with community members in the surrounding area to travel for the presentation of results. Research results were presented in a community gathering followed by a discussion; those who could not attend the presentation and whom we encountered opportunistically during our visit were presented the results individually or by household. Many Chabu members in the audience were pleased to hear our research results matched cultural models of their history. They expressed concern about issues regarding development and education related to the need to teach the Chabu language to their children in school.
Method Details
Data generation and processing
Fifty individuals each of the Bench, Sheko, Majang, and Shekkacho, and 88 Chabu, were genotyped using the Illumina Infinium MultiEthnic Global Array, which assays over 1.7 million genetic markers. Genotypes were initially called using Illumina GenomeStudio v2.0.3 software and exported using the human genome build GRCh37. We removed samples that had a call rate below 90%. Calls for rare variants, defined as those having a minor allele frequency (MAF) < 5%, were then replaced by using zCall following their published procedure50. Variants with more than 15% missing data, an observed heterozygosity greater than or equal to 80%, or with cluster separation less than or equal to 2% were removed from the dataset. In preparation for merging with additional datasets, all variants were converted to the Illumina top strand and oriented to match the 1000 Genomes reference. We renamed SNPs to match dbSNP version 144 and removed all indels and A/T or C/G transversion variants, leaving over 1.3 million SNPs in the final dataset.
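The sample- and variant-level thresholds described above can be written out as simple predicates. This is only a sketch: the actual filtering was performed with the genotyping software and other tools, and the record layout, field names, and example values below are hypothetical. Only the thresholds themselves come from the text.

```python
SAMPLE_MIN_CALL_RATE = 0.90   # samples below this call rate are removed
VARIANT_MAX_MISSING = 0.15    # variants above this missingness are removed
MAX_HETEROZYGOSITY = 0.80     # variants at or above this are removed
MIN_CLUSTER_SEP = 0.02        # variants at or below this are removed
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}  # strand-ambiguous SNPs


def keep_sample(call_rate):
    """Keep samples whose call rate is at least 90%."""
    return call_rate >= SAMPLE_MIN_CALL_RATE


def keep_variant(v):
    """Apply the variant filters to one record (hypothetical field names)."""
    is_snp = len(v["ref"]) == 1 and len(v["alt"]) == 1  # drops indels
    alleles = frozenset((v["ref"], v["alt"]))
    return (
        is_snp
        and alleles not in AMBIGUOUS_PAIRS
        and v["missing"] <= VARIANT_MAX_MISSING
        and v["het"] < MAX_HETEROZYGOSITY
        and v["cluster_sep"] > MIN_CLUSTER_SEP
    )


# Made-up variant records, for illustration only.
variants = [
    {"id": "snp1", "ref": "A", "alt": "G", "missing": 0.01, "het": 0.30, "cluster_sep": 0.50},
    {"id": "snp2", "ref": "A", "alt": "T", "missing": 0.01, "het": 0.30, "cluster_sep": 0.50},
    {"id": "snp3", "ref": "C", "alt": "T", "missing": 0.20, "het": 0.30, "cluster_sep": 0.50},
    {"id": "indel1", "ref": "AT", "alt": "A", "missing": 0.01, "het": 0.30, "cluster_sep": 0.50},
    {"id": "snp4", "ref": "G", "alt": "A", "missing": 0.01, "het": 0.80, "cluster_sep": 0.50},
]
kept_ids = [v["id"] for v in variants if keep_variant(v)]
print(kept_ids)
```

A/T and C/G SNPs are dropped because their alleles read the same on both strands, so their orientation cannot be checked when datasets genotyped on different strands are merged.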
Quantification and Statistical Analysis
Unsupervised clustering analysis
We merged our Ethiopian SNP data with previously published or publicly available genotype data from other Eastern Africans, as well as the Yoruba and Palestinians (Table S1)12,22,51–57. We identified relatives in our dataset using PONDEROSA, which is able to accurately identify kinship categories across populations with differential levels of genetic diversity (e.g. due to inbreeding or bottlenecks)58. We set PONDEROSA’s parameters to join any segments separated by less than 1 cM and with fewer than 1 discordant homozygous site, and excluded any pairs of individuals exhibiting a 2nd degree relationship or closer.
We removed SNPs that had a missingness rate of over 5% or a minor allele frequency (MAF) less than 1% in the merged dataset, or were out of Hardy-Weinberg equilibrium (HWE) (p < 0.001) in any population59–61. Of the remaining individuals, we randomly discarded a set such that no population, as defined by their population labels, had more than 50 individuals. We then filtered the HWE-, MAF-, and missingness-filtered data for linkage disequilibrium (using the PLINK command ‘--indep-pairwise 50 5 0.3’) and removed individuals missing genotype data at more than 15% of sites, leaving 112,322 SNPs from 1,124 individuals across 45 populations plus Bayira59–61.
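The per-SNP missingness and MAF thresholds can be made concrete with a short Python sketch. This is an illustration only; the filtering here was done with PLINK, and the genotype coding (0/1/2 alternate-allele counts, `None` for a missing call) and the example SNPs are assumptions.

```python
def snp_stats(genotypes):
    """Return (missing_rate, maf) for one SNP.

    genotypes: list of alternate-allele counts (0, 1 or 2) per individual,
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return missing_rate, min(alt_freq, 1 - alt_freq)


def passes_filters(genotypes, max_missing=0.05, min_maf=0.01):
    """Keep SNPs with missingness of at most 5% and MAF of at least 1%."""
    missing_rate, maf = snp_stats(genotypes)
    return missing_rate <= max_missing and maf >= min_maf


# Hypothetical SNPs typed in 20 individuals.
snp_common = [0, 1, 2, 1] + [0] * 16        # MAF 0.10, no missing calls
snp_monomorphic = [0] * 20                  # MAF 0
snp_missing = [None, None] + [1] * 18       # 10% missing calls

for name, snp in [("common", snp_common),
                  ("monomorphic", snp_monomorphic),
                  ("missing", snp_missing)]:
    print(name, passes_filters(snp))
```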
We ran the ADMIXTURE algorithm for Ks between 2 and 12 with 50 replicates each, and used pong to visualize the concordance between different runs and to identify the most frequent mode per K among all replicates14,62. The lowest cross validation error was achieved for a mode that occurred when K=9. However, we focus on the pattern that arises at K=7 as Ks beyond this tended to identify population-specific components that were less informative about inter-population relationships (Figure S1). For extant Eastern African populations with known sampling or ethnographic coordinates, we also plotted the population averages of each ancestry component geographically, interpolating between data points across the landscape as in Uren et al.63.
Principal component analysis
We performed principal component analysis (PCA) using smartpca (v. 16000) to visualize relationships between modern and ancient groups64. We took advantage of Procrustes transformation to include additional ancient samples from Africa and the Near East which had poor SNP overlap with the rest of the dataset27,29,65–68. We accessed the genotype data for these individuals through the harmonized genotype dataset of the Allen Ancient DNA Resource (v. 44.3, accessed at https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data, February 19 2019). We retained ancient samples from sub-Saharan Africa, Israel, and Jordan that had a ‘PASS’ in the assessment column of the Allen Ancient DNA Resource dataset. We then merged our missingness, MAF, and HWE filtered dataset of the 1,124 individuals used for ADMIXTURE (the ‘main’ dataset) with the ancient individuals (the ‘drop in’ dataset).
We made all individuals in this combined dataset pseudohaploid by randomly retaining only one of an individual’s alleles at any heterozygous sites. We then performed a series of PCAs that included all ‘main’ individuals plus one ‘drop in’ individual. We retained SNPs that overlapped between the main dataset and the dropped in individual, filtered these SNPs for MAF and LD as described above, and computed the top 6 PCs using smartpca64. We inspected each of these ‘drop in’ PCAs individually, plotting two PCs against each other at a time and checking that the relative positions of the main dataset individuals were qualitatively similar across runs. In particular, biplots involving the top 3 PCs produced very consistent patterns. At this point, we chose to exclude samples for which fewer than 10,000 SNPs were used to calculate the PCs. Ultimately, we included 74 ancient individuals. Using the R package ‘vegan’69, we performed Procrustes transformation by designating the ancient sample with the highest number of SNPs as the ‘baseline’ PCA. By comparing the coordinates of only the main dataset individuals across the baseline PCA and each remaining PCA in turn, Procrustes transformation calculates the optimal translation, rotation, and scaling factors needed to minimize the overall sum of squared differences between datasets. We then applied these factors to the dropped in individual to calculate its new coordinates in the Procrustes transformed PCA.
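The fit-then-apply logic of the Procrustes step can be sketched in Python for the two-dimensional case. This is a simplified stand-in for the ‘vegan’ routine, not a reimplementation of it: it assumes only two PCs, allows rotation but not reflection, and uses made-up coordinates. The function names are hypothetical.

```python
import math


def fit_procrustes_2d(source, target):
    """Least-squares scale, rotation and translation mapping the 2D points in
    `source` onto the matching points in `target` (lists of (x, y) tuples)."""
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n
    dot = cross = norm = 0.0
    for (x, y), (u, v) in zip(source, target):
        x, y, u, v = x - sx, y - sy, u - tx, v - ty   # centre both point sets
        dot += x * u + y * v
        cross += x * v - y * u
        norm += x * x + y * y
    theta = math.atan2(cross, dot)            # optimal rotation angle
    scale = math.hypot(dot, cross) / norm     # optimal scaling factor
    return scale, theta, (sx, sy), (tx, ty)


def apply_procrustes_2d(point, fit):
    """Map one extra point (e.g. a dropped-in sample) with a fitted transform."""
    scale, theta, (sx, sy), (tx, ty) = fit
    x, y = point[0] - sx, point[1] - sy
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)


# Made-up coordinates of the same four 'main' individuals in two PCAs.
main_in_dropin_pca = [(0, 0), (1, 0), (0, 1), (2, 1)]
main_in_baseline_pca = [(3, -1), (3, 1), (1, -1), (1, 3)]

fit = fit_procrustes_2d(main_in_dropin_pca, main_in_baseline_pca)
# Only the main individuals inform the fit; the dropped-in sample is mapped after.
dropin_new = apply_procrustes_2d((2, 0), fit)
print(fit[0], dropin_new)
```

Fitting on the shared individuals alone keeps a sparsely genotyped ancient sample from influencing the transformation that places it.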
Tests for genetic similarity (shared drift) and gene flow
We calculated F3 outgroup statistics of the form F3(Bayira; PX, Yoruba) and F3(Chabu; PX, Yoruba) in order to estimate the degree of shared drift between various Eastern African populations (PX) and Bayira and the Chabu, respectively (Figure S2)70. We also calculated F3 admixture statistics of the form F3(PX; P1, P2) to test for the possibility that a target population (PX) is the result of admixture between two diverged source groups (P1, P2)71. For all tests, we used a merged dataset of individuals genotyped on Omni1M, Omni2.5M, and MEGA platforms, plus Bayira, containing 3,563,795 SNPs and 1,515 individuals, which had been filtered for close relatives within populations and for MAF and HWE, but not for LD or missingness. We calculated the F3 statistics using the qp3pop program from ADMIXTOOLS, which automatically retained only SNPs that overlapped across all three groups (PX, P1, and P2)18.
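For intuition, the quantity underlying an F3 statistic of the form F3(PX; P1, P2) is the average across SNPs of (c − a)(c − b), where c, a, and b are the allele frequencies in PX, P1, and P2. The Python sketch below computes this naive average for made-up frequencies. It is not qp3pop: it omits the finite-sample bias correction, normalization, and block-jackknife standard errors that ADMIXTOOLS provides, and the function name is hypothetical.

```python
def f3_naive(target, source1, source2):
    """Mean over SNPs of (c - a) * (c - b), where c, a and b are the allele
    frequencies in the target and the two source populations."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have the same length")
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Made-up frequencies at two SNPs. The target sits exactly midway between two
# strongly differentiated sources, as expected after recent admixture.
p1 = [0.1, 0.9]
p2 = [0.9, 0.1]
px = [0.5, 0.5]

f3_value = f3_naive(px, p1, p2)
print(f3_value)   # negative, consistent with PX being a mixture of P1 and P2
```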
Estimating the date of gene flow
We tested for possible gene flow between populations, and additionally estimated the dates (in generations) of these gene flow events, using ALDER, an LD-based method, with the ‘checkmap: NO’ and ‘mindis: 0.005’ options17. We used the same dataset as for calculating F3 statistics, described above.
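LD-based dating rests on the expectation that admixture LD decays approximately exponentially with genetic distance d (in Morgans), at a rate equal to the number of generations n since admixture. The Python sketch below recovers n from a synthetic, noise-free decay curve by a log-linear fit. It is an illustration under simplifying assumptions, not ALDER's procedure, which fits an exponential plus an affine term to weighted LD and estimates uncertainty by jackknife. The amplitude, the distance bins, and the reading of 0.005 as a minimum distance in Morgans are assumptions.

```python
import math


def fit_decay_rate(distances_morgans, ld_values):
    """Ordinary least-squares slope of log(LD) on genetic distance.

    Returns minus the slope, i.e. the decay rate in generations.
    Requires strictly positive LD values.
    """
    logs = [math.log(v) for v in ld_values]
    n = len(distances_morgans)
    mean_d = sum(distances_morgans) / n
    mean_l = sum(logs) / n
    sxy = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -sxy / sxx


# Synthetic curve: admixture 20 generations ago, bins starting at 0.005 Morgans.
true_generations = 20
distances = [0.005 + 0.005 * i for i in range(20)]
curve = [0.01 * math.exp(-true_generations * d) for d in distances]

estimate = fit_decay_rate(distances, curve)
print(round(estimate, 6))
```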
Estimated effective migration surfaces
We used a method of estimating effective migration surfaces (EEMS) to visualize variation in migration rates across East Africa (Figure S2)19. The algorithm takes geo-referenced genetic SNP data as input and simulates migration across a grid under a stepping-stone model, returning a spatial depiction of estimated historical rates of gene flow. We prepared the SNP dataset by following the same procedure as for ADMIXTURE, but excluded some samples prior to relatedness and LD filtering. We removed Bayira, because it is an ancient sample; the Somali and Sudanese populations from Pagani et al. 2012, due to lack of specific geographical information; and the Batwa, Bakiga, Biaka, Mbuti, Hadza, Sandawe, Palestinians, and Yoruba, because these groups live outside the bounds of our geographic region of interest. We also applied a stricter individual missingness filter of 5%, which excluded all 5 Bari individuals. This left a dataset of 116,447 SNPs and 658 individuals across 35 populations. We primarily used the coordinates in the original publications with some adjustments; as EEMS is a spatial analysis based on historical population locations, the coordinates for Tigray individuals were changed from their sampling location near Addis Ababa to their traditional homeland in Eritrea, using the Glottolog coordinates for Tigrinya72. For similar reasons, we also excluded the 3 Hausa individuals given the population’s recent migration from outside the region of interest within the last 100 years73. However, we did not find that either of these changes led to major qualitative differences in the results. We performed 12 total runs under a range of starting parameters (number of demes specified as 200, 300, 400, and 500, each under three different starting seed values) and averaged the results to mitigate the possible bias of any single run.
Each run was allowed to proceed for 30 million MCMC iterations to ensure convergence, with the first 15 million discarded as burn-in and the remaining 15 million thinned to retain 1 out of every 15,000 data points. Proposal variances were tuned so that proposals were accepted between 20% and 30% of the time for all runs.
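The burn-in and thinning settings imply a fixed number of retained posterior samples per run, which the following short Python calculation makes explicit. The helper name is hypothetical; the numbers are those stated above.

```python
def retained_samples(total_iterations, burn_in, thin_every):
    """Number of MCMC draws kept after discarding burn-in and thinning."""
    if burn_in > total_iterations:
        raise ValueError("burn-in cannot exceed the total number of iterations")
    return (total_iterations - burn_in) // thin_every


per_run = retained_samples(30_000_000, 15_000_000, 15_000)
n_runs = 4 * 3                      # four deme counts, three seeds each
total_retained = per_run * n_runs

print(per_run, n_runs, total_retained)
```

Each run therefore contributes 1,000 thinned draws, and the averaged surfaces summarize 12,000 draws across the 12 runs.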
Estimating the distributions of major language families
In order to determine the correspondence between EEMS-inferred migration barriers and corridors and linguistic boundaries, we calculated kernel estimates of language family distributions using the adaptive radius local convex hull (a-LoCoH) method74. Language centroid point data and (Greenberg-based) family classifications for every known living African language were obtained from Ethnologue (www.ethnologue.com). We then applied the a-LoCoH algorithm to construct ‘utilization distributions’ for each of the five major African language families (Niger-Congo, Afro-Asiatic, Nilo-Saharan, Khoisan, and Austronesian), using values equal to the longest geodesic distance between any two languages in a family, to accommodate variable point densities. This produced a set of layered isopleths for each language family representing decile occurrence probabilities. These isopleths were then plotted with overlaid language point data to visualize the extent and density of language distributions by family in relation to historical migration rate estimates as determined by EEMS (Figure S2). Putative linguistic isolates were determined according to Kibebe and Blench8,75.
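The longest pairwise distance within a set of language centroids can be computed as in the Python sketch below. It is illustrative only: it uses the haversine formula on a sphere (an approximation to a true geodesic on the ellipsoid), a mean Earth radius of 6,371 km, and made-up coordinates rather than the Ethnologue data.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0   # mean radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points
    given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def longest_pairwise_distance(points):
    """Largest great-circle distance between any two points in the set."""
    return max(haversine_km(p, q) for p, q in combinations(points, 2))


# Made-up centroids on the equator, 0, 45 and 90 degrees of longitude.
centroids = [(0.0, 0.0), (0.0, 45.0), (0.0, 90.0)]
longest = longest_pairwise_distance(centroids)
print(round(longest, 1))
```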
Runs of homozygosity
We determined runs of homozygosity (RoH) in the autosomes of the Ethiopian populations and other African hunter-gatherers, the Hadza, Sandawe, Batwa, Biaka, and Mbuti. The Hadza and Sandawe were assayed on the Illumina 550k array25. We removed SNPs with more than 5% missingness or an MAF below 1% from all datasets and removed SNPs that were not in Hardy-Weinberg equilibrium within each population (p < 0.001)59–61. We also thinned each dataset to approximately match the number of SNPs in the smallest dataset, leaving approximately 470,000 SNPs per population for analysis. We then identified RoH in each individual using PLINK, defining a run as having at least 30 SNPs and being at least 500 kb in length, allowing for no more than two missing and one heterozygous SNP per run59–61.
Despite varying these parameters, we found many instances of two RoH within a single individual closely flanking a low SNP density region. We chose to join such segments post hoc with a custom script (see ‘Data and Code Availability’) by defining low density regions as 1Mb windows that fell in the lower 5% of SNP count when compared to the entire genome. We also observed genome regions where unusually high numbers of individuals in a population carried a RoH segment. We defined such outlier regions as being more than three standard deviations above the mean depth of RoH in the population. We added these regions to a list of previously identified low density regions and known low complexity regions (i.e. heterochromatin, telomere, centromere, and short arm regions). We then removed all RoH segments that overlapped by 85% or more with one of these regions using a custom script (see ‘Data and Code Availability’). However, we found that none of these post hoc adjustments made a qualitative difference to RoH distributions at the population level.
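The 85% overlap rule can be illustrated with the following Python sketch. It is not the custom script referenced above. It assumes that overlap is measured as the fraction of the RoH segment covered by a single masked region, and all coordinates are made up.

```python
def overlap_fraction(segment, region):
    """Fraction of `segment` covered by `region`; both are (start, end) in bp."""
    seg_start, seg_end = segment
    reg_start, reg_end = region
    overlap = min(seg_end, reg_end) - max(seg_start, reg_start)
    return max(0, overlap) / (seg_end - seg_start)


def filter_roh(segments, masked_regions, max_overlap=0.85):
    """Drop segments overlapping any one masked region by 85% or more."""
    return [seg for seg in segments
            if all(overlap_fraction(seg, reg) < max_overlap
                   for reg in masked_regions)]


# Made-up RoH segments and masked regions on one chromosome.
roh_segments = [
    (1_000_000, 2_000_000),   # 90% inside the first masked region
    (5_000_000, 6_000_000),   # no overlap
    (9_000_000, 9_600_000),   # entirely inside the second masked region
]
masked = [(900_000, 1_900_000), (9_000_000, 10_000_000)]

kept_roh = filter_roh(roh_segments, masked)
print(kept_roh)
```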
Genomic autozygosity regions likelihood-based inference (GARLIC)
In order to analyze RoH in separate classes corresponding to the relative age of the events that produced them, we also identified RoH using GARLIC76. This algorithm implements a population model-based method of inferring RoH in ‘short’, ‘intermediate’ and ‘long’ size classes77. We ran GARLIC on the datasets used for PLINK RoH analysis using the following parameters: ‘error’ of 0.001, ‘winsize’ of 30, ‘auto-winsize’, and ‘auto-winsize-step’ of 5. As with PLINK RoH, we then joined segments that flanked regions of low SNP density, and updated their size class accordingly.
Historical effective population size inference
In order to infer segments of the genome shared IBD across individuals in a population, we first performed phasing using SHAPEIT2. We started with the RoH datasets described above and phased all individuals assayed on a given platform together using a reference panel of phased individuals from the 1000 Genomes project Phase 3 dataset, the --duohmm option, and a window size of 5 Mb51,78. We converted the output of SHAPEIT2 to a VCF file and then used hap-ibd v1.0 to identify tracts shared identical-by-descent (IBD) across all individuals in a given dataset79. We then ‘repaired’ these IBD segments using the merge-ibd-segments script80. We extracted all IBD segments shared between individuals of the same population, filtered out any that were shorter than 4 centiMorgans, and used these to estimate historical Ne with the 2015 version of IBDNe23.
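The final selection of IBD segments, keeping only within-population pairs and segments of at least 4 cM, can be sketched in Python as follows. The record layout (two sample identifiers and a length in cM) and the sample-to-population table are assumptions for illustration and do not reproduce the hap-ibd output format.

```python
def within_population_ibd(segments, population_of, min_cm=4.0):
    """Keep IBD segments shared by two individuals from the same population
    and at least `min_cm` centiMorgans long."""
    kept = []
    for id1, id2, length_cm in segments:
        same_population = population_of[id1] == population_of[id2]
        if same_population and length_cm >= min_cm:
            kept.append((id1, id2, length_cm))
    return kept


# Hypothetical samples and segments.
population_of = {"c1": "Chabu", "c2": "Chabu", "m1": "Majang", "m2": "Majang"}
ibd_segments = [
    ("c1", "c2", 12.5),   # same population, long enough: kept
    ("c1", "c2", 2.1),    # same population, too short
    ("c1", "m1", 8.0),    # different populations
    ("m1", "m2", 4.0),    # same population, exactly at the threshold: kept
]

ibd_for_ne = within_population_ibd(ibd_segments, population_of)
print(ibd_for_ne)
```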
Demographic simulations
In order to evaluate the effects of sample size and demographic history on IBDNe inference, we performed a series of simulations24,81,82 using msprime with an African-American recombination map and a mutation rate of 1 × 10⁻⁸. We used a standard coalescent model until 100 generations ago (ga), at which point our simulations switch to using a discrete time Wright-Fisher model. The latter has been shown to produce more realistic and unbiased patterns of recent IBD, which is especially important for our application83. We simulated 3 basic demographic scenarios: a population with Ne declining from 10,000 to 1,000 starting 50 ga, a population holding steady at 5,000, and a population increasing from 5,000 to 10,000 starting 10 ga. We also simulated scenarios where a single pulse of gene flow (either 10% or 25%) occurs at 10, 30, and 50 ga from a population that diverged 600 ga. In order to emulate the effect of ascertainment bias, we used a custom script (see ‘Data and Code Availability’) to filter our simulated genotype data to approximately match the allele frequency distribution of the 1000 Genomes project Phase 3 dataset and the SNP density of the MEGA array dataset in non-overlapping 1 Mb windows51. We inferred IBD and ran IBDNe on the resulting dataset as described above.
Gene flow can obscure the true patterns of historical Ne; sample size, admixture proportions, and timing all affect correct estimation. We modelled a single pulse of gene flow of either 10% or 25%, occurring at 10, 30 or 50 ga in a population with a constant or declining Ne. Gene flow caused an inflation of the estimate of Ne, leading to an overall trajectory that looks like decline or fluctuation (Figure S4). This is expected given that gene flow with a divergent group will introduce new, unrelated haplotypes and reduce the extent of IBD sharing in the population. We found that, in general, the severity of these errors declined with increasing age of the admixture event, decreasing gene flow proportion, and increasing sample size. Finally, we found that SNP ascertainment had essentially no effect on the accuracy of our Ne estimates (see ‘Data and Code Availability’).
Political boundaries in maps
The boundaries depicted in the maps do not imply the expression of an opinion by any of the authors of this paper regarding the legal status or political boundaries of any country or territory.
Supplementary Material
Highlights:
The Chabu people are related to ancient Southwest Ethiopian hunter-gatherers (HGs)
Like other African HGs, Chabu population size has declined over the past 1,400 years
However, other populations with Ethiopian HG ancestry have not experienced declines
This heterogeneity may stem from variable HG responses to encroaching farmers
Acknowledgements
We thank all Chabu, Majang, Bench, Sheko and Shekkacho participants in this research for their generous contributions and involvement in this research. We also thank additional communities that have contributed data to prior studies used here, and thank Dr. Luca Pagani, Dr. Nina Hollfelder et al., and Dr. George Perry for making these datasets available to us. We acknowledge the sovereignty and rights of all of these groups to the governance, protection, and use of their own genetic data. We thank Yoell Eno and other local research assistants for their contributions. We thank the Stanford Center for Computational, Evolutionary, and Human Genomics and Dr. Carlos Bustamante for providing funding for data generation, and Alexandra Sockell for her assistance with sample preparation. We thank Dr. William Palmer for assisting with the curation of Y-chromosomes. We thank Dr. Sharon Browning and Dr. Brian Browning for assistance with implementing the IBDNe algorithm. We thank Dr. Chris Gignoux for consultation on IBD inference and his support throughout the project. REWB was supported by funding from the IGERT Program for Evolutionary Modeling at Washington State University and an Exploration Fund grant from The Explorers Club. ZHG acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010. This research was supported by NIH grants R35GM133531 (to BMH) and 2R01HL104608 (to MD). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Inclusion and Diversity
We worked to ensure gender balance in the recruitment of human subjects. We worked to ensure ethnic or other types of diversity in the recruitment of human subjects. We worked to ensure that the study questionnaires were prepared in an inclusive way. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science. One or more of the authors of this paper self-identifies as living with a disability. One or more of the authors of this paper received support from a program designed to increase minority representation in science. The author list of this paper includes contributors from the location where the research was conducted who participated in the data collection, design, analysis, and/or interpretation of the work.
Footnotes
Declaration of Interests
The authors declare no competing interests.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References:
- 1. Hofmanová Z, Kreutzer S, Hellenthal G, Sell C, Diekmann Y, Díez-del-Molino D, van Dorp L, López S, Kousathanas A, Link V, et al. (2016). Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl. Acad. Sci. 113, 6886–6891.
- 2. Brace S, Diekmann Y, Booth TJ, van Dorp L, Faltyskova Z, Rohland N, Mallick S, Olalde I, Ferry M, Michel M, et al. (2019). Ancient genomes indicate population replacement in Early Neolithic Britain. Nat. Ecol. Evol. 3, 765–771.
- 3. Bramanti B, Thomas MG, Haak W, Unterlaender M, Jores P, Tambets K, Antanaitis-Jacobs I, Haidle MN, Jankauskas R, Kind C-J, et al. (2009). Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. Science 326, 137–140.
- 4. Skoglund P, Malmstrom H, Raghavan M, Stora J, Hall P, Willerslev E, Gilbert MTP, Gotherstrom A, and Jakobsson M (2012). Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469.
- 5. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, et al. (2014). Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413.
- 6. Hildebrand EA, Brandt SA, and Lesur-Gebremariam J (2010). The Holocene archaeology of Southwest Ethiopia: new insights from the Kafa Archaeological Project. Afr. Archaeol. Rev. 27, 255–289.
- 7. Dira SJ, and Hewlett BS (2017). The Chabu hunter-gatherers of the highland forests of Southwestern Ethiopia. Hunt. Gatherer Res. 3, 323–352.
- 8. Kibebe TT (2015). Documentation and grammatical description of Chabu. Dissertation. Addis Ababa University, Addis Ababa, Ethiopia.
- 9. Schnoebelen T (2009). (Un)classifying Shabo: phylogenetic methods and results. In Proceedings of Conference on Language Documentation & Linguistic Theory 2, Austin PK, Bond O, Charette M, Nathan D, and Sells P, eds.
- 10. González-Ruibal A, Marín Suárez C, Sánchez-Elipe M, Lesur J, and Martínez Barrio C (2014). Late hunters of Western Ethiopia: the sites of Ajilak (Gambela), ca. 1000–1200 AD. Azania Archaeol. Res. Afr. 49, 64–101.
- 11. Scheinfeldt LB, Soi S, Lambert C, Ko W-Y, Coulibaly A, Ranciaro A, Thompson S, Hirbo J, Beggs W, Ibrahim M, et al. (2019). Genomic evidence for shared common ancestry of East African hunting-gathering populations and insights into local adaptation. Proc. Natl. Acad. Sci. 116, 4166–4175.
- 12. Gallego Llorente M, Jones ER, Eriksson A, Siska V, Arthur KW, Arthur JW, Curtis MC, Stock JT, Coltorti M, Pieruccini P, et al. (2015). Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa. Science 350, 820–822.
- 13. Arthur JW, Curtis MC, Arthur KJW, Coltorti M, Pieruccini P, Lesur J, Fuller D, Lucas L, Conyers L, Stock J, et al. (2019). The transition from hunting–gathering to food production in the Gamo Highlands of southern Ethiopia. Afr. Archaeol. Rev. 36, 5–65.
- 14. Alexander DH, Novembre J, and Lange K (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664.
- 15. Alemayehu EE (2015). Mapping the socio-cultural landscape of the Gumuz Community of Metekel, Northwestern Ethiopia. African J. Hist. Cult. 7, 209–218.
- 16. Stauder J (1971). The Majangir: Ecology and Society of a Southwest Ethiopian People (Cambridge University Press).
- 17. Loh P-R, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, and Berger B (2013). Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233–1254.
- 18. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, and Reich D (2012). Ancient admixture in human history. Genetics 192, 1065–1093.
- 19. Petkova D, Novembre J, and Stephens M (2016). Visualizing spatial population structure with estimated effective migration surfaces. Nat. Genet. 48, 94–100.
- 20. Creanza N, Ruhlen M, Pemberton TJ, Rosenberg NA, Feldman MW, and Ramachandran S (2015). A comparison of worldwide phonemic and genetic variation in human populations. Proc. Natl. Acad. Sci. 112, 1265–1272.
- 21. López S, Tarekegn A, Band G, van Dorp L, Oljira T, Mekonnen E, Bekele E, Blench R, Thomas MG, Bradman N, et al. (2021). Evidence of the interplay of genetics and culture in Ethiopia. Nat. Commun. 12, 3581.
- 22. Henn BM, Gignoux CR, Jobin M, Granka JM, Macpherson JM, Kidd JM, Rodríguez-Botigué L, Ramachandran S, Hon L, Brisbin A, et al. (2011). Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl. Acad. Sci. 108, 5154–5162.
- 23. Browning SR, and Browning BL (2015). Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am. J. Hum. Genet. 97, 404–418.
- 24. Kelleher J, Etheridge AM, and McVean G (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 12, e1004842.
- 25. Mathieson I, Alpaslan-Roodenberg S, Posth C, Szécsényi-Nagy A, Rohland N, Mallick S, Olalde I, Broomandkhoshbacht N, Candilio F, Cheronet O, et al. (2018). The genomic history of southeastern Europe. Nature 555, 197–210.
- 26. González-Fortes G, Jones ER, Lightfoot E, Bonsall C, Lazar C, Grandal-d’Anglade A, Garralda MD, Drak L, Siska V, Simalcsik A, et al. (2017). Paleogenomic evidence for multi-generational mixing between Neolithic farmers and Mesolithic hunter-gatherers in the Lower Danube basin. Curr. Biol. 27, 1801–1810.
- 27. Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, et al. (2017). Reconstructing prehistoric African population structure. Cell 171, 59–71.
- 28. Prendergast ME, Lipson M, Sawchuk EA, Olalde I, Ogola CA, Rohland N, Sirak KA, Adamski N, Bernardos R, Broomandkhoshbacht N, et al. (2019). Ancient DNA reveals a multistep spread of the first herders into sub-Saharan Africa. Science 365, eaaw6275.
- 29. Wang K, Goldstein S, Bleasdale M, Clist B, Bostoen K, Bakwa-Lufu P, Buck LT, Crowther A, Dème A, McIntosh RJ, et al. (2020). Ancient genomes reveal complex patterns of population movement, interaction, and replacement in sub-Saharan Africa. Sci. Adv. 6, eaaz0183.
- 30. Pickrell JK, Patterson N, Loh P-R, Lipson M, Berger B, Stoneking M, Pakendorf B, and Reich D (2014). Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl. Acad. Sci. 111, 2632–2637.
- 31. Beauclerk J (1993). Hunters and gatherers in Central Africa: on the margins of development (Oxfam).
- 32. Köhler A, and Lewis J (2002). Putting Hunter-Gatherer and Farmer Relations in Perspective. A Commentary from Central Africa. In Ethnicity, Hunter-Gatherers, and the “Other”: Association or Assimilation in Southern Africa? (Smithsonian Institute), pp. 276–305.
- 33. Fentaw A (2007). A history of the Shekacho (1898–1974). Dissertation. Addis Ababa University, Addis Ababa, Ethiopia.
- 34. Malmström H, Gilbert MTP, Thomas MG, Brandström M, Storå J, Molnar P, Andersen PK, Bendixen C, Holmlund G, Götherström A, et al. (2009). Ancient DNA reveals lack of continuity between Neolithic hunter-gatherers and contemporary Scandinavians. Curr. Biol. 19, 1758–1762.
- 35. de Filippo C, Heyn P, Barham L, Stoneking M, and Pakendorf B (2010). Genetic perspectives on forager-farmer interaction in the Luangwa valley of Zambia. Am. J. Phys. Anthropol. 141, 382–394.
- 36. Patin E, Siddle KJ, Laval G, Quach H, Harmant C, Becker N, Froment A, Régnault B, Lemée L, Gravel S, et al. (2014). The impact of agricultural emergence on the genetic history of African rainforest hunter-gatherers and agriculturalists. Nat. Commun. 5, 3163.
- 37. Page AE, and French JC (2020). Reconstructing prehistoric demography: What role for extant hunter-gatherers? Evol. Anthropol. 29, 332–345.
- 38. Kim HL, Ratan A, Perry GH, Montenegro A, Miller W, and Schuster SC (2014). Khoisan hunter-gatherers have been the largest population throughout most of modern-human demographic history. Nat. Commun. 5, 5692.
- 39. Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Théry S, Froment A, Le Bomin S, Gessain A, Hombert J-M, et al. (2009). Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr. Biol. 19, 312–318.
- 40. Blurton Jones N (2016). Demography and Evolutionary Ecology of Hadza Hunter-Gatherers (Cambridge University Press).
- 41. Newman JL (1970). The Ecological Basis for Subsistence Change among the Sandawe of Tanzania (National Academy of Sciences).
- 42. Ahmad AH (1995). The Gumuz of the lowlands of Western Gojjam: the frontier in history 1900–1935. Africa 50, 53–67.
- 43. van Dorp L, Balding D, Myers S, Pagani L, Tyler-Smith C, Bekele E, Tarekegn A, Thomas MG, Bradman N, and Hellenthal G (2015). Evidence for a common origin of Blacksmiths and Cultivators in the Ethiopian Ari within the last 4500 years: lessons for clustering-based inference. PLoS Genet. 11, e1005397.
- 44. Phillipson DW (2005). African Archaeology (Cambridge University Press).
- 45. Hailu GK (2016). Social stratification and marginalization in the Southern Nations Nationalities and People Region of Ethiopia: the case of Manja minority groups. Glob. J. Human-Social Sci. Sociol. Cult 16.
- 46. Yimer NA (2020). The social challenges of potters and tanners among the Yem people, Southwest Ethiopia. Soc. Ment. Res. Thinkers J. 6, 919–926.
- 47. Gignoux CR, Henn BM, and Mountain JL (2011). Rapid, global demographic expansions after the origins of agriculture. Proc. Natl. Acad. Sci. 108, 6044–6049.
- 48. Aimé C, Laval G, Patin E, Verdu P, Ségurel L, Chaix R, Hegay T, Quintana-Murci L, Heyer E, and Austerlitz F (2013). Human genetic data reveal contrasting demographic patterns between sedentary and nomadic populations that predate the emergence of farming. Mol. Biol. Evol. 30, 2629–2644.
- 49. Lopez M, Kousathanas A, Quach H, Harmant C, Mouguiama-Daouda P, Hombert JM, Froment A, Perry GH, Barreiro LB, Verdu P, et al. (2018). The demographic history and mutational load of African hunter-gatherers and farmers. Nat. Ecol. Evol. 2, 721–730.
- 50. Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, O’Dushlaine C, Moran JL, Chambert K, Stevens C, et al. (2012). zCall: a rare variant caller for array-based genotyping. Bioinformatics 28, 2543–2545.
- 51. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, and Abecasis GR; 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature 526, 68–74.
- 52. Hollfelder N, Schlebusch CM, Günther T, Babiker H, Hassan HY, and Jakobsson M (2017). Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. PLoS Genet. 13, e1006976.
- 53. Pagani L, Kivisild T, Tarekegn A, Ekong R, Plaster C, Gallego Romero I, Ayub Q, Mehdi Q, Thomas MG, Luiselli D, et al. (2012). Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. Am. J. Hum. Genet. 91, 83–96.
- 54. Pagani L, Schiffels S, Gurdasani D, Danecek P, Scally A, Chen Y, Xue Y, Haber M, Ekong R, Oljira T, et al. (2015). Tracing the route of modern humans out of Africa by using 225 human genome sequences from Ethiopians and Egyptians. Am. J. Hum. Genet. 96, 986–991.
- 55. Perry GH, Foll M, Grenier J-C, Patin E, Nédélec Y, Pacis A, Barakatt M, Gravel S, Zhou X, Nsobya SL, et al. (2014). Adaptive, convergent origins of the pygmy phenotype in African rainforest hunter-gatherers. Proc. Natl. Acad. Sci. 111, E3596–E3603.
- 56. Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J, et al. (2020). Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012.
- 57. Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. (2016). The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206.
- 58. Williams CM, Scelza B, Gignoux CR, and Henn BM (2020). A rapid, accurate approach to inferring pedigrees in endogamous populations. bioRxiv.
- 59.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. (2007). PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, and Lee JJ (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Purcell S, and Chang C. PLINK 1.9.
- 62.Behr AA, Liu KZ, Liu-Fang G, Nakka P, and Ramachandran S (2016). pong: fast analysis and visualization of latent clusters in population genetic data. Bioinformatics 32, 2817–2823.
- 63.Uren C, Kim M, Martin AR, Bobo D, Gignoux CR, van Helden PD, Möller M, Hoal EG, and Henn BM (2016). Fine-scale human population structure in Southern Africa reflects ecogeographic boundaries. Genetics 204, 303–314.
- 64.Patterson N, Price AL, and Reich D (2006). Population structure and eigenanalysis. PLoS Genet. 2, e190.
- 65.Lazaridis I, Nadel D, Rollefson G, Merrett DC, Rohland N, Mallick S, Fernandes D, Novak M, Gamarra B, Sirak K, et al. (2016). Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424.
- 66.Harney É, May H, Shalem D, Rohland N, Mallick S, Lazaridis I, Sarig R, Stewardson K, Nordenfelt S, Patterson N, et al. (2018). Ancient DNA from Chalcolithic Israel reveals the role of population mixture in cultural transformation. Nat. Commun. 9, 3336.
- 67.Lipson M, Ribot I, Mallick S, Rohland N, Olalde I, Adamski N, Broomandkhoshbacht N, Lawson AM, López S, Oppenheimer J, et al. (2020). Ancient West African foragers in the context of African population history. Nature 577, 665–670.
- 68.Schlebusch CM, Malmström H, Günther T, Sjödin P, Coutinho A, Edlund H, Munters AR, Vicente M, Steyn M, Soodyall H, et al. (2017). Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago. Science 358, 652–655.
- 69.Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O’Hara RB, Simpson GL, Solymos P, et al. (2020). vegan: Community Ecology Package.
- 70.Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, Rasmussen S, Stafford TW, Orlando L, Metspalu E, et al. (2014). Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91.
- 71.Reich D, Thangaraj K, Patterson N, Price AL, and Singh L (2009). Reconstructing Indian population history. Nature 461, 489–494.
- 72.Hammarström H, Forkel R, and Haspelmath M (2018). Glottolog 3.3.
- 73.Benson S, and Duffield M (1979). Women’s work and economic change: the Hausa in Sudan and in Nigeria. IDS Bull. 10, 13–19.
- 74.Getz WM, Fortmann-Roe S, Cross PC, Lyons AJ, Ryan SJ, and Wilmers CC (2007). LoCoH: Nonparameteric kernel methods for constructing home ranges and utilization distributions. PLoS One 2, e207.
- 75.Blench R (2017). African Language Isolates. In Language Isolates, Campbell L, ed., pp. 176–206.
- 76.Szpiech ZA, Blant A, and Pemberton TJ (2017). GARLIC: Genomic Autozygosity Regions Likelihood-based Inference and Classification. Bioinformatics 33, 2059–2062.
- 77.Pemberton TJ, Absher DM, Feldman MW, Myers RM, Rosenberg NA, and Li JZ (2012). Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 91, 275–292.
- 78.Delaneau O, Zagury J-F, and Marchini J (2013). Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6.
- 79.Zhou Y, Browning SR, and Browning BL (2020). A fast and simple method for detecting identity-by-descent segments in large-scale data. Am. J. Hum. Genet. 106, 426–437.
- 80.Browning BL, and Browning SR (2013). Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194, 459–471.
- 81.Ségurel L, Wyman MJ, and Przeworski M (2014). Determinants of mutation rate variation in the human germline. Annu. Rev. Genomics Hum. Genet. 15, 47–70.
- 82.Hinch AG, Tandon A, Patterson N, Song Y, Rohland N, Palmer CD, Chen GK, Wang K, Buxbaum SG, Akylbekova EL, et al. (2011). The landscape of recombination in African Americans. Nature 476, 170–175.
- 83.Nelson D, Kelleher J, Ragsdale AP, Moreau C, McVean G, and Gravel S (2020). Accounting for long-range correlations in genome-wide simulations of large cohorts. PLoS Genet. 16, e1008619.
Data Availability Statement
Genotype data generated for this study are deposited at dbGaP under accession phs001123.v2.p2. This paper also analyzes existing, publicly available data; the accession numbers or DOIs for these datasets are listed in the key resources table. Additional plots and original code have been deposited at Zenodo and are publicly available as of the date of publication; the DOI is listed in the key resources table. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
KEY RESOURCES TABLE.
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Critical Commercial Assays | ||
Infinium Multi-Ethnic Global-8 Kit | Illumina | WG-316 |
Deposited Data | ||
Genetic data from Bayira (Mota) | 12 | DOI: 10.1126/science.aad2879 |
Genetic data from Sandawe and Hadza people | 22 | DOI: 10.1073/pnas.1017511108 |
Genetic data from Bari, Bataheen, Beni-Amer, Beri, Copts, Danagla, Dinka, Gemar, Hadendowa, Halfawieen, Hausa, Ja'alin, Mahas, Misseriya, Nuba, Nuer, Shaigiya, and Shilluk people | 52 | DOI: 10.1371/journal.pgen.1006976 |
Genetic data from Mbuti people | 57 | DOI: 10.1038/nature18964 |
Genetic data from Aari Blacksmith, Aari Cultivator, Afar, Anuak, Tigray, Amhara, Ethiopian Somali, Gumuz, Oromo, and Wolayta people | 53 | DOI: 10.1016/j.ajhg.2012.05.015 |
Genetic data from Amhara, Egyptian, Ethiopian Somali, Gumuz, Oromo, and Wolayta people | 54 | DOI: 10.1016/j.ajhg.2015.04.019 |
Genetic data from Bakiga and Batwa people | 55 | DOI: 10.1073/pnas.1402875111 |
Genetic data from ancient African individuals | Allen Ancient DNA Resource (v. 44.3) | https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data |
Genetic data from Bench, Chabu, Majang, Shekkacho, and Sheko people | This study, deposited on dbGaP | phs001123.v2.p2 |
Software and Algorithms | ||
Custom scripts | This study, deposited on Zenodo | DOI: 10.5281/zenodo.5911732 |
ADMIXTOOLS | 18 | https://github.com/DReichLab/AdmixTools |
ADMIXTURE | 14 | https://dalexander.github.io/admixture/download.html |
ALDER | 17 | https://github.com/joepickrell/malder/tree/master/MALDER |
a-LoCoH, implemented in adehabitatHR (R package) | 74 | https://cran.r-project.org/web/packages/adehabitatHR/index.html |
EEMS | 19 | https://github.com/dipetkov/eems |
GARLIC | 76 | https://github.com/szpiech/garlic |
GenomeStudio v2.0.3 | NA | https://support.illumina.com/array/array_software/genomestudio/downloads.html |
hap-ibd v1.0 | 79 | https://github.com/browning-lab/hap-ibd |
IBDNe (ibdne.04Sep15.e78) | 23 | https://faculty.washington.edu/browning/ibdne.html |
maptools (R package) | The Comprehensive R Archive Network | https://cran.r-project.org/web/packages/maptools/index.html |
msprime | 24 | https://github.com/tskit-dev/msprime |
PLINK 1.9 | 60 | https://www.cog-genomics.org/plink/ |
PONDEROSA | 58 | https://github.com/williamscole/PONDEROSA |
R | The R Project for Statistical Computing | https://www.r-project.org/ |
RColorBrewer (R package) | The Comprehensive R Archive Network | https://cran.r-project.org/web/packages/RColorBrewer/index.html |
SHAPEIT2 | 78 | https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html |
smartpca | 64 | https://github.com/DReichLab/EIG |
Spatial ancestry plotting functions | Ryan Raaum 63 | DOI: 10.1534/genetics.116.187369 |
vegan (R package) | The Comprehensive R Archive Network | https://cran.r-project.org/web/packages/vegan/index.html |
zCall | 50 | https://github.com/jigold/zCall |