Am J Biol Anthropol. Author manuscript; available in PMC: 2023 Feb 23.
Published in final edited form as: Am J Biol Anthropol. 2022 Apr 14;178(3):488–503. doi: 10.1002/ajpa.24521

Genomic analysis reveals geography rather than culture as the predominant factor shaping genetic variation in northern Kenyan human populations

Angela M Taravella Oill 1,2, Carla Handley 3, Emma K Howell 1,2, Anne C Stone 2,3,4, Sarah Mathew 3,4,*, Melissa A Wilson 1,2,*
PMCID: PMC9949739  NIHMSID: NIHMS1870827  PMID: 36790743

Abstract

Objectives:

The aim of this study was to characterize the genetic relationships within and among four neighboring ethnolinguistic groups in northern Kenya in light of cultural relationships to understand the extent to which geography and culture shape patterns of genetic variation.

Materials and Methods:

We collected DNA and demographic information pertaining to social identity and heritage from 572 individuals across the Turkana, Samburu, Waso Borana, and Rendille of northern Kenya. We sampled individuals from a total of nine clans across these four groups, as well as from three territorial sections within the Turkana, and successfully genotyped 376 individuals.

Results:

Here we report that geography predominantly shapes genetic variation within and among human groups in northern Kenya. We observed a clinal pattern of genetic variation that mirrors the overall geographic distribution of the individuals we sampled. We also found relatively higher rates of intermarriage between the Rendille and Samburu, along with evidence of gene flow between them consistent with these higher intermarriage rates. Among the Turkana, we observed strong recent genetic substructuring based on territorial section affiliation. Within ethnolinguistic groups, we found that Y chromosome haplotypes do not consistently cluster by natal clan affiliation. Finally, we found that geographically closer populations have lower genetic differentiation, and that cultural similarity does not predict overall genetic similarity across these northern Kenyan populations.

Discussion:

Overall, the results from this study highlight the importance of geography, even on a local geographic scale, in shaping observed patterns of genetic variation in human populations.

Keywords: Africa, Kenya, geography, culture, social organization, genetic structure, genetic FST, cultural FST

Introduction

Among human populations, both geography and culture shape patterns of genetic variation. Gene flow can be constrained by geographic distance (Wright 1943): in humans, genetic similarity between populations commonly decreases as the geographic distance between them increases (e.g., Manica, Prugnolle, & Balloux, 2005; Novembre et al., 2008; Ramachandran et al., 2005). Genetic variation and population structure are also influenced by cultural factors such as language (e.g., Hunley et al., 2008; Nettle & Harriss, 2003; Pagani et al., 2012; Sun et al., 2013; Xu et al., 2010) and social organization (Bose, Platt, Parida, Drineas, & Paschou, 2021; Chaix et al., 2007; Heyer et al., 2009; Marchi et al., 2017). Like geographic distance, linguistic distance has been shown to correlate with genetic distance (Cavalli-Sforza, Piazza, Menozzi, & Mountain, 1988; Nettle & Harriss, 2003).

Large-scale research efforts have aimed to characterize genetic variation across a number of African populations to understand population history and human health and disease (e.g., Choudhury et al. 2020; 1000 Genomes Project Consortium et al. 2015; Gurdasani et al. 2015; Mulindwa et al. 2020; Tishkoff et al. 2009). However, with more than 2,000 ethnolinguistic groups across the continent, there is still much to learn about the determinants of substructure among and within individual ethnolinguistic groups at smaller geographic scales. For example, do cultural boundaries play an important role in shaping gene flow among neighboring groups, or does geography primarily shape patterns of genetic variation, even on a local scale?

Studies of small-scale groups throughout Africa have found that genetic substructure has been influenced to different extents by geography, ecological features, shared culture, language, or a combination of these. For example, among groups in Southern Africa, genetic structure was found to correspond more closely with geographic and ecological barriers than with linguistic affiliations (Uren et al. 2016). Similarly, geography was shown to primarily explain genetic relationships among Khoe-San groups after accounting for admixture from migrant populations (Vicente et al. 2019). Additionally, an investigation of genomic diversity in northeast African populations found a bimodal distribution of genetic variation that was correlated with geography (Hollfelder et al. 2017). Geography has also been shown to largely correspond to genetic substructure among South African Bantu speakers; however, this substructure was also consistent with linguistic relationships (Sengupta et al. 2021). Among Bantu-speaking groups in Mozambique, genetic differentiation was found to be determined primarily by language rather than geography (Semo et al. 2020). Among ethnic groups in Ethiopia, genetic variation has been found to be associated with linguistic affiliation and shared culture (López et al. 2021). In many cases geography is the main determinant of genetic structure, but this is not always so.

The Turkana, Samburu, Waso Borana, and Rendille are neighboring pastoral ethnolinguistic groups inhabiting the semi-arid north of Kenya. The Turkana and Samburu languages are Nilotic languages belonging to the Nilo-Saharan language family, while the Borana and Rendille languages are Cushitic languages belonging to the Afro-Asiatic language family (“WALS Online - Home” n.d.). As pastoralists, these groups herd cattle, camels, sheep, and goats, and migrate over varying distances to access pasture and water. There is intense competition for scarce dry-season grazing and water in this region, and armed livestock raiding occurs, especially between communities belonging to different ethnolinguistic groups (Handley & Mathew, 2020). Marriage across ethnic boundaries, particularly between the Rendille and Samburu, has been noted (Spencer, 2012). Nomadic pastoralism in all four populations involves extensive movement of people over large distances, including into one another's territory. Although there are no cultural prescriptions against intermarriage between these groups, historical and contemporary livestock raiding and resource competition often lead to hostile relationships among them. It is currently unclear whether these tensions isolate the groups from one another, or whether gene flow occurs despite the socio-political barriers.

All four groups have lineage-based divisions in which individuals are organized into clans, and in some groups clans are further grouped into moieties or phratries (Figure S1). The Turkana, Samburu, and Rendille are exogamous at the clan level, while the Borana are exogamous at the moiety level; in all of these groups, individuals therefore generally marry outside their birth clan. In addition to lineage-based divisions, the Turkana differ from the other three groups in that they also have territory-based divisions that cross-cut clan-level organization. There are no marriage restrictions at the territorial level. This territorial division provides an opportunity to investigate whether territory-based division affects substructure in the Turkana. Detailed descriptions of the social organization of these populations can be found in Handley & Mathew (2020).

Each of these groups has a patrilineal descent system in which an individual’s natal clan affiliation typically follows that of their father. There are, however, situations in which children do not take on the clan identity of their biological father. When the biological father is not officially married to the mother, i.e., has yet to pay a bride price to her family at the time of birth, the child remains affiliated with the natal clan of the mother, which is the clan of the child's maternal grandfather. Households with few children or with relatively greater material security may adopt a child, in which case the child takes the clan identity of the adoptive father. It is therefore unclear how cultural systems of patrilineal descent shape patterns of male-specific genetic variation within these groups.

In a previous study, two members of this research team sampled 750 individuals from nine clans across these four ethnic groups, as well as from three territorial sections in the Turkana, to obtain data on cultural beliefs and norms and to quantify levels of cultural differentiation (cultural FST) among these groups (Handley & Mathew, 2020). Cultural FST is the proportion of the total variation in cultural traits that lies between populations (Bell, Richerson, & McElreath, 2009; Handley & Mathew, 2020) and provides a quantitative measure of cultural similarity between groups. For the current study, we sampled individuals from the same nine clans and three Turkana territorial sections, which allows us to examine the relationship between genetic and cultural differentiation.
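To make the definition of cultural FST concrete, the sketch below computes it for a single yes/no cultural trait: the variance of the group-level answer frequencies around the pooled frequency, divided by the total variance of the trait in the pooled sample. It is an illustrative reading of the definition above, not the procedure of Handley & Mathew (2020); the function name, the restriction to one binary trait, and weighting groups by their sample sizes are assumptions.

```python
def cultural_fst(group_freqs, group_sizes):
    """Between-group share of total variance for ONE binary cultural trait.

    Hypothetical helper. group_freqs: fraction of respondents in each group
    answering "yes"; group_sizes: respondents per group (used as weights).
    """
    total = sum(group_sizes)
    weights = [n / total for n in group_sizes]
    p_bar = sum(w * p for w, p in zip(weights, group_freqs))
    total_var = p_bar * (1 - p_bar)  # variance of a 0/1 trait in the pooled sample
    if total_var == 0:
        return 0.0  # trait fixed everywhere: no variation to partition
    # variance of group frequencies around the pooled frequency
    between_var = sum(w * (p - p_bar) ** 2 for w, p in zip(weights, group_freqs))
    return between_var / total_var
```

For example, two equally sized groups that answer a question in completely opposite ways give a value of 1, while groups with identical answer frequencies give 0.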

To better understand the population structure among the Turkana, Samburu, Rendille, and Waso Borana ethnolinguistic groups, and how geography and culture shape genetic variation in northern Kenya, we worked with these local groups to obtain genetic samples from 572 individuals across all four populations (Table 1). We successfully genotyped 376 of the 572 individuals on Illumina’s Multi-Ethnic Global Array (Table S1). For all samples, we also collected culturally relevant demographic information, including natal and post-marital affiliations and spoken languages for participants, their parents, and their grandparents. For married men, we additionally collected demographic information for their spouse(s), such as each spouse's ethnic group and natal clan affiliation. We report here that geography predominantly shapes genetic variation within and among human groups in northern Kenya. Specifically, we found a clinal pattern of genetic variation that mirrors the overall geographic distribution of the individuals we sampled. We found evidence of gene flow, and relatively higher rates of intermarriage, between the Samburu and Rendille than between any other pair of groups in our sample. We further observed strong recent genetic substructuring among the Turkana based on territorial section affiliation, which did not affect the comparisons between ethnolinguistic groups. Within ethnolinguistic groups, we found that male Y chromosome haplotypes do not consistently cluster by natal clan affiliation. Finally, we found that ethnolinguistic groups that are geographically closer have lower genetic differentiation, and that cultural similarity (estimated via cultural FST) does not predict overall genetic similarity across these four northern Kenyan populations. Overall, despite cultural and linguistic differences, our analysis suggests that geography is the main driver of genetic variation, even on a very local geographic scale.

Table 1. Background information on study populations and collected samples.

We sampled a total of 572 individuals in northern Kenya. The table gives general background information on the four study populations, including language, social organization, and population size in Kenya. It was adapted from Handley and Mathew (2020) but altered to reflect the sample sizes collected and genotyped in the current study. (a) Language information is based on assignments from the World Atlas of Language Structures (WALS) Online. (b) Population sizes were obtained from the 2019 census report of the Kenya National Bureau of Statistics. (c) The Borana extend into Ethiopia, so their total population exceeds the number living in Kenya. (d) Sample numbers are based on natal affiliations. Because we sampled opportunistically in these regions, we also sampled individuals beyond the targeted clans and territories; these samples are marked as “Other.” One individual was from the Gabbra ethnic group and is not included in this table.

| Ethnolinguistic group | Language genus (a) | Spoken language | Social organization | Approx. population in Kenya (b) | Clans sampled (sample size, number genotyped) (d) |
|---|---|---|---|---|---|
| Borana | Lowland East Cushitic | Southern Oromo | 2 exogamous moieties; 17 clans | 276,236 (c) | Noonituu (40, 32); Warrajidaa (40, 35); Other (30, 26) |
| Rendille | Lowland East Cushitic | Kirendille | 2 phratries; 9 exogamous clans | 96,313 | Ldupsai (43, 17); Saale (45, 27); Other (22, 14) |
| Samburu | Nilotic | Northern Maa | 2 phratries; 8 exogamous clans | 333,471 | Lpisikishu (40, 21); Lukumai (42, 19); Other (27, 13) |
| Turkana | Nilotic | Kiturkana | 18 territorial sections (TS); 24 exogamous clans cross-cutting the TS | 1,016,174 | Kwatela TS: Ngisiger (24, 18), Ngipongaa (21, 14), Ngidoca (23, 21); Ngiyapakuno TS: Ngisiger (24, 17), Ngipongaa (26, 18), Ngidoca (22, 17); Ngibochoros TS: Ngisiger (21, 16), Ngipongaa (22, 14), Ngidoca (21, 15); Other (38, 23) |

Materials and Methods

Community engagement and ethics

Both SM and CH have worked in northern Kenya for over a decade and have established and maintained strong relationships with the local communities. Research with the Turkana, Borana, Rendille, and Samburu has expanded to include genetic analyses, and great care has been taken to ensure ethical informed consent, data collection, outreach communication, and data sharing.

After obtaining the appropriate research permits through Kenya’s National Commission for Science, Technology and Innovation (NACOSTI), but before commencing any data collection, the field teams spent a considerable period of time with each local community and its leaders to inform individuals about the purpose and process of collecting genetic samples for this study. Other than SM and CH, field teams were composed solely of individuals from the participant communities, many of whom had been working with SM and CH for several years. Research assistants (RAs) and guides worked within their own ethnic groups, so there was one team per group, and all information was presented to participants in local languages. A key goal of any informed consent process is that potential subjects demonstrate sufficient comprehension of the methods and underlying scientific principles on which to base their decision to participate. With adult literacy rates in northeastern Kenya estimated to be below 10% (Kebathi, 2008), explaining fundamental concepts regarding DNA, genetic sample collection, and data sharing was of paramount importance. For each area surveyed, teams met with the responsible county/deputy commissioners, local/assistant chiefs, and/or councils of community elders to explain the purpose of the study and to obtain permission from the appropriate bodies. As data collection began, RAs explained the study, its purpose, methodology, and underlying scientific principles to each eligible participant and provided ample time for participants to ask questions and raise any concerns. At this stage, an estimated 20% of eligible participants dropped out. For those remaining, we moved to the formal consent process, in which the objectives, methods, and benefits of the study were repeated before a subject was asked to sign or mark the consent form. At this stage, an additional estimated 15% of eligible participants dropped out. Those who agreed to take part in the study were given contact information for both local and foreign research team members in case they chose to be removed from the study at any future point. Furthermore, despite the common practice in many locations of husbands granting permission for themselves and their wives, permission was obtained explicitly and directly from all female participants. However, the research teams made efforts to avoid households where directly soliciting female participation could transgress cultural norms and inadvertently introduce additional domestic concerns.

After consent was obtained, research assistants demonstrated the cheek swab collection procedure, using a clean swab to scrub the inside of their own cheeks. Our initial intention was for RAs to swab the participants’ mouths; however, we found that participants felt more comfortable controlling the process and swabbing their own mouths. This required oversight from the RAs to ensure that the swab was oriented in the appropriate direction, that it did not come into contact with foreign bodies, and that enough pressure and effort were applied during collection. We also asked participants to rinse their mouths with water if they had recently been chewing tobacco or other organic products. Satisfactory swabs were handed to the RAs, who sealed them in collection tubes and returned them to CH for cool bag storage. After sample collection, participants were asked to complete a 10- to 15-minute survey, developed on the ONA online platform for survey creation and administered in the field through Open Data Kit (ODK) on handheld Samsung tablets. The survey requested permission to record participants' GPS locations and asked about the biological and cultural kinship lineages of participants, extending back to both maternal and paternal grandparents, as well as the languages spoken within each family household.

In 2018, AMTO traveled to Turkana to present outreach materials and discuss initial findings. We worked with the VizLab graphic design studio at ASU to create a series of images explaining what DNA is, how DNA is obtained from a cell, what can be learned about people and human history from DNA, and preliminary results from this genetic study (Figure S2). Along with the images, we created a script explaining, in layman's terms, what the images mean, as well as questions to ask at the end of the presentation to check that participants were following and understanding what we were presenting. We worked with the local field assistants to translate the script into the local language and to present it to the community. We gave presentations in a total of six settlement areas across three territorial sections in Turkana County, Kenya. Overall, the presentations were well received by the communities, and people expressed interest in the results; some also expressed excitement about what else we might learn from their DNA. Additional dissemination by our group will take place as it becomes safer to travel.

The Turkana, Rendille, Samburu, and Borana are small-scale pastoral populations in northern Kenya. To protect these groups, we are therefore making the genetic data generated here available under controlled access, with appropriate standards for data access. The genomic data will be available through dbGaP.

Sampling and sequencing

We collected DNA and demographic information from a total of 572 individuals from the Turkana, Samburu, Rendille, and Waso Borana and successfully genotyped 376 of these individuals on Illumina’s Multi-Ethnic Global Array (Table 1; Table S1). For each ethnolinguistic group, we sampled individuals from at least two clans. Data collection took place across northern Kenya from October 2016 to October 2017. For each participant, a DNA sample was taken as either saliva or a cheek swab; saliva was collected with an Oragene OG-500 DNA collection kit. In addition, a questionnaire was administered to each participant to obtain demographic information, including, for example, natal and post-marital clan affiliation and spoken languages. DNA from cheek swabs was extracted using a phenol-chloroform method, while DNA from the Oragene OG-500 kits was extracted at the Yale Center for Genomic Analysis. The extracted DNA was quantified on both a Qubit and a Nanodrop. Each sample was then diluted to a concentration of at least 35 ng/µl in a volume of 40 µl and sent to the Langebio-Cinvestav sequencing facility in Mexico for SNP genotyping on Illumina’s Multi-Ethnic Global Array.
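The dilution step is a standard C1·V1 = C2·V2 calculation. A minimal sketch, assuming the stock concentration is the value measured on the Qubit or Nanodrop and using the 35 ng/µl target and 40 µl final volume stated above; the function name is hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=35.0, final_ul=40.0):
    """Hypothetical helper: (stock_ul, diluent_ul) from C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        # a stock below the target cannot be diluted up to it
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul  # V1 = C2*V2 / C1
    return stock_ul, final_ul - stock_ul
```

For example, a stock measured at 70 ng/µl needs 20 µl of DNA made up with 20 µl of diluent.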

Quality control and filtering

We received the SNP genotype data as a raw PLINK file, with coordinates mapped to the human reference genome hg19. Initially, a total of 1,779,819 markers had been genotyped on the array. We removed sites with no valid mapping for the probe or with more than one best-scoring mapping, as well as any sites marked as insertions or deletions. The file contained 27,089 duplicated variants, i.e., variants with the same chromosome and position, whose allele codes may be the same or different. Duplicated variants with the same chromosome, position, and allele codes were merged, while duplicated variants with the same chromosome and position but different allele codes were removed. We merged the duplicated sites using the ‘--merge-equal-pos’ flag in PLINK v1.9 (Chang et al. 2015) with the default merge mode, which ignores missing calls and sets mismatching genotypes to missing. A total of 1,715,718 sites remained after filtering.
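The duplicate-handling rule can be sketched as follows. This illustrates the logic described above and is not PLINK's implementation; the function name, the record layout (allele codes as an ordered tuple, genotypes as per-individual allele dosages with None for missing), and comparing allele codes as ordered tuples are assumptions.

```python
from collections import defaultdict


def merge_duplicate_variants(variants):
    """Hypothetical helper mirroring the duplicate rule described above.

    variants: list of (chrom, pos, allele_codes, genotypes), where genotypes
    holds one call per individual and None marks a missing call.
    """
    by_position = defaultdict(list)
    for chrom, pos, alleles, genotypes in variants:
        by_position[(chrom, pos)].append((alleles, genotypes))

    kept = []
    for (chrom, pos), records in by_position.items():
        if len({alleles for alleles, _ in records}) > 1:
            continue  # same position, different allele codes: drop every copy
        alleles = records[0][0]
        merged = []
        for calls in zip(*(genotypes for _, genotypes in records)):
            observed = {c for c in calls if c is not None}  # ignore missing calls
            # one agreeing call is kept; no call or conflicting calls -> missing
            merged.append(observed.pop() if len(observed) == 1 else None)
        kept.append((chrom, pos, alleles, merged))
    return kept
```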

As an additional quality control measure, we inferred the sex chromosome complement of each individual and compared it with reported sex. We used two approaches: one based on the X chromosome inbreeding coefficient (F) and the other based on the number of non-missing Y chromosome genotype calls. Because genetic males are expected to have a single X chromosome, they should have no heterozygous sites on the X chromosome outside the pseudoautosomal regions (PARs), and therefore an X chromosome inbreeding coefficient of 1. We used the “--check-sex ycount” flag in PLINK v1.9 (Chang et al. 2015) to calculate the X chromosome inbreeding coefficient and the number of Y chromosome genotype calls. The PARs were excluded from this calculation.
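The X chromosome inbreeding coefficient compares observed homozygosity with the homozygosity expected under Hardy-Weinberg proportions. A minimal sketch of this method-of-moments estimator, F = (O − E)/(N − E), where O is the number of observed homozygous sites, E the expected number, and N the number of non-missing sites. The function name and the 0/1/2 genotype coding are assumptions, and details of PLINK's own calculation (for example, how allele frequencies are estimated) may differ.

```python
def x_inbreeding_coefficient(genotypes, allele_freqs):
    """Hypothetical helper: method-of-moments F from non-PAR X calls.

    genotypes: 0/1/2 allele dosages per site (None = missing);
    allele_freqs: population frequency of the counted allele per site.
    """
    observed_hom = expected_hom = 0.0
    n_sites = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue  # missing calls do not contribute
        n_sites += 1
        observed_hom += g != 1                # dosage 0 or 2 means homozygous
        expected_hom += 1 - 2 * p * (1 - p)   # Hardy-Weinberg homozygosity
    if n_sites == expected_hom:
        raise ValueError("no informative sites")
    return (observed_hom - expected_hom) / (n_sites - expected_hom)
```

A male with no heterozygous X calls scores 1, while an individual heterozygous at every site scores below 0.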

Following the PLINK documentation, individuals with an X chromosome inbreeding coefficient greater than 0.8 were considered male, and individuals with a coefficient less than 0.2 were considered female. Genetic females, being XX, are expected to have no genotype calls on the Y chromosome; however, all the female individuals in our data set had some Y chromosome genotype calls. We therefore plotted the X chromosome inbreeding coefficient against the number of non-missing Y chromosome genotype calls to examine the distribution of these values in males and females and to identify any individuals that did not cluster with the expected male and female values. We removed individuals with discrepancies between these two metrics (Figure S3). A total of 10 individuals showed such discrepancies and were removed from subsequent analyses. Apart from these 10 individuals, there were no mismatches between self-reported sex and genetic sex.
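The decision rule combining the two metrics can be sketched as below. The 0.8 and 0.2 cut-offs on F are those stated above; the Y call-count threshold is a hypothetical, data-set-specific parameter, because the text describes visual inspection of the plotted distributions rather than a fixed numeric cut-off for Y calls. The function name is also hypothetical.

```python
def infer_sex(x_f, y_nonmissing, y_count_threshold):
    """Hypothetical helper combining X-chromosome F with Y call counts.

    y_count_threshold is an assumed cut-off read off the plotted distribution
    of non-missing Y calls; returns None when the two metrics disagree.
    """
    if x_f > 0.8 and y_nonmissing >= y_count_threshold:
        return "male"
    if x_f < 0.2 and y_nonmissing < y_count_threshold:
        return "female"
    return None  # ambiguous F or conflicting metrics: flag for removal
```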

Identity by descent (IBD) was calculated across the autosomes to identify and remove related individuals. Before running the IBD analysis, we removed sites with more than 5% missing data across samples (--geno 0.05 flag in PLINK) and sites with a Hardy-Weinberg equilibrium p-value below 1 × 10⁻⁵⁰, and pruned sites for linkage disequilibrium (50 kb window size, 10 kb variant step size, 0.2 r² threshold). Filtering and IBD calculation were performed with PLINK v1.9 (Chang et al. 2015) for all pairwise combinations of samples in our data set, and we output pairs of individuals with more than 18% IBD. We removed two samples with 100% IBD that were not replicate samples (the same sample genotyped twice), as these may reflect contamination. The data set contained two replicate pairs, and we removed one sample from each. Where there were clusters of related individuals, we removed the individuals related to many others in order to minimize the total number of individuals removed. Where only two individuals were related, we attempted to remove roughly equal numbers of males and females when possible. A total of 67 samples were removed, leaving 301 individuals.
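Removing the most-connected individuals first can be sketched as a greedy pass over the list of related pairs. This is a common heuristic that does not guarantee the smallest possible set of removals; the function name is hypothetical, and ties are broken alphabetically here rather than by balancing males and females as described above.

```python
from collections import Counter


def prune_related(related_pairs):
    """Hypothetical helper: greedily drop the sample in the most related pairs."""
    remaining = {frozenset(pair) for pair in related_pairs}
    removed = []
    while remaining:
        # count how many unresolved related pairs each sample belongs to
        degree = Counter(sample for pair in remaining for sample in pair)
        # most connected first; alphabetical tie-break keeps runs reproducible
        worst = min(degree, key=lambda s: (-degree[s], s))
        removed.append(worst)
        remaining = {pair for pair in remaining if worst not in pair}
    return removed
```

In a cluster where one person is related to three others, only that person is removed, rather than the three relatives.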

We next performed an initial principal components analysis (PCA) on the 301 samples using smartpca, a program in the EIGENSOFT v6.0.1 software package (Price et al. 2006). We identified four outlier samples, which were removed from subsequent analyses (Figure S4).

After filtering individuals, we filtered sites on the autosomes, Y chromosome, and mitochondrial DNA. For the autosomes, we removed sites with more than 5% missing data across individuals (“--geno 0.05” flag in PLINK, i.e., a 95% call rate filter), removed sites deviating from Hardy-Weinberg equilibrium (p-value threshold of 1 × 10⁻⁵⁰), and performed linkage disequilibrium pruning (50 kb window size, 10 kb variant step size, and 0.2 r² threshold). For the Y chromosome and mitochondrial DNA, we removed sites with heterozygous calls, sites with more than 5% missing data across individuals, and sites deviating from Hardy-Weinberg equilibrium (p-value threshold of 1 × 10⁻⁵⁰). After filtering, 516,821, 3,295, and 811 sites remained on the autosomes, Y chromosome, and mitochondrial DNA, respectively.
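The haploid-marker filters applied to the Y chromosome and mitochondrial DNA amount to two per-site checks. A minimal sketch, assuming calls are stored as strings per individual (None for missing) and that a call containing two different bases is a heterozygous call; the function name is hypothetical, and the Hardy-Weinberg filter is omitted.

```python
def filter_haploid_sites(sites, max_missing=0.05):
    """Hypothetical helper for Y-chromosome / mtDNA site filtering.

    sites: dict site_id -> list of calls, e.g. "A", "AA" (homozygous),
    "AG" (heterozygous, an error on haploid DNA) or None (missing).
    """
    kept = []
    for site_id, calls in sites.items():
        missing = sum(c is None for c in calls) / len(calls)
        has_het = any(c is not None and len(set(c)) > 1 for c in calls)
        # drop sites with >5% missing calls or any heterozygous call
        if missing <= max_missing and not has_het:
            kept.append(site_id)
    return kept
```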

Merging with publicly available data

The cleaned autosomal genotype data from northern Kenya were merged with variants from the 1000 Genomes Resource (1000 Genomes Project Consortium et al. 2015). The 1000 Genomes variant call file was converted to PLINK format using PLINK v1.9 (Chang et al. 2015). Before merging, sites on the reverse strand in the Kenya genotype data were identified using snpflip (https://github.com/biocore-ntnu/snpflip) and flipped using PLINK v1.9 (Chang et al. 2015). Snpflip uses a reference genome FASTA sequence and a PLINK .bim file to identify SNPs on the reverse strand; we used the GRCh37 GENCODE reference genome (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz) to flip SNPs in the Kenya genotype file. We then retained sites with a minor allele frequency above 10%, removed sites with more than 5% missing data across individuals, and performed linkage disequilibrium pruning (50 kb window size, 10 kb variant step size, and 0.2 r² threshold) using PLINK v1.9 (Chang et al. 2015). After merging and filtering, 103,823 sites remained.
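The strand check can be illustrated by comparing each SNP's alleles with the reference base at its position. This is a simplified sketch, not snpflip's code: the function name is hypothetical, the reference base is assumed to have already been read from the GRCh37 FASTA, and SNPs whose alleles are each other's complements (A/T or C/G) are flagged as ambiguous because their strand cannot be resolved from the alleles alone.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_strand(ref_base, allele1, allele2):
    """Hypothetical helper: compare a biallelic SNP with the reference base."""
    alleles = {allele1, allele2}
    complemented = {a.translate(COMPLEMENT) for a in alleles}
    if alleles == complemented:
        return "ambiguous"  # A/T or C/G: strand cannot be read from alleles
    if ref_base in alleles:
        return "forward"
    if ref_base in complemented:
        return "reverse"    # candidates for flipping, e.g. with PLINK's --flip
    return "no_match"       # e.g. an N in the reference sequence
```

SNPs classified as "reverse" would then be listed in a file passed to PLINK's `--flip`, and ambiguous ones removed.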

Population genetic analyses

To explore genetic structure within and among northern Kenyan populations, we ran PCA using smartpca (Price et al. 2006) and ADMIXTURE (Alexander, Novembre, & Lange, 2009) on the autosomes for all unrelated samples. We ran ADMIXTURE for K = 2–5, with 10 replicates for each value of K. To streamline post-processing and visualization of the replicate runs and K values, we used pong (Behr, Liu, Liu-Fang, Nakka, & Ramachandran, 2016), an algorithm for processing and visualizing membership coefficient matrices. Pong finds the best alignment of clusters across all runs within and across K values and identifies modes among the runs for each K; we used these best alignments for the visualizations in this manuscript.
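Because ADMIXTURE assigns arbitrary labels to its K clusters, replicate runs must be aligned before they can be compared. The sketch below aligns one run's membership (Q) matrix to another by brute force over all K! column orderings, minimizing the total absolute difference. It is a simplified stand-in for pong, which uses its own similarity measure and matching procedure; the function name is hypothetical. Brute force is practical here because K ≤ 5 gives at most 120 orderings.

```python
from itertools import permutations


def align_q_matrix(reference, other):
    """Hypothetical helper: reorder columns of `other` to best match `reference`.

    Both are lists of per-individual membership rows with the same K.
    """
    k = len(reference[0])

    def distance(order):
        # total absolute difference if column order[j] of `other` is cluster j
        return sum(abs(r[j] - o[order[j]])
                   for r, o in zip(reference, other) for j in range(k))

    best = min(permutations(range(k)), key=distance)
    return [[row[best[j]] for j in range(k)] for row in other]
```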

To quantify genetic differentiation within and among northern Kenyan populations, we calculated Hudson’s FST (Hudson, Slatkin, & Maddison, 1992) using the estimator derived by Bhatia, Patterson, Sankararaman, & Price (2013). FST was calculated on the autosomes between ethnolinguistic groups and between Turkana territorial sections. Results were visualized in R (R. C. Team & Others, 2013; R. Team & Others, 2015) using the visualization package ggplot2 (Wickham, 2011).
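In the form given by Bhatia et al. (2013), each SNP contributes a numerator N = (p1 − p2)² − p1(1 − p1)/(n1 − 1) − p2(1 − p2)/(n2 − 1) and a denominator D = p1(1 − p2) + p2(1 − p1), and FST is combined across SNPs as a ratio of averages, ΣN / ΣD. A minimal sketch, assuming allele frequencies are already computed per SNP and that n1 and n2 are the numbers of sampled chromosomes, held constant across SNPs (no missing data); the function name is hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hypothetical helper: ratio-of-averages Hudson FST across SNPs.

    p1, p2: per-SNP allele frequencies in the two populations;
    n1, n2: sampled chromosomes (2 x individuals for diploid autosomes).
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # squared frequency difference minus small-sample correction terms
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps rare variants from dominating the estimate, which is the reason Bhatia et al. recommend the ratio of averages.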

To test whether individuals from Ngibochoros, the Turkana territorial section geographically closest to the Borana, Samburu, and Rendille among those we sampled, may have admixed with these neighboring groups, we performed a series of permutation tests. The goal was to test whether genetic FST between Ngibochoros and a given ethnolinguistic group differed from FST between Kwatela and/or Ngiyapakuno (the two other Turkana territorial sections we sampled) and the same ethnolinguistic group. For each comparison, we randomly shuffled samples between two territorial sections, calculated FST between each territorial section and the ethnolinguistic group, and computed the test statistic as the absolute difference between the two FST values. We repeated this 1,000 times to build a null distribution, calculated the p-value, and did this for all combinations. To complement the FST permutation analysis, we also calculated f3 using the AdmixTools 3-population test (qp3Pop) (Patterson et al. 2012) to test for admixture between Ngibochoros and the other neighboring ethnolinguistic groups we sampled. We used Turkana individuals not from Ngibochoros as one source population, each of the other ethnolinguistic groups in turn as the second source population, and Turkana individuals from Ngibochoros as the target population.
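The permutation test can be sketched as follows. The structure (pool two territorial sections, reshuffle the section labels, recompute |FST(section 1, group) − FST(section 2, group)|) follows the description above; the 0/1/2 dosage coding, the absence of missing data, the fixed random seed, the (hits + 1)/(n_perm + 1) p-value convention, and all function names are assumptions.

```python
import random


def allele_freqs(genotypes):
    """Per-SNP frequency of the counted allele from 0/1/2 dosages (no missing data)."""
    n_snps = len(genotypes[0])
    return [sum(g[s] for g in genotypes) / (2 * len(genotypes)) for s in range(n_snps)]


def hudson_fst_from_genotypes(pop_a, pop_b):
    """Ratio-of-averages Hudson FST (Bhatia et al. 2013 form) between two samples."""
    p1, p2 = allele_freqs(pop_a), allele_freqs(pop_b)
    n1, n2 = 2 * len(pop_a), 2 * len(pop_b)  # sampled chromosomes
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


def fst_difference_test(section_1, section_2, group, n_perm=1000, seed=1):
    """Hypothetical helper: permutation test of |FST(s1, group) - FST(s2, group)|."""
    observed = abs(hudson_fst_from_genotypes(section_1, group)
                   - hudson_fst_from_genotypes(section_2, group))
    pooled = section_1 + section_2  # new list; the inputs are left untouched
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign individuals to the two sections at random
        perm_1, perm_2 = pooled[:len(section_1)], pooled[len(section_1):]
        stat = abs(hudson_fst_from_genotypes(perm_1, group)
                   - hudson_fst_from_genotypes(perm_2, group))
        hits += stat >= observed
    return observed, (hits + 1) / (n_perm + 1)
```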

To investigate genetic similarity among ethnolinguistic groups, we used the AdmixTools 3-population test (qp3Pop) (Patterson et al. 2012) to calculate outgroup f3 for each pair of ethnolinguistic groups. Yoruba (YRI) individuals from the 1000 Genomes Resource (1000 Genomes Project Consortium et al. 2015) were used as the outgroup population. We calculated the following statistics: f3(YRI; Turkana, Rendille), f3(YRI; Turkana, Borana), f3(YRI; Turkana, Samburu), f3(YRI; Borana, Samburu), f3(YRI; Borana, Rendille), and f3(YRI; Samburu, Rendille).
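Both uses of qp3Pop described above rest on the same statistic, f3(C; A, B), the average over SNPs of (c − a)(c − b), where a, b, and c are allele frequencies. A significantly negative value with the candidate admixed population in position C (here Ngibochoros) suggests admixture, while with an outgroup in position C (here YRI) larger values indicate more shared genetic drift between A and B. The sketch below computes the point estimate with the small-sample correction c(1 − c)/(n_C − 1) for the population in position C; it omits the block-jackknife standard errors that qp3Pop reports, and the function name is hypothetical.

```python
def f3(target, source_a, source_b, n_target):
    """Hypothetical helper: mean f3(target; source_a, source_b) over SNPs.

    Inputs are per-SNP allele frequencies; n_target is the number of sampled
    chromosomes in the target, used to remove the bias from estimating c.
    """
    total = 0.0
    for c, a, b in zip(target, source_a, source_b):
        total += (c - a) * (c - b) - c * (1 - c) / (n_target - 1)
    return total / len(target)
```

For an admixture test, the target frequencies would come from Ngibochoros and the sources from the other Turkana sections and one neighboring group; for outgroup f3, the YRI frequencies go in the first argument.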

To visualize relationships among haplotypes within and among ethnolinguistic groups, we assigned haplogroups and generated haplotype networks for the Y chromosome and mitochondrial DNA. SNPs in our data set were first set to the forward orientation: sites on the reverse strand were identified using snpflip (https://github.com/biocore-ntnu/snpflip) and flipped using PLINK v1.9 (Chang et al. 2015). Snpflip uses a reference genome FASTA sequence and a PLINK .bim file to identify SNPs on the reverse strand; we used the GRCh37 GENCODE reference genome (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/GRCh37_mapping/GRCh37.primary_assembly.genome.fa.gz). Sites that could not be flipped because of ambiguity about whether they were on the reverse strand were removed, leaving 2,807 sites on the Y chromosome and 806 on the mitochondrial DNA. Y chromosome haplogroups were assigned across 140 unrelated male samples using SNAPPY (Severson et al. 2018); SNAPPY was unable to assign a haplogroup to one sample, which was therefore excluded from the analysis. Mitochondrial DNA haplogroups were assigned across 297 unrelated male and female samples using Haplogrep2 (Weissensteiner et al. 2016). For haplotype network construction, PLINK files were converted to VCF format using PLINK v1.9 (Chang et al. 2015), VCF files were converted to FASTA format using Python, and haplotype networks were constructed in R (R. C. Team & Others, 2013; R. Team & Others, 2015) using the pegas (Paradis, 2010) and ape (Paradis & Schliep, 2019) packages.
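The VCF-to-FASTA conversion step could look like the sketch below, once the haploid genotype calls have been read from the VCF into one base per site per sample. This is not the authors' script; the function name, the input layout, and coding missing calls as N are assumptions. Only the genotyped sites are concatenated, so each sequence is a string of the retained SNPs rather than a full chromosome sequence.

```python
def haplotypes_to_fasta(calls_by_sample, line_width=60):
    """Hypothetical helper: dict sample_id -> list of single-base calls (None = missing)."""
    lines = []
    for sample_id, calls in calls_by_sample.items():
        seq = "".join(base if base is not None else "N" for base in calls)
        lines.append(f">{sample_id}")
        # wrap the sequence at line_width characters per line
        lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    return "\n".join(lines) + "\n"
```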

Quantification of intermarriage

To quantify the amount of intermarriage in the Turkana, Samburu, Rendille, and Waso Borana, we used questionnaire data collected in this study and by Handley & Mathew (2020). Our questionnaire asked only married men for their spouses' ethnolinguistic group, whereas the questionnaire data from Handley & Mathew (2020) include spouse information for both the married men and the married women sampled. Because men in these groups may marry more than one wife, intermarriage rates were calculated from the total number of marriages rather than the number of individuals. For each ethnolinguistic group, we calculated the percentage of marriages within the same ethnolinguistic group and with each other ethnolinguistic group, separately for men and women.
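The marriage-based (rather than individual-based) percentages can be illustrated with a short sketch. Each record is assumed to describe one marriage as a hypothetical `(respondent_group, respondent_sex, spouse_group)` tuple, so a man with three wives contributes three records; the helper name `intermarriage_rates` and the toy records are invented for illustration.

```python
from collections import Counter, defaultdict


def intermarriage_rates(marriages):
    """Percentage of marriages by spouse's group, per (group, sex) of respondent.

    marriages: iterable of (respondent_group, respondent_sex, spouse_group),
    one record per marriage rather than per individual.
    """
    counts = defaultdict(Counter)
    for group, sex, spouse_group in marriages:
        counts[(group, sex)][spouse_group] += 1
    rates = {}
    for key, spouse_counts in counts.items():
        total = sum(spouse_counts.values())
        rates[key] = {g: 100.0 * n / total for g, n in spouse_counts.items()}
    return rates


# Toy example: one man with four wives (three Rendille, one Samburu)
# and one woman married to a Rendille man.
toy = [
    ("Rendille", "M", "Rendille"),
    ("Rendille", "M", "Rendille"),
    ("Rendille", "M", "Rendille"),
    ("Rendille", "M", "Samburu"),
    ("Rendille", "F", "Rendille"),
]
rates = intermarriage_rates(toy)
```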

Correlations between genetic and cultural differentiation

To investigate the correlation between 2D genetic space and geographic space, we performed a Procrustes analysis on the first two PCs of the autosomal PCA (Figure 1B) for each individual that also had recorded longitude and latitude values. A total of 72 samples were included in this analysis. Procrustes analysis was performed using the R package vegan (“Vegan: Community Ecology Package” n.d.) with 1,000 permutations.
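A dependency-free sketch of a Procrustes test for two-dimensional configurations (for example, PC1/PC2 coordinates and longitude/latitude) is shown below. It computes the symmetric Procrustes correlation t, the square root of one minus the symmetric Procrustes sum of squares, which we assume corresponds to the statistic vegan's `protest` reports, together with a permutation p-value of the form (hits + 1) / (permutations + 1). The helper names are hypothetical, and the closed form used for the sum of singular values holds only for 2 x 2 cross-product matrices.

```python
import math
import random


def _center_and_scale(points):
    """Center an n x 2 configuration and scale it to unit Frobenius norm."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("configuration has zero spread")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_t(x_pts, y_pts):
    """Symmetric Procrustes correlation t between two 2D configurations.

    t is the sum of singular values of M = X^T Y after centering and
    scaling; for a 2 x 2 matrix, (s1 + s2)^2 = ||M||_F^2 + 2|det M|.
    """
    if len(x_pts) != len(y_pts):
        raise ValueError("configurations must have the same number of rows")
    x, y = _center_and_scale(x_pts), _center_and_scale(y_pts)
    m11 = sum(a[0] * b[0] for a, b in zip(x, y))
    m12 = sum(a[0] * b[1] for a, b in zip(x, y))
    m21 = sum(a[1] * b[0] for a, b in zip(x, y))
    m22 = sum(a[1] * b[1] for a, b in zip(x, y))
    frob2 = m11 ** 2 + m12 ** 2 + m21 ** 2 + m22 ** 2
    det = m11 * m22 - m12 * m21
    return math.sqrt(frob2 + 2 * abs(det))


def protest_like(x_pts, y_pts, permutations=999, seed=1):
    """Observed t and a permutation p-value from shuffling the rows of Y."""
    rng = random.Random(seed)
    t_obs = procrustes_t(x_pts, y_pts)
    shuffled = list(y_pts)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if procrustes_t(x_pts, shuffled) >= t_obs:
            hits += 1
    return t_obs, (hits + 1) / (permutations + 1)
```

Because the optimal orthogonal transformation may include a reflection, a mirror image of a configuration also yields t = 1; with 999 permutations, the smallest attainable p-value is 0.001.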

Figure 1. Intermarriage among ethnolinguistic groups contributes to the clinal pattern of genetic variation.


Sampling regions, patterns of genetic variation, and rates of intermarriage across northern Kenya human populations. A) We sampled 376 individuals across four ethnolinguistic groups in northern Kenya and, for the Turkana only, additionally sampled across three territorial sections. B) Autosomal principal components analysis (PCA). C) Rate of intermarriage across each ethnolinguistic group. In A and B, colors represent ethnolinguistic group affiliation and shapes represent Turkana territorial section affiliation. Each point in A represents the geographic location of a sampled ethnolinguistic group or Turkana territorial section, while each point in B represents an individual.

To test whether measures of cultural similarity can predict genetic similarity, we performed a series of Pearson’s correlations between genetic FST, cultural FST, geographic distance, and linguistic distance. We used cultural FST values and linguistic distances that were previously calculated among these ethnolinguistic groups (Handley & Mathew, 2020). Briefly, cultural FST is a measure of cultural similarity between two groups: a low cultural FST indicates that two groups are more culturally similar, while a higher cultural FST indicates that they are less culturally similar (Bell et al., 2009). Pairwise cultural FST was calculated among the four ethnolinguistic groups using a total of 49 norm statements relevant to cooperation, crime and punishment, raiding, family dynamics, and other cultural markers. Pairwise cultural FST was calculated for each norm separately and then averaged across all norms. For each group, language, genus, and family information were acquired from The World Atlas of Language Structures (WALS) database (“WALS Online - Home” n.d.). Using this information, linguistic distances were scored as follows: 0 for groups that speak the same language (same language, same genus, same family); 1 for groups that speak different languages from the same genus (different language, same genus, same family); 2 for groups that speak different languages from different genera within the same family (different language, different genus, same family); and 3 for groups that speak languages from different language families (different language, different genus, different family). Further details on both the cultural FST and linguistic distance calculations can be found in Handley & Mathew (2020). To calculate geographic distance between groups, we collected GPS coordinates for the locations in which genetic sampling occurred. If a household fell within the GPS precision (20 meters) of one or more other households, only one GPS measurement was recorded. Using the latitude and longitude of each measured household in each population, we calculated the average distance in kilometers (km) between each pair of populations: we computed the distance between every household in one population and every household in the other and then averaged these distances. This was computed for all pairs of populations using a custom Python script. Pearson correlations were performed in R (R. C. Team & Others, 2013; R. Team & Others, 2015) using the package ppcor (Kim, 2015) and visualized using ggplot2 (Wickham, 2011). We additionally performed partial Mantel tests, using the R package vegan (“Vegan: Community Ecology Package” n.d.), to test whether genetic variation can be explained by culture or language after controlling for the effects of geography.
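The geographic and linguistic distance calculations described above can be sketched as follows. The original custom Python script is not reproduced here; this version assumes great-circle (haversine) distances with a mean Earth radius of 6,371 km, (latitude, longitude) pairs in decimal degrees, and hypothetical helper names. The linguistic score follows the 0 to 3 scheme above, with each language described by a hypothetical (language, genus, family) tuple.

```python
import math
from itertools import product

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_between_population_km(households_a, households_b):
    """Average distance over all household pairs, one from each population."""
    pairs = list(product(households_a, households_b))
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)


def linguistic_distance(lang_a, lang_b):
    """Score 0-3 from (language, genus, family) tuples."""
    language_a, genus_a, family_a = lang_a
    language_b, genus_b, family_b = lang_b
    if family_a != family_b:
        return 3  # different language, genus and family
    if genus_a != genus_b:
        return 2  # same family, different genus
    if language_a != language_b:
        return 1  # same genus, different language
    return 0      # same language
```

Averaging over all household pairs, rather than measuring between population centroids, keeps the spread of each sampling area in the distance estimate.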

Results

Intermarriage among ethnolinguistic groups contributes to the clinal pattern of genetic variation

We found a clinal pattern of genetic variation that mirrors the overall geographic distribution of the individuals we sampled (Figure 1A, 1B). In the principal components analysis (PCA), the Turkana samples separate from the other three groups along PC1, and along PC2 the Borana samples separate from the Rendille and Samburu (Figure 1B). Additionally, for individuals in the PCA with corresponding latitude and longitude values, we find a significant correlation between 2D genetic and geographic space (Procrustes analysis, t = 0.678, p = 0.001).

While the Borana samples form a discrete cluster from the other ethnolinguistic groups, there is overlap between Samburu and Rendille and some overlap between Samburu and Turkana (Figure 1B). Interestingly, many of the overlapping Samburu and Rendille individuals report cross-group family history: a parent and/or grandparent(s) who were Rendille (for Samburu individuals) or Samburu (for Rendille individuals) (Table S2). In contrast, neither the Samburu individual who falls near the Turkana nor the Turkana individual who falls near the Samburu reports any cross-group family history through the grandparent level (Table S2). We additionally found high rates of intermarriage between some ethnolinguistic groups and nearly non-existent intermarriage between others (Figure 1C). For Rendille, 5% of female marriages and 16.4% of male marriages were with a Samburu individual (Figure 1C). For Samburu, we observe almost the exact opposite pattern, with 11.4% of female marriages and 4.8% of male marriages with a Rendille individual (Figure 1C). The Samburu also have low levels of intermarriage with the Turkana; 3.2% of male Samburu marriages were with a Turkana individual (Figure 1C). For the Borana, none of our sampled individuals reported marriages with the Turkana, Samburu, or Rendille, but they did report varying levels of intermarriage with the Sakuye, Gabra, Garri, and Somali (Figure 1C).

The Turkana have additional variation and geography-based substructuring

Consistent with the geographic separation of these ethnolinguistic groups and with the PCA (Figure 1B), we observed clear genetic separation among them in the ADMIXTURE analyses (Figure 2A). At K = 5, each ethnolinguistic group has its own unique ancestry component (Figure 2A). Interestingly, at K = 4 we observe possible admixture between Samburu and Rendille (Figure 2A).

Figure 2. The Turkana have additional variation and geography-based substructuring.


A) ADMIXTURE analysis for 10 replicates of K = 2 to 5 for the autosomes. Each vertical bar represents an individual, and the colors represent the proportion of ancestry from each of the K ancestry components. Samples are organized by ethnolinguistic group (separated by thick black vertical bars), then by Turkana territorial section (separated by medium black vertical bars), and lastly by natal clan affiliation (separated by thin black vertical bars). We observe no substructure based on natal clan affiliation but do observe geographic substructuring in the Turkana based on territorial section (purple and blue clusters at K = 4 and 5). B) Autosomal genetic differentiation (FST) among Turkana territorial sections. Individuals from the Ngibochoros territorial section are more genetically differentiated than individuals from the other sampled territorial sections. C) Autosomal FST between each Turkana territorial section and each of the other sampled ethnolinguistic groups. A series of pairwise permutation tests found no statistically significant differences among the Turkana territorial sections in their genetic differentiation from each ethnolinguistic group. P-values from the permutation tests are annotated on the plot.

In the Turkana, we additionally found geography-based genetic substructuring by territorial section (Figure 2A; Figure S5). In the ADMIXTURE analyses, substructure within the Turkana appears before all four ethnolinguistic groups are separated from one another. For example, at K = 4, the Turkana are characterized by two different ancestries, one of which is unique to individuals from the Ngibochoros territorial section (Figure 2A). Consistent with this, in the PCA we observe variation within the Turkana along PC2 (Figure 1B): individuals from the Ngibochoros territorial section separate from the other Turkana territorial sections along PC2. We further calculated genetic differentiation (FST) among the three Turkana territorial sections. FST between Ngibochoros and either Kwatela or Ngiyapakuno is much higher than FST between Kwatela and Ngiyapakuno, two territorial sections that are adjacent to each other and distant from Ngibochoros (Figure 1A; Figure 2B).

The observed territorial section substructuring in the Turkana may be due to geographic separation among the territorial sections. Alternatively, individuals from Ngibochoros, the territorial section that is geographically closer to the Borana, Samburu, and Rendille than the other Turkana territorial sections, may have admixed with these neighboring groups, resulting in higher genetic differentiation. To distinguish these scenarios, we calculated genetic FST between each Turkana territorial section and each of the other three ethnolinguistic groups and performed permutation tests to assess whether these FST values differed significantly among territorial sections. If gene flow were occurring between Ngibochoros and the other three ethnolinguistic groups, we would expect FST to be lower between Ngibochoros and at least one of these groups than between either Kwatela or Ngiyapakuno and the same group. Instead, FST with each of the other ethnolinguistic groups does not differ significantly among the Turkana territorial sections (Figure 2C). Additionally, f3 statistics provide no evidence of admixture between Ngibochoros and the other ethnolinguistic groups: f3 is positive between each of the other ethnolinguistic groups and the Ngibochoros territorial section (Table S3), whereas admixture would be expected to produce a significantly negative f3.
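The territorial section comparison can be illustrated with a sketch of a label-permutation test on pairwise FST. Neither the FST estimator nor the exact permutation scheme is specified in this section, so both are assumptions here: FST is Hudson's estimator computed as a ratio of averages across SNPs (as recommended by Bhatia et al., 2013), genotypes are 0/1/2 allele counts with no missing data, and individuals are shuffled between two territorial sections while the comparison group is held fixed. All function names are hypothetical.

```python
import random


def _freqs(genotypes):
    """Per-SNP allele frequency and chromosome count from 0/1/2 genotypes."""
    n_ind = len(genotypes)
    n_snps = len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snps)]
    return freqs, 2 * n_ind


def hudson_fst(pop1, pop2):
    """Hudson FST as a ratio of averages across SNPs."""
    p1, n1 = _freqs(pop1)
    p2, n2 = _freqs(pop2)
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Numerator: squared frequency difference minus sampling-bias terms.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


def fst_difference_test(section_a, section_b, other, permutations=999, seed=1):
    """Two-sided permutation test of FST(A, other) - FST(B, other).

    Individuals are shuffled between sections A and B, keeping both sizes.
    """
    observed = hudson_fst(section_a, other) - hudson_fst(section_b, other)
    pooled = list(section_a) + list(section_b)
    k = len(section_a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        diff = hudson_fst(pooled[:k], other) - hudson_fst(pooled[k:], other)
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Under this scheme, a large p-value means the observed difference in FST to the comparison group is typical of what random reassignment of individuals between the two sections produces.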

Y chromosome haplotypes do not consistently cluster by natal clan affiliation

On the Y chromosome, we expected haplotypes in groups with patrilineal descent to be more similar among males from the same clan than among males from different clans. Instead, we found that haplotypes do not consistently cluster by natal clan affiliation. For the Turkana and Borana, no haplotype is unique to a single clan (Figure 3A, 3D). For the Samburu, most haplotypes cluster by natal clan affiliation, with the exception of one haplotype shared among individuals from both sampled clans (Figure 3B). For the Rendille, we observed one haplotype unique to individuals from the Ldupsai clan; the remaining haplotypes were shared among clans (Figure 3C). We observe similar patterns in the mitochondrial DNA haplotype networks (Figure 4). Overall, within these patrilineal descent groups, Y chromosome haplotypes generally do not cluster by natal clan affiliation.
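Whether a haplotype is unique to a clan, which the networks in Figure 3 show visually, can be checked with a simple tabulation. This sketch assumes each sample's Y chromosome SNPs have already been concatenated into a haplotype string (for example, from the FASTA conversion described in the Methods) and treats any difference, including a missing-call symbol, as a distinct haplotype; the function names and toy samples are hypothetical.

```python
from collections import Counter, defaultdict


def haplotype_clan_table(samples):
    """Group identical haplotype strings and count clans within each.

    samples: iterable of (sample_id, clan, haplotype_string).
    """
    table = defaultdict(Counter)
    for _sample_id, clan, haplotype in samples:
        table[haplotype][clan] += 1
    return dict(table)


def clan_specific_haplotypes(table):
    """Map each haplotype carried by exactly one clan to that clan."""
    return {hap: next(iter(clans)) for hap, clans in table.items() if len(clans) == 1}


toy_samples = [
    ("s1", "ClanA", "ACGT"),
    ("s2", "ClanA", "ACGT"),
    ("s3", "ClanB", "ACGT"),
    ("s4", "ClanB", "ACTT"),
    ("s5", "ClanB", "ACTT"),
]
table = haplotype_clan_table(toy_samples)
```

Singleton haplotypes are trivially clan-specific, so the number of carriers per haplotype is worth reporting alongside such a list.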

Figure 3. Y chromosome haplotypes do not consistently cluster by natal clan affiliation.


Haplotype networks constructed from Y chromosome SNP data from A) Turkana, B) Samburu, C) Rendille and D) Borana male samples. The size of each node (circle) is proportional to the number of samples in the node (larger nodes have more samples and smallest nodes have 1 sample). Colors within each node represent natal clan affiliation corresponding to the key in each panel.

Figure 4. Mitochondrial DNA haplotypes do not consistently cluster by natal clan or ethnolinguistic group affiliation.


Haplotype networks constructed from mitochondrial DNA SNP data from A) Turkana, B) Samburu, C) Rendille and D) Borana male and female samples. The size of each node (circle) is proportional to the number of samples in the node (larger nodes have more samples and smallest nodes have 1 sample). Colors within each node represent natal clan affiliation corresponding to the key in each panel. Major haplogroups are also annotated on each network in grey. E) Stacked bar plots of mitochondrial DNA haplogroups of male and female samples. Bars are colored by ethnolinguistic group affiliation.

Consistent with the haplotype network analysis, Y chromosome haplogroups do not cluster by natal clan affiliation and, additionally, do not cluster by ethnolinguistic group (Figure 3). A total of 11 Y chromosome haplogroups were assigned across 139 unrelated male samples. Most samples were assigned haplogroups commonly found in Africa; the majority were assigned to haplogroup E, followed by A, B, J, and T. We did not observe any haplogroup unique to a single ethnolinguistic group or clan; rather, each haplogroup was shared by two or more ethnolinguistic groups or clans. For mtDNA, a total of 74 haplogroups were assigned across 297 individuals. As expected, L haplogroups are the most common in these populations, followed by M and I. Other lower-frequency mtDNA haplogroups include N1a1, K1a18, U9, U6, T1, HV1b1, J1d1a, R0, and R31. Most mtDNA haplogroups were shared by two or more ethnolinguistic groups or clans (Figure 4E). However, L0b* and L2a1'2'3'4 were found only in the Turkana, and the samples carrying them did not cluster by natal clan affiliation (Figure 4E).

Genetic differentiation is driven by physical separation, not cultural processes

Ethnolinguistic groups that are geographically closer typically had lower genetic FST (Figure 5). The lowest genetic FST was found between the Samburu and Rendille, even though they speak languages from different language families; individuals from these two groups are closer geographically to each other than to any other ethnolinguistic group. The Turkana sampled here are, on average, the farthest geographically from the other ethnolinguistic groups, ranging from 286 km to 439 km away. The Turkana also have much higher genetic FST with the Rendille, Borana, and Samburu than is observed between any two of the Rendille, Borana, and Samburu (Figure 5). Similar to the overall pattern observed with FST, the outgroup f3 analysis shows that ethnolinguistic groups that are farther apart geographically have less shared drift, while groups that are closer have more shared drift (Figure 5). Unlike FST, however, f3 values are similar for all pairs that include the Turkana, and the Samburu and Rendille do not share the most drift (Figure 5).

Figure 5. Geography primarily impacts patterns of genetic differentiation among ethnolinguistic groups.


We calculated autosomal FST (top) and outgroup f3 (bottom) among ethnolinguistic groups. For the outgroup f3 calculations, Yoruba from the 1000 Genomes Resource were used as the outgroup population. Bars are ordered by FST. Dark orange bars (left) represent the pairs of groups farthest apart geographically, while lighter orange bars represent the pairs closest geographically. The Samburu and Rendille (pale orange) are two neighboring groups that speak languages from different language families, yet have the lowest genetic FST observed in our study. The line graph shows the geographic distance between each pair of ethnolinguistic groups.

For some groups, the pattern of genetic differentiation secondarily paralleled linguistic relationships. Among the comparisons involving the Turkana, genetic FST between the Turkana and Samburu, both Nilo-Saharan speakers, is about half the genetic FST between the Turkana and Rendille, even though the Rendille individuals sampled here are about 63 km closer to the Turkana (Figure 5).

We find that cultural differentiation does not predict genetic differentiation among neighboring groups in northern Kenya. Genetic FST and cultural FST are not significantly correlated (R = 0.63, p-value = 0.18; Figure 6; Figure S6). However, we observe a significant positive correlation between genetic FST and geographic distance at both the ethnolinguistic group level and the Turkana territorial section level (ethnolinguistic group level: R = 0.82, p-value = 0.048; Turkana territorial section level: R = 0.992, p-value < 0.001; Figure 6; Figure S6). We additionally performed partial Mantel tests to test whether genetic variation can be explained by culture or language after controlling for the effects of geography. After controlling for geography, genetic FST is not correlated with cultural FST (p-value = 0.58) or linguistic distance (p-value = 0.08).
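With only four ethnolinguistic groups, each distance matrix has six pairwise entries, so a partial Mantel test can enumerate every relabeling of the groups exactly. The sketch below assumes a Pearson partial correlation and permutation of the rows and columns of the genetic matrix; it is not vegan's `mantel.partial` code, and the helper names and toy matrices are hypothetical. With 4! = 24 relabelings, the smallest attainable exact one-sided p-value is 1/24, or about 0.042.

```python
import math
from itertools import combinations, permutations


def _upper(mat, order):
    """Upper-triangle entries of a square matrix with rows/cols reordered."""
    return [mat[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Pearson partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = _pearson(x, y), _pearson(x, z), _pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def exact_partial_mantel(gen, cult, geo):
    """Partial Mantel r(gen, cult | geo) and a one-sided exact p-value
    from every relabeling of the rows and columns of `gen`."""
    ident = list(range(len(gen)))
    y, z = _upper(cult, ident), _upper(geo, ident)
    observed = partial_r(_upper(gen, ident), y, z)
    stats = [partial_r(_upper(gen, list(p)), y, z) for p in permutations(ident)]
    hits = sum(s >= observed - 1e-12 for s in stats)  # identity always counts
    return observed, hits / len(stats)


gen = [[0, 1, 4, 6], [1, 0, 3, 5], [4, 3, 0, 2], [6, 5, 2, 0]]
geo = [[0, 2, 1, 5], [2, 0, 4, 1], [1, 4, 0, 3], [5, 1, 3, 0]]
cult = [[0, 3, 2, 6], [3, 0, 5, 1], [2, 5, 0, 4], [6, 1, 4, 0]]
r_obs, p_obs = exact_partial_mantel(gen, cult, geo)
```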

Figure 6. Cultural differentiation does not predict genetic differentiation among human ethnolinguistic groups in northern Kenya.


We performed a series of Pearson's correlations to explore whether cultural differentiation may impact genetic FST. Pearson correlations for A) genetic FST and geographic distance, B) cultural FST and geographic distance, C) genetic FST and cultural FST, D) genetic FST and linguistic distance, E) cultural FST and linguistic distance, and F) geographic distance and linguistic distance. R corresponds to the correlation coefficient; p corresponds to the p-value.

Discussion

In this study, we generated genome-wide SNP genotype data and investigated the extent to which geographic and cultural processes shape genetic variation within and among four pastoral populations in northern Kenya. We sampled across multiple layers of social organization (ethnolinguistic groups, clans, and territorial sections) and found that geography, rather than cultural processes, predominantly shapes patterns of genetic variation in northern Kenya.

Among ethnolinguistic groups, we observed a clinal pattern of variation with a lack of discrete clustering, particularly between the Rendille and Samburu, along with comparatively high levels of intermarriage between them. These results suggest ongoing gene flow between the Rendille and Samburu, the two groups located closest to each other geographically, though not the most culturally similar pair. Previous literature has noted intermarriage between the Rendille and Samburu (Spencer, 2012) and relatively higher levels of cooperation between them (Handley & Mathew, 2020). Genetic clustering of Cushitic and Nilo-Saharan speaking groups (to which the Rendille and Samburu belong, respectively) has previously been observed, supporting gene flow between these larger linguistic groups (Tishkoff et al., 2009). Our findings confirm previous genetic and cultural observations and provide an example in humans of genes being shared between different ethnolinguistic groups at a local geographic scale.

Though we found a clinal pattern of variation, the Waso Borana and Turkana formed fairly discrete clusters of unique genetic variation. Strikingly, among the individuals we sampled, no intermarriage was reported in the Turkana at all, and the Waso Borana reported no intermarriage with the other three ethnolinguistic groups. These results suggest that the Waso Borana and Turkana are relatively isolated from the other ethnolinguistic groups sampled in this study. However, our sampling locations may partly drive these observations. The regions that the Waso Borana inhabit border the other ethnolinguistic groups, but we sampled individuals from the Merti region, which lies in the interior of the Waso Borana territory. Likewise, the Turkana individuals we sampled were from the northern and western regions of Turkana, which do not directly border the other groups. We speculate that although these are nomadic groups that can traverse large distances, intermixing may occur in boundary regions rather than in interior regions. Future studies including individuals from both interior and border regions may shed light on this.

Perhaps one of the most intriguing results in this study was the observed genetic substructuring within the Turkana based on territorial section affiliation. Individuals from the Ngibochoros territorial section are geographically farther from the individuals sampled from the other territorial sections, and our results suggest that this geographic separation results in high genetic differentiation between individuals from Ngibochoros and individuals from Kwatela and Ngiyapakuno. However, we cannot rule out the possibility that this substructuring is driven by admixture with surrounding groups not sampled in this study. Because there is no cultural barrier to Turkana individuals marrying individuals from different territorial sections (marriage rules specify exogamy at the clan level), and because there is extensive dry-season migration across territorial section boundaries in these nomadic groups, we did not expect to observe genetic substructuring within the Turkana. Rather, we expected the Turkana to be largely homogeneous, similar to what we observed within the other three ethnolinguistic groups sampled in this study. However, the Turkana are the most populous of the four ethnic groups, numbering approximately 1 million individuals, and have the largest geographic span. It is possible that although there is a shared cultural identity over this larger area, interpersonal interactions and co-mingling between distant Turkana territorial sections are limited. Genetic substructuring has been observed across humans on larger geographic scales (e.g., Alsmadi et al., 2013; Bryc et al., 2010; Jakkula et al., 2008; Salmela et al., 2011; Tian et al., 2008; Tishkoff et al., 2009; Xu et al., 2009); however, due to a lack of dense sampling within individual populations across Africa, genetic substructuring within a single ethnolinguistic group has not been widely observed on the continent, nor within as small a region as we investigate here. Because underlying genetic structure can have implications for case-control studies (Price, Zaitlen, Reich, & Patterson, 2010), our results highlight the importance of accounting for fine-scale substructuring in future genomic studies, particularly with the Turkana, and emphasize the continued importance of characterizing genetic structure across globally diverse human populations.

Within ethnolinguistic groups, we found that Y chromosome haplotypes do not consistently cluster by natal clan affiliation, suggesting that patrilineality may not have a strong impact on patterns of male-specific genetic variation in northern Kenya pastoral populations. Previous research has found that, in groups with patrilineal descent, such as pastoralists in Central Asia (Chaix et al., 2007) and the Bimoba in Ghana (Sanchez-Faddeev et al. 2013), males from the same clan have identical or similar Y haplotypes. However, this is not always the case, as seen in tribal Yemen, where Y haplotypes do not clearly cluster by clan (Raaum, Al-Meeri, and Mulligan 2013). A possible explanation for our finding is that the cultural conception of fatherhood, and therefore clan affiliation, does not always correspond with who one’s biological father is. For example, in these groups, offspring from unofficial marriages (unions in which the bride price has not been paid) take on their mother’s clan, producing a mismatch between clan affiliation and genetic patriline for these offspring. Adoption can also produce such a mismatch: adoption is known to occur in the Turkana, and an adopted child takes on the clan of their adoptive father. Overall, these results highlight that patrilineal descent groups do not always correspond with genetic patrilines.

We found that genetic differentiation was highest between ethnolinguistic groups separated by the largest geographic distances, suggesting that geography primarily impacts patterns of genetic differentiation among northern Kenyan populations. Previous studies of genetic structure among human populations in Africa have found correspondence between genetic structure and linguistic affiliation and/or geography, with some studies reporting correspondence of genetic structure predominantly with linguistic affiliations (Bryc et al., 2010; Tishkoff et al., 2009), while others found that patterns of genetic structure predominantly mirror geography and ecological barriers (Babiker, Schlebusch, Hassan, & Jakobsson, 2011; Uren et al., 2016). For northern Kenyan human populations, our results suggest that geography primarily shapes the observed patterns of genetic differentiation.

Though genetic differentiation primarily paralleled geographic distances, for some groups in our study, the pattern of genetic differentiation secondarily paralleled linguistic relationships. Specifically, we found that the Turkana and Samburu had lower FST than the Turkana and Rendille, despite the former pair being geographically more distant from one another. The close genetic relationship between the Turkana and Samburu compared to the Turkana and Rendille could be due to shared Nilo-Saharan ancestry between the Turkana and Samburu, but it could also result from sampling in areas that do not directly border each group. Although not commonplace, intermarriage between the Turkana and both the Rendille and Samburu is known to occur, likely in regions bordering each group. Given our sampling strategy, we were unable to assess the extent of gene flow in regions directly bordering each group or whether it differs from gene flow in interior regions. Interestingly, we did not observe this pattern in the outgroup f3 analysis. Outgroup f3 measures shared drift between two groups relative to an outgroup and is less sensitive to population-specific drift (Patterson et al. 2012). It is possible that these FST values reflect sample-specific drift in these groups. However, both FST and outgroup f3 show that ethnolinguistic groups that are geographically closer are more genetically similar and groups that are geographically farther apart are less genetically similar, so a more conservative interpretation of these results is that geography is driving patterns of genetic variation among ethnolinguistic groups.
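The point about population-specific drift can be made explicit with the standard identity relating f3 to f2 (Patterson et al., 2012). The tree below is a simplified assumption for illustration: A and B split from each other after diverging from outgroup O, and d_O, d_S, d_A, and d_B denote drift on the outgroup branch, on the branch shared by A and B, and on the branches specific to A and to B.

```latex
\begin{aligned}
f_2(X, Y) &= \mathbb{E}\big[(x - y)^2\big], \qquad
f_3(O; A, B) = \mathbb{E}\big[(o - a)(o - b)\big] \\
f_3(O; A, B) &= \tfrac{1}{2}\big[f_2(O, A) + f_2(O, B) - f_2(A, B)\big] \\
f_2(O, A) &= d_O + d_S + d_A, \quad
f_2(O, B) = d_O + d_S + d_B, \quad
f_2(A, B) = d_A + d_B \\
\Rightarrow\; f_3(O; A, B) &= d_O + d_S
\end{aligned}
```

Drift specific to A or B cancels from outgroup f3, and d_O is the same for every pair compared against the same outgroup, whereas f2(A, B), which the numerator of Hudson's FST estimates, grows directly with d_A + d_B.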

Lastly, we found that cultural differentiation does not predict genetic differentiation among neighboring populations in northern Kenya. Previous studies have investigated the relationships among genes, language, and cultural traits to understand the extent to which genes and culture/language travel together among human populations, with examples in human history where genes and cultural traits and/or language have been shown to correspond (e.g., Karafet et al. 2016; Filippo et al. 2012; Hunley et al. 2008; Lansing et al. 2007; Hewlett, De Silvestri, and Guglielmino 2002; Matsumae et al. 2021; Brown et al. 2014) and others where spoken language has been shown to have no effect on genetic structure (Veeramah et al., 2010). Here, we used cultural FST based on norm statements relevant to cooperation, crime and punishment, raiding, family dynamics, and other cultural markers to test whether genes and culture travel together on a local geographic scale, and we find that cultural FST and genetic FST are not correlated with each other among northern Kenya pastoralists. We caution against overinterpreting this result, however, given the limited number of groups sampled here. We anticipate that this metric of cultural similarity will be of interest for future studies assessing the movement of genes and culture in humans at both larger and local geographic scales. Taken together with the other results in this study, our findings suggest that geographic proximity, not cultural similarity, may provide a better explanation for the observed patterns of genetic variation among these groups.

Supplementary Material

Figure S1
Figure S2
Figure S3
Figure S4
Figure S5
Figure S6
Table S1
Supporting Information

Acknowledgments

This work was funded by the John Templeton Foundation (grant no. 48952) to SM, the National Institute of General Medical Sciences of the National Institutes of Health R35GM124827 to MAW, and The Graduate College at Arizona State University, Achievement Rewards for College Scientists (ARCS) Foundation Phoenix Chapter as a Pierson Scholar, and Arizona State University Chapter Sigma Xi to AMTO. The authors acknowledge Research Computing at Arizona State University for providing high-performance computing resources that have contributed to the research results. The National Museums of Kenya provided institutional support to conduct the research in Kenya. We thank our field research assistants for translating the questionnaires and aiding with data collection: Ekiru Carlystus, Amuria Lotiira, Chegem Muya, Gilbert Topos, Dismas Lomelu, Mohamed Noor Guyo, Abdi Wario, Paul Leramato, Damaris Lekilaui, Julius Longonyek, Sinyati Lesowapir, Rafael Letele, Simon Harugura, Benson Morsa, Ejere Ballo, and Lebo Parkeri. We also thank our participants and host communities for their hospitality and for their continued support in this project.

Footnotes

Declaration of Interests

The authors declare no competing interests.

Data Availability

The genotype data generated in this study have been deposited in dbGaP (accession number phs002654.v1.p1) and will be made available upon publication. All original code used in this manuscript can be found on GitHub: https://github.com/SexChrLab/Kenya_Fst.

References

  1. 1000 Genomes Project Consortium, Auton Adam, Brooks Lisa D., Durbin Richard M., Garrison Erik P., Kang Hyun Min, Korbel Jan O., et al. 2015. “A Global Reference for Human Genetic Variation.” Nature 526 (7571): 68–74.
  2. Alexander David H., Novembre John, and Lange Kenneth. 2009. “Fast Model-Based Estimation of Ancestry in Unrelated Individuals.” Genome Research 19 (9): 1655–64.
  3. Alsmadi Osama, Thareja Gaurav, Alkayal Fadi, Rajagopalan Ramakrishnan, John Sumi Elsa, Hebbar Prashantha, Behbehani Kazem, and Thanaraj Thangavel Alphonse. 2013. “Genetic Substructure of Kuwaiti Population Reveals Migration History.” PLoS One 8 (9): e74913.
  4. Babiker Hiba M. A., Schlebusch Carina M., Hassan Hisham Y., and Jakobsson Mattias. 2011. “Genetic Variation and Population Structure of Sudanese Populations as Indicated by 15 Identifiler Sequence-Tagged Repeat (STR) Loci.” Investigative Genetics 2 (1): 12.
  5. Behr Aaron A., Liu Katherine Z., Liu-Fang Gracie, Nakka Priyanka, and Ramachandran Sohini. 2016. “Pong: Fast Analysis and Visualization of Latent Clusters in Population Genetic Data.” Bioinformatics 32 (18): 2817–23.
  6. Bell Adrian V., Richerson Peter J., and McElreath Richard. 2009. “Culture Rather than Genes Provides Greater Scope for the Evolution of Large-Scale Human Prosociality.” Proceedings of the National Academy of Sciences of the United States of America 106 (42): 17671–74.
  7. Bhatia Gaurav, Patterson Nick, Sankararaman Sriram, and Price Alkes L. 2013. “Estimating and Interpreting FST: The Impact of Rare Variants.” Genome Research 23 (9): 1514–21.
  8. Bose Aritra, Platt Daniel E., Parida Laxmi, Drineas Petros, and Paschou Peristera. 2021. “Integrating Linguistics, Social Structure, and Geography to Model Genetic Diversity within India.” Molecular Biology and Evolution, January. 10.1093/molbev/msaa321.
  9. Brown Steven, Savage Patrick E., Ko Albert Min-Shan, Stoneking Mark, Ko Ying-Chin, Loo Jun-Hun, and Trejaut Jean A. 2014. “Correlations in the Population Structure of Music, Genes and Language.” Proceedings of the Royal Society B: Biological Sciences 281 (1774): 20132072.
  10. Bryc Katarzyna, Auton Adam, Nelson Matthew R., Oksenberg Jorge R., Hauser Stephen L., Williams Scott, Froment Alain, et al. 2010. “Genome-Wide Patterns of Population Structure and Admixture in West Africans and African Americans.” Proceedings of the National Academy of Sciences of the United States of America 107 (2): 786–91.
  11. Cavalli-Sforza L. L., Piazza A., Menozzi P., and Mountain J. 1988. “Reconstruction of Human Evolution: Bringing Together Genetic, Archaeological, and Linguistic Data.” Proceedings of the National Academy of Sciences. 10.1073/pnas.85.16.6002.
  12. Chaix Raphaëlle, Quintana-Murci Lluís, Hegay Tatyana, Hammer Michael F., Mobasher Zahra, Austerlitz Frédéric, and Heyer Evelyne. 2007. “From Social to Genetic Structures in Central Asia.” Current Biology 17 (1): 43–48.
  13. Chang Christopher C., Chow Carson C., Tellier Laurent C. A. M., Vattikuti Shashaank, Purcell Shaun M., and Lee James J. 2015. “Second-Generation PLINK: Rising to the Challenge of Larger and Richer Datasets.” GigaScience 4: 7.
  14. Choudhury Ananyo, TrypanoGEN Research Group, Aron Shaun, Botigué Laura R., Sengupta Dhriti, Botha Gerrit, Bensellak Taoufik, et al. 2020. “High-Depth African Genomes Inform Human Migration and Health.” Nature. 10.1038/s41586-020-2859-7.
  15. de Filippo Cesare, Bostoen Koen, Stoneking Mark, and Pakendorf Brigitte. 2012. “Bringing Together Linguistic and Genetic Evidence to Test the Bantu Expansion.” Proceedings of the Royal Society B: Biological Sciences. 10.1098/rspb.2012.0318.
  16. Gurdasani Deepti, Carstensen Tommy, Tekola-Ayele Fasil, Pagani Luca, Tachmazidou Ioanna, Hatzikotoulas Konstantinos, Karthikeyan Savita, et al. 2015. “The African Genome Variation Project Shapes Medical Genetics in Africa.” Nature 517 (7534): 327–32.
  17. Handley Carla, and Mathew Sarah. 2020. “Human Large-Scale Cooperation as a Product of Competition between Cultural Groups.” Nature Communications. 10.1038/s41467-020-14416-8.
  18. Hewlett Barry S., De Silvestri Annalisa, and Guglielmino C. Rosalba. 2002. “Semes and Genes in Africa.” Current Anthropology 43 (2): 313–21.
  19. Heyer Evelyne, Balaresque Patricia, Jobling Mark A., Quintana-Murci Lluis, Chaix Raphaelle, Segurel Laure, Aldashev Almaz, and Hegay Tanya. 2009. “Genetic Diversity and the Emergence of Ethnic Groups in Central Asia.” BMC Genetics 10: 49.
  20. Hollfelder Nina, Schlebusch Carina M., Günther Torsten, Babiker Hiba, Hassan Hisham Y., and Jakobsson Mattias. 2017. “Northeast African Genomic Variation Shaped by the Continuity of Indigenous Groups and Eurasian Migrations.” PLoS Genetics 13 (8): e1006976.
  21. Hudson R. R., Slatkin M., and Maddison W. P. 1992. “Estimation of Levels of Gene Flow from DNA Sequence Data.” Genetics 132 (2): 583–89.
  22. Hunley Keith, Dunn Michael, Lindström Eva, Reesink Ger, Terrill Angela, Healy Meghan E., Koki George, Friedlaender Françoise R., and Friedlaender Jonathan S. 2008. “Genetic and Linguistic Coevolution in Northern Island Melanesia.” PLoS Genetics 4 (10): e1000239.
  23. Jakkula Eveliina, Rehnström Karola, Varilo Teppo, Pietiläinen Olli P. H., Paunio Tiina, Pedersen Nancy L., deFaire Ulf, et al. 2008. “The Genome-Wide Patterns of Variation Expose Significant Substructure in a Founder Population.” American Journal of Human Genetics 83 (6): 787–94.
  24. Karafet Tatiana M., Bulayeva Kazima B., Nichols Johanna, Bulayev Oleg A., Gurgenova Farida, Omarova Jamilia, Yepiskoposyan Levon, Savina Olga V., Rodrigue Barry H., and Hammer Michael F. 2016. “Coevolution of Genes and Languages and High Levels of Population Structure among the Highland Populations of Daghestan.” Journal of Human Genetics 61 (3): 181–91.
  25. Kebathi Joyce N. 2008. “Measuring Literacy: The Kenya National Adult Literacy Survey.” Adult Education and Development 71.
  26. Kim Seongho. 2015. “Ppcor: An R Package for a Fast Calculation to Semi-Partial Correlation Coefficients.” Communications for Statistical Applications and Methods 22 (6): 665–74.
  27. Lansing J. Stephen, Cox Murray P., Downey Sean S., Gabler Brandon M., Hallmark Brian, Karafet Tatiana M., Norquest Peter, et al. 2007. “Coevolution of Languages and Genes on the Island of Sumba, Eastern Indonesia.” Proceedings of the National Academy of Sciences of the United States of America 104 (41): 16022–26.
  28. López Saioa, Tarekegn Ayele, Band Gavin, van Dorp Lucy, Bird Nancy, Morris Sam, Oljira Tamiru, et al. 2021. “Evidence of the Interplay of Genetics and Culture in Ethiopia.” Nature Communications 12 (1): 3581.
  29. Manica Andrea, Prugnolle Franck, and Balloux François. 2005. “Geography Is a Better Determinant of Human Genetic Differentiation than Ethnicity.” Human Genetics 118 (3–4): 366–71.
  30. Marchi Nina, Hegay Tatyana, Mennecier Philippe, Georges Myriam, Laurent Romain, Whitten Mark, Endicott Philipp, et al. 2017. “Sex-Specific Genetic Diversity Is Shaped by Cultural Factors in Inner Asian Human Populations.” American Journal of Physical Anthropology 162 (4): 627–40.
  31. Matsumae Hiromi, Ranacher Peter, Savage Patrick E., Blasi Damián E., Currie Thomas E., Koganebuchi Kae, Nishida Nao, et al. 2021. “Exploring Correlations in Genetic and Cultural Variation across Language Families in Northeast Asia.” Science Advances 7 (34). 10.1126/sciadv.abd9223.
  32. Mulindwa Julius, Noyes Harry, Ilboudo Hamidou, Pagani Luca, Nyangiri Oscar, Kimuda Magambo Phillip, Ahouty Bernardin, et al. 2020. “High Levels of Genetic Diversity within Nilo-Saharan Populations: Implications for Human Adaptation.” American Journal of Human Genetics 107 (3): 473–86.
  33. Nettle Daniel, and Harriss Louise. 2003. “Genetic and Linguistic Affinities between Human Populations in Eurasia and West Africa.” Human Biology 75 (3): 331–44.
  34. Novembre John, Johnson Toby, Bryc Katarzyna, Kutalik Zoltán, Boyko Adam R., Auton Adam, Indap Amit, et al. 2008. “Genes Mirror Geography within Europe.” Nature 456 (7218): 98–101.
  35. Pagani Luca, Kivisild Toomas, Tarekegn Ayele, Ekong Rosemary, Plaster Chris, Romero Irene Gallego, Ayub Qasim, et al. 2012. “Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool.” American Journal of Human Genetics 91 (1): 83–96.
  36. Paradis Emmanuel. 2010. “Pegas: An R Package for Population Genetics with an Integrated–Modular Approach.” Bioinformatics 26 (3): 419–20.
  37. Paradis Emmanuel, and Schliep Klaus. 2019. “Ape 5.0: An Environment for Modern Phylogenetics and Evolutionary Analyses in R.” Bioinformatics 35 (3): 526–28.
  38. Patterson Nick, Moorjani Priya, Luo Yontao, Mallick Swapan, Rohland Nadin, Zhan Yiping, Genschoreck Teri, Webster Teresa, and Reich David. 2012. “Ancient Admixture in Human History.” Genetics 192 (3): 1065–93.
  39. Price Alkes L., Patterson Nick J., Plenge Robert M., Weinblatt Michael E., Shadick Nancy A., and Reich David. 2006. “Principal Components Analysis Corrects for Stratification in Genome-Wide Association Studies.” Nature Genetics 38 (8): 904–9.
  40. Price Alkes L., Zaitlen Noah A., Reich David, and Patterson Nick. 2010. “New Approaches to Population Stratification in Genome-Wide Association Studies.” Nature Reviews Genetics 11 (7): 459–63.
  41. Raaum Ryan L., Al-Meeri Ali, and Mulligan Connie J. 2013. “Culture Modifies Expectations of Kinship and Sex-Biased Dispersal Patterns: A Case Study of Patrilineality and Patrilocality in Tribal Yemen.” American Journal of Physical Anthropology 150 (4): 526–38.
  42. Ramachandran Sohini, Deshpande Omkar, Roseman Charles C., Rosenberg Noah A., Feldman Marcus W., and Cavalli-Sforza L. Luca. 2005. “Support from the Relationship of Genetic and Geographic Distance in Human Populations for a Serial Founder Effect Originating in Africa.” Proceedings of the National Academy of Sciences of the United States of America 102 (44): 15942–47.
  43. Salmela Elina, Lappalainen Tuuli, Liu Jianjun, Sistonen Pertti, Andersen Peter M., Schreiber Stefan, Savontaus Marja-Liisa, et al. 2011. “Swedish Population Substructure Revealed by Genome-Wide Single Nucleotide Polymorphism Data.” PLoS One 6 (2): e16747.
  44. Sanchez-Faddeev Hernando, Pijpe Jeroen, van der Hulle Tom, Meij Hans J., van der Gaag Kristiaan J., Slagboom P. Eline, Westendorp Rudi G. J., and de Knijff Peter. 2013. “The Influence of Clan Structure on the Genetic Variation in a Single Ghanaian Village.” European Journal of Human Genetics 21 (10): 1134–39.
  45. Semo Armando, Gayà-Vidal Magdalena, Fortes-Lima Cesar, Alard Bérénice, Oliveira Sandra, Almeida João, Prista António, et al. 2020. “Along the Indian Ocean Coast: Genomic Variation in Mozambique Provides New Insights into the Bantu Expansion.” Molecular Biology and Evolution 37 (2): 406–16.
  46. Sengupta Dhriti, Choudhury Ananyo, Fortes-Lima Cesar, Aron Shaun, Whitelaw Gavin, Bostoen Koen, Gunnink Hilde, et al. 2021. “Genetic Substructure and Complex Demographic History of South African Bantu Speakers.” Nature Communications 12 (1): 2080.
  47. Severson Alissa L., Shortt Jonathan A., Mendez Fernando L., Wojcik Genevieve L., Bustamante Carlos D., and Gignoux Christopher R. 2018. “SNAPPY: Single Nucleotide Assignment of Phylogenetic Parameters on the Y Chromosome.” bioRxiv. 10.1101/454736.
  48. Spencer P. 2012. “Nomads in Alliance: Symbiosis and Growth among the Rendille and Samburu of Kenya.” http://eprints.soas.ac.uk/20803/1/NOMADS%20IN%20ALLIANCE%202012.pdf.
  49. Sun Hao, Zhou Chi, Huang Xiaoqin, Liu Shuyuan, Lin Keqin, Yu Liang, Huang Kai, Chu Jiayou, and Yang Zhaoqing. 2013. “Correlation between the Linguistic Affinity and Genetic Diversity of Chinese Ethnic Groups.” Journal of Human Genetics 58 (10): 686–93.
  50. R Core Team. 2013. “R: A Language and Environment for Statistical Computing.” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.470.5851&rep=rep1&type=pdf.
  51. RStudio Team. 2015. “RStudio: Integrated Development for R.” RStudio, Inc., Boston, MA. http://www.rstudio.com.
  52. Tian Chao, Kosoy Roman, Lee Annette, Ransom Michael, Belmont John W., Gregersen Peter K., and Seldin Michael F. 2008. “Analysis of East Asia Genetic Substructure Using Genome-Wide SNP Arrays.” PLoS One 3 (12): e3862.
  53. Tishkoff Sarah A., Reed Floyd A., Friedlaender Françoise R., Ehret Christopher, Ranciaro Alessia, Froment Alain, Hirbo Jibril B., et al. 2009. “The Genetic Structure and History of Africans and African Americans.” Science 324 (5930): 1035–44.
  54. Uren Caitlin, Kim Minju, Martin Alicia R., Bobo Dean, Gignoux Christopher R., van Helden Paul D., Möller Marlo, Hoal Eileen G., and Henn Brenna M. 2016. “Fine-Scale Human Population Structure in Southern Africa Reflects Ecogeographic Boundaries.” Genetics 204 (1): 303–14.
  55. Veeramah Krishna R., Connell Bruce A., Ansari Pour Naser, Powell Adam, Plaster Christopher A., Zeitlyn David, Mendell Nancy R., Weale Michael E., Bradman Neil, and Thomas Mark G. 2010. “Little Genetic Differentiation as Assessed by Uniparental Markers in the Presence of Substantial Language Variation in Peoples of the Cross River Region of Nigeria.” BMC Evolutionary Biology 10: 92.
  56. “Vegan: Community Ecology Package.” n.d. Comprehensive R Archive Network (CRAN). Accessed January 19, 2022. https://CRAN.R-project.org/package=vegan.
  57. Vicente Mário, Jakobsson Mattias, Ebbesen Peter, and Schlebusch Carina M. 2019. “Genetic Affinities among Southern Africa Hunter-Gatherers and the Impact of Admixing Farmer and Herder Populations.” Molecular Biology and Evolution 36 (9): 1849–61.
  58. “WALS Online - Home.” n.d. Accessed September 23, 2021. https://wals.info/.
  59. Weissensteiner Hansi, Pacher Dominic, Kloss-Brandstätter Anita, Forer Lukas, Specht Günther, Bandelt Hans-Jürgen, Kronenberg Florian, Salas Antonio, and Schönherr Sebastian. 2016. “HaploGrep 2: Mitochondrial Haplogroup Classification in the Era of High-Throughput Sequencing.” Nucleic Acids Research 44 (W1): W58–63.
  60. Wickham Hadley. 2011. “Ggplot2.” Wiley Interdisciplinary Reviews: Computational Statistics 3 (2): 180–85.
  61. Wright S. 1943. “Isolation by Distance.” Genetics 28 (2): 114–38.
  62. Xu Shuhua, Kangwanpong Daoroong, Seielstad Mark, Srikummool Metawee, Kampuansai Jatupol, Jin Li, and HUGO Pan-Asian SNP Consortium. 2010. “Genetic Evidence Supports Linguistic Affinity of Mlabri – a Hunter-Gatherer Group in Thailand.” BMC Genetics 11: 18.
  63. Xu Shuhua, Yin Xianyong, Li Shilin, Jin Wenfei, Lou Haiyi, Yang Ling, Gong Xiaohong, et al. 2009. “Genomic Dissection of Population Substructure of Han Chinese and Its Implication in Association Studies.” American Journal of Human Genetics 85 (6): 762–74.

Supplementary Materials

Figure S1
Figure S2
Figure S3
Figure S4
Figure S5
Figure S6
Table S1
Supporting Information
