PLOS One. 2023 Nov 8;18(11):e0290423. doi: 10.1371/journal.pone.0290423

Eurasian back-migration into Northeast Africa was a complex and multifaceted process

Rickard Hammarén 1,2, Steven T Goldstein 2, Carina M Schlebusch 1,3,4,*
Editor: Gyaneshwer Chaubey 5
PMCID: PMC10631636  PMID: 37939042

Abstract

Recent studies have identified Northeast Africa as an important area for human movements during the Holocene. Eurasian populations have moved back into Northeastern Africa and contributed to the genetic composition of its people. By gathering the largest reference dataset to date of Northeast, North, and East African as well as Middle Eastern populations, we give new depth to our knowledge of Northeast African demographic history. By employing local ancestry methods, we isolated the Non-African parts of modern-day Northeast African genomes and identified the best putative source populations. Egyptians and Sudanese Copts bore most similarities to Levantine populations, whilst the Non-African genetic component of other populations in the region generally derived predominantly from the Arabian Peninsula rather than from Levantine populations. We also dated admixture events and investigated which factors influenced the date of admixture, and found that major linguistic families were associated with the date of Eurasian admixture. Taken as a whole, we detect complex patterns of admixture and diverse origins of Eurasian admixture in Northeast African populations of today.

Introduction

Northeast Africa has undeniably been a key region in human evolutionary history. The out-of-Africa migrations must have passed through, if not originated in, the region. East Africa is also home to some of the most important fossil evidence for human evolution, from early bipedal species such as Australopithecus afarensis [1] to the emergence of early anatomically modern humans such as the Omo fossils [2, 3].

Numerous overlapping migrations of farmers and herders over the last several thousand years have also played a critical role in reshaping the current socioeconomic and linguistic diversity of the region. It is clear that back migrations into Northeast Africa have had a major impact on the genetic ancestries of the peoples in the region today [4–6]. Ethiopian populations, for instance, harbor a large proportion of “non-African” ancestry, as high as ∼40% in some groups (see for instance the Amhara in Fig 1c in [7]). Some current-day Northeast Africans can thus trace much of their ancestry to sources other than the region's original hunter-gatherers (represented, for instance, by the Mota individual, an Ethiopian male who lived around 4500 years ago [8, 9]). It is also clear that these back migrations into Africa have been ongoing for a long time. In North Africa, seven individuals from Morocco dated to 15 000 years ago had a high affinity to Middle Eastern populations, suggesting the possibility that similar deep-in-time admixtures might have occurred in other parts of Africa [10].

In 2017, several ancient genomes were sequenced in an attempt to uncover the demographic patterns in African prehistory [11]. The study contained data from 16 ancient African individuals from 8 100–400 BP. It found that ancient East African hunter-gatherers form a cline of ancestry with modern-day southern African hunter-gatherer (San) groups. This indicates that in the past, hunter-gatherers with a gradient of shared ancestry ranged from eastern to southern Africa. The fact that these hunter-gatherer groups existed until the relatively recent past allows for the possibility of interactions between them and later pastoralist and agricultural groups in East Africa. These data were later re-analyzed together with several new individuals, particularly from East Africa [5]. The authors proposed a four-stage model. In the first stage, Sudanese Nilotic speakers admixed with groups with Eurasian ancestry (either from Northern Africa or the Levant) within Northeast Africa. In the second stage, the descendants of these groups migrated to East Africa, reaching Lake Turkana by around 5 000–4 000 BP and central Tanzania by around 3 000 BP, and mixed with local hunter-gatherer groups throughout this process [5]. The first signs of pastoralism in East Africa coincide with these events. In the third stage, a second wave of Sudanese-related groups migrated into the area and contributed to the pastoral Iron Age populations. Lastly, West African ancestry (genetically similar to Bantu speakers) appeared alongside the advent of crop farming in the region. These findings were revised yet again in 2020 [12]. The analysis of 20 additional ancient individuals added resolution to the picture, and several new patterns emerged. Mainly, [12] propose that the pastoralists probably arrived in East Africa in multiple waves from several different locations, or that severe population structure was present (distinguishing between the two was not possible). Both [12] and [5] conclude that there was no single event of hunter-gatherer and herder introgression, neither in space nor in time. Instead, a complex “moving frontier” is proposed, with diverse patterns of interactions along the contact zones between hunter-gatherer and herder groups.

In the last decade, several genetic studies on modern-day populations have focused on the genetic demographic history of Ethiopia and found patterns of linguistic stratification within Ethiopian populations, i.e. populations within the same language family are more similar to each other than populations belonging to other language families [4, 13, 14]. It is less clear if this pattern holds true in Northeast Africa as a whole, as [15] found a stronger association between geography and genetics than between genetics and linguistic family. By studying modern-day genetic variation, [6] investigated the non-African part of Ethiopian populations and were able to conclude that there has been Eurasian admixture, likely coming from Levantine, rather than Arabian populations. This event was estimated to have occurred around 3 000 years ago.

By leveraging one of the largest datasets of Northeast African populations to date, we aim to add resolution to Eurasian admixture in Northeast African populations. Specifically, we aim to improve the estimation of the best proxies for the origin of Eurasian admixture in modern-day Northeast African populations by using more Northeast African, Middle Eastern, and Eurasian reference populations. In this study, we follow the approach of [4, 6, 16] in that we employ local ancestry methods to identify the Eurasian fragments of East African genomes and extract those segments from the surrounding genomes, a process referred to here as ancestry deconvolution. We then identify the current-day populations that best match those segments. We also date the events to get a better understanding of historic and prehistoric movements in the region. Using the information on possible sources of admixture and their dating, we construct a model representative of the population history in the region. Overall we find a complex history of Eurasian admixture in Northeastern Africa, related to the spread of languages, the Muslim conquest, and trade routes along the Red Sea.

Materials and methods

Genotyping QC

Genotyping data was gathered from previously published studies [4, 7, 17–26]. See S6 Table for a list of populations included in this study, their original publication, language classification, and geographic coordinates. The geographic sampling information is displayed in S1 Fig. Only autosomal chromosomes were investigated in this study. PLINK v1.90b4.9 [27] was used for data handling and processing. Before data merging, each dataset was quality controlled, which entailed 1) removing related individuals using KING [28], whereby the first individual within each pair of second-degree or closer relatives was removed; 2) excluding SNPs with more than 1% missing genotypes (plink --geno 0.01); 3) removing C/G and A/T SNPs; 4) removing individuals with at least 15% missingness (plink --mind 0.15); 5) removing potential genotyping errors (plink --hwe 0.0000001); and 6) keeping only SNPs that overlapped between the datasets.
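The missingness and strand-ambiguity filters in the list above can be illustrated on a toy genotype matrix. This is a minimal sketch rather than the pipeline used in the study, which relied on PLINK and KING. The function names, the data layout, and the order of operations (SNP filters first, then the individual filter) are assumptions made for illustration, and thresholds are applied as "missing rate exceeding the cutoff". Relatedness and Hardy-Weinberg filtering are not shown.

```python
import numpy as np

# Strand-ambiguous allele pairs: A/T and C/G SNPs look identical on both
# strands, so strand flips between datasets cannot be detected for them.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """True for A/T and C/G SNPs (hypothetical helper, not a PLINK function)."""
    return frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS


def qc_filter(genotypes, alleles, max_snp_missing=0.01, max_ind_missing=0.15):
    """Return boolean keep-masks (individuals, SNPs) for a toy genotype matrix.

    genotypes: array of shape (n_individuals, n_snps); np.nan marks a missing call.
    alleles:   list of (allele1, allele2) tuples, one per SNP.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    missing = np.isnan(genotypes)

    # SNP filter: drop variants whose missing-call rate exceeds the cutoff.
    keep_snp = missing.mean(axis=0) <= max_snp_missing
    # Drop strand-ambiguous (A/T and C/G) variants.
    keep_snp &= np.array([not is_strand_ambiguous(a, b) for a, b in alleles])

    # Individual filter, computed only on the SNPs that survived so far.
    keep_ind = missing[:, keep_snp].mean(axis=1) <= max_ind_missing
    return keep_ind, keep_snp
```

Calling `qc_filter` on each dataset separately and then intersecting the surviving SNP identifiers across datasets would mirror the last step of the list.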

Before analysis that could be adversely affected by linkage disequilibrium (ADMIXTURE and PCA) SNPs in LD were filtered out using plink --indep-pairwise 50 10 0.1.

The data from [4] was converted from hg18 to hg19 using the liftOver tool from UCSC (https://genome.ucsc.edu/cgi-bin/hgLiftOver).

As the number of individuals in each population varied quite substantially, from only a few individuals to around a hundred for other populations, we randomly sub-sampled all populations down to 30 individuals. This was done to reduce the effect that sample size can have on demographic inference.
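The sub-sampling step can be sketched as follows. This is an illustrative example only: the function name, the sample-to-population mapping, and the fixed seed are assumptions, and the study does not state which tool or seed was used.

```python
import random
from collections import defaultdict


def subsample_populations(sample_to_pop, max_per_pop=30, seed=0):
    """Keep at most `max_per_pop` randomly chosen individuals per population.

    sample_to_pop: dict mapping sample ID -> population label.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    by_pop = defaultdict(list)
    for sample_id, pop in sample_to_pop.items():
        by_pop[pop].append(sample_id)

    kept = []
    for pop, ids in by_pop.items():
        ids = sorted(ids)
        if len(ids) > max_per_pop:
            # Populations above the cap are randomly thinned.
            ids = rng.sample(ids, max_per_pop)
        # Populations at or below the cap are kept whole.
        kept.extend(sorted(ids))
    return kept
```

Populations with fewer individuals than the cap are left untouched, which matches the "maximum of 30 individuals per population" reported in the Results.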

Metadata

Geographic information about the populations was gathered from the original publications in the following fashion: 1) directly from the text; 2) if no coordinates were provided, they were inferred from the map of sampling locations; 3) if no map or coordinates were supplied, a point in the middle of the respective country was chosen, which was the case for three publications [17, 19, 25]. Regarding language classification, we followed a similar approach as for geographic data, namely that information/classification was used if available in the original publication. The Egyptians from [25] and the Qatari from [19] were both assumed to be Arabic speakers and thus classified as Semitic. The Niger-Kordofanian classification used in [7] was changed to Niger-Congo, to better align with the other datasets. For visualization purposes, the Semitic speakers on the African continent were given their own label (African Semitic) and their own colour. This distinction was only made to better distinguish between the investigated populations (targets) and Middle Eastern populations used as references. This distinction is thus based solely on geography and is not supported by any linguistic deductions. For a detailed classification of all linguistic groupings used, see S6 Table.

Population structure inferences

Unsupervised population structure inference for K = 2 to K = 15 was performed with ADMIXTURE [29] version 1.3.0, with each value of K repeated 50 times using a random seed each time. PONG version 1.5 [30] was used to visualize the results and find the major mode and pairwise similarity within the major modes. Principal component analysis (PCA) was performed using FlashPCA version 2.0 [31]. For the PCA plots, PC refers to Principal Component, with each value in the PCA plots representing the projection of the data on the eigenvectors, scaled by the eigenvalues. Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) was performed on the genotypes directly using the umap-learn python library version 0.5.2.

Patterns of migration rates

The migration rate over the sampling area was investigated using FEEMS [32]. A grid of coordinates covering North-Eastern Africa and most of the Middle East was generated. Cross-validation was performed and the lambda with the lowest cross-validation value was used to generate the final plot.

Phasing

Phasing was carried out using SHAPEIT version 2.r837 [33] with the 1000 genomes phase 3 reference genomes [18] and options --states 500 --main 20 --burn 10 --prune 10. Misaligned sites between the reference dataset and panel were excluded.

Local ancestry estimation

MOSAIC version 1.3.7 was compiled and run under R version 4.0.0 [34] to perform local admixture inference, admixture dating, and ancestry deconvolution. To minimize the potential bias of different sample sizes between investigated target populations and sources, the number of individuals investigated for each population was downsampled to ten. The ancestry deconvolution was performed by running MOSAIC with the specified sources (see Results for specific scenarios) and then examining the constructed ancestries that MOSAIC infers from the provided sources. Each constructed ancestry was compared to the source populations, and Fst was used to evaluate which of the sources it most closely resembled. If one of the ancestries showed the most genetic similarity to a Eurasian source, the analysis continued for that ancestry. Thus only samples/targets that MOSAIC found could be explained by at least one Eurasian ancestry source were ancestry-deconvoluted. Segments of each individual’s genome that were assigned to the Eurasian ancestry with a probability of 80% or more by MOSAIC were saved and the remainder of the genome was set as missing. Admixture dates were extracted from MOSAIC’s co-ancestry curves for the Eurasian-like ancestry.
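The masking step, keeping only positions assigned to the Eurasian-like ancestry with a probability of 80% or more and setting everything else to missing, can be sketched as below. This is a toy illustration under assumed inputs: the array layout, the function names, and the missing-value code are hypothetical and do not reflect MOSAIC's actual output format.

```python
import numpy as np


def mask_to_ancestry(haplotypes, ancestry_prob, threshold=0.8, missing=-1):
    """Keep alleles only where the target-ancestry probability is >= threshold.

    haplotypes:    int array (n_haplotypes, n_snps) of 0/1 alleles.
    ancestry_prob: float array of the same shape; probability that each
                   position derives from the ancestry of interest.
    """
    haplotypes = np.asarray(haplotypes)
    ancestry_prob = np.asarray(ancestry_prob, dtype=float)
    if haplotypes.shape != ancestry_prob.shape:
        raise ValueError("haplotypes and ancestry_prob must have the same shape")
    # Positions below the threshold are replaced by the missing code.
    return np.where(ancestry_prob >= threshold, haplotypes, missing)


def missing_fraction(masked, missing=-1):
    """Fraction of positions set to missing after masking."""
    return float(np.mean(masked == missing))
```

The missing fraction per individual or population is the kind of summary reported as missingness after deconvolution.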

Outgroup f3

Outgroup f3 tests were performed using qp3Pop from ADMIXTOOLS 2 version 2.0 [35, 36]. The San population Ju|’hoansi was used as the outgroup and the extracted ancestry fragments of each target population were tested against populations from the Eurasian reference dataset. The aim of this procedure is to identify which reference population is most like the extracted Eurasian ancestry.
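For intuition, the outgroup f3 statistic measures the genetic drift shared by two populations since their split from the outgroup. A naive version can be computed from allele frequencies as the mean over SNPs of (o - a)(o - b). The sketch below is an assumption-laden illustration, not the qp3Pop implementation: it omits the finite-sample bias correction and the block-jackknife standard errors, and the function name is hypothetical.

```python
import numpy as np


def outgroup_f3(p_outgroup, p_a, p_b):
    """Naive outgroup f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    Inputs are per-SNP allele frequencies; np.nan marks SNPs with no data
    (e.g. positions masked out during ancestry deconvolution).
    """
    o, a, b = (np.asarray(x, dtype=float) for x in (p_outgroup, p_a, p_b))
    per_snp = (o - a) * (o - b)
    # Only SNPs with data in all three populations contribute.
    usable = ~np.isnan(per_snp)
    if not usable.any():
        raise ValueError("no SNPs with data in all three populations")
    return float(per_snp[usable].mean())
```

A reference population that yields a higher value shares more drift with the target relative to the outgroup, which is how the best-matching Eurasian reference is ranked.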

Visualization

PCA and outgroup f3 results were visualized in R version 3.6.1 using the ggplot2 library [37]. Maps were created in R version 3.6.1 using ggplot2 and the sf, rnaturalearth, and rnaturalearthdata libraries, the latter being based on the public domain maps and rasters from Natural Earth @ naturalearthdata.com. The kriging projection maps were generated in Surfer version 12.0.626 from Golden Software.

Results

After quality control, removal of related individuals, and down-sampling to a maximum of 30 individuals per population the dataset consisted of 2066 individuals from 101 population groups and 199 422 SNPs. Note that some populations are represented multiple times from different original publications, resulting in a total of 97 unique populations, see S6 Table.

Population structure in Northeast Africa

General population structure inferences were performed using PCA and ADMIXTURE on a dataset where SNPs in LD were pruned (85 529 SNPs remaining). The output from ADMIXTURE shown in Fig 1 (for full analysis see S2 Fig) captures similar patterns to the PCA analysis (S5–S19 Figs), with the first separations being between major geographic regions. The K with the lowest cross-validation error was K = 13, see S3 Fig.

Fig 1. Spatial distribution of ancestries.

A) ADMIXTURE results for K = 7, 13 & 14 visualized using PONG (a truncated version of S2 Fig; the K = 13 and 14 panels do not include all clusters, e.g. East Asia is not represented). B) Kriging plot of the distribution of the pink component at K = 14 in A (maximized in Yemen) across the East African populations. C) Kriging plot of the distribution of the dark blue component (Lebanese) from K = 14. D) FEEMS plot of inferred patterns of migration rates for the lowest cross-validation lambda. High w (blue) indicates an area with higher than average effective migration whilst low w (brown) indicates lower than average effective migration areas.

The first division in the data is between Africans and non-Africans, and it is clear that North and East Africans have a much larger proportion of shared ancestry with Eurasian groups than other African groups (K = 3 in S2 Fig). East African groups break away from other African groups at K = 5 via a component (black) maximized in the Nuba at 80.1%. Of particular interest for the present investigation is also the component that emerges at K = 8 (light orange), maximized in the Ari, Sabue and Gumuz populations. The Sabue are one of the few remaining hunter-gatherer groups in East Africa today and they share genetic continuity with earlier hunter-gatherer ancestry from the region [38], represented by the Mota individual [8]. The Ari, Gumuz and Sabue have been suggested to retain a high degree of ancient East African hunter-gatherer ancestry [4, 26, 38], and our demographic analyses indicate a high degree of similarity between these populations. This component is shared with many other East African groups, displaying fractions of ancestry that show deep continuity in the region.

At K = 11 another East African component appears, maximized in the Somali populations, which might represent Cushitic-related ancestry. Levantine populations separate from the Arabic populations at K = 14, and we visualized these two components using a Kriging interpolation across the study area (Fig 1). These two ancestries were the component maximized in the Lebanese Druze (dark blue) and the component highest in the Yemeni (blush pink).

To investigate the differences in affinities of our target populations to either Levantine or Yemenite ancestry we performed an f4 test. The test took the following form: Yemen_YEMEN |Lebanese_Christian|Target (S22 Fig). It showed a significant association with Levantine ancestry for the populations north of the Sudanese BeniAmer as well as for the Oromo and Tygray from Ethiopia, and the Luhya and the Maasai from Kenya. No populations had a significantly higher association with Yemenite ancestry than with Levantine ancestry in this more stringent test. The test does, however, highlight the importance of Levantine admixture, for more northern populations in particular.

In our Principal Component (PC) analyses, the first PC differentiates between African and non-African groups (S5 Fig). Several African populations fall on the cline between African and non-African variation, in particular North Africans, such as the Egyptians and populations from Sudan who are known to have Eurasian admixture [15]. We also observe a grouping according to linguistics where Omotic, Afro-Asiatic, and Nilo-Saharan speakers separate from each other. East African groups are positioned on the diagonal between PC 1 and PC 3, with the Ari, Sandawe, and Sabue populations forming their own cluster in the direction of the southern African Khoe-San, indicating shared ancestries between these hunter-gatherer groups (S5 Fig). This cline is similar to what was found in studies using aDNA [5, 11] and is a reflection of the cline between southern and East African hunter-gatherer ancestry.

UMAP was also performed on the genotype information in our dataset, see S20 Fig. This analysis produces two larger clusters of populations, one consisting of West African groups, Eastern Bantu speakers, the Saharan speakers, and the Nuer, Dinka, and Shiluk from Sudan. The other cluster contains mainly Middle Eastern populations and Ethiopians, as well as the remaining Sudanese populations.

To further investigate the historic patterns of gene flow, migrations and which barriers to migrations are evident across the region of interest, we used the FEEMS software package [32], Fig 1D.

Determination of Eurasian sources through local ancestry estimation

To identify distinct ancestries in East African populations, we employed MOSAIC [34]. We wanted to identify patterns of local ancestries and determine which of our reference populations were the best proxies for the different genetic components. In particular, we were interested in the “non-African” or rather Eurasian segments of the genomes. Following the approach from previous studies, we aimed to isolate these Eurasian genetic segments and analyze them in isolation [6, 39]. Thirty-five East African and Northeast African populations were chosen as target populations to analyze. For location and linguistic groups of these target populations see S4 Fig.

As has been shown in previous studies, and indicated by our demographic inference (Fig 1 and S6 Fig), there are generally four main components of modern-day East African genetic ancestry [5, 12]: basal East African hunter-gatherer ancestry, Sudanese/Nilotic ancestry, Eurasian ancestry, and West African ancestry associated with Bantu speakers. Since the aim of this study is to identify the best proxy for the source of the Eurasian ancestry of the Northeast African populations, we constructed a scenario where we could use these four ancestral sources to paint the haplotypes of our chosen target populations using MOSAIC [34]. We set up an initial scenario to try to identify the best Eurasian source to use for further analyses. In this scenario we used the following populations. To represent the basal East African hunter-gatherer ancestry we chose the Sabue [26, 38]. The Sabue have been referred to by many different names in the literature, for instance Shabo and Chabu; here we use the name used in the original publication of the data [26]. The CEU (Utah residents with Northern and Western European ancestry) population from the 1000 genomes consortium was chosen as a proxy for general Eurasian ancestry. The Sudanese Dinka were chosen to represent Sudanese ancestry (the group that defines the black component in the ADMIXTURE analysis associated with Sudanese ancestry). The YRI (Yoruba in Ibadan, Nigeria) was used as a proxy for West African Niger-Congo and Bantu-speaker-associated ancestry.

We then used these four populations (CEU, YRI, Dinka, and Sabue) as sources in a 3-way admixture scenario in MOSAIC and extracted the called genotypes that were assigned to the CEU-like ancestry with a probability of 80% or more. The closest affinity of each constructed ancestry was determined by the Fst test in MOSAIC against the four source populations. This resulted in regions of each East African individual’s genome that are more closely associated with a Eurasian ancestry than with the other ancestries. Only these regions were kept whilst the rest was set as missing for each individual; see S5 Table for missingness information for each population.

This non-African part of the genomes was then compared using outgroup f3 to Eurasian reference populations with the Ju|’hoansi as outgroup (target |REF |Ju|’hoansi). A higher value of the outgroup f3 test indicates a smaller genetic distance between the in-groups compared to the outgroup. The San group Ju|’hoansi was chosen as the outgroup since previous studies had shown them to be the least admixed modern-day Khoe-San group [23]. The f3 outgroup test thus identified the population that is the most similar to the Eurasian fraction of the Northeast African target populations, see the top three in Fig 2 and top five in S2 Table.

Fig 2. f3 outgroup results of the target populations.

Only the top three hits are shown. The f3 outgroup was calculated for the Eurasian-like ancestry for each target population in the following manner: Target | Source | Ju|’hoansi.

The outgroup f3 analysis provided us with the best Eurasian source population to use for each of our Northeast African target populations. We then re-ran the MOSAIC analysis above, using this best source instead of the CEU. We refer to this dataset and approach as the “best by f3”. This approach can also be thought of as using our prior knowledge to construct the best scenario.

As an alternative to our own ascertained approach we also tested other, less constrained scenarios. In these scenarios, we kept YRI, Sabue, and Dinka as the three African populations and varied the Eurasian sources. We tried both providing one Eurasian population at a time and providing two Eurasian populations at the same time. These scenarios were repeated under 2-way, 3-way, and 4-way admixture, that is, using two, three, and four ancestral sources with the four or five reference populations respectively. All 35 target populations were investigated under these differing permutations. Since evaluating the best model can be nontrivial and require lots of manual curation, we opted to use MOSAIC’s R2 metric (genomic fit) to evaluate the best model. In general, the simpler models performed better: all of the 2-way scenarios outperformed their equivalent (using the same populations) 3-way and 4-way admixture scenarios. Using two Eurasian populations as sources, however, outperformed a single source. This could be because our dataset does not contain a good match to the original source. These R2 values can be found in S1 Table. This dataset is referred to as the “Best by R2” or simply “R2” dataset in the rest of the study. These runs thus produce two inferred ancestral sources, one Non-African and one African. Five of the target populations did not generate a Non-African source as the closest fit determined by Fst: LWK-Luhya_Kenya, Sudan_Baria, Sudan_Hausa, Sudan_Nuer, and Uganda_Baganda. None of these populations are therefore shown in the best by R2 analysis. This second dataset is thus the best dataset that we achieved using a parametric approach. Ancestry tract length distribution plots for both of these datasets were generated and are available for download from DOI:10.17044/scilifelab.23957703.

Dating Eurasian admixture in Northeast Africa

For both the f3 outgroup-based approach and the R2 approach above, we determined the admixture date (in generations) from MOSAIC’s co-ancestry curves for the most Eurasian-like constructed ancestry. The results of this dating can be seen in Fig 3, with the dates for the best source based on f3 in A and D and the dates based on the runs with the highest R2 value in B and E.

Fig 3. Admixture dating in generations for the Eurasian-like ancestry from MOSAIC.

A and D contain data for the best source as determined by f3 whilst B and E illustrate the dataset determined by the best R2 value. A and B are the admixture dates in generations, C shows the target population locations, and D and E show the same data plotted over the study area surface using Kriging interpolation. The numbers here represent the major breaks (black lines). Note that some populations did not find a Eurasian source in the best by R2 runs and thus do not have a date.

Given a generation time of 29 years [40], this gives a time span ranging from 72.5 years ago for the Nilotic-speaking Anuak to 1027 years ago for the Cushitic-speaking Afar, both from Ethiopia, for the best by R2 dataset. In the best by f3 dataset the range is smaller, from 84 years for the Eastern Sudanic-speaking Nuer (South Sudan) to 940 years for the Semitic-speaking Bataheen (Sudan).
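The conversion between generations and years is a simple multiplication by the generation time. The helper names below are hypothetical; the 29-year generation time is the value used in the text. As a worked example, a date of 72.5 years corresponds to 72.5 / 29 = 2.5 generations, and an estimate of 22 generations corresponds to 22 × 29 = 638 years before sampling.

```python
GENERATION_TIME_YEARS = 29  # generation time used in the text


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert an admixture date in generations to years before sampling."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return generations * generation_time


def years_to_generations(years, generation_time=GENERATION_TIME_YEARS):
    """Convert years before sampling back to generations."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return years / generation_time
```

Note that these are years before the sampling date, so converting to a calendar year additionally requires an assumption about when the samples were collected.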

To visualize the correlation between linguistic classification and the inferred admixture date we generated dot plots of the dates per linguistic group as well as the larger linguistic families, see Fig 4.

Fig 4. Dot plot representations of the admixture dating in generations for the most Eurasian-like ancestry from MOSAIC.

A and C contain data for the best source as determined by f3 whilst B and D illustrate the dataset determined by the best R2 value. A and B are per smaller linguistic classification whilst C and D show the same data divided into linguistic families. The red triangle represents the mean value.

We compared these dates to the categorical information we had about the populations, that is Country, Linguistic group (e.g. Semitic), or larger Linguistic family (e.g. Afro-Asiatic), using a two-way ANOVA (S4 Table). We find that only larger linguistic families correlated significantly with the detected admixture dates for the best by f3 dataset (S4A Table). The same pattern, where the lowest p-value is observed for the larger linguistic family, is true also for the R2 dataset but without reaching significance (S4B Table). We also tested whether there was a spatial correlation to the admixture dates. This was done by comparing the admixture dates with the great circle distance from each population’s sampling location to Tel Aviv (a coastal location in the Levant) and to Sanaa (the capital of Yemen), S21 Fig. For the R2 values by each linguistic family see S3 Table. For the distance from Tel Aviv, we find a low but significant correlation for both datasets: R2 of 0.088 for the best by f3 dataset and 0.067 for the best by R2 dataset. We find weaker, but significant, support for the distance from Sanaa in both datasets: R2 of 0.048 (p = 0.001) for the best by f3 dataset and R2 of 0.031 (p = 0.002) for the best by R2 dataset (S21 Fig).
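A great circle distance between two coordinates can be computed with the haversine formula, sketched below. The text does not state which formula or software was used, so this is only one reasonable choice. The city coordinates are approximate illustrative values and the Earth radius is the commonly used mean value; neither is taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

# Approximate city coordinates (latitude, longitude in decimal degrees).
# Illustrative values only, not taken from the paper.
TEL_AVIV = (32.08, 34.78)
SANAA = (15.37, 44.19)


def great_circle_km(point_a, point_b, radius_km=EARTH_RADIUS_KM):
    """Haversine great circle distance between two (lat, lon) points in km."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))
```

Computing `great_circle_km(TEL_AVIV, site)` and `great_circle_km(SANAA, site)` for each sampling location gives the two distance variables against which admixture dates can be regressed.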

As there were some discrepancies between the two dating approaches we compared the dates to each other by plotting the dates from the f3 dataset against the R2 dataset, see Fig 5. Then we performed linear regression on the data. This resulted in a correlation (R2) of 0.3093 with a p-value of 0.001417. The majority of target populations fall within the 95% confidence interval, the gray area in Fig 5. Notable exceptions are the Sudanese Nuba and the Ethiopian Afar populations, which have much older dates in the R2 scenario, and the Sudanese Zagawa and Hadendowa, which display the opposite pattern with much older dates in the best by f3 scenario. Most other populations that deviate from either line do so by only a few generations. From our analyses, such as ADMIXTURE or PCA, there is nothing in particular that makes these four populations stand out from nearby populations. The MOSAIC metrics, such as R2 and Rst, are also in line with comparable populations. The Sudanese Nuba are known to be a relatively heterogeneous group [41], but that is not reflected in our analyses of population structure, see Fig 1 and S6 Fig.

Fig 5. Correlation between dates from the two approaches.

Linear regression (blue line) comparing the admixture dates of the Eurasian-like ancestry from the best by f3 to the best by R2 dataset. The grey area represents the 95% confidence interval. The black line is X = Y, i.e. same date in both approaches.

We also compared our inferred dates to those from the LD-based Malder [42] for both datasets (S23 and S24 Figs). Malder generally only inferred a single admixture event, thus making the interpretation of a comparison between the two methods somewhat difficult.

Discussion

In this study, we investigate the patterns of different genetic ancestries in Northeast African populations, focusing on Eurasian back migrations. We inferred population structure using both global and local ancestry methods. Using the local ancestry method MOSAIC we identify regions of Northeast African populations’ genomes with Eurasian ancestry. We also attempt to date this admixture. In our approach, we start from the modern-day groups and try to infer patterns of past interactions by analyzing their genomes. We however acknowledge that Northeast Africa is a region with a complex history spanning thousands of years. The expansions and contractions, rise and fall of states, kingdoms, and empires across the region have had a major impact on the formation, dissolution, and current distributions of the sampled communities in this study. We, therefore, recognize that the groups included in this study are modern-day populations that were created by introgression/interaction/assimilation events in the past and should not be seen as unchanged entities that represent exact past distributions of groups. For example, the interpretation of the dating for Nilotic speakers from East Africa, the Maasai, Turkana, and Samburu is complex since they only relatively recently reached their current-day distributions through expansions from Sudan and Uganda within the last few centuries [43].

Dating the admixture of different groups with each other is of great interest to population geneticists. Having a date for when the mixing of two groups occurred allows us to incorporate other types of independent evidence into our analyses, such as written or oral history or linguistic information, thus a big part of the effort in this study and discussion is spent on our inferred dates of admixture. The “best by f3” analysis is our attempt to propose a scenario that best fits the previously known genetic history of the region, whilst the “best by R2” analysis, based on the genomic fit (R2), is intended as a less constrained scenario for picking out Eurasian ancestry in Northeast African groups. As shown in Fig 5 the two approaches result in similar dates for most of the populations.

Population structure inferences illustrate the complex genetic history of Northeast African populations. Larger patterns of genetic associations between many of the world’s distinct human lineages are reflected in Northeast African genomes. The hunter-gatherer ancestry highlights the deep history of the region and its people, and shows that this ancestry persists within East African populations. The southern part of the region has a closer genetic affinity to West African groups, a result of the Bantu expansion, and several of these populations also speak Bantu languages today. That the Bantu expansion did not continue further into the region could be a result of geographical barriers such as the Ethiopian Highlands and the dry regions of the Horn of Africa, as indicated by our FEEMS analysis in Fig 1D. Alternatively, as suggested by [15], the Northeast African Nilotic-speaking herders (such as the Dinka and Nuer), who have remained relatively isolated from other groups, could have formed a buffer against the Bantu expansion continuing further into Northeast Africa.

Eurasian admixture has had a large influence on the genomes of Northeast African groups. The Egyptian and Sudanese Copt populations, for instance, are genetically more similar to Middle Eastern groups than to other African populations. The same pattern holds for the rest of North Africa, where it was present at least 15 000 years ago [10], although we did not investigate this here. The Copts look genetically similar to the Egyptians from Cairo (Fig 1A and 1C). This is unsurprising given that the Copts arrived in Sudan from Egypt around 200 years ago and seem to have lived relatively isolated since then [15]. Our admixture date for the Copts (with Eurasians) was inferred to be 27.5 generations for the f3 analysis and 25.7 generations for the R2 analysis, and around 22 generations for the Egyptians. Thus this admixture took place around the 14th century.
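The conversion from generations to calendar years can be sketched as follows. This is only an illustration: the generation lengths (25 and 29 years) and the reference year (2000) are assumptions made here, not values stated in this section, and the function names are hypothetical.

```python
def generations_to_year(generations, years_per_generation=29.0, reference_year=2000):
    """Convert an admixture date in generations before the reference year
    into a calendar year (CE). Both defaults are illustrative assumptions."""
    return reference_year - generations * years_per_generation


def century_of(year):
    """Return the CE century containing a positive calendar year."""
    if year < 1:
        raise ValueError("only CE years are supported")
    return (int(year) - 1) // 100 + 1


# Admixture dates in generations, as quoted in the text above.
dates = {"Copts (f3)": 27.5, "Copts (R2)": 25.7, "Egyptians": 22.0}

# Assumed generation lengths in years; the outcome is sensitive to this choice.
for years_per_generation in (25.0, 29.0):
    for label, generations in dates.items():
        year = generations_to_year(generations, years_per_generation)
        print(f"{label}: {years_per_generation:.0f} y/gen -> "
              f"year {year:.0f} (century {century_of(year)})")
```

Because the calendar date scales linearly with the assumed generation length, a difference of a few years per generation shifts the estimate by roughly a century at these time depths, so the century quoted for any population should be read as approximate.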

Further south in the region, we continue to see the impact of past Eurasian admixture. The positions of Northeast African populations from Sudan and Ethiopia in the PCA plots are drawn towards Eurasian populations (S6 Fig). ADMIXTURE analyses recapitulate this pattern, where Northeast African groups share the component maximized in Middle Eastern groups (pink component at K = 6, S2 Fig). The Sudanese data in our study is mainly from [15], who also investigated the time and sources of admixture in Sudanese populations. [15] investigated a simpler admixture scenario with only two putative sources, namely the Sudanese Nuer and the TSI (Tuscan), to represent the admixture of a Sudanese basal population with a Eurasian source. This is most similar to our R2 approach, in which we tested two Eurasian sources in each run and then picked the two Eurasian sources that produced the best genomic fit (R2 value). Our findings are generally in agreement, particularly for the Eurasian admixture dates that are the primary focus of our study.

In the area of current-day Sudan and South Sudan, there is a clear divide between the Eastern Sudanic- and Semitic-speaking groups from Sudan, and the South Sudanese groups, as well as the Saharan-speaking Sudanese groups. This divide can be seen both in global ancestry and in the inferred admixture dates for their Eurasian ancestries. Dongola had been the capital of the Nubian Kingdom, and the fall of Dongola in 1317 to Mameluke forces marked the start of Arab and Islamic dominance south of the borders of Egypt. Many of the Semitic speakers in our dataset have their Eurasian admixture dated to this time, around 20 generations ago. The exception is mainly the Southern Semitic speakers such as the Beni-Amer and Tygray, whose dates are slightly older at around 30 generations ago. Around 30 generations ago is also the inferred date for the Ethiopian Cushitic-speaking Afar and Oromo (though the Oromo had an inferred date of 21 generations for the best by f3). South Sudanese groups, however, stayed largely isolated. This pattern is evident in the ADMIXTURE analysis, as the populations around South Sudan are represented by a specific component (the black component at K = 5 and onward) with very little of the non-African (pink) component that we find in most other North-East African groups, indicating their isolation and genetic homogeneity compared to other populations.

Previous studies that investigated Ethiopian population structure observed clustering based on linguistic families [4, 14, 15]. This pattern is recapitulated in our analysis, both in the population inference methods and in the admixture dating. The Omotic-speaking Ari populations form their own small cluster (PC 1 vs PC 3 in S6 Fig), a reflection of their segregated status within Ethiopia [4]. The Gumuz (language isolate) and Anuak (Nilotic) display very little Eurasian admixture. Given that the date we infer is only a few generations ago, they could have received this Eurasian admixture through secondary admixture with another neighboring group a few generations ago, or it could be an effect of recent or ongoing admixture.

Within the Northeast African geographic space, the analysis using FEEMS recapitulates expected natural barriers to migration such as the Red Sea, the Gulf of Aden, the Persian Gulf, and the Sahara Desert. In addition to clear geographical barriers, we also see evidence of linguistic and cultural barriers. One obvious example is the low migration rate between the Ethiopian Somali and the other Ethiopian populations; as expected, a high migration rate is inferred between the different Somali groups. The Great Rift Valley forms a natural barrier across Ethiopia, with highlands on both sides of the rift. A previous study of Ethiopian genetics found a significant association of genetic similarity with elevation, ethnicity, and first language, and interestingly not with second language or religion [13].

Along the Red Sea coast of Eritrea and Sudan, we find a region of high gene flow extending into northern Ethiopia and into the Great Rift Valley (Fig 1D). This region corresponds well to the pink component in Fig 1A and 1B, which seems to represent Yemeni ancestry. f3 visualizations also indicate higher gene flow from Arabian groups in this area relative to more northern and southern latitudes (Fig 2). It is also a region in which we infer some of the oldest admixture dates. These observations, as well as the shared South Semitic languages (found in Yemen, Oman, Eritrea, and Ethiopia [44]), indicate a close connection between Eritrea and Ethiopia on one side and the south of the Arabian peninsula, including present-day Yemen, on the other. The Kingdom of Aksum (or the Aksumite Empire), encompassing Eastern Sudan, Northern Ethiopia, Eritrea, and Djibouti and extending across the Red Sea into Yemen, thrived between the 1st and 7th centuries AD, as trade along the Red Sea increased and trade along the Nile decreased. Both Rome and Byzantium traded with the Indian Subcontinent, and artifacts from these Kingdoms can be found at Aksumite sites [45, 46]. The Semitic-speaking Ethiopian populations also group together with the Middle Eastern populations in the UMAP analysis (S20 Fig). These admixture events could be the result of the Red Sea trade. Aksum collapsed in the 8th century as Islam started to expand and control over the Red Sea trade shifted to the Near East [47].

Previous ancestry deconvolution studies pointed at Levantine sources for the Eurasian admixture in Northeast Africans rather than Arabic groups [4, 6]. We find that the pattern is more complex with different source populations in different regions, see Fig 1B and 1C as well as Fig 2. Levantine contributions are seen more towards the north and contributions from Arabian peninsula groups are seen more at lower latitudes, Figs 1B, 1C and 2.

In both approaches to admixture dating, the two populations with the most extreme difference in Eurasian admixture dates came from the same country. This highlights the heterogeneous nature of North-East African genetics and how little explanatory power country borders have on population structure. Not unexpectedly, it is rather geographic, linguistic, and cultural borders that explain the degree of genetic interconnections between groups.

The major linguistic family was the only factor that was significant (and only for the best by f3) in our ANOVA test of the available categories, S4 Table. The linear regression analysis of distance from the Levant, S21A and S21B Fig, also produced a significant fit with a negative coefficient indicating more recent admixture dates further from the Levant—this is likely driven by the younger dates for the populations in and around South Sudan. The same pattern was observed when comparing the distance to Sanaa, albeit with a smaller slope of the line and larger p-values (S21C and S21D Fig).
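The regression of admixture date on distance can be sketched as below. This is a minimal illustration, not the analysis code of the study: the haversine formula with a mean Earth radius of 6371 km, the approximate Tel Aviv coordinates, the function names, and the three toy populations with their coordinates and dates are all assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


TEL_AVIV = (32.0853, 34.7818)  # approximate latitude, longitude

# Hypothetical populations: (label, latitude, longitude, date in generations).
toy_populations = [
    ("pop_north", 30.0, 31.0, 24.0),
    ("pop_mid", 15.5, 32.5, 20.0),
    ("pop_south", 5.0, 31.5, 8.0),
]

distances_km = [great_circle_km(*TEL_AVIV, lat, lon)
                for _, lat, lon, _ in toy_populations]
dates = [date for _, _, _, date in toy_populations]
slope, intercept, r_squared = ols_fit(distances_km, dates)
```

A negative slope in such a fit corresponds to younger admixture dates further from the reference point. With only two populations in a group the fit passes through both points, so R2 equals 1 by construction, which is why a two-population group carries no information about the trend.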

One possible explanation for this phenomenon could be that populations with little or no previous Eurasian admixture would have their inferred admixture date affected more by recent Eurasian admixture than populations that experienced larger admixture in the past. In other words, most, if not all, of the populations in this study have or have had admixture with populations from the Middle East during the Arab expansion, and this newer admixture is obscuring older admixture patterns. The groups with younger inferred dates in our analysis thus likely have smaller contributions from older admixture.
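This reasoning can be illustrated with a toy model, which is not the algorithm used by MOSAIC or MALDER. The sketch assumes that each admixture pulse contributes an exponential decay exp(-g * d) to an ancestry covariance curve, where g is the number of generations since the pulse and d is the genetic distance in Morgans, and that a single date is obtained by fitting one exponential to the summed curve. The weights, pulse ages, and function names are hypothetical.

```python
import math


def mixture_curve(distances, pulses):
    """Sum of exponential decays; pulses is a list of (weight, generations)."""
    return [sum(w * math.exp(-g * d) for w, g in pulses) for d in distances]


def single_date_fit(distances, curve):
    """Fit a single exponential by least squares on log(curve).

    Returns the implied admixture date in generations (minus the slope).
    """
    logs = [math.log(y) for y in curve]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    return -sxy / sxx


# Genetic distances from 0.5 cM to 20 cM, expressed in Morgans.
distances = [0.005 * i for i in range(1, 41)]

OLD, RECENT = 100.0, 20.0  # hypothetical pulse ages in generations

# Relative contribution of each pulse to the curve (hypothetical weights).
mostly_old = [(0.8, OLD), (0.2, RECENT)]
mostly_recent = [(0.1, OLD), (0.9, RECENT)]

date_mostly_old = single_date_fit(distances, mixture_curve(distances, mostly_old))
date_mostly_recent = single_date_fit(distances, mixture_curve(distances, mostly_recent))
```

Under these assumptions both single-date estimates fall between the two true pulse ages, and the population with the smaller old contribution receives the younger estimate, which mirrors the argument that recent admixture pulls the inferred date forward most strongly where older admixture is scarce.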

Our study thus suggests that the distribution of Eurasian-like ancestry in Eastern and North-Eastern African populations is mostly an effect of more recent migrations (many of them recorded in historical texts) rather than ancient events related to the advent of pastoralism in the region at large, as indicated by ancient DNA studies [5]. Identifying the impact of ancient events on populations is not feasible when the original pattern has been distorted or masked by subsequent admixture events. Fully exploring the question of Eurasian admixture into Africa over larger timescales likely requires population-level aDNA, especially of the early East African hunter-gatherers such as Mota, and of the various in-moving groups, including those containing Eurasian admixture.

Conclusions

North-Eastern Africa is a vast region with complex histories of migrations and admixture. It was not possible to identify one source or origin of Eurasian admixture in the region. Rather, different populations have experienced admixture at different times, to varying degrees, and from different external sources. Although slight trends linked to language grouping and geography were observed, the overall pattern proved to be complex and specific to certain population groups. Previous studies have highlighted these events in distinct regions or countries in Northern and Eastern Africa, whilst in this study we have tried to combine them, with a specific emphasis on the Eurasian admixture in modern-day populations.

Supporting information

S1 Table. Top two Eurasian source populations identified by their haplotype fit to the genomes (R2 value from MOSAIC) for each target population.

(PDF)

S2 Table. f3 outgroup results grouped by language family of the target populations, top 5 hits shown.

The f3 outgroup was calculated for the most Eurasian-like ancestry for each target population in the following manner: Target | Source | Ju|’hoansi.

(PDF)

S3 Table. R2 values of linear regression between admixture date and distance from Tel Aviv in the top row by each linguistic group.

The Best by f3 dataset is in the leftmost columns while the Best by R2 dataset is on the right. Some linguistic groups have only one target population and therefore no value, whilst the Saharan group has two populations, which yields an R2 of 1.

(PDF)

S4 Table. Two way ANOVA statistics for the two different models.

Comparisons are between the admixture date and the factors Country, Linguistic group, and Larger linguistic family (Meta.Lang.Group). The asterisk indicates significant values. In A, the admixture date is that obtained from the best source determined by f3 outgroup; in B, it is the admixture date for the source with the highest R2.

(PDF)

S5 Table. Missingness by population for the Ancestry deconvolution.

Min and max are the lowest and highest individual missingness for that population. Since non-Eurasian regions of the targets’ genomes were set to missing, this measure is the inverse of the amount of Eurasian ancestry inferred for each individual and population in the best by f3 dataset.

(PDF)

S6 Table. Overview of populations used in the study and their linguistic information.

(XLSX)

S1 Fig. Location for populations included in this study.

Colours indicate linguistic groups. Made with Natural Earth.

(PDF)

S2 Fig. PONG visualization of 15 K’s of unsupervised ADMIXTURE analysis.

50 iterations for the full dataset. The best identified K through cross-validation was K = 13.

(PDF)

S3 Fig. Average cross-validation (CV) error for the 50 repetitions.

The K with the lowest CV error was K = 13, indicated by the horizontal line.

(PDF)

S4 Fig. North-East African target populations used in the study.

Labels are by country and colouring is by linguistic family. African_Semitic was used simply to make it easier to distinguish between the investigated (target) populations and the Middle Eastern Semitic populations. Made with Natural Earth.

(PDF)

S5 Fig. Principal component analysis. Each value in the PCA plots is the projection of the data on the eigenvectors, scaled by the eigenvalues.

Values within parentheses are the PC loading. Populations are coloured by linguistic group.

(PDF)

S6 Fig. Principal component analysis.

(PDF)

S7 Fig. Principal component analysis.

(PDF)

S8 Fig. Principal component analysis.

(PDF)

S9 Fig. Principal component analysis.

(PDF)

S10 Fig. Principal component analysis.

(PDF)

S11 Fig. Principal component analysis.

(PDF)

S12 Fig. Principal component analysis.

(PDF)

S13 Fig. Principal component analysis.

(PDF)

S14 Fig. Principal component analysis.

(PDF)

S15 Fig. Principal component analysis.

(PDF)

S16 Fig. Principal component analysis.

(PDF)

S17 Fig. Principal component analysis.

(PDF)

S18 Fig. Principal component analysis.

(PDF)

S19 Fig. Principal component analysis.

(PDF)

S20 Fig. Uniform Manifold Approximation and Projection for Dimension Reduction on the full dataset.

Colours are the same as in S1 Fig.

(PDF)

S21 Fig. Linear regression comparing the great circle distance in kilometers from Tel Aviv and Sanaa compared to admixture date of the Eurasian ancestry estimations.

The blue line is the fitted linear regression line and the grey area represents the 95% confidence interval of the standard error. A) Distance from Tel Aviv for the best by f3 dataset. B) Distance from Tel Aviv for the best by R2 dataset. C) Distance from Sanaa for the best by f3 dataset. D) Distance from Sanaa for the best by R2.

(PDF)

S22 Fig. f4 test comparing Lebanese to Yemenite ancestry for each of the target populations.

(PDF)

S23 Fig. MALDER vs MOSAIC dates.

For the best by f3 dataset, using the same source populations as in the corresponding MOSAIC analysis. Only populations for which MALDER estimated one event are shown. The populations for which MALDER inferred two admixture events were: Egypt_Egyptian 40 and 6 generations ago, Sudan_Halfawieen 87 and 7 generations ago, and Sudan_Mahas 94 and 12 generations ago. The blue line is the fitted linear regression line and the grey area represents the 95% confidence interval of the standard error.

(PDF)

S24 Fig. MALDER vs MOSAIC dates.

For the best by R2 dataset, using the same source populations as in the corresponding MOSAIC analysis. Only populations for which MALDER estimated one event are shown. The populations for which MALDER inferred two admixture events were: Egypt_Egyptian 39 and 8 generations ago, Kenya_Turkana 8 and 164 generations ago, Sudan_Halfawieen 77 and 6 generations ago, and Sudan_Mahas 81 and 12 generations ago. The blue line is the fitted linear regression line and the grey area represents the 95% confidence interval of the standard error.

(PDF)

Acknowledgments

The computational analyses were done through the Swedish National Infrastructure for Computing (SNIC) at Uppmax. The authorized NIH Data Access Committee (DAC) granted data access to Carina Schlebusch for the controlled-access genetic data analyzed in this study that were previously deposited by Scheinfeldt et al. 2019 in the NIH dbGAP repository (accession code phs001780.v1.p1; date of approval: 2019-05-17). For the genome-wide genotype data from the Patin et al. 2017 study (EGA accession number EGAD00010001209), data access was granted via the European Genome-phenome Archive (EGA) by the GEH Data Access Committee EGAC00001000139. Special thanks go to the author of MOSAIC, Michael Salter-Townshend, for the discussion on how best to perform the ancestry deconvolution, to Carolina Bernhardsson for help with plotting, and to Cesar Fortes-Lima for help with the study design.

Data Availability

All data used in the study is from published studies. For some datasets, permission from the original authors was obtained through data access agreements.

Funding Statement

The project was funded by the European Research Council (ERC StG AfricanNeo, grant no. 759933) and the Knut and Alice Wallenberg Fellowship grant. Computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX, partially funded by the Swedish Research Council (through grant agreement no. 2018-05973). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The salaries of C.S. and R.H. were funded by the European Research Council (ERC StG AfricanNeo, grant no. 759933) and the Knut and Alice Wallenberg Fellowship grant.

References

  • 1. Johanson D, Edey M. How our oldest human ancestor was discovered, and who she was. Simon & Schuster; 1990.
  • 2. McDougall I, Brown FH, Fleagle JG. Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature. 2005;433(7027):733–736. doi: 10.1038/nature03258
  • 3. Vidal CM, Lane CS, Asrat A, Barfod DN, Mark DF, Tomlinson EL, et al. Age of the oldest known Homo sapiens from eastern Africa. Nature. 2022;601(7894):579–583. doi: 10.1038/s41586-021-04275-8
  • 4. Pagani L, Kivisild T, Tarekegn A, Ekong R, Plaster C, Gallego Romero I, et al. Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. American journal of human genetics. 2012;91(1):83–96. doi: 10.1016/j.ajhg.2012.05.015
  • 5. Prendergast ME, Lipson M, Sawchuk EA, Olalde I, Ogola CA, Rohland N, et al. Ancient DNA reveals a multistep spread of the first herders into sub-Saharan Africa. Science (New York, NY). 2019;365(6448):eaaw6275. doi: 10.1126/science.aaw6275
  • 6. Molinaro L, Montinaro F, Yelmen B, Marnetto D, Behar DM, Kivisild T, et al. West Asian sources of the Eurasian component in Ethiopians: a reassessment. Scientific reports. 2019;9(1):18811–18818. doi: 10.1038/s41598-019-55344-y
  • 7. Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, et al. The African Genome Variation Project shapes medical genetics in Africa. Nature. 2015;517(7534):327–332. doi: 10.1038/nature13997
  • 8. Gallego Llorente M, Jones ER, Eriksson A, Siska V, Arthur KW, Arthur JW, et al. Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent. Science (New York, NY). 2015;350(6262):820–822. doi: 10.1126/science.aad2879
  • 9. Gallego Llorente M, Jones ER, Eriksson A, Siska V, Arthur JW, Arthur KW, et al. Erratum for the Report “Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa” (previously titled “Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent”) by M. Gallego Llorente, E. R. Jones, A. Eriksson, V. Siska, K. W. Arthur, J. W. Arthur, M. C. Curtis, J. T. Stock, M. Coltorti, P. Pieruccini, S. Stretton, F. Brock, T. Higham, Y. Park, M. Hofreiter, D. G. Bradley, J. Bhak, R. Pinhasi, A. Manica. Science (New York, NY). 2016;351(6275):aaf3945–aaf3945.
  • 10. van de Loosdrecht M, Bouzouggar A, Humphrey L, Posth C, Barton N, Aximu-Petri A, et al. Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations. Science (New York, NY). 2018.
  • 11. Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, et al. Reconstructing Prehistoric African Population Structure. Cell. 2017;171(1):59–71.e21. doi: 10.1016/j.cell.2017.08.049
  • 12. Wang K, Goldstein S, Bleasdale M, Clist B, Bostoen K, Bakwa-Lufu P, et al. Ancient genomes reveal complex patterns of population movement, interaction, and replacement in sub-Saharan Africa. Science advances. 2020;6(24):eaaz0183. doi: 10.1126/sciadv.aaz0183
  • 13. López S, Tarekegn A, Band G, van Dorp L, Bird N, Morris S, et al. Evidence of the interplay of genetics and culture in Ethiopia. Nature communications. 2021;12(1):3581–15. doi: 10.1038/s41467-021-23712-w
  • 14. Hellenthal G, Bird N, Morris S. Structure and ancestry patterns of Ethiopians in genome-wide autosomal DNA. Human molecular genetics. 2021;30(R1):R42–R48. doi: 10.1093/hmg/ddab019
  • 15. Hollfelder N, Schlebusch CM, Günther T, Babiker H, Hassan HY, Jakobsson M. Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. PLoS Genetics. 2017;13(8):e1006976–17. doi: 10.1371/journal.pgen.1006976
  • 16. Vicente M, Schlebusch CM. African population history: an ancient DNA perspective. Current Opinion in Genetics & Development. 2020;62:8–15. doi: 10.1016/j.gde.2020.05.008
  • 17. Fernandes V, Brucato N, biology JFM, 2019. Genome-wide characterization of Arabian Peninsula populations: shedding light on the history of a fundamental bridge between continents. Molecular Biology and Evolution. 2021;36(3):575–586. doi: 10.1093/molbev/msz005
  • 18. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393
  • 19. Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, et al. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome research. 2016;26(2):151–162. doi: 10.1101/gr.191478.115
  • 20. Montinaro F, Busby GBJ, Gonzalez-Santos M, Oosthuitzen O, Oosthuitzen E, Anagnostou P, et al. Complex Ancient Genetic Structure and Cultural Transitions in Southern African Populations. Genetics. 2017;205(1):303–316. doi: 10.1534/genetics.116.189209
  • 21. Triska P, Soares P, Patin E, Fernandes V, Černý V, Pereira L. Extensive Admixture and Selective Pressure Across the Sahel Belt. Genome biology and evolution. 2015;7(12):3484–3495. doi: 10.1093/gbe/evv236
  • 22. Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, et al. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science (New York, NY). 2017;356(6337):543–546. doi: 10.1126/science.aal1988
  • 23. Schlebusch CM, Skoglund P, Sjödin P, Gattepaille LM, Hernandez D, Jay F, et al. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science (New York, NY). 2012;338(6105):374–379. doi: 10.1126/science.1227721
  • 24. Fortes-Lima C, Gessain A, Ruiz-Linares A, Bortolini MC, Migot-Nabias F, Bellis G, et al. Genome-wide Ancestry and Demographic History of African-Descendant Maroon Communities from French Guiana and Suriname. The American Journal of Human Genetics. 2017;101(5):725–736. doi: 10.1016/j.ajhg.2017.09.021
  • 25. Pagani L, Schiffels S, Gurdasani D, Danecek P, Scally A, Chen Y, et al. Tracing the Route of Modern Humans out of Africa by Using 225 Human Genome Sequences from Ethiopians and Egyptians. The American Journal of Human Genetics. 2015;96(6):986–991. doi: 10.1016/j.ajhg.2015.04.019
  • 26. Scheinfeldt LB, Soi S, Lambert C, Ko WY, Coulibaly A, Ranciaro A, et al. Genomic evidence for shared common ancestry of East African hunting-gathering populations and insights into local adaptation. Proceedings of the National Academy of Sciences of the United States of America. 2019;116(10):4166–4175. doi: 10.1073/pnas.1817678116
  • 27. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics. 2007;81(3):559–575. doi: 10.1086/519795
  • 28. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics (Oxford, England). 2010;26(22):2867–2873. doi: 10.1093/bioinformatics/btq559
  • 29. Alexander DH, Lange K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC bioinformatics. 2011;12(1):246–246. doi: 10.1186/1471-2105-12-246
  • 30. Behr AA, Liu KZ, Liu-Fang G, Nakka P, Ramachandran S. pong: fast analysis and visualization of latent clusters in population genetic data. Bioinformatics (Oxford, England). 2016;32(18):2817–2823. doi: 10.1093/bioinformatics/btw327
  • 31. Abraham G, Qiu Y, Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics (Oxford, England). 2017;33(17):2776–2778.
  • 32. Marcus J, Ha W, Barber RF, Novembre J. Fast and flexible estimation of effective migration surfaces. eLife. 2021;10. doi: 10.7554/eLife.61927
  • 33. O’Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, et al. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genetics. 2014;10(4):e1004234. doi: 10.1371/journal.pgen.1004234
  • 34. Salter-Townshend M, Myers S. Fine-Scale Inference of Ancestry Segments Without Prior Knowledge of Admixing Groups. Genetics. 2019;212(3):869–889. doi: 10.1534/genetics.119.302139
  • 35. Maier R, Patterson N. ADMIXTOOLS 2; p. GPL–3.
  • 36. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient Admixture in Human History. Genetics. 2012;192(3):1065–1093. doi: 10.1534/genetics.112.145037
  • 37. Wickham H. ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag; 2016.
  • 38. Gopalan S, Berl REW, Myrick JW, Garfield ZH, Reynolds AW, Bafens BK, et al. Hunter-gatherer genomes reveal diverse demographic trajectories during the rise of farming in Eastern Africa. Current biology: CB. 2022;32(8):1852–1860.e5. doi: 10.1016/j.cub.2022.02.050
  • 39. Vicente M, Priehodová E, Diallo I, Podgorná E, Poloni ES, Černý V, et al. Population history and genetic adaptation of the Fulani nomads: inferences from genome-wide data and the lactase persistence trait. BMC genomics. 2019;20(1):915–912. doi: 10.1186/s12864-019-6296-7
  • 40. Fenner JN. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. American Journal of Physical Anthropology. 2005;128(2):415–423. doi: 10.1002/ajpa.20188
  • 41. Spaulding J. A Premise for Precolonial Nuba History. History in Africa. 2014;14:369–374. doi: 10.2307/3171848
  • 42. Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 2013;193(4):1233–1254. doi: 10.1534/genetics.112.147330
  • 43. Spear T, Waller R, editors. Being Maasai. Ethnicity and Identity In East Africa. Ohio University Press; 1993.
  • 44. Faber A. Genetic subgroupings of the Semitic languages. The University of Texas at Austin; 1980.
  • 45. Finneran N. Ancient Ethiopia. Aksum: Its Antecedents and Successors. By David W Phillipson. 250mm. Pp 176, ills. London: British Museum Press, 1998. ISBN 0-714125-39-3. £20.00. The Monuments of Aksum. Edited by David W Phillipson. 300mm. Pp 201, ills. London and Addis Ababa: British Institute in Eastern Africa and Addis Ababa University Press, 1997. ISBN 1-872566-11-1. £45.00. The Antiquaries Journal. 2000;80(1):348–349.
  • 46. Sharp M. D. W. Phillipson: Ancient Ethiopia. Aksum: Its Antecedents and Successors. Pp. 176, 12 pls, 60 figs. London: British Museum Press, 1998. Cased, £20. ISBN: 0-7141-2539-3. The Classical Review. 1999;49(1):288–289.
  • 47. Mitchell P, Lane P. The Oxford Handbook of African Archaeology. Oxford University Press; 2013.

Decision Letter 0

Francesc Calafell

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

15 Nov 2022

PONE-D-22-25736

Eurasian back-migrations into Northeast Africa was a complex and multifaceted process

PLOS ONE

Dear Dr. Schlebusch,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 30 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Francesc Calafell

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

   "The computation and data handling were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Uppmax partially funded by the Swedish Research Council through grant agreement no. 2018-05973. Authorized NIH Data Access Committee (DAC) granted data access to Carina Schlebusch for the controlled-access genetic data analysed in this study that were previously deposited by Scheinfeldt et al. 2019 in the NIH dbGAP repository (accession code phs001780.v1.p1; date of approval: 2019-05-17). For the genome-wide genotype data from the Patin et al. 2017 study (EGA accessory number EGAD00010001209), data access was granted via European Genome-Phenome Archive (EGA) by the GEH Data Access Committee EGAC00001000139. This project was supported by funding to CS from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 759933). A special thanks to the author of MOSAIC Michael Salter-Townshend for discussion on how the best perform the ancestry deconvolution, to Carolina Bernhardsson for help with plotting and Cesar Fortes-Lima for help with the study design."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

 "No: The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

5. Please ensure that you include a title page within your main document. You should list all authors and all affiliations as per our author instructions and clearly indicate the corresponding author.

6. We note that Supplementary Figure 1 in your submission contains [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Supplementary Figure 1 to publish the content specifically under the CC BY 4.0 license.  

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

7. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Please take into account the clarifications and suggestions for improvement that the reviewers contributed, and in particular try to address the methodological concerns expressed by reviewer #3.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript by Hammaren and colleagues addresses an interesting question about North African populations: the origin of the Eurasian back-migrations. I highly appreciate that the authors reanalyzed data from previous papers that could be further explored. However, the answer to the main question is not clearly discussed. The paper would improve with a clearer discussion.

In general, there is too much emphasis on, and sometimes overinterpretation of, the ADMIXTURE analyses. The discussion is mainly driven by those results, but the most novel and interesting part is the f3 analyses after local ancestry:

- The text is too descriptive and difficult to follow. Some of the descriptions about ADMIXTURE components and PC could be shortened.

- You might need to review the colors linked to the ADMIXTURE plots, sometimes they do not match with the description in the text:

-“Light green component that emerges is K=7 it is maximized in the Sabue”. This color is maximized in Namibia_Nama and SouthAfrica_Juhoansi.

-“blush pink component highest in the Qatari”. It is higher in Yemen.

-“Gumuz and Sabue have a high proportion of the dark green component.” Is this the black component?

-“blue component maximized in Middle Eastern groups at K=7, Figure 2.” I think this is figure 1.

- Also, you might want to check the labels of the ADMIXTURE plots, the names are sometimes duplicated and not clear.

- I would suggest providing a visualization of the f3 results in Table 1. And discussing these results and their interpretations in the discussion.

- Regarding ADMIXTURE interpretations, one example is the hunter-gatherer ancestry, which is mentioned throughout the text at different points. Are you claiming that it corresponds to one of their ancestral components in ADMIXTURE? It is too speculative without ancient data. Also, talking about “ancestries” from ADMIXTURE analyses is in most cases speculative; I would suggest checking these interpretations and the language used.

- In the discussion, in the paragraph “Previous recent ancestry deconvolution studies pointed at Levantine”, you are repeating the description of results that were already mentioned, but it is not clear what the interpretation is.

About the methodology:

- It is not explained how you treat the data after the local ancestry. Do you mask the rest of the genomes setting it as missing? If that is the case, which proportion of the genome is masked and therefore what are the missingness values?

- Could the use of CEU for the local ancestry bias the following analyses about the specific source for this ancestry? It is good that the analyses are repeated with the best source from f3 afterwards, but I wonder if this could produce a circular result (i.e. if the first analyses using CEU are biasing the f3 estimates, so the next analyses would be also biased). Could you comment on it further?

- You mention: “Though using two Eurasian populations as sources outperformed a single source.” Is it possible that this happens because you do not have the best source and then it is more likely that the source is coming from two different ones?

- Is it possible to estimate confidence intervals for the admixture dating? Considering the differences observed between the “f3” and “R2” estimations it would be important. The correlation between both admixture dates is not strongly significant and there are some outliers. It is briefly discussed in the discussion, but why specifically do you think these differences are found? For instance, is there a worse correlation among those with more Eurasian ancestry?

- To see if the masking is necessary, I suggest using an f4 test comparing between possible sources of the Eurasian ancestry in the different Northeast African populations, i.e. f4(Levant, Arab; Northeast Africa, Ju’hoansi).

- It would be interesting to see the ancestry proportions estimated with MOSAIC. For instance, comparing them with the admixture dates could bring more information about the different demographic processes.

- What are the sources of the admixture dating, is it always the Eurasian population and an African population? For instance, in the discussion, when talking about the Copts and Egyptians, it is not clear what are the sources of the admixture dates described.

- The discussion about how more recent admixture events could be masking the older ones is not supported with specific analyses. Maybe the authors could cite at least some other works.

- There are some other places where a reference would help, for instance, “The pattern is true also for the rest of North Africa, though not investigated here.”

On a minor note, the English is readable, but needs to be checked. There are some repetitions of words and grammar mistakes. I am not listing them because there are many, so please check it carefully. Some that I found:

-“are evident across the region of interest interest we used the FEEMS”

-The title in the results: “Dating Eurasian admixture dating in Northeast Africa.” You do not need the second “dating”.
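
As an illustrative aside on the f4 test suggested in the comments above: a statistic of the form f4(A, B; C, D) is commonly computed as the average over SNPs of (p_A − p_B)(p_C − p_D), where p is the allele frequency in each population. The sketch below uses made-up allele frequencies and hypothetical variable names; it is not the authors' pipeline, and it omits the block-jackknife standard errors that a real analysis would need.

```python
from typing import Sequence


def f4(freq_a: Sequence[float], freq_b: Sequence[float],
       freq_c: Sequence[float], freq_d: Sequence[float]) -> float:
    """Average over SNPs of (p_A - p_B) * (p_C - p_D)."""
    if not (len(freq_a) == len(freq_b) == len(freq_c) == len(freq_d)):
        raise ValueError("all populations need frequencies for the same SNPs")
    if len(freq_a) == 0:
        raise ValueError("at least one SNP is required")
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)]
    return sum(products) / len(products)


# Toy allele frequencies at three SNPs (invented for illustration only).
levant = [0.10, 0.50, 0.90]
arabia = [0.20, 0.40, 0.70]
northeast_africa = [0.15, 0.45, 0.80]
juhoansi = [0.60, 0.30, 0.20]

# f4(Levant, Arab; Northeast Africa, Ju|'hoansi) on the toy data.
stat = f4(levant, arabia, northeast_africa, juhoansi)
print(f"f4 = {stat:.4f}")
```

A value close to zero is what one would expect if the target is symmetrically related to both candidate sources; in practice the statistic is judged by its Z-score rather than its raw value.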

Reviewer #2: In the manuscript by Hammaren et al the authors attempt to disentangle some of the complexity in the genetic history of present day populations from North East Africa. They begin by collecting a large dataset of publicly available present-day data spanning all of Africa, the Near East and Europe, and then apply a variety of methods in attempts to identify present-day populations that appear distinctively similar to the introgressed Eurasian fragments in North East African groups and could thus be good proxies, or at least shed light onto the past dynamics of the region.

Although their final conclusion is that their attempt is not as successful as they would like, it does take time to get there, and I think this is something that could be made clearer earlier on, although I also grant it might be a difference in writing style. But personally, I found the concluding remarks somewhat undermined the rest of the discussion, for all that I very much agree with the key points made in that final section, and I wonder if it might be more effective if brought in earlier. Beyond that, I have mostly minor comments, and no concerns to raise about the scientific content of the manuscript as much as about its presentation.

1. Title: should be either "Eurasian back-migrations into Northeast Africa WERE a complex and multifaceted process" or "Eurasian back-migration into Northeast Africa was a complex and multifaceted process"; the current version doesn't have subject/tense agreement.

2. Methods, "4) individuals with at least 10% missingness was removed (plink --mind 0.01)" That is actually a filter for 1% missingness - please confirm what was actually done!

3. Results, "Note that some populations are represented multiple times from different original publications, resulting in a total of 97 unique populations." As a sanity check for the dataset merging, do results for the same population across different publications generally agree?

4. Results, "The output from Admixture shown in Figure 1 (for full analysis see Supplementary Figure 2)" It would be good if the figure legend (or somewhere in the main text) made clear that this is a truncated version of Supplementary Figure 2, and the K=13 and 14 panels don't include all clusters (East Asia is missing), as the 'full analysis' is a bit ambiguous here, since not all values of K are shown in the main text fig. Likewise, an explanation of what 'w' is in panel 1D would be helpful for readers not very familiar with FEEMS.

5. Results, "Ancestry tract length distribution plots for both of these datasets was generated and are available on request." If this is a matter of just plots, please add them as supplements to begin with. It'll be easier for everyone than having to dig them up when a reader asks for them in three years!

6. Discussion, "Admixture analyses recapitulate this pattern where Northeast African groups share the blue component maximized in Middle Eastern groups at K=7, Figure 2." That should surely be figure 1?

7. Discussion, "Dongola had been the capital of the Nubian Kingdom and the fall of Dongola in 317 to Mameluke forces meant the start of Arab and Islamic dominance south of the borders of Egypt. Many of the Semitic speakers in our dataset have their Eurasian admixture dated to this time - around 20+ generations ago." Is there a typo here, or a number missing? Using the value given in the text of 29 years per generation, 20 generations is 580 years ago, and looking at the corresponding figure no numbers are anywhere near close to the 58 generations needed to get to 320.

8. Discussion, "One possible explanation for this phenomenon could be that populations with little or no previous Eurasian admixture would have their inferred admixture date effected more by recent Eurasian admixture than population that experienced larger admixture in the past." Effected should be affected.
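
The arithmetic behind points 2 and 7 can be checked in a few lines. The sketch below uses the 29 years per generation quoted above; the reference year of 2000 CE, the helper names, and the per-sample missingness values are assumptions chosen only for illustration.

```python
YEARS_PER_GENERATION = 29   # generation time quoted in point 7
REFERENCE_YEAR = 2000       # assumed "present"; not stated in the review


def generations_to_years(generations: float) -> float:
    """Convert a number of generations into years before present."""
    return generations * YEARS_PER_GENERATION


def generations_since(calendar_year: int) -> float:
    """Generations elapsed between a calendar year (CE) and the reference year."""
    return (REFERENCE_YEAR - calendar_year) / YEARS_PER_GENERATION


def keep_samples(missingness: dict, max_missing: float) -> list:
    """Keep samples whose missing-genotype rate does not exceed max_missing
    (the kind of per-sample threshold that plink's --mind applies)."""
    return [s for s, m in missingness.items() if m <= max_missing]


print(generations_to_years(20))        # 580 years before present
print(round(generations_since(320)))   # about 58 generations

# Hypothetical per-sample missingness rates.
rates = {"sample_a": 0.005, "sample_b": 0.05, "sample_c": 0.20}
print(keep_samples(rates, 0.01))   # 1% threshold keeps only sample_a
print(keep_samples(rates, 0.10))   # 10% threshold keeps sample_a and sample_b
```

The two thresholds retain different sample sets, which is why the discrepancy between the stated 10% and the `--mind 0.01` command matters.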

Reviewer #3: Review of “Eurasian back-migrations into Northeast Africa was a complex and multifaceted process”

In “Eurasian back-migrations into Northeast Africa was a complex and multifaceted process” Rickard Hammaren and colleagues assembled and analysed a genome-wide dataset composed of XXX individuals from several populations focusing on the recent demographic dynamics of the North-Eastern side of the African continent.

In doing so, they apply different population structure and local ancestry analyses. The main claim the authors make is that “the distribution of the Eurasian-like ancestry in the Eastern an North-Eastern African populations is mostly an effect of more recent migrations rather than ancient events related to the advent of pastoralism in the region at large”.

In my opinion this is an interesting but at the same time controversial statement, which requires a careful validation analysis. A series of different local ancestry analyses and admixture dating studies have repeatedly reported a signal of Eurasian-like ancestry entering into the continent at least 3,000 years ago. Although I appreciate the fact that the complexity of the many layers intersecting since at least 15,000 years ago may have introduced many biases in previous research, I can’t see how the analysis conducted in this paper could not be affected by similar issues. Moreover, I have previously attempted to use MOSAIC on simulated data, and in our specific setting, MOSAIC was not able to infer the true ancestral fragments, causing a bias in the admixture dating. I did not conduct any further analysis and therefore I do not have any evidence MOSAIC does something wrong, but I would definitely use some other local ancestry inference methods and admixture dating approaches to check whether the results hold.

Minor considerations:

Pag: I would clearly state the number of individuals and populations in the final dataset

Pag 3: it is not clear if you removed individuals with missingness exceeding 1% or 10%, as the description and the command line are discordant.

Pag 3: I think that the UMAP analysis should be described in a more detailed way, and possibly evaluating the outcome of multiple iterations.

Pag 4: Maybe F3 could be written as f3 ?

Pag 5: Given that this article might be of interest to many scholars, I would specify what TSI and RHGs mean.

Pag 6: The last paragraph explaining the approach for the best R2 and f3 was, in my opinion, hard to read; I would consider reformulating it.

Pag 7: I think that the Discussions are too long, containing many paragraphs that would probably be a better fit into the results section.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Gyaneshwer Chaubey

9 Aug 2023

Eurasian back-migration into Northeast Africa was a complex and multifaceted process

PONE-D-22-25736R1

Dear Dr. Carina M Schlebusch,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Please note that one reviewer rightly raised the availability of data, which I trust the authors will address.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Gyaneshwer Chaubey

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I appreciate the efforts of the authors to reply to the questions raised in the revision. The paper has improved its readability and I do not have any major questions.

Just a small possible error:

- In line 97: “publications in the following fashion, 1) directly from the text or s 2) if no coordinates”. Is there something missing in the “s” after “or”?

And a small question:

- From the f3 analysis, there are two patterns 1) some populations do not show affinity towards Europeans, 2) other populations do show similar affinity to Lebanese/Qatar/Yemen or European.

Is this correlated with the observation in the ADMIXTURE analyses?

I wonder if in case 2) admixture might come from two different Eurasian sources. Do you think it is the case, or it is not supported by your MOSAIC results?

Reviewer #2: I thank the authors for their response to my comments. I am mostly satisfied, especially as my comments were minor, but their response to point 5 does not meet the PLOS Data policy, which states that all data should be made available, either as supplements or through deposition in a public repository (such as Figshare etc). I agree with their decision not to add 70 supplementary figures to the manuscript file, but then the figures, which are deemed worth mentioning in the text, should be publicly deposited elsewhere, or any mention of them and of the relevant ancestry tracts should be omitted from the manuscript.

Reviewer #3: The authors replied to my main concerns, and I do not have any further consideration.

I would recommend the revised version for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

Acceptance letter

Gyaneshwer Chaubey

18 Aug 2023

PONE-D-22-25736R1

Eurasian back-migration into Northeast Africa was a complex and multifaceted process

Dear Dr. Schlebusch:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Gyaneshwer Chaubey

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Top two Eurasian source populations identified by their haplotype fit to the genomes (R2 value from MOSAIC) for each target population.

    (PDF)

    S2 Table. f3 outgroup result grouped by language family of the target populations, top 5 hits shown.

    The f3 outgroup was calculated for the most Eurasian-like ancestry for each target population in the following manner: Target | Source | Ju|’hoansi.

    (PDF)

    S3 Table. R2 values of linear regression between admixture date and distance from Tel Aviv in the top row by each linguistic group.

    The Best by f3 dataset is in the leftmost columns while the Best by R2 dataset is on the right. Some linguistic groups have only one target population and therefore no value, whilst Saharan has two populations, which yields an R2 of 1.

    (PDF)
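The R2 of 1 noted for the two-population Saharan group in S3 Table follows from the arithmetic: a straight line fitted to two points passes through both. The sketch below illustrates this with a haversine great-circle distance and a plain least-squares R2. It is not the authors' analysis code. The function names, the approximate city coordinates, and the example values are all assumptions made for illustration only.

```python
import math

# Approximate coordinates in decimal degrees (assumed for illustration).
TEL_AVIV = (32.09, 34.78)
SANAA = (15.37, 44.19)


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def r_squared(x, y):
    """R2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy ** 2 / (sxx * syy)


# Hypothetical distances (km) and admixture dates (generations) for two populations.
distances_km = [1200.0, 2600.0]
dates_generations = [35.0, 80.0]
two_point_r2 = r_squared(distances_km, dates_generations)  # equals 1 up to rounding
```

With only one population the regression is undefined, which matches the missing values in the table. An R2 from two populations carries no information about how well distance predicts admixture date.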

    S4 Table. Two way ANOVA statistics for the two different models.

    Comparisons are between the admixture date and the factors Country, Linguistic group and Larger linguistic family (Meta.Lang.Group). Asterisks indicate significant values. In A, the admixture date is the one obtained from the best source as determined by f3 outgroup; in B, it is the date of admixture for the source with the highest R2.

    (PDF)

    S5 Table. Missingness by population for the Ancestry deconvolution.

    Min and max are the lowest and highest individual missingness for that population. Since non-Eurasian regions of the targets' genomes were set to missing, this measure is the inverse of the amount of Eurasian ancestry inferred for each individual and population in the best by f3 dataset.

    (PDF)

    S6 Table. Overview of populations used in the study and their linguistic information.

    (XLSX)

    S1 Fig. Location for populations included in this study.

    Colours indicate linguistic groups. Made with Natural Earth.

    (PDF)

    S2 Fig. PONG visualization of 15 K’s of unsupervised ADMIXTURE analysis.

    50 iterations for the full dataset. The best identified K through cross-validation was K = 13.

    (PDF)

    S3 Fig. Average cross-validation (CV) error for the 50 repetitions.

    The K with the lowest CV error was K = 13, indicated by the horizontal line.

    (PDF)

    S4 Fig. North-East African target populations used in the study.

    Labels are by country and colouring is by linguistic family. The label African_Semitic was used only to make it easier to distinguish between the investigated (target) populations and the Middle Eastern Semitic populations. Made with Natural Earth.

    (PDF)

    S5 Fig. Principal component analysis.

    Each value in the PCA plots is the projection of the data on the eigenvectors, scaled by the eigenvalues. Values within parentheses are the PC loadings. Populations are coloured by linguistic group.

    (PDF)

    S6 Fig. Principal component analysis.

    (PDF)

    S7 Fig. Principal component analysis.

    (PDF)

    S8 Fig. Principal component analysis.

    (PDF)

    S9 Fig. Principal component analysis.

    (PDF)

    S10 Fig. Principal component analysis.

    (PDF)

    S11 Fig. Principal component analysis.

    (PDF)

    S12 Fig. Principal component analysis.

    (PDF)

    S13 Fig. Principal component analysis.

    (PDF)

    S14 Fig. Principal component analysis.

    (PDF)

    S15 Fig. Principal component analysis.

    (PDF)

    S16 Fig. Principal component analysis.

    (PDF)

    S17 Fig. Principal component analysis.

    (PDF)

    S18 Fig. Principal component analysis.

    (PDF)

    S19 Fig. Principal component analysis.

    (PDF)

    S20 Fig. Uniform Manifold Approximation and Projection for Dimension Reduction on the full dataset.

    Colours are the same as in S1 Fig.

    (PDF)

    S21 Fig. Linear regression between the admixture date of the Eurasian ancestry and the great circle distance in kilometers from Tel Aviv and Sanaa.

    The blue line is the fitted linear regression line and the grey area represents the 95% confidence interval of the standard error. A) Distance from Tel Aviv for the best by f3 dataset. B) Distance from Tel Aviv for the best by R2 dataset. C) Distance from Sanaa for the best by f3 dataset. D) Distance from Sanaa for the best by R2 dataset.

    (PDF)

    S22 Fig. f4 test comparing Lebanese to Yemenite ancestry for each of the target populations.

    (PDF)

    S23 Fig. MALDER vs MOSAIC dates.

    For the best by f3 dataset, using the same source populations as in the corresponding MOSAIC analysis. Only populations for which MALDER estimated a single event are shown. The populations for which MALDER inferred two admixture events were: Egypt_Egyptian 40 and 6 generations ago, Sudan_Halfawieen 87 and 7 generations ago, and Sudan_Mahas 94 and 12 generations ago. The blue line is the fitted linear regression line and the grey area represents the 95% confidence interval of the standard error.

    (PDF)

    S24 Fig. MALDER vs MOSAIC dates.

    For the best by R2 dataset, using the same source populations as in the corresponding MOSAIC analysis. Only populations for which MALDER estimated a single event are shown. The populations for which MALDER inferred two admixture events were: Egypt_Egyptian 39 and 8 generations ago, Kenya_Turkana 8 and 164 generations ago, Sudan_Halfawieen 77 and 6 generations ago, and Sudan_Mahas 81 and 12 generations ago. The blue line is the fitted linear regression line and the grey area represents the 95% confidence interval of the standard error.

    (PDF)

    Attachment

    Submitted filename: NEA revisions 28 Febr.pdf

    Data Availability Statement

    All data used in the study are from published studies. For some datasets, permission from the original authors was obtained through data access agreements.


    Articles from PLOS ONE are provided here courtesy of PLOS
