PLOS ONE. 2020 Jul 16;15(7):e0233808. doi: 10.1371/journal.pone.0233808

Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina

Pierre Luisi 1,*, Angelina García 1,2,3, Juan Manuel Berros 4, Josefina M B Motti 5, Darío A Demarchi 1,2,3, Emma Alfaro 6,7, Eliana Aquilano 8, Carina Argüelles 9,10, Sergio Avena 11,12, Graciela Bailliet 8, Julieta Beltramo 8,13, Claudio M Bravi 8, Mariela Cuello 8, Cristina Dejean 11,12, José Edgardo Dipierri 7, Laura S Jurado Medina 8, José Luis Lanata 14, Marina Muzzio 8, María Laura Parolin 15, Maia Pauro 1,2,3, Paula B Paz Sepúlveda 8, Daniela Rodríguez Golpe 8, María Rita Santos 8, Marisol Schwab 8, Natalia Silvero 8, Jeremias Zubrzycki 16, Virginia Ramallo 17, Hernán Dopazo 18,19,*
Editor: Francesc Calafell 20
PMCID: PMC7365470  PMID: 32673320

Abstract

Similarly to other populations across the Americas, Argentinean populations trace their genetic ancestry back to African, European and Native American ancestors, reflecting a complex demographic history with multiple migration and admixture events in pre- and post-colonial times. However, little is known about the sub-continental origins of these three main ancestries. We present new high-throughput genotyping data for 87 admixed individuals across Argentina. These data were combined with previously published data for admixed individuals in the region and then compared to different reference panels specifically built to perform population structure analyses at a sub-continental level. Concerning the Native American ancestry, we could identify four Native American components segregating in modern Argentinean populations. Three of them are also found in modern South American populations and are specifically represented in the Central Andes, Central Chile/Patagonia, and Subtropical and Tropical Forests geographic areas. The fourth component might be specific to the Central Western region of Argentina, and it is not well represented in any genomic data from the literature. As for the European and African ancestries, we confirmed previous results about origins from Southern Europe, Western and Central Western Africa, and we provide evidence for the presence of Northern European and Eastern African ancestries.

Introduction

The first systematic investigations of human genetic variation in Argentina focused on a limited number of markers, either uniparental (mtDNA, Y-STRs, Y-SNPs [1–10]) or autosomal (Short Tandem Repeats, Ancestry Informative Markers, Alu sequences, indels, and blood groups [11–16]). Studies based on autosomal markers identified substantial inter-individual heterogeneity in the African, Native American and European genetic ancestry proportions [11–16]. Accordingly, most of the studies based on uniparental markers showed large differences in the genetic composition of Argentinean populations, accounting for the different demographic histories within the country [1,2,17–20,39,16]. Although the idea of a 'white' country in which most inhabitants descend from European immigrants has now been rejected by these studies, the Argentine founding myth of a white and European nation remains perceptible today [21].

There is a wealth of information about the European ancestors, both from familial stories and from historical records. In contrast, little is known about the African and Native American populations that contributed to the admixture events between the continental components. The great wave of European immigration occurred between the mid-19th and mid-20th centuries. Although immigrants came from all over the continent, historical records attest that immigration waves from Southern Europe (mainly Italy and Spain) were predominant [22].

As for the African genetic origins in Argentina, historical records about the arrival of African slaves to Río de la Plata show that Luanda, in present-day Angola in the Central Western region of Africa, was the main departure harbor, followed by ports located in the Gulf of Guinea and on the coast of present-day Senegal, Gambia and Sierra Leone. The Indian Ocean coast, in present-day Mozambique, was also an important departure location [23–25]. Although the departure harbor is a poor proxy for the slaves' actual origins [25,26], the African uniparental lineages observed in Argentina are consistent with the historical records [27,28].

As for the Native American component, it is difficult to study its origin focusing on present-day communities since their organization has changed drastically after the arrival of the first conquerors in the 16th century [21]. During the period of conquest and colonization, wars, diseases and forced labor decimated the Native populations. The system of colonial exploitation also often meant the relocation of individuals, families, and communities [29]. Then, the expansion of the nation-state by the late 19th Century can be described as a territorial annexation process and subjugation of the indigenous peoples perpetrated by the Argentinean national armed forces between 1876 and 1917 [30].

Due to the specificity of the Argentinean demographic history, a remaining challenge is to unravel which populations from each continent contributed to the gene pool of present-day Argentinean populations, leveraging genotype data for hundreds of thousands of autosomal markers across the whole genome. Recently, two articles presented high-throughput genotyping data for modern Argentine individuals and provided the first insights into these sub-continental contributions [31,32]. Both studies found that the European ancestry in Argentina is mainly explained by Italian and Iberian ancestry components. Homburger et al. observed a strong gradient in Native American ancestry of South American Latinos between Andean and other South American Native American populations [31]. In addition, Muzzio et al. found that African ancestry is explained by a Central Western (Bantu-influenced) component and a Western component [32].

Despite this progress, efforts to understand the demographic history that shaped genetic diversity in Argentina have been scarce, particularly for the African and Native American components. Studies of a wider region, namely South America, can provide important insights. A previous study showed that Western and Central Western African ancestries are common across the Americas, particularly in Northern latitudes, while the influence of South/Eastern African ancestry is greater in South America [33]. In another study in Brazil, two African ancestries have been observed: a Western African one and another associated with Central East African and Bantu populations, the latter being more present in the Southeastern and Southern regions [34]. Ancient DNA analyses of the peopling of the Americas suggest that Native American populations of South America descend from two streams from Northern America: one mainly present in the Andes and another one present elsewhere [35]. These streams replaced the first people who settled in the region, whose ancestry was related to the Clovis culture [35]. Genetic continuity for the Native American component appears to have prevailed in the region ever since these replacements [35–37]. However, little is known about the legacy in modern populations of the different ancestries associated with the several waves of population arrivals in South America. Modern Native American populations in the Southern Cone of South America seem to be divided between a component that includes populations in Tropical and Subtropical Forests and another component that includes Andean, Central Chile and Patagonian populations [38]. A recent genomic study of ancient and modern populations from Central Chile and Western Patagonia further identified that they are differentiated from the Andean and Subtropical and Tropical Forests populations [37]. In another genomic study of modern samples (in which the Southern Cone is only represented by the Gran Chaco region), it has been found that all non-Andean South American populations are likely to share a common lineage, while they are unlikely to share with the Andeans any common ancestor from Central America, supporting the hypothesis of many back migrations to Central America from non-Andean South American populations [39]. It has also been described that the genetic interactions between the peopling routes on both sides of the Andes were limited [39,40].

Here, we carry out a fine-scale population genomic study to gain insight into the genetic structure and the complex origins of the African, European and Native American ancestries in Argentinean populations. Using the Affymetrix Axiom-Lat1 array technology, we genotyped 87 admixed individuals nationwide for more than 800,000 SNPs covering coding and non-coding regions of the genome. Additionally, a dataset with ~500 individuals from modern and ancient populations throughout South America was constructed. We also pulled together genotype data for Sub-Saharan African and European populations from the literature. The new data generated in the present study, compared to those data sets, allow us to broaden the knowledge of the sub-continental origins of the three main genetic ancestries of Argentinean populations.

Results and discussion

Studied populations

High-throughput genotyping data was generated for 87 admixed individuals throughout Argentina (Fig 1 and S1 Table). For clarity in the visualization of the results, we used provinces and regions to classify the admixed individuals analyzed as shown in Fig 1. This classification has not been used for any statistical analyses. The generated data was compared to different data sets to understand the origins of the genetic diversity in the country. These data sets are called DS<n> and are described in S2 Table.

Fig 1. Sample locations from the present study.

The samples are divided according to regions for visualization only. N: The number of samples that passed genotyping Quality Controls.

Ancestry in a worldwide context

In order to characterize the genetic diversity observed in Argentina within a worldwide genetic context, Principal Component Analysis (PCA [41]; S1 Fig) was applied to the dataset DS1, which contains genotype data for the samples from Argentina generated in the present study, as well as other admixed and Native American individuals from South America [31,37,38,42] and individuals from Europe and Africa from the 1000 Genomes Project (1KGP; [42]). The PCA shows that Argentinean individuals have different proportions of European, Native American and, much less represented, African ancestry (S1A Fig). This pattern, which has already been documented in other admixed populations from South America [31,42], including in Argentina [13–16,32], supports the heterogeneous genetic origins throughout the country. In addition, PC4 discriminates between Southern and Northern European individuals (S1B Fig), while PC6 separates the Luhya population, a Bantu-speaking population in Kenya, from Western African populations (S1C Fig). The Argentinean samples do not exhibit any gradient along these two PCs. However, PC3 and PC5 discriminate three different main Native American ancestries that are also represented in the Argentinean samples.

We ran unsupervised clustering models with Admixture software [43] on DS1, using from 3 to 12 putative ancestral populations. Comparing the cross-validation scores obtained for each run, we estimated that the genotype data analyzed was best explained by a model with K = 8 ancestral populations (S2A Fig). At K = 3 (S2B Fig and S3 Table), the algorithm allows estimating the proportions of European, Native American and African ancestry. For K = 4 to K = 7 (S2C–S2F Fig), the model detects sub-continental ancestries. At K = 8 (Fig 2; S3 Table), the European ancestry is divided into Northern and Southern components (dark and light blue, respectively), while African ancestry is composed of Westernmost African (green), Gulf of Guinea (light green) and Bantu-influenced (dark green) components. Moreover, three components are observed for Native American ancestry: one in Central Chile/Patagonia populations (in orange; hereafter referred to as CCP), another in Central Andes populations (in yellow; hereafter referred to as CAN) and a third one in populations from Tropical and Subtropical Forests (in pink; hereafter referred to as STF).
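The model selection step described above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: it assumes each Admixture run was launched with cross-validation enabled and that its log contains a line of the form "CV error (K=8): 0.41102". The helper names parse_cv_errors and best_k, and the numbers in the example log, are hypothetical.

```python
import re

# Assumed log line format: "CV error (K=8): 0.41102"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cross-validation error} for every CV line found in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the smallest cross-validation error."""
    if not cv_errors:
        raise ValueError("no cross-validation errors found")
    return min(cv_errors, key=cv_errors.get)


# Hypothetical log excerpt; the values are made up for illustration only.
example_log = """\
CV error (K=7): 0.41250
CV error (K=8): 0.41102
CV error (K=9): 0.41180
"""

cv = parse_cv_errors(example_log)
print(best_k(cv))
```

In practice the logs of all runs (here K = 3 to 12) would be concatenated before parsing, and the full curve inspected rather than only its minimum.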

Fig 2. Admixture analyses in a worldwide context.

Admixture for K = 8. 1st row: 1000 Genomes Samples from Europe, Africa and South America [42]. 2nd row: Modern Native American samples grouped following [38]. 3rd row: Chilean, Peruvian and Argentinean admixed samples from [31]. 4th row: Argentinean admixed samples from the present study. Samples are grouped according to regions (Fig 1); CYA: Cuyo Region in Argentina; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region. CLM: Colombians from Medellin; PEL: Peruvians from Lima.

We confirmed that the estimates of the continental ancestry proportion obtained from a model with K = 3 and K = 8 are highly consistent (S3 Fig). Moreover, the eigenvectors from PCA and the ancestry proportion estimates with Admixture are well correlated (S4 Fig).

From the Admixture results for K = 8 (Fig 2), we observed that the European ancestry of Argentinean samples is divided into Southern and Northern components, the former being the most abundant. The low proportion of African ancestry in Argentinean samples makes it difficult to interpret its sub-continental origins from analyses within a global context. Surprisingly, all three components of Native American ancestry are present in most Argentinean samples (Fig 2). They exhibit intermediate proportions of the different Native American ancestries, suggesting either a mixture between these three ancestry components or an underrepresentation of Native American diversity in the reference panels currently available. Such a mixture pattern is not observed in other South American countries. Indeed, the Native American ancestry of Peruvian, Chilean and Colombian admixed samples is mainly represented by CAN, CCP and STF, respectively. This is consistent with the geographical area where the admixed individuals have been sampled, and with the genetic ancestry of the indigenous communities from each country.

Sub-continental ancestry components in Argentina

To decipher the sub-continental origins of the European, African and Native American ancestries, we first estimated local ancestry patterns in phased DS2 and DS3 (DS2p and DS3p) separately. Across phased chromosomes, we assigned whether a genomic region was of Native American, European or African ancestry using RFMix v2 [44]. We applied principal component and unsupervised clustering analyses on the masked data (S5 Fig) compiled with reference data sets describing the genetic diversity within each continent (S2 Table).
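As a rough illustration of the masking step, the sketch below keeps, on one phased haplotype, only the alleles lying in segments assigned to a target ancestry and sets the others to missing. It is a simplification under stated assumptions: local ancestry calls are taken to be already expanded to one label per SNP, and the labels ("EUR", "NAM", "AFR"), the function name mask_haplotype and the toy data are hypothetical rather than taken from the study.

```python
MISSING = None  # marker used here for a masked (missing) allele


def mask_haplotype(alleles, local_ancestry, keep):
    """Keep alleles whose local-ancestry label equals `keep`; mask all others."""
    if len(alleles) != len(local_ancestry):
        raise ValueError("alleles and local_ancestry must have the same length")
    return [a if anc == keep else MISSING
            for a, anc in zip(alleles, local_ancestry)]


# Toy haplotype with six SNPs and made-up local ancestry calls.
alleles = [0, 1, 1, 0, 1, 0]
ancestry = ["EUR", "EUR", "NAM", "NAM", "NAM", "AFR"]

masked = mask_haplotype(alleles, ancestry, keep="NAM")
print(masked)  # [None, None, 1, 0, 1, None]
```

Applying the same function with keep="EUR" or keep="AFR" would give the inputs for the European-specific and African-specific analyses, respectively.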

The genetic legacy of European migration in Argentina

We used DS4, a combination of the masked genotype data for admixed individuals with a set of European individuals carefully selected to be representative of the genetic diversity in their sampling area [45,46].

From the PCA, we observed that most Argentinean individuals cluster with Iberians and Italians (Fig 3), as previously described [31,32]. However, some individuals cluster with Central and Northern Europeans.

Fig 3. European ancestry specific principal component analysis.

(A) Multidimensional scaling (MDS) scatterplot from Euclidean distances calculated from the first five weighted European-specific Principal Components, computed using the European reference samples and admixed samples masked for African ancestry. Individuals from the European reference panel are colored according to main European geographic regions as shown in (B), while South American individuals are represented as shown in the legend. The elbow method used to choose the number of PCs for the MDS is shown in S6 Fig.
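The distance matrix that feeds such an MDS can be sketched as follows. The weighting scheme is not detailed in this section, so the sketch simply assumes one non-negative weight per principal component (for example, a value proportional to the variance it explains); the function name weighted_pc_distances and the toy coordinates are hypothetical.

```python
import math


def weighted_pc_distances(pcs, weights):
    """Pairwise Euclidean distances after scaling each PC by its weight.

    pcs: one row of PC coordinates per individual.
    weights: one weight per PC (assumed scheme, see the text above).
    """
    if any(len(row) != len(weights) for row in pcs):
        raise ValueError("each row must have one coordinate per weight")
    scaled = [[x * w for x, w in zip(row, weights)] for row in pcs]
    n = len(scaled)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(scaled[i], scaled[j])
    return dist


# Toy example with two individuals and two PCs.
toy_pcs = [[0.0, 0.0], [3.0, 4.0]]
print(weighted_pc_distances(toy_pcs, [1.0, 1.0])[0][1])  # 5.0
```

The resulting symmetric matrix is what an MDS routine would take as input to produce the two-dimensional scatterplot.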

We estimated that the genotype data analyzed was best explained by an admixture model with K = 2 (S7A Fig). Although it is rather difficult to assign a clear label to each of these two ancestral populations, it seems that the algorithm discriminates between Central/Northern (dark blue) and Southern European (light blue) components (S7B Fig). For K = 3, we still observe a Central/Northern European component (dark blue), while the South can be divided into a component particularly represented in Iberian individuals (turquoise) and another component more frequent in Southeastern Europeans and Italians (light blue). The Iberian component is highly represented in South American samples (S7C Fig), most likely reflecting the legacy of the first Iberian migrations to South America during the colonial period [11]. Samples from Argentina and Chile exhibit higher proportions of Southeastern/Italian and Northern European ancestries than Colombians, as well as lower Iberian ancestry proportions (S8 Fig). We observed no significant difference in the proportions of any European ancestry between Argentinean and Chilean samples (S8 Fig). However, both PCA and Admixture show that the individuals with the most Southeastern/Italian ancestry are from Argentina. This is consistent with a previous study [31], and it can be explained by the many arrivals from Italy during the great wave of European immigration in the 19th and 20th centuries [22].

Moreover, the Argentinean individuals with a higher proportion of Central/Northern European ancestry are from Misiones province (NEA), consistent with the historical record of settlement of Polish, German, Danish and Swedish colonies promoted by governmental or private enterprises in the province [47].

Different African ancestry components in Argentina

To investigate the sub-continental components that explain African ancestry in Argentina, we used DS5, which combines masked genotype data from admixed individuals with a published data set of African individuals [42,48–50] (S9A Fig). The African reference populations used here can be divided into five main groups: Bantu-influenced, Hunter-Gatherers, Western African, San and Eastern African populations [48].

From the PCA applied to this data, we observed that African ancestry in Argentina has little genetic affinity with San and Hunter-Gatherer populations. Indeed, according to PC1 and PC2, Argentinean individuals are located within a group of African individuals belonging to Bantu-influenced, Western African and Eastern African populations, and are clearly distinct from San and Hunter-Gatherer populations (S9B Fig). Moreover, on PC3, most Argentinean individuals are closer to Western African and Bantu-influenced populations, while several Argentinean individuals, particularly from Northern Argentina, are located within a gradient towards Eastern African populations (S9D Fig). According to PC4, most of the Argentinean individuals are closer to the Bantu-influenced populations, while other individuals are closer to Western African populations, and several others are found in a cline between these two groups (S9C Fig). These patterns were confirmed by applying the Admixture algorithm. The cross-validation procedure points to a best fit of the data with K = 5 (S10A Fig), a model for which we observe that the Bantu-influenced and Western African components are the most represented in Argentinean individuals. Moreover, this analysis also showed that some individuals exhibit smaller, yet substantial, proportions of Eastern African ancestry, particularly in Northern Argentina (Fig 4). Although the high missing genotype rate in masked data for admixed individuals could bias PCA and Admixture results, the results obtained by both methods are highly consistent for admixed individuals (S11 Fig).

Fig 4. African ancestry-specific admixture analysis.

Admixture for K = 5. 1st row: 1,685 reference samples with >99% of African ancestry. Populations are grouped following [48]. 2nd and 3rd rows: Peruvian and Argentinean admixed samples from [31] and from the present study. Samples are grouped according to region. CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region. Genotype data for admixed samples were masked for African ancestry.

The legacy of Western Africa on the African genetic diversity in the Americas has been preeminent [51–53], along with an impact of Bantu-influenced populations from Central Western Africa, particularly in Brazil and the Caribbean [51,52]. These two African ancestries have also been previously documented in Argentina from studies of autosomal [32] and maternal markers [27,28]. Maternal lineages specific to populations from Central Western Africa, particularly from Angola, are the most common African lineages in Argentina according to studies in the Central region [27] and in four urban centers (Puerto Madryn, Rosario, Resistencia and Salta) [28]. These results are concordant with the predominance of the Bantu-influenced origin identified in the present study. In addition, the presence of Southeastern African maternal lineages in Argentina [27,28] is consistent with the African ancestry of this origin identified in previous studies in other South American countries [33,34], and with the Eastern African ancestry identified here.

An unrecognized genetic Native American component in Argentina

We compared masked genotype data for admixed individuals with masked genotype data of modern Native American reference populations from South America [37,38] and a set of ancient DNA data from the region [35–37,54,55] (DS6; Fig 5A). The PCA (Fig 5B) and Admixture analyses (best model with K = 3) confirmed the three main South Native American ancestry components described by de la Fuente et al. (2018), and identified in our analyses at the global level.

Fig 5. Native American ancestry-specific principal component analysis.

(A) Localization map of the studied samples. Color and point coding matches sample groups: ancient populations are grouped following [35], modern Native American populations are grouped following [38], while Argentinean admixed sample locations are grouped according to Regions (Fig 1). For clarity, some geographic coordinates have been slightly changed. (B) Multidimensional scaling scatterplot from Euclidean distances calculated from weighted Native American ancestry-specific Principal Components. The elbow method used to choose the number of PCs for the MDS is shown in S12 Fig.

Similar results were also observed when applying the Admixture algorithm (best model with K = 3; S13A Fig). The proportion estimates for the three main Native American ancestries observed for the South American Native American individuals and for the ancient samples (Fig 6) correlate highly with estimates from unmasked data (S14 Fig), and are consistent with the CAN, CCP and STF labels that we attributed. In Argentina, the three Native American ancestries are observed in almost all the admixed individuals studied. In Northeastern Argentina (NEA), STF ancestry is the most frequent. In Northwestern Argentina (NWA), CAN ancestry does not clearly dominate, since STF is also observed in substantial proportions. In the South, CCP is observed in greater proportions.

Fig 6. Native American ancestry-specific admixture analysis.

Admixture for K = 3. 1st row: Ancient samples grouped following [35]. 2nd row: Modern Native American samples grouped following [38]. 3rd row: Chilean, Peruvian and Argentinean admixed samples from [31]. 4th row: Argentinean admixed samples from the present study; sample locations are grouped according to Regions (Fig 1); CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region. Genotype data for all modern samples were masked for Native American ancestry.

Although migration events tend to reduce the genetic distance among these components, we still identified correlations between geographic coordinates and the proportions of each Native American ancestry component. Indeed, CAN is found in higher frequencies further North (S15A Fig) and West (S15A Fig), STF is higher further North (S15B Fig) and East (S15E Fig), while the proportion of CCP is higher further South (S15C Fig).
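The kind of relationship reported above can be illustrated with a plain Pearson correlation between an ancestry proportion and a geographic coordinate. The section does not state which correlation statistic was used, so this choice is an assumption, and the latitudes and CCP proportions below are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up sampling latitudes (negative = South) and CCP proportions.
latitude = [-25.0, -30.0, -35.0, -40.0]
ccp_proportion = [0.1, 0.2, 0.3, 0.4]

# A negative r means the proportion increases towards the South.
r = pearson_r(latitude, ccp_proportion)
print(round(r, 6))
```

With southern latitudes coded as negative numbers, a component that is more frequent further South yields a negative coefficient, and one that is more frequent further North a positive one.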

Many individuals from the Cuyo and Pampean regions of Argentina (San Juan and Córdoba provinces as well as the South of Buenos Aires province) exhibit an intermediate position in the PCA (Fig 5) and intermediate proportion estimates with Admixture (Fig 6). This pattern can be interpreted as the result of a mixture between different ancestries (scenario 1) or of a relatively limited shared history with any of them (scenario 2). To discriminate between both scenarios, we increased the number of ancestral populations to 4 and 5. The model with K = 4 (S13C Fig) does not help to address this question, and the model with K = 5 indicates that scenario 2 is the most likely for Middle Holocene samples from the Southern Cone (S13D Fig). Models including more ancestral populations (K>5) did not allow robust ancestry proportion estimates, most likely due to the masked data leveraged here. Many regions of Argentina are underrepresented in the reference panel because of the scarcity of Native American communities in most of the Argentinean territory. Altogether, PCA and admixture analyses are not sufficient to unequivocally test whether some Native American genetic diversity specific to Argentina is not represented in the reference panel.

In order to circumvent these limitations, we used an objective quantitative approach based on K-means clustering (S16 Fig). We assigned the 452 modern South American individuals (the 53 ancient samples included in DS6 were not analyzed here) to a Native American ancestry cluster. We assigned 163, 161 and 70 individuals to the clusters representing CAN, STF and CCP, respectively, and 32 individuals were assigned to a fourth cluster (S4 Table). The remaining 26 individuals were removed from further analyses because their group assignment was not consistent across the three clustering approaches. We acknowledge that these groups are culturally, ethnically and linguistically heterogeneous. However, we argue that analyzing such groups built from genetic similarities may provide interesting insights into the evolutionary mechanisms that shaped the Native American ancestry in South America.
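The consistency filter described above can be sketched as follows. The three clustering approaches themselves are not detailed in this section, so the sketch only covers the final step and assumes that cluster labels have already been harmonized across approaches; the function name consensus_assignment and the toy individuals are hypothetical.

```python
def consensus_assignment(labels_by_individual):
    """Split individuals into consistently assigned and inconsistent ones.

    labels_by_individual maps an individual ID to the list of cluster labels
    it received, one label per clustering approach.
    Returns (assigned, dropped): a dict ID -> label and a list of dropped IDs.
    """
    assigned, dropped = {}, []
    for ind, labels in labels_by_individual.items():
        if len(set(labels)) == 1:
            assigned[ind] = labels[0]
        else:
            dropped.append(ind)
    return assigned, dropped


# Toy input: three individuals, three approaches each (made-up labels).
toy_labels = {
    "ind1": ["CAN", "CAN", "CAN"],
    "ind2": ["CCP", "CWA", "CCP"],
    "ind3": ["STF", "STF", "STF"],
}

assigned, dropped = consensus_assignment(toy_labels)
print(assigned)  # {'ind1': 'CAN', 'ind3': 'STF'}
print(dropped)   # ['ind2']
```

As a consistency check on the counts reported above, 163 + 161 + 70 + 32 assigned individuals plus the 26 removed ones add up to the 452 modern individuals analyzed.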

Most individuals from Calingasta (located in the Northwest Monte and Thistle of the Prepuna ecoregion; San Juan Province) and from Santiago de Chile were assigned to the fourth group. The genealogical record for the Calingasta individuals attests to a local origin of their direct ancestors up to two generations ago, and they have mtDNA sub-haplogroups predominant in the Cuyo region (S1 Table; [56,57]). On the other hand, all the Huilliche and Pehuenche individuals from Central Chile [37] have been consistently assigned to CCP. Altogether, we decided to refer to this fourth group hereafter as Central Western Argentina (CWA).

Relationship among the four identified Native American components

Significant positive f3 scores to test for treeness were obtained for the six possible pairwise comparisons among the four Native American clusters identified (Fig 7A and S5A Table). In addition to confirming the differentiation among the three components described in our reference panel (CCP, CAN and STF), this analysis showed that the fourth group (CWA) most likely represents a Native American component never described in any previous study based on autosomal genetic markers. In fact, the genetic differentiation (measured with the FST index) between CWA and any other cluster is not lower than for other comparisons (Fig 7B). The lowest FST was observed between STF and CAN, probably because STF encompasses the Northern Andes region (Fig 7B). Moreover, the distribution of 1 – f3(YRI; Ind1, Ind2) between pairs of individuals from different groups (S17 Fig) is an additional argument to discard a scenario of mixture (mentioned before as scenario 1).
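For readers unfamiliar with the statistic, the sketch below computes a naive f3 from per-SNP allele frequencies, using the standard definition f3(Target; S1, S2) = mean over SNPs of (t − s1)(t − s2). It is a simplified illustration: it omits the finite-sample bias correction and the block jackknife standard errors that dedicated software provides, and the function name and toy frequencies are hypothetical.

```python
def f3(target, s1, s2):
    """Naive f3(Target; S1, S2): mean over SNPs of (t - a) * (t - b).

    Each argument is a list of allele frequencies, one value per SNP.
    """
    if not target or not (len(target) == len(s1) == len(s2)):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - a) * (t - b) for t, a, b in zip(target, s1, s2)) / len(target)


# Toy case 1: the target sits between the two sources at every SNP,
# as expected under admixture, which gives a negative value.
admixed = f3([0.5, 0.5], [0.2, 0.8], [0.8, 0.2])

# Toy case 2: the target has drifted away from both sources in the same
# direction at every SNP, which gives a positive value.
tree_like = f3([0.9, 0.1], [0.5, 0.5], [0.5, 0.5])

print(admixed < 0, tree_like > 0)  # True True
```

The same function gives an outgroup f3 such as f3(YRI; X, Y) when the outgroup frequencies are passed as the first argument, in which case larger values indicate more drift shared by X and Y.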

Fig 7. Relationship among the four Native American groups identified.

(A) f3(Target; S1, S2) to test for treeness. (B) FST between pairs of Native American groups. (C) f4(YRI, Target; S1, S2) to test whether Target shares more ancestry with S1 or S2. Since switching S1 and S2 only changes the sign of f4, only positive comparisons are shown. (D) Neighbor-joining tree estimated from a distance matrix of the form 1/f3(YRI; X, Y). The tree was estimated using the ancient sample from the Upward Sun River site in Beringia (USR1) as outgroup. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia; CWA: Central Western Argentina; YRI: Yoruba from 1KGP. Vertical segments are the ±3 standard error intervals in A and B.

Furthermore, f4 analyses showed that (i) CAN has no particular genetic affinity with any component relative to the others; (ii) STF is closer to CAN as compared to CCP and CWA; and (iii) CWA and CCP exhibit higher genetic affinity between them than with CAN or STF (Fig 7C; S5B Table). However, a neighbor-joining analysis [58] from distances of the form 1/f3(YRI; X, Y) suggests that CAN is more closely related to CCP and CWA than to STF (Fig 7D; S5C Table).
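The f4 comparisons can be illustrated in the same spirit, using the standard definition f4(A, B; C, D) = mean over SNPs of (a − b)(c − d) on allele frequencies. This is a naive sketch without standard errors; the function name and the toy frequencies are hypothetical and do not come from the data analyzed here.

```python
def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    if not a or not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((pa - pb) * (pc - pd)
               for pa, pb, pc, pd in zip(a, b, c, d)) / len(a)


# Made-up allele frequencies at three SNPs.
yri = [0.1, 0.9, 0.5]
target = [0.6, 0.4, 0.2]
s1 = [0.7, 0.3, 0.1]
s2 = [0.2, 0.8, 0.5]

forward = f4(yri, target, s1, s2)
swapped = f4(yri, target, s2, s1)

# Swapping S1 and S2 flips the sign, so one orientation is always >= 0.
print(abs(forward + swapped) < 1e-12)  # True
```

Under this definition, a positive f4(YRI, Target; S1, S2) means that Target departs from the outgroup in the same direction as S2 departs from S1, i.e. Target shares more drift with S2, and a negative value points to S1.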

Based on this evidence, we argue that CWA may represent a Native American ancestry that diverged from CCP and became established on the Eastern side of the Andes in the Cuyo region.

The existence of a specific differentiated component in the Central Western and Central regions of Argentina has been previously suggested from analyses of maternal lineages, based on the genetic relationship between these two regions [10] and the presence of specific clades underrepresented elsewhere [10,27,59]. These studies support the hypothesis of a common origin and/or important gene flow [57], and the authors referred to a meta-population with great temporal depth and differentiated from other regions in Argentina. Moreover, the ethnographic description of the populations that were settled at the moment of contact with the Spanish colonies points to a potential relationship between Huarpes in the Cuyo region and Comechingones in present-day Córdoba province ([60], cited by [61]). Altogether, these facts support the Central Western Argentina (CWA) label that we attributed to this fourth component.

Complementary historical facts may explain the representation of this component in Santiago de Chile and its absence in Huilliche and Pehuenche populations. First, in pre-Columbian times, an ethnic boundary seems to have existed around latitude 36° S, since Central Andean civilizations could not expand further South [62,63]. Later, the Spanish could not settle those territories either. Second, the Spanish colonists who settled in Santiago de Chile organized massive deportations of Huarpes individuals from the Cuyo region to compensate for the lack of indigenous workforce [64]. For example, in Santiago de Chile, in 1614, 37% of the indigenous people that lived in the suburbs were Huarpes according to the chronicler Vázquez de Espinosa [65].

The present study, in which we analyzed individuals that do not belong to any indigenous community, made possible the identification of a Native American component not previously reported from autosomal markers.

Genetic affinity of the four Native American ancestry components with ancient populations

We evaluated the genetic affinity of the four Native American ancestry components to ancient samples from the literature [35–37,54,55,66,67]. Graphical summaries of pairwise f3-based distances are presented in S18 Fig.

When comparing the genetic affinity of a given component with the different ancient groups using either the f3-outgroup or the f4 statistics (S19F Fig and S5D and S5E Table), we identified that CAN tends to exhibit greater genetic affinity with ancient Andean populations than with other ancient groups (S19A and S19E Fig). Strikingly, the genetic affinity of this component with both Late Andes and Early Andean ancient groups would point to a genetic continuity across the whole temporal transect that the archaeological samples provide. This would imply that the replacement of an early population arrival by a later stream of gene flow in the Central Andes identified previously [35] does not fully explain the current-day genetic diversity of CAN. We observed higher genetic affinity with ancient Southern Cone groups for both CCP and CWA (S19C, S19D, S19G and S19H Fig). As for STF, we observed intermediate genetic affinity with ancient samples from both the Andes and the Southern Cone (S19B and S19F Fig). The fact that no ancient sample group exhibits outstanding genetic affinity with STF points to the underrepresentation of this component in ancient samples. First, the geographical range covered by ancient samples that could represent this component is restricted to Brazil, while STF is a heterogeneous group that includes relatively isolated populations [39] from the Gran Chaco, the Amazonas and the Northern Andes. Moreover, the most recent samples that could represent STF are dated to ~6700 BP, and gene flow with other components since then may have contributed to diluting the genetic affinity of STF with the ancient samples from Brazil analyzed here.

Then, we evaluated the relationship between the time depth of the ancient samples from either the Andes or the Southern Cone and their genetic affinity to the modern components of Native American ancestry (Fig 8). We observed a statistically significant relationship between the age of the ancient Southern Cone samples and their genetic affinity with CCP and CWA. This means that the older the ancient sample from the Southern Cone, the lower the shared drift with CCP and CWA. On the other hand, no statistically significant relationship was identified for STF and CAN (P = 0.523 and P = 0.596, respectively; Fig 8A). These patterns could be due to a relationship between geography and the age of the ancient samples because the most recent samples are concentrated in the Southern tip of the subcontinent (Fig 5A). Moreover, the number of SNPs with genotype data tends to decrease with the age of the ancient samples due to DNA damage, potentially inducing a bias towards significant positive correlations. To correct for both putative confounding effects simultaneously, we repeated the analyses using a correction for the ancient sample age (the residuals of the linear regression between the age of the ancient samples and their geographic coordinates) and a correction for the genetic affinity estimates (the residuals of the linear regression between f3 and the number of SNPs used to estimate it). This correction intensified the relationship described for CCP and CWA (Fig 8C). It also allowed us to identify significant relationships for STF and CAN. On the other hand, CAN is the only modern Native American component that exhibits a significant relationship between its genetic affinity with ancient Andean samples and their age (Fig 8B). This pattern holds after correction for geography (Fig 8D). Repeating the same analyses using f4 statistics, we reached the same conclusions (S20 Fig).
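
The residual-based correction described above can be illustrated with a minimal sketch. It covers only the single-predictor case (f3 regressed on the number of SNPs); the age correction uses two geographic coordinates and would require a multiple regression, which is not shown. The function name `ols_residuals` and the toy values are hypothetical and are not taken from the study.

```python
def ols_residuals(x, y):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The residual is the part of y that the predictor x does not explain.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical toy values: number of SNPs per ancient sample and its f3 score.
n_snps = [90000, 70000, 50000, 30000]
f3_scores = [0.30, 0.28, 0.27, 0.24]
f3_corrected = ols_residuals(n_snps, f3_scores)
```

The corrected values would then replace the raw f3 scores in the regression against (corrected) sample age.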

Fig 8. Changes across time of the genetic affinity of the four Native American components with ancient samples.

Each point represents an f3-Outgroup score of the form f3(YRI; X, Ancient) vs the age of ancient samples, where X is one of the four identified modern Native American components, and Ancient is an ancient group. X is represented by the color of the square while the symbol inside the square represents Ancient. The legend for plotted symbols is shown in Fig 5. (A) f3 vs age of ancient samples from Southern Cone. (B) f3 vs age of ancient samples from the Andes. (C) f3 vs age of ancient samples from Southern Cone considering correction for both f3 and age. (D) f3 vs age of ancient samples from the Andes considering correction for both f3 and age. Linear regression slopes and the associated P-values are shown. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia; CWA: Central Western Argentina; YRI: Yoruba from 1KGP.

Early divergence among the four Native American components in Argentina

Using another setting of the f4 statistics (S5F Table; S21 Fig), we observed that all Southern Cone ancient samples − except Los Rieles ~10900 BP − are more similar to CCP than to CWA (S21F Fig), pointing to divergence time between those two components older than ~7700 BP. CWA is not closer to any Brazilian and Andean ancient sample as compared to CCP, reinforcing the idea that CWA is not a mix of CCP, CAN and STF related ancestries. Moreover, we observed that all the ancient groups of the Southern Cone have greater genetic affinity with CCP than with CAN and STF (S21B and S21D Fig). Ancient groups of the Southern Cone (with the exception of Los Rieles ~10900 BP) tend to exhibit higher genetic affinity with CWA than with STF (S21C Fig). In addition, the comparisons including STF and CWA also point to a closer genetic relation between ancient groups from the Southern Cone with CWA, although statistical significance is reached only for Late Holocene Patagonia samples.

All the ancient samples from the Andes, independently of their previous assignment to the Early or Late population stream [35], are genetically more similar to CAN than to any of the three other components (S21A, S21D and S21E Fig). These results support further the hypothesis that the ancestry associated with an early Andean population arrival remains present to some extent in modern Central Andean populations and has not been totally replaced by a later stream of gene flow. Moreover, the archaeological samples in Brazil have higher genetic affinity with CAN than with CWA and CCP (not statistically significant for the latter comparisons). This suggests that divergence of STF with CCP and CWA could have occurred before the divergence of STF with CAN (also observed in S18A Fig).

Altogether, the trends depicted by these analyses are expected under a model of early divergence among the four components. Our results also support the hypothesis of genetic continuity over long periods of time for the different components after their split.

In order to get insights into the past genetic influence among the four components since their divergence, we applied a last f4-statistics analysis (S5G Table; S22 Fig). We observed that the influence of CAN appears to have been more pronounced on the STF component than on the CWA and CCP components (S22B, S22G and S22I Fig). The genetic affinity of CAN with STF, CCP and CWA is reduced as compared to both ancient Early and Late Andes samples (S22A, S22H and S22J Fig), implying that the genetic legacy of Andean ancient populations in CAN remains important nowadays. The mutual influence of the CCP and CWA components seems to have been important in remote times (S22K and S22L Fig). However, the available ancient samples provide much more precise information on the influence of CCP on the CWA component than the other way around. Indeed, CWA does not exhibit higher genetic affinity to any ancient group relative to CCP (S22K Fig). The archaeological record for which genetic data has been generated misrepresents CWA since its early divergence from CCP, as well as the common ancestors specific to these two components.

Conclusions

European ancestry in modern Argentinean populations is mostly explained by Southern European origins, although we identified several individuals with Northern European ancestry. African ancestry in Argentina traces its origins from three main components. Although the Bantu-influenced and Western components are clearly the most represented, we also found that an Eastern origin explains some of the African ancestry in Argentina and represents up to ~30% of the African ancestry for some Argentinean individuals. Our work shows that studying more admixed individuals, with a particular focus on extending the geographical coverage of the Argentinean territory, would help to identify the genetic legacy from secondary migration streams, and thus to get a better representation of the complex origins of African and European ancestries in the country.

Concerning Native American ancestry, we concluded that Argentinean populations share, in varying proportions, ancestry from three distinct components previously described: Central Andes, Central Chile/Patagonia and Subtropical and Tropical Forests. Moreover, we present evidence supporting the hypothesis that the Native American ancestry in the Central Western region of Argentina may derive from a fourth component that diverged early from the other Native American components, and maintained a tight link with the Central Chile/Patagonia component. This relation is not explained by a putative contribution of admixed individuals from Santiago de Chile (S23 Fig). Having identified this component from admixed individuals demonstrates that focusing only on indigenous communities is insufficient, at least in Argentina, to fully characterize the Native American genetic diversity and decipher the pre-Columbian history of Native Americans. Indeed, most indigenous communities have been culturally annihilated and invisibilized [68,69] to the point that several Argentinean regions were considered “Indian free” in the mid-20th century [70]. However, the cultural incorporation did not necessarily imply a biological extinction. Although studies based on samples from indigenous communities [37–39] provide decisive information to understand the evolutionary history of Native American ancestry, alternative strategies must be considered to fill this gap in the effort to more fully describe the Native American ancestry (e.g. see [71]). Studying admixed individuals can be complex and, relying on a purely statistical approach, we grouped individuals from culturally, ethnically, linguistically, and genetically heterogeneous groups to represent the four Native American components discussed here. Yet, the present study provides useful insights into the routes followed by the main population arrivals in the Southern Cone (S24 Fig).

Further efforts are needed to better characterize the Native American ancestry component identified in the Central Western region of Argentina. Particularly, we encourage future studies to confirm the tentative geographical label that we suggest here, and to estimate its influence in the region. Besides these specific questions, many other general questions remain to be answered to better understand the pre-Columbian population dynamics in the Southern Cone such as the time and place of the splits among the components described here, and the extent of genetic exchanges among them. More genotype data for ancient samples, modern indigenous communities and admixed individuals, particularly in Central, Northeastern and Patagonia regions of Argentina, would help to decipher these issues.

The genomic characterization of populations is an unavoidable practice for many issues ranging from the understanding of our biological heritage, the rational use of biobanks, the definition of an adequate reference genome, the estimation of polygenic risk scores, the study and treatment of simple and complex diseases, and the design of a national program of genomic medicine in our country. This study is a joint effort of Argentinean institutions funded by the national scientific system, and represents the first milestone of the Consorcio Poblar, a national consortium for creating a public reference biobank to support biomedical genomic research in Argentina [72]. Genomic knowledge of local populations should be a priority for developing countries to achieve an unbiased representation of diversity in public databases and the scientific development in periphery countries.

Material and methods

Studied samples

We genotyped 94 individuals with the Axiom LAT1 array (Affymetrix) from 24 localities and 17 provinces across Argentina (Fig 1). These samples were selected among 240 collected by different population genetics groups (Consorcio PoblAR) during past sampling campaigns with a biological anthropology focus. According to the available information (e.g. interviews, genealogical information, etc.), each PoblAR research group selected for this study some samples, maximizing the odds that they come from individuals with greater Native American ancestry. For example, surnames were used as a proxy to achieve this objective and the permanence of ancestors in national territory has been another variable that was taken into account. The analyzed 94 samples were also selected to ensure an extended geographical range and were included when they presented sufficient DNA concentration and Native American maternal lineage. Moreover, among the males, we prioritized those with Native American paternal lineage.

All saliva and blood samples were collected under written informed consent to participate in this study. The informed consents for research use were approved by several Ethics Committees from (i) Hospital Zonal de Puerto Madryn (Resolución 009/2015), (ii) Hospital Zonal de San Carlos de Bariloche (Resolución 1510/2015), (iii) Hospital Provincial de Pediatría Dr. Fernando Barreyro, (iv) Investigaciones Biomédicas IMBICE (RENIS CE000023), (v) Provincia de Jujuy, (vi) Hospital Italiano of Buenos Aires (protocolo 1356/09), and (vii) Centro de Educación Médica e Investigaciones Clínicas “Norberto Quirno” (Resolución 612/2018). The biological samples were coded and anonymized, as per the Argentina National Law 25.326 of Protection of Personal Data.

87 samples and 791,543 autosomal variants passed the standard Affymetrix genotyping Quality Controls (S1 Table).

Most of the genotype data processing was performed using in-house scripts in R [73] and perl [74], leveraging plink2 [75], vcftools v1.13 [76], and bedTools v2 [77].

We compiled the genotype data for the 87 Argentinean samples with different genotype data available in the literature. We focused our study on biallelic SNPs (removing indels and SNPs with more than 2 alleles). Putative strand inconsistencies were fixed by applying the relevant flip, and any SNP with ambiguous alleles (A/T, G/C) was filtered out.

Cryptic relatedness among samples was assessed using King software [78]. To avoid any 1st degree relationship, we filtered out individuals while minimizing the total number of removals. No admixed individuals were removed at this step.

We thus built different dataset arrangements (named DS<n>) that we analyzed throughout this work (S2 Table).

Argentinean genetic diversity within a worldwide context

To analyze genetic diversity in Argentina within a worldwide context, we built Dataset1 (DS1). This dataset contains 87 Argentinean samples, 654 African, 503 European and 179 South American samples from 1KGP [42], 54 modern unrelated Chilean samples [37], and 161 Native-American individuals from South America [38]. Moreover, DS1 included genotype data from [31], which consists of 82 individuals from Lima (Peru), 27 from Santiago de Chile, and 161 from Argentinean urban centers.

We filtered out any variant and individual exhibiting in the compiled data set more than 2% and 5% of missing genotypes (--geno 0.02 and --mind 0.05 flags in plink 1.9), respectively, as well as variants with Minor Allele Frequency below 1% (--maf 0.01 flag in plink 1.9). The filtered data was then pruned for Linkage-Disequilibrium (--indep-pairwise 50 5 0.5 flag in plink 1.9). The combined data set has a total intersection of 59,237 SNPs and 1,908 individuals. With this curated data we performed Principal Component Analyses [41] (S1 Fig) and Admixture [43] (Fig 2, S2 Fig and S3 Table).
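
To make the variant-level thresholds concrete, the following minimal sketch applies a missingness filter and a Minor Allele Frequency filter to a single variant. It is an illustration only, not plink's implementation; the function name `variant_passes` and the genotype coding (alternate-allele counts 0, 1, 2, with `None` for a missing call) are assumptions made for this example.

```python
def variant_passes(genotypes, max_missing=0.02, min_maf=0.01):
    """Return True if a variant passes the missingness and MAF thresholds.

    genotypes: one entry per individual, coded as the alternate-allele
    count (0, 1 or 2) or None when the genotype is missing (assumed coding).
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called or missing_rate > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf


# Hypothetical toy variants over 100 individuals.
common_variant = [0, 1, 2, 1] * 25          # no missing calls, MAF = 0.5
sparse_variant = [None] * 3 + [1] * 97      # 3% missing calls
monomorphic_variant = [0] * 100             # MAF = 0
```

Here only `common_variant` would be retained; the other two fail the missingness and frequency thresholds, respectively.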

Local ancestry

Local ancestry analyses rely on haplotype reconstruction (phasing) and require high SNP density. Since admixed individuals of interest were genotyped on different microarray platforms, we decided to perform two separate local ancestry analyses on two different data sets (DS2 and DS3). DS2 consists of 87 Argentinean individuals from the present study, and 54 Chilean individuals [37], all of them genotyped with the Axiom LAT1 microarray. DS3 includes 175 Argentinean, 27 Chilean, and 119 Peruvian individuals genotyped with the Illumina OMNI1 microarray [31]. Both DS2 and DS3 were merged with 1KGP data consisting of 503 phased reference samples for each of the African and European genetic ancestries, and 347 Latin American individuals.

The 87 Argentinean and the 54 Chilean samples were phased with shapeIT2 [79,80]. The genetic map and the 5,008-haplotype panel provided by 1KGP were downloaded from http://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3/. Default algorithm and model parameters were used, filtering out monomorphic SNPs and those with more than 2% of missing genotypes. We obtained phased genotype data for 608,501 autosomal SNPs. This data was then merged with the phased 1KGP genotype data for the African, American and European samples described before. Since this data set derived from DS2, we call it DS2p (phased DS2).

The 175 Argentinean, 27 Chilean and 119 Peruvian samples from [31] were phased separately using the same procedure with shapeIT2. After filtering for missing genotypes and merging with phased 1KGP genotype data for African, American and European samples we obtained phased data for 694,626 autosomal SNPs. Since this data set derived from DS3, we call it DS3p (phased DS3).

In DS2 and DS3 we ran Admixture using the value of K that minimized the Cross-Validation score. We used individuals with more than 99% of Native American ancestry as references for local ancestry estimation. For DS2 we used K = 7, and Native American ancestry was defined as the sum of the two American specific components observed (S25 Fig). According to this criterion, 48 individuals were assigned as Native American reference. All the other American samples were defined as Admixed.

For DS3, we used K = 5, and Native American ancestry was defined as the single American specific component observed (S26 Fig). According to this criterion, 19 individuals were assigned as Native American reference. All the other American samples were defined as Admixed.

RFMix v2 ([44] downloaded at https://github.com/slowkoni/rfmix on 15th August 2018) was run on DS2p and DS3p separately using parameter settings similar to [81]. The reference panels consist of the African and European samples from 1KGP, as well as the Native American individuals identified through the Admixture procedure described before. We used 1 Expectation-Maximization iteration (-e 1), updating the reference panel in this process (--reanalyze-reference). We used a CRF spacing size and a random forest window size of 0.2 cM (-c 0.2 and -r 0.2). We used a node size of 5 (-n 5). We set the number of generations since admixture to 11 (-G 11) considering the estimates from [31]. The forward-backward output was then interpreted to assign each allele to the ancestry exhibiting the highest posterior probability, provided that it was greater than 0.9. Otherwise, the allele ancestry was assigned to Unknown (UKN). As a sanity check, the global ancestry proportions estimated through this RFMix analysis were compared with those obtained with Admixture software. The global ancestry proportion estimates obtained by both procedures matched very well: Spearman’s correlation was greater than 0.9 in American samples for any of the 3 continental ancestries (S27 Fig). Moreover, the ancestry ditypes assigned in admixed individuals from 1KGP (included in both DS2p and DS3p) were highly consistent between both independent masking procedures (S28 Fig).
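
The rule used to turn posterior probabilities into ancestry calls can be sketched as follows. This is a simplified illustration that does not parse any actual RFMix output file; the function name `call_ancestry`, the dictionary input and the ancestry labels are assumptions made for this example.

```python
def call_ancestry(posteriors, threshold=0.9):
    """Assign an allele to the ancestry with the highest posterior probability.

    posteriors: dict mapping an ancestry label to its posterior probability
    for one allele at one position (assumed input layout). The call is set
    to "UKN" unless the highest posterior exceeds the threshold.
    """
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] > threshold else "UKN"


# Hypothetical posteriors for two alleles.
confident = {"AFR": 0.02, "EUR": 0.03, "NAM": 0.95}
ambiguous = {"AFR": 0.10, "EUR": 0.45, "NAM": 0.45}
```

With these toy inputs, the first allele would be called `NAM` and the second `UKN`.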

Ancestry specific population structure

In order to analyze the ancestry-specific population structure we masked the data, i.e. for each individual, we assigned a missing genotype to any position for which at least one of the two alleles was not assigned to the relevant ancestry. In other words, to study ancestry A, we kept for each individual the regions exhibiting ancestry A on both haplotypes (ditypes), as illustrated in S5 Fig.
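
A minimal sketch of this masking step is given below, assuming that per-position ancestry calls are already available for both haplotypes of an individual. The function name `mask_genotypes`, the list-based layout and the use of `None` for a masked genotype are assumptions made for this example.

```python
def mask_genotypes(genotypes, ancestry_hap1, ancestry_hap2, target):
    """Keep genotypes only where both haplotypes carry the target ancestry.

    genotypes: per-position genotypes for one individual.
    ancestry_hap1, ancestry_hap2: per-position ancestry calls for each
    haplotype (e.g. "NAM", "EUR", "AFR" or "UKN"; assumed labels).
    Positions that are not ditypes for the target ancestry become None.
    """
    return [
        g if a1 == target and a2 == target else None
        for g, a1, a2 in zip(genotypes, ancestry_hap1, ancestry_hap2)
    ]


# Hypothetical individual with four genotyped positions.
genotypes = [0, 1, 2, 1]
hap1 = ["NAM", "NAM", "EUR", "UKN"]
hap2 = ["NAM", "EUR", "EUR", "NAM"]
masked_nam = mask_genotypes(genotypes, hap1, hap2, "NAM")
masked_eur = mask_genotypes(genotypes, hap1, hap2, "EUR")
```

Only the first position is retained for Native American ancestry, and only the third for European ancestry.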

European ancestry specific population structure

To study European ancestry specific population structure, we analyzed together masked data for this ancestry for Colombian individuals from 1KGP and individuals from DS2p and DS3p, excluding individuals from Chilean Native American communities [37]. This data was merged with a set of reference individuals with European ancestry [46], which is a subset of the POPRES dataset [45]. We call this data set DS4. We removed individuals with less than 30% of SNPs with the ancestry ditypes (--mind 0.7 with plink 1.9). We also removed SNPs with more than 50% of missing genotypes (--geno 0.5 with plink 1.9). Thus, DS4 contains 132 modern Argentinean individuals (29 from the present study and 103 from [31]), 17 individuals from Santiago de Chile, 4 from Lima and 74 from Colombia [42]. DS4 encompasses 29,347 SNPs of which 27,634 remained after LD-pruning (--indep-pairwise 50 5 0.5 flag in plink2).

Smartpca from Eigensoft v7.2.0 was run on DS4 [41] with the lsqproject option ON. We report the PCA results summarized into a 2-dimensional space by applying Multidimensional Scaling on a weighted Euclidean distance matrix for the first N PCs. We weighted each PC by the proportion of variance it explains. We selected the N most informative PCs according to the Elbow method on the proportion of explained variance. Admixture [43] was run with K ranging from 2 to 10 with the cross-validation procedure.
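
The weighted distance underlying this summary can be sketched as below. How exactly the weights enter the distance is not specified in the text, so this example assumes that each squared PC difference is multiplied by the proportion of variance explained; the function names and toy values are hypothetical, and the Multidimensional Scaling step itself is not shown.

```python
import math


def weighted_pc_distance(pcs_a, pcs_b, var_explained):
    """Weighted Euclidean distance between two individuals in PC space.

    Assumption: each squared difference is multiplied by the proportion
    of variance explained by the corresponding PC.
    """
    return math.sqrt(
        sum(w * (a - b) ** 2 for a, b, w in zip(pcs_a, pcs_b, var_explained))
    )


def distance_matrix(pc_coords, var_explained):
    """Pairwise weighted distances for a list of per-individual PC vectors."""
    return [
        [weighted_pc_distance(a, b, var_explained) for b in pc_coords]
        for a in pc_coords
    ]


# Hypothetical coordinates on the first two PCs for three individuals.
coords = [[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]]
dist = distance_matrix(coords, var_explained=[1.0, 1.0])
```

The resulting matrix is what a Multidimensional Scaling routine would take as input.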

African ancestry specific population structure

To study African ancestry specific population structure, we analyzed together masked data for this ancestry for individuals from DS2p and DS3p. This data was merged with a compilation of reference individuals with African ancestry from [42,48–50]. We removed African individuals with less than 99% of African ancestry when comparing them to the 2504 individuals from 1KGP (Admixture with K = 7 minimizing cross-validation score). We thus reduced the African reference to 1685 individuals. We call DS5 the data set containing both the masked data for admixed South American individuals and the African reference individuals.

We removed SNPs with more than 10% of missing genotypes (--geno 0.1 with plink 1.9), and individuals with less than 5% of the ancestry ditypes (--mind 0.95 with plink 1.9). Thus, DS5 contains 26 modern Argentinean individuals (all from the present study), and 12 individuals from Lima (9). DS5 consisted of 137,136 SNPs, of which 128,086 remained after LD-pruning (--indep-pairwise 50 5 0.5 flag in plink2).

PCA and Admixture were performed as for European ancestry specific population structure analyses (described before).

Native American ancestry specific population structure

To study Native American ancestry specific population structure, we analyzed together masked data for this ancestry for individuals from DS2p and DS3p. This data was merged with pseudo-haploid data for ancient samples within South and Central America [35–37,54,55], as well as with masked data for Native American individuals from [38]. The pseudo-haploid data for ancient samples was downloaded from the Reich Laboratory webpage (https://reich.hms.harvard.edu/downloadable-genotypes-present-day-and-ancient-dna-data-compiled-published-papers) on 15 May 2019. For each sample, we used annotations (geographic coordinates and approximate date) from the metafile provided at the same URL.

We call this data set DS6. We removed individuals with less than 30% of SNPs with the ancestry ditypes (--mind 0.7 with plink 1.9). We also removed SNPs with less than 50% of individuals with the ancestry ditypes (--geno 0.5 with plink 1.9). DS6 contains 146 modern Argentinean individuals (74 from the present study and 72 from [31]), 22 individuals from Santiago and 77 from Lima, along with 207 Native South American individuals from [37,38] and 53 ancient samples. DS6 encompasses 47,003 SNPs, of which 39,423 remained after LD-pruning (--indep-pairwise 50 5 0.5 flag in plink2).

PCA and Admixture were performed as for European ancestry specific population structure analyses (described before), with the difference that the poplistname option was set for smartpca in order to estimate the PC using only modern samples and project the ancient samples.

Statistical assignation of modern South American individuals to Native American components

Given a distance matrix among modern individuals from DS6, we performed K-means clustering with the number of clusters ranging from 2 to 20 and selected the K-means output minimizing the Bayesian Information Criterion (BIC). This procedure was applied to three different distance matrices: (i) weighted Euclidean distance based on the first two PCs from Principal Component Analysis, (ii) Euclidean distance based on the ancestry proportions estimated from Admixture (K = 3), and (iii) 1 − f3(Ind1,Ind2,Yoruba), where Ind1 and Ind2 are two individuals. In the three cases, the K-means procedure minimized the BIC when considering 4 clusters. These clusters correspond to the three Native American components discussed in this paper (CAN, CCP and STF), along with one lying in-between, which we finally attributed to Central Western Argentina. However, individual assignment to one of these four clusters was not totally consistent across the distance matrices we used. To obtain a robust assignation, an individual was assigned to a given cluster when it consistently belonged to it across the three K-means procedures; otherwise it was removed from the following analyses (S16 Fig). The cluster assignation for each individual is given in S4 Table.
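
The consensus rule across the three clusterings can be sketched as follows. The sketch assumes that cluster labels have already been harmonized across the three K-means runs (K-means labels are arbitrary, so some matching step is needed and is not shown); the function name `consensus_assignment` and the individual identifiers are hypothetical.

```python
def consensus_assignment(by_pca, by_admixture, by_f3):
    """Keep individuals whose cluster label agrees across three clusterings.

    Each argument maps an individual ID to a cluster label. Labels are
    assumed to be harmonized across runs beforehand. Individuals with
    inconsistent or missing labels are dropped from the result.
    """
    consensus = {}
    for individual, label in by_pca.items():
        if by_admixture.get(individual) == label and by_f3.get(individual) == label:
            consensus[individual] = label
    return consensus


# Hypothetical assignments for three individuals.
pca_labels = {"ind1": "CAN", "ind2": "CCP", "ind3": "CWA"}
admixture_labels = {"ind1": "CAN", "ind2": "CWA", "ind3": "CWA"}
f3_labels = {"ind1": "CAN", "ind2": "CCP", "ind3": "CWA"}
assigned = consensus_assignment(pca_labels, admixture_labels, f3_labels)
```

In this toy case `ind2` is excluded because one of the three procedures places it in a different cluster.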

We computed pairwise FST among the groups using smartpca [41] with fstonly and lsqproject set to YES, and all the other settings left at default. Standard errors were estimated with the block jackknife procedure [41].

F-statistics to infer relationship between the four Native American components in Argentina and their genetic affinity with ancient populations

Starting from genotype data for the individuals in DS6, we also included genotype data from the Yoruba (YRI) population from [42], the Mixe population from [38], and pseudo-haploid data for the Anzick individual from the Clovis culture [67] and the USR1 individual from Upward Sun River in Beringia [66]. The resulting data set is called DS7. Within DS7, we grouped modern individuals according to their assigned Native American group and we removed those with inconsistent assignation (S16 Fig and S4 Table). Thus, DS7 contains 426 modern individuals, 55 ancient samples, 108 Yoruba and 17 Mixe individuals. DS7 encompasses 88,564 SNPs. Note that we did not apply any SNP filtering across DS7 in order to maximize the number of SNPs included in each group comparison considered.

Using modern individuals from DS7, and considering any possible combination of the four groups identified (STF, CCP, CAN and CWA), we computed the f3 statistics [82] in the form of f3(Target; S1, S2). This allowed us to test whether Target, S1 and S2 could be organized in the form of a phylogenetic tree (positive f3) or whether the Target group is the result of mixture between the S1 and S2 groups (negative f3). We also computed f4(YRI, Target; S1, S2) to test whether group Target shares more evolutionary history with group S1 (negative f4) or group S2 (positive f4). Moreover, we reconstructed the Neighbour-joining tree from the matrix of distances with Phylip v3.2 [58] and USR1 as outgroup. The distances were estimated as 1/f3-outgroup(YRI; X, Y).
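
The sign conventions of these statistics can be illustrated with a minimal sketch based on population allele frequencies. These are the naive moment forms only: published estimators also include finite-sample corrections and are computed with dedicated software, neither of which is reproduced here. The function names and all allele frequencies below are hypothetical.

```python
def f3(target, s1, s2):
    """Naive f3(Target; S1, S2): mean of (t - a) * (t - b) over SNPs."""
    return sum((t - a) * (t - b) for t, a, b in zip(target, s1, s2)) / len(target)


def f4(a, b, c, d):
    """Naive f4(A, B; C, D): mean of (a - b) * (c - d) over SNPs."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


# Hypothetical allele frequencies at two SNPs.
source1 = [0.2, 0.8]
source2 = [0.8, 0.2]
mixed_target = [0.5, 0.5]        # intermediate between the two sources
f3_mixed = f3(mixed_target, source1, source2)

outgroup = [0.5, 0.5]
target = [0.8, 0.2]
close_group = [0.9, 0.1]         # shares drift with the target
far_group = [0.5, 0.5]
f4_value = f4(outgroup, target, close_group, far_group)
```

In this toy setting `f3_mixed` is negative, as expected for a target intermediate between its two sources, and `f4_value` is negative because the target shares more drift with the group placed in the S1 position.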

We computed f3-Outgroup and f4 statistics in order to estimate the genetic affinity of the four Native American components with ancient populations from South and Central America. We computed the f3-outgroup and f4 statistics in the form of f3(YRI; X, Ancient) and f4(YRI, X; USR1, Ancient), where X represents the cluster containing all individuals assigned to a given Native American ancestry component.

We also computed f4 of the form f4(YRI, Ancient; X, Y). This f4 setting allowed us to test which of X or Y, each referring to one of the four Native American components, shares more ancestry with a given Ancient group.

We finally computed f4 of the form f4(Ancient, X; Y, YRI) to test whether a given modern Native American component Y exhibits closer genetic affinity with a given Ancient group (negative f4) or with another modern Native American component X (positive f4).

All the results based on F-statistics are listed in S5 Table. We assessed significance of a comparison considering 3 standard errors (|Z| > 3) from block jackknife procedure [82].
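
The significance criterion can be illustrated with a simplified delete-one-block jackknife. This sketch is unweighted and assumes per-SNP contributions to the statistic have already been grouped into genomic blocks; dedicated software typically uses a weighted variant, which is not reproduced here. The function name `block_jackknife_z` and the toy values are hypothetical.

```python
import math


def block_jackknife_z(blocks):
    """Estimate, standard error and Z-score from a delete-one-block jackknife.

    blocks: list of lists; each inner list holds the per-SNP contributions
    to the statistic within one genomic block (assumed layout).
    """
    total = sum(sum(block) for block in blocks)
    n_snps = sum(len(block) for block in blocks)
    estimate = total / n_snps
    n_blocks = len(blocks)
    # Recompute the statistic leaving out one block at a time.
    leave_one_out = [
        (total - sum(block)) / (n_snps - len(block)) for block in blocks
    ]
    loo_mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (value - loo_mean) ** 2 for value in leave_one_out
    )
    std_err = math.sqrt(variance)
    return estimate, std_err, estimate / std_err


# Hypothetical per-SNP contributions grouped into four blocks.
toy_blocks = [[0.10, 0.12], [0.09, 0.11], [0.10, 0.10], [0.11, 0.13]]
estimate, std_err, z_score = block_jackknife_z(toy_blocks)
significant = abs(z_score) > 3
```

A comparison would be reported as significant when the absolute Z-score exceeds 3, as in the criterion stated above.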

Geographical maps

All maps were generated in R using the maps package (https://cran.r-project.org/web/packages/maps/index.html). S24 Fig includes a raster downloaded at https://www.naturalearthdata.com/ and integrated in R with the raster (https://CRAN.R-project.org/package=raster) and RStoolbox (http://bleutner.github.io/RStoolbox) packages.

Supporting information

S1 Fig. Principal component analysis in a worldwide Context.

A: PC2 vs PC1; B: PC4 vs PC3; C: PC6 vs PC5. The percentage of variance explained by each principal component (PC) is given. Each point represents an individual following the color and point codes given in legend.

(TIF)

S2 Fig. Admixture analysis in a worldwide context.

(A) Cross-Validation score for Admixture runs on the worldwide meta dataset (DS1) with K from 3 to 12. (B-G) Admixture results with K = 3 to K = 8. 1KGP: 1000 Genomes Project; CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

(PDF)

S3 Fig. Comparison of different admixture continental ancestry proportion estimates in a worldwide context.

Comparison of the African, European and Native American ancestry proportion estimates obtained with Admixture models with K = 3 and K = 8 applied to DS1. (A) African ancestry proportions for K = 3 are as observed in green in S2B Fig, while for K = 8 they are estimated as the sum of the three greenish colors observed in Main Fig 2. (B) European ancestry proportions for K = 3 are as observed in blue in S2B Fig, while for K = 8 they are estimated as the sum of the two bluish colors observed in Main Fig 2. (C) Native American ancestry proportions for K = 3 are as observed in orange in S2B Fig, while for K = 8 they are estimated as the sum of the three reddish colors observed in Main Fig 2.

(PDF)

S4 Fig. Correlation between eigenvectors and ancestry proportion estimates from analyses in a worldwide context.

Comparison of ancestry proportion estimates from Admixture model with K = 8 and the 6 first Principal Components (PCs) in DS1. (A) PC1 vs African ancestry proportions (estimated as the sum of the three greenish colors observed in Main Fig 2). (B) PC2 vs European ancestry proportions (estimated as the sum of the two bluish colors observed in Main Fig 2). (C) PC2 vs Native American ancestry proportions (estimated as the sum of the three reddish colors observed in Main Fig 2). (D) PC6 vs Bantu-influenced ancestry proportions (dark olive green in Main Fig 2). (E) PC6 vs Western African ancestry proportions (dark green in Main Fig 2). (F) PC4 vs Southern European ancestry proportions (light blue in Main Fig 2). (G) PC4 vs Northern European ancestry proportions (dark blue in Main Fig 2). (H) PC3 vs Central Chile / Patagonia ancestry proportions (orange in Main Fig 2). (I) PC3 vs Central Andes ancestry proportions (yellow in Main Fig 2). (J) PC5 vs Subtropical and Tropical Forests ancestry proportions (pink in Main Fig 2).

(PDF)

S5 Fig. Example of a local ancestry output.

(A) RFMIX output for a given admixed individual. (B) Masked genotype showing ditypes of Native American (red), European (blue) and African (green) ancestry. Gaps are represented in grey and regions with unassigned ancestry (Unknown) are in black.

(TIFF)

S6 Fig. Choice of the number of principal components from European ancestry-specific principal component analysis.

Elbow method to determine which PC minimizes the angle of the curve from the chart “Percentage of variance explained versus Number of PCs”

(PDF)

S7 Fig. European ancestry specific admixture analysis.

(A) Cross-Validation scores for K from 2 to 10. (B) Admixture for K = 2. (C) Admixture for K = 3. CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

(PDF)

S8 Fig. Comparison of different European ancestry proportions in South America.

Comparison of ancestry proportion estimates from European Ancestry Specific Admixture Analyses (K = 3) among samples from Argentina, Chile and Colombia. (A) Southeastern/Italian ancestry (light blue in S7C Fig). (B) Iberian ancestry (turquoise in S7C Fig). (C) Northern Ancestry (dark blue in S7C Fig). P-value of the Wilcoxon test for each pairwise comparison is shown.

(PDF)

S9 Fig. African ancestry-specific principal component analysis.

(A) Localization map of the 1685 reference samples with >99% of African ancestry. (B-C) Principal Components performed using the African reference samples (represented as in panel A), and South American samples masked for African ancestry.

(TIF)

S10 Fig. African ancestry-specific admixture analysis.

(A) Cross-validation scores for K from 2 to 10. (B) Admixture for K = 2. (C) Admixture for K = 3. (D) Admixture plots for K = 4. CYA: Cuyo Region; NEA: Northeastern Region; NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

(PDF)

S11 Fig. Correlation between eigenvectors and ancestry proportion estimates from African ancestry specific analyses.

Comparison of ancestry proportion estimates from Admixture model with K = 5 and some Principal Components (PCs) in admixed samples from DS5. (A) PC3 vs Western African ancestry proportions (yellow in S10 Fig). (B) PC3 vs Bantu-influenced ancestry proportions (blue in S10 Fig). (C) PC4 vs Eastern African ancestry proportions (orange in S10 Fig).

(PDF)

S12 Fig. Choice of the number of principal components from Native American ancestry-specific principal component analysis.

Elbow method to determine which PC minimizes the angle of the curve in the chart “Percentage of variance explained versus Number of PCs”.

(PDF)

S13 Fig. Native American ancestry-specific admixture analysis.

(A) Cross-validation scores for K from 2 to 10. (B) Admixture for K = 2. (C) Admixture for K = 4. (D) Admixture plots for K = 5. CYA: Cuyo Region; NEA: Northeastern Region; NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

(PDF)

S14 Fig. Comparison of Native American ancestry proportion estimates obtained with admixture on unmasked data (K = 8) and masked data (K = 3).

Unmasked and masked data refers to DS1 and DS6, respectively. Spearman correlation coefficients and associated P-values are shown. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia.

(PDF)

S15 Fig. Correlation of Native American ancestry proportions and geographic coordinates in Argentina.

(A) Central Andes ancestry proportions vs Latitude. (B) Central Andes ancestry proportions vs Longitude. (C) Subtropical and Tropical Forests ancestry proportions vs Latitude. (D) Subtropical and Tropical Forests ancestry proportions vs Longitude. (E) Central Chile/Patagonia ancestry proportions vs Latitude. (F) Central Chile/Patagonia ancestry proportions vs Longitude. Linear regression slopes and the associated P-values are shown.

(PDF)

S16 Fig. Individual assignation to a Native American ancestry cluster.

Consensus cluster assignation of South American individuals based on three K-means procedures run with different pairwise distances among individuals. (Top) K-means results using Ancestry-Specific PCA and Admixture (ASPCA and AS-Admixture), and f3 results to compute pairwise distances. Individuals are represented as in Main Fig 5. Insets: BIC score for the number of K-means clusters ranging from 2 to 20. In all three cases, the K-means BIC was minimized when considering 4 clusters. (Bottom) Same as top with point colors corresponding to the assigned cluster.

(TIF)

S17 Fig. Pairwise genetic affinity among individuals assigned to different groups.

Boxplots for 1 - f3(YRI; Ind1, Ind2), where Ind1 and Ind2 are two individuals belonging to Group 1 and Group 2, respectively. The groups are either the four Native American components identified or ancient Middle Holocene Southern Cone groups. For clarity, boxplot outliers are not shown. YRI: Yoruba from 1KGP.

(PDF)
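As an illustration of the quantity plotted in this figure, the sketch below computes a naive outgroup f3 and the derived distance 1 - f3 from per-SNP allele frequencies. It assumes each population (or individual) is given as a list of reference-allele frequencies over the same SNPs, and it omits the finite-sample bias correction applied by dedicated software. The function names and the toy frequencies are hypothetical.

```python
def f3(outgroup, pop_a, pop_b):
    """Naive f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    Larger values indicate more genetic drift shared by A and B since
    their split from the outgroup O.
    """
    if not outgroup or not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("frequency lists must be non-empty and of equal length")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)


def f3_distance(outgroup, pop_a, pop_b):
    """Distance of the form 1 - f3(O; A, B), as in the caption above."""
    return 1.0 - f3(outgroup, pop_a, pop_b)


# Toy allele frequencies for four SNPs (made up for illustration).
yri = [0.0, 0.5, 1.0, 0.2]
ind1 = [0.5, 0.5, 0.5, 0.6]
ind2 = [0.5, 0.0, 0.5, 0.6]
print(f3(yri, ind1, ind2), f3_distance(yri, ind1, ind2))
```

For a single diploid individual, the per-SNP frequency takes the values 0, 0.5 or 1, which is one way the same function could be applied to pairs of individuals.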

S18 Fig. Graphical visualization of pairwise genetic distances among modern and ancient groups in South America.

(A) Neighbor-joining tree from distances of the form 1/f3(YRI; X, Y). USR1 from Ancient Beringia was used as outgroup. (B) Multidimensional scaling from distances of the form 1-f3(YRI; X, Y). Each group is represented as appearing in the leaf of (A). USR1 and Anzick-1 were not considered in (B). YRI: Yoruba from 1KGP.

(TIF)

S19 Fig. Genetic affinity of the four Native American components with ancient groups.

(A-D) f3(YRI; X, Ancient). (E-H) f4(YRI, X; Ancient Beringia, Ancient). (A) and (E): With Central Andes (CAN) as X. (B) and (F): With Subtropical and Tropical Forests (STF) as X. (C) and (G): With Central Chile / Patagonia (CCP) as X. (D) and (H): With Central Western Argentina (CWA) as X. YRI: Yoruba from 1KGP; Ancient Beringia: USR1 individual from [66]; X: Native American component in Argentina (one plot per X). Ancient: ancient group labeled on the x-axis and represented with a point/color scheme as in Main Fig 5. Vertical segments are the ±3 standard error intervals.

(PDF)
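The f4 statistics and their ±3 standard error intervals can be sketched as follows. This is a minimal illustration that assumes per-SNP allele frequencies as input and estimates the standard error with a delete-one-block jackknife over contiguous, roughly equal-sized SNP blocks. The jackknife is an assumption of this sketch, since the caption does not state how standard errors were obtained, and all function names are hypothetical.

```python
import math


def f4_per_snp(p_a, p_b, p_c, p_d):
    """Per-SNP terms (a - b) * (c - d); their mean is the naive f4(A, B; C, D)."""
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all frequency lists must have the same length")
    return [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]


def jackknife_mean(values, n_blocks):
    """Return (mean, standard error) using a delete-one-block jackknife.

    Blocks are contiguous runs of SNPs of (nearly) equal size; no weighting
    for unequal block sizes is applied in this sketch.
    """
    n = len(values)
    if not 2 <= n_blocks <= n:
        raise ValueError("n_blocks must be between 2 and the number of SNPs")
    total = sum(values)
    bounds = [k * n // n_blocks for k in range(n_blocks + 1)]
    leave_one_out = []
    for start, end in zip(bounds, bounds[1:]):
        kept = n - (end - start)
        leave_one_out.append((total - sum(values[start:end])) / kept)
    centre = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - centre) ** 2 for x in leave_one_out
    )
    return total / n, math.sqrt(variance)


def f4_interval(p_a, p_b, p_c, p_d, n_blocks, width=3.0):
    """Return (estimate, lower, upper) with a +/- width * SE interval."""
    estimate, se = jackknife_mean(f4_per_snp(p_a, p_b, p_c, p_d), n_blocks)
    return estimate, estimate - width * se, estimate + width * se
```

An interval that excludes zero would suggest, under these assumptions, that the tested pair of populations does not share drift symmetrically with the other pair.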

S20 Fig. Changes across time of genetic affinity of the four Native American components with ancient groups.

Each point represents an f4 score of the form f4(YRI, X; Ancient Beringia, Ancient) vs the age of the ancient sample, where X is one of the four identified Native American components, and Ancient is an ancient group. X is represented by the color of the square while Ancient is represented by the point within the square. The point code of the ancient samples is shown in Main Fig 5. Ancient Beringia: USR1 individual from [66]. (A) f4 vs age of ancient samples from Southern Cone. (B) f4 vs age of ancient samples from Andes. (C) f4 vs age of ancient samples from Southern Cone considering correction for both f4 and age. (D) f4 vs age of ancient samples from Andes considering correction for both f4 and age. Linear regression slopes and the associated P-values are shown. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia; CWA: Central Western Argentina; YRI: Yoruba from 1KGP.

(TIF)

S21 Fig. Comparison of genetic affinity of an ancient group to a Native American component relative to another.

f4(YRI, Ancient; X, Y) where X and Y are two of the four identified Native American components (one plot per X-Y combination), and Ancient is the ancient group labeled on the x-axis and represented with a point/color scheme as in Main Fig 5. Vertical segments are the ±3 standard error intervals. Note this setting for f4 statistics is symmetrical when switching X and Y.

(PDF)

S22 Fig. Comparison of genetic affinity of a Native American component to another relative to an ancient group.

f4(Ancient, X; Y, YRI) where X and Y are two of the four identified Native American components (one plot per X-Y combination), and Ancient is the ancient group labeled on the x-axis and represented with a point/color scheme as in Main Fig 5. Vertical segments are the ±3 standard error intervals. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia; CWA: Central Western Argentina; YRI: Yoruba from 1KGP.

(PDF)

S23 Fig. Removing admixed Santiago de Chile individuals to compute F-statistics does not affect the results.

Admixed individuals from Santiago de Chile were removed to perform the analyses presented in this figure. (A) f3(Target; S1, S2) to test for treeness; (B) f4(YRI, Target; S1, S2) to test whether Target shares more ancestry with S1 or S2; (C) f3(YRI; CWA, Ancient); (D) f4(YRI, CWA; Ancient Beringia, Ancient); (E) f4(Ancient, CCP; CWA, YRI); (F) f4(Ancient, CWA; CCP, YRI); (G) f4(YRI, Ancient; CWA, CCP). CCP: Central Chile / Patagonia.

(PDF)

S24 Fig. Schematic routes for the main population arrivals in the Southern Cone.

Each arrow represents one of the four components discussed throughout the article. Neither the time and place of the splits among these components nor gene flow among them have been addressed in this study.

(PDF)

S25 Fig. Admixture analyses in DS2 to define European, African and Native American reference individuals for local ancestry analyses.

(A) Cross-validation scores from K = 3 to K = 10. (B) Admixture for K = 7.

(PDF)

S26 Fig. Admixture analyses in DS3 to define European, African and Native American reference individuals for local ancestry analyses.

(A) Cross-validation scores from K = 3 to K = 10. (B) Admixture for K = 5.

(PDF)

S27 Fig. Comparison of admixture and RFMix ancestry proportion estimates in DS2p and DS3p.

(A-C) For DS2p: Argentinean samples from the present study with a reference panel that consists of 1KGP individuals from Africa, Europe and America [42] and Chilean individuals from [37]. Native American, European and African ancestry proportion estimates with RFMix vs with Admixture with K = 7. (D-F) For DS3p: Argentinean samples from [31] with a reference panel that consists of 1KGP individuals from Africa, Europe and America [42]. Native American, European and African ancestry proportion estimates with RFMix vs with Admixture with K = 5.

(TIF)

S28 Fig. Consistency of the masking procedure applied to DS2p and DS3p.

We compared the percentage of variants with the same ancestry ditypes in DS2p and DS3p for American admixed individuals from the 1000 Genomes Project.

(PDF)

S1 Table. Sample information.

Sampling location, gender, uniparental lineages, Affymetrix QC metrics, color and point coding used for plots.

(XLS)

S2 Table. Data sets (DS) analyzed throughout the article.

(PDF)

S3 Table. Ancestry proportion estimates in a worldwide context.

Ancestry proportion estimates from Admixture analyses with K = 3 and K = 8 at the worldwide level. The column names give the labels attributed to each ancestry detected in both Admixture analyses, as well as the hexadecimal code for the color used to represent it in the corresponding admixture plot. The columns “Point”, “Color” and “cex” list the graphical parameters used to represent each individual in the different plots throughout the article.

(XLS)

S4 Table. Native American cluster assignation.

Individual Native American cluster assignation is given for each of the three K-means procedures and for the consensus call (columns “F3”, “PCA”, “Admixture” and “Consensus”). The ancestry proportion estimates from Admixture analyses with K = 3 on the masked data for Native American ancestry are also provided. The column names give the labels attributed to each ancestry detected in the Admixture analyses as well as the hexadecimal code for the color. For admixed individuals (from the present study and [31]), the “Population” and “Region” columns list the locality and province, respectively, while for Native American populations (from [37,38]) the “Population” and “Region” columns list the ethnic and main ethnic groups, respectively.

(XLS)

S5 Table. F-statistics.

(A) f3(Target; S1, S2) only for comparisons including Native American components. (B) f4(YRI, Target; S1, S2) only for comparisons including Native American components. (C) f3(YRI; X, Y) only for comparisons including Ancient Beringia, Mixe and Native American components. (D) f3(YRI; X, Y) where X and Y can be either an ancient group or one of the four Native American components. (E) f4(YRI, X; Ancient Beringia, Ancient) only for comparisons between a Native American component (X) and an ancient group (Ancient). (F) f4(YRI, Ancient; X, Y) only for comparisons including two Native American components (X, Y) and an ancient group (Ancient). (G) f4(Ancient, X; Y, YRI) only for comparisons including two Native American components (X, Y) and an ancient group (Ancient). YRI: Yoruba from 1KGP.

(XLSX)

Acknowledgments

In memory of Raúl Carnese.

We thank Rolando González-José, Andrea Llera and Mariana Berenstein for the management and promotion of the Consorcio Poblar.

We thank all the participants of the different sampling campaigns across Argentina from which the newly genotyped samples are derived.

We thank Ricardo A. Verdugo, Etienne Patin, Luca Pagani, Marta E. Alarcón Riquelme and María Teruel Artacho who kindly shared genotype data included in this study.

We thank Laura Fejerman for her proofreading of the manuscript.

Abbreviations

1KGP: 1000 Genomes Project

AMBA: Metropolitan Area of Buenos Aires

BIC: Bayesian Information Criterion

CAN: Central Andes

CCP: Central Chile / Patagonia

CLM: Colombians from Medellin

CWA: Central Western Region in Argentina

CYA: Cuyo Region in Argentina

DS: Data set

LD: Linkage Disequilibrium

NEA: Northeastern Region in Argentina

NWA: Northwestern Region in Argentina

PCA: Principal Component Analysis

PEL: Peruvians from Lima

PPA: Pampean Region in Argentina

PTA: Patagonia Region in Argentina

SNP: Single Nucleotide Polymorphism

STF: Subtropical and Tropical Forests

YRI: Yoruba individuals in Ibadan, Nigeria

Data Availability

The data analyzed here comprise both newly generated and previously reported data sets. Access to publicly available datasets should be requested through the distribution channels indicated in each published study. Newly generated samples have been registered under study EGAS00001004492 in the European Genome-Phenome Archive, which contains both raw and processed individual genotype datasets with accession numbers EGAD00010001913 and EGAD00001006227, respectively.

Funding Statement

This project has been funded by FONCyT - ANPCyT (grants PICT2014-1597 attributed to HD and PICT3655-2016 attributed to PL) and CONICET (grant PIP 0208/14 attributed to HD and VR). FONCyT: Fondo para la Investigación Científica. https://convocatoriasfoncyt.mincyt.gob.ar/ ANPCyT: Agencia Nacional de Promoción Científica y Tecnológica. https://www.argentina.gob.ar/ciencia/agencia CONICET: Consejo Nacional de Investigaciones Científicas y Técnicas. https://www.conicet.gov.ar/ Biocódices provided support in the form of part-time employment salaries for authors HD, JMB and JZ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Bravi CM, Sans M, Bailliet G, Martínez-Marignac V, Portas M, Barreto I, et al. Characterization of mitochondrial DNA and Y-chromosome haplotypes in a Uruguayan population of African ancestry. Hum Biol. 1997;69: 641–652.
  • 2. Sala A, Penacino G, Corach D. Comparison of allele frequencies of eight STR loci from Argentinean Amerindian and European populations. Hum Biol. 1998;70: 937–947.
  • 3. Goicoechea A, Carnese FR, Dejean C, Avena SA, Weimer TA, Franco MHLP, et al. Genetic Relationships Between Amerindian Populations of Argentina. Am J Phys Anthropol. 2001;115: 133–143. doi: 10.1002/ajpa.1063
  • 4. Dejean CB, Crouau-Roy B, Goicoechea AS, Avena SA, Carnese FR. Genetic variability in Amerindian populations of northern Argentina. Genet Mol Biol. 2004;27: 489–495.
  • 5. Fejerman L, Carnese F, Goicoechea A, Avena S, Dejean C, Ward R. African ancestry of the population of Buenos Aires. Am J Phys Anthropol. 2005;128: 164–70. doi: 10.1002/ajpa.20083
  • 6. Wang S, Ray N, Rojas W, Parra MV, Bedoya G, Gallo C, et al. Geographic patterns of genome admixture in Latin American mestizos. PLoS Genet. 2008;4: e1000037. doi: 10.1371/journal.pgen.1000037
  • 7. Bobillo MC, Zimmermann B, Sala A, Huber G, Röck A, Bandelt H-J, et al. Amerindian mitochondrial DNA haplogroups predominate in the population of Argentina: towards a first nationwide forensic mitochondrial DNA sequence database. Int J Legal Med. 2010;124: 263–268. doi: 10.1007/s00414-009-0366-3
  • 8. Pauro M, Garcia A, Bravi CM, Demarchi DA. Distribucion de haplogrupos mitocondriales aloctonos en poblaciones rurales de Cordoba y San Luis. Rev Argentina Antropol Biol. 2010;12: 47–55.
  • 9. Pauro M, García A, Nores R, Demarchi DA. Analysis of uniparental lineages in two villages of Santiago Del Estero, Argentina, seat of Pueblos de Indios in colonial times. Hum Biol. 2013;85: 699–720. doi: 10.3378/027.085.0504
  • 10. García A, Pauro M, Bailliet G, Bravi CM, Demarchi DA. Genetic variation in populations from central Argentina based on mitochondrial and Y chromosome DNA evidence. J Hum Genet. 2018;63: 493–507. doi: 10.1038/s10038-017-0406-7
  • 11. Martínez Marignac VL, Bertoni B, Parra EJ, Bianchi NO. Characterization of admixture in an urban sample from Buenos Aires, Argentina, using uniparentally and biparentally inherited genetic markers. Hum Biol. 2004;76: 543–57. doi: 10.1353/hub.2004.0058
  • 12. Seldin MF, Tian C, Shigeta R, Scherbarth HR, Silva G, Belmont JW, et al. Argentine Population Genetic Structure: Large Variance in Amerindian Contribution. Am J Phys Anthropol. 2007;132: 455–462. doi: 10.1002/ajpa.20534
  • 13. Corach D, Lao O, Bobillo C, van Der Gaag K, Zuniga S, Vermeulen M, et al. Inferring continental ancestry of Argentineans from autosomal, Y-chromosomal and mitochondrial DNA. Ann Hum Genet. 2010;74: 65–76. doi: 10.1111/j.1469-1809.2009.00556.x
  • 14. Parolin ML, Toscanini UF, Velázquez CL, Berardi GL, Holley A, Tamburrini C, et al. Genetic admixture patterns in Argentinian Patagonia. PLoS One. 2019;14: e0214830. doi: 10.1371/journal.pone.0214830
  • 15. Avena S, Via M, Ziv E, Pérez-Stable EJ, Gignoux CR, Dejean C, et al. Heterogeneity in genetic admixture across different regions of Argentina. PLoS One. 2012;7: e34695. doi: 10.1371/journal.pone.0034695
  • 16. García A, Demarchi DA, Tovo-Rodrigues L, Pauro M, Callegari-Jacques SM, Salzano FM, et al. High interpopulation homogeneity in Central Argentina as assessed by ancestry informative markers (AIMs). Genet Mol Biol. 2015;38: 324–331. doi: 10.1590/S1415-475738320140260
  • 17. Salas A, Richards M, Lareu M-V, Scozzari R, Coppa A, Torroni A, et al. The African diaspora: mitochondrial DNA and the Atlantic slave trade. Am J Hum Genet. 2004;74: 454–465. doi: 10.1086/382194
  • 18. García A, Pauro M, Nores R, Bravi CM, Demarchi DA. Phylogeography of mitochondrial haplogroup D1: An early spread of subhaplogroup D1j from Central Argentina. Am J Phys Anthropol. 2012;149: 583–590. doi: 10.1002/ajpa.22174
  • 19. Corach D, Figueira Risso L, Marino M, Penacino G, Sala A. Routine Y-STR typing in forensic casework. Forensic Sci Int. 2001;118: 131–135. doi: 10.1016/s0379-0738(00)00483-7
  • 20. Salas A, Jaime JC, Álvarez-Iglesias V, Carracedo Á. Gender bias in the multiethnic genetic composition of central Argentina. J Hum Genet. 2008;53: 662–674. doi: 10.1007/s10038-008-0297-8
  • 21. Martínez Sarasola C. Nuestros paisanos los indios. 9th ed Buenos Aires: Del Nuevo Extremo; 2011.
  • 22. Devoto F. Historia de la inmigración en la Argentina. 1a ed Buenos Aires: Editorial Sudamericana; 2003.
  • 23. Studer E. La trata de negros en el Río de la Plata durante el siglo XVIII. Buenos Aires: Instituto de Historia de Argentina “Doctor Emilio Ravignani”; 1958.
  • 24. Rodríguez Molas R. Los sometidos de la conquista. Centro Edi. Buenos Aires; 1985.
  • 25. Borucki A. The Slave Trade to the Río de la Plata, 1777–1812: Trans-Imperial Networks and Atlantic Warfare. Colon Lat Am Rev. 2011;20: 81–107.
  • 26. Guzmán F. Africanos en la Argentina. Una reflexión desprevenida. Andes. 2006;17.
  • 27. Pauro M. Análisis molecular de linajes uniparentales en poblaciones humanas del centro de Argentina. Universidad de Córdoba; 2015.
  • 28. Di Fabio Rocca F. La presencia subsahariana en el acervo génico de poblaciones cosmopolitas de la Argentina. Universidad de Buenos Aires; 2016.
  • 29. Mandrini R. La Argentina aborigen. De los primeros pobladores a 1910. Siglo XXi. Buenos Aires; 2008.
  • 30. Lenton DI, Delrio WM, Pérez PMV, Papazian AER, Nagy MA, Musante M. Argentina’s Constituent Genocide: Challenging the Hegemonic National Narrative and Laying the Foundation for Reparations to Indigenous Peoples. Armen Rev. 2012;53: 63–84.
  • 31. Homburger JR, Moreno-Estrada A, Gignoux CR, Nelson D, Sanchez E, Ortiz-Tello P, et al. Genomic Insights into the Ancestry and Demographic History of South America. PLOS Genet. 2015;11: e1005602. doi: 10.1371/journal.pgen.1005602
  • 32. Muzzio M, Motti JMB, Sepulveda PBP, Yee M, Santos R, Ramallo V, et al. Population structure in Argentina. PLoS One. 2018;13: e0196325. doi: 10.1371/journal.pone.0196325
  • 33. Gouveia MH, Borda V, Leal TP, Moreira RG, Bergen AW, Kehdy FSG, et al. Origins, admixture dynamics and homogenization of the African gene pool in the Americas. Mol Biol Evol. 2020;Epub ahead: pii: msaa033.
  • 34. Kehdy FSG, Gouveia MH, Machado M, Magalhães WCS, Horimoto AR. Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations. Proc Natl Acad Sci U S A. 2015;112: 8696–8701. doi: 10.1073/pnas.1504447112
  • 35. Posth C, Nakatsuka N, Lazaridis I, Fehren-Schmitz L, Krause J, Reich D, et al. Reconstructing the Deep Population History of Central and South America. Cell. 2018;175: 1185–1197. doi: 10.1016/j.cell.2018.10.027
  • 36. Moreno-Mayar JV, Vinner L, Damgaard PDB, Fuente C De, Margaryan A, Orbegozo MI, et al. Early human dispersals within the Americas. Science. 2018;362: pii: eaav2621. doi: 10.1126/science.aav2621
  • 37. de la Fuente C, Ávila-Arcos MC, Galimany J, Carpenter ML, Homburger JR. Genomic insights into the origin and diversification of late maritime hunter-gatherers from the Chilean Patagonia. Proc Natl Acad Sci U S A. 2018;115: E4006–E4012. doi: 10.1073/pnas.1715688115
  • 38. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, et al. Reconstructing Native American population history. Nature. 2012;488: 370–4. doi: 10.1038/nature11258
  • 39. Gnecchi-Ruscone GA, Sarno S, Fanti S De, Gianvincenzo L, Giuliani C, Boattini A, et al. Dissecting the Pre-Columbian Genomic Ancestry of Native Americans along the Andes–Amazonia Divide. Mol Biol Evol. 2019;36: 1254–1269. doi: 10.1093/molbev/msz066
  • 40. Gómez-Carballa A, Pardo-Seco, Brandini S, Achilli A, Perego UA, Coble MD, et al. The peopling of South America and the trans-Andean gene flow of the first settlers. Genome Res. 2018;28: 767–779. doi: 10.1101/gr.234674.118
  • 41. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2: 2074–2093.
  • 42. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526: 68–74. doi: 10.1038/nature15393
  • 43. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19: 1655–64. doi: 10.1101/gr.094052.109
  • 44. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: A discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 2013;93: 278–288. doi: 10.1016/j.ajhg.2013.06.020
  • 45. Nelson MR, Bryc K, King KS, Indap A, Boyko AR, Novembre J, et al. The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research. Am J Hum Genet. 2008;83: 347–358. doi: 10.1016/j.ajhg.2008.08.005
  • 46. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, et al. Genes mirror geography within Europe. Nature. 2008;456: 98–101. doi: 10.1038/nature07331
  • 47. Gallero MC, Krautstofl EM. Proceso de poblamiento y migraciones en la Provincia de Misiones, Argentina (1881–1970). Avá Rev Antropol. 2010;16: 1881–1970.
  • 48. Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, et al. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science. 2017;356: 543–546.
  • 49. Pagani L, Kivisild T, Tarekegn A, Ekong R, Plaster C, Romero IG, et al. Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool. Am J Hum Genet. 2012;91: 83–96. doi: 10.1016/j.ajhg.2012.05.015
  • 50. Schlebusch CM, Skoglund P, Sjödin P, Gattepaille LM, Hernandez D, Jay F, et al. Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History. Science. 2012;338: 374–9. doi: 10.1126/science.1227721
  • 51. Moreno-Estrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, Gignoux CR, et al. Reconstructing the Population Genetic History of the Caribbean. PLoS Genet. 2013;9: e1004023. doi: 10.1371/journal.pgen.1004023
  • 52. Fortes-Lima C, Gessain A, Ruiz-Linares A, Restrepo BN, Rojas W, Migot-Nabias F, et al. Genome-wide Ancestry and Demographic History of African-Descendant Maroon Communities from French Guiana and Suriname. Am J Hum Genet. 2017;101: 725–736. doi: 10.1016/j.ajhg.2017.09.021
  • 53. Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, et al. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc Natl Acad Sci U S A. 2010;107: 786–791. doi: 10.1073/pnas.0909559107
  • 54. Raghavan M, Steinrücken M, Harris K, Schiffels S, Rasmussen S, DeGiorgio M, et al. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015;349: aab3884. doi: 10.1126/science.aab3884
  • 55. Lindo J, Haas R, Hofman C, Apata M, Moraga M, Verdugo RA, et al. The genetic prehistory of the Andean highlands 7000 years BP though European contact. Sci Adv. 2018;4: eauu4921.
  • 56. Motti JMB, Muzzio M, Ramallo V, Kladniew BR, Alfaro EL, Dipierri JE, et al. Origen Y Distribución Espacial De Linajes Maternos Nativos En El Noroeste Y Centro Oeste Argentinos. Rev Argentina Antropol Biol. 2013;15: 3–14.
  • 57. Motti J. Caracterización de Linajes Maternos en la Población Actual del Noroeste y Centro-Oeste Argentinos. Universidad Nacional de La Plata; 2012.
  • 58. Felsenstein J. PHYLIP-Phylogeny Inference Package (Ver. 3.2). Cladistics. 1989;5: 164–166.
  • 59. Motti JMB, Schwab ME, Beltramo J, Jurado-Medina LS, Muzzio M, Ramallo V, et al. Regional differentiation of native American populations from the analysis of maternal lineages. Intersecc en Antropol. 2017;18: 271–282.
  • 60. Canals Frau S. Etnología de los Huarpes. Una síntesis Anales del Instituto de Etnología Americana. Anales del Instituto de Etnología Americana, Tomo VII; Mendoza; 1946. pp. 9–147.
  • 61. Escolar D. Los dones étnicos de la Nación Identidades huarpe y modos de producción de soberanía en Argentina. Prometeo E; Buenos Aires; 2007.
  • 62. Bengoa J. Historia de los antiguos mapuches del sur Desde antes de la llegada de los españoles hasta las paces de Quilín. Editorial; Santiago; 2003.
  • 63. Galdames S. ¿Detuvo la batalla del Maule la expansión inca hacia el sur de Chile? Cuadernos Hist. 2017;3: 7–25.
  • 64. Michieli CT. Antigua historia de Cuyo. Ansilta Ed; San Juan; 1994.
  • 65. Gascón M. Cuyo en el espacio imperial. La fase de configuración: 1580–1680. Rev Tefros. 2011;9: 1–20.
  • 66. Moreno-Mayar JV, Potter BA, Vinner L, Steinrücken M, Rasmussen S, Terhorst J, et al. Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature. 2018;553: 203–207. doi: 10.1038/nature25173
  • 67. Rasmussen M, Anzick SL, Waters MR, Skoglund P, Degiorgio M, Jr TWS, et al. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature. 2014;506: 225–229. doi: 10.1038/nature13025
  • 68. Michieli CT. La disolución de la categoría Jurídico-Social de “indio” en el siglo XVIII: El caso de San Juan (Región de Cuyo). Publicacio. Universidad Nacional de San Juan; 2000.
  • 69. Katzer L. El mestizaje como dispositivo biopolítico Biblos. Pueblos indígenas Interculturalidad, colonialidad, política. Biblos; Buenos Aires; 2009. pp. 59–7.
  • 70. Briones C. Construcciones de aboriginalidad en Argentina. Société suisse des Américanistes / Schweizerische Amerikanisten-Gesellschaft. 2004;Bulletin 6: 73–90.
  • 71. Mas-Sandoval A, Arauna LR, Gouveia MH, Barreto ML, Horta BL, Lima-Costa MF, et al. Reconstructed Lost Native American Populations from Eastern Brazil Are Shaped by Differential Jê/Tupi Ancestry. Genome Biol Evol. 2019;11: 2593–2604. doi: 10.1093/gbe/evz161
  • 72. Dopazo H, Llera AS, Berenstein M, González-José R. Genomas, enfermedades y medicina de precisión: un Proyecto Nacional. Ciencia, Tecnol y Política. 2019;2: 1–10.
  • 73. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria: 2019.
  • 74. Wall L, Christiansen T, Orwant J. Programming Perl. O’Reilly M; 2000.
  • 75. Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience. 2015;4: 1–16. doi: 10.1186/2047-217X-4-1
  • 76. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, Depristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27: 2156–2158. doi: 10.1093/bioinformatics/btr330
  • 77. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26: 841–842. doi: 10.1093/bioinformatics/btq033
  • 78. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26: 2867–73. doi: 10.1093/bioinformatics/btq559
  • 79. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat Methods. 2011;9: 179–181. doi: 10.1038/nmeth.1785
  • 80. Delaneau O, Marchini J, Zagury J. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9: 179–181.
  • 81. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am J Hum Genet. 2017;100: 635–649. doi: 10.1016/j.ajhg.2017.03.004
  • 82. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192: 1065–93. doi: 10.1534/genetics.112.145037

Decision Letter 0

Francesc Calafell

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

16 Mar 2020

PONE-D-20-02648

Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina

PLOS ONE

Dear Dr Luisi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Apr 30 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Francesc Calafell

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

3. Thank you for stating the following in the Competing Interests section:

I have read the journal's policy and the authors of this manuscript have the following competing interests:

PL provides consulting services to myDNAmap S.A.

JMB is employed by Biocódices S.A.

HD is the scientific director of Biocódices S.A.

We note that one or more of the authors are employed by a commercial company: Biocodices S.A. and myDNAmap S.A.

1.     Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. We note that Figures 1, 2, 4 and S23 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1.    You may seek permission from the original copyright holder of Figures 1, 2, 4 and S23 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2.    If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript by Luisi et al. is an important addition to our growing knowledge about the global and subcontinental ancestry of South American populations, focusing on Argentina. They used several up-to-date methods to perform population genetics analyses, and we need to recognize the great effort of compiling several publicly available genomic datasets from Native Americans, Africans and Europeans to be used as parental references for their new genome-wide data of 87 individuals from Argentina. However, I have one major and some specific concerns, which should be addressed before publishing the manuscript in PLOS ONE:

MAJOR: The manuscript needs to be synthesized and more focused on the important results and on what the results mean rather than so many descriptions. The authors have very interesting results, but the reader gets distracted across the reading. Several descriptions of results and methodology could go to the Supplementary Material or Methods section.

Specific concerns and suggestions:

1) In the abstract, “then compared to different reference panels specifically built to run population structure analyses at a sub-continental level”, the word “run” should be changed to “perform” or other word of similar meaning.

2) In the Introduction, the authors gave a very good overview of previous studies using uniparental markers, but they should recognize recent important papers that used genome-wide data to study population structure in Central America (Moreno-Estrada et al 2013) and in Brazil (Kehdy et al. 2015). Specifically, about the African genetic components, there are some previous major studies that should be recognized (i.e. Mathias et al. 2016 and Gouveia et al. 2020). Also, did you look at the African Voyages database (https://www.slavevoyages.org/) to see if they have information about the slave trade to Argentina?

3) At the beginning of “Results and Discussion”, it would help to have a subsection titled “Studied Populations”.

4) I think the authors should consider having one main figure describing the global population structure of Argentina (probably Figure 1). I know the authors focused on subcontinental ancestry; however, it is important to have a sense of global population structure in Argentina without needing to access the supplementary material. For this global population structure, we have a clear correlation between PCA and ADMIXTURE analyses that should be integrated.

5) In the results, the authors constantly described the Peruvian and Chilean admixed samples before the results for Argentina, which distracts from the focus (the Argentina results); these should come first.

6) The authors refer to the first main figure with ancestry results (Fig. 2) only in the middle of the manuscript. Again, the authors should begin with the important findings to avoid distracting the reader with descriptions.

7) It is not clear why the authors focused only on the Native American subcontinental components to perform F3 analyses. Also, the authors should highlight the importance of African and European subcontinental ancestry in Argentina. Although this agrees with previous studies using uniparental data, this is the first time it has been shown using genome-wide data, which we know produce better estimates of ancestry proportions and subcontinental substructure than uniparental data.

Reviewer #2: GENERAL COMMENTS

Luisi et al. have genotyped 94 DNA samples from various sites across Argentina using Axiom SNP arrays and integrated publicly available data from other studies to describe genetic affinities with modern and ancient South American populations, as well as other continental sources within Europe and Africa.

It is an important piece of work given the underrepresentation of South American populations in global genomic studies, and for Argentina it represents a pioneering effort towards a genomic characterization of its population. It has its limitations, mostly derived from the small sample size per subpopulation, but this seems to be compensated by re-analyzing previously published data from overlapping and surrounding populations in the region. This allowed them to assemble larger groups of samples per geographic region and to identify four major ancestry components in Argentina, one of them previously undescribed and putatively restricted to the Centralwest of Argentina (CWA).

In general it is methodologically sound and follows standard practices for the analysis of admixed genomes. My major comments will be oriented towards giving feedback about the general flow of the text with the intention of improving readability. There are many typos and minor (but important) grammatical errors throughout the text, so I suggest having a copyeditor revise and polish the manuscript.

GENERAL COMMENTS

1. I found the abstract to be vague, long, and not appealing. It evokes claims of potential novelty without being clear about what exactly is new. If the novel finding is the identification of the CWA component of Native American origin, perhaps it should be highlighted more and earlier in the text.

2. The introduction is too long and overreaching. It has an ambitious scope but at the same time seems disproportionate when the methodology is limited by the inferences from 87 individuals. I understand the motivation of being comprehensive in the introduction of concepts, but perhaps the message can be more effectively conveyed by addressing more directly those elements that are missing in the literature and that these particular analyses will be contributing to.

3. What is the sampling scheme used to enroll the 94 individuals in the study, and to what extent can it be assumed that their lineages are representative of the geographic regions they belong to? The family background is discussed further for a couple of samples that the authors acknowledge are possible cases of recent migration events within regions in Argentina, but I didn't easily find an overall description of the sampling criteria other than taking mt/Y haplogroups into account. It would be helpful to describe this in a way analogous to how the POPRES samples are described for Europe.

4. The authors point to a couple of historical facts that could explain the affinity of Santiago and Cuyo samples to a novel component from Central Western Argentina. Although the historical accounts seem plausible, it will be better if the authors could formally test the specificity of such novel component, or at least suggest further analyses to do so. This is important given that many of the samples driving the signal of the Western Argentinean component are Chilean individuals from Santiago. Could this be reflecting a Huilliche-related component or any other unsampled population from the central southern cone?

5. Table S02 is very helpful and well organized. It's appreciated.

6. Fig S2: By excluding the Central and Northern Amerind samples from the unsupervised Admixture it is unclear whether the pink-salmon component assigned to CLM and MXL is the same as the one defined by the Chibchan-Paezan and Equatorial-Tucanoan groups, or if this model is missing such component. It would be helpful if the authors can clarify.

7. Fig 1. If samples are already grouped and colored by administrative region, what is the purpose of showing some sampling locations in different color shades within a given administrative region?

8. Fig 4. Consider inverting MDS values along the 1st dimension to better match the sampling locations in the map with the population clusters in MDS space. I think this will help interpretation of the figure.

9. Fig 4. Is the Lagoa Santa sample (Sumidouro) missing from the map in Fig4A?

SPECIFIC COMMENTS

p4. line 107: I think demography is not necessarily a "mechanism" in the context of shaping genetic diversity. I would suggest considering demographic "processes" instead.

p4. line 115: This kind of statement is just not informative - "We found that African origins in Argentina trace back from different regions". I would suggest rephrasing or removing.

p5. line 128: what do they mean by "broad scales" in South America, and thus what is the new "fine-scale" contributed by this paper? even if succinct, it should be stated upfront and spelled out so that the reader knows what is this about.

p5. line 129: better to say "we present *genomic* analyses"...

p5. line 131: typo, should say: limited number *of* markers...

p5. line 139: typo, should say: to *give* a general overview... (instead of "gives")

p7. line 178: the "16th century" and "the arrival of conquerors" seem to be two disconnected elements in this phrase: "has changed drastically from the 16th century, and the arrival of the first conquerors"... Are they meant to be disconnected, or did the authors mean to say: "from the arrival of the first conquerors in the 16th century, until the 20th century"... ?

p8. line 190 & 191: there is no transition between these two sections. The topic changes abruptly from historical facts to giving examples of some of the recent genomic studies in the region. It would be better to walk the reader through and convey the message as to why such historical facts may have left a genetic signature in the modern Argentine population

p8. line 192: markers are not "through" the genome. I recommend finding a better wording in "thousands of autosomal markers through the genome for modern Argentine individuals"...

p8. line 206: which novel component are the authors referring to in "described a Native American component not represented in the 1000 Genomes project"?

p8. line 209: it is not clear what the reader should understand by "achieving a fine-scale knowledge". What exactly is missing? and thus which gap is this paper helping to fill? I think those are key concepts the reader should be given upfront and clearly.

p12. line 294: I don't think the expected pattern is to be "striking" so there is no need to say so.

p13. line 308: it will be clearer if the authors can refer to the color used to identify each of the three components of Native American ancestry while describing Figure S2G (besides the CCP, CAN, and LWL acronyms).

p13. line 311. The authors might be quite familiar with the term Lowlands and the populations included in such groups but the reader might not. It will be clearer to spell this out the first time the term is introduced as well as in Fig S2. Is it supposed to include the Amazonian reference populations? the Argentinean too? and again, is this the same component observed in CLM and MXL? if that's the case it can be confusing to think about it as a "lowland-specific" component.

p14. line 327: I think it is expected that most Argentinean samples are assigned to all three Native American components if all the reference samples are from those sources. Local Native American samples from Argentina will be needed to determine whether that's the case or additional unsampled source populations are missing.

p15. line 356: by how much are the African and Native American ancestries underestimated? For clarity, it would be helpful to put some numbers to such comparison in the main text.

p16. line 376: it will be clearer if the authors can identify by color the "new component" observed in Iberians. Also, wouldn't the authors expect the Argentinean samples to exhibit higher proportions of the southern European component predominant in POPRES Italian samples (from Fig. S7C) given the great wave of European immigration from Italy in the 19th and 20th centuries?

p17. line 393: the proportion of African ancestry is rather low in the Argentinean dataset. Can the authors elaborate more on how this can compromise the resolution and reliability of their MDS analysis? did they set a minimum threshold of African ancestry to be included in the analysis?

p18. line 419: the legend of Fig 3 is extremely poor and a minimum of methods description should be given, together with some major highlights for the reader to focus on. Figure legends in general have very few details.

p20. line 468: check the use of his/her in the description of the Puerto Madryn individual's pedigree.

p21. line 495: I think it should say "By increasing the number of..."

p22. line 506: 32 individuals from where? One has to dive into Fig S15 to figure it out, and if I'm not mistaken this 4th cluster is mostly accounted for by CYA samples from Calingasta and Chilean samples from Santiago. The reader will appreciate it if this is stated from the outset in the main text to ease reading flow. Also, is there any connection between the aDNA component detected at K=5 and the fourth cluster of ancestry represented by the 32 modern South American individuals? Probably not, but given the current text flow they seem to be related...

p23. line 532: why not refer to the fourth component as CWA in "(iii) the fourth component and CCP exhibit higher genetic affinity between them"... and so on? I know the term CWA is later introduced as a nomenclature suggestion for the new component, but the reading is just not easy if it takes too long to reveal it, especially when the text is already referring to figures (and supplementary plots) where CWA is already labeled.

p23. line 552: what is the evidence for the maternal lineages of the Calingasta individuals being region-specific? presumably the authors are referring to mtDNA haplogroups restricted to or predominant in the Cuyo region?

p23. line 556: consider saying "in the Cuyo region".

p24. line 562: should read "analyses accounting *for* the genetic relationship..."

p24. line 574: important typo, should not say "pre-Colombian" but "pre-Columbian".

p25. line 578: did they mean "out of reach the expansion of"... ?

p25. line 586: consider this alternative phrasing - "the identification of an undescribed Native American component not previously reported from autosomal markers". I think that "never described" is unnecessarily categorical.

p25. line 589: consider saying "underrepresentation of *diverse* regions in Argentina" instead of "many".

p25. line 590: consider removing "specific". To say "Native American genetic diversity" is specific enough.

p27. line 621: while the authors rightly point out the underrepresentation of the LWL component in ancient studies, isn't it counterintuitive that the few ancient samples from Brazil do not show much affinity with the modern samples while all previous analyses do show Argentinean NEA and PPA samples clustering with LWL reference populations, stemming mostly from Amazonian indigenous groups across Brazil? It is noted that the authors argue for the extended spread of the LWL component across the region and the rather restricted geographic coverage of the ancient Brazilian samples, but it would be nice to see the authors elaborate a bit more on possible demographic scenarios underlying the lack of contribution of these ancient samples to modern populations assigned to the LWL component.

p27. line 633: couple of typos, it should say "where X is one *of* the four Native American *components*" (plural)

p27. line 634: The f3 vs age correlations are an interesting addition to the analyses. However, I didn't see a high level statement summarizing what's the main finding other than just saying that there are some correlations even after correcting for geography, etc. I think it will be helpful to offer the interpretation even if the authors feel is too obvious from the plots. It can be a simple phrase along lines like "We observed a significant relationship between X and Y (P = X), where the older the ancient sample the lower the shared drift with the modern components" or "where more recent ancient samples tend to show higher affinities with the modern components" (if that's the case).

Also, can the authors rule out that such correlations are a function of DNA preservation and/or the amount of good quality data from each of the ancient samples? One could think that the older samples show lower f3 values simply due to lower proportions of available data across the genome. Can the authors elaborate on that possibility? What is the missing data per aDNA sample in each of these correlations?

p28. line 658: in the figure legend I think you want to say "the symbol inside the square represents..." (instead of "the point within the square"). Also, avoid "the point code" and consider instead: "the legend for plotted symbols of ancient samples...".

p30. line 707: I get what the authors want to say, but such subtitle is just not reflecting the idea and it has poor grammatical flow. Please consider rewording.

p31. line 724: another typo, instead of "three Andes ancient groups" it should say "three *Andean* ancient groups". Also there should be a conjunction (maybe "of") between events and geographical expansions.

p32. line 742: I would suggest expanding the first sentence to something like "We studied genetic ancestry at the sub-continental scale in Argentinean individuals in the context of other South American populations".

p32. line 753: consider removing the word "from" three times

p33. line 768: It is not clear what the authors want to say with "These groups exhibit within population structure, and gene flow are most likely to have occurred among them after divergence." Please copyedit and consider rephrasing.

p33. line 771: should be "limit" instead of "limits".

p33. line 780: should be "pre-Columbian" instead of "pre-Colombian".

p34. line 790: consider this alternative phrasing:

This study is a joint effort of Argentinean institutions funded by the national scientific system, and represents the first milestone of the Consorcio Poblar, a national consortium for creating a public reference biobank to support biomedical genomic research in the Republic of Argentina [75]. Genomic knowledge of local populations should be a priority for developing countries to achieve an unbiased representation of diversity in public databases and the scientific development at a global scale.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Andres Moreno-Estrada

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 16;15(7):e0233808. doi: 10.1371/journal.pone.0233808.r002

Author response to Decision Letter 0


7 May 2020

5. Review Comments to the Author

Reviewer #1: The manuscript by Luisi et al. is an important addition to our growing knowledge about the global and subcontinental ancestry of South American populations, focusing on Argentina. They used several up-to-date methods to perform population genetics analyses, and we need to recognize the great effort of compiling several publicly available genomic datasets from Native Americans, Africans and Europeans to be used as parental references for their new genome-wide data of 87 individuals from Argentina. However, I have one major and some specific concerns, which should be addressed before publishing the manuscript in PLOS ONE:

MAJOR: The manuscript needs to be synthesized and more focused on the important results and on what the results mean rather than so many descriptions. The authors have very interesting results, but the reader gets distracted across the reading. Several descriptions of results and methodology could go to the Supplementary Material or Methods section.

We understand this concern and have addressed it. We reduced the descriptions of the results, and some sanity-check procedures have been moved to the Material and Methods section. The Results and Discussion section has been reduced by ~1200 words (~20%) without losing its main content. We think that the Results/Discussion section is now much clearer and easier for the reader to follow. Thanks for the suggestions.

Specific concerns and suggestions:

1) In the abstract, “then compared to different reference panels specifically built to run population structure analyses at a sub-continental level”, the word “run” should be changed to “perform” or other word of similar meaning.

We now use “perform”.

2) In the Introduction, the authors gave a very good overview about previous studies using uniparental markers, but they should recognize recent important papers that used genome-wide data to study population structure in Central America (Moreno-Estrada et al 2013) and in Brazil (Kehdy et al. 2015). Specifically, about the African genetic components, there are some previous major studies that should be recognized (i.e. Mathias et al. 2016 and Gouveia et al. 2020).

We now include citations to Kehdy et al. 2015 and Gouveia et al. 2020.

First, in the introduction (Line 171): “A previous study showed that Western and Central Western African ancestries are common across the Americas, particularly in Northern latitudes, while the influence of South/Eastern African ancestry is greater in South America [33]. In another study in Brazil, two African ancestries have been observed: a Western African one and another associated with Central East African and Bantu populations, the latter being more present in the Southeastern and Southern regions [34]”.

And then in Results and Discussion (Line 370): “In addition, the presence of Southeastern Africa maternal lineages in Argentina [27,28] is consistent with African ancestry of this origin identified in previous studies in other South America countries [33,34], and with the Eastern African ancestry identified here.”

We decided not to cite Mathias et al. 2016 since we do not find that their results are of direct interest for the scope of our study.

In order to keep the introduction focused on the subject of this article, we preferred not to cite literature concerning Central America. Note that Moreno-Estrada et al 2013 is cited later in the discussion (Line 362).

Also, did you look up at the African Voyages database (https://www.slavevoyages.org/) to see if they have information about the slave trade to Argentina?

Thanks for recommending this very interesting database. We consulted it and we could not find particular information about the origins of the slaves in Argentina specifically.

3) At the beginning of “Results and Discussion”. It would help to have a subsection title “Studied Populations”

Thanks for the suggestion. The new version includes such a sub-section.

4) I think the authors should consider having one main figure describing the global population structure of Argentina (probably Figure 1). I know the authors focused on subcontinental ancestry; However, its important to have a sense of global population structure in Argentina without the need of access the supplementary material.

Agreed. We now include the Admixture analysis (with K=8) as Main Fig 2. This allows us to present, earlier in the manuscript, not only the three main continental ancestries but also the main sub-continental ancestries discussed later.

For these global population structure, we have a clear correlation between PCA and ADMIXTURE analyses that should be integrated.

We included an additional Supplementary Figure for that and a mention in the main text (Line 263): “Moreover, the eigenvectors from PCA and the ancestry proportion estimates with Admixture are well correlated (S4 Fig).“

5) In the results, the authors constantly described the Peruvians and Chilean admixed samples before the results for Argentina, which distracted about the focus (Argentina results), which should be first.

Thanks. In the new version, we describe Argentinean samples first.

6) The authors refer to the first main figure with ancestry results (Fig. 2) only in the middle of the manuscript. Again, the authors should begin with the important findings to avoid distracting the reader with descriptions.

Agreed. This point has been addressed in Comment #4 from the same Referee.

7) It is not clear why the authors focused only on the Native American subcontinental components to perform F3 analyses.

If the referee refers to using F-statistics to compare groups of Admixed individuals to reference groups of African or European individuals (either modern or ancient):

The F-statistics analyses that we performed at the group level for the Native American component aimed at addressing the question of a putative unrecognized Native American component that would explain why many individuals exhibit (i) intermediate proportion estimates for the three Native American ancestries represented in the reference panel, and (ii) intermediate positions in the PCA graph. Such a question is not relevant for European and African ancestries.

If the referee refers to f3(YRI; Individual1, Individual2), as shown in S15 Fig, to estimate genetic affinity between two individuals: in Europe and Africa, we did not use f3 to build inter-individual distances and summarize the distance matrix with MDS as we did for America.

First, an MDS based on 1-f3(YRI; Individual1, Individual2) distances in Europe does not perform as well. Indeed, the MDS from f3-based distances does not capture in detail the geographical cline in the European Reference Panel (see Figure below).

In Africa, we did not attempt to run f3. The very low proportions of the genome with African ancestry in Admixed individuals imply a very reduced number of overlapping SNPs in masked data for each pair of Admixed individuals (a few hundred or a few thousand SNPs). Such a reduced number of SNPs does not allow estimating informative f3 statistics.

Moreover, we want to stress here that we used f3 for pairwise individual comparisons in America only for cluster assignment, and not to describe the Native American ancestry composition in Argentina in the Main Text. For that purpose, we relied only on the results from Admixture and PCA. We wished to have the most robust assignment possible to a Native American Group, leveraging different inter-individual distances. Since the results obtained with f3-based distances matched the Admixture and PCA results very well, we decided to keep them for that purpose only. We could thus draw more robust boundaries among the different Native American Groups by removing a few individuals with inconsistent assignment across the clustering method applied to the three kinds of inter-individual distance matrix.
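As an illustration of the approach discussed above, the following minimal sketch turns a matrix of pairwise outgroup-f3 values into 1-f3 distances and summarizes them with classical (Torgerson) MDS. It is only a sketch under stated assumptions: the function names `f3_to_distance` and `classical_mds` are hypothetical, the f3 matrix is assumed to have been computed beforehand, and the response does not state which MDS implementation was actually used.

```python
import numpy as np


def f3_to_distance(f3):
    """Convert a symmetric matrix of pairwise f3 values into 1 - f3 distances."""
    d = 1.0 - np.asarray(f3, dtype=float)
    # An individual is at distance 0 from itself by convention.
    np.fill_diagonal(d, 0.0)
    return d


def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram-like matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (d ** 2) @ centring
    # Keep the axes associated with the largest eigenvalues.
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale
```

With real data, `classical_mds(f3_to_distance(f3), n_dims=2)` would return one row of coordinates per individual, which can then be plotted or passed to a clustering method.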

Also, the authors should highlight the importance of African and European subcontinental Ancestry in Argentina. Despite it agrees previous studies using uniparental data, this is the first time it is showed using genome-wide data, which we know that produces better estimates of ancestry proportions and subcontinental substructure when compared with uniparental data.

We agree that the results for European and African ancestry were somewhat neglected. We now present the PCA results for Europe as a Main Figure. We also give more importance to the findings about European and African ancestries.

The last sentence of the Abstract (Line 95): “As for the European and African ancestries, we confirmed previous results about origins from Southern Europe, Western and Central Western Africa, and we provide evidences for the presence of Northern European and Eastern African ancestries.”

In Author summary (Line 106): “We confirmed that most of the European genetic ancestry comes from the South, although several individuals are related to Northern Europeans. We confirmed that the African origins in Argentina mainly trace back from Western and Central/Western regions, and we document some proportion of Eastern African origins poorly described before.”

In Conclusion (Line 623): “Our work shows that studying more admixed individuals, with a particular focus on extending the geographical coverage of the Argentinean territory, would help to identify the genetic legacy from secondary migration streams, and thus to get a better representation of the complex origins of African and European ancestries in the country.”

Reviewer #2: GENERAL COMMENTS

Luisi et al. have genotyped 94 DNA samples from various sites across Argentina using Axiom SNP arrays and integrated publicly available data from other studies to describe genetic affinities with modern and ancient South American populations, as well as other continental sources within Europe and Africa.

It is an important piece of work given the underrepresentation of South American populations in global genomic studies and for Argentina represents a pioneering effort towards a genomic characterization of its population. It has its limitations, mostly derived from the small sample size per subpopulation, but this seems to be compensated by re-analyzing previously published data from overlapping and surrounding populations in the region. This allowed them to assembly larger groups of samples per geographic region and to identify four major ancestry components in Argentina, one of them previously undescribed and putatively restricted to the Centralwest of Argentina (CWA).

In general it is methodologically sound and follows standard practices for the analysis of admixed genomes. My major comments will be oriented towards giving feedback about the general flow of the text with the intention of improving readability. There are many typos and minor (but important) grammatical errors throughout the text, so I suggest having a copyeditor to revise and polish the manuscript.

First of all, we apologize for the typos. Second, we are very grateful to the referee, who clearly made an important effort to help us improve the writing.

We have now asked a fluent English speaker to proofread the manuscript.

GENERAL COMMENTS

1. I found the abstract to be vague, long, and not appealing. It evokes claims of potential novelty without being clear about what exactly is new. If the novel finding is the identification of the CWA component of Native American origin, perhaps it should be highlighted more and earlier in the text.

We made many changes to the abstract based on the Referee's suggestions. We explicitly mention the novelty of the Northern European and Eastern African ancestries. We also mention the CWA component earlier in the abstract. Finally, we also reduced the number of words (by ~1/3) so that it focuses more on the results and appeals more to potential readers.

2. The introduction is too long and overreaching. It has an ambitious scope but at the same time seems disproportionate when then the methodology is limited by the inferences from 87 individuals. I understand the motivation of being comprehensive in the introduction of concepts but perhaps the message can be more effectively conveyed by addressing more directly those elements that are missing in the literature and that these particular analyses will be contributing to.

Agreed. We drastically reduced the introduction (by a third) so it focuses more on the scope of the study.

3. What is the sampling scheme to enroll the 94 individuals in the study and to which extent it can be assumed their lineages are representative of the geographic regions they belong to? The family background is discussed further for a couple of samples that the authors acknowledge are possible cases of recent migration events within regions in Argentina, but I didn't easily find an overall description of the sampling criteria other than taking mt/Y haplogroups into account. This can be helpful to describe in an analogous way the POPRES samples are described for Europe.

We did not consider describing the collections as in POPRES. Indeed, the samples were collected in 12 different sampling campaigns, and describing each would not be especially informative, particularly because the data we present here will probably not be used as a reference panel like POPRES. However, we extended the description of the samples in Material and Methods, so that the scheme to select the samples is more explicit.

Line 677: “We genotyped 94 individuals with the Axiom LAT1 array (Affymetrix) from 24 localities and 17 provinces across Argentina (Fig 1). These samples were selected among 240 collected by different population genetics groups (Consorcio PoblAR) during past sampling campaigns with a biological anthropology focus. According to the available information (e.g. interviews, genealogical information, etc.), each PoblAR research group selected for this study some samples, maximizing the odds that they come from individuals with greater Native American ancestry. For example, surnames were used as a proxy to achieve this objective and the permanence of ancestors in national territory has been another variable that was taken into account. The analyzed 94 samples were also selected to ensure an extended geographical range and were included when they presented sufficient DNA concentration and Native American maternal lineage. Moreover, among the males, we prioritized those with Native American paternal lineage.”

4. The authors point to a couple of historical facts that could explain the affinity of Santiago and Cuyo samples to a novel component from Central Western Argentina. Although the historical accounts seem plausible, it will be better if the authors could formally test the specificity of such novel component, or at least suggest further analyses to do so. This is important given that many of the samples driving the signal of the Western Argentinean component are Chilean individuals from Santiago. Could this be reflecting a Huilliche-related component or any other unsampled population from the central southern cone?

We agree that it is important to have the most extensive possible body of evidence concerning this new component. We added to Main Fig 7 (Relationship among the four Native American groups identified) the results for pairwise FST among groups. The important genetic differentiation between CWA and CCP is additional evidence for the specificity of this novel component. We also demonstrate that removing Santiago individuals from CWA does not alter our results at all (S23 Fig). Moreover, as is now more clearly stated in the revised manuscript, “all the Huilliche and Pehuenche individuals from Central Chile [37] have been consistently assigned to CCP” (Line 448). Altogether, we argue that CWA is not Huilliche-related despite the tight relationship between CWA and CCP since their early divergence, as suggested by the f-statistics we present in the manuscript.

We agree that CWA is a tentative name for this novel component, and further analyses are needed to confirm it. We stress this point in Conclusions (Line 653): “Further efforts are needed to better characterize the Native American ancestry component identified in the Central Western region of Argentina. Particularly, we encourage future studies to confirm the tentative geographical label that we suggest here, and to estimate its influence in the region.”

However, since there are very few Native American communities in the central region, addressing this question is not straightforward. We think that more ancient DNA samples from this region would help. This is now stressed more clearly in Results and Discussion (Line 613): “The archaeological record for which genetic data has been generated misrepresents CWA since its early divergence with CCP, as well as the common ancestors specific to these two components.”

5. Table S02 is very helpful and well organized. It's appreciated.

6. Fig S2: By excluding the Central and Northern Amerind samples from the unsupervised Admixture it is unclear whether the pink-salmon component assigned to CLM and MXL is the same as the one defined by the Chibchan-Paezan and Equatorial-Tucanoan groups, or if this model is missing such component. It would be helpful if the authors can clarify.

To answer this question, we first find it necessary to clarify that we previously called the pink component Lowlands but decided to change this label to Subtropical and Tropical Forests (STF). We discuss this point below when addressing the referee comment concerning p13. line 311.

We think that the use of this new label helps to dissipate the lack of clarity stressed by the referee. In our clustering approach for classifying individuals according to their main Native American Ancestry, we observe that the STF cluster encompasses not only Gran Chaco and Amazonian populations but also populations from the North Andes (sampled almost exclusively in the Colombian territory). See the Native American groups plotted in pink in the South American map (Main Fig 5). Therefore, we think it is not surprising that CLM individuals exhibit a high proportion of the pink (STF) ancestry. However, we agree that it was a shortcoming not to discuss this point explicitly when presenting the Admixture results. We now discuss this (Line 274): “Such mixture pattern is not observed in other South American countries. Indeed, the Native American ancestry for Peruvian, Chilean and Colombian admixed samples is mainly represented by CAN, CCP and STF, respectively. This is consistent with the geographical area where the admixed individuals have been sampled, and the genetic ancestry of the indigenous communities from each country.”

Concerning MXL, we interpret that the high proportions of the pink ancestry could be due to the previously described back migrations into Central America. We mention this possible scenario in the Introduction (Line 191): “In another genomic study of modern samples (in which the Southern Cone is only represented by the Gran Chaco region), it has been found that all non-Andean South American populations are likely to share a common lineage, while they are unlikely to share with the Andeans any common ancestor from Central America [39], supporting the hypothesis of many back migrations to Central America from non-Andean South American populations [39].” However, understanding past population movements between South America and Central America is still an open question. We would rather not distract the reader by including such considerations in our manuscript, for two reasons: (i) the relationship with Central America concerns mostly the Northern part of South America, whereas our work focuses on the Southern Cone, and (ii) to explain the genetic affinity between non-Andean South American populations and Central American populations, the “back-migration into Central America” scenario is preferred given the current body of evidence. Altogether, we interpret that the recent genetic influence of Central American populations on Argentinean populations would be limited.

Therefore, to avoid having to address questions that are beyond the scope of our study, we decided to remove MXL (and PUR) individuals from our analyses, and thus not to include Central and North American indigenous communities.

Summarizing our answer to this interesting point stressed by the referee: (i) we removed MXL and PUR from the analyses, (ii) we added a short text to discuss the pink ancestry in CLM, and (iii) we changed the label of this pink ancestry to Subtropical and Tropical Forests.

7. Fig 1. If samples are already grouped and colored by administrative region, what is the purpose of showing some sampling locations in different color shades within a given administrative region?

We think it is clearer to use broad groups of sampling locations in our graphical representations. However, within a region, there are different provinces (internal lines in the map), which are the actual political divisions in Argentina. Therefore, we used a color shade for each province, with similar color shades within the same region. We think that this is the most informative way to refer to the samples. Moreover, there is a technical limitation behind this decision: we wanted to use filled points for Argentinean samples so that they are clearly highlighted in the plots, but in R there are only up to 5 different filled point types, which is fewer than the number of described locations in several regions.

8. Fig 4. Consider inverting MDS values along the 1st dimension to better match the sampling locations in the map with the population clusters in MDS space. I think this will help interpretation of the figure.

Thanks. We changed the MDS graphic for America (now Main Fig. 5).

9. Fig 4. Is the Lagoa Santa sample (Sumidouro) missing from the map in Fig4A?

Sumidouro was included in our analyses (4 out of 5 samples passed missing genotypes filtering): all the points that appear in the map and in the MDS are given in the legend insets. We acknowledge that it was really difficult to differentiate between overlapping points on the map, and we generated a new map avoiding this issue (now Main Fig. 5).

SPECIFIC COMMENTS

p4. line 107: I think demography is not necessarily a "mechanism" in the context of shaping genetic diversity. I would suggest considering demographic "processes" instead.

This sentence has been removed.

p4. line 115: These kind of statements are just not informative - "We found that African origins in Argentina trace back from different regions". I would suggest rephrasing or removing.

We agree with the referee and we reformulated this.

p5. line 128: what do they mean by "broad scales" in South America, and thus what is the new "fine-scale" contributed by this paper? even if succinct, it should be stated upfront and spelled out so that the reader knows what is this about.

This sentence has been removed and the article no longer refers to “broad-scale”, which was indeed confusing. In the introduction, we think it is now clear that “fine-scale analyses” refers to the analysis of origins at the sub-continental level.

p5. line 129: better to say "we present *genomic* analyses"...

Thanks

p5. line 131: typo, should say: limited number *of* markers...

Typo has been fixed

p5. line 139: typo, should say: to *give* a general overview... (instead of "gives")

Typo has been fixed

p7. line 178: the "16th century" and "the arrival of conquerors" seem to be two disconnected elements in this phrase: "has changed drastically from the 16th century, and the arrival of the first conquerors"...

are they meant to be disconnected or did the authors mean to say: "from the arrival of the first conquerors in the 16th century, until the 20th century"... ?

We reformulated this sentence (Line 146): “As for the Native American component, it is difficult to study its origin focusing on present-day communities since their organization has changed drastically after the arrival of the first conquerors in the 16th century [21].”

p8. line 190 & 191: there is no transition between these two sections. The topic changes abruptly from historical facts to giving examples of some of the recent genomic studies in the region. It would be better to walk the reader through and convey the message as to why such historical facts may have left a genetic signature in the modern Argentine population

We have now formulated a transition between the two sections (Line 155): “Due to the specificity of the Argentinean demographic history, a remaining challenge is to unravel which populations from each continent contributed to the genetic pool in nowadays Argentinean populations leveraging genotype data for hundreds of thousands of autosomal markers from the whole genome.”

p8. line 192: markers are not "through" the genome. I recommend finding a better wording in "thousands of autosomal markers through the genome for modern Argentine individuals"...

We reformulated the sentence.

p8. line 206: to which novel component are the authors referring to in "described a Native American component not represented in the 1000 Genomes project"?

We removed the reference to this result in the introduction since it was uninformative at this point.

p8. line 209: it is not clear what the reader should understand by "achieving a fine-scale knowledge". What exactly is missing? and thus which gap is this paper helping to fill? I think those are key concepts the reader should be given upfront and clearly.

We reformulated the sentence.

p12. line 294: I don't think the expected pattern is to be "striking" so there is no need to say so.

We removed “striking”.

p13. line 308: it will be clearer if the authors can refer to the color used to identify each of the three components of Native American ancestry while describing Figure S2G (besides the CCP, CAN, and LWL acronyms).

We added it, not only for this section of the text, but also for every description of Admixture plots.

p13. line 311. The authors might be quite familiar with the term Lowlands and the populations included in such groups but the reader might not. It will be clearer to spell this out the first time the term is introduced as well as in Fig S2. Is it supposed to include the Amazonian reference populations? the Argentinean too? and again, is this the same component observed in CLM and MXL? if that's the case it can be confusing to think about it as a "lowland-specific" component.

Before the first submission, we had extensive discussions about the label we could attribute to the pink component. It was not an easy decision due to the wide geographical range of the reference Native American individuals it encompasses: Gran Chaco, Amazonian and North Andes. The latter is actually not Lowlands, but at the time of the first submission, we thought that Lowlands was the best option among the ones we could think of. After the referee comment, we discussed this point again and the alternative name “Subtropical and Tropical Forests” was suggested. We think it is the best name we can assign to this pink component since it perfectly matches the geographical range of the indigenous communities in which this component prevails, which avoids potential confusion.

p14. line 327: I think it is expected that most Argentinean samples are assigned to all three Native American components if all the reference samples are from those sources. Local Native American samples from Argentina will be needed to determine whether that's the case or additional unsampled source populations are missing.

For the clarity of the manuscript, we think it is better not to discuss in detail the ancestry composition at the sub-continental level when presenting the results in a worldwide context. This is why we do not mention the possibility of an unsampled source population there.

We now give more emphasis on formulating the two competing scenarios to explain this pattern in the section relative to Native American ancestry (Line 415) “Many individuals from the Cuyo and Pampean regions of Argentina (San Juan and Córdoba provinces as well as South of Buenos Aires province) exhibit intermediate position in PCA (Fig 5) and mid proportion estimates with Admixture (Fig 6). This pattern can be interpreted as the result of a mixture between different ancestries (scenario 1) or relative limited shared history with any of them (scenario 2).“

We focused most of the study of Native American specific ancestry on comparisons with ancient samples, to test whether the pattern of mixed Native American components in Argentina is really due to a mixture between those components or arises because Native American references are missing. We think that our work gives robust evidence for the latter.

Moreover, we wish to highlight that a main point of our article is that “getting Native American samples from Argentina” representing all the Native American genetic ancestry is a hard task, if possible at all, since very few communities remain nowadays. The communities are mostly distributed at the extremes of the national territory. In the article, we stressed this point several times, for example:

Introduction (Line 146): “As for the Native American component, it is difficult to study its origin focusing on present-day communities since their organization has changed drastically after the arrival of the first conquerors in the 16th century [21].“

Conclusion (Line 636): “Having identified this component from admixed individuals demonstrates that focusing only on indigenous communities is insufficient, at least in Argentina, to fully characterize the Native American genetic diversity and decipher the pre-Columbian history of Native Americans. Indeed, most indigenous communities have been culturally annihilated and invisibilized [68,69] to the point that several Argentinean regions were considered “Indian free” in the mid-20st century [70]. However, the cultural incorporation did not necessarily imply a biological extinction. Although studies based on samples from indigenous communities [37–39] provide decisive information to understand the evolutionary history of Native American ancestry, alternative strategies must be considered to fill this gap in the effort to more fully describe the Native American ancestry (e.g. see [71]).“

We hope that we have clarified our position about the sampling scheme only focusing on indigenous communities in the restructured version of the manuscript thanks to the Referees’ suggestions.

p15. line 356: by how much are the African and Native American ancestries underestimated? For clarity, it would be helpful to put some numbers to such comparison in the main text.

Following Referee #1's suggestion to keep the Results section focused on the main results, this part was moved to Material and Methods (Line 777). S27 Fig (S5 Fig in the previous version) aims at answering this point. However, redundant information in the previous version of this figure made it rather hard to read. We simplified the figure and added boxplots as insets so that it is easier to quantify the differences in ancestry proportion estimates between RFMIX and Admixture.

Moreover, we performed an additional quality control for the masking scheme (this is mentioned in the text, Line 779 and S28 Fig): we checked how consistent the masking is in 1000 Genomes admixed individuals for the ~170K SNPs present in both DS2p and DS3p (the two datasets for which RFMIX was run separately). We found that 95% of the admixed 1000 Genomes individuals have more than 99% of such SNPs with the same masking in DS2p and DS3p, while the individual with the lowest consistency rate had only 4% of SNPs with inconsistent masking.
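The consistency check described above can be sketched as follows. This is a hypothetical illustration, not the code used for S28 Fig: it assumes the masking of each dataset has been encoded as a boolean matrix (individuals in rows, shared SNPs in columns, True where a genotype is masked), and the function names are ours.

```python
import numpy as np


def masking_consistency(mask_a, mask_b):
    """Per-individual fraction of shared SNPs masked identically in two runs.

    mask_a, mask_b: boolean arrays of shape (n_individuals, n_shared_snps),
    True where the genotype is masked (hypothetical encoding).
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must cover the same individuals and SNPs")
    return (a == b).mean(axis=1)


def summarize(consistency, threshold=0.99):
    """Share of individuals above the threshold, and the lowest consistency."""
    c = np.asarray(consistency, dtype=float)
    return float((c > threshold).mean()), float(c.min())
```

Applied to the two masked datasets, `summarize` would return the two quantities reported in the paragraph above: the share of individuals with more than 99% identically masked SNPs, and the consistency rate of the least consistent individual.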

p16. line 376: it will be clearer if the authors can identify by color the "new component" observed in Iberians.

Done.

Also, wouldn't the authors expect the Argentinean samples to exhibit higher proportions of the southern European component predominant in POPRES Italian samples (from Fig. S7C) given the great wave of European immigration from Italy in the 19th and 20th centuries?

We now address this question in the manuscript. We performed new European Ancestry Specific analyses including CLM from 1000 Genomes (no PEL individual with masked genotype data passed the filtering step for missing genotype rate). Line 314: “Samples from Argentina and Chile exhibit higher proportions of Southeastern/Italian and Northern European ancestries than Colombians, as well as lower Iberian ancestry proportions (S8 Fig). We observed no significant difference in the proportions of any European ancestry between Argentinean and Chilean samples (S8 Fig). However, both PCA and Admixture shows that the individuals with most Southeastern/Italian ancestry are from Argentina, This is consistent with a previous study [31], and it can be explained by the many arrivals from Italy during the great wave of European immigration in the 19th and 20th centuries [22]. “

p17. line 393: the proportion of African ancestry is rather low in the Argentinean dataset. Can the authors elaborate more on how this can compromise the resolution and reliability of their MDS analysis? Did they set a minimum threshold of African ancestry to be included in the analysis?

We acknowledge that we applied a rather loose criterion, since we kept individuals with more than 5% of SNPs with African ancestry ditypes assigned (--mind 0.95 flag with plink). However, we think that this does not affect our conclusions.
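For readers unfamiliar with this filter, the sketch below mimics what a per-individual missingness threshold of 0.95 does on a masked genotype matrix. It is an illustration under stated assumptions (masked calls stored as NaN, a hypothetical function name), not a reimplementation of plink.

```python
import numpy as np

def keep_by_missing_rate(genotypes, max_missing=0.95):
    """Return (keep_mask, missing_rate) for each individual.

    genotypes: (individuals x SNPs) float array, with np.nan marking
    genotypes masked because they were not assigned to the ancestry of
    interest. Individuals are kept when their missing rate does not
    exceed max_missing.
    """
    g = np.asarray(genotypes, dtype=float)
    missing_rate = np.isnan(g).mean(axis=1)
    return missing_rate <= max_missing, missing_rate

# Toy example: 3 individuals, 20 SNPs (made-up data)
toy = np.ones((3, 20))
toy[1, :18] = np.nan   # 90% missing: kept with a 0.95 threshold
toy[2, :] = np.nan     # 100% missing: removed
keep, rate = keep_by_missing_rate(toy)
print(keep, rate)
```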

We have now added a sentence to briefly discuss this point (Line 350): “Although the high missing genotype rate in masked data for admixed individuals could bias PCA and Admixture results, the results obtained by both methods are highly consistent for admixed individuals (S11 Fig).”

In what follows, we provide more elements.

Because of missing genotypes we expect that samples with masked genotypes would tend to:

- Have values close to 0 for any PC with PCA (although the lsq transformation implemented in smartpca tends to reduce this bias)

- Have overestimated ancestry proportions with Admixture for the ancestries most represented in the reference panel (in our case: Bantu-influenced and Western African ancestries).
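The first expectation can be illustrated with a small sketch. It assumes the simplest handling of missing data, in which a missing genotype is replaced by the SNP mean (zero after centering) before projection. This is not the least-squares projection of smartpca mentioned above, and all values are simulated.

```python
import numpy as np

def project(centered_genotypes, loading, observed):
    """Project one sample on a PC, setting missing SNPs to the mean (0 after centering)."""
    x = np.where(observed, centered_genotypes, 0.0)
    return float(x @ loading)

rng = np.random.default_rng(0)
n_snps = 1000
loading = rng.normal(size=n_snps)
loading /= np.linalg.norm(loading)     # unit-length PC loading vector
sample = 3.0 * loading                 # sample lying exactly on this PC (true score 3.0)

# Score with complete data
full = project(sample, loading, np.ones(n_snps, dtype=bool))

# Score when roughly 90% of the genotypes are masked
observed = rng.random(n_snps) > 0.9
masked = project(sample, loading, observed)
print(full, masked)
```

With mean imputation the masked score can only move towards 0, which is why values that stay far from 0 despite heavy masking are informative.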

If we have a look at individual PCs (now shown as S9 Fig), we observe:

- Argentinean individuals have negative values for PC1 and are grouped with Bantu-influenced, Eastern and Western African populations. That is, they are not evenly distributed around 0, as would be expected if their values were explained only by their many missing genotypes. This demonstrates that most of the African ancestry of Argentineans is indeed explained by these three ancestries.

- A clear gradient for Argentineans on PC4, between Western African (top) and Bantu-influenced populations (bottom).

- Some Argentinean individuals (from Chepes, Formosa and Tucuman) have positive values on PC3, which discriminates Eastern Africans from the other African populations.

- The same individuals have the greatest Eastern African ancestry proportions with Admixture (in orange). This ancestry is the most poorly represented in the African reference panel we built.

The high consistency between PCA and Admixture analyses for admixed individuals is a strong argument against a potential bias due to missing genotypes. Moreover, our conclusions match historical records and maternal lineage analyses (citations in the manuscript, Lines 369-373). Altogether, we are convinced that artifacts due to missing genotypes do not drive our conclusions. Most of the African ancestry is indeed explained by Western African and Bantu-influenced ancestries. The Eastern African ancestry we observe in some Argentinean individuals (the novel result) is genuinely present in Argentina.

Note: after addressing the referee’s comment, we decided to present the African ancestry-specific PCA in a different way (individual PC plots instead of the MDS based on distances from the first 5 PCs, as previously shown). We implemented this change and adapted the manuscript section accordingly.

p18. line 419: the legend of Fig 3 is extremely poor and a minimum of methods description should be given, together with some major highlights for the reader to focus on. Figure legends in general have very few details.

We agree that the legends of the figures for ancestry-specific PCA and Admixture analyses did not include enough details. We improved them (Main Figs 2-6).

p20. line 468: check the use of his/her in the description of the Puerto Madryn individual's pedigree.

This sentence has been removed.

p21. line 495: I think it should say "By increasing the number of..."

Corrected.

p22. line 506: 32 individuals from where? One has to dive into Fig S15 to figure it out and, if I'm not mistaken, this 4th cluster is mostly accounted for by CYA samples from Calingasta and Chilean samples from Santiago. The reader will appreciate if this is stated from the outset in the main text to ease reading flow. Also, is there any connection between the aDNA component detected at K=5 and the fourth cluster of ancestry represented by the 32 modern South American individuals? Probably not, but given the current text flow they seem to be related…

We acknowledge that the text was rather difficult to read. Based on the reviewer’s very useful suggestions, we changed the description of the results. We believe that, in the new version of the manuscript, it is clear from the outset in the main text that the 4th cluster is mostly accounted for by Cuyo samples from Calingasta and Chilean samples from Santiago.

We also acknowledge that the description of admixture results with K=4 and K=5 was misleading for the reader. We reformulated the entire section so that it connects better with the rest of the manuscript and with the description of the results (and their limitations) from admixture and PCA (Lines 415-430).

p23. line 532: why not refer to the fourth component as CWA in "(iii) the fourth component and CCP exhibit higher genetic affinity between them"... and so on? I know the term CWA is later introduced as a nomenclature suggestion for the new component, but the reading is just not easy if it takes too long to reveal it. Especially when the text is already referring to figures (and supplementary plots) where CWA is already labeled.

Before the first submission, we hesitated a lot about directly naming the fourth component CWA. We have now restructured this section following the referee’s suggestion.

p23. line 552: what is the evidence for the maternal lineages of the Calingasta individuals being region-specific? presumably the authors are referring to mtDNA haplogroups restricted to or predominant in the Cuyo region?

We changed the sentence (Line 445): “The genealogical record for the Calingasta individuals attests to a local origin of their direct ancestors up to two generations ago, and they have mtDNA sub-haplogroups predominant in the Cuyo region (S1 Table; [56,57]).” The reader can find more details in the cited works.

p23. line 556: consider saying "in the Cuyo region".

Done.

p24. line 562: should read "analyses accounting *for* the genetic relationship..."

Done.

p24. line 574: important typo, should not say "pre-Colombian" but "pre-Columbian".

We fixed the typo.

p25. line 578: did they mean "out of reach the expansion of"... ?

Thanks

p25. line 586: consider this alternative phrasing - "the identification of an undescribed Native American component not previously reported from autosomal markers". I think that "never described" is unnecessarily categorical.

Thanks. We changed it.

p25. line 589: consider saying "underrepresentation of *diverse* regions in Argentina" instead of "many".

Thanks. We changed it.

p25. line 590: consider removing "specific". To say "Native American genetic diversity" is specific enough.

Thanks. We changed it.

p27. line 621: while the authors rightly point out the underrepresentation of the LWL component in ancient studies, isn't it counterintuitive that the few ancient samples from Brazil do not show much affinity with the modern samples, while all previous analyses do show Argentinean NEA and PPA samples clustering with LWL reference populations, stemming mostly from Amazonian indigenous groups across Brazil? It is noted that the authors argue for the extended spread of the LWL component across the region and the rather restricted geographic coverage of the ancient Brazilian samples, but it would be nice to see the authors elaborating a bit more on possible demographic scenarios underlying the lack of contribution of these ancient samples into modern populations assigned to the LWL component.

We reformulated this part, so we discuss different potential (perhaps complementary) reasons to explain the lack of contribution of these ancient samples into modern STF (Line 527): “The fact that there is no ancient sample group exhibiting outstanding genetic affinity with STF points to the underrepresentation of this component in ancient samples. First, the geographical range covered by ancient samples that could represent this component is restricted to Brazil, while STF is a heterogeneous group that includes relatively isolated populations [39] from the Gran Chaco, the Amazonas and Northern Andes. Moreover, the most recent samples that could represent STF are aged ~6700BP, and gene flow with other components since then may have contributed to dissolve the genetic affinity of STF with ancient samples in Brazil analyzed here.”

We provide here more details about this discussion. Posth et al. 2018 described a population replacement in Brazil after 9600BP (the age of the Lapa Do Santo sample). This would mean that the second ancestry stream, which has potentially remained in the region since then, is represented only by Laranjal ~6700BP. Since 6700BP, gene flow between STF and other components must have dissipated the genetic affinity of the STF component with this ancient group. We provide here some F-statistics analyses that show this (within the limits of the statistical resolution, given the reduced number of SNPs when including two ancient DNA groups in the comparison):

1. STF has higher genetic affinity with Laranjal 6700BP than with the older ancient DNA samples from Brazil and Belize. Even if the trend is not outstanding, it is consistent with the population replacement suggested by Posth et al.

2. STF has higher genetic affinity with Laranjal 6700BP than with the oldest ancient DNA samples (from both the Southern Cone and Central Andes), consistent with the fact that STF is related to ancient populations from Brazil.

Looking closer (we removed the historical Kaweskar ~100BP in what follows), there is a clear correlation between the age of the ancient sample and f4 in both the Andes and the Southern Cone. That is, the genetic affinity of STF to ancient samples from other regions (relative to Laranjal 6700BP) increases with time. This is consistent with gene flow between STF and the ancient populations represented by Andes aDNA samples, and between STF and the ancient populations represented by Southern Cone aDNA samples.
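As a reminder of what these statistics measure, the sketch below computes an f4 statistic from population allele frequencies, using the standard definition as the average over SNPs of (pA - pB)(pC - pD). It is a simplified illustration: the frequencies are made up, the group labels are placeholders, and the standard errors (usually obtained with a block jackknife in dedicated software) are omitted.

```python
import numpy as np

def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD).

    Inputs are allele-frequency arrays of equal length; SNPs with a
    missing frequency (np.nan) in any group are dropped.
    """
    p_a, p_b, p_c, p_d = (np.asarray(p, dtype=float) for p in (p_a, p_b, p_c, p_d))
    valid = ~np.isnan(p_a + p_b + p_c + p_d)
    return float(np.mean((p_a[valid] - p_b[valid]) * (p_c[valid] - p_d[valid])))

# Toy allele frequencies at 4 SNPs (made-up numbers, placeholder labels)
outgroup = np.array([0.1, 0.8, 0.5, 0.3])     # e.g. an African outgroup
modern = np.array([0.6, 0.2, 0.5, 0.9])       # e.g. a modern component
reference = np.array([0.5, 0.3, 0.4, 0.8])    # e.g. a reference ancient group
ancient = np.array([0.7, 0.1, np.nan, 1.0])   # ancient group with one missing SNP

value = f4(outgroup, modern, reference, ancient)
swapped = f4(outgroup, modern, ancient, reference)
print(value, swapped)
```

Swapping the last two groups flips the sign of the statistic, so the sign indicates which of the two groups shares more drift with the modern component.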

p27. line 633: couple of typos, it should say "where X is one *of* the four Native American *components*" (plural)

We fixed the typo.

p27. line 634: The f3 vs age correlations are an interesting addition to the analyses. However, I didn't see a high level statement summarizing what's the main finding other than just saying that there are some correlations even after correcting for geography, etc. I think it will be helpful to offer the interpretation even if the authors feel is too obvious from the plots. It can be a simple phrase along lines like "We observed a significant relationship between X and Y (P = X), where the older the ancient sample the lower the shared drift with the modern components" or "where more recent ancient samples tend to show higher affinities with the modern components" (if that's the case).

Thanks for the suggestion. We changed the text (Line 538): “We observed a statistically significant relationship between the age of the ancient Southern Cone samples and their genetic affinity with CCP and CWA. This means that the older the ancient sample from the Southern Cone, the lower the shared drift with CCP and CWA.”

Also, can the authors rule out that such correlations are not a function of DNA preservation and/or amount of good quality data from each of the ancient samples? One could think that the older samples show lower f3 values simply due to lower proportions of available data across the genome. Can the authors elaborate on that possibility? What is the missing data per aDNA sample in each of these correlations?

The number of SNPs for all the f-statistics estimated in the manuscript is given in S5 Table.

We agree that, due to DNA damage, the number of SNPs with genotype data tends to decrease with the age of the ancient samples, which could induce a bias towards significant positive correlations between the age of ancient samples and their genetic affinity with modern Native American components. We discarded this possibility because such a bias would imply similar correlations for all four components, a pattern we did not observe. In addition, we did not observe significant correlations between the number of SNPs used to estimate f3 or f4 and ancient sample age for either the Andes or the Southern Cone (P-value for Spearman correlation test > 0.25).

However, in order to dispel any doubt about this potential confounding factor, we addressed this question by including an extra correction in our analysis, which did not change the trends identified (Line 543): “These patterns could be due to a relationship between geography and the age of the ancient samples because the most recent samples are concentrated in the Southern tip of the subcontinent (Fig 5A). Moreover, the number of SNPs with genotype data tends to decrease with the age of the ancient samples due to DNA damage, thus inducing a potential bias towards significant positive correlations. To simultaneously correct for these two putative confounding effects, we repeated the analyses but using a correction for the ancient sample age (the residuals of the linear regression between the age of the ancient samples and their geographic coordinates) and a correction for genetic affinity estimates (the residuals of the linear regression between f3 and the number of SNPs to estimate it). This correction intensified the relationship described for CCP and CWA (Fig 8C). It also allowed us to identify significant relationships for STF and CAN. On the other hand, CAN is the only modern Native American component that exhibits a significant relationship between its genetic affinity with ancient Andean samples and their age (Fig 8B). This pattern holds after correction for geography (Fig 8D).”
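The double correction described in this passage can be sketched as follows. This is a minimal illustration assuming ordinary least-squares regressions with an intercept. The variable names and all numbers are invented for the example and do not come from the study.

```python
import numpy as np

def residuals(y, predictors):
    """Residuals of an ordinary least-squares fit of y on predictors (plus intercept)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Made-up values for 6 ancient groups
age = np.array([1000.0, 2500.0, 4000.0, 5500.0, 7000.0, 8500.0])   # years BP
lat = np.array([-52.0, -50.0, -45.0, -40.0, -38.0, -33.0])
lon = np.array([-70.0, -72.0, -69.0, -71.0, -68.0, -70.0])
f3 = np.array([0.30, 0.29, 0.27, 0.26, 0.24, 0.23])
n_snps = np.array([90e3, 80e3, 85e3, 60e3, 55e3, 40e3])

# Age corrected for geography, and f3 corrected for the number of SNPs
age_corr = residuals(age, np.column_stack([lat, lon]))
f3_corr = residuals(f3, n_snps)

# Trend between the two corrected variables
slope = np.polyfit(age_corr, f3_corr, 1)[0]
print(slope)
```

Because each set of residuals is orthogonal to its own predictors, any remaining trend between the corrected variables cannot be attributed to a linear effect of geography on age or of SNP count on f3.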

p28. line 658: in the figure legend I think you want to say "the symbol inside the square represents..." (instead of "the point within the square"). Also, avoid "the point code" and consider instead: "the legend for plotted symbols of ancient samples...".

Thanks. We included these suggestions.

p30. line 707: I get what the authors want to say, but such subtitle is just not reflecting the idea and it has poor grammatical flow. Please consider rewording.

We removed the subtitle, which was indeed poorly informative. The results described after that subtitle are now merged with the previous section into a single one called “Early divergence among the four Native American components in Argentina”.

We also reformulated the sentence pointed out by the referee (Line 602): “In order to get insights into the past genetic influence among the four components since their divergence, we applied a last f4-statistics analysis (S5G Table; S22 Fig).”

p31. line 724: another typo, instead of "three Andes ancient groups" it should say "three *Andean* ancient groups". Also there should be a conjunction (maybe "of") between events and geographical expansions.

This sentence has been removed.

p32. line 742: I would suggest expanding the first sentence to something like "We studied genetic ancestry at the sub-continental scale in Argentinean individuals in the context of other South American populations".

Thanks. Done.

p32. line 753: consider removing the word "from" three times

Done.

p33. line 768: It is not clear what the authors want to say with "These groups exhibit within population structure, and gene flow are most likely to have occurred among them after divergence." Please copyedit and consider rephrasing.

We agree that this sentence was not clear, and we removed it. We added the idea of within group genetic structure in the previous sentence. We removed the reference to gene flow since it was not relevant for the idea developed here.

Line 647: “Studying admixed individuals can be complex, and leveraging a pure statistical approach, we grouped individuals from rather culturally, ethnically, linguistically, and genetically heterogeneous groups to represent the four Native American components discussed here.”

p33. line 771: should be "limit" instead of "limits".

We fixed the typo.

p33. line 780: should be "pre-Columbian" instead of "pre-Colombian".

We fixed the typo.

p34. line 790: consider this alternative phrasing:

This study is a joint effort of Argentinean institutions funded by the national scientific system, and represents the first milestone of the Consorcio Poblar, a national consortium for creating a public reference biobank to support biomedical genomic research in the Republic of Argentina [75]. Genomic knowledge of local populations should be a priority for developing countries to achieve an unbiased representation of diversity in public databases and the scientific development at a global scale.

Thanks for the suggestion. We included this phrasing.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Francesc Calafell

13 May 2020

Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina

PONE-D-20-02648R1

Dear Dr. Luisi,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Francesc Calafell

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Francesc Calafell

26 Jun 2020

PONE-D-20-02648R1

Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina

Dear Dr. Luisi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Francesc Calafell

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Principal component analysis in a worldwide context.

    A: PC2 vs PC1; B: PC4 vs PC3; C: PC6 vs PC5. The percentage of variance explained by each principal component (PC) is given. Each point represents an individual following the color and point codes given in legend.

    (TIF)

    S2 Fig. Admixture analysis in a worldwide context.

    (A) Cross-Validation score for Admixture runs on the worldwide meta dataset (DS1) with K from 3 to 12. (B-G) Admixture results with K = 3 to K = 8. 1KGP: 1000 Genomes Project; CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

    (PDF)

    S3 Fig. Comparison of different admixture continental ancestry proportion estimates in a worldwide context.

    Comparison of the African, European and Native American ancestry proportion estimates obtained with Admixture models with K = 3 and K = 8 applied to DS1. (A) African ancestry proportions for K = 3 are as observed in green in S2B Fig, while for K = 8 they are estimated as the sum of the three greenish colors observed in Main Fig 2. (B) European ancestry proportions for K = 3 are as observed in blue in S2B Fig, while for K = 8 they are estimated as the sum of the two bluish colors observed in Main Fig 2. (C) Native American ancestry proportions for K = 3 are as observed in orange in S2B Fig, while for K = 8 they are estimated as the sum of the three reddish colors observed in Main Fig 2.

    (PDF)

    S4 Fig. Correlation between eigenvectors and ancestry proportion estimates from analyses in a worldwide context.

    Comparison of ancestry proportion estimates from the Admixture model with K = 8 and the first 6 Principal Components (PCs) in DS1. (A) PC1 vs African ancestry proportions (estimated as the sum of the three greenish colors observed in Main Fig 2). (B) PC2 vs European ancestry proportions (estimated as the sum of the two bluish colors observed in Main Fig 2). (C) PC2 vs Native American ancestry proportions (estimated as the sum of the three reddish colors observed in Main Fig 2). (D) PC6 vs Bantu-influenced ancestry proportions (dark olive green in Main Fig 2). (E) PC6 vs Western African ancestry proportions (dark green in Main Fig 2). (F) PC4 vs Southern European ancestry proportions (light blue in Main Fig 2). (G) PC4 vs Northern European ancestry proportions (dark blue in Main Fig 2). (H) PC3 vs Central Chile / Patagonia ancestry proportions (orange in Main Fig 2). (I) PC3 vs Central Andes ancestry proportions (yellow in Main Fig 2). (J) PC5 vs Subtropical and Tropical Forests ancestry proportions (pink in Main Fig 2).

    (PDF)

    S5 Fig. Example of a local ancestry output.

    (A) RFMIX output for a given admixed individual. (B) Masked genotype showing ditypes of Native American (red), European (blue) and African (green) ancestry. Gaps are represented in grey and regions with unassigned ancestry (Unknown) are in black.

    (TIFF)

    S6 Fig. Choice of the number of principal components from European ancestry-specific principal component analysis.

    Elbow method to determine which PC minimizes the angle of the curve from the chart “Percentage of variance explained versus Number of PCs”.

    (PDF)

    S7 Fig. European ancestry specific admixture analysis.

    (A) Cross-Validation scores for K from 2 to 10. (B) Admixture for K = 2. (C) Admixture for K = 3. CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

    (PDF)

    S8 Fig. Comparison of different European ancestry proportions in South America.

    Comparison of ancestry proportion estimates from European Ancestry Specific Admixture Analyses (K = 3) among samples from Argentina, Chile and Colombia. (A) Southeastern/Italian ancestry (light blue in S7C Fig). (B) Iberian ancestry (turquoise in S7C Fig). (C) Northern European ancestry (dark blue in S7C Fig). The P-value of the Wilcoxon test for each pairwise comparison is shown.

    (PDF)

    S9 Fig. African ancestry-specific principal component analysis.

    (A) Localization map of the 1685 reference samples with >99% of African ancestry. (B-C) Principal Components performed using the African reference samples (represented as in panel A), and South American samples masked for African ancestry.

    (TIF)

    S10 Fig. African ancestry-specific admixture analysis.

    (A) Cross-validation scores for K from 2 to 10. (B) Admixture for K = 2. (C) Admixture for K = 3. (D) Admixture for K = 4. CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

    (PDF)

    S11 Fig. Correlation between eigenvectors and ancestry proportion estimates from African ancestry specific analyses.

    Comparison of ancestry proportion estimates from Admixture model with K = 5 and some Principal Components (PCs) in admixed samples from DS5. (A) PC3 vs Western African ancestry proportions (yellow in S10 Fig). (B) PC3 vs Bantu-influenced ancestry proportions (blue in S10 Fig). (C) PC4 vs Eastern African ancestry proportions (orange in S10 Fig).

    (PDF)

    S12 Fig. Choice of the number of principal components from Native American ancestry-specific principal component analysis.

    Elbow method to determine which PC minimizes the angle of the curve from the chart “Percentage of variance explained versus Number of PCs”.

    (PDF)

    S13 Fig. Native American ancestry-specific admixture analysis.

    (A) Cross-validation scores for K from 2 to 10. (B) Admixture for K = 2. (C) Admixture for K = 4. (D) Admixture for K = 5. CYA: Cuyo Region; NEA: Northeastern Region, NWA: Northwestern Region; PPA: Pampean Region; PTA: Patagonia Region.

    (PDF)

    S14 Fig. Comparison of Native American ancestry proportion estimates obtained with admixture on unmasked data (K = 8) and masked data (K = 3).

    Unmasked and masked data refers to DS1 and DS6, respectively. Spearman correlation coefficients and associated P-values are shown. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia.

    (PDF)

    S15 Fig. Correlation of Native American ancestry proportions and geographic coordinates in Argentina.

    (A) Central Andes ancestry proportions vs Latitude. (B) Central Andes ancestry proportions vs Longitude. (C) Subtropical and Tropical Forests ancestry proportions vs Latitude. (D) Subtropical and Tropical Forests ancestry proportions vs Longitude. (E) Central Chile/Patagonia ancestry proportions vs Latitude. (F) Central Chile/Patagonia ancestry proportions vs Longitude. Linear regression slopes and the associated P-values are shown.

    (PDF)

    S16 Fig. Individual assignation to a Native American ancestry cluster.

    Consensus cluster assignation of South American individuals based on three K-means procedures run with different pairwise distances among individuals. (Top) K-means results using Ancestry-Specific PCA and Admixture (ASPCA and AS-Admixture), and f3 results to compute pairwise distances. Individuals are represented as in Main Fig 5. Insets: BIC score for number of clusters set to K-means ranging from 2 to 20. In all the three cases, K-means BIC was minimized when considering 4 clusters. (Bottom) Same as top with point colors corresponding to the assigned cluster.

    (TIF)

    S17 Fig. Pairwise genetic affinity among individuals assigned to different groups.

    Boxplots for 1-f3(YRI; Ind1, Ind2), where Ind1 and Ind2 are two individuals belonging to Group 1 and Group 2, respectively. The groups are either the four Native American components identified or ancient Middle Holocene Southern Cone groups. For clarity, boxplot outliers are not shown. YRI: Yoruba from 1KGP.

    (PDF)

    S18 Fig. Graphical visualization of pairwise genetic distances among modern and ancient groups in South America.

    (A) Neighbor-joining tree from distances of the form 1/f3(YRI; X, Y). USR1 from Ancient Beringia was used as outgroup. (B) Multidimensional scaling from distances of the form 1-f3(YRI; X, Y). Each group is represented as it appears in the leaves of (A). USR1 and Anzick-1 were not considered in (B). YRI: Yoruba from 1KGP.

    (TIF)

    S19 Fig. Genetic affinity of the four Native American components with ancient groups.

    (A-D) f3(YRI; X, Ancient). (E-H) f4(YRI, X; Ancient Beringia, Ancient). (A) and (E): with Central Andes (CAN) as X. (B) and (F): with Subtropical and Tropical Forests (STF) as X. (C) and (G): with Central Chile / Patagonia (CCP) as X. (D) and (H): with Central Western Argentina (CWA) as X. YRI: Yoruba from 1KGP; Ancient Beringia: USR1 individual from [66]; X: Native American component in Argentina (one plot per X). Ancient: ancient group labeled on the x-axis and represented with a point/color scheme as in Main Fig 5. Vertical segments are the +/- 3 standard errors intervals.

    (PDF)

    S20 Fig. Changes across time of genetic affinity of the four Native American components with ancient groups.

    Each point represents a f4 score of the form f4(YRI, X; Ancient Beringia, Ancient) vs the age of ancient sample, where X is one of the four identified Native American components, and Ancient is an ancient group. X is represented by the color of the square while Ancient is represented by the point within the square. The point code of the ancient samples is shown in Main Fig 5. Ancient Beringia: USR1 individual from [66]. (A) f4 vs age of ancient samples from Southern Cone. (B) f4 vs age of ancient samples from Andes. (C) f4 vs age of ancient samples from Southern Cone considering correction for both f4 and age. (D) f4 vs age of ancient samples from Andes considering correction for both f4 and age. Linear regression slopes and the associated P-values are shown. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia; CWA: Central Western Argentina; YRI: Yoruba from 1KGP.

    (TIF)

    S21 Fig. Comparison of genetic affinity of an ancient group to a Native American component relative to another.

    f4(YRI, Ancient; X, Y), where X and Y are two of the four identified Native American components (one plot per X-Y combination), and Ancient is an ancient group labeled on the x-axis and represented with a point/color scheme as in Main Fig 5. Vertical segments are the +/- 3 standard errors intervals. Note this setting for f4 statistics is symmetrical when switching X and Y.

    (PDF)

    S22 Fig. Comparison of genetic affinity of a Native American component to another relative to an ancient group.

    f4(Ancient, X; Y, YRI), where X and Y are two of the four identified Native American components (one plot per X-Y combination), and Ancient is an ancient group labeled on the x-axis and represented with a point/color scheme as in Main Fig 5. Vertical segments are the +/- 3 standard errors intervals. CAN: Central Andes; STF: Subtropical and Tropical Forests; CCP: Central Chile / Patagonia; CWA: Central Western Argentina; YRI: Yoruba from 1KGP.

    (PDF)

    S23 Fig. Removing admixed Santiago de Chile individuals to compute F-statistics does not affect the results.

    Admixed individuals from Santiago de Chile were removed to perform the analyses presented in this figure. (A) f3(Target; S1, S2) to test for treeness; (B) f4(YRI, Target; S1, S2) to test whether Target shares more ancestry with S1 or S2; (C) f3(YRI; CWA, Ancient); (D) f4(YRI, CWA; Ancient Beringia, Ancient); (E) f4(Ancient, CCP; CWA, YRI); (F) f4(Ancient, CWA; CCP, YRI); (G) f4(YRI, Ancient; CWA, CCP). CCP: Central Chile / Patagonia.

    (PDF)

    S24 Fig. Schematic routes for the main population arrivals in the Southern Cone.

    Each arrow represents one of the four components discussed throughout the article. Neither the time and place of the splits among these components nor gene flow among them have been addressed in this study.

    (PDF)

    S25 Fig. Admixture analyses in DS2 to define European, African and Native American reference individuals for local ancestry analyses.

    (A) Cross-validation scores from K = 3 to K = 10. (B) Admixture for K = 7.

    (PDF)

    S26 Fig. Admixture analyses in DS3 to define European, African and Native American reference individuals for local ancestry analyses.

    (A) Cross-validation scores from K = 3 to K = 10. (B) Admixture for K = 5.

    (PDF)

    S27 Fig. Comparison of admixture and RFMix ancestry proportion estimates in DS2p and DS3p.

    (A-C) For DS2p: Argentinean samples from the present study with a reference panel that consists of 1KGP individuals from Africa, Europe and America [42] and Chilean individuals from [37]. Native American, European and African ancestry proportion estimates with RFMix vs with Admixture with K = 7. (D-F) For DS3p: Argentinean samples from [31] with a reference panel that consists of 1KGP individuals from Africa, Europe and America [42]. Native American, European and African ancestry proportion estimates with RFMix vs with Admixture with K = 5.

    (TIF)

    S28 Fig. Consistency of the masking procedure applied to DS2p and DS3p.

    We compared the percentage of variants with the same ancestry ditypes in DS2p and DS3p for American admixed individuals from the 1000 Genomes Project.

    (PDF)

    S1 Table. Sample information.

    Sampling location, gender, uniparental lineages, Affymetrix QC metrics, color and point coding used for plots.

    (XLS)

    S2 Table. Data sets (DS) analyzed throughout the article.

    (PDF)

    S3 Table. Ancestry proportion estimates in a worldwide context.

    Ancestry proportion estimates from Admixture analyses with K = 3 and K = 8 at the worldwide level. The column names give the labels attributed to each ancestry detected in both Admixture analyses, as well as the hexadecimal code for the color used to represent it in the corresponding admixture plot. The columns “Point”, “Color” and “cex” list the graphical parameters used to represent each individual in the different plots throughout the article.

    (XLS)

    S4 Table. Native American cluster assignation.

    Individual Native American cluster assignation is given for each of the three K-means procedures and for the consensus call (columns “F3”, “PCA”, “Admixture” and “Consensus”). The ancestry proportion estimates from Admixture analyses with K = 3 on the masked data for Native American ancestry are also provided. The column names give the labels attributed to each ancestry detected in the Admixture analyses, as well as the hexadecimal code for the color. For admixed individuals (from the present study and [31]), the “Population” and “Region” columns list the locality and province, respectively, while for Native American populations (from [37,38]) the “Population” and “Region” columns list the ethnic and main ethnic groups, respectively.

    (XLS)

    S5 Table. F-statistics.

    (A) f3(Target; S1, S2) only for comparisons including Native American components. (B) f4(YRI, Target; S1, S2) only for comparisons including Native American components. (C) f3(YRI; X, Y) only for comparisons including Ancient Beringia, Mixe and Native American components. (D) f3(YRI; X, Y) where X and Y can be either an ancient group or one of the four Native American components. (E) f4(YRI, X; Ancient Beringia, Ancient) only for comparisons between a Native American component (X) and an ancient group (Ancient). (F) f4(YRI, Ancient; X, Y) only for comparisons including two Native American components (X, Y) and an ancient group (Ancient). (G) f4(Ancient, X; Y, YRI) only for comparisons including two Native American components (X, Y) and an ancient group (Ancient). YRI: Yoruba from 1KGP.

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The data analyzed here comprise both newly generated and previously reported data sets. Access to publicly available data sets should be requested through the distribution channels indicated in each published study. Newly generated samples have been registered under study EGAS00001004492 in the European Genome-Phenome Archive, which contains both raw and processed individual genotype data sets with accession numbers EGAD00010001913 and EGAD00001006227, respectively.


    Articles from PLoS ONE are provided here courtesy of PLOS
