Molecular Biology and Evolution. 2025 Jul 1;42(7):msaf156. doi: 10.1093/molbev/msaf156

Nested Admixture During and After the Trans-Atlantic Slave Trade on the Island of São Tomé

Marta Ciccarella, Romain Laurent, Zachary A Szpiech, Etienne Patin, Françoise Dessarps-Freichey, José Utgé, Laure Lémée, Armando Semo, Jorge Rocha, Paul Verdu
Editor: Andrey Rzhetsky
PMCID: PMC12284396  PMID: 40590308

Abstract

Human genetic admixture, involving the contact between two or more previously isolated populations, can be a complex process influenced by social dynamics. In this study, we aim to reconstruct complex admixture histories in São Tomé, an island in the Gulf of Guinea where the Portuguese established one of the first plantation-based slave societies. Since the 15th century, migration waves from Africa and Europe, slavery, marooning, and indentured labour led to profound demographic shifts and social stratification on the island. Examining 2.5 million SNPs newly genotyped in 96 São Toméans, we observed patterns of genetic differentiation that were more complex than those of other populations descended from enslaved Africans on either side of the Atlantic. Using local ancestry inference and Identical-by-Descent methods, we identified five genetic clusters in São Tomé and reconstructed shared ancestries between each cluster and 70 African and European population samples, including an extensive sample from the Cabo Verde archipelago. Our findings align with historical records, retracing the major slave trade routes and labour-driven migrations after the abolition of slavery. We also identified gene flow between recently admixed groups that were previously isolated on the island. We call this process, creating multiple layers of genetic ancestry in admixed genomes, nested admixture. We suggest that changing social structures in São Tomé transformed the genetic structure of its population and influenced the admixture process. This study demonstrates how successive admixture and isolation events during and after the Trans-Atlantic Slave Trade shaped extant genetic diversity patterns at local scale in Africa.

Keywords: genetic admixture, genetic structure, demography, Trans-Atlantic Slave Trade

Introduction

The Trans-Atlantic Slave Trade (TAST) was one of the largest human migrations of historical times, involving the forced displacement of more than 12 million enslaved Africans according to historical sources (Eltis and Richardson 2010). Population genetic studies have extensively investigated the impact of these forced migrations on the genetic diversity of admixed populations on either side of the Atlantic (Baharian et al. 2016; Ongaro et al. 2019; Micheletti et al. 2020; Almeida et al. 2021; Laurent et al. 2023). These studies provided important contributions to the understanding of genetic admixture dynamics in the context of the TAST and European colonization, as well as to the general understanding of human genetic admixture processes (Gravel 2012; Fortes-Lima and Verdu 2020; Mooney et al. 2023).

Populations descended from Africans enslaved during the TAST experienced several periods of recurrent introgression from populations originating from different regions of Africa (Ongaro et al. 2019; Fortes-Lima and Verdu 2020; Fortes-Lima et al. 2021). In the colonial context, the enslaved and non-enslaved communities were often reproductively isolated and subject to distinct demographic constraints. Changes in the economic viability of plantations, shifts in colonial policy, and the abolition of the TAST and of slavery all contributed to evolving socio-cultural contexts that changed community integration and discrimination over time (Curtin 1990). The aim of this study is to investigate successive admixture and isolation histories on the island of São Tomé, in light of the documented history of settlement and social stratification during the five centuries of Portuguese colonization.

The archipelago of São Tomé e Príncipe, in the Gulf of Guinea, was uninhabited when the Portuguese arrived on the island of São Tomé around 1470; they settled the island rapidly from 1493 onwards (Caldeira 2011). There they established, for the first time, an economic system based on plantations and on the exploitation of enslaved labour for the massive production of sugar cane. This system, known as the Plantation Economic System (PES), was later deployed throughout European colonies in the Caribbean and the Americas (Curtin 1990).

The archipelago served as an important hub for the slave trade between the 16th century and the abolition of the TAST and of slavery in the 19th century (Caldeira 2021). Historical records show that enslaved Africans were initially traded from the Niger Delta in the Kingdom of Benin, in a region that is now Nigeria. However, as local demand for enslaved labour increased with the 16th-century PES in São Tomé, and as the slave trade with colonies across the Atlantic grew, São Toméan merchants expanded their slave recruitment areas southwards, into the Kingdom of Kongo and other regions of northern Angola (Caldeira 2013).

In the 19th century, São Tomé e Príncipe became a major producer of cocoa and coffee. After the formal abolition of slavery in 1875, the plantations relied on indentured servitude. Contractual workers, known as “serviçais”, were mostly recruited from other Portuguese colonies, including Angola, Mozambique, and Cabo Verde, an archipelago in West Western Africa. Remarkably, over the course of the 20th century, approximately 80,000 Cabo Verdeans migrated to São Tomé (Carreira 1976). Colonized by the Portuguese in the 1460s, Cabo Verde also underwent more than 400 years of history linked with the TAST (Carreira 1972). Nevertheless, the ports of embarkation of enslaved Africans and the colonial exploitation of the islands differed significantly between the two archipelagos (Seibert 2014).

The island of São Tomé offers unique opportunities to disentangle the impact of social stratification on genetic diversity in the colonial context of the TAST, both during the first PES and, after the abolition of the TAST and of slavery, during the second PES. Extant communities within São Tomé are associated with different histories related to marooning, slavery, and indentured labour. Since the beginning of Portuguese colonization, the forested interior of São Tomé served as a refuge for runaway slaves, who remained largely isolated in maroon communities until the 19th century, when some of these communities became known as the Angolares (Seibert 1998). Today, Angolares communities on the north-western and southern coasts of the island are known to speak Angolar, one of the two main autochthonous creole languages of the island (Lorenzino 1998; Hagemeijer 2011). In parallel, formerly enslaved Africans who had gained their freedom became the largest social group in São Tomé during the 17th and 18th centuries, a period of profound economic decline and relative abandonment of the colony by Portuguese settlers and by the Portuguese crown (Seibert 2015). Their descendants are often associated with the Forro creole language, “Forro” literally meaning “freed slave” (Hagemeijer 2011). Finally, descendants of Cabo Verdean serviçais may be considered to form a third social group associated with yet another creole language spoken on the island: Cabo Verdean Kriolu (Instituto Nacional de Estatística 2012).

In this complex context of forced and deliberate migrations to São Tomé and of social stratification within the island, previous genetic studies identified substantial genetic structure, although with a limited number of loci (Tomas et al. 2002; Coelho et al. 2008) or of samples (Almeida et al. 2021). Coelho et al. suggested that the genetic structure within São Tomé was mainly caused by differentiation between Angolares and non-Angolares individuals, with increased genetic drift in the former group. More recently, Almeida et al. generated exome sequence data for 25 individuals from the islands of São Tomé and Príncipe and found that groups speaking the Angolar and Forro creoles in São Tomé had similar levels of genetic contribution from both the Gulf of Guinea and Angola.

The nested genetic structures and the mosaic of African genetic diversity found by these previous studies in São Tomé thus raised a series of fundamental questions about the histories of admixture on the island. What are the origins of the genetic ancestors of São Toméans? Are the diverse genetic ancestries evenly distributed among present-day São Toméans? Within São Tomé, and spanning existing social groups speaking different creole languages, when and how did genetic admixture and isolation processes occur?

In this study, we investigated 96 unrelated individuals sampled at 14 sampling sites in São Tomé, each genotyped at 2.5 million SNPs genome-wide. We identified five genetic clusters of individuals in the sample based on haplotype-sharing patterns. We inferred the shared haplotypic ancestries of each of the five São Toméan genetic clusters using previously published genome-wide data from 70 African and European populations, including an extensive sample from the archipelago of Cabo Verde (Auton et al. 2015; Gurdasani et al. 2015; Patin et al. 2017; Semo et al. 2020; Laurent et al. 2023). We reconstructed more recent shared ancestries between admixed groups using sharing patterns of long Identical-by-Descent (IBD) tracts. Finally, we propose a chronology of successive admixture and isolation events that may explain the genetic diversity observed in São Tomé based on different types of genetic evidence. This study highlights the influence of the complex history of European colonization, slavery, indentured labour, and social stratification on extant genetic patterns in São Tomé.

Results

São Tomé Genetic Diversity in the Global Context

To investigate the genetic diversity across the island of São Tomé in the worldwide context, we combined the newly genotyped data of 96 unrelated individuals from São Tomé (Fig. 1, supplementary table S1, Supplementary Material online) with a reference dataset of population samples from Africa, Europe, and the Americas, including other populations descended from enslaved Africans in Cabo Verde, African-Americans in southwest USA (ASW) and African-Caribbeans in Barbados (ACB) (supplementary table S3, Supplementary Material online) (Auton et al. 2015; Gurdasani et al. 2015; Patin et al. 2017; Semo et al. 2020; Laurent et al. 2023). Figure 2a shows the sampling locations of the 70 populations included in the reference dataset, which are grouped into 10 distinct geographical regions. We considered a subset of population samples from previous studies, focusing on regions in Africa that were key ports of embarkation during the Trans-Atlantic Slave Trade (Eltis and Richardson 2010).

Fig. 1.

Map of São Tomé showing the distribution of individual samples collected from 14 sites. Sampling sites are concentrated in the most populated areas where the road network (shown in grey) is denser. Each circle represents a sampling site, with its size proportional to the sample size. The map on the right shows the location of São Tomé in the Gulf of Guinea.

Fig. 2.

The genetic differentiation of the worldwide population samples. a) Geographic locations of the 70 population samples considered in the present study. b) MDS projection of pairwise Allele-Sharing Dissimilarities (ASD) among São Toméans, Cabo Verdeans and other African, American and European populations. The first two axes of variation in the MDS were calculated among 3,203 individuals using 411,121 autosomal SNPs. On the bottom, the color-coded legend sorts the populations into 10 distinct geographical regions in continental Africa, Europe and the Americas.

Figure 2b shows the first two dimensions of a Multi-Dimensional Scaling analysis calculated on Allele-Sharing Dissimilarities (ASD) (Bowcock et al. 1994), between each pair of individuals in the merged dataset including 411,121 autosomal SNPs. The first axis of variation was explained by major genetic differentiation between African and European populations, while the second axis captured genetic differentiation within Africa, revealing a north-south gradient ranging from West Western African to South African populations.
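To make the ASD-MDS logic concrete, the sketch below computes pairwise allele-sharing dissimilarities from a genotype matrix coded as 0/1/2 allele counts and projects them with classical (Torgerson) MDS. It is a minimal illustration that assumes a per-SNP dissimilarity of |g_i - g_j|/2 (two, one or zero shared alleles), averaged over the SNPs typed in both individuals. The function names are hypothetical, and the code does not necessarily match the exact implementation used for Fig. 2b.

```python
import numpy as np


def allele_sharing_dissimilarity(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise ASD from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Missing genotypes are encoded as np.nan and skipped pair by pair.
    Per SNP, two genotypes share 2, 1 or 0 alleles, so the dissimilarity
    is |g_i - g_j| / 2; ASD averages it over SNPs typed in both individuals.
    """
    n = genotypes.shape[0]
    asd = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = np.abs(genotypes[i] - genotypes[j]) / 2.0
            asd[i, j] = asd[j, i] = np.nanmean(diff)
    return asd


def classical_mds(dist: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Classical MDS: double-centre the squared distances, keep the top eigenvectors."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]      # largest eigenvalues first
    # Clip tiny negative eigenvalues (numerical noise) before taking square roots.
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```

The double loop scales quadratically with the number of individuals. For the 3,203 individuals of Fig. 2b, a vectorised or chunked computation would be preferable.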

The 96 unrelated individuals born in São Tomé were highly dispersed across the first two dimensions of the ASD-MDS, compared to the 70 population samples from Africa, Europe, and the Americas (Fig. 2b). Other populations descended from enslaved Africans showed relatively simpler patterns, despite also being highly admixed. Cabo Verdean individuals fell along a trajectory going from European to West Western African populations, reflecting the impact of slave recruitment from the neighbouring areas of Senegambia and Guinea-Bissau on their current genetic profile (Beleza et al. 2012; Laurent et al. 2023). African-Americans (ASW) and African-Barbadians (ACB) clustered along a trajectory going from Western African to European populations, in agreement with a different history of slave recruitment, mainly in the Gulf of Guinea (Martin et al. 2017; Laurent et al. 2023).

We further applied an unsupervised clustering algorithm, ADMIXTURE (Alexander et al. 2009), to visualize at once numerous axes of inter-individual genetic variation and population structure (Fig. 3). The results of the ADMIXTURE analysis reflected the patterns of genetic differentiation observed in the ASD-MDS, while providing further information on genetic differentiation within São Tomé and Cabo Verde at higher values of K.
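ADMIXTURE models each individual's genotype at a SNP as binomial, with an allele frequency equal to that individual's ancestry proportions multiplied by the cluster-specific allele frequencies and summed over clusters (Alexander et al. 2009). The hypothetical function below is a minimal sketch of the log-likelihood that the program maximizes over the ancestry matrix Q and the frequency matrix F. It omits the constant binomial coefficient and performs no optimisation.

```python
import numpy as np


def admixture_log_likelihood(genotypes, q, f):
    """Binomial log-likelihood of the ADMIXTURE model (constant term omitted).

    genotypes: (individuals x SNPs) counts in {0, 1, 2} of the allele whose
               frequency is given in f
    q: (individuals x K) ancestry proportions, rows summing to 1
    f: (K x SNPs) allele frequencies in each ancestral cluster
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(q) @ np.asarray(f)          # individual-specific allele frequencies
    p = np.clip(p, 1e-12, 1 - 1e-12)           # guard against log(0)
    return float(np.sum(g * np.log(p) + (2 - g) * np.log(1 - p)))
```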

Fig. 3.

ADMIXTURE analysis. Unsupervised ADMIXTURE analysis using 110,499 LD-pruned (<0.1) autosomal SNPs from 1,347 unrelated individuals, including 96 individuals who reported being born in São Tomé, 2 individuals born in Príncipe, 225 individuals born on seven different islands in Cabo Verde, and a random sample of 20 individuals for each of the remaining population samples (supplementary table S4, Supplementary Material online). The proportion of independent ADMIXTURE runs closely resembling one another is indicated on the left of each panel, below the K value. Population samples are ordered by geographical region according to the colour code indicated in Fig. 2.

As with the first dimension of the ASD-MDS, ADMIXTURE clustering at K = 2 was explained by differences between African and European populations, which maximized the red and blue clusters, respectively. At higher values of K, new clusters captured differences among African populations. At K = 7, West Western African populations maximized the green cluster, and East Western African populations maximized the brown cluster. Also at this value of K, West Bantu-speaking populations from West Central and West Southern Africa presented the highest proportions of the red cluster, while East Bantu-speaking groups from East Southern and South Africa maximized the yellow cluster.

Genetic clusters at K = 7 discriminated among different areas of slave recruitment in Africa, revealing genetic similarities between populations currently living in these areas and populations descended from enslaved Africans. However, while African-Barbadians (ACB) and African-Americans (ASW) displayed relatively simple patterns with resemblance mainly to Central and Western Africa, Cabo Verde and São Tomé exhibited more complex genetic structures that cannot be simply explained by the diverse African origins of their enslaved settlers. In Cabo Verde, while individuals from the southern islands (Fogo, Santiago, and Brava) showed high proportions of the green cluster, which was maximized in West Western Africa, individuals from the northern islands (Santo Antão, São Vicente and São Nicolau) maximized the orange cluster, which was predominantly found there and may have resulted from in situ differentiation due to high genetic drift (Beleza et al. 2012; Korunes et al. 2022; Laurent et al. 2023). In São Tomé, some individual profiles were close to those of Cabo Verdeans, while others showed varying proportions of clusters found in populations from East Western, West Central and West Southern Africa. Finally, another group of individuals from São Tomé showed a unique genetic pattern, represented by the violet cluster, which is not observed outside the island at this value of K.

Haplotype-based Population Structure Within São Tomé

We analysed patterns of shared haplotypes between all sampled individuals from São Tomé using ChromoPainter2 and fineSTRUCTURE (Lawson et al. 2012). By looking at the patterns of the number and length of shared haplotypes, we identified five different clusters of individuals (C1-C5) with high levels of inter-individual haplotypic resemblance (Fig. 4).

Fig. 4.

Genetic clustering and population structure in São Tomé. a) The 96 individuals from São Tomé grouped into five genetic clusters using fineSTRUCTURE, based on shared haplotype segments. The vertical axis of the dendrogram represents the distance or dissimilarity between clusters. The distribution of individuals across sampling sites is shown below. Black circles indicate the number of individuals sampled at each site, while coloured dots represent the number of individuals per cluster. b) Principal Component Analysis (PCA) based on the co-ancestry matrix in c), with individual labels color-coded by genetic clusters in a). c) Co-ancestry matrix showing the number of haplotypes each individual (row) copies from any other (column). d) Similar to c) but based on the cumulative length (in cM) of copied haplotypes.

Figure 4a presents a dendrogram displaying the phylogenetic relationships between the five clusters, as well as their spatial distribution across São Tomé. Individuals sampled at the same location frequently belonged to different genetic clusters. The geographic location of sampled individuals therefore does not fully account for the observed clustering patterns, which may instead be influenced by other factors, such as variation in admixture proportions. In a Principal Component Analysis calculated on the same co-ancestry matrix used to infer the fineSTRUCTURE dendrogram, individuals of cluster C1 were separated from those of cluster C4 along the first PC and from those of cluster C3 along the second PC (Fig. 4b). Within this triangle, C2 individuals lay between C1 and C3 along the second PC, and C5 individuals lay between C4 and the rest of the São Toméan individuals along the first PC.
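The PCA in Fig. 4b summarises the co-ancestry matrix. The sketch below shows one way to perform such a PCA. It assumes a square, individual-by-individual matrix of copied chunk counts, converts each row to copying proportions, centres the columns, and takes an SVD. These preprocessing choices and the function name are illustrative assumptions, not necessarily those of the fineSTRUCTURE tools.

```python
import numpy as np


def coancestry_pca(chunk_counts, n_components=2):
    """PCA scores for a square (recipient x donor) co-ancestry matrix.

    Assumptions: self-copying is excluded (diagonal set to 0), and each row is
    converted to copying proportions so that individuals with different total
    chunk counts are comparable.
    """
    x = np.asarray(chunk_counts, dtype=float).copy()
    np.fill_diagonal(x, 0.0)
    x /= x.sum(axis=1, keepdims=True)               # row-normalise to proportions
    x -= x.mean(axis=0)                             # centre each donor column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]   # principal component scores
```

Because an individual cannot copy from itself, members of very small clusters have asymmetric copying profiles. This can push within-cluster contrasts onto the leading components.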

Both the total number (Fig. 4c) and the cumulative length (Fig. 4d) of shared haplotypes are informative. C1 individuals shared many long haplotypes with each other and copied relatively few haplotypes from other individuals on the island. In contrast, C2 individuals copied relatively long haplotypes from C1 individuals, as well as several haplotypes from the rest of the São Toméan individuals. Individuals of cluster C3 copied relatively more haplotypes from one another than from the rest of the island. Individuals from clusters C4 and C5 differed markedly from individuals in other clusters and shared the highest number of haplotypes with one another.

Haplotypic Ancestry of Genetic Clusters

The haplotypic ancestry of each São Toméan cluster was modelled as a mixture of the possible source populations included in the reference dataset using the ChromoPainter2-SOURCEFIND pipeline (Chacón-Duque et al. 2018). Based on the co-ancestry matrix computed with ChromoPainter2, we inferred a maximum of eight source populations per “Target” population sample using SOURCEFIND.
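SOURCEFIND itself selects surrogates and their proportions with a Bayesian approach (Chacón-Duque et al. 2018). The underlying idea, describing a target's copying profile as a mixture of donor copying profiles, can be illustrated with a simpler, non-Bayesian stand-in: a least-squares fit whose weights are non-negative and sum to one. The sketch below solves this problem by projected gradient descent. The function names are hypothetical, and the code reproduces neither SOURCEFIND's priors, its cap of eight sources, nor its sampling scheme.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_mixture(donor_profiles, target_profile, n_iter=5000):
    """Mixture weights minimising the squared residual ||A w - b||^2 on the simplex.

    donor_profiles: (n_features x n_donors), one copying profile per donor column
    target_profile: (n_features,) copying profile of the target group
    """
    a = np.asarray(donor_profiles, dtype=float)
    b = np.asarray(target_profile, dtype=float)
    step = 1.0 / np.linalg.norm(a, 2) ** 2          # 1 / Lipschitz constant of the gradient
    w = np.full(a.shape[1], 1.0 / a.shape[1])       # start from equal weights
    for _ in range(n_iter):
        grad = a.T @ (a @ w - b)                    # gradient of 0.5 * ||A w - b||^2
        w = project_to_simplex(w - step * grad)
    return w
```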

First, we considered each of the admixed populations descended from enslaved Africans as a separate “Target” population, and all other populations from continental Africa and Europe as “Donors” (Fig. 5a). In agreement with previous studies (Beleza et al. 2012; Patin et al. 2017; Ongaro et al. 2019; Micheletti et al. 2020; Laurent et al. 2023), we found that the genetic composition of island populations from Cabo Verde essentially resulted from admixture between Europeans (from 27% in Santiago to 51% in Fogo) and West Western Africans (from 46% in Fogo to 69% in Santiago). By contrast, African-Barbadians (ACB) and African-Americans (ASW) resulted from admixture between Europeans (15% in ACB, 24% in ASW) and Africans from Central and East Western Africa (49% in ASW, 82% in ACB), West Southern Africa (3% in ACB, 18% in ASW), and West Western Africa (9% in ASW only).

Fig. 5.

Haplotypic ancestry of São Toméans and other populations descended from enslaved Africans. SOURCEFIND results showing the shared haplotypic ancestry between a) each São Toméan (STP), Barbadian (ACB), African American (ASW), and Cabo Verdean (from Santiago to Santo Antão) population sample, set as Target, and 57 populations from continental Africa and Europe, set as Donors; b) each STP, ACB, and ASW population sample, set as Target, and 66 populations from Africa and Europe, including Cabo Verdean populations, set as Donors; c) each São Toméan cluster (from C1 to C5), set as Target, and 66 populations from Africa and Europe, including Cabo Verdean populations, set as Donors. Contributions from the surrogate populations chosen by SOURCEFIND among the Donors for each Target are reported, color-coded by geographic region in Africa and Europe, as indicated in the central legend.

São Tomé as a whole displayed a more complex ancestry profile, including contributions from Europe (9%), West Western Africa (15%), West Southern Africa (35%) and East Western Africa (41%) (Fig. 5a). Importantly, when Cabo Verdean populations were considered as possible “Donors” (Fig. 5b), the haplotypic ancestry from the island of Santiago in Cabo Verde (15%) replaced the Mandinka GWD haplotypic ancestry and part of the Iberian IBS component that were identified previously in São Tomé (Fig. 5a).

In Fig. 5c, each of the five São Toméan genetic clusters (C1 to C5) was considered as a separate “Target” population, and Cabo Verdean individuals born on seven different islands were included as seven separate “Donor” populations. Interestingly, clusters corresponding to the most basal bifurcation of the fineSTRUCTURE dendrogram (C1-C3 vs. C4-C5, Fig. 4a) had different ancestry profiles. Clusters C1, C2, and C3 shared haplotypic ancestries with Kongo-speakers from Angola, in West Southern Africa (from 39% in C1 to 48% in C3), and with the Ga-Adangbe from Ghana and the Igbo from Nigeria, in East Western Africa (from 49% in C1 to 51% in C2). Clusters C4 and C5 also shared haplotypic ancestry with Cabo Verdeans, mostly from the island of Santiago (30% in C5, 49% in C4). Moreover, we found evidence for small amounts of shared haplotypic ancestry between clusters C4 and C5 and the Sena and Manyika population from Mozambique, in East Southern Africa (2% and 8%, respectively).

Taken together, these observations suggest that the major division between the São Toméan clusters (C1-C3 vs. C4-C5) is due to variable contributions from different source populations, while further splits occurring within each major division may be related to subsequent episodes of in situ isolation and gene flow.

Long IBD Shared Among São Toméans and Cabo Verdeans

We explored the sharing of Identical-By-Descent (IBD) tracts of the five genetic clusters in São Tomé with each other and with each of the island populations of Cabo Verde. Figure 6a summarizes sharing patterns of IBD tracts longer than 18 cM, representing approximately the last eight generations of recombination between individuals (Baharian et al. 2016). We observed differences in long IBD sharing patterns between and within São Toméan clusters. Notably, individuals of cluster C1 shared the highest total length of long IBD tracts between each other and with individuals of cluster C3. Individuals of cluster C2 shared relatively low levels of long IBD tracts with each other, while sharing long IBD tracts with both clusters C1 and C3. Clusters C4 and C5 were the only ones sharing long IBD tracts with Cabo Verdean populations, in accordance with haplotypic ancestry results (Fig. 5). Moreover, cluster C5 showed higher values of total length of long IBD tracts shared with clusters C1 and C3 in São Tomé, compared to cluster C4.
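The network in Fig. 6a aggregates IBD segments by pairs of groups. The sketch below shows a minimal version of that aggregation. It assumes that segments are available as (individual, individual, length in cM) tuples from an IBD caller and that each individual carries a single group label (a São Toméan cluster or a Cabo Verdean island). The function name is hypothetical, and any normalisation by group size applied in the published figure is not shown.

```python
from collections import defaultdict


def long_ibd_totals(segments, group_of, min_cm=18.0):
    """Cumulative length (cM) of IBD segments longer than min_cm, per pair of groups.

    segments: iterable of (individual_a, individual_b, length_cm)
    group_of: dict mapping individual ID -> group label
    Group pairs are unordered, so ("C1", "C3") and ("C3", "C1") are pooled.
    """
    totals = defaultdict(float)
    for ind_a, ind_b, length_cm in segments:
        if length_cm <= min_cm:                     # keep only tracts longer than the cutoff
            continue
        key = tuple(sorted((group_of[ind_a], group_of[ind_b])))
        totals[key] += length_cm
    return dict(totals)
```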

Fig. 6.

IBD tracts sharing and ROH analysis. a) Network showing the cumulative length of Identical-by-Descent (IBD) tracts longer than 18 cM shared within and between population samples from Cabo Verde (left) and São Tomé (right). The line colour indicates the total length of long IBD shared (cM), and the point size is proportional to sample size. b) Violin plots showing the total length of Runs of Homozygosity (ROH) per individual genome across clusters in São Tomé, the Santiago population in Cabo Verde, the Mandinka from Gambia (GWD), the Iberians from Spain (IBS), the Yoruba from Nigeria (YRI), and the Kongo from Angola.

Runs of Homozygosity

We further investigated the presence of Runs of Homozygosity (ROH), which arise when an individual inherits IBD segments from recent common ancestors. ROH of different lengths are usually generated by different demographic processes: medium ROH can be generated by recent demographic events such as bottlenecks, while long ROH are usually due to recent parental relatedness (Pemberton et al. 2012; Szpiech et al. 2017, 2019). Figure 6b reports the total length of long, medium and short ROH calculated for each population separately using GARLIC (Szpiech et al. 2017). We reproduced known patterns of ROH observed in worldwide populations, in particular the overall higher levels of ROH in European populations compared to African populations (Pemberton et al. 2012; Korunes et al. 2022). Interestingly, cluster C1 in São Tomé presented the highest total length of long ROH of all the populations considered in this study. The total length of long ROH in C2 individuals was intermediate between those of clusters C1 and C3. Finally, clusters C4 and C5 presented average levels of medium and long ROH similar to those of the Santiago population in Cabo Verde.
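Fig. 6b splits each genome's ROH into short, medium and long classes before summing them. The sketch below shows only this final summation step and assumes ROH lengths in kb with explicit class boundaries. GARLIC derives its own population-specific boundaries from the distribution of ROH lengths, so the threshold arguments here are hypothetical placeholders rather than the values used in this study.

```python
def roh_totals_by_class(roh_lengths_kb, short_max_kb, medium_max_kb):
    """Sum one individual's ROH lengths (kb) per length class.

    short_max_kb and medium_max_kb are hypothetical class boundaries; in this
    study the boundaries come from GARLIC's population-specific classification.
    """
    totals = {"short": 0.0, "medium": 0.0, "long": 0.0}
    for length in roh_lengths_kb:
        if length < short_max_kb:
            totals["short"] += length
        elif length < medium_max_kb:
            totals["medium"] += length
        else:
            totals["long"] += length
    return totals
```

For example, `roh_totals_by_class([100, 600, 2000, 50], short_max_kb=300, medium_max_kb=1500)` returns `{'short': 150.0, 'medium': 600.0, 'long': 2000.0}`.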

Dating Admixture Events for Each São Toméan Cluster

We inferred the timing of the major admixture events that gave rise to each of the five São Toméan genetic clusters from recombination distances between ancestry tracts, using the algorithm implemented in fastGLOBETROTTER (Wangkumhang et al. 2022). We specified a separate set of “Surrogate” populations for each São Toméan genetic cluster, namely the “Donor” populations identified by SOURCEFIND as most likely involved in the admixture events (Fig. 5c). The fastGLOBETROTTER algorithm infers the relative contributions of each “Surrogate” population to two alternative mixtures representing the first and second admixture sources. Overall, fastGLOBETROTTER identified strong signals of admixture in all São Toméan clusters and attempted to date admixture assuming a single pulse. This assumption is an important caveat that needs to be kept in mind when interpreting the following results (see Materials and Methods and Discussion).
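Single-pulse dating rests on a simple property: after one admixture event g generations ago, ancestry correlations between two loci decay roughly as exp(-g d), where d is their genetic distance in Morgans. fastGLOBETROTTER fits its own coancestry-curve model. The sketch below is only a simplified stand-in that assumes a curve with its baseline already removed. It recovers g from the slope of log(curve) against d and converts g to a calendar year. The reference year (for example, an average birth year of the sampled individuals) is a hypothetical input because this section does not state it, while the 30-year generation time follows the text.

```python
import numpy as np


def fit_single_pulse(distances_morgans, curve):
    """Estimate admixture time g (generations) from a baseline-free decay curve.

    Assumes curve(d) = A * exp(-g * d), so log(curve) is linear in d with slope -g.
    """
    slope, _intercept = np.polyfit(distances_morgans, np.log(curve), 1)
    return -slope


def generations_to_year(generations, reference_year, generation_time=30.0):
    """Convert an admixture time in generations to an approximate calendar year."""
    return reference_year - generations * generation_time
```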

For clusters C1, C2, and C3, the major admixture event primarily involved source populations from East Western Africa on one side, represented here by the Igbo and Ga-Adangbe populations currently living in Nigeria and Ghana, and from West Southern Africa on the other, represented here by the Kongo population currently living in Angola (Fig. 7). Assuming a single pulse of admixture, the inferred dates ranged from 9 to 11 generations ago, with largely overlapping confidence intervals (CIs). Assuming a generation time of 30 years, admixture was estimated at approximately 9 generations before present for cluster C1, corresponding to the year 1763 (CI: 1654 to 1824), and similarly for cluster C2, around 1748 (CI: 1683 to 1864). For cluster C3, the single pulse of admixture inferred 11 generations before present would correspond to the year 1694 (CI: 1636 to 1769).

Fig. 7.

Admixing sources and timing of admixture events. For each São Toméan cluster, the best-supported single-pulse admixture event from fastGLOBETROTTER is shown. The surrogate populations considered here are those inferred by SOURCEFIND as the best proxies for sources among the populations considered in this study (Fig. 5c). The relative contributions of surrogates to the first and second sources, totalling 100%, are reported, along with admixture dates in generations, with 95% confidence intervals from 100 bootstraps.

Clusters C4 and C5 resulted from varying levels of admixture between Cabo Verdeans and São Toméans. In both clusters, the contribution from admixed São Toméans was represented as a mixture of East Western Africans and West Southern Africans, and admixture events were inferred at approximately 9 generations before present, around 1760 (CI: 1736 to 1795) for cluster C4, and 1751 (CI: 1714 to 1790) for cluster C5.

Discussion

We analysed the detailed structure of the genome-wide diversity of São Tomé—the earliest European colony in Equatorial Africa, which became a model for the Plantation Economic System (PES) that was later deployed in numerous other European colonies across the Americas (Curtin 1990). Our investigation reveals that the genetic structure of São Tomé is more complex than that of several other populations descended from enslaved Africans, due to an intricate process of admixture and isolation events during and after the TAST.

We identified five genetic clusters across the island (Fig. 4a), based on the hypothesis that individuals sharing the most haplotypes may have experienced similar genetic admixture histories. Notably, individuals from the same sampling site on São Tomé were often assigned to more than one genetic cluster. The distribution of sampling sites in São Tomé reflects the current demographic landscape of the island (Fig. 1): most are located in the more densely populated northeast, first settled by the Portuguese, while others lie along the coasts or near former plantation estates, where communities associated with specific histories of marooning and indentured labour now mostly live. Our findings suggest that community boundaries, limiting mate choice between geographically close groups, likely shaped genetic diversity at a micro-scale on the island.

A major division separated clusters sharing similar proportions of haplotypic ancestries with East Western African and West Southern African populations (C1-C3), from clusters with a clear contribution from Cabo Verde (C4 and C5) (Figs. 4 and 5). This distinction is in line with the historically known recruitment of enslaved Africans from the Gulf of Guinea and Congo-Angola regions for the peopling of São Tomé from the 15th to the 17th centuries, associated with the first sugarcane-based PES, which was followed by the arrival of Cabo Verdean immigrants for indentured labour in the 20th century, associated with the second major PES based on coffee and cacao production.

In addition to differences in the African origins of their genetic ancestors, the present gene pool of São Tomé's inhabitants has also been shaped by demographic events occurring within the island. In Fig. 8, we propose an interpretation of the demographic history of São Tomé based on different types of genetic evidence, which explores the role of external population sources as well as in situ admixture and isolation events in shaping the genetic landscape of the island. Hereafter, we establish a correspondence between genetic clusters and identifiable social groups in São Tomé by discussing genetic findings in light of historical, linguistic, and demographic information on the island, as well as familial anthropology information available for the sample.

Fig. 8.

Nested admixture in São Tomé. Demographic scenario explaining the observed genetic diversity in São Tomé and Cabo Verde. The rectangles at the bottom represent the five São Toméan genetic clusters (C1-C5) and the Cabo Verdean sample (CV). Bars represent genetic populations evolving over time, with colours marking the locations of admixture events. Proxies for source populations are shown at the top, coloured by African and European regions, and arrows indicate gene flow over different time periods. The timeline is divided into three epochs that could be linked to significant historical events: (i) the foundation of the São Toméan population in the 15th century; (ii) the formation of isolated maroon communities during the first plantation economy, from the 15th to the 17th century; and (iii) the changing historical and social contexts during two centuries of economic decline and through the second plantation economy in the 19th and 20th centuries, up to the present. In most cases, explaining the observed genetic diversity in São Tomé requires taking into account gene flow between admixed groups that were previously isolated, a process that we call “nested admixture”.

A Population Descended From Freed Slaves

Towards the end of the 16th century, competition from Brazilian sugar production, coupled with significant slave revolts, led to São Tomé's rapid economic decline and diminishing Portuguese interest in the colony (Caldeira 2011). Throughout the 17th and 18th centuries, a growing population of Forros, literally “freed slaves”, eventually became the dominant social group on the island (Seibert 2015). Linguistically, the Forros speak a creole language with a Nigerian Edoid-related syntax and a predominantly Portuguese-based lexicon, which also includes various lexical items derived from Edo and Kikongo, a Bantu language from Congo-Angola (Hagemeijer and Rocha 2019). Although the number of Forro speakers has declined in recent decades, Forro remains the most widely spoken creole on the island (Gonçalves and Hagemeijer 2015). Forro speakers are currently found throughout São Tomé, with a notable concentration in the north-eastern areas where the capital city of São Tomé is located (Instituto Nacional de Estatística 2012).

Cluster C3 may represent the Forros of São Tomé. Individuals of cluster C3 were the most numerous among the São Toméan clusters and were mostly sampled in the northeast of the island (Fig. 4a). They showed similar patterns of genetic diversity to those of Forro-speaking individuals analysed in previous studies (Almeida et al. 2021).

While it is likely that C3 individuals representing the Forros result from admixture between Africans of different origins, with limited genetic contribution from Europeans, it is less clear when this admixture occurred. According to the available historical and linguistic information, the features of the Forro creole that have a Nigerian origin can be traced back to the early phases of the peopling of São Tomé in the beginning of the 16th century, when the slave trade was mostly limited to the Gulf of Guinea. Bantu-derived linguistic features would have entered the Forro creole in a later phase, still in the 16th century, during the expansion of the first plantation economy, when Bantu-speaking regions became more important for slave recruitment (Hagemeijer and Rocha 2019). According to our dating approach, using a single pulse model, the mixing of East Western African and West Southern African genetic contributions in cluster C3 would have occurred around 1694 (CI: 1636 to 1769) (Fig. 7), substantially later than the mixing of linguistic features and the timing of arrival of slaves from the African mainland would suggest.

However, fastGLOBETROTTER cannot always distinguish between discrete pulses and continuous admixture: if the period of continuous admixture is short, the model may infer a single admixture event, and the estimated date may fall within the time frame of the actual recurring admixture period (Hellenthal et al. 2014). Therefore, genetic admixture between East Western Africans and West Southern Africans may have occurred recurrently, both before and after the late-17th century date inferred here. Moreover, the admixture model here assumes random mating, which may bias admixture time estimates toward more recent dates compared to methods accounting for assortative mating (Zaitlen et al. 2017; Korunes et al. 2022).

One of the First Communities of Runaway Slaves

According to historical records, enslaved Africans escaped plantations from the onset of Portuguese colonization in São Tomé and managed to control the mountainous interior of the island until occupation by colonial authorities in the late 19th century, when some of these communities became known as the Angolares (Seibert 2013). Today, Angolar creole is mostly spoken in fishing communities scattered along the coast, with major settlements in the villages of São João dos Angolares and Santa Catarina, where most individuals of cluster C1 were collected (Figs. 1 and 4a). Moreover, cluster C1 shows patterns of genetic diversity similar to those found in Angolar-speaking individuals in previous studies (Coelho et al. 2008; Almeida et al. 2021). It is therefore likely that cluster C1 represents the Angolares of São Tomé.

While colonial narratives have suggested that the Angolares descended from survivors of a shipwreck with enslaved Africans from Angola who remained almost completely isolated (Seibert 2013), historical, linguistic, and genetic studies have shown this hypothesis to be unrealistic (Hagemeijer and Rocha 2019; Almeida et al. 2021). The Angolar creole shares many features with the Forro creole, although it has a stronger lexical influence from the Bantu language Kimbundu, spoken in northern Angola. Thus, it seems likely that the two creole languages descend from the same proto-creole of the Gulf of Guinea and subsequently accumulated differences in the amount and type of incorporated Bantu features (Hagemeijer 2011). Consistent with previous genetic studies (Almeida et al. 2021), we found that the Forros of cluster C3 and the Angolares of cluster C1 derived most of their genetic ancestry from the Gulf of Guinea (East Western Africa, 50% and 49%, respectively) and from Bantu-speaking regions in Congo-Angola (West Southern Africa, 48% and 39%, respectively).

In addition, C1 individuals exhibited markedly distinct genetic patterns. Despite being recently admixed, they showed the highest membership in a distinct (violet) ADMIXTURE cluster, likely due to genetic drift (Fig. 3). They also shared very long haplotypes (Fig. 4d) and long IBD tracts (Fig. 6a), and presented many long ROH (Fig. 6b). Together, these characteristics are typical of isolated populations that experienced bottlenecks and inbreeding (Ceballos et al. 2018). Similar features have also been observed in other admixed population isolates from South America (Mooney et al. 2018), as well as in other populations descended from maroon communities, including the Noir Marron from Suriname and French Guiana (Fortes-Lima et al. 2017) and Angolar-speaking individuals from São Tomé (Almeida et al. 2021).

Strong genetic drift can substantially distort the length distribution of ancestry tracts on which admixture dating relies (Pool and Nielsen 2009; Gravel 2012). Nevertheless, the single pulse of admixture for cluster C1 was estimated at approximately 9 generations before present, corresponding to the year 1763 (CI: 1654 to 1824), which is later than the date inferred for cluster C3 (1694), although their confidence intervals largely overlap. Both clusters may share an early admixture event involving the same source populations, with the genetic signal of cluster C1 more heavily affected by subsequent genetic drift. It is therefore likely that the Angolares of cluster C1, who share a major part of their genetic ancestry with the Forros of cluster C3, became differentiated through isolation after running away from the plantations (Fig. 8).

Cabo Verdeans in São Tomé

The second period of Portuguese colonization in São Tomé, corresponding to the establishment of coffee and cacao plantations from the 19th century onward, marked a profound demographic shift on the island. Indentured labourers working in São Tomé e Príncipe made up approximately half of the population during the first half of the 20th century (Nascimento 2002). According to the last census, 8% of the population in São Tomé currently speaks Cabo Verdean Kriolu (Instituto Nacional de Estatística 2012).

Individuals of cluster C4 in São Tomé are more similar to Cabo Verdeans than to the rest of the São Toméan samples in terms of haplotypic diversity (supplementary Fig. S9, Supplementary Material online). They share over half of their haplotypic ancestry with the islands of Santiago and Santo Antão, and present the lowest proportions of East Western African and West Southern African contributions among all São Toméan clusters (Fig. 5c). Moreover, they include the five São Toméan individuals who reported in the interviews that one or both of their parents were born in Cabo Verde, and they were mostly sampled at sites close to historical roça plantations, such as Agostinho Neto, Monte Café, São João dos Angolares, and Porto Alegre, which hosted numerous Cabo Verdean serviçais. Therefore, cluster C4 likely represents the Cabo Verdeans who remained and left descendants on the island of São Tomé.

Our findings suggest that the Cabo Verdean haplotypic ancestry mostly came from the islands of Santiago and Santo Antão, in the south and the north of the Cabo Verde archipelago, respectively (Fig. 5d). However, based on shared long IBD tracts, cluster C4 showed recent common ancestry with populations from several different islands in Cabo Verde (Fig. 6a). Cabo Verdean populations from different islands shared many long IBD tracts with each other, particularly with Santiago, the first island to be settled in the 15th century, which played a key role in the founder effects that shaped the genetic diversity of the other islands (Laurent et al. 2023). SOURCEFIND, which was limited to a maximum of eight surrogate populations, identified Santiago as a major source of the Cabo Verdean genetic contribution in São Tomé and may therefore have underestimated contributions from other islands.

The main admixture event inferred for individuals of cluster C5, between West Western African and European populations approximately 9 generations before present (circa 1760, CI: 1736 to 1795, assuming 30 years per generation; Fig. 7), likely reflects the founding of the admixed gene pool in Cabo Verde. However, this date is more recent than expected for the settlement of the archipelago (Laurent et al. 2023). The accuracy of admixture dating depends strongly on how well the model captures the underlying history of admixture and how well the surrogate populations represent the actual sources. The admixture model used here cannot account, for instance, for subsequent, albeit minor, admixture between Cabo Verdeans and São Toméans, as we did not consider São Toméans as proxies for sources of admixture. In this model, São Toméans were likely represented as a mixture of East Western African and West Southern African contributions to the first admixing source, which consisted mainly of West Western African contribution, while admixed Cabo Verdean genomes contributed to the second source, which consisted mainly of European contribution (Fig. 7).

Limited European Genetic Contribution

Overall, São Toméans exhibited limited levels of haplotypic ancestry shared with European populations, particularly when compared with numerous other enslaved-African descendant populations in the Americas and in Africa (Fortes-Lima and Verdu 2020), and even with other populations from the former Portuguese colonial empire (Ongaro et al. 2019; Laurent et al. 2023). This was somewhat unexpected given historical records, as admixture was generally tolerated at the very beginning of the settlement as part of Portugal's colonization efforts (Caldeira 2007). However, historically, the African population in São Tomé, including both enslaved and free individuals, far outnumbered the European presence on the island (Henriques 2000), to a larger extent than historians have reconstructed for Cabo Verde (Albuquerque and Santos 1991). Furthermore, a substantial part of the Portuguese settlers left the island in the 18th century, a period of economic crisis between the two main plantation economies (Lucas 2015). The relative European genetic contribution may thus have diminished over time; it was low but detectable in all São Toméan clusters except the Angolares of cluster C1, where it was absent.

It is interesting to note that almost half of the European ancestry tracts in São Tomé actually come from Cabo Verdean admixed genomes. In fact, the European haplotypic ancestry in São Tomé was overestimated when Cabo Verdeans were not considered as sources of admixture (compare Fig. 5a and b). Moreover, individuals with higher European descent may have clustered preferentially with those of Cabo Verdean descent in the haplotype-sharing analyses within São Tomé (Fig. 4a), which would explain the relatively higher levels of European haplotypic ancestry found in clusters C4 and C5 (Fig. 5c).

Nested Admixture Between Recently Admixed Populations

Beyond the three social groups previously discussed—the Forros of cluster C3; the Angolares of cluster C1; and the Cabo Verdean serviçais and their descendants of cluster C4—we also identified evidence of admixture among these groups within the São Toméan sample, specifically in clusters C2 and C5.

Admixture Between Angolares and Forros

The nine individuals of cluster C2 are genetically closer to cluster C3 (Figs. 3 and 4a), share numerous long IBD tracts with both the Angolares of cluster C1 and the Forros of cluster C3, and show levels of medium and long ROH intermediate between these two clusters (Fig. 6b). C2 individuals may therefore result from gene flow between the Angolares, descendants of runaway slaves, and the freed population of Forros, after the maroon communities began to establish economic, social, and cultural connections with the rest of the island population in the 19th century (Fig. 8).

Previous studies on populations descended from maroon communities emphasize strong isolation and genetic differentiation (Fortes-Lima et al. 2017; Almeida et al. 2021), in line with their history of escaping slavery and seeking refuge in remote locations. However, they do not account for the potential admixture with surrounding populations over time. In contrast, our findings show that marooning isolation does not necessarily imply the absence of complex admixture processes, which in turn argues for further refinement of genetic expectations for descendants of “isolated” populations in the context of admixture in general and the TAST in particular.

Admixture Between Cabo Verdeans and São Toméans

The immigrant indentured workers were largely confined to the roças, or plantation units, effectively isolating them from the local São Toméan population, who refused to engage in plantation work after the abolition of slavery. This period was thus marked by the emergence of new forms of discrimination and social stratification on the island (Bouchard 2023). We found evidence of varying degrees of recent common ancestry between clusters C4 and C5, of Cabo Verdean descent, and clusters C3 and C1, representing the Forros and Angolares, respectively (Fig. 6a). The reduced East Western African and West Southern African genetic contributions, representing the local São Toméan populations, to the haplotypic ancestry of cluster C4 (Fig. 5c) align with this history of discrimination. However, the patterns of haplotypic ancestry of cluster C5 suggest that substantial admixture occurred between Cabo Verdeans and São Toméans (Fig. 7), meaning that social stratification may not have fully prevented inter-community mate choice.

The Genetic Contributions From Mozambique and Angola

Since the 19th century, enslaved Africans and contract workers were also recruited on a large scale from Angola and Mozambique to work on São Tomé's coffee and cacao plantations (Nascimento 2003, 2023). While the São Toméan sample shows a substantial amount of haplotypic ancestry from Cabo Verde due to indentured labour migrations, we found relatively little genetic contribution from Mozambique, mainly in clusters C4 and C5 (2% and 8%, respectively; Fig. 5c). The genetic impact of successive migrations from Angola is more difficult to disentangle, as West Southern African ancestry in São Tomé may reflect both TAST-related migrations in the 16th and 17th centuries and migrations of enslaved and contract workers in the 19th and 20th centuries.

Conclusions and Perspectives

The fluctuation of plantation economies in São Tomé resulted in successive waves of migration and changes in social organization, which influenced gene flow between populations and communities over time. Our results suggest that three genetic clusters on the island correspond to significant anthropological and linguistic categories—the Angolares, Forros, and Cabo Verdeans—while two other clusters arise from relatively recent gene flow between these groups. This observation reflects the gradual dismantling of a previously established genetic structure shaped over five centuries of changing historical and social contexts.

To characterize the intricate mosaic of genetic diversity observed in São Tomé, we use the expression “nested admixture” (Fig. 8), which emphasizes the importance of accounting for multiple layers of genetic ancestry resulting from successive admixture events. This is an inherently difficult task with current methods, as demonstrated in several previous studies based on both modern and ancient DNA (Baharian et al. 2016; Busby et al. 2016; Sarno et al. 2017; Wangkumhang and Hellenthal 2018). Recent advances have begun to address some of these challenges with methods that can infer sources without relying on modern populations as proxies, or disentangle contributions from closely related source populations (Salter-Townshend and Myers 2019; Browning et al. 2023); applying them to São Tomé would be of major interest in future studies.

A genealogical perspective on admixture could further help to clarify the complex patterns of genetic ancestry in São Tomé. Recent theoretical developments have explored the relationship between the number of genetic and genealogical ancestors in the Afro-American population (Mooney et al. 2023; Agranat-Tamir et al. 2024), providing useful insights into the extent and dynamics of admixture. Applying this approach to São Tomé may shed new light on how successive admixture events led to the formation of the observed genetic clusters.

Finally, previous genetic studies have suggested that in several populations descended from enslaved Africans, female and male contributions from source populations were not equal, and that mating might have occurred preferentially between individuals with certain genetic ancestry, both processes affecting genome-wide genetic diversity (Goldberg et al. 2014, 2020; Zaitlen et al. 2017; Micheletti et al. 2020; Kim et al. 2021; Ongaro et al. 2021; Korunes et al. 2022; Mas-Sandoval et al. 2023). It is therefore imperative that future studies investigate the impact of ancestry-related sex bias and other non-random mating processes on genetic diversity and admixture patterns in São Tomé.

Materials and Methods

Genetic Dataset

Sampling Strategies

DNA sampling was conducted by two different research groups, one on each archipelago. Sampling strategies are detailed in Coelho et al. 2008 for São Tomé e Príncipe and in Laurent et al. 2023 for Cabo Verde. In São Tomé, the sampling reflected the island's uneven population distribution: because of the island's mountainous geography and peopling history, settlements are mainly in the northeast and along the coast, with few in the centre and southeast and almost none in the forested southwest. Samples were collected at 14 locations, including 11 villages and 3 former plantation estates, representing over 80% of São Tomé's population. Anthropological questionnaires, completed alongside DNA sampling, recorded the birthplaces of each individual and of their parents. DNA was collected with buccal swabs and extracted from saliva samples following standard protocols.

Genotyping and Quality Control

We newly genotyped 100 samples collected in São Tomé e Príncipe for this study using the HumanOmni2.5Million-BeadChip genotyping array read with iScan at the OMICS platform of the Institut Pasteur. Two hundred sixty-one samples from Cabo Verde were previously genotyped on the same chip (albeit different versions) using the same technology at the same platform (Laurent et al. 2023). Therefore, we curated the raw genotypes of 361 samples from São Tomé e Príncipe and Cabo Verde from 5 separate batches produced with different versions of the same genotyping array (supplementary table S1, Supplementary Material online). We conducted a comprehensive Genotyping Quality Control in four phases using the Genotyping Module v1.9.4 from Illumina GenomeStudio and custom Python scripts (supplementary table S2, Supplementary Material online). We retained 2,104,148 autosomal SNPs from 330 unrelated individuals, including 233 unrelated individuals sampled in Cabo Verde and 97 unrelated individuals sampled in São Tomé e Príncipe.

We assigned each individual sample from Cabo Verde and São Tomé e Príncipe to an island of birth based on the individual birthplaces recorded in the family anthropology questionnaires. Of the 233 individuals sampled in Cabo Verde, 225 were born on seven of the archipelago's islands, while seven were born outside Cabo Verde, namely two in São Tomé, three in Angola, one in Brazil, and one in Portugal. One individual was excluded due to missing birthplace information. Of the 97 individuals sampled in São Tomé e Príncipe, all but three were born in São Tomé: one was born in Gabon and two on the island of Príncipe. Finally, we retained 96 individuals born in São Tomé, including the 94 individuals sampled in São Tomé and the two sampled in Cabo Verde. The resulting dataset of 323 individuals born in Cabo Verde and São Tomé e Príncipe (225 born in Cabo Verde, 96 in São Tomé, and two in Príncipe) will be referred to as CVSTP henceforth.

Merging With Worldwide Populations

The resulting dataset of 330 unrelated individuals from São Tomé e Príncipe and Cabo Verde was merged with 2,504 samples from 26 populations worldwide included in the 1000 Genomes Project Phase 3 (Auton et al. 2015), with 1,307 samples from 14 African populations included in the African Genome Variation Project (EGAS00001000959) (Gurdasani et al. 2015), with 188 samples from 15 populations in Mozambique and Angola (E-MTAB-8450) (Semo et al. 2020), and with 1,366 samples from 38 sub-Saharan African populations (EGAS00001002078) (Patin et al. 2017; supplementary table S3, Supplementary Material online). For the Semo et al. (2020) dataset, which was genotyped on a similar version of the Illumina HumanOmni2.5 SNP array as the São Tomé e Príncipe and Cabo Verde samples, we performed the same Genotyping Quality Control steps, starting from the raw genotyping data, as for the CVSTP dataset. After each successive merge, we checked for genetic relatedness among samples up to the second degree using KING (Manichaikul et al. 2010). We finally retained 411,121 SNPs from 5,423 unrelated individuals.
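The relatedness filter described above can be illustrated with a minimal Python sketch. It assumes a table of pairwise kinship coefficients such as those KING reports, and uses 2^-3.5 (about 0.0884), which KING's documentation gives as the lower bound for second-degree relatives. The greedy rule (repeatedly dropping the individual involved in the most related pairs) is a hypothetical choice, since the text does not state which member of each related pair was removed; the function name `unrelated_subset` is likewise illustrative.

```python
from collections import Counter

# KING's documented lower bound for second-degree relatives: 2**-3.5 (about 0.0884).
SECOND_DEGREE_MIN = 2 ** -3.5


def unrelated_subset(samples, kinship_pairs, threshold=SECOND_DEGREE_MIN):
    """Return the samples left once no pair reaches `threshold` kinship.

    samples: list of sample IDs, in a fixed order used to break ties.
    kinship_pairs: dict mapping (id_a, id_b) tuples to kinship coefficients.
    Hypothetical greedy rule: drop the sample involved in the most related
    pairs, preferring the earliest sample in `samples` when counts tie.
    """
    related = [pair for pair, phi in kinship_pairs.items() if phi >= threshold]
    kept = list(samples)
    while related:
        counts = Counter(s for pair in related for s in pair)
        worst = max(counts, key=lambda s: (counts[s], -samples.index(s)))
        kept.remove(worst)
        # Pairs involving the removed sample no longer matter.
        related = [pair for pair in related if worst not in pair]
    return kept
```

For example, with a kinship of 0.25 between A and B and 0.12 between B and C, only B is removed, leaving A, C, and D. Removing the most-connected individual first tends to retain more samples than dropping one member of each pair at random.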

Population Labels and Geographical Regions

The 5,423 genetically unrelated individuals from 107 populations in the final merged dataset were grouped into 16 distinct geographical regions: ten regions within Africa (with CVSTP identified separately), three regions in the Americas, and one region each for Europe, South Asia, and East Asia. When two cardinal points are used to define a region, such as “East Western Africa”, the second term indicates the wider region in Africa (Western, Central, or Southern Africa), while the first term specifies the relative position within that region (e.g. the eastern part of “Western Africa”).

Population Genetics Descriptions

Allele Sharing Dissimilarity

We investigated genetic diversity patterns between each pair of individuals based on successive subsets of populations in the merged dataset. We first computed a matrix of pairwise allele sharing dissimilarities (ASD; Bowcock et al. 1994) including all individuals and all SNPs in the merged dataset, using the ASD software (https://github.com/szpiech/asd). We then explored the first three axes of classical Multi-Dimensional Scaling (MDS) projections of several separate subsets of this pairwise ASD matrix, using the cmdscale function in R (R Core Team 2021). In particular, we removed the East Asian, South Asian, South American, Puerto Rican (PUR), Mexican American (MXL), Central African, and East African populations, as well as four West-Central African hunter-gatherer populations. Figure 2b reports the first two dimensions of the ASD-MDS based on 411,121 autosomal SNPs of 3,203 individuals from 77 populations, and the third dimension is presented in supplementary Fig. S1, Supplementary Material online.
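To make the ASD-MDS procedure concrete, the following Python/NumPy sketch computes allele sharing dissimilarity from a genotype matrix and then applies classical (Torgerson) MDS, the method implemented by R's `cmdscale`. It is not the ASD software used in the study: the 0/1/2 genotype coding, the pairwise skipping of missing genotypes (coded as NaN), and the function names are assumptions made for illustration. For a biallelic SNP, two individuals share 2 - |g_i - g_j| alleles, so the per-SNP dissimilarity 1 - shared/2 equals |g_i - g_j|/2.

```python
import numpy as np


def allele_sharing_dissimilarity(genotypes):
    """Pairwise ASD from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Missing genotypes are np.nan and are skipped pair by pair; a pair with no
    jointly observed SNP would yield nan.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = np.abs(g[i] - g[j]) / 2.0   # 0, 0.5 or 1 per SNP
            ok = ~np.isnan(diff)               # SNPs observed in both individuals
            d[i, j] = d[j, i] = diff[ok].mean()
    return d


def classical_mds(d, k=3):
    """Classical (Torgerson) MDS of a dissimilarity matrix, as in R's cmdscale."""
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring   # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]          # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

Running `classical_mds(allele_sharing_dissimilarity(g), k=3)` on a genotype matrix `g` gives one row of coordinates per individual; subsetting populations, as described above, amounts to selecting rows and columns of the ASD matrix before the MDS step.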

Genetic Clustering

We used ADMIXTURE version 1.3 (Alexander et al. 2009) to further explore genetic resemblances among individuals. Because ADMIXTURE is sensitive to heterogeneous sample sizes, we randomly sampled, without replacement, 20 individuals from each population and removed 7 populations with fewer than 5 individuals, except for the populations of interest in Cabo Verde and São Tomé e Príncipe, for which we retained all individuals. This analysis was performed using 1,347 individuals from 70 populations, referred to as the Working Dataset henceforth (supplementary table S4, Supplementary Material online). We pruned the initial set of 411,121 autosomal SNPs for linkage disequilibrium using the --indep-pairwise option in PLINK, with a 50-SNP window shifted by 10 SNPs and an r² threshold of 0.1. We then ran ADMIXTURE on the resulting 110,499 LD-pruned autosomal SNPs from the 1,347 individuals in the Working Dataset. We performed 10 independent runs of ADMIXTURE for each value of K from 2 to 7 (Fig. 3). We used PONG (Behr et al. 2016) to define ADMIXTURE “modes” with a greedy approach and a similarity threshold of 0.95. The alternative ADMIXTURE mode for K = 5 is presented in supplementary Fig. S2, Supplementary Material online. We evaluated the cross-validation error across 10 independent runs for each of 15 values of K and found that K = 4 yielded the lowest error (supplementary Fig. S3, Supplementary Material online). ADMIXTURE results for K > 7 are reported in supplementary Fig. S4, Supplementary Material online.
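The construction of the Working Dataset and the choice of K by cross-validation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: populations listed in `keep_all` (here standing in for the Cabo Verde and São Tomé e Príncipe labels) bypass both the size filter and the subsampling, the random seed is arbitrary, and K is chosen by the lowest mean cross-validation error over replicate runs. The function names are hypothetical and do not correspond to the authors' scripts.

```python
import random
from collections import defaultdict
from statistics import mean


def build_working_dataset(labels, max_per_pop=20, min_size=5, keep_all=(), seed=1):
    """Subsample individuals per population to even out sample sizes.

    labels: dict mapping individual ID -> population label.
    keep_all: populations retained in full (hypothetical placeholder labels).
    Returns the sorted list of retained individual IDs.
    """
    rng = random.Random(seed)
    by_pop = defaultdict(list)
    for ind, pop in labels.items():
        by_pop[pop].append(ind)
    kept = []
    for pop, inds in sorted(by_pop.items()):
        inds = sorted(inds)
        if pop in keep_all:
            kept.extend(inds)                  # populations of interest kept whole
        elif len(inds) < min_size:
            continue                           # drop small reference populations
        else:
            # Sampling without replacement, capped at max_per_pop individuals.
            kept.extend(rng.sample(inds, min(max_per_pop, len(inds))))
    return sorted(kept)


def best_k(cv_errors):
    """Return the K with the lowest mean cross-validation error.

    cv_errors: dict mapping K -> list of CV errors, one per ADMIXTURE run.
    """
    return min(cv_errors, key=lambda k: mean(cv_errors[k]))
```

Averaging the cross-validation error over replicate runs before taking the minimum is one reasonable reading of "across 10 distinct runs"; picking the single best run per K would be an alternative.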

Local-ancestry Inferences

Phasing With shapeit4

We phased individual genotypes using SHAPEIT4 version 4.2.2 (Delaneau et al. 2019), considering each autosomal chromosome separately with the HapMap Phase 3 Build GRCh38 genetic recombination map (Altshuler et al. 2010), and default parameters for autosomal data: minimum phasing window length of 2.5 Mb, and a total of 15 MCMC iterations, including 7 burn-in, 3 pruning, and 5 main iterations.
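The iteration counts above follow from SHAPEIT4's MCMC iteration scheme, which, as we understand its documentation, defaults to the string `5b,1p,1b,1p,1b,1p,5m` (b = burn-in, p = pruning, m = main). The short Python sketch below tallies such a string; the helper name `count_iterations` is hypothetical, and the default string should be checked against the documentation of the SHAPEIT4 version in use.

```python
import re
from collections import Counter


def count_iterations(scheme):
    """Tally iterations per type in a SHAPEIT4-style iteration-scheme string.

    Tokens look like '5b', '1p', '5m': a count followed by b (burn-in),
    p (pruning) or m (main) iterations.
    """
    counts = Counter()
    for token in scheme.split(","):
        match = re.fullmatch(r"(\d+)([bpm])", token.strip())
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        counts[match.group(2)] += int(match.group(1))
    return counts


default_scheme = "5b,1p,1b,1p,1b,1p,5m"  # assumed SHAPEIT4 default
c = count_iterations(default_scheme)
print(dict(c), sum(c.values()))  # → {'b': 7, 'p': 3, 'm': 5} 15
```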

Chromosome Painting for fineSTRUCTURE

We used the inferential algorithm implemented in ChromoPainter v2 (Lawson et al. 2012) to paint each São Toméan genome as a combination of fragments received from other São Toméan individuals. We first ran ChromoPainter2 on four chromosomes with 10 expectation-maximisation (EM) iterations to estimate the nuisance parameters, obtaining Ne = 792.022979835471 and mu = 0.0020978990636338, which we then used to run ChromoPainter2 on the whole dataset. We averaged the results for all individuals by chromosome.

Clustering São Tomé Individuals With fineSTRUCTURE

We ran fineSTRUCTURE v4 (Lawson et al. 2012) using the co-ancestry matrix, based on the number of chunks shared between each pair of individuals, previously computed with ChromoPainter2. We performed 100,000 burn-in steps to allow the algorithm to reach a stable state, followed by 100,000 further iterations, retaining one sample every 10,000 iterations. Following the MCMC sampling, a tree representing the inferred relationships among individuals was constructed, using the best state observed during the MCMC sampling, which reflects the optimal population structure, as the initial state for tree inference. fineSTRUCTURE classified the 96 São Toméan individuals into 17 clusters. To increase the interpretability of subsequent analyses, and based on the haplotype-sharing patterns of the co-ancestry matrices (Fig. 4c and d), we identified five genetic clusters by cutting the dendrogram at height 4. We provide a more detailed representation of the fineSTRUCTURE dendrogram in supplementary Fig. S5, Supplementary Material online. The PCA in Fig. 4b was calculated on the co-ancestry matrix with the pcares function in R. The third and fourth Principal Components are reported in supplementary Fig. S6, Supplementary Material online. The heatmaps in Fig. 4c and 4d were produced using the function provided by the authors in the FinestructureDendrogram.R script and are capped at 450 and 250 cM, respectively, for visualization purposes. For a visual representation of how consistently pairs of individuals are grouped together across different iterations of the MCMC process, see the pairwise coincidence matrix in supplementary Fig. S7, Supplementary Material online.
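Cutting a dendrogram at a given height, as done above to obtain five clusters, can be illustrated with a small Python sketch. It assumes a toy tree in which each internal node stores the height at which its two children merge, as `(height, left, right)`, with leaves as sample IDs; this representation, the toy tree, and the function names are hypothetical and do not reproduce fineSTRUCTURE's own tree format or scripts.

```python
def leaves(node):
    """Return the leaf labels under a node, from left to right."""
    if isinstance(node, str):
        return [node]
    _height, left, right = node
    return leaves(left) + leaves(right)


def cut_tree(node, height):
    """Cut every branch that merges above `height`; return the resulting clusters."""
    if isinstance(node, str) or node[0] <= height:
        return [leaves(node)]              # the whole subtree forms one cluster
    _height, left, right = node
    return cut_tree(left, height) + cut_tree(right, height)


# Hypothetical toy dendrogram: internal nodes are (merge_height, left, right).
toy_tree = (9.0, (5.0, (2.0, "i1", "i2"), (3.5, "i3", "i4")), (1.0, "i5", "i6"))
print(cut_tree(toy_tree, 4))  # → [['i1', 'i2'], ['i3', 'i4'], ['i5', 'i6']]
```

Raising the cut height merges clusters: on the same toy tree, a cut at height 6 returns two clusters, `['i1', 'i2', 'i3', 'i4']` and `['i5', 'i6']`.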

Chromosome Painting for SOURCEFIND and fastGLOBETROTTER

We used the phased haplotype information to reconstruct the chromosomes of each individual sample from Africa, Europe, and the Americas in the Working Dataset as a series of genomic segments inherited from a set of Donor individuals using ChromoPainter v2 (Lawson et al. 2012). As with the previous ChromoPainter2 run on the São Toméan sample, we first estimated nuisance parameters using 10 EM iterations, this time on a subset of the Working Dataset, following the authors' recommendations. We randomly sampled 3 individuals per population (1/10th of the entire dataset) and selected 4 chromosomes: 1, 7, 14, and 21. We averaged the estimated values across chromosomes, weighted by chromosome size, over the 10 replicate analyses. Finally, we used these estimated nuisance parameters to run the ChromoPainter2 algorithm on the entire Working Dataset.
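The averaging of nuisance parameters described above can be sketched in Python under one possible reading: within each replicate, the per-chromosome estimates of Ne and mu are averaged with chromosome size as weight, and the replicate means are then averaged without weights. This ordering, the size units, and the function names are assumptions for illustration; the text does not specify them further.

```python
def weighted_mean(values, weights):
    """Mean of `values` weighted by `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def average_nuisance(replicates, chrom_sizes):
    """Average (Ne, mu) over replicates, weighting chromosomes by size within each.

    replicates: list of dicts, one per random subsample, mapping
        chromosome -> (Ne, mu) estimated by the EM step.
    chrom_sizes: dict mapping chromosome -> size used as weight
        (for example length in bp or number of SNPs; hypothetical units).
    """
    ne_values, mu_values = [], []
    for rep in replicates:
        chroms = sorted(rep)
        weights = [chrom_sizes[c] for c in chroms]
        ne_values.append(weighted_mean([rep[c][0] for c in chroms], weights))
        mu_values.append(weighted_mean([rep[c][1] for c in chroms], weights))
    # Unweighted mean across replicate subsamples.
    return sum(ne_values) / len(ne_values), sum(mu_values) / len(mu_values)
```

With two chromosomes weighted 3:1, for instance, a replicate with Ne estimates of 200 and 100 contributes a weighted Ne of 175.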

We ran ChromoPainter2 on 411,121 autosomal SNPs from the 1,347 individuals of the Working Dataset to prepare input files for three distinct SOURCEFIND analyses. For the first SOURCEFIND analysis, we ran ChromoPainter2 setting each individual as both Donor and Recipient, except for the admixed individuals of interest from Cabo Verde (CV), São Tomé e Príncipe (STP), African-Barbadians (ACB), and African-Americans (ASW), which were set as Recipient only. Importantly, each Donor population consisted of a maximum of 20 individuals in the Working Dataset. We obtained averaged nuisance parameters of Ne = 228.3939 and mu = 0.00076. We then combined painted chromosomes for each individual in each of the CV, STP, ACB, and ASW populations, separately. For the second and third SOURCEFIND analyses, and for the fastGLOBETROTTER analysis, we ran ChromoPainter2 setting all individuals as both Donor and Recipient, except for the admixed individuals of interest STP, ACB, and ASW, which were set as Recipient only. Importantly, the nine CV populations were set as both Donor and Recipient. We obtained averaged nuisance parameters of Ne = 193.2902 and mu = 0.00040. We then combined painted chromosomes for each individual in each of the STP, ACB, and ASW populations, separately. Finally, we ran ChromoPainter2 setting all individuals as both Donor and Recipient to obtain a symmetric matrix for PCA (supplementary Fig. S9, Supplementary Material online), computed using the eigen function in R on the normalized matrix of chunk counts for all individuals in the Working Dataset.
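A PCA of the kind mentioned at the end of the paragraph above can be sketched in Python/NumPy. The text does not state how the chunk-count matrix was normalized, so the choices below (scaling each recipient's row to sum to 1, then centring columns and eigen-decomposing the covariance matrix) are hypothetical; `numpy.linalg.eigh` plays the role of R's `eigen` for a symmetric matrix.

```python
import numpy as np


def pca_of_chunk_counts(counts, n_components=2):
    """PCA of a (recipients x donors) matrix of ChromoPainter chunk counts.

    Hypothetical normalisation: each row is scaled to sum to 1 (rows must have
    non-zero totals), then columns are centred before eigen-decomposing the
    donor-by-donor covariance matrix.
    Returns (scores, eigenvalues), largest component first.
    """
    x = np.asarray(counts, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)    # copying proportions per recipient
    x = x - x.mean(axis=0)                  # centre each donor column
    cov = x.T @ x / (x.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)        # ascending order for symmetric input
    order = np.argsort(vals)[::-1][:n_components]
    return x @ vecs[:, order], vals[order]
```

Each row of the returned scores gives one individual's coordinates on the leading principal components, which is what a PCA plot of the painting results displays.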

Estimating Possible Source Populations Using SOURCEFIND

We applied the model-based method SOURCEFIND (Chacón-Duque et al. 2018) to estimate shared haplotypic ancestry among populations based on chromosome painting. We modelled the copying vector of each “Target” admixed individual, obtained with the ChromoPainter2 analyses described above, as a weighted mixture of the copying vectors of a set of “Surrogate” individuals. Target and Surrogate individuals were allowed to copy from the same set of Donors in the ChromoPainter2 analysis, namely all populations included in the Working Dataset except the Target populations of interest.
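What “a weighted mixture of copying vectors” means can be illustrated with a plain non-negative least-squares fit using scipy.optimize.nnls. This is a deliberately simpler stand-in, not SOURCEFIND's Bayesian MCMC model; the function name and example vectors are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def mixture_proportions(target: np.ndarray, surrogates: np.ndarray) -> np.ndarray:
    """Approximate a Target copying vector as a non-negative mixture of Surrogate vectors.

    target: length-D copying vector of one Target individual (D Donor groups).
    surrogates: D x S matrix whose columns are Surrogate copying vectors.
    Returns S non-negative weights rescaled to sum to 1.
    """
    weights, _residual_norm = nnls(surrogates, target)  # min ||S w - t|| subject to w >= 0
    return weights / weights.sum()                       # express as mixture proportions
```

SOURCEFIND instead samples the mixture weights by MCMC, governed by settings such as the maximum and expected number of Surrogates described below, rather than returning a single least-squares solution.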

We ran three separate analyses to model different admixture hypotheses. In the first analysis (Fig. 5a), we used the CV, STP, ACB, and ASW populations as Targets and all other populations in the dataset as Surrogates. In the second analysis (Fig. 5b), we used the STP, ACB, and ASW populations as Targets and all other populations in the dataset, including the CV populations, as Surrogates. To reduce the influence of sample-size differences among CV populations, these were randomly subsampled to a maximum of 20 individuals per island. In the third analysis (Fig. 5c), we used the five São Toméan genetic clusters as separate Targets and the same set of Surrogate populations as in the second analysis. Finally, we aggregated the results obtained for all individuals within each Target population, separately.
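The per-island subsampling can be sketched as follows. This minimal example assumes a mapping from individual ID to island label and uses a fixed random seed for reproducibility; the function name, labels, and sample sizes in the test are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List

def subsample_per_population(pop_of: Dict[str, str], max_n: int = 20, seed: int = 0) -> List[str]:
    """Return individual IDs with at most max_n randomly drawn per population.

    pop_of maps individual ID -> population (here, island) label.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    members: Dict[str, List[str]] = defaultdict(list)
    for ind, pop in pop_of.items():
        members[pop].append(ind)
    kept: List[str] = []
    for pop in sorted(members):
        ids = sorted(members[pop])
        # populations at or below the cap are kept whole
        kept.extend(ids if len(ids) <= max_n else rng.sample(ids, max_n))
    return kept
```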

We conducted 20 independent SOURCEFIND runs for each analysis. Each Target individual was modelled as a mixture of Surrogate individuals only (i.e., excluding self-copying, parameter self.copy.ind), allowing a maximum of 8 Surrogates (num.surrogates) with an expected number of 4 Surrogates (exp.num.surrogates), and dividing each Target individual's genome into 100 slots (num.slots) that could differ in ancestry. We ran 400,000 MCMC iterations (num.iterations), discarded the first 100,000 as burn-in (num.burnin), and sampled one iteration every 10,000 (num.thin). This yielded M = (num.iterations - num.burnin)/num.thin = 30 retained MCMC samples per run. For each Target population, we retained as the final result the inferred mixture contributions from the MCMC sample with the highest posterior probability across the 20 independent runs (supplementary Fig. S8, Supplementary Material online).
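The retained-sample count and the selection of the best sample across runs can be sketched as below. The in-memory representation of each run as (log posterior, proportions) pairs is a hypothetical structure chosen for illustration, not SOURCEFIND's output format.

```python
from typing import Dict, List, Tuple

# One retained MCMC sample: (log posterior, {Surrogate: mixture proportion}).
Sample = Tuple[float, Dict[str, float]]

def n_retained_samples(num_iterations: int, num_burnin: int, num_thin: int) -> int:
    """M = (num.iterations - num.burnin) / num.thin."""
    return (num_iterations - num_burnin) // num_thin

def best_sample(runs: List[List[Sample]]) -> Dict[str, float]:
    """Mixture proportions of the sample with the highest posterior over all runs."""
    _log_post, proportions = max((s for run in runs for s in run), key=lambda s: s[0])
    return proportions
```

With the settings above, n_retained_samples(400_000, 100_000, 10_000) gives 30, so the final estimate is chosen among 20 × 30 = 600 samples per Target population.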

Dating Admixture Events With fastGLOBETROTTER

We used fastGLOBETROTTER (Wangkumhang et al. 2022) to infer admixture using, for each Target São Toméan cluster, the Surrogate populations inferred by SOURCEFIND, and to date admixture events based on the LD decay patterns among haplotypes matching any pair of Surrogates. We used the same coancestry matrix as in the third SOURCEFIND analysis. Additionally, for dating admixture events, we prepared a copying-vector file in which each Target genome copies from each individual genome in the Donor population samples. We used all populations of the reference panel, including CV, as Donors, and the populations inferred by SOURCEFIND for each São Toméan cluster as Surrogates.
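GLOBETROTTER-type methods date admixture by fitting coancestry curves with an exponential decay whose rate, with genetic distance measured in Morgans, estimates the number of generations since admixture (Hellenthal et al. 2014). The sketch below fits a single-date curve y(d) = c + A·exp(−λd) by grid search over integer λ from 1 to 400, solving for A and c by ordinary least squares. It is a simplified illustration under these assumptions, not fastGLOBETROTTER's fitting procedure; the function name and synthetic data are hypothetical.

```python
import math
from typing import List, Tuple

def fit_one_date(distances_morgans: List[float], coancestry: List[float],
                 max_generations: int = 400) -> Tuple[int, float]:
    """Grid-search lambda (generations) in y(d) = c + A * exp(-lambda * d).

    For each candidate lambda, A and c are solved by simple linear regression
    of y on exp(-lambda * d). Returns (best lambda, R^2 of that fit).
    """
    n = len(coancestry)
    y_mean = sum(coancestry) / n
    sst = sum((y - y_mean) ** 2 for y in coancestry)
    best_lam, best_r2 = 0, -math.inf
    for lam in range(1, max_generations + 1):
        x = [math.exp(-lam * d) for d in distances_morgans]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, coancestry))
        a = sxy / sxx                     # least-squares amplitude A
        c = y_mean - a * x_mean           # least-squares baseline c
        ssr = sum((yi - c - a * xi) ** 2 for xi, yi in zip(x, coancestry))
        r2 = 1.0 - ssr / sst
        if r2 > best_r2:
            best_lam, best_r2 = lam, r2
    return best_lam, best_r2
```

On noise-free synthetic data generated with λ = 20 over 1 to 50 cM, the grid search recovers 20 generations with an R² of essentially 1.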

We first ran fastGLOBETROTTER separately for each cluster with “null.ind: 1”, inferring admixture proportions, dates, and sources over 5 iterations, and performing 100 bootstrap resamples. In all bootstrap resamples for all Target São Toméan clusters, no date estimate was less than or equal to 1 or greater than or equal to 400; the P-value for evidence of “any detectable admixture” was therefore 0 in all cases. Here, we present the output of a second fastGLOBETROTTER run with “null.ind: 0”; the first and second runs yielded consistent best-guess admixture dates. The “null individual” analysis is meant to detect signals in the LD decay curve caused by strong genetic drift (Hellenthal et al. 2014). Moreover, fastGLOBETROTTER can detect whether the left end of the coancestry curve (i.e., short genetic distances) is affected by long chunks and, if so, trim it before model fitting. Despite these corrections, the authors of fastGLOBETROTTER advise caution, as LD effects can still influence results (Wangkumhang et al. 2022).
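The decision rule above reduces to a simple proportion over bootstrap replicates. The sketch below implements it with the bounds of 1 and 400 generations stated in the text; the function name is hypothetical.

```python
from typing import Sequence

def no_admixture_p_value(bootstrap_dates: Sequence[float],
                         low: float = 1.0, high: float = 400.0) -> float:
    """Fraction of bootstrap date estimates (in generations) <= low or >= high."""
    extreme = sum(1 for g in bootstrap_dates if g <= low or g >= high)
    return extreme / len(bootstrap_dates)
```

If none of the 100 bootstrap dates reaches either bound, as reported for all São Toméan clusters, the function returns 0.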

Four São Toméan clusters were best fitted by a single admixture pulse, whereas cluster C5 fitted multiple pulses: the additional goodness of fit (R²) gained by adding a second date was 0.44, above the threshold of 0.35 suggested by Hellenthal et al. (2014). For all pairs of Surrogate populations of cluster C5, the coancestry curves showed an increased probability of copying DNA segments separated by small genetic distances (in cM). However, from 2 cM onwards, the 1-date curve fit the data as well as the 2-date curve. We therefore retained the 1-date results for C5, also because the 2-date results implied a pulse of admixture between Cabo Verdean populations 197 generations ago, which is highly improbable since, according to historical records, Cabo Verdean populations were founded in the late 15th century (Albuquerque and Santos 1991). Confidence intervals for the date of admixture were obtained from 100 bootstrap replicates (Fig. 7).
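The implausibility of the 197-generation estimate follows from simple arithmetic. This sketch assumes a generation time of 28 years, a sampling year of 2020, and a founding date of about 1460; none of these values is taken from this study, and other reasonable choices shift the numbers without changing the conclusion.

```python
def generations_to_year(generations: float, sampling_year: int = 2020,
                        generation_time: float = 28.0) -> float:
    """Approximate calendar year of an admixture pulse dated in generations.

    sampling_year and generation_time are illustrative assumptions.
    Negative values are years BCE (ignoring the absence of a year 0).
    """
    return sampling_year - generations * generation_time

def max_generations_since(year: int, sampling_year: int = 2020,
                          generation_time: float = 28.0) -> float:
    """Upper bound on generations elapsed since a historical founding year."""
    return (sampling_year - year) / generation_time
```

Under these assumptions, 197 generations place the pulse around 3500 BCE, whereas a founding around 1460 allows for only about 20 generations of admixture among Cabo Verdean populations.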

Recent Shared Ancestry Tracts

Long Identical by Descent Tracts

We used hap-IBD to detect identity-by-descent (IBD) segments in the phased data of 96 São Toméan and 225 Cabo Verdean unrelated individuals. We used the same genetic map as for phasing, and the seed-and-extend algorithm implemented in hap-IBD with default parameters, including a maximum non-IBS gap of 1,000 base pairs and a minimum extension length of 1 cM. We found a total of 259,624 IBD segments shared between the 323 unrelated individuals from São Tomé and Cabo Verde. We retained only IBD segments longer than 18 cM, thereby focusing on approximately the last 8 generations of recombination according to previous calculations (Baharian et al. 2016). By summing the lengths of all shared IBD segments longer than 18 cM, we calculated the cumulative IBD sharing between each pair of individuals. We plotted the resulting matrix as a heat map, capped at 10,000 cM (supplementary Fig. S10, Supplementary Material online). For Fig. 6a, we built a network based on this matrix using the “networkx” package in Python (Hagberg et al. 2008).
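The pairwise cumulative-sharing computation can be sketched in plain Python. It assumes that segments have already been parsed from the hap-IBD output into (sample 1, sample 2, length in cM) tuples; the function name and toy segments are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

def cumulative_ibd(segments: Iterable[Tuple[str, str, float]], min_cm: float = 18.0,
                   cap_cm: Optional[float] = None) -> Dict[Tuple[str, str], float]:
    """Total length (cM) of IBD segments longer than min_cm shared by each pair.

    segments: (sample_1, sample_2, length_cM) tuples, one per IBD segment.
    cap_cm: optional ceiling on each pair's total, e.g. 10,000 cM for display.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for ind1, ind2, length_cm in segments:
        if length_cm > min_cm:
            a, b = sorted((ind1, ind2))  # unordered pair key
            totals[(a, b)] += length_cm
    if cap_cm is None:
        return dict(totals)
    return {pair: min(total, cap_cm) for pair, total in totals.items()}
```

The resulting pair totals can then be passed as weighted edges to a networkx graph, for example with the Graph.add_weighted_edges_from method.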

Runs of Homozygosity

We called runs of homozygosity (ROH) with GARLIC (Szpiech et al. 2017), treating the five genetic clusters in São Tomé as separate populations, together with 59 individuals from the island of Santiago in Cabo Verde, 20 Gambian (GWD), 20 Iberian (IBS), 11 Kongo, and 20 Yoruba (YRI) individuals. For each population separately, we ran GARLIC using the weighted logarithm of the odds (wLOD) score (Blant et al. 2017), with a genotyping-error rate of 0.001, the same genetic map as for phasing, window sizes ranging from 30 to 90 SNPs in increments of 10 SNPs, 100 resamplings to estimate allele frequencies, and all other GARLIC parameters set to their default values.
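To illustrate what a window score measures, the sketch below computes a common unweighted form of the autozygosity LOD score for one SNP window, using the genotyping-error rate of 0.001 given above. GARLIC's wLOD additionally weights SNPs (Blant et al. 2017), so this is a simplified stand-in, not GARLIC's code; the function name and test genotypes are hypothetical, and allele frequencies must lie strictly between 0 and 1.

```python
import math
from typing import Sequence

def window_lod(genotypes: Sequence[int], freqs: Sequence[float], error: float = 0.001) -> float:
    """Unweighted LOD for autozygosity vs Hardy-Weinberg genotypes in one window.

    genotypes: count of the reference allele (0, 1, 2) at each SNP.
    freqs: population frequency of the reference allele at each SNP, in (0, 1).
    """
    lod = 0.0
    for g, p in zip(genotypes, freqs):
        q = 1.0 - p
        if g == 2:
            p_auto, p_hwe = (1 - error) * p + error * p * p, p * p
        elif g == 0:
            p_auto, p_hwe = (1 - error) * q + error * q * q, q * q
        else:  # heterozygote: compatible with autozygosity only via genotyping error
            p_auto, p_hwe = error * 2 * p * q, 2 * p * q
        lod += math.log10(p_auto / p_hwe)
    return lod

WINDOW_SIZES = list(range(30, 91, 10))  # 30, 40, ..., 90 SNPs, as in the text
```

Homozygous windows accumulate positive scores, while each heterozygous genotype subtracts about 3 LOD units at this error rate, which is why a few heterozygotes are enough to break a candidate ROH.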

GARLIC estimates ROH using population-specific parameters: the window size for ROH detection and the length boundaries for ROH classification were determined independently for each population, based on allele frequency spectra and on the distribution of ROH lengths, respectively (supplementary table S5, Supplementary Material online). As a result, direct comparisons of ROH levels across studies can be challenging. Denser SNP panels enable the detection of more, and shorter, ROH, which likely contributes to the differences in total ROH and in the class length boundaries determined by GARLIC between our study and that of Korunes et al. (2022), which used approximately 411k and 880k independent SNPs, respectively.
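The effect of population-specific class boundaries can be shown with a small sketch that sums ROH lengths into three classes. The boundary values and ROH lengths in the test are hypothetical, and the class labels are generic rather than GARLIC's own output labels.

```python
from typing import Dict, Sequence

def total_roh_by_class(lengths_bp: Sequence[int], short_max: int, medium_max: int) -> Dict[str, int]:
    """Sum ROH lengths into classes using population-specific boundaries.

    short: length <= short_max; medium: short_max < length <= medium_max;
    long: length > medium_max.
    """
    totals = {"short": 0, "medium": 0, "long": 0}
    for length in lengths_bp:
        if length <= short_max:
            totals["short"] += length
        elif length <= medium_max:
            totals["medium"] += length
        else:
            totals["long"] += length
    return totals
```

The same three segments of 0.1, 0.5, and 2 Mb are split into short, medium, and long classes under boundaries of 0.3 and 1 Mb, but into short and medium only under boundaries of 0.6 and 3 Mb, so class-wise totals are not directly comparable across populations or studies.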

Supplementary Material

msaf156_Supplementary_Data

Acknowledgments

The authors would like to thank all the São Toméan participants who contributed to this study. We also thank the “Paléogénomique et génétique moléculaire” (P2GM) platform at the Muséum National d’Histoire Naturelle, Musée de l’Homme, for their assistance in handling biological samples and generating genetic data, and the BIOMICS platform at the Institut Pasteur for carrying out the genotyping analyses.

Contributor Information

Marta Ciccarella, UMR7206 Eco-Anthropologie, CNRS, MNHN, Université Paris Cité, Paris, France; CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão 4485-661, Portugal.

Romain Laurent, UMR7206 Eco-Anthropologie, CNRS, MNHN, Université Paris Cité, Paris, France.

Zachary A Szpiech, Department of Biology, Penn State University, PA, USA; Institute for Computational and Data Sciences, Penn State University, PA, USA.

Etienne Patin, Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, Paris, France.

Françoise Dessarps-Freichey, UMR7206 Eco-Anthropologie, CNRS, MNHN, Université Paris Cité, Paris, France.

José Utgé, UMR7206 Eco-Anthropologie, CNRS, MNHN, Université Paris Cité, Paris, France.

Laure Lémée, Plateforme Technologique Biomics, C2RT, Institut Pasteur, Paris, France.

Armando Semo, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão 4485-661, Portugal.

Jorge Rocha, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão 4485-661, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto 4099-002, Portugal.

Paul Verdu, UMR7206 Eco-Anthropologie, CNRS, MNHN, Université Paris Cité, Paris, France.

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online.

Author Contributions

M.C.: designed the study, conducted raw data QC and merging, conducted statistical and population genetics analyses, analysed results, and wrote the first draft of the article. R.L.: conducted raw data QC and merging. Z.A.S.: helped design statistical and population genetics analyses. E.P.: contributed raw data. F.D.-F.: generated molecular genetic raw data. J.U.: generated molecular genetic raw data. L.L.: generated molecular genetic raw data. A.S.: contributed raw data. J.R.: designed the study, contributed samples, analysed results, and participated in writing the first draft of the article. P.V.: designed the study, contributed samples, conducted statistical and population genetics analyses, analysed results, and participated in writing the first draft of the article. All authors: participated in writing the article.

Funding

This project was partially funded by the French Agence Nationale de la Recherche (ANR) under grant ANR METHIS 15-CE32-0009-1. M.C. was supported by the Fundação para a Ciência e a Tecnologia (FCT), Portugal. Z.A.S. was supported by the National Institute of General Medical Sciences under award number R35GM146926.

Data Availability

The novel genetic data presented here can be accessed via the European Genome-phenome Archive (EGA) database with study number EGAS50000000920, upon request to the corresponding Data Access Committee. The dataset can be shared provided that future envisioned studies comply with the informed consents provided by the participants and with the recommendations of the institutional ethics committees applicable to these data.

Ethics Statement

Research sampling protocols followed the Declaration of Helsinki guidelines and written informed consent was obtained from all subjects involved in the sampling in São Tomé and Cabo Verde. The study of the São Toméan sample was undertaken with the support and permission of the Provincial Government of the Ministry of Health of the Democratic Republic of São Tomé and Príncipe, and the Provincial Government of Príncipe. For the Cabo Verdean sample, research and ethics authorizations were provided by the Ministério da Saúde de Cabo Verde (228/DGS/11), and the French ethics committees and CNIL (Declaration n°1972648).

References

  1. Agranat-Tamir L, Mooney JA, Rosenberg NA. Counting the genetic ancestors from source populations in members of an admixed population. Genetics. 2024:226(4):iyae011. 10.1093/genetics/iyae011.
  2. Albuquerque Ld, Santos M. História geral de Cabo Verde. Lisboa: Centro de Estudos de História e Cartografia Antiga; 1991.
  3. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009:19(9):1655–1664. 10.1101/gr.094052.109.
  4. Almeida J, Fehn A-M, Ferreira M, Machado T, Hagemeijer T, Rocha J, Gayà-Vidal M. The genes of freedom: genome-wide insights into marronage, admixture and ethnogenesis in the Gulf of Guinea. Genes (Basel). 2021:12(6):833. 10.3390/genes12060833.
  5. Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010:467(7311):52–58. 10.1038/nature09298.
  6. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, et al. A global reference for human genetic variation. Nature. 2015:526(7571):68–74. 10.1038/nature15393.
  7. Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, Bustamante C, Kenny EE, Williams SM, Aldrich MC, et al. The great migration and African-American genomic diversity. PLoS Genet. 2016:12(5):e1006059. 10.1371/journal.pgen.1006059.
  8. Behr AA, Liu KZ, Liu-Fang G, Nakka P, Ramachandran S. Pong: fast analysis and visualization of latent clusters in population genetic data. Bioinformatics. 2016:32(18):2817–2823. 10.1093/bioinformatics/btw327.
  9. Beleza S, Campos J, Lopes J, Araújo II, Almada AH, Silva AC, Parra EJ, Rocha J. The admixture structure and genetic variation of the archipelago of Cape Verde and its implications for admixture mapping studies. PLoS One. 2012:7(11):e51103. 10.1371/journal.pone.0051103.
  10. Blant A, Kwong M, Szpiech ZA, Pemberton TJ. Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics. 2017:18(1):928. 10.1186/s12864-017-4312-3.
  11. Bouchard M-E. Scaling proximity to whiteness: racial boundary-making on São Tomé island. Ethnography. 2023:24(2):197–126. 10.1177/1466138120967373.
  12. Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL. High resolution of human evolutionary trees with polymorphic microsatellites. Nature. 1994:368(6470):455–457. 10.1038/368455a0.
  13. Browning SR, Waples RK, Browning BL. Fast, accurate local ancestry inference with FLARE. Am J Hum Genet. 2023:110(2):326–335. 10.1016/j.ajhg.2022.12.010.
  14. Busby GB, Band G, Si Le Q, Jallow M, Bougama E, Mangano VD, Amenga-Etego LN, Enimil A, Apinjoh T, Ndila CM, et al. Admixture into and within sub-Saharan Africa. eLife. 2016:5:e15266. 10.7554/eLife.15266.
  15. Caldeira AM. Mestiçagem, estratégias de casamento e propriedade feminina no arquipélago de São Tomé e Príncipe nos séculos XVI, XVII e XVIII. ARQUIPÉLAGO. História. 2007:11–12:49–71. http://hdl.handle.net/10400.3/624.
  16. Caldeira AM. Learning the ropes in the tropics: slavery and the plantation system on the island of São Tomé. Afr Econ Hist. 2011:39:35–71. https://www.jstor.org/stable/23718978.
  17. Caldeira AM. Escravos e traficantes no império português: o comércio negreiro português no Atlântico durante os séculos XV a XIX. Lisboa: A Esfera dos Livros; 2013.
  18. Caldeira AM. The island trade route of São Tomé in the 16th century: ships, products, capitals. Riv Ist Stor Eur Mediterr RiMe. 2021:9:55–76. 10.7410/1507.
  19. Carreira A. Cabo Verde: formação e extinção de uma sociedade escravocrata (1460-1878). Bissau: Centro de Estudos da Guiné Portuguesa; 1972.
  20. Carreira A. Migrações nas ilhas de Cabo Verde. Lisboa: Universidade Nova; 1976.
  21. Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018:19(4):220–234. 10.1038/nrg.2017.109.
  22. Chacón-Duque J-C, Adhikari K, Fuentes-Guajardo M, Mendoza-Revilla J, Acuña-Alonzo V, Barquera R, Quinto-Sánchez M, Gómez-Valdés J, Everardo Martínez P, Villamil-Ramírez H, et al. Latin Americans show wide-spread Converso ancestry and imprint of local Native ancestry on physical appearance. Nat Commun. 2018:9(1):5388. 10.1038/s41467-018-07748-z.
  23. Coelho M, Coia CAV, Luiselli D, Useli A, Hagemeijer T, Amorim A, Destro-Bisol G, Rocha J. Human microevolution and the Atlantic slave trade. A case study from São Tomé. Curr Anthropol. 2008:49(1):134–143. 10.1086/524762.
  24. Curtin PD. The rise and fall of the plantation complex: essays in Atlantic history. Cambridge: Cambridge University Press; 1990.
  25. Delaneau O, Zagury J-F, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun. 2019:10(1):5436. 10.1038/s41467-019-13225-y.
  26. Eltis D, Richardson D. Atlas of the transatlantic slave trade. New Haven: Yale University Press; 2010.
  27. Fortes-Lima C, Gessain A, Ruiz-Linares A, Bortolini M-C, Migot-Nabias F, Bellis G, Moreno-Mayar JV, Restrepo BN, Rojas W, Avendaño-Tamayo E, et al. Genome-wide ancestry and demographic history of African-descendant maroon communities from French Guiana and Suriname. Am J Hum Genet. 2017:101(5):725–736. 10.1016/j.ajhg.2017.09.021.
  28. Fortes-Lima C, Laurent R, Thouzeau V, Toupance B, Verdu P. Complex genetic admixture histories reconstructed with approximate Bayesian computation. Mol Ecol Resour. 2021:21(4):1098–1117. 10.1111/1755-0998.13325.
  29. Fortes-Lima C, Verdu P. Anthropological genetics perspectives on the transatlantic slave trade. Hum Mol Genet. 2020:30:R79–R87. 10.1093/hmg/ddaa271.
  30. Goldberg A, Rastogi A, Rosenberg NA. Assortative mating by population of origin in a mechanistic model of admixture. Theor Popul Biol. 2020:134:129–146. 10.1016/j.tpb.2020.02.004.
  31. Goldberg A, Verdu P, Rosenberg NA. Autosomal admixture levels are informative about sex bias in admixed populations. Genetics. 2014:198(3):1209–1229. 10.1534/genetics.114.166793.
  32. Gonçalves R, Hagemeijer T. O português num contexto multilingue: o caso de São Tomé e Príncipe. Rev Cient UEM: Sér Ciênc Soc. 2015:1:87–107. http://hdl.handle.net/10451/31032.
  33. Gravel S. Population genetics models of local ancestry. Genetics. 2012:191(2):607–619. 10.1534/genetics.112.139808.
  34. Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, Karthikeyan S, Iles L, Pollard MO, Choudhury A, et al. The African genome variation project shapes medical genetics in Africa. Nature. 2015:517(7534):327–332. 10.1038/nature13997.
  35. Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. Proc SciPy. 2008:1:11–15. 10.25080/TCWV9851.
  36. Hagemeijer T. The Gulf of Guinea creoles: genetic and typological relations. J Pidgin Creole Lang. 2011:26(1):111–154. 10.1075/jpcl.26.1.05hag.
  37. Hagemeijer T, Rocha J. Creole languages and genes: the case of São Tomé and Príncipe. Faits Lang. 2019:49(1):167–182. 10.1163/19589514-04901011.
  38. Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. A genetic atlas of human admixture history. Science. 2014:343(6172):747–751. 10.1126/science.1243518.
  39. Henriques IC. São Tomé e Príncipe: a invenção de uma sociedade. Lisboa: Vega; 2000.
  40. Instituto Nacional de Estatística. Dados Distritais e Nacional Recenseamento 2012. São Tomé e Príncipe; 2012. https://www.ine.st/index.php/publicacao/documentos/category/71-dados-distritais-e-nacional-recenseamento-2012.
  41. Kim J, Edge MD, Goldberg A, Rosenberg NA. Skin deep: the decoupling of genetic admixture levels from phenotypes that differed between source populations. Am J Phys Anthropol. 2021:175(2):406–421. 10.1002/ajpa.24261.
  42. Korunes KL, Soares-Souza GB, Bobrek K, Tang H, Araújo II, Goldberg A, Beleza S. Sex-biased admixture and assortative mating shape genetic variation and influence demographic inference in admixed Cabo Verdeans. G3 Genes|Genomes|Genetics. 2022:12:jkac183. 10.1093/g3journal/jkac183.
  43. Laurent R, Szpiech ZA, da Costa SS, Thouzeau V, Fortes-Lima CA, Dessarps-Freichey F, Lémée L, Utgé J, Rosenberg NA, Baptista M, et al. A genetic and linguistic analysis of the admixture histories of the islands of Cabo Verde. eLife. 2023:12:e79827. 10.7554/eLife.79827.
  44. Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet. 2012:8:e1002453. 10.1371/journal.pgen.1002453.
  45. Lorenzino G. The Angolar creole Portuguese of São Tomé: its grammar and sociolinguistic history. Munchen: LINCOM Europa; 1998.
  46. Lucas PG. The demography of São Tomé and Príncipe (1758–1822): preliminary approaches to an insular slave society. Anais Hist Alem Mar. 2015:XVI:51–78.
  47. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010:26:2867–2873. 10.1093/bioinformatics/btq559.
  48. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human demographic history impacts genetic risk prediction across diverse populations. Am J Hum Genet. 2017:100:635–649. 10.1016/j.ajhg.2017.03.004.
  49. Mas-Sandoval A, Mathieson S, Fumagalli M. The genomic footprint of social stratification in admixing American populations. eLife. 2023:12:e84429. 10.7554/eLife.84429.
  50. Micheletti S, Bryc K, Esselmann S, Freyman W, Moreno M, Poznik G, Shastri A, Beleza S, Mountain J, Agee M, et al. Genetic consequences of the transatlantic slave trade in the Americas. Am J Hum Genet. 2020:107:265–277. 10.1016/j.ajhg.2020.06.012.
  51. Mooney JA, Agranat-Tamir L, Pritchard JK, Rosenberg NA. On the number of genealogical ancestors tracing to the source groups of an admixed population. Genetics. 2023:224:iyad079. 10.1093/genetics/iyad079.
  52. Mooney JA, Huber CD, Service S, Sul JH, Marsden CD, Zhang Z, Sabatti C, Ruiz-Linares A, Bedoya G, Fears SC, et al. Understanding the hidden complexity of Latin American population isolates. Am J Hum Genet. 2018:103:707–726. 10.1016/j.ajhg.2018.09.013.
  53. Nascimento A. Poderes e Quotidiano nas Roças de S. Tomé e Príncipe. De finais de oitocentos a meados de novecentos. Lousa: Tipografia Lousanense; 2002.
  54. Nascimento A. O Sul da diáspora: Cabo-Verdianos em plantações de S. Tomé e Príncipe e Moçambique. Praia: Edição da Presidência da República de Cabo Verde; 2003.
  55. Nascimento A. Exile and contract: journeys of the Mozambicans to S. Tomé and Príncipe (1940 to 1960). Lisboa: Centro de História da Universidade de Lisboa; 2023.
  56. Ongaro L, Molinaro L, Flores R, Marnetto D, Capodiferro MR, Alarcón-Riquelme ME, Moreno-Estrada A, Mabunda N, Ventura M, Tambets K, et al. Evaluating the impact of sex-biased genetic admixture in the Americas through the analysis of haplotype data. Genes (Basel). 2021:12:1580. 10.3390/genes12101580.
  57. Ongaro L, Scliar MO, Flores R, Raveane A, Marnetto D, Sarno S, Gnecchi-Ruscone GA, Alarcón-Riquelme ME, Patin E, Wangkumhang P, et al. The genomic impact of European colonization of the Americas. Curr Biol. 2019:29:3974–3986.e4. 10.1016/j.cub.2019.09.076.
  58. Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, Laval G, Perry GH, Barreiro LB, Froment A, et al. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science. 2017:356:543–546. 10.1126/science.aal1988.
  59. Pemberton TJ, Absher D, Feldman MW, Myers RM, Rosenberg NA, Li JZ. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 2012:91:275–292. 10.1016/j.ajhg.2012.06.014.
  60. Pool JE, Nielsen R. Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics. 2009:181:711–719. 10.1534/genetics.108.098095.
  61. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. https://www.R-project.org/.
  62. Salter-Townshend M, Myers S. Fine-scale inference of ancestry segments without prior knowledge of admixing groups. Genetics. 2019:212:869–889. 10.1534/genetics.119.302139.
  63. Sarno S, Boattini A, Pagani L, Sazzini M, De Fanti S, Quagliariello A, Gnecchi Ruscone GA, Guichard E, Ciani G, Bortolini E, et al. Ancient and recent admixture layers in Sicily and Southern Italy trace multiple migration routes along the Mediterranean. Sci Rep. 2017:7:1984. 10.1038/s41598-017-01802-4.
  64. Seibert G. A questão da origem dos Angolares de São Tomé. CEsA Brief Papers. 1998:5. http://hdl.handle.net/10400.5/2112.
  65. Seibert G. Tenreiro, Amador e os Angolares ou a reinvenção da história de São Tomé. In: Silva MC, Saraiva C, editors. Antropologia, história, África e academia. Lisboa: Etnográfica Press; 2013. p. 171–185.
  66. Seibert G. Crioulização em Cabo Verde e São Tomé e Príncipe: divergências históricas e identitárias. Afro-Ásia. 2014:49:41–70. 10.1590/S0002-05912014000100002.
  67. Seibert G. Colonialismo em São Tomé e Príncipe: hierarquização, classificação e segregação da vida social. Anuário Antropol. 2015:40:99–120. 10.4000/aa.1411.
  68. Semo A, Gayà-Vidal M, Fortes-Lima C, Alard B, Oliveira S, Almeida J, Prista A, Damasceno A, Fehn A-M, Schlebusch C, et al. Along the Indian Ocean coast: genomic variation in Mozambique provides new insights into the Bantu expansion. Mol Biol Evol. 2020:37:406–416. 10.1093/molbev/msz224.
  69. Szpiech ZA, Blant A, Pemberton TJ. GARLIC: genomic autozygosity regions likelihood-based inference and classification. Bioinformatics. 2017:33:2059–2062. 10.1093/bioinformatics/btx102.
  70. Szpiech ZA, Mak ACY, White MJ, Hu D, Eng C, Burchard EG, Hernandez RD. Ancestry-dependent enrichment of deleterious homozygotes in runs of homozygosity. Am J Hum Genet. 2019:105:747–762. 10.1016/j.ajhg.2019.08.011.
  71. Tomas G, Seco L, Seixas S, Faustino P, Lavinha J, Rocha J. The peopling of São Tomé (Gulf of Guinea): origins of slave settlers and admixture with the Portuguese. Hum Biol. 2002:74:397–411. 10.1353/hub.2002.0036.
  72. Wangkumhang P, Greenfield M, Hellenthal G. An efficient method to identify, date, and describe admixture events using haplotype information. Genome Res. 2022:32:1553–1564. 10.1101/gr.275994.121.
  73. Wangkumhang P, Hellenthal G. Statistical methods for detecting admixture. Curr Opin Genet Dev. 2018:53:121–127. 10.1016/j.gde.2018.08.002.
  74. Zaitlen N, Huntsman S, Hu D, Spear M, Eng C, Oh SS, White MJ, Mak A, Davis A, Meade K, et al. The effects of migration and assortative mating on admixture linkage disequilibrium. Genetics. 2017:205:375–383. 10.1534/genetics.116.192138.



Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press
