Summary
Over the past decade, genomic data have contributed to several insights on global human population histories. These studies have been met with both interest and criticism, particularly by populations with oral histories that are records of their past and often reference their origins. While several studies have reported concordance between oral and genetic histories, there is potential for tension that may stem from genetic histories being prioritized or used to confirm community-based knowledge and ethnography, especially if they differ. To investigate the interplay between oral and genetic histories, we focused on the southwestern region of India and analyzed whole-genome sequence data from 156 individuals identifying as Bunt, Kodava, Nair, and Kapla. We supplemented limited anthropological records on these populations with oral history accounts from community members and historical literature, focusing on references to non-local origins such as the ancient Scythians in the case of Bunt, Kodava, and Nair, members of Alexander the Great’s army for the Kodava, and an African-related source for Kapla. We found these populations to be genetically most similar to other Indian populations, with the Kapla more similar to South Indian tribal populations that maximize a genetic ancestry related to Ancient Ancestral South Indians. We did not find evidence of genetic sources in the study populations beyond those known to have contributed to many other present-day South Asian populations. Our results demonstrate that oral and genetic histories may not always provide consistent accounts of population origins and motivate further community-engaged, multi-disciplinary investigations of non-local origin stories in these communities.
This study uses community-engaged and multi-disciplinary methods to investigate the interplay between oral and genetic histories in four Southwest Indian populations: Bunt, Kodava, Nair, and Kapla. Combining whole-genome sequencing data with unique oral histories from these populations reveals that oral and genetic histories do not always provide consistent accounts.
Introduction
Recent studies have made substantial inroads into characterizing the global genetic diversity of humans. Yet, many populations remain underrepresented in the genetic literature, leaving gaps in our understanding of regional genetic variation, genetic histories, and consequences for human health.1,2,3 While such gaps in the literature contribute to the motivation behind genomic research in underrepresented populations, it is imperative to be mindful of the ethical and socio-political ramifications of the research process for several Indigenous and marginalized peoples.4,5,6 In particular, when investigating genetic histories, conflation between self-identities, derived from a myriad of sources such as oral histories and life experiences, and genetics can lead to harmful consequences for populations.7,8,9 Prioritizing genetic histories over oral histories or using the former to “confirm” the latter can not only be disrespectful to Indigenous knowledge, but also create tension between geneticists, communities, and ethnographers. Instead of attempting to reconcile genetic and oral histories, it is more valuable to recognize the distinct nature and complexities of both forms of knowledge.10,11,12 Unfortunately, oral histories, based on, e.g., folk songs and stories, that underpin self-identities and local concepts of ethnogenesis in many populations are dwindling, with limited written records to preserve this unique heritage. All of this calls for efforts to document oral histories and seek out more interdisciplinary ways for geneticists to engage with populations and their cultural histories.
India has a complex history of human migrations and genetic admixture, as well as a rich set of varied socio-cultural practices, all of which have contributed to the extensive cultural and genetic diversity in this region. Previous studies of genetic diversity in India identified population genetic structure that broadly tracks with geography and language.13,14,15,16,17,18,19 These studies showed that many Indian populations, excluding speakers of Austroasiatic and Tibeto-Burman languages with substantial East Asian ancestries, can be modeled along a genetic cline bounded by two statistical constructs: first, Ancestral South Indians (ASI), with genetic components related to Ancient Ancestral South Indians (AASI) and ancient Iranian farmers and, second, Ancestral North Indians (ANI), modeled as a mix of ASI and a genetic component related to Middle-to-Late Bronze Age (MLBA) groups in the central Eurasian steppe region.15,20,21 In addition to these larger-scale ancestry gradients, at a finer scale, genetic structure in India has been impacted by founder events and endogamous practices.14,21,22,23,24 Despite this substantial diversity, characterization of human genetic variation in India remains limited relative to India’s share of the global population.
This study aims to develop a fine-scale characterization of population structure in Southwest India by recognizing the distinct positions held by inferences from genetic data and oral histories within a community-informed framework. We generated and analyzed whole-genome sequences from individuals identifying as Kodava, Bunt, Nair, and Kapla and, in conjunction with published genome-wide sequences from worldwide populations, investigated genetic histories and population structure in present-day Southwest India (Figure 1). We concurrently present community-engaged documentation of oral histories surveyed by the research team via conversations and observations. The Bunt, Kodava, and Nair have strong self-identities as well as unique cultural traits and oral histories reflected in their origin narratives, some of which are only shared between these populations and not with other geographically proximal Indian populations. Compared with these three populations, oral histories of the Kapla, based on anthropological literature and/or community accounts, are less well documented. Available information from communities and historical records reference non-local origins and contacts for all four populations. Here, we considered genetic and oral histories from our investigations to motivate future research on the oral histories of these populations and, more broadly, on thoughtful ways of bringing together their self-identities and genetic histories.
Materials and methods
Community engagement and sampling
This project was approved by the Mangalore University Institutional Review Board (IRB) (MU-IHEC-2020-4), the Institutional Ethical Committee of Birbal Sahni Institute of Palaeosciences, Lucknow (BSIP/Ethical/2021), and the University of Chicago Biological Sciences Division IRB (IRB18-1572). Community engagement in this study included information sessions in India and the US. Prior to sampling, project aims, methods, and other relevant details were explained to interested community members, and informed consent principles were observed during enrollment in the study. In several locations, local translators, community members, and long-term community contacts joined the research team to facilitate communication with the communities. At these sessions and during subsequent interactions, the research team received valuable input from members and leaders of the study populations on their social organization, cultural traits, and oral histories as well as their interest in genetic studies and expectations, which enriched the research process.
Information on oral histories and origin narratives shared by community members, both during initial outreach and through a combination of directed and open-ended questions during sampling, together with a few anthropological and historical texts, formed the basis of the results described in survey of population oral histories relating to origin stories. The interviews conducted with all donors while sampling consisted of the following questions on their self-identities: donor origins (origins of their parents and grandparents with additional population-specific prompts as appropriate, e.g., the maternal/paternal “Balya Mane” or “ancestral home” for the Kodava and the ancestral temple for the Nair), language spoken, and whether their family practiced matrilocal versus patrilocal traditions. We required that both parents of the donors were from the same community, and recording of the self-identities and origins of the donors and their parents and grandparents was critical to ensure that this criterion was fulfilled. Finally, the donors were asked if they wished to share any community-level origin stories that may have been passed down the generations in their families or through other means such as conversations with community elders. This part of the questionnaire was intentionally unstructured so that the donors could share narratives of oral histories and origin stories freely.
Sampling was conducted in India and the US. In India, community contacts and sampling of whole blood were performed in January 2018 in South India in the states of Karnataka, Kerala, and Tamil Nadu. Members of the research team traveled to Kodagu, southern Karnataka, to conduct sampling of Kodava community members (n = 15). The team also conducted sampling of Kapla community members (n = 10), primarily men, who had come down from an isolated mountain settlement close by to work in the Kodagu coffee plantations. The team traveled to Mangalore, southern Karnataka, to conduct sampling of Bunt community members originating in the districts of Udupi and Dakshina Kannada in southern Karnataka and Kasaragod district in northern Kerala (n = 11). Moreover, the team visited four locations in Kerala and Tamil Nadu to conduct sampling of Nair community members (n = 44) who had origins in the districts of Pathanamthitta, Palakkad, Kozhikode, and Kannur in Kerala. In the US, the research team was approached by members of the Kodava diaspora to partner on a project investigating their population history. Community engagement was conducted through community representatives who initiated contacts with other community members across the US. Saliva kits, consent forms, and a brief questionnaire were mailed out to interested participants, and 105 participants returned all three to the research team and were enrolled in the study. This dataset was subsequently merged with the data generated from Indian participants, given overlapping aims, with the consent of members of the Kodava_US community.
Results of this study were disseminated to community members of all the study populations, both in India and in the US. Results were returned through a combination of written reports, presentations by the researchers, and a newspaper article, with efforts taken to make the information more accessible to a non-scientific audience. Dissemination was performed both in English and local languages.
Sample processing and genotype calling
For the Kodava, Nair, Bunt, and Kapla individuals from India, DNA extractions from whole blood were performed in India and were sent to MedGenome, Bangalore, India, for sequencing on Illumina HiSeq X Ten. They were sequenced to an average autosomal depth of 2.5×. For members of the Kodava_US population, saliva samples were collected using the Oragene DNA self-collection kit and extraction was performed using the QIAcube standard protocol in Chicago. The extracts were sequenced to an average autosomal depth of 5.5× on the Illumina NovaSeq 6000 at Novogene, Sacramento, CA in four batches (Table S1). Four samples were re-extracted due to low DNA concentration in the first extraction round and are denoted as ID∗_b in Table S1. We re-sequenced three Kodava_US individuals to 77× average depth and two individuals each from Bunt, Kapla, and Nair populations to 30× average depth for variant discovery and for validation of concordance in our genotype calling pipeline. All reads were subsequently filtered to have a mapping quality score greater than 30 and were aligned to the human reference genome build 37 (hg19) and the revised Cambridge Reference Sequence (rCRS) build 17, for autosomal + Y chromosome and mtDNA variation, respectively.
Genotypes were called on all high- and low-coverage samples jointly using the GATK (v.4.0) best practices germline short variant discovery workflow on the 8,183,696 sites from the GenomeAsia panel.19 After this initial phase of genotype calling, the GATK phred-scaled genotype likelihoods were converted to genotype probabilities. IMPUTE2 (v.2.3.2) along with the 1000 Genomes Phase 3 reference panel was used to perform genotype refinement.25,26 After genotype refinement, hard-called genotypes were set to the genotype with the maximum probability, and genotypes with a maximum probability less than 0.9 were considered missing. Following genotype calling and refinement, chromosomes were phased using SHAPEIT4 (v.4.2.2) along with the 1000 Genomes Phase 3 reference panel.27
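The conversion and thresholding steps described above can be illustrated with a minimal Python sketch. It assumes a flat prior when normalizing phred-scaled likelihoods, and the function names pl_to_probs and hard_call are hypothetical; in the study the 0.9 threshold was applied to IMPUTE2-refined probabilities, which this sketch does not reproduce.

```python
from typing import Optional, Sequence


def pl_to_probs(pl: Sequence[float]) -> list[float]:
    """Convert phred-scaled genotype likelihoods to probabilities.

    Assumption: a flat prior over genotypes, so probabilities are just
    the normalized likelihoods 10^(-PL/10).
    """
    likelihoods = [10 ** (-value / 10.0) for value in pl]
    total = sum(likelihoods)
    return [like / total for like in likelihoods]


def hard_call(probs: Sequence[float], min_prob: float = 0.9) -> Optional[int]:
    """Return the index of the most probable genotype
    (0 = hom-ref, 1 = het, 2 = hom-alt), or None (missing) when the
    maximum probability is below min_prob."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= min_prob else None


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(hard_call(pl_to_probs([0, 30, 60])))  # confident call
    print(hard_call(pl_to_probs([0, 3, 30])))   # ambiguous, set to missing
```

Returning None for low-confidence genotypes keeps missingness explicit, so that downstream per-sample missingness filters can count these sites.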
Merging with external datasets and quality control
We merged our sequence data with publicly available data from the GenomeAsia 100K consortium19 and the Human Genome Diversity Project.28 To provide additional context in South Asia, especially to increase representation of Indian populations, we further merged this dataset with genotype data from Nakatsuka et al.14 Following these merges, we retained 425,620 SNPs for analysis of population structure.
To compare genetic affinities with ancient individuals, we merged samples from the Allen Ancient DNA Resource,29 which contains genotypes across ∼1.23 million sites represented on the “1240k capture panel.” We also merged the publicly available Human Origins dataset from Lazaridis et al.,30 which contains present-day worldwide samples. We used this final merged dataset, consisting of our study samples and five external SNP datasets, for all downstream analyses in this article.
Following these merging steps, we filtered samples according to relatedness (up to second degree) using KING and removed samples with >5% SNP missingness for all downstream population genetic analyses.31 We did not apply this filter to data from ancient samples to retain sparse genotyping data for these individuals. Applying these criteria to our data led to the retention of all individuals as described in community engagement and sampling, except for the Kodava_US, from which we excluded 27 genomes for a final sample number of 78 from the 105 individuals sampled initially, and the Kapla, for which we excluded two genomes for a final sample number of 8 from the 10 individuals sampled initially (Table S1). The applied filters also retained 425,620 unique autosomal SNPs on which all subsequent analyses were performed.
Principal-component analysis and ADMIXTURE
We performed a principal-component analysis (PCA) using smartpca from the Eigensoft package (v.7.2.1).32,33 We filtered the merged datasets according to linkage disequilibrium using PLINK 1.9 [--indep-pairwise 200 25 0.4].34 We chose to plot only population median locations in PCA space (as gray text labels) of populations from external datasets for ease of visualization, with populations exclusive to this study represented by larger dots of the same color.
This same set of LD-pruned variants was used to run ADMIXTURE v.1.3, with K = 6–11.35 We employed 10-fold cross-validation and identified K = 7 as the value that minimized the cross-validation error. We restricted visualization of clustering results in the main text figure to representative populations from each geographic region with at least eight samples.
Admixture history using Treemix
We modeled the relationship between the study populations in Southwest India and a subset of present-day and ancient Eurasian groups (Table S2) using Treemix.36 Positions were filtered by missingness and LD pruned with PLINK 1.9 as for PCA. The analysis was run, estimating up to 10 migration edges with Mbuti as the outgroup.
Ancestry estimation using f-statistics
For calculating f-statistics defined in Patterson et al.,37 we used the ADMIXTOOLS v.7.0.2 software. The first statistic we estimated was the outgroup-f3 statistic, which measures the shared drift or genetic similarity between populations (A and B) relative to an outgroup population (O) and was calculated using the qp3pop program. We used the Mbuti population as the outgroup, unless otherwise noted.28
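As a rough illustration of what the outgroup-f3 statistic measures, the following Python sketch computes the naive estimator, the average over SNPs of (o − a)(o − b), from per-SNP allele frequencies. The function name outgroup_f3 and the input layout are assumptions for illustration; the sketch omits the finite-sample corrections and standard errors that ADMIXTOOLS reports, so it is not a substitute for qp3pop.

```python
from typing import Sequence


def outgroup_f3(
    outgroup: Sequence[float],
    pop_a: Sequence[float],
    pop_b: Sequence[float],
) -> float:
    """Naive f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    Inputs are allele frequencies of the same allele at the same SNPs.
    Larger values indicate more drift shared by A and B since their
    split from the outgroup.
    """
    if not (len(outgroup) == len(pop_a) == len(pop_b)) or not outgroup:
        raise ValueError("need equal-length, non-empty frequency vectors")
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy frequencies, purely illustrative.
    o = [0.0, 0.0]
    a = [0.5, 0.5]
    print(outgroup_f3(o, a, a))            # A compared with itself
    print(outgroup_f3(o, a, [0.5, 0.0]))   # B shares drift at one SNP only
```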
To model the study populations (Study) as mixtures of ANI- and ASI-related ancestries, we used the f4 ratio test.20 We measured the proportion of Central Steppe MLBA-related genetic ancestry via a ratio of two f4 statistics (see Figure S1).15
We selected populations to include in the f4 ratio topologies based on the scheme in Moorjani et al.21 As noted previously by others,15,38 Onge are genetically distant from the true AASI source but continue to be used to model the AASI source in South Asians due to the lack of data from the true source or closer proxies. In brief, we ran a D-statistic using the qpDstat program of the form D(Central_Steppe_MLBA, Iran_GanjDareh; H3, Ethiopian4500) to evaluate the degree of allele sharing between Central Steppe MLBA and H3. As H3, we selected a large array of present-day populations from Eurasia. We identified the population with the highest value in this test (Karelia) as the closest population to Central Steppe MLBA (Figure S2). It should be noted that this test is likely to overestimate the proportion of Central Steppe MLBA-related ancestry as it is unable to differentiate between the genetic ancestries related to the former and ancient Iranians.
We computed allele sharing of two PCA-based sub-clusters of Kapla, Kapla_A and Kapla_B, with different present-day populations (H3) from Europe, the Middle East, and South Asia, specifically those on the ANI-ASI cline, through the topology D(Kapla_B, Kapla_A; H3, Mbuti),37 using the qpDstat program.
Furthermore, to investigate African-related admixture in Kapla individuals, we tested the topology D(Paniya, Kapla_A/Kapla_B/Kapla; H3, Mbuti), with H3 being select ancient and present-day African populations: Luhya, Yoruba, BantuKenya, and Ethiopia_4500BP_published.SG. Paniya was fixed as one of the in-group populations as they are genetically similar to the Kapla and do not have African-related genetic ancestry.
For all computed D-statistics, we used the (f4mode: NO, printsd: YES) options in qpDstat.
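To make the form D(H1, H2; H3, O) concrete, the sketch below computes the frequency-based D-statistic as defined in Patterson et al., that is, the sum over SNPs of (h1 − h2)(h3 − o) divided by the sum of (h1 + h2 − 2·h1·h2)(h3 + o − 2·h3·o). The function name d_statistic is hypothetical, the frequencies are toy values, and no standard errors or Z scores are computed, so this is only an illustration of the quantity that qpDstat estimates.

```python
from typing import Sequence


def d_statistic(
    h1: Sequence[float],
    h2: Sequence[float],
    h3: Sequence[float],
    outgroup: Sequence[float],
) -> float:
    """Frequency-based D(H1, H2; H3, O).

    Positive values indicate excess allele sharing between H1 and H3
    (or H2 and O); negative values indicate excess sharing between
    H2 and H3 (or H1 and O).
    """
    numerator = 0.0
    denominator = 0.0
    for w, x, y, z in zip(h1, h2, h3, outgroup):
        numerator += (w - x) * (y - z)
        denominator += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    if denominator == 0:
        raise ValueError("no informative SNPs: denominator is zero")
    return numerator / denominator


if __name__ == "__main__":
    # Toy example: H1 and H3 carry the allele, H2 and the outgroup do not.
    print(d_statistic([1.0], [0.0], [1.0], [0.0]))
```

Swapping H1 and H2 flips the sign of D, which is why the order of the in-group populations matters when reading the topologies above.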
Estimation of ancestry proportions based on qpAdm
To model the ancestry proportions of the study and other select Indian populations for comparison, we used qpAdm39,40 implemented in ADMIXTOOLS v.7.0.2 software with (allsnps: YES, oldallsnpsmode: NO, inbreed: NO, details: YES). As many present-day South Asians on the ANI-ASI cline can be modeled as a mix of genetic ancestries related to the Indus Periphery Cline, Onge as a distant representative of the AASI, and/or Central Steppe populations from the Middle and Late Bronze Age,15 we considered these as our genetic sources for the target populations included in the analysis (Table S3). For the outgroups, we used the list outlined in the proximal model of present-day South Asians by Narasimhan et al.15 (Table S3). This includes Mbuti (n = 10), Western European Hunter Gatherer (WEHG) (n = 36), Eastern European Hunter Gatherer (EEHG) (n = 4), West Siberian Hunter Gatherer (WSHG) (n = 2), East Siberian Hunter Gatherer (ESHG) (n = 10), Dai (n = 10), Ganj_Dareh_N (n = 8), and Anatolia_N (n = 21). Leveraging oral history knowledge, we also ran tests including as a fourth source published genome-wide data from various Scythian groups from Krzewińska et al.,41 Järve et al.,42 Damgaard et al.,43 Gnecchi-Ruscone et al.,44 and Unterländer et al.45 for the two Kodava groups, Nair, and Bunt as targets. As a proxy for the genetic diversity that was contemporaneous with Alexander the Great’s reign and may have contributed to his army, we included as a fourth source Classical and Hellenistic period groups from Greece and Macedonia, a Late Bronze Age group from Armenia, and Iron Age groups from Turkmenistan, Iran, Pakistan, and Anatolia15,46,47,48,49,50 for the two Kodava groups as targets (Table S3).
For these tests, we modified the outgroup list from Narasimhan et al.15 used in the three-source model by additionally including proximal outgroups to the Scythian (Israel_Natufian_published, Altaian) and Greek and Macedonian Classical and Hellenistic period (CHG, Italy_Sicily_LBA, Jordan_PPN) source groups (Table S3). We used qpWave as implemented in ADMIXTOOLS251 to confirm that these additional outgroups were able to differentiate pairs of source populations in each model. We used the function extract_f2 with the maxmiss parameter set at 0.2 to increase the number of overlapping positions, followed by the function qpwave_pairs to evaluate the cladality between all possible pairs of source populations using the same set of “right” populations (outgroups in qpAdm models) (Figure S3). Only non-cladal populations were used in each model.
Admixture dates based on LD decay in ALDER
To estimate the timing of ANI-related genetic admixture in the study populations, we used the weighted admixture linkage disequilibrium decay method in ALDER.52 The true ancestral populations for South Asian populations are unknown, so we followed the approach originally proposed in Narasimhan et al.15 and Moorjani et al.21 In brief, this modified approach approximates the spectrum of ANI-ASI admixture using PCA-derived SNP weights to capture ancestry-associated differences in allele frequency without requiring explicitly labeled source populations. Following previous studies, we included primarily Dravidian- and Indo-European-speaking populations in India that fall on the ANI-ASI cline on the PCA and have greater than five individuals, with the exception of Kapla_B, which comprised three individuals, and Basque individuals from Europe as an anchor for Western Eurasian ancestry when constructing the SNP weights (Table S4).21 We estimated the timing of admixture in generations and years, assuming a generation time of 28 years.53 Individuals from the focal test population for admixture were excluded when computing PC-based weights to limit biases. For this reason, we also excluded the Coorghi from the reference panel and the test group due to historical connections between the names “Coorg” and “Kodagu,” referring to the same region in Southwest India. Based on historical accounts, Coorghi and Kodava may be the same or closely related communities. Unless otherwise stated, we use Z score ≥2 (p < 0.05) across ALDER tests as indicative of a significant signal of admixture.21,52
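The unit conversion and significance threshold described above amount to simple arithmetic, sketched here in Python. The helper names are hypothetical, the numbers in the usage example are illustrative rather than results from the study, and the two-sided normal p value is an assumption about how the Z score maps to p < 0.05.

```python
import math

YEARS_PER_GENERATION = 28  # generation time assumed in the text


def generations_to_years(
    generations: float,
    se_generations: float,
    years_per_generation: float = YEARS_PER_GENERATION,
) -> tuple[float, float]:
    """Scale an admixture date and its standard error from generations to years."""
    return (
        generations * years_per_generation,
        se_generations * years_per_generation,
    )


def two_sided_p(z: float) -> float:
    """Two-sided p value for a Z score under a standard normal (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(z: float, z_min: float = 2.0) -> bool:
    """Apply the Z >= 2 threshold used for ALDER tests in the text."""
    return z >= z_min


if __name__ == "__main__":
    # Illustrative numbers only.
    print(generations_to_years(100, 10))
    print(two_sided_p(2.0), is_significant(2.0))
```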
Haplotype-based estimation of population structure using fineSTRUCTURE
We implemented a haplotype-based approach using the software ChromoPainter and fineSTRUCTURE.54 In brief, this approach estimates a co-ancestry matrix based on haplotype sharing under a haplotype-copying model (ChromoPainter), which can be used to identify population structure through clustering (fineSTRUCTURE). We first estimated the effective population size (Ne) and the mutation parameter θ in a subset of chromosomes (1, 5, 10, and 20) with 10 expectation maximization iterations. The average value for both parameters was estimated across chromosomes and used in subsequent runs (Ne = 401.352, θ = 0.0002). We standardized to five individuals per population, filtering out groups with a sample size below this threshold and randomly sampling five individuals from groups with larger sample size. The co-ancestry matrix obtained from ChromoPainter was used for further inferences in fineSTRUCTURE. The analysis was run at different levels of population composition; the first included all Eurasian populations, followed by South Asia-specific runs. The per-locus copying probabilities across all haplotypes in a population under the maximum-likelihood parameters were used to estimate the median per-locus copying probabilities.
mtDNA and Y chromosome analyses
The alignment files for each sample were used to retrieve reads mapping specifically to the mitochondrial genome (mtDNA) using samtools.55 Variants from mtDNA reads for each sample were called against the revised Cambridge Reference Sequence (rCRS) using bcftools mpileup and bcftools call, requiring a mapping quality of 30 and base quality of 20. The mtDNA variant calls from 156 samples were used as input for assigning mtDNA haplogroups using the program Haplogrep2 with PhyloTree mtDNA tree Build 17.56,57 The final haplogroup assignments reported in the study were based on the highest quality score and rank assigned by Haplogrep2.
For assigning Y chromosome haplogroups, we first extracted reads mapping to the Y chromosome using samtools. Variants from Y chromosome reads were then called using bcftools mpileup and bcftools call, requiring a mapping quality of 30 and base quality of 20 (-d 2000 -m 3 -EQ 20 -q 30). The variant calls from 98 male samples were then used to call haplogroups using Yhaplo.58
To investigate the influence of matrilocality and patrilocality on uniparentally inherited markers, we calculated haplotype and nucleotide diversity on the mitochondrial DNA.59,60 We estimated standard deviations of haplotype diversity within populations using 100 bootstrap samples and used one-sided t tests to evaluate differences between haplotype diversity.
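A minimal Python sketch of haplotype diversity with a bootstrap standard deviation follows. It assumes Nei's estimator, H = n/(n − 1) × (1 − Σ p_i²), which may differ in detail from the estimators cited in the text; the function names and the haplotype labels in the usage example are hypothetical.

```python
import random
from collections import Counter
from statistics import stdev
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    sum_sq = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    return n / (n - 1) * (1.0 - sum_sq)


def bootstrap_sd(haplotypes: Sequence[str], n_boot: int = 100, seed: int = 0) -> float:
    """Standard deviation of haplotype diversity across bootstrap resamples
    (sampling individuals with replacement)."""
    rng = random.Random(seed)
    n = len(haplotypes)
    replicates = [
        haplotype_diversity([rng.choice(haplotypes) for _ in range(n)])
        for _ in range(n_boot)
    ]
    return stdev(replicates)


if __name__ == "__main__":
    # Hypothetical haplotype labels, not data from the study.
    sample = ["hapA", "hapA", "hapB", "hapC", "hapC", "hapD"]
    print(haplotype_diversity(sample), bootstrap_sd(sample))
```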
Pseudo-haploidization of whole-genome sequencing data
We replicated select analyses with pseudo-haploid calls to provide confirmation of many of our population genetic results in light of the lower depth of coverage for many of the sequenced individuals. The alignment files were used to generate pseudo-haploid calls for the Kodava, Kodava_US, Nair, Bunt, and Kapla individuals. We first used samtools mpileup with the flags (-Q 30 -q 20 -R -B) to generate a mpileup output format for the individuals’ alignment files. The resulting mpileup file was then used as input for pileupCaller from the SequenceTools package.
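The idea behind a pseudo-haploid call can be sketched in a few lines of Python: at each site, one observed base is drawn at random from the reads covering it, so that low-coverage and high-coverage samples are treated alike. This is a conceptual illustration only; the function name is hypothetical, and whether random sampling matches the exact calling mode used in the study is an assumption.

```python
import random
from typing import Optional, Sequence


def pseudo_haploid_call(
    bases: Sequence[str], rng: random.Random
) -> Optional[str]:
    """Pick one observed base at random for a site.

    Returns None (missing) when no reads cover the site. The bases are
    assumed to have already passed quality filtering.
    """
    if not bases:
        return None
    return rng.choice(bases)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy pileups for three sites: covered, heterozygous-looking, uncovered.
    for pileup in (["A", "A", "A"], ["A", "G"], []):
        print(pseudo_haploid_call(pileup, rng))
```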
Estimation of IBD scores and founder events
To measure the extent of endogamy in each population, we evaluated the IBD score, which is a measure of recent strength of consanguinity.14 The IBD score is defined as the sum of IBD between 3 and 20 cM detected between individuals of the same population, divided by a normalization term that depends on n, the number of individuals (see Nakatsuka et al.14 for the exact normalization). We used GERMLINE2 to call IBD segments with the flag -m 3 to only consider IBD segments at least 3 cM in length, using the deCODE genetic map coordinates for genome build hg19.61 We also removed individuals related up to second degree using KING31 and removed IBD segments >20 cM.14 In the original application of the IBD score,14 the authors normalized to individuals of Finnish and Ashkenazi Jewish ancestry as canonical examples of founder populations with a higher recessive disease burden. However, we left the IBD scores as raw values and therefore interpreted them only relative to one another across the Indian populations we tested.
To characterize founder events, we used ASCEND, with default standard parameters for present-day populations.62 We selected Mbuti as an outgroup, under the assumption that this population has limited shared recent demography with the target populations. We reported a founder event to be significant if four criteria were met:62 (1) the 95% CIs of the estimated founder age and intensity did not include 0, (2) the estimated founder age was <200 generations before present and its associated SE was <50 generations, (3) the estimated founder intensity was >0.5%, and (4) the normalized root mean-square deviation was <0.29.
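The four significance criteria listed above can be written as a single predicate, sketched here in Python. The FounderEstimate container and its field names are hypothetical conveniences for illustration and do not correspond to the output format of ASCEND; intensity is expressed in percent to match the 0.5% threshold.

```python
from dataclasses import dataclass


@dataclass
class FounderEstimate:
    age_gen: float                      # founder age, generations before present
    age_se: float                       # standard error of the age, generations
    age_ci: tuple[float, float]         # 95% CI of the age
    intensity_pct: float                # founder intensity, in percent
    intensity_ci: tuple[float, float]   # 95% CI of the intensity, in percent
    nrmsd: float                        # normalized root mean-square deviation


def _excludes_zero(ci: tuple[float, float]) -> bool:
    low, high = ci
    return low > 0 or high < 0


def is_significant_founder_event(est: FounderEstimate) -> bool:
    """Apply criteria (1)-(4) from the text."""
    return (
        _excludes_zero(est.age_ci) and _excludes_zero(est.intensity_ci)  # (1)
        and est.age_gen < 200 and est.age_se < 50                        # (2)
        and est.intensity_pct > 0.5                                      # (3)
        and est.nrmsd < 0.29                                             # (4)
    )


if __name__ == "__main__":
    # Illustrative values only.
    example = FounderEstimate(30.0, 5.0, (20.0, 40.0), 2.0, (1.0, 3.0), 0.10)
    print(is_significant_founder_event(example))
```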
Runs of homozygosity
We estimated runs of homozygosity (ROH) using PLINK 1.9, using the parameters: --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-kb 1500 --homozyg-gap 1000 --homozyg-density 50 --homozyg-window-missing 5 --homozyg-window-het 1.63,64 To increase SNP density, we limited our analysis to a panel including only whole-genome sequencing data from GenomeAsia 100K19 and the Human Genome Diversity Project.28 After filtering out positions with more than 5% missing data and applying a minor allele frequency (MAF) threshold of 5%, the final dataset retained 3,288,336 variants.
Novel variant discovery
To perform variant discovery, we relied on the nine high-coverage whole-genome sequencing (WGS) samples (coverage >30×) from four populations. In total, we had three Kodava_US, two Bunt, two Kapla, and two Nair individuals in this smaller set of individuals with WGS data. Alignment to GRCh37 (hg19) was performed using bwa-mem.65 We called genotypes within these individuals using GATK v.4.0 best practices and filtered to SNPs and indels with a phred-scaled quality score of >30 to carry forward for variant annotation.
For variant annotation, we used Ensembl Variant Effect Predictor (VEP) v.105.0 with custom annotations derived from gnomAD genomes site-level VCF files and GenomeAsia site-level VCF files. In addition, we used the --af_gnomad --af_1kg flags to detect whether the alleles were found in the 1000 Genomes project or in gnomAD exomes. Therefore, our definition of novel variants only includes variants that were not found in any of the aforementioned datasets. When assessing variant effects, we used the broad categories provided by VEP (MODIFIER, LOW, MODERATE, HIGH).
Results
Survey of population oral histories relating to origin stories
While the Kodava, Bunt, and Nair have strong self-identities and associated oral histories, relatively little is documented in the ethnographic literature.66,67,68,69,70 Consequently, as noted in materials and methods, the information on the socio-cultural characteristics of these three populations noted below derives primarily from a combination of directed and open-ended questions asked of the participants and community representatives either during initial outreach or sampling. These populations are all speakers of Dravidian languages—Kodava (Kodava), Malayalam (Nair), and Tulu/Kannada (Bunt). From the unstructured portion of the interviews pertaining to the origin stories, there were notable overlaps in these populations’ traditions and cultural traits, which may stem from geographical proximity and historical contacts that are also referenced in these populations’ oral histories.70 Community members from the three populations shared that their self-identities included historical “warrior” status designations/identities and matrilineal descent in the Bunt and Nair, with the latter additionally practicing matrilocality in the recent past. Moreover, as noted during interviews and in the anthropological literature, the Nair have a complex and long-standing social system that consists of several subgroups.71,72,73 Similarly, there are anecdotal accounts, for instance, noted in blogs maintained by community members and relayed to members of the research team during sampling and subsequent population interactions, of phenotypic characteristics that members of these populations draw on to support possible non-local origins. As noted during the interviews, these narratives are used by population members to contemplate unique phenotypes and customs such as music, diet, religious affiliation, and traditional wear that set them apart from neighboring populations. 
Since the exact mode and timing of past population contacts may not always be explicitly stated in oral histories, it is unknown whether the nature of the non-local population interactions that they invoke was socio-economic, genetic, or both.
Based on the interviews and community interactions, we identified that one link to non-local populations shared across the oral histories of the Bunt, Kodava, and Nair was to the Scythians, ancient nomadic people inhabiting the Eurasian Steppe during the Iron Age. In addition, members of the Kodava community shared with the researchers a particular origin story suggesting links to the Kurdish Barzani tribe through the latter’s involvement in Alexander the Great’s campaign into India. The following is noted in:74
“In those days, when the army advanced, their families of fighting men too moved behind them, as camp followers. After Alexander turned back, some tribes in his army who had no energy to get back to their homeland, stayed back in India. … Our ancestors [Kodava, are] believed to have taken a southernly route along with Western Ghats in search of better prospects [and] eventually settled in Kodagu (Coorg) which was then an unnamed, inhospitable and extremely rugged hilly region”.
Physical appearance was cited during our interactions with Kodava_US community members as possible evidence of non-local origin.
To our knowledge, there is very little documentation on the Kapla in the anthropological literature. Based on the structured portion of the interview, the Kapla donor origins were noted to be geographically close to the Kodava in the Kodagu region of southern Karnataka (Figure 1), and they spoke a mixture of two Dravidian languages, Tulu and Kodava. The donors as well as other community members shared that the population practiced a form of hunting and gathering, but their subsistence was transitioning slowly with increased contacts with the Kodava, on whose coffee plantations Kapla men worked during the harvesting season. Furthermore, the Kapla donors shared that they belonged to families practicing patriarchal and patrilocal systems, with a preference toward marriages involving MBD (mother’s brother’s daughter) and junior sororate marriage practices. The donors and community members did not have any origin stories to share during the unstructured portion of the interview. Historical records suggest:75
The Kaplas, who live near Nalkanad palace seem to be mixed descendants of the Siddis [ …]. They have landed property of their own near the palace, given by the Rajahs, and work also as day laborers with the Coorgs. Their number consists of only 15 families.
Siddis are historical migrants from Africa who live in India and Pakistan.76 Interviews with long-term community contacts revealed that once the Kapla settled in their present location in Kodagu, they were isolated from neighboring populations.
Broad-scale population structure in southwest India
We used PCA to explore the broader population structure within South Asia, with a particular focus on Southwest India. The main axes of genetic variation highlight the ANI-ASI genetic cline (Figures 2A, 2B, and S4). The Kodava, Kodava_US, Bunt, and Nair populations are placed in PCA space close to other populations that are geographically proximal (e.g., Iyer, Iyangar, and Urban Bangalore individuals from Nakatsuka et al.14 and GenomeAsia100K Consortium19). This pattern was also captured by genetic ancestry clusters from ADMIXTURE where, at K = 7, the Kodava, Kodava_US, Bunt, and Nair shared genetic components (colored pink and gray in Figures 2C, S5A, S5B, and S6) with many other populations from South Asia. In addition, they displayed components maximized in the Kalash (colored orange) and in Europe (colored red) that were also observed in populations from Central Asia and the Middle East as well as in most North Indian and some South Indian populations. Notably, the proportion of all these components in the Bunt, Kodava, Kodava_US, and Nair were in line with the proportions displayed by geographical neighbors such as Iyer and Urban Bangalore and Urban Chennai individuals from the GenomeAsia100K Consortium.19 Furthermore, we did not detect any structure within the Nair despite sampling broadly across the state of Kerala or within the Kodava_US donors who, while being recent migrants to the US, originate from various locations within the Kodagu district in southern Karnataka (Figure 1).
On the other hand, the Kapla showed genetic similarity to populations with higher ASI-related genetic ancestry from South India, such as the Ulladan and Paniya, displaying the pink and gray components found in most South Asians but a substantially smaller share of the orange component that is maximized in the Kalash and found in populations with higher ANI-related genetic ancestry (Figures 2A–2C). In addition to being genetically differentiated from the other populations sampled in this study, the Kapla also exhibited greater spread on the PCA than other South Indian populations maximizing ASI ancestry, such as Ulladan and Paniya (Figures 2B and S7A). Finally, we found no qualitative support for substantial genetic similarity between the Kapla and the Siddi or tested African individuals (Figure S7B).
We evaluated shared genetic drift between the study populations from Southwest India and several Eurasian populations using an outgroup-f3 statistic of the form f3(Study_populations, X; Mbuti), where Study_populations corresponds to one of the Southwest Indian populations sequenced in this study and X to a set of Eurasian populations from the dataset described in materials and methods. We observed a higher genetic affinity between all our study populations and other Indian populations (Figures 2D, S8, and S9), particularly from South India, which is consistent with the results of PCA and ADMIXTURE. More specifically, patterns of shared drift between the study populations and other South Indian populations differed, as seen in previous analyses. The Kodava, Kodava_US, Bunt, and Nair shared higher genetic affinity with each other and, to an extent, with populations that derive genetic ancestry from both ANI- and ASI-related sources, while the Kapla were genetically similar to populations with more ASI-related genetic ancestry, such as the tribal Ulladan from Kerala77 and individuals from Handigodu village of the Shimoga District of Karnataka.14
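As an illustration of the quantity behind the outgroup-f3 statistic described above, the following is a minimal sketch, assuming per-SNP allele frequencies are already available for the two test populations and the outgroup. The function name and the toy frequencies are hypothetical and are not taken from the study's pipeline. The sketch omits the finite-sample correction and the block-jackknife standard errors that dedicated software typically provides.

```python
from typing import Sequence


def outgroup_f3(freq_a: Sequence[float],
                freq_x: Sequence[float],
                freq_out: Sequence[float]) -> float:
    """Naive outgroup-f3: the mean over SNPs of (o - a) * (o - x).

    Larger values indicate more genetic drift shared by populations A and X
    after their split from the outgroup O. No bias correction is applied.
    """
    if not (len(freq_a) == len(freq_x) == len(freq_out)):
        raise ValueError("frequency vectors must have the same length")
    if len(freq_a) == 0:
        raise ValueError("at least one SNP is required")
    total = sum((o - a) * (o - x)
                for a, x, o in zip(freq_a, freq_x, freq_out))
    return total / len(freq_a)


if __name__ == "__main__":
    # Toy allele frequencies at two SNPs (illustrative only).
    study = [0.1, 0.8]
    similar = [0.1, 0.8]      # drifted in the same direction as `study`
    unrelated = [0.5, 0.5]    # no drift away from the outgroup
    outgroup = [0.5, 0.5]
    print(outgroup_f3(study, similar, outgroup))
    print(outgroup_f3(study, unrelated, outgroup))
```

Ranking candidate populations X by this value for a fixed study population mirrors the comparisons summarized above, where higher values correspond to greater shared drift.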
We constructed a maximum-likelihood tree using Treemix and modeled up to ten admixture events using a subset of present-day and ancient populations from Eurasia and an African outgroup (Mbuti), excluding populations with known recent admixture from Africa and East Asia (Figures S10A–S10K; Table S5). Without migration edges, we observed expected broad-level relationships between the populations, with Western and Eastern Eurasian populations forming distinct clusters. South Asian populations fell between these clusters, with populations that maximized ANI-related genetic ancestry reflecting higher similarity to Western Eurasians and populations that maximized ASI-related genetic ancestry being more similar to the Onge and East Asian populations. The Kodava, Kodava_US, Bunt, and Nair were closely related to each other and to other South Indian populations such as Iyer and Iyangar. The Kapla clustered with populations with a higher proportion of ASI-related ancestry, such as Paniya and Palliyar, consistent with previous analyses (Figure 2B). A few early migration edges (Table S5) recapitulated previously reported events such as the Steppe-related gene flow, represented by ancient Central Steppe MLBA individuals from Russia, into the French.
Estimation of admixture proportions and timing in southwestern India
To directly model the study populations as a mixture of genetic ancestral components, we used both the f4 ratio test and qpAdm. A two-way admixture model tested using the f4 ratio method has been shown to be overly simplified for most Indian populations,15,21,30,46 and we used it primarily to assess the relative proportions of Central Steppe MLBA-related genetic ancestry, a proxy for ANI-related genetic ancestry, in the study populations. The proportion of Central Steppe MLBA in the Nair, Bunt, Kodava, and Kodava_US ranged between 45% (±1.4%) and 48% (±1.6%), similar to the proportion found in other populations with higher ANI-related genetic ancestry, including some South Indian and the majority of North Indian populations. In contrast, the proportion of Central Steppe MLBA in the Kapla was similar to that of neighboring South Indian tribal populations with higher ASI-related genetic ancestry (26.34% ± 2.5%) and lower than in the other study populations (Figure S11).
We next investigated more complex admixture histories, particularly three-way admixture related to Central Steppe MLBA, Indus Periphery Cline (primarily ancient Iranian farmer-related ancestry), and Onge. We employed the qpAdm framework starting from a previously proposed model by Narasimhan et al.15 and Harney et al.39 Results from this analysis are expected to provide more accurate proportions of the three ancestral sources compared with the f4 ratio test, which is unable to distinguish between the ancestries related to Central Steppe MLBA and ancient Iranians and would hence overestimate the proportion of the former. As observed in Narasimhan et al.,15 several South Asian populations, including the two Kodava groups, Nair, and Bunt sequenced in this study, have p values below the 0.05 threshold39 (Table S6). This suggests that the tested model does not sufficiently explain the genetic ancestries in these four populations as well as in many other South Asians. We further included as additional sources various Scythian groups and proxies of groups contemporaneous to Alexander the Great’s reign, as indicated in the oral histories of these populations, but also failed to retrieve working models for these tests (Table S7).
In contrast, the Kapla could be modeled with little Central Steppe MLBA-related genetic ancestry (4.6%) compared with genetic ancestries related to the Indus Periphery Cline (40.3%) and Onge (55%). These genetic ancestry proportions were similar to Palliyar, Paniya, and Ulladan, which are South Indian tribal groups with higher ASI-related genetic ancestry (Table S6). Furthermore, given the current resolution of the methods and data available, we found no evidence for Siddi (or African) ancestry contributing to the Kapla, based on the above qpAdm results as well as admixture f3 statistics, with no f3 statistics reflecting a significant genetic contribution from a Siddi-related or African genetic ancestry source (Table S8). We additionally explored the increased variation in PCA among the Kapla by separating Kapla individuals into two groups based on the PCA results where Kapla_A included five individuals closer to the ASI end of the cline and Kapla_B included the other three individuals closer to the ANI end. We estimated D-statistics of the form D(Kapla_B, Kapla_A; H3, Mbuti), where H3 denotes populations from Europe, the Middle East, and populations from South Asia that fall on the ANI-ASI cline. In agreement with the PCA, we found that most tested European and Middle Eastern populations were significantly closer to the three Kapla individuals closer to the ANI end (Kapla_B), and a few tested populations that maximized ASI ancestry, such as Ulladan, Paniya, and Palliyar, were significantly closer to Kapla_A (Table S9). We further computed D-statistics of the form D(Paniya, Kapla_A/Kapla_B/Kapla; H3, Mbuti), where H3 consists of ancient and present-day groups with African ancestries to infer whether the Kapla individuals show any evidence of African-related admixture. We found no evidence of additional allele sharing between any of the Kapla groups and the H3 groups (Table S10). 
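The D-statistics used above can be illustrated with a minimal sketch that follows the standard frequency-based definition of D(P1, P2; P3, O). The function name and toy frequencies are hypothetical, and the sketch does not compute the block-jackknife Z scores on which significance is normally assessed. Under the sign convention assumed here, a positive value indicates excess allele sharing between P1 and P3 (for example, Kapla_B and a European population), and a negative value indicates excess sharing between P2 and P3.

```python
from typing import Sequence


def d_statistic(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], out: Sequence[float]) -> float:
    """Frequency-based D(P1, P2; P3, O), summed over SNPs.

    Numerator:   sum of (p1 - p2) * (p3 - o)
    Denominator: sum of (p1 + p2 - 2*p1*p2) * (p3 + o - 2*p3*o)
    """
    if not (len(p1) == len(p2) == len(p3) == len(out)):
        raise ValueError("frequency vectors must have the same length")
    numerator = 0.0
    denominator = 0.0
    for a, b, c, o in zip(p1, p2, p3, out):
        numerator += (a - b) * (c - o)
        denominator += (a + b - 2 * a * b) * (c + o - 2 * c * o)
    if denominator == 0:
        raise ValueError("no informative SNPs: denominator is zero")
    return numerator / denominator


if __name__ == "__main__":
    # Toy frequencies at two SNPs (illustrative only).
    group_b = [0.9, 0.2]
    group_a = [0.1, 0.2]
    test_pop = [0.9, 0.3]
    outgroup = [0.1, 0.3]
    print(d_statistic(group_b, group_a, test_pop, outgroup))
```

Swapping the first two populations flips the sign of the statistic, which is why the order of Kapla_B and Kapla_A matters when reading the tabulated results.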
To infer the proportions of Central Steppe MLBA, Indus Periphery Cline, and Onge in Kapla_A and Kapla_B, we implemented qpAdm using sources and outgroups from Narasimhan et al.15 (Table S6). Kapla_A exhibited a slightly higher proportion of Onge-related ancestry (58.4%) compared with Kapla_B (49.4%), and Kapla_B exhibited a slightly higher proportion of the other two components: 7.3% Central Steppe MLBA versus 2.9% in Kapla_A and 43.4% Indus Periphery Cline versus 38.7% in Kapla_A. We could not further narrow down the mode or timing of the within-population differentiation we observed above (e.g., via recent admixture with an ANI group or more ancient structure) due to the small sample size and substantial ancestry sharing between Kapla and neighboring South Indian populations included in our tests.
Broadly, the genetic source proportions in the study populations were within the range displayed by other South Asian populations, although the genetic ancestries related to Central Steppe MLBA and Indus Periphery Cline were qualitatively at the higher end of the range in the Nair and Kodava compared with those observed in neighboring Indian populations. The relative amounts of allele sharing of our study and other South Asian populations with the three genetic sources, determined using D-statistics (Figures S12A–S12C and S13A–S13C), were in agreement with these results and our other analyses. More genetic data, especially from ancient samples from the region, can potentially shed further light on the complexities in genetic ancestry that may be missing in the current model.
Finally, to estimate the timing of the introduction of genetic ancestral source(s) related to ANI in South Asia, we ran ALDER on select South Asian populations using PC-derived weights reflective of the ANI-ASI ancestry cline to anchor the ancestry sources.15,21,52 The date estimates for populations with higher Central Steppe MLBA-related genetic ancestry, typically from North India, ranged from 59.86 ± 5.78 to 117.27 ± 13.7 generations. South Indian populations with higher ASI-related genetic ancestry tended to display slightly older dates than the former group, ranging from 91.9 ± 6.74 to 195.93 ± 48.75 generations (Table S11), in agreement with previous studies.15,21 The ANI-ASI admixture time estimates for the Bunt, Kodava, Kodava_US, and Nair ranged between 94.28 ± 6.27 and 110.93 ± 9.23 generations, which was within the range displayed by other populations included in the analysis (Table S11). We ran Kapla_A and Kapla_B separately given their differential admixture histories and, while Kapla_A did not yield a significant result (Z score ≥2, corresponding to p < 0.05), the estimate for Kapla_B was recent (5.03 ± 1.99 generations). As pointed out previously, these dates represent a complex series of admixture events and, particularly for populations with both the Indus Periphery Cline and Central Steppe MLBA-related genetic ancestries, may reflect an average of these events.15,21
Characterizing fine-scale genetic admixture using haplotype-based analyses
To evaluate population structure and genetic affinities at a finer scale, we implemented a haplotype-based analysis using ChromoPainter and fineSTRUCTURE.54 In this analysis, we first investigated general patterns of haplotype similarities followed by regional level patterns, including only South Indian populations. The degree to which an individual copies their haplotypes from another individual reflects the genetic similarity between those individuals at a haplotypic level.
At a broad level, we observed that both the Kodava groups, Bunt, Nair, and Kapla had the highest haplotype-copying rates from South Asian ancestry donors, particularly South Indian populations, consistent with previous analyses (Figure S14). At a regional scale, the populations sequenced in this study clustered into three groups: (1) Nair and Bunt, (2) Kodava, Kodava_US, and Coorghi, and (3) Kapla (Figures 3 and S15). Within the first two clusters, we could not identify further discernible sub-structure. When evaluating haplotype similarity to Central and Western Eurasian sources of ancestry, we found that the Nair and Bunt have a qualitatively higher per-locus copying probability to these sources relative to the Kodava, although the difference is not statistically significant (Figure S16; Table S12). In the case of Kapla, we observed haplotype similarity with Ulladan and Handigodu village donors, and significantly lower per-locus copying to Eurasian ancestry sources (Figure S16). The additional sub-structure uncovered through haplotype-based methods across the Nair, Bunt, and Kodava groups suggests more recent population structure in these populations.
Uniparental marker diversity in the Southwest Indian populations
The populations sequenced in this study displayed a mix of maternal lineages found today in South Asia (haplogroup M) and Western Eurasia (haplogroups R, J, U, HV) (Figure 4A; Table S13). Mitochondrial DNA haplogroup (mtDNA hg) R, reported to have originated in South Asia,78,79,80,81 was observed in each of the study populations. Individuals from both Kodava groups, Bunt, and Nair carried lineages of mtDNA hg U, previously reported in ancient and present-day individuals from the Near East, South Asia, Central Asia, and Southeast Asia.15,82,83 Some of the newly sequenced individuals belonged to subclades of haplogroup HV, a major subclade of haplogroup R0 found broadly from Eastern Europe to South Asia.15,84,85 In addition, two Kodava_US individuals were assigned to mtDNA lineages that have, to our knowledge, not been observed in South Asia thus far. These included a subclade of haplogroup J (haplogroup J1c1b1a) reported in ancient individuals in Central and Eastern Europe,46,86 and haplogroup A1a, a subclade of haplogroup A primarily observed in East Asia.87
We observed a total of eight Eurasian Y chromosome haplogroups across 98 newly sequenced male individuals (R, H, L, G, N, J, E, Q) (Figure 4B; Table S14). R, H, and L Y chromosome lineages have been previously reported in present-day and ancient individuals from South Asia, Southeast Asia, Central Asia, the Arabian Peninsula, Europe, and present-day Turkey,15,88,89,90,91,92 while haplogroups G, N, J, E, and Q are found more broadly across Eurasia, with lineages of haplogroup E additionally found in individuals from Africa.93
To explore the relationship between uniparental markers and matrilocality for the Nair, we compared haplotype and pairwise diversity on the mitochondrial DNA (mtDNA), following analyses similar to those of Gunnarsdóttir et al.59 (Table S15). For groups with similar sample sizes, we found the strongest difference in haplotype diversity to be between the Bunt and Kodava samples (p = 0.19; t = −0.88), which does not reject the null hypothesis of similar haplotype diversity between matrilocal and patrilocal groups in our dataset.
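For readers unfamiliar with haplotype diversity, the following is a minimal sketch using Nei's unbiased estimator, H = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sampled sequences. This is a commonly used estimator and is assumed here for illustration; it may differ in detail from the calculation underlying Table S15. The function name and the toy haplotype labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Nei's unbiased haplotype diversity for a sample of haplotype labels.

    Returns 0.0 when every sampled sequence carries the same haplotype and
    1.0 when every sampled sequence carries a different haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy mtDNA haplotype labels (illustrative only).
    print(haplotype_diversity(["M2", "M2", "U7", "U7"]))
    print(haplotype_diversity(["M2", "R5", "U7", "HV"]))
```

Under strict matrilocality, mtDNA haplotype diversity within a group is expected to be lower than in patrilocal groups, which is the contrast the test above evaluates.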
Endogamy in Southwest Indian populations
Founder effects and endogamy are prominent demographic forces shaping genetic diversity in populations of South Asian genetic ancestries.14,20,94 We sought to characterize rates of endogamy in the study populations relative to other populations sampled in India as endogamy can increase the frequency of recessive disease alleles. We calculated the identity-by-descent (IBD) score, which measures the extent to which unrelated individuals in a population share segments of the genome identical-by-descent (Figure 5).
We found that the Nair and Bunt populations had similar levels of endogamy as captured by the IBD score. In addition, both Kodava populations had higher IBD scores than the Bunt and Nair but were consistent between themselves. We found the Kapla to have elevated levels of endogamy relative to geographically neighboring populations such as the Kodava, but similar to several tribal groups from Central and South India.
We estimated ROH to further explore endogamy in the study populations. In agreement with the IBD scores, we observed the Kapla had an increased number of ROH (NROH) and generally longer ROH (>10 Mb) (Figures S17 and S18), consistent with smaller effective population size and higher levels of consanguinity.64 To corroborate these results, we also estimated the occurrence and magnitude of founder events in the newly sampled populations using ASCEND.60 We detected a recent founder event for the Kapla occurring ∼8 generations ago and with a founder intensity of ∼9%, but the Nair, Bunt, and both Kodava populations did not show signs of recent founder events (Figure S19). In summary, our results suggest modest levels of endogamy within our newly sampled populations, with the Kapla reflecting a novel founder effect.
Novel variant discovery in Southwest Indian populations
We generated high-coverage genomes (>30×) for a total of nine individuals representing the Bunt, Kapla, Kodava_US, and Nair, and evaluated the presence of variants not previously categorized in existing catalogs of genetic diversity. After alignment, quality control filters, and genotype calling across these WGS data, we ran Ensembl VEP v.105.0 to assess whether each variant was found in pre-existing catalogs of variation and their putative functional consequences. We used 1000 Genomes, gnomAD, and GenomeAsia as catalogs of allelic variation against which to compare and establish novel occurrences of variants. We observed 11,132,650 total variants (8,562,627 SNPs, 2,570,023 indels) and, of these, 208,041 (151,137 SNPs, 56,904 indels) were not found in existing catalogs of variation, which we term “novel variants.” Of these novel variants, we assessed the proportion of SNPs and indels in specific functional impact classes as defined by VEP (modifier, low, moderate, high). For novel SNPs, there were 150,315 modifier, 349 low, 444 moderate, and 29 high impact variants. For novel indels, the proportions were similar with 56,714 modifier, 49 low, 36 moderate, and 105 high impact variants. We also observed that the majority of the putatively high impact variants were singletons, i.e., found on only a single haplotype out of the 18 tested, which is to be expected based on the action of negative selection.95
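The variant counts reported above can be cross-checked with a short tally. The sketch below only re-adds numbers copied from the text and converts the impact-class counts to fractions; the variable and function names are hypothetical, and nothing is recomputed from the underlying genotype data.

```python
# Counts copied from the text; this is an arithmetic check only.
TOTAL = {"SNP": 8_562_627, "indel": 2_570_023}
REPORTED_TOTAL = 11_132_650

NOVEL_BY_IMPACT = {
    "SNP": {"modifier": 150_315, "low": 349, "moderate": 444, "high": 29},
    "indel": {"modifier": 56_714, "low": 49, "moderate": 36, "high": 105},
}
REPORTED_NOVEL = {"SNP": 151_137, "indel": 56_904}
REPORTED_NOVEL_TOTAL = 208_041


def novel_counts(by_impact):
    """Sum the impact classes to get novel variants per variant type."""
    return {kind: sum(classes.values()) for kind, classes in by_impact.items()}


def impact_fractions(classes):
    """Convert per-class counts to fractions of that variant type."""
    total = sum(classes.values())
    return {name: count / total for name, count in classes.items()}


if __name__ == "__main__":
    counts = novel_counts(NOVEL_BY_IMPACT)
    print("totals agree:", sum(TOTAL.values()) == REPORTED_TOTAL)
    print("novel counts agree:", counts == REPORTED_NOVEL)
    print("novel fraction:", sum(counts.values()) / REPORTED_TOTAL)
    for kind, classes in NOVEL_BY_IMPACT.items():
        print(kind, impact_fractions(classes))
```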
We also explored the burden of moderate and high effect variants within this small set of high-coverage genomes, given previous evidence of endogamy in Indian populations influencing the burden of deleterious mutations.14,19 When looking at only observed mutations with moderate or high predicted effects based on VEP, we found that there was an enrichment of deleterious variants in the Kodava_US population (p < 1e-12; Mann-Whitney U: 24e-6 between Kodava_US and Bunt), which does not support a direct relationship between predicted endogamy (Figure 5) and burden of deleterious variants (Figure S20A). However, this is confounded by the fact that the Kodava_US samples had ∼77× coverage compared with 30× coverage from the other populations, so there was a higher absolute number of variants discovered in the Kodava_US individuals.
We hypothesized that populations with higher fractions of ASI ancestry would have a higher fraction of novel variants due to undersampling of ASI-related genetic ancestry in known catalogs of human allelic variation. We found a strong enrichment of novel variants in the Kapla, supporting our hypothesis (p < 1e-12; Mann-Whitney U: 25e-6 between Kapla and Bunt). In absolute terms, this is a difference of ∼3,000 variants per haplotype (Figure S20B). Overall, these findings motivate conducting further whole-genome surveys of genomic variation in underrepresented genetic ancestries both for discovery of functional genetic variation and improving catalogs of allelic diversity, e.g., the GenomeAsia 100K consortium.19
Discussion
This study engaged with the genetic and oral histories of populations from Southwest India. By analyzing whole-genome sequences from participants in India and the US, we reconstructed broader and fine-scale genetic relationships of the study populations to worldwide and other South Indian populations, respectively. Surveys of population oral histories and origin stories were conducted through community interactions, augmented by published historical and limited anthropological works. In the context of this study, we did not differentiate between oral history and oral tradition as many anthropologists do, and instead used oral history as an umbrella term to encompass aspects of cultural histories of the study populations that have both been passed down orally from one generation to the next and drawn from contemporary experiences and observations. Furthermore, oral histories and self-identities are inherently complex and may be shaped over time by community and external influences. Hence, we focused on those aspects of oral histories that specifically relate to origins and connections with other communities as recognized by community members today.
We found the Bunt, Kodava, and Nair to share close genetic ancestry with each other and with other neighboring South Indian populations. These populations, along with several other Indian populations, have genetic ancestries that can be modeled with Onge, Indus Periphery Cline, and Central Steppe MLBA populations. This mix of genetic ancestries is reflected in the uniparental markers that also show a mix of South Asian and Western Eurasian mtDNA and Y chromosome haplogroups. Future studies are needed to better understand the distribution ranges of haplogroups not typical of this region, such as mitochondrial haplogroups A1a and J1c1b1a and Y chromosome haplogroup E.
Our result of not detecting additional sources of Central and/or Western Eurasian ancestry in the Bunt, Kodava, and Nair motivates anthropological follow-up to better understand the connections in their oral histories to non-local populations such as ancient Scythians and members of Alexander’s army. It is possible that our sampling scheme and/or analytical power limits the detection of such a signal in the genetic data, particularly if subsequent admixture with local Indian populations caused a dilution of non-local genetic signatures in these populations. Moreover, the timing of ANI-ASI admixture, reported to be between 2,000 and 4,000 years before present,21 may confound detection of admixture events noted in oral histories, such as that between local Indian populations and members of Alexander’s army sometime after 327 BCE. In fact, the Nair and Kodava individuals sequenced in this study have a slightly higher proportion of Western Eurasian ancestry compared with neighboring populations in South India, although comparable with and/or lower than most North Indian populations. The available data in this study preclude determination of the reason for this slightly elevated signal in these populations, including potential sampling bias and long-standing endogamy.
The observed discordance between oral and genetic histories leads to interesting avenues for ethnographic and anthropological follow-up while, at the same time, providing a valuable opportunity to consider the dangers of conflating self-identities with genetic ancestries. Could these origin stories be reflective of past cultural and/or economic contacts rather than involving gene flow? In studies of demographic histories, the expectation of origin stories arising from oral traditions to converge with genetic histories, by using the former to motivate hypotheses that are then either accepted or rejected by the genetic data, tends to suggest that genetic and oral histories should be alignable. On the contrary, they represent distinct facets of individual and group identities that should not be forced to reconcile by pitting one against another10,12 or be assumed to occur across similar time frames. Taking a similar stance, we stress that our findings based on genetic data should not be used to validate or reject community origin stories based on oral traditions. Instead, we encourage further anthropological research on the nature of the relationships with non-local populations prevalent in the oral histories of the populations included in our study. This would serve to enhance our understanding of the cultural and oral legacy of these populations and potentially illuminate new dimensions of the regional socio-cultural history. Such conversations are especially pertinent today as the genomics revolution coupled with direct-to-consumer initiatives over the past two decades has brought conversations around genetics and ancestry squarely into the public domain and calls for increased engagement from researchers to explain the interpretive scope of the data.96
At a more fine-scale level, haplotype-based analysis suggests more recent genetic contacts between Bunt and Nair populations. Moreover, results from clustering analyses as well as outgroup-f3 statistics, Treemix, and haplotype-based analyses suggest that the Kodava from India and Kodava_US are genetically more similar to each other than either is to other Indian populations. Within the limits of these tests, the Kodava individuals sampled in the US and those sampled in India can be considered representative of the same population. Although we lack socio-cultural context for the Coorghi individuals genotyped in Nakatsuka et al.,14 we conjecture that the genetic similarity of the Coorghi and the two Kodava datasets from this study may, again, be capturing their common origin since Coorghi is an anglicized version of Kodava that was adopted during the colonial times. Overall, close genetic affinities between the Bunt, Kodava, and Nair do, in fact, complement their oral histories that speak to historical cultural contacts as well as overlapping self-identities, likely facilitated by their geographical proximity to one another. Our genetic analyses do not detect any discernible sub-structure within the Bunt, Kodava, and Nair despite the Nair and Kodava donors originating in different locations within Kerala and Kodagu, respectively, and the complex socio-cultural subgroups recognized within the Nair population. This may suggest that geography and social stratification have not produced long-term reductions to gene flow within these populations.
We also found little statistical support from genetic diversity statistics (Table S15) for distinguishing groups anthropologically described as matrilocal from patrilocal groups. From haplotype and nucleotide diversity on mtDNA, we did not find the significantly lower levels of these metrics expected for matrilocal groups, especially the Nair, compared with patrilocal groups. From haplogroup diversity on the Y chromosome relative to the mtDNA, the Nair display a slightly higher diversity than the Kodava, which may be expected under matrilocality. However, this result is qualitative, and larger sample sizes per group would likely be required to address this in a stronger quantitative sense. There are other possible explanations for a lack of clear signal in uniparental markers between matrilocal and patrilocal populations, such as recent shifts from strict matrilocality that could potentially dilute the signal in the mtDNA genetic diversity, or smaller long-term population size that could result in shifts in diversity which may not be well-accounted for in our analyses. We leave these as future areas of work bridging matrilocal and patrilocal population dynamics to observed genetic data.
In contrast to the other study populations, the Kapla are genetically more similar to tribal South Indian populations, which have higher proportions of AASI-related genetic ancestry, modeled using Onge, and negligible amounts of the Central Steppe MLBA genetic component. In this regard, the Kapla may represent one of several close genetic descendants of the ASI group.15 A close link of the Kapla to the ASI group is also evident in the uniparental markers, which are enriched for haplogroups with proposed South Asian origins. The lack of substantial support for a Siddi or African origin for the Kapla in the genetic data raises an interesting follow-up to these links proposed in historical records. Furthermore, despite their geographical proximity to the Kodava and socio-economic relationships between these two populations observed by the research team, stark differences in lifestyle between the two populations may have resulted in more long-term genetic isolation of the Kapla from the Kodava. Their genetic isolation and elevated IBD score may be reflected in historical narratives, both published accounts that state the Kapla “consist of only 15 families”75 and were relocated to their present location by a local ruler, and anecdotal accounts that suggest they were then isolated from neighboring populations. However, we do detect slightly higher genetic ancestry related to Central Steppe MLBA and Indus Periphery Cline in three Kapla individuals relative to the others, which may be due to either recent contacts with neighboring populations, driven by socio-economic contacts between these populations, or latent population structure. Future studies with increased sample sizes are needed to characterize the genetic variability within the Kapla more accurately.
Although our analyses were not designed to make claims regarding complex traits or disease, we expect to find an increased prevalence of recessive genetic disorders in populations with higher rates of endogamy.14,20,97 While follow-up studies with detailed phenotype information across populations in Southwest India will bring more evidence to bear on the relationship between endogamy and disease risk, particularly for recessive diseases, we found all four study populations to be within the range of IBD scores and ROH displayed by other Indian populations. Notably, the Kapla registered a higher IBD score and increased number and length of ROH compared with the Bunt, Kodava, and Nair, which suggests elevated endogamy in the Kapla, presumably due to their isolation and smaller population size. Our analysis of deleterious variants in a smaller, but higher-coverage, whole-genome dataset found that the Kapla were not the population with the highest burden of deleterious mutations. However, the Kapla did contain significantly higher rates of novel variants not found in existing public catalogs of allelic variation.
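To make the notion of an IBD score concrete, the sketch below shows one common formulation: the summed length of IBD segments shared between pairs of individuals from the same population, restricted to a length window, divided by the number of possible pairs. This is an assumption made for illustration, not a description of the exact procedure used in this study; the function name, the input format, the 3 to 20 cM window, and the toy segments are all hypothetical.

```python
def ibd_score(segments, individuals, min_cm=3.0, max_cm=20.0):
    """Average within-population IBD sharing per pair of individuals.

    segments: iterable of (id_a, id_b, length_cm) tuples (hypothetical format).
    individuals: IDs belonging to the population of interest.
    min_cm, max_cm: assumed length window for the segments that are counted.
    """
    members = set(individuals)
    n_pairs = len(members) * (len(members) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two individuals")
    total_cm = sum(
        length
        for id_a, id_b, length in segments
        # Keep only segments shared between two distinct members of the
        # population whose length falls inside the window.
        if id_a in members and id_b in members and id_a != id_b
        and min_cm <= length <= max_cm
    )
    return total_cm / n_pairs


# Hypothetical toy input (not data from this study).
toy_segments = [
    ("k1", "k2", 5.0),
    ("k1", "k3", 25.0),  # longer than the window, ignored
    ("k2", "k3", 4.0),
    ("k1", "x9", 10.0),  # x9 is outside the population, ignored
    ("k2", "k3", 2.0),   # shorter than the window, ignored
]
print(ibd_score(toy_segments, ["k1", "k2", "k3"]))  # 3.0
```

Scores of this kind are usually interpreted relative to a reference population rather than in absolute terms, because the raw value depends on the IBD caller and on the length window chosen.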
The results from this study provide an impetus for follow-up studies to further characterize the levels of genetic variation in these and other populations in India through increased sample sizes and phenotype collection. In addition, future studies of population histories should engage closely with relevant fields such as anthropology. Importantly, such work should be conducted with substantial community engagement to ensure accurate recording of information on those aspects of their history that the population would like to further follow up on, given the sensitivities around identities stemming from oral histories.
Data and code availability
In compliance with the informed consent obtained from participants in the study, raw data (fastq files), alignments (bam files), and variant calls (VCF files) are available for demographic analyses under a data access agreement, requests for which should be submitted to M.R. (mraghavan@uchicago.edu) and N.R. (nirajrai@bsip.res.in).
Acknowledgments
First and foremost, we would like to thank members of the Kodava (both in India and in the US), Bunt, Nair, and Kapla populations for their participation in this study and for graciously hosting research team members in their communities and sharing their oral histories. We also thank members of the sequencing facilities at MedGenome Inc., Bangalore (India), Novogene, Sacramento (USA), and the University of Chicago DNA Sequencing Facility, Chicago (USA). We are also extremely grateful to Anna Di Rienzo, John Novembre, Matthias Steinrücken, Carole Ober, to Harald Ringbauer for technical support with the Y-chromosome haplogroup assignments, Bridget Chak and Constantine V. Nakassis for their valuable feedback on the manuscript, and to Kendra Kodira for creating awareness and promoting participation among the Kodava_US community. This project was funded through the NIH grant R35GM143094, University of Chicago start-up funds, Cincinnati Children’s Endowment, the Gibbs Travelling Research Fellowship from the Newnham College, University of Cambridge, and the Austrian Marshall Plan Foundation for supporting S.F.’s visit to the University of Chicago.
Declaration of interests
The authors declare no competing interests.
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xhgg.2024.100305.
Contributor Information
Niraj Rai, Email: nirajrai@bsip.res.in.
Maanasa Raghavan, Email: mraghavan@uchicago.edu.
Web resources
AADR: https://doi.org/10.7910/DVN/FFIDCW
ADMIXTOOLS: https://github.com/DReichLab/AdmixTools
ADMIXTOOLS2: https://github.com/uqrmaie1/admixtools
ADMIXTURE: https://dalexander.github.io/admixture/index.html
ALDER: https://groups.csail.mit.edu/cb/alder/
ASCEND: https://github.com/sunyatin/ASCEND
bcftools: https://github.com/samtools/bcftools
BWA-MEM: https://github.com/lh3/bwa
ChromoPainter: https://people.maths.bris.ac.uk/~madjl/finestructure-old/chromopainter_info.html
EIGENSOFT: https://github.com/DReichLab/EIG
Ensembl-VEP: https://useast.ensembl.org/info/docs/tools/vep/index.html
fineSTRUCTURE: https://people.maths.bris.ac.uk/~madjl/finestructure/finestructure.html
GATK4: https://github.com/broadinstitute/gatk
GERMLINE2: https://github.com/gusevlab/germline2
Haplogrep2: https://github.com/seppinho/haplogrep-cmd?tab=readme-ov-file
IMPUTE2: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
KING: https://www.kingrelatedness.com/
Plink1.9: https://www.cog-genomics.org/plink/1.9/dev
Samtools: https://github.com/samtools/samtools
SequenceTools: https://github.com/stschiff/sequenceTools
References
- 1. Popejoy A.B., Fullerton S.M. Genomics is failing on diversity. Nature. 2016;538:161–164. doi: 10.1038/538161a.
- 2. Sengupta D., Choudhury A., Basu A., Ramsay M. Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset. Genome Biol. Evol. 2016;8:3460–3470. doi: 10.1093/gbe/evw244.
- 3. Sirugo G., Williams S.M., Tishkoff S.A. The Missing Diversity in Human Genetic Studies. Cell. 2019;177:1080. doi: 10.1016/j.cell.2019.04.032.
- 4. Silva C.P., de la Fuente Castro C., González Zarzar T., Raghavan M., Tonko-Huenucoy A., Martínez F.I., Montalva N. The Articulation of Genomics, Mestizaje, and Indigenous Identities in Chile: A Case Study of the Social Implications of Genomic Research in Light of Current Research Practices. Front. Genet. 2022;13:817318. doi: 10.3389/fgene.2022.817318.
- 5. Lemke A.A., Esplin E.D., Goldenberg A.J., Gonzaga-Jauregui C., Hanchard N.A., Harris-Wai J., Ideozu J.E., Isasi R., Landstrom A.P., Prince A.E.R., et al. Addressing underrepresentation in genomics research through community engagement. Am. J. Hum. Genet. 2022;109:1563–1571. doi: 10.1016/j.ajhg.2022.08.005.
- 6. Prince A.E.R., Berkman B.E. Reconceptualizing harms and benefits in the genomic age. Pers. Med. 2018;15:419–428. doi: 10.2217/pme-2018-0022.
- 7. Clarke A.J., van El C.G. Genomics and justice: mitigating the potential harms and inequities that arise from the implementation of genomics in medicine. Hum. Genet. 2022;141:1099–1107. doi: 10.1007/s00439-022-02453-w.
- 8. Garrison N.A., Hudson M., Ballantyne L.L., Garba I., Martinez A., Taualii M., Arbour L., Caron N.R., Rainie S.C. Genomic Research Through an Indigenous Lens: Understanding the Expectations. Annu. Rev. Genom. Hum. Genet. 2019;20:495–517. doi: 10.1146/annurev-genom-083118-015434.
- 9. Forzano F., Genuardi M., Moreau Y., European Society of Human Genetics ESHG warns against misuses of genetic tests and biobanks for discrimination purposes. Eur. J. Hum. Genet. 2021;29:894–896. doi: 10.1038/s41431-020-00786-6.
- 10. Crellin R.J., Harris O.J. Beyond binaries. Interrogating ancient DNA. Archaeol. Dialogues. 2020;27:37–56.
- 11. Donovan B., Nehm R.H. Genetics and Identity. Sci. Educ. 2020;29:1451–1458.
- 12. TallBear K. Genomic articulations of indigeneity. Soc. Stud. Sci. 2013;43:509–533.
- 13. Basu A., Sarkar-Roy N., Majumder P.P. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. Proc. Natl. Acad. Sci. USA. 2016;113:1594–1599. doi: 10.1073/pnas.1513197113.
- 14. Nakatsuka N., Moorjani P., Rai N., Sarkar B., Tandon A., Patterson N., Bhavani G.S., Girisha K.M., Mustak M.S., Srinivasan S., et al. The promise of discovering population-specific disease-associated genes in South Asia. Nat. Genet. 2017;49:1403–1407. doi: 10.1038/ng.3917.
- 15. Narasimhan V.M., Patterson N., Moorjani P., Rohland N., Bernardos R., Mallick S., Lazaridis I., Nakatsuka N., Olalde I., Lipson M., et al. The formation of human populations in South and Central Asia. Science. 2019;365 doi: 10.1126/science.aat7487.
- 16. Tätte K., Pagani L., Pathak A.K., Kõks S., Ho Duy B., Ho X.D., Sultana G.N.N., Sharif M.I., Asaduzzaman M., Behar D.M., et al. The genetic legacy of continental scale admixture in Indian Austroasiatic speakers. Sci. Rep. 2019;9:3818. doi: 10.1038/s41598-019-40399-8.
- 17. Pathak A.K., Kadian A., Kushniarevich A., Montinaro F., Mondal M., Ongaro L., Singh M., Kumar P., Rai N., Parik J., et al. The Genetic Ancestry of Modern Indus Valley Populations from Northwest India. Am. J. Hum. Genet. 2018;103:918–929. doi: 10.1016/j.ajhg.2018.10.022.
- 18. Metspalu M., Mondal M., Chaubey G. The genetic makings of South Asia. Curr. Opin. Genet. Dev. 2018;53:128–133. doi: 10.1016/j.gde.2018.09.003.
- 19. GenomeAsia100K Consortium The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature. 2019;576:106–111. doi: 10.1038/s41586-019-1793-z.
- 20. Reich D., Thangaraj K., Patterson N., Price A.L., Singh L. Reconstructing Indian population history. Nature. 2009;461:489–494. doi: 10.1038/nature08365.
- 21. Moorjani P., Thangaraj K., Patterson N., Lipson M., Loh P.-R., Govindaraj P., Berger B., Reich D., Singh L. Genetic evidence for recent population mixture in India. Am. J. Hum. Genet. 2013;93:422–438. doi: 10.1016/j.ajhg.2013.07.006.
- 22. Debortoli G., Abbatangelo C., Ceballos F., Fortes-Lima C., Norton H.L., Ozarkar S., Parra E.J., Jonnalagadda M. Novel insights on demographic history of tribal and caste groups from West Maharashtra (India) using genome-wide data. Sci. Rep. 2020;10 doi: 10.1038/s41598-020-66953-3.
- 23. Arciero E., Dogra S.A., Mezzavilla M., Tsismentzoglou T., Huang Q.Q., Hunt K.A., Mason D., van Heel D.A., Sheridan E., Wright J., et al. Fine-scale population structure and demographic history of British Pakistanis. Preprint at bioRxiv. 2020. doi: 10.1101/2020.09.02.279190.
- 24. Finer S., Martin H.C., Khan A., Hunt K.A., MacLaughlin B., Ahmed Z., Ashcroft R., Durham C., MacArthur D.G., McCarthy M.I., et al. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. Int. J. Epidemiol. 2020;49:20–21i. doi: 10.1093/ije/dyz174.
- 25. Howie B., Fuchsberger C., Stephens M., Marchini J., Abecasis G.R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 2012;44:955–959. doi: 10.1038/ng.2354.
- 26. 1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- 27. Delaneau O., Zagury J.-F., Robinson M.R., Marchini J.L., Dermitzakis E.T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 2019;10:5436. doi: 10.1038/s41467-019-13225-y.
- 28. Bergström A., McCarthy S.A., Hui R., Almarri M.A., Ayub Q., Danecek P., Chen Y., Felkel S., Hallast P., Kamm J., et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020;367:eaay5012. doi: 10.1126/science.aay5012.
- 29. Mallick S., Micco A., Mah M., Ringbauer H., Lazaridis I., Olalde I., Patterson N., Reich D. The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci. Data. 2024;11:182. doi: 10.1038/s41597-024-03031-7.
- 30. Lazaridis I., Patterson N., Mittnik A., Renaud G., Mallick S., Kirsanow K., Sudmant P.H., Schraiber J.G., Castellano S., Lipson M., et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014;513:409–413. doi: 10.1038/nature13673.
- 31. Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559.
- 32. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
- 33. Patterson N., Price A.L., Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2 doi: 10.1371/journal.pgen.0020190.
- 34. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
- 35. Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
- 36. Pickrell J.K., Pritchard J.K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1002967.
- 37. Patterson N., Moorjani P., Luo Y., Mallick S., Rohland N., Zhan Y., Genschoreck T., Webster T., Reich D. Ancient admixture in human history. Genetics. 2012;192:1065–1093. doi: 10.1534/genetics.112.145037.
- 38. Yelmen B., Mondal M., Marnetto D., Pathak A.K., Montinaro F., Gallego Romero I., Kivisild T., Metspalu M., Pagani L. Ancestry-Specific Analyses Reveal Differential Demographic Histories and Opposite Selective Pressures in Modern South Asian Populations. Mol. Biol. Evol. 2019;36:1628–1642. doi: 10.1093/molbev/msz037.
- 39. Harney É., Patterson N., Reich D., Wakeley J. Assessing the performance of qpAdm: a statistical tool for studying population admixture. Genetics. 2021;217 doi: 10.1093/genetics/iyaa045.
- 40. Haak W., Lazaridis I., Patterson N., Rohland N., Mallick S., Llamas B., Brandt G., Nordenfelt S., Harney E., Stewardson K., et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015;522:207–211. doi: 10.1038/nature14317.
- 41. Krzewińska M., Kılınç G.M., Juras A., Koptekin D., Chyleński M., Nikitin A.G., Shcherbakov N., Shuteleva I., Leonova T., Kraeva L., et al. Ancient genomes suggest the eastern Pontic-Caspian steppe as the source of western Iron Age nomads. Sci. Adv. 2018;4 doi: 10.1126/sciadv.aat4457.
- 42. Järve M., Saag L., Scheib C.L., Pathak A.K., Montinaro F., Pagani L., Flores R., Guellil M., Saag L., Tambets K., et al. Shifts in the Genetic Landscape of the Western Eurasian Steppe Associated with the Beginning and End of the Scythian Dominance. Curr. Biol. 2019;29:2430–2441.e10. doi: 10.1016/j.cub.2019.06.019.
- 43. Damgaard P.D.B., Marchi N., Rasmussen S., Peyrot M., Renaud G., Korneliussen T., Moreno-Mayar J.V., Pedersen M.W., Goldberg A., Usmanova E., et al. 137 ancient human genomes from across the Eurasian steppes. Nature. 2018;557:369–374. doi: 10.1038/s41586-018-0094-2.
- 44. Gnecchi-Ruscone G.A., Khussainova E., Kahbatkyzy N., Musralina L., Spyrou M.A., Bianco R.A., Radzeviciute R., Martins N.F.G., Freund C., Iksan O., et al. Ancient genomic time transect from the Central Asian Steppe unravels the history of the Scythians. Sci. Adv. 2021;7 doi: 10.1126/sciadv.abe4414.
- 45. Unterländer M., Palstra F., Lazaridis I., Pilipenko A., Hofmanová Z., Groß M., Sell C., Blöcher J., Kirsanow K., Rohland N., et al. Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe. Nat. Commun. 2017;8:14615. doi: 10.1038/ncomms14615.
- 46. Allentoft M.E., Sikora M., Sjögren K.-G., Rasmussen S., Rasmussen M., Stenderup J., Damgaard P.B., Schroeder H., Ahlström T., Vinner L., et al. Population genomics of Bronze Age Eurasia. Nature. 2015;522:167–172. doi: 10.1038/nature14507.
- 47. de Barros Damgaard P., Martiniano R., Kamm J., Moreno-Mayar J.V., Kroonen G., Peyrot M., Barjamovic G., Rasmussen S., Zacho C., Baimukhanov N., et al. The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science. 2018;360:eaar7711. doi: 10.1126/science.aar7711.
- 48. Broushaki F., Thomas M.G., Link V., López S., van Dorp L., Kirsanow K., Hofmanová Z., Diekmann Y., Cassidy L.M., Díez-Del-Molino D., et al. Early Neolithic genomes from the eastern Fertile Crescent. Science. 2016;353:499–503. doi: 10.1126/science.aaf7943.
- 49. Reitsema L.J., Mittnik A., Kyle B., Catalano G., Fabbri P.F., Kazmi A.C.S., Reinberger K.L., Sineo L., Vassallo S., Bernardos R., et al. The diverse genetic origins of a Classical period Greek army. Proc. Natl. Acad. Sci. USA. 2022;119 doi: 10.1073/pnas.2205272119.
- 50. Lazaridis I., Alpaslan-Roodenberg S., Acar A., Açıkkol A., Agelarakis A., Aghikyan L., Akyüz U., Andreeva D., Andrijašević G., Antonović D., et al. The genetic history of the Southern Arc: A bridge between West Asia and Europe. Science. 2022;377 doi: 10.1126/science.abm4247.
- 51. Maier R., Flegontov P., Flegontova O., Işıldak U., Changmai P., Reich D. On the limits of fitting complex models of population history to f-statistics. Elife. 2023;12 doi: 10.7554/eLife.85492.
- 52. Loh P.-R., Lipson M., Patterson N., Moorjani P., Pickrell J.K., Reich D., Berger B. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 2013;193:1233–1254. doi: 10.1534/genetics.112.147330.
- 53. Moorjani P., Sankararaman S., Fu Q., Przeworski M., Patterson N., Reich D. A Genetic Method for Dating Ancient Genomes Provides a Direct Estimate of the Human Generation Interval in the Last 45,000 years. Proc. Natl. Acad. Sci. USA. 2016;113:5652–5657. doi: 10.1073/pnas.1514696113.
- 54. Lawson D.J., Hellenthal G., Myers S., Falush D. Inference of population structure using dense haplotype data. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1002453.
- 55. Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M., Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10 doi: 10.1093/gigascience/giab008.
- 56. Weissensteiner H., Pacher D., Kloss-Brandstätter A., Forer L., Specht G., Bandelt H.-J., Kronenberg F., Salas A., Schönherr S. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 2016;44:W58–W63. doi: 10.1093/nar/gkw233.
- 57. Van Oven M. PhyloTree Build 17: Growing the human mitochondrial DNA tree. Forensic Sci. Int. Genet. Suppl. Ser. 2015;5:e392–e394.
- 58. Poznik G.D. 2016. White Paper 23-13 yHaplo| Identifying Y-Chromosome Haplogroups in Arbitrarily Large Samples of Sequenced or Genotyped Men.
- 59. Gunnarsdóttir E.D., Nandineni M.R., Li M., Myles S., Gil D., Pakendorf B., Stoneking M. Larger mitochondrial DNA than Y-chromosome differences between matrilocal and patrilocal groups from Sumatra. Nat. Commun. 2011;2:228. doi: 10.1038/ncomms1235.
- 60. Korunes K.L., Samuk K. pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Mol. Ecol. Resour. 2021;21:1359–1368. doi: 10.1111/1755-0998.13326.
- 61. Nait Saada J., Kalantzis G., Shyr D., Cooper F., Robinson M., Gusev A., Palamara P.F. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat. Commun. 2020;11:6130. doi: 10.1038/s41467-020-19588-x.
- 62. Tournebize R., Chu G., Moorjani P. Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals. PLoS Genet. 2022;18 doi: 10.1371/journal.pgen.1010243.
- 63. Joshi P.K., Esko T., Mattsson H., Eklund N., Gandin I., Nutile T., Jackson A.U., Schurmann C., Smith A.V., Zhang W., et al. Directional dominance on stature and cognition in diverse human populations. Nature. 2015;523:459–462. doi: 10.1038/nature14618.
- 64. Ceballos F.C., Joshi P.K., Clark D.W., Ramsay M., Wilson J.F. Runs of homozygosity: windows into population history and trait architecture. Nat. Rev. Genet. 2018;19:220–234. doi: 10.1038/nrg.2017.109.
- 65. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv. 2013. doi: 10.48550/arXiv.1303.3997.
- 66. Panikkar K.M. Some Aspects of Nayar Life. J. Roy. Anthropol. Inst. G. B. Ireland. 1918;48:254–293.
- 67. Srinivas M.N. Asia Publishing House; 1965. Religion and Society Among the Coorgs of South India.
- 68. Thurston E. Cosmo Publications; 1909. Castes and Tribes of Southern India.
- 69. Schneider D.M. In: Matrilineal Kinship. Schneider D.M., Gough K., editors. University of California Press; 1962. 1962 C1961.
- 70. Kushalappa M. Createspace Independent Pub; 2013. The Early Coorgs: A History of Early Kodagu and its People.
- 71. Fuller C.J. The Internal Structure of the Nayar Caste. J. Anthropol. Res. 1975;31:283–312. doi: 10.1086/jar.31.4.3629883. Preprint.
- 72. Menon A. 2018. Searching for preliminary mitochondrial and phenotypic signatures that may underpin the “self-identities” of three distinct South Indian warrior populations: the Nairs, the Bunts & the Kodava.
- 73. Fuller C.J. Cambridge University Press; 1976. The Nayars Today.
- 74. Karumbaya C. In: Are Kodavas (Coorgs) Hindus? Bopanna P.T., editor. Rolling Stone Publications; 2018. Kodavas through the ages.
- 75. Richter G. 1870. Manual of Coorg: A Gazetteer of the Natural Features of the Country and the Social and Political Condition of its Inhabitants (Basel Mission Book Depository [Published by C Stolz])
- 76. Shah A.M., Tamang R., Moorjani P., Rani D.S., Govindaraj P., Kulkarni G., Bhattacharya T., Mustak M.S., Bhaskar L.V.K.S., Reddy A.G., et al. Indian Siddis: African descendants with Indian admixture. Am. J. Hum. Genet. 2011;89:154–161. doi: 10.1016/j.ajhg.2011.05.030.
- 77. Scheduled Tribes Development Department. 2022. Scheduled Tribes Development Department. https://www.stdd.kerala.gov.in
- 78. Quintana-Murci L., Chaix R., Wells R.S., Behar D.M., Sayar H., Scozzari R., Rengo C., Al-Zahery N., Semino O., Santachiara-Benerecetti A.S., et al. Where west meets east: the complex mtDNA landscape of the southwest and Central Asian corridor. Am. J. Hum. Genet. 2004;74:827–845. doi: 10.1086/383236.
- 79. Metspalu M., Kivisild T., Metspalu E., Parik J., Hudjashov G., Kaldma K., Serk P., Karmin M., Behar D.M., Gilbert M.T.P., et al. Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet. 2004;5:26. doi: 10.1186/1471-2156-5-26.
- 80. Chaubey G., Karmin M., Metspalu E., Metspalu M., Selvi-Rani D., Singh V.K., Parik J., Solnik A., Naidu B.P., Kumar A., et al. Phylogeography of mtDNA haplogroup R7 in the Indian peninsula. BMC Evol. Biol. 2008;8:227. doi: 10.1186/1471-2148-8-227.
- 81. Sylvester C., Rao J.S., Chandrasekar A., Krishna M.S. An Updated Phylogeny of mtDNA Haplogroup R8 Based on Complete Mitogenomes. J. Anthropol. Surv. India. 2019;68:114–122.
- 82. Sahakyan H., Hooshiar Kashani B., Tamang R., Kushniarevich A., Francis A., Costa M.D., Pathak A.K., Khachatryan Z., Sharma I., van Oven M., et al. Origin and spread of human mitochondrial DNA haplogroup U7. Sci. Rep. 2017;7 doi: 10.1038/srep46044.
- 83. Kutanan W., Kampuansai J., Brunelli A., Ghirotto S., Pittayaporn P., Ruangchai S., Schröder R., Macholdt E., Srikummool M., Kangwanpong D., et al. New insights from Thailand into the maternal genetic history of Mainland Southeast Asia. Eur. J. Hum. Genet. 2018;26:898–911. doi: 10.1038/s41431-018-0113-7.
- 84. Shamoon-Pour M., Li M., Merriwether D.A. Rare human mitochondrial HV lineages spread from the Near East and Caucasus during post-LGM and Neolithic expansions. Sci. Rep. 2019;9 doi: 10.1038/s41598-019-48596-1.
- 85. Mathieson I., Alpaslan-Roodenberg S., Posth C., Szécsényi-Nagy A., Rohland N., Mallick S., Olalde I., Broomandkhoshbacht N., Candilio F., Cheronet O., et al. The genomic history of southeastern Europe. Nature. 2018;555:197–203. doi: 10.1038/nature25778.
- 86. Mathieson I., Lazaridis I., Rohland N., Mallick S., Patterson N., Roodenberg S.A., Harney E., Stewardson K., Fernandes D., Novak M., et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015;528:499–503. doi: 10.1038/nature16152.
- 87. Tanaka M., Cabrera V.M., González A.M., Larruga J.M., Takeyasu T., Fuku N., Guo L.-J., Hirose R., Fujita Y., Kurata M., et al. Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res. 2004;14:1832–1850. doi: 10.1101/gr.2286304.
- 88. Kivisild T., Rootsi S., Metspalu M., Mastana S., Kaldma K., Parik J., Metspalu E., Adojaan M., Tolk H.-V., Stepanov V., et al. The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am. J. Hum. Genet. 2003;72:313–332. doi: 10.1086/346068.
- 89. Kerchner C.F. 2013. YDNA Haplogroup Descriptions & Information Links. Preprint.
- 90. Brunelli A., Kampuansai J., Seielstad M., Lomthaisong K., Kangwanpong D., Ghirotto S., Kutanan W. Y chromosomal evidence on the origin of northern Thai people. PLoS One. 2017;12 doi: 10.1371/journal.pone.0181935.
- 91. Haber M., Nassar J., Almarri M.A., Saupe T., Saag L., Griffith S.J., Doumet-Serhal C., Chanteau J., Saghieh-Beydoun M., Xue Y., et al. A Genetic History of the Near East from an aDNA Time Course Sampling Eight Points in the Past 4,000 Years. Am. J. Hum. Genet. 2020;107:149–157. doi: 10.1016/j.ajhg.2020.05.008.
- 92. Scorrano G., Finocchio A., De Angelis F., Martínez-Labarga C., Šarac J., Contini I., Scano G., Novokmet N., Frezza D., Rickards O. The genetic landscape of Serbian populations through mitochondrial DNA sequencing and non-recombining region of the Y chromosome microsatellites. Coll. Antropol. 2017;41:275–296.
- 93. Isogg, C. 2019-2020 by ISOGG 2019 Y-DNA Haplogroup Tree. https://isogg.org/tree/index.html.
- 94. Angural A., Spolia A., Mahajan A., Verma V., Sharma A., Kumar P., Dhar M.K., Pandita K.K., Rai E., Sharma S. Review: Understanding rare genetic diseases in low resource regions like Jammu and Kashmir - India. Front. Genet. 2020;11:415. doi: 10.3389/fgene.2020.00415.
- 95. Nelson M.R., Wegmann D., Ehm M.G., Kessner D., St Jean P., Verzilli C., Shen J., Tang Z., Bacanu S.-A., Fraser D., et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science. 2012;337:100–104. doi: 10.1126/science.1217876.
- 96. Claw K.G., Anderson M.Z., Begay R.L., Tsosie K.S., Fox K., Garrison N.A., Summer internship for INdigenous peoples in Genomics SING Consortium A framework for enhancing ethical genomic research with Indigenous communities. Nat. Commun. 2018;9:2957. doi: 10.1038/s41467-018-05188-3.
- 97. GUaRDIAN Consortium. Sivasubbu S., Scaria V. Genomics of rare genetic diseases-experiences from India. Hum. Genom. 2019;14:52. doi: 10.1186/s40246-019-0215-5.