Human Genomics. 2025 Sep 30;19:111. doi: 10.1186/s40246-025-00822-w

Investigating the demographic history of Sindhi population inhabited in West Coast India

Lomous Kumar 1,2, Suraj Nongmaithem 1, Sachin Kumar 2, Kumarasamy Thangaraj 1
PMCID: PMC12487577  PMID: 41029758

Abstract

Background

South Asian populations are genetically well stratified due to multiple waves of migration, admixture events, and endogamy. India remains a rich resource for population genomics studies with many small and socio-culturally homogeneous communities whose origins and demographic histories are largely unknown.

Methods

In this study, we analysed one such small Sindhi settlement in the Thane district of Maharashtra, on the west coast of India, using genome-wide autosomal SNP data from 13 healthy Sindhi individuals and both frequency- and haplotype-based approaches.

Results

Our analyses suggest that the West coast Indian Sindhi community is genetically distinct and shows greater affinity with a group related to the Pakistani Burusho than with the Pakistani Sindhi, as it carries an additional East/Southeast Asian component. Furthermore, haplotype and Identity by Descent (IBD) sharing suggests recent gene flow from the local Konkani population on the west coast of India into Indian Sindhi. Admixture modelling suggested that admixture between Indian Sindhi and the East/Southeast Asian source group occurred about 40–50 generations before present (GBP), which explains their distinct present-day genetic profile. However, apart from this additional admixture, they share the basic genetic composition of the Pakistan/Northwest Indian groups, as reflected in Principal Component Analysis (PCA), outgroup F3 and IBD sharing.

Conclusion

Our findings suggest that the Indian Sindhi settlement in Thane, Maharashtra, on the west coast of India derives its genetic ancestry not directly from Pakistani Sindhis but from other groups related to the Burusho in Pakistan. The study therefore encourages further research to characterise the heterogeneous nature of migrations to the Indian subcontinent and thus further decipher its unique demographics.

Supplementary Information

The online version contains supplementary material available at 10.1186/s40246-025-00822-w.

Keywords: Indian Sindhi, NWI, Haplotype-based, IBD, Demographic modelling

Introduction

South Asia is considered one of the most diverse regions in terms of linguistics, culture, and genetics [1, 2]. Among the South Asian countries, India and Pakistan are lands of remarkable diversity, with many different cultures, languages, traditions and beliefs [3]. Archaeological finds from the Palaeolithic and Mesolithic periods in India and Pakistan represent crucial phases in human prehistory characterized by significant migrations, technological advances, and cultural developments [4–6]. The archaeological evidence from Palaeolithic sites such as Soan Valley (Pakistan) and Bhimbetka (India), and from Mesolithic sites such as Bagor (India) and Mehrgarh (Pakistan), provides valuable insights into the lives and movements of early human populations [1, 7, 8]. The region encompassing present-day India and Pakistan is home to some of the oldest and most advanced civilizations in human history [9–14]. These geographical regions play an important role in the patterns of human migration, which led to the exchange of culture and language among different ethnic populations [5, 15].

One of the largest and fastest human migrations was observed during the partition of India in 1947 [16, 17]. This mass movement of people was not only a population shift but also entailed an ethnic and religious reconfiguration of the region, with millions forced to leave their ancestral homes [18, 19]. Among the various affected populations, the Sindhis living in Pakistan’s Sindh province were heavily affected [20]. Sindh province hosts Mohenjo-Daro, one of the ancient archaeological sites (3rd millennium BC) of the Indus Valley Civilization (IVC) [21]. During Partition, a large number (about a million) of non-Muslim Sindhis migrated to Gujarat, Rajasthan, and Punjab [22, 23]. They speak the Sindhi language, which belongs to the Indo-European language family. Over time, the language underwent various transitions under the influence of Prakrit, Arabic, Persian and other languages brought by different traders and rulers [24, 25]. From its ancient roots in the region of the Indus Valley Civilization to its present status, Sindhi has absorbed and integrated influences from various languages and cultures, making it a unique and resilient language [26].

The genetic makeup of the Sindhi population also reflects historical migrations and interactions with neighbouring regions [27]. Genetic studies on the maternal markers of the Sindhi population show a wide range of mtDNA haplogroups, indicating diverse maternal ancestry [28]. Many of the mtDNA haplogroups in Sindhis, such as M, R and U, are ancient and have deep roots in the South Asian subcontinent [29]. In contrast, haplogroups such as W and HV indicate genetic links with populations from West and Central Asia [28, 30]. Genetic markers shared with North Indian (Indo-European) and South Indian (Dravidian) populations indicate a history of gene flow between these groups and the Sindhis [31]. The Arab conquest of Sindh in the 8th century introduced new maternal lineages into the Sindhi gene pool, reflected in the presence of West Eurasian haplogroups [28, 30, 32].

The paternal ancestry of the Sindhi population, like their maternal ancestry, reflects a rich and diverse history marked by ancient migrations, invasions and cultural interactions [33, 34]. The common haplogroups among Sindhis include R1a, J2, J1, L and H, which are widespread in South Asia and adjacent regions [34–37]. R1a in Sindhis indicates the influence of Indo-European migrations into the subcontinent about 3,500 years ago, whereas J2 is associated with agricultural communities in the Fertile Crescent and ancient Mesopotamia. Haplogroup L is widespread in the Indian subcontinent and the Middle East [35, 38]. This haplogroup is widespread in South Asia and is prevalent in the Dravidian populations. Haplogroup J1, which is typical of the Arabian Peninsula, is less common but still notable among Sindhis [36, 38].

Studies on autosomal STR markers also suggest a connection between Sindhis and other populations of Pakistan and Northwest India [37, 39]. Other genome-wide studies have also shown a relationship between northwest Indian populations, such as Gujjar and Ror, and the Sindhi population, indicating that they share genetic ancestry with Indo-European and Middle Eastern populations [40]. These connections likely stem from trade, invasions, and other forms of interaction over millennia. An earlier study pointed out the connection between Northwest India (NWI) and the Southwest coast of India [41] through much earlier migrations. However, very limited genetic information is available about Sindhis living in other parts of India, particularly on the West coast of India. In the present study, we report for the first time genotype data of the Sindhi settlement from Thane in Maharashtra, in the West coastal region of India. This study examined in detail the common ancestry, assimilation and past migration history of the Sindhis to South India.

Methods

For this project, we followed approved guidelines and protocols and obtained approval from the Institutional Ethics Committee of CSIR-CCMB, Hyderabad, India. Blood samples were collected from 13 healthy Sindhi individuals (all male) from Thane in Maharashtra, India, after informed written consent was obtained. DNA was isolated using the standard phenol-chloroform method and used for subsequent genetic analysis. The limited sample size mainly reflects the small size of the study population and constraints on sample access (most families did not consent to sampling); samples were collected from unrelated families in a small area. All Indian Sindhi samples (n = 13) were genotyped with the Illumina HumanOmniExpress 24 v2 array kit using the manufacturer’s protocol for a total of 642,824 genome-wide single nucleotide polymorphisms (SNPs). The dataset was merged with the published datasets of contemporary populations from the HGDP [42] and GenomeAsia panel [43], and quality filtering was applied in Plink 1.9 [44] to retain only markers on the 22 autosomes with a genotyping call rate > 99% and minor allele frequency > 1% (511,668 SNPs). Kinship-based filtering was performed by removing first- and second-degree relatives using the KING-robust function implemented in Plink2 [44, 45]. After all filtering, the final merged dataset included 866 modern individuals genotyped at 511,828 SNPs. To minimize the effect of background linkage disequilibrium (LD) in principal component analysis (PCA) and ADMIXTURE-like analyses [46], markers were further pruned by removing SNPs in strong LD (r2 > 0.4, window of 200 SNPs, step size of 25 SNPs) using Plink 1.9 [44]. The LD-pruned data included 365,621 autosomal SNPs. For all subsequent analyses except PCA and ADMIXTURE, the full SNP set of 511,828 sites was used.
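The LD-pruning step can be illustrated with a minimal sketch. It is a simplified greedy procedure, not PLINK's exact algorithm: within each window it drops the later SNP of any pair whose squared correlation exceeds the threshold, then slides the window forward. The function names and the toy genotype vectors are hypothetical; only the parameter values (window 200, step 25, r2 > 0.4) come from the text.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return cov * cov / (vx * vy)


def ld_prune(genotypes, window=200, step=25, r2_max=0.4):
    """Return indices of SNPs kept after greedy window-based pruning.

    genotypes: list of SNPs, each a list of per-individual allele counts.
    Simplified illustration only; PLINK's own tie-breaking rules differ.
    """
    n = len(genotypes)
    keep = [True] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # drop the later SNP of a linked pair
        start += step
    return [i for i in range(n) if keep[i]]


# Toy data (hypothetical): snp_b duplicates snp_a, snp_c is uncorrelated with it.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]
snp_c = [1, 0, 1, 2, 1, 1]
kept = ld_prune([snp_a, snp_b, snp_c], window=3, step=1)
print(kept)
```

In this toy case the duplicated SNP is removed and the uncorrelated one is retained, which is the behaviour that reduces the 511,828-SNP set to a smaller, approximately independent marker set for PCA and ADMIXTURE.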

PCA was performed on the merged dataset of modern Eurasians using the SmartPCA package implemented in EIGENSOFT 7.2.1 [47] with default settings. The first two components were used to infer genetic variability. The model-based clustering algorithm ADMIXTURE [46] was used to infer ancestral genomic components in the Indian Sindhi population. Cross-validation was performed 25 times for 11 values of the number of ancestral clusters (K = 2 to K = 12) (Fig. S1). The lowest CV error was obtained at K = 6, which was used for downstream analysis. The Weir and Cockerham (1984) Fst was calculated between Indian Sindhi and modern populations from South Asia using the package SNPRelate [48]. The qp3Pop implementation of the ADMIXTOOLS package [49] was used to calculate the outgroup F3 statistics and also the admixture F3 statistics. To infer allele sharing between modern Eurasians and the Indian Sindhi population, the F3 statistic was used in the form F3 (Mbuti; SND, X), where SND is Indian Sindhi and X is any modern West Eurasian or South Asian population.
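To make the form F3 (Mbuti; SND, X) concrete, the following is a minimal sketch of the basic F3 estimator from population allele frequencies. It omits the finite-sample bias correction and the block jackknife standard errors that qp3Pop applies, and all allele frequencies shown are made-up toy values, not data from this study.

```python
def f3(outgroup, pop_a, pop_b):
    """Naive F3(outgroup; pop_a, pop_b): mean of (o - a) * (o - b) over SNPs.

    Each argument is a list of allele frequencies at the same SNPs.
    Larger values mean more drift shared by pop_a and pop_b since
    their split from the outgroup.
    """
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)


# Hypothetical allele frequencies at three SNPs.
mbuti = [0.1, 0.9, 0.5]   # outgroup
snd = [0.5, 0.4, 0.9]     # target population
close = [0.5, 0.5, 0.8]   # population sharing much drift with the target
distant = [0.2, 0.8, 0.5]  # population sharing little drift with the target

print(f3(mbuti, snd, close), f3(mbuti, snd, distant))
```

The same estimator with the target placed in the first position, F3 (target; source1, source2), gives the admixture F3 test, where a significantly negative value indicates that the target is admixed between groups related to the two sources.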

The haplotype-based approach implemented in CHROMOPAINTER [50] and fineSTRUCTURE [50] was used to infer a fine-scale co-ancestry matrix and population clustering, respectively, using all reference groups included in the dataset merged with the HGDP and GenomeAsia panel as source groups. The data were first phased with SHAPEIT5 [51] using default parameters, followed by a CHROMOPAINTER run to derive the co-ancestry matrix. This run first performed 10 expectation-maximization (EM) iterations on 5 randomly selected chromosomes with a subset of individuals to derive the global mutation rate (µ) and switch rate (Ne) parameters. The main algorithm was then run on 22 chromosomes from all individuals to derive the co-ancestry matrix. This matrix was used by fineSTRUCTURE to infer clustering using a probabilistic model by applying the Markov Chain Monte Carlo (MCMC) method and then deriving a hierarchical tree by merging all clusters with the least change in posterior probability. The run used 500,000 burn-in iterations and 5,000,000 subsequent iterations, with results saved every 10,000 iterations. Estimates of admixture date and best admixture models were derived with fastGlobeTrotter [52] using CHROMOPAINTER chunk-length files. Local ancestry-based admixture modelling was performed using the MultiWaver2 tool [53], and local ancestry was inferred using RFMix2 [54].

For identity by descent (IBD) analysis, we performed haplotype inference (phasing) using three independent runs of Beagle 5.4 [55]. IBD segments were determined separately from the phased data of each of the three runs using Refined IBD [56]; segments from all three runs were then combined, and the combined segments were merged using the Merge IBD Segments tool. The IBD sharing matrix was then computed using a custom script in R [57].
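The idea of combining segments from several phasing runs and merging them can be sketched as an interval union for one pair of individuals on one chromosome. This is a simplified illustration: the actual Merge IBD Segments tool also checks genotype discordance in the gaps it bridges, which is not modelled here. The function name, the `max_gap` parameter and the coordinates are hypothetical.

```python
def merge_segments(segments, max_gap=0):
    """Merge overlapping or nearly adjacent (start, end) segments.

    segments: (start, end) positions pooled from all phasing runs for a
    single pair of individuals on a single chromosome.
    max_gap: largest gap (same units as positions) that is still bridged.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + max_gap:
            # Extends or overlaps the previous segment: grow it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]


# Hypothetical segments pooled from three runs.
pooled = [(100, 200), (150, 300), (500, 600)]
print(merge_segments(pooled))
print(merge_segments(pooled, max_gap=250))
```

Pooling across runs before merging means that a segment broken by a phasing error in one run can be recovered if another run phases that region consistently.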

To derive the best-fitting demographic model and model parameters, we used the parameter optimization method implemented in Moments [58]. For Indian Sindhi, we used preliminary models based on hypotheses derived from the fastGlobeTrotter [52] admixture models of Indian Sindhi groups: either a much earlier Dai-like admixture or, alternatively, a more recent pulse in India. For model construction, we used the Python package Demes [59]. Parameter files were created based on the respective Demes models. Two alternative models were used to compare the demographic scenarios of Indian Sindhi groups (Supplementary Fig S4 & S5). The site frequency spectrum was calculated from empirical data in VCF format as well as from Demes model specifications using Moments. Model parameter optimizations were performed with 500 iterations and the lbfgsb method. Confidence intervals for derived parameters were calculated using the moments.Demes.Inference.uncerts function of Moments [58].
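The site frequency spectrum that is compared between data and model can be illustrated with a minimal sketch for a single population. It does not use Moments itself; it only shows what the spectrum counts. It assumes diploid genotypes coded as alternate-allele counts with no missing data, and it treats the alternate allele as derived, which is an assumption of this sketch rather than a detail given in the study.

```python
def site_frequency_spectrum(genotypes):
    """Unfolded SFS for one population.

    genotypes: list of sites, each a list of per-individual allele counts
    (0, 1 or 2 copies of the alternate allele).
    Returns a list where entry k is the number of sites at which the
    alternate allele is seen on exactly k of the 2n sampled chromosomes.
    """
    n_chrom = 2 * len(genotypes[0])
    sfs = [0] * (n_chrom + 1)
    for site in genotypes:
        sfs[sum(site)] += 1
    return sfs


# Hypothetical genotypes: five sites typed in two diploid individuals.
toy_sites = [[0, 1], [1, 1], [2, 2], [0, 0], [0, 1]]
toy_sfs = site_frequency_spectrum(toy_sites)
print(toy_sfs)
```

Model fitting then searches for the demographic parameters under which the expected spectrum most closely matches the observed one, and the log-likelihood of that match is what is compared between competing models.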

Results

Genetic structure in Indian Sindhi

The PCA was performed with the merged modern Eurasian dataset from the HGDP and GenomeAsia panel (whole genome sequencing dataset). The PCA axes were clearly differentiated into West Eurasian (Europe and Mideast) and East Eurasian (East Asia and Southeast Asia), and the South Asian cline lay along the West Eurasian axis, ending with Austroasiatic and Dravidian speaking groups at the convergence point of the East and West Eurasian axes (Fig. 1b). The South Asian cline incorporated Indian Sindhi (black), Indo-Europeans (blue), Dravidians (red), Austroasiatic speakers (khaki), Tibeto-Burman (orange), Pakistani groups (forest-green), and North West Indians (light-green) (Fig. 1b). Interestingly, the Indian Sindhi (black dots) clustered near one extreme of the South Asian cline (but away from the main cline), with most of the Pakistan and Northwest Indian (NWI) groups that have the highest ANI ancestry. Their clustering pattern was shifted towards the Burusho from Pakistan, that is, towards the PCA axis occupied by East/Southeast Asian groups (Fig. 1b). Two of the Indian Sindhi individuals were shifted towards Indian Indo-Europeans, and one of them clustered with individuals from the Konkani population.

Fig. 1.

Sampling location and population structure of Indian Sindhi. A Sampling location (Thane) in the Indian state of Maharashtra on the West coast, B PCA biplot of Indian Sindhi (SND) with modern Eurasians, C ADMIXTURE barplot with modern Eurasians (red ellipse shows the East Asian component in khaki), D Outgroup F3 statistics with modern Eurasians, showing highest allele sharing of SND with NWI (NWI in dark green)

In the unsupervised model-based clustering with ADMIXTURE [46] using K = 6 (Fig. 1C), Indian Sindhi showed a distinct East Asian component (khaki) that was absent from most of the Pakistan/NWI groups but was also observed in Burusho and in a few Pathan individuals. This ancestral component is maximised among East Asians, Southeast Asians, Indian Tibeto-Burmans and Indian Austroasiatic groups. Besides this, both Indian Sindhi and Burusho have typical South Asian Indo-European ancestral components (blue, forest green and orange) (Fig. 1C).

The Weir and Cockerham (1984) Fst calculation between Indian Sindhi and modern South Asian reference populations showed similar values between Indian Sindhi from Maharashtra and Pathan (0.43), Khatri (0.43) and Pakistani Sindhi (0.43) and also with Indian populations like Saryuparin Brahmin (0.43), Rajput (0.43) and Iyer (0.43). However, surprisingly the values were only slightly higher with Burusho (0.44) and Gujjar (0.44), Konkani (0.44) and Balochi (0.44) (Supplementary Fig. 6).

Allele sharing between Indian Sindhi and modern Eurasians

Outgroup F3 statistics were calculated for Indian Sindhi with the same merged dataset of the HGDP and GenomeAsia whole genome panel, with Onge as the outgroup population. In the outgroup F3 statistics, the Indian Sindhi showed the highest allele sharing with the Khatri population (F3 = 0.02648; z = 133.8107) from NWI, followed by Kalash (F3 = 0.02639; z = 129.6765) from Pakistan and Gujjar (F3 = 0.02621; z = 133.5288), again from NWI (Supplementary Table S1). The other top hits were mostly Indian Indo-European populations such as Rabadi, Rajput and Brahmin_UP. In the formal test of admixture using admixture F3 statistics, the top hit with a significant z-score (-6.524633905) was obtained with the Konkani population from Maharashtra, India (Supplementary Table S4).

Fine scale population structure and IBD sharing with HGDP and genome Asia panel

The fineSTRUCTURE [50] tree placed all the populations in two major clades: one included East/Southeast Asians, Andamanese, Indian Tibeto-Burman, Indian Austroasiatic and Dravidian groups, while the other included Europe/MidEast, Pakistan/NWI and Indian Indo-European groups (Fig. 2a). In this second clade, Indian Sindhi formed a separate minor branch together with the Konkani group from west coast India. Most of the Pakistani Sindhi individuals shared clades with Pakistan/NWI populations (Fig. 2a).

Fig. 2.

Haplotype and IBD sharing statistics. A fineSTRUCTURE MCMC tree for Indian Sindhi (SND) with all modern Eurasians, B Inter-population IBD sharing matrix adjusted for population size, C Intra-population IBD sharing within group (LOD > 10)

In the cross-population IBD sharing matrix adjusted for population size, Indian Sindhi showed the highest IBD sharing with the Konkani population from West coast (Maharashtra) India, followed by the Khatri population from Northwest India (NWI) (Fig. 2b). In the intra-population IBD sharing, by contrast, the segment length distribution was smaller than in most of the modern Eurasians and almost comparable to that of Pakistani Sindhi (Fig. 2c). In the within-population IBD sharing, we excluded segments with a LOD score below 10 in all cases.
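One common way to adjust a cross-population IBD matrix for population size is to divide the total shared length by the number of possible individual pairs between the two populations. The sketch below assumes that normalisation; the study's own R script is not shown in the text and may differ. All identifiers, population labels and segment lengths are hypothetical.

```python
from collections import defaultdict


def ibd_sharing_matrix(segments, pop_of, pop_sizes):
    """Mean IBD length shared per pair of individuals between populations.

    segments: iterable of (individual_1, individual_2, length_cM).
    pop_of: dict mapping individual ID to population label.
    pop_sizes: dict mapping population label to sample size.
    Returns {(pop_a, pop_b): total length / (n_a * n_b)} for pop_a != pop_b,
    with each key sorted alphabetically.
    """
    totals = defaultdict(float)
    for ind1, ind2, length_cm in segments:
        pair = tuple(sorted((pop_of[ind1], pop_of[ind2])))
        if pair[0] != pair[1]:
            totals[pair] += length_cm
    return {
        pair: total / (pop_sizes[pair[0]] * pop_sizes[pair[1]])
        for pair, total in totals.items()
    }


# Hypothetical toy input.
toy_segments = [("s1", "k1", 4.0), ("s2", "k1", 2.0), ("s1", "p1", 3.0)]
toy_pops = {"s1": "SND", "s2": "SND", "k1": "KON", "p1": "PSN"}
toy_sizes = {"SND": 2, "KON": 1, "PSN": 1}
matrix = ibd_sharing_matrix(toy_segments, toy_pops, toy_sizes)
print(matrix)
```

Without this kind of normalisation, populations with more sampled individuals would appear to share more IBD simply because they contribute more pairs.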

Admixture modelling and dating

The admixture modelling was performed with the phased merged dataset of Indian Sindhi and the HGDP and GenomeAsia whole genome panel. In the fastGlobeTrotter [52] estimation of the best-fitting model and date of admixture, the best-fit sources for Indian Sindhi were Khatri from Northwest India (NWI) and Dhurwa (an Austronesian proxy) (Fig. 3b). The best-fit date of admixture was approximately 48 GBP (Supplementary Table S2). In the local ancestry-based admixture modelling with MultiWaver2 [53], we observed a similar date for the admixture event from East Asia, corresponding to ~ 50 GBP (Supplementary Table S5).

Fig. 3.

Admixture modelling and dates of Indian Sindhi. A ALDER LD decay curve for Burusho and SND (Indian Sindhi), B fastGlobeTrotter co-ancestry curve fitting for SND (Indian Sindhi), C Best fit demographic model with admixture date from Dai in SND (Indian Sindhi)

Demographic history and demographic parameter Estimation

We proposed two alternative demographic models for model competition for the Indian Sindhi population. The first model was based on our admixture modelling results of Dai admixture, which is evident in model-based ADMIXTURE as well as fastGlobeTrotter [52] modelling; the second model included possible admixture with Indian (possibly Austroasiatic) groups as the source of the Dai-like component. To replicate the admixture history of Indian Sindhi in the prior model, we utilized Demes [59]. We used the inference optimization function of Moments [58] to arrive at the best-likelihood model and parameters. Of the two tested models, Model1 hypothesizes a much earlier pulse of admixture from a Dai-like source, similar to the event in the Burusho in Pakistan (probably through the Mongolian invasion), while Model2 hypothesizes admixture between ANI and ASI followed by a recent admixture event with a Dai-like source from Indian Austroasiatic groups to form the Indian Sindhi group. We selected Model1 (log-likelihood: -198268.97404874387) (Fig. 3c) over Model2 (log-likelihood: -206243.77405798397) based on their likelihood scores; this choice corroborated the admixture model inferred from the fastGlobeTrotter [52] run. This admixture with Dai was dated to approximately 37.4 GBP (95% C.I. 29.003–45.8005), which is comparable to the date estimate in the Burusho population from the fastGlobeTrotter run, and the upper limit of the 95% C.I. approaches the Indian Sindhi fastGlobeTrotter estimate (48 GBP) (Supplementary Table S2).
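The model comparison and the reported interval can be checked with simple arithmetic. The sketch below uses only the log-likelihoods and interval bounds quoted above. Reading the interval as a symmetric normal (Wald-type) interval of the form estimate ± 1.96 × standard error is an assumption of this sketch, and the variable names are hypothetical.

```python
# Log-likelihoods of the two competing models, as reported in the text.
log_likelihoods = {
    "Model1": -198268.97404874387,
    "Model2": -206243.77405798397,
}

# The preferred model is the one with the highest (least negative) value.
best_model = max(log_likelihoods, key=log_likelihoods.get)
delta_ll = log_likelihoods["Model1"] - log_likelihoods["Model2"]

# Reported 95% C.I. bounds for the Dai-like admixture time (in GBP).
ci_low, ci_high = 29.003, 45.8005
midpoint = (ci_low + ci_high) / 2
# Assumption: symmetric interval, so half-width = 1.96 * standard error.
implied_se = (ci_high - ci_low) / 2 / 1.96

print(best_model, round(delta_ll, 2), round(midpoint, 2), round(implied_se, 2))
```

The midpoint of the interval coincides with the 37.4 GBP point estimate, which is what a symmetric interval built from a standard error would give.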

The parameter estimates from the best-fitting model of Indian Sindhi suggest that there was no significant reduction in effective population size in this group (NA = 5200; NF = 3260), with noticeable migration rates between ANI and Indian Sindhi (M_French_GroupA = 0.00183) (Supplementary Figs. 4 & 5). The change in effective population size was more prominent in the case of ASI (Paniya as a proxy; Ne = 61000 and NeF = 854) (Fig. 3c).

Discussion

The high level of population diversity and stratification in India is due to multiple waves of migration into the region over millennia, followed by mixing and cultural assimilation. Genetic evidence of later migrations and admixture events is well documented, particularly on the west coast of India, for example among Indian Parsees [60], Cochin Jews [61], and Roman Catholics [62], and in the migration and local assimilation of warrior clans from Northwest India to the Southwest coast [41]. In the present study, we have carried out, for the first time, a detailed investigation of the genetic architecture of a small, isolated and socio-culturally unique Indian Sindhi settlement from Thane in Maharashtra. Our analysis suggests that this Indian Sindhi settlement in Thane represents a unique group, distinct from the local population in India as well as from the Pakistani Sindhi population. Their genetic structure surprisingly indicates a closer affinity to a Burusho-like population from Pakistan, due to the presence of an additional East/Southeast Asian genetic component. This Indian Sindhi subgroup did not form a close cluster with the Pakistani Sindhi, reflecting their contrast with that group. This divergence between the Indian Sindhi group from Maharashtra and Pakistani Sindhi might have emerged as a consequence of additional admixture with an East Asian-like source group (probably a Hazara-like group in Pakistan) prior to their migration to India. Admixture modelling and demographic modelling support the genetic distinctness of the Indian Sindhi from Thane, Maharashtra, which cannot be attributed solely to limited sampling. Allele sharing statistics (outgroup F3) using Mbuti as the outgroup and various modern Eurasians as the test group indicate greater shared drift of Indian Sindhi with populations from Northwest India (NWI) and Pakistan. This reflects their long-standing shared affinity with Pakistan/NWI, which is also seen in Pakistani Sindhi.
This pattern of genetic affinity between populations from Pakistan (Sindhi, Pathan, etc.) and Northwest Indians has been pointed out in earlier genetic studies [37, 39, 40]. Indian Sindhi have a well-documented migration history from Pakistan/NWI, which correlates well with our genetic findings.

Of note, the genetic architecture of these Indian Sindhis from Thane in the context of the local population also draws attention, as they share haplotypes with the local Konkani groups from West coast India. This kind of haplotype sharing often reflects very recent gene flow, and this is also evident in the IBD chunk sharing. They share larger IBD segments with the Konkani population and comparatively short segments with the Khatri population from Northwest India (NWI). The former represents much more recent gene flow, while the latter represents the earlier shared genetic history of Indian Sindhi with NWI populations. Pakistani Sindhi also share their genetic history with Pakistan/NWI populations based on earlier studies of Y-STRs (Anwar et al. 2019; Perveen et al. 2017). Further, Northwest Indian populations like Ror and Gujjar showed major affinity with Pakistani and Northwest Indian populations [40]. Therefore, the PCA-based clustering of Indian Sindhi among Pakistani populations as well as haplotype sharing with Khatri largely reflects their long-term shared genetic history with both Pakistani and northwest Indian populations.

At this point, it is important to discuss the unique East Asian (Dai-like) component that clearly distinguishes the Indian Sindhi settlement in Thane from Pakistani Sindhi. Our haplotype-based admixture modelling suggests that this minor component (~ 10%) was introduced at 48 GBP, with the best-matched surrogates being Khatri (NWI) and Dhurwa (Austroasiatic group). The latter group represents a proxy for an Austronesian (Dai-like) surrogate. Estimates of admixture date using the linkage disequilibrium-based method were also similar (55.47 ± 19.46 GBP). Furthermore, these date estimates were well supported in our demographic modelling, with the admixture timing found to be 37.4 GBP from the Dai-like source group in the best-fit model. The second model, which suggests more recent gene flow from a group carrying the Dai-like component after Indian Sindhi migration to India, was not supported in our modelling. Thus, the successful model indicates much earlier admixture of this component, and the time frame overlaps with the Mongolian (Genghis Khan) invasion in Pakistan, although other possible sources cannot be excluded and require further detailed investigation. Apart from Indian Sindhi and Burusho, Pathan also show a minor presence of this additional East Asian component. The Late Bronze Age Steppe populations and populations associated with many Iron Age migrations (Saka, Hun, Kushan, etc.) had additional East Asian genetic components [63]. The majority of the populations from Pakistan (Balochi, Pathan, Burusho, Sindhi and Hazara) are in geographical proximity and in a crucial transition zone in relation to prehistoric and historical migrations. Hence there is a possibility that such a component was acquired during any of these migration waves, which requires more detailed genetic investigation. Alternatively, this genetic exchange (of the East Asian component) could have come from geographically proximal Hazara-like groups in Pakistan.
Although the ALDER-based estimate was comparatively much higher (55.47 GBP), the fastGlobeTrotter date estimate was near the upper limit of the 95% confidence interval of the demographic model-based admixture date estimate (95% C.I. 29.0–45.8 GBP). The higher ALDER estimate may be due to other admixture events or noise in ALDER admixture date estimates, which is also evident in the larger standard error of the estimate (± 19.46). Furthermore, parameter estimates in our Indian Sindhi demographic modelling revealed no evidence of a significant founder event or population bottleneck in this group, and there was no significant change in the effective population size.
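To relate dates in generations to a historical time frame, the estimates can be converted to years before present. The following is a rough sketch only: the generation times of 25 and 29 years are assumptions introduced here for illustration and are not values stated in the study, and the function name is hypothetical.

```python
def generations_to_years(generations, years_per_generation):
    """Convert a date in generations before present to years before present."""
    return generations * years_per_generation


# Date estimates quoted in the text (generations before present).
estimates_gbp = {
    "fastGlobeTrotter": 48.0,
    "demographic model, point estimate": 37.4,
    "demographic model, lower 95% limit": 29.003,
    "demographic model, upper 95% limit": 45.8005,
}

# Assumed generation times in years (illustrative, not from the study).
SHORT_GENERATION = 25
LONG_GENERATION = 29

for label, gbp in estimates_gbp.items():
    low = generations_to_years(gbp, SHORT_GENERATION)
    high = generations_to_years(gbp, LONG_GENERATION)
    print(f"{label}: about {low:.0f} to {high:.0f} years before present")
```

Because the result scales linearly with the assumed generation time, any historical interpretation of these dates is only as firm as that assumption.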

In conclusion, this study presents the first genetic evidence for the ancient origin and unique genetic architecture of the Sindhi settlement from Thane in Maharashtra, West coast India. The study is the first to report evidence of East Asian admixture in this group that occurred much earlier than their migration to the West coast of India. Their genetic assimilation with the local majority population (Konkani) reflects their predominantly exogamous nature, which is also reflected in lower IBD sharing within the population. Given the limited sample size of the Indian Sindhi from West coast India in the present study, future efforts with a much larger sample size, incorporating Sindhi populations from most parts of India along with uniparental and whole genome analysis, will reveal more aspects of their population history and population stratification. Furthermore, population genetic studies of this kind on different groups will shed light on the heterogeneity of many similar migrations to India.

Acknowledgements

We thank all the study participants, who volunteered in this study. KT was supported by CSIR-Bhatnagar Fellowship [GDA No 24 (P90807)] Council of Scientific and Industrial Research, Ministry of Science and Technology, Government of India.

Author contributions

K.T. and L.K. conceptualized the study and recruited the study samples. L.K. and K.T. devised the methodology. L.K. and S.N. genotyped the genome-wide SNP markers. L.K. performed the data analyses. K.T., S.K. and L.K. wrote the first draft of the manuscript. K.T. finalized the report. K.T. and S.K. provided feedback on the report. All authors contributed to and have approved the final manuscript.

Data availability

The data supporting the findings of this study are available upon request.

Declarations

Ethics approval and consent to participate

Informed written consent was obtained from each participant. The project was carried out in accordance with the guidelines approved by the Institutional Ethical Committee of the Centre for Cellular and Molecular Biology, Hyderabad, India. All procedures were followed according to the recommendations of the Helsinki Declaration.

Informed consent

Informed written consent was obtained from all the participants involved in the study.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Lomous Kumar, Email: lomousmishra@gmail.com.

Kumarasamy Thangaraj, Email: thangs@ccmb.res.in.

References

  • 1.Chakrabarty DK, India. An archaeological history: palaeolithic beginnings to early historic foundations. Oxford University Press; 2009.
  • 2.Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461(7263):489–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Khan FD. Preserving the heritage: a case study of handicrafts of Sindh (Pakistan). 2011.
  • 4.Jacobson J. Recent developments in South Asian prehistory and protohistory. Annu Rev Anthropol. 1979:467–502.
  • 5.James HA, Petraglia M. Modern human origins and the evolution of behavior in the later pleistocene record of South Asia. Curr Anthropol. 2005;46(S5):S3–27. [Google Scholar]
  • 6.Blinkhorn J, Petraglia MD. Environments and cultural change in the Indian subcontinent: implications for the dispersal of homo sapiens in the late pleistocene. Curr Anthropol. 2017;58(S17):S463–79. [Google Scholar]
  • 7.Allchin B, Allchin R. The rise of civilization in India and Pakistan. Cambridge University Press; 1982.
  • 8.Kennedy KA. Prehistoric skeletal record of man in South Asia. Annu Rev Anthropol. 1980:391–432.
  • 9.Ahmad N, Rehman AU. The emergence of Gandhara civilization: A politico-historical discourse. J Humanit Social Manage Sci (JHSMS). 2021;2(2):42–54. [Google Scholar]
  • 10.Ahmed M. Ancient Pakistan-an archaeological history: volume III: Harappan Civilization-the material culture. Amazon; 2014.
  • 11.Dutt RC. A history of civilization in ancient india: Vedic and epic ages: Thacker. Spink and Company; 1889.
  • 12.Gupta GS. India: from indus Valley civlization to Mauryas. Concept Publishing Company; 1999.
  • 13.Rose D, Allen R. Ancient civilizations of the world. Scientific e-Resources; 2018.
  • 14.Thapar R. The Mauryan empire in early India. Hist Res. 2006;79(205):287–305. [Google Scholar]
  • 15.Gadgil M, Joshi N, Manoharan S, Patil S, Prasad US. Peopling of India. The Indian human heritage. 1998:100 – 29.
  • 16.Zamindar VF-Y. The long partition and the making of modern South asia: refugees, boundaries, histories. Columbia University; 2007.
  • 17.Bharadwaj P, Khwaja A, Mian A. The big march: migratory flows after the partition of India. Economic Political Wkly. 2008:39–49.
  • 18.Amrith SS. Migration and diaspora in modern Asia. Cambridge University Press; 2011.
  • 19.Leaning J, Bhadada S. The 1947 partition of British India: forced migration and its reverberations. SAGE Publishing India; 2022.
  • 20.Kumar P, Kothari R. Sindh, 1947 and beyond. Taylor & Francis; 2016. pp. 773–89.
  • 21.Parpola A. The roots of Hinduism: the early Aryans and the Indus civilization. USA: Oxford University Press; 2015.
  • 22.Kothari R. Unbordered memories: Sindhi stories of partition. Penguin Random House India Private Limited; 2018.
  • 23.Boivin M, Lalchandani T. Everyday Religiosity among the Hindu Sindhis of India: Sindhi Identity and the Religious Market in the Era of Social Networks. 2024:153–72.
  • 24.Wadhwani Y. The origin of the Sindhi language. Bull Deccan Coll Res Inst. 1981:192–201.
  • 25.Rahman T. Language and politics in a Pakistan province: the Sindhi language movement. Asian Surv. 1995;35(11):1005–16.
  • 26.Iyengar AV, Ndhlovu F, Schneider C. Sindhī multiscriptality, past and present: A sociolinguistic investigation into community acceptance. 2017.
  • 27.Ahmad M, Sinha A, Ghosh S, Kumar V, Davila S, Yajnik CS et al. Inclusion of population-specific reference panel from India to the 1000 genomes phase 3 panel improves imputation accuracy. Sci Rep. 2017;7(1).
  • 28.Bhatti S, Aslamkhan M, Attimonelli M, Abbas S, Aydin HH. Mitochondrial DNA variation in the Sindh population of Pakistan. Australian J Forensic Sci. 2017;49(2):201–16.
  • 29.Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, et al. Where west meets east: the complex mtDNA landscape of the Southwest and Central Asian corridor. Am J Hum Genet. 2004;74(5):827–45.
  • 30.Yasmin M, Rakha A, Noreen S, Salahuddin Z. Mitochondrial control region diversity in Sindhi ethnic group of Pakistan. Leg Med. 2017;26:11–3.
  • 31.Singh M, Sarkar A, Kumar D, Nandineni MR. The genetic affinities of Gujjar and Ladakhi populations of India. Sci Rep. 2020;10(1):2055.
  • 32.Bhatti S, Abbas S, Aslamkhan M, Attimonelli M, Trinidad MS, Aydin HH, et al. Genetic perspective of uniparental mitochondrial DNA landscape on the Punjabi population, Pakistan. Mitochondrial DNA Part A. 2018;29(5):714–26.
  • 33.Levi SC. The Indian diaspora in Central Asia and its trade, 1550–1900. 2001.
  • 34.Ikram MS, Mehmood T, Rakha A, Akhtar S, Khan MIM, Al-Qahtani WS et al. Genetic diversity and forensic application of Y-filer STRs in four major ethnic groups of Pakistan. BMC Genomics. 2022;23(1).
  • 35.Adnan A, Rakha A, Nazir S, Khan MF, Hadi S, Xuan J. Evaluation of 13 rapidly mutating Y-STRs in endogamous Punjabi and Sindhi ethnic groups from Pakistan. Int J Legal Med. 2019;133:799–802.
  • 36.Mehdi S, Qamar R, Ayub Q, Khaliq S, Mansoor A, Ismail M, et al. The origins of Pakistani populations: evidence from Y chromosome markers. Genomic diversity: applications in human population genetics. Springer; 1999. pp. 83–90.
  • 37.Perveen R, Shahid AA, Shafique M, Shahzad M, Husnain T. Genetic variations of 15 autosomal and 17 Y-STR markers in Sindhi population of Pakistan. Int J Legal Med. 2017;131:1239–40.
  • 38.Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, et al. Y-chromosomal DNA variation in Pakistan. Am J Hum Genet. 2002;70(5):1107–24.
  • 39.Anwar I, Hussain S, Rehman AU, Hussain M. Genetic variation among the major Pakistani populations based on 15 autosomal STR markers. Int J Legal Med. 2019;133:1037–8.
  • 40.Pathak AK, Kadian A, Kushniarevich A, Montinaro F, Mondal M, Ongaro L, et al. The genetic ancestry of modern Indus Valley populations from Northwest India. Am J Hum Genet. 2018;103(6):918–29.
  • 41.Kumar L, Chowdhari A, Sequeira JJ, Mustak MS, Banerjee M, Thangaraj K. Genetic affinities and adaptation of the South-West Coast populations of India. Genome Biol Evol. 2023;15(12):evad225.
  • 42.Bergstrom A, McCarthy SA, Hui RY, Almarri MA, Ayub Q, Danecek P, et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020;367(6484):1339–.
  • 43.GenomeAsia100K Consortium. The GenomeAsia 100K project enables genetic discoveries across Asia. Nature. 2019;576(7785):106–11.
  • 44.Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
  • 45.Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–73.
  • 46.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
  • 47.Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190.
  • 48.Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28(24):3326–8.
  • 49.Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192(3):1065–93.
  • 50.Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet. 2012;8(1):e1002453.
  • 51.Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK biobank. Nat Genet. 2023;55(7):1243–9.
  • 52.Wangkumhang P, Greenfield M, Hellenthal G. An efficient method to identify, date, and describe admixture events using haplotype information. Genome Res. 2022;32(8):1553–64.
  • 53.Ni X, Yuan K, Liu C, Feng Q, Tian L, Ma Z, et al. MultiWaver 2.0: modeling discrete and continuous gene flow to reconstruct complex population admixtures. Eur J Hum Genet. 2019;27(1):133–9.
  • 54.Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 2013;93(2):278–88.
  • 55.Browning BL, Zhou Y, Browning SR. A one-penny imputed genome from next-generation reference panels. Am J Hum Genet. 2018;103(3):338–48.
  • 56.Browning BL, Browning SR. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013;194(2):459–71.
  • 57.R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021.
  • 58.Jouganous J, Long W, Ragsdale AP, Gravel S. Inferring the joint demographic history of multiple populations: beyond the diffusion approximation. Genetics. 2017;206(3):1549–67.
  • 59.Gower G, Ragsdale AP, Bisschop G, Gutenkunst RN, Hartfield M, Noskova E, et al. Demes: a standard format for demographic models. Genetics. 2022;222(3):iyac131.
  • 60.Chaubey G, Ayub Q, Rai N, Prakash S, Mushrif-Tripathy V, Mezzavilla M, et al. Like sugar in milk: reconstructing the genetic history of the Parsi population. Genome Biol. 2017;18(1):110.
  • 61.Chaubey G, Singh M, Rai N, Kariappa M, Singh K, Singh A, et al. Genetic affinities of the Jewish populations of India. Sci Rep. 2016;6(1):19166.
  • 62.Kumar L, Farias K, Prakash S, Mishra A, Mustak MS, Rai N, et al. Dissecting the genetic history of the Roman Catholic populations of West Coast India. Hum Genet. 2021;140(10):1487–98.
  • 63.Narasimhan VM, Patterson N, Moorjani P, Rohland N, Bernardos R, Mallick S, et al. The formation of human populations in South and Central Asia. Science. 2019;365(6457):eaat7487.

Associated Data

Supplementary Materials

Supplementary Material 2. (28.4KB, xlsx)

Data Availability Statement

The data supporting the findings of this study are available upon request.


Articles from Human Genomics are provided here courtesy of BMC