Summary
The effect of consanguinity on identifying universal induced pluripotent stem cell (iPSC) donors, i.e., individuals homozygous for the major human leukocyte antigen (HLA) loci, is unknown. The discovery sample size was calculated in a consanguineous population using a method based on the inbreeding coefficient. The resulting estimate was orders of magnitude smaller than that obtained with the standard method.
Introduction
The relatively recent discovery that human somatic cells can be reprogrammed into induced pluripotent stem cells (iPSCs) is considered one of the most influential breakthroughs of the 21st century. Since iPSCs can differentiate into any cell type, they have the potential to serve as an unlimited resource for cell therapy and regenerative medicine (Cossu et al., 2018). There is a growing interest in utilizing iPSCs for transplantation to cure a myriad of disorders, such as Parkinson’s disease and age-related macular degeneration (Mandai et al., 2017; Schweitzer et al., 2020). Currently, immune rejection after transplantation is considered one of the biggest challenges facing regenerative medicine (Simpson et al., 2023). Although autologous iPSCs would bypass this hurdle, the prospect of creating autologous iPSCs for each individual is an unscalable and costly approach. Instead, a viable alternative would be the establishment of an “off-the-shelf” bank of universal iPSC donors.
Incompatibility of the major histocompatibility complex (MHC), determined by the human leukocyte antigen (HLA) genes, is the key player in immune rejection. Of these HLA loci, HLA-A, HLA-B, and HLA-DRB1 are the major determinants of matching (Gourraud et al., 2012). One approach to address this incompatibility problem is to genetically engineer the cells to delete the MHC itself, thus rendering them, in theory, universal donor iPSCs. This, however, has its own set of risks and complications including neoplasm and teratoma formation (Simpson et al., 2023). On the other hand, the use of iPSCs from donors who are homozygous for the three major HLA loci (triple-homozygous donors) represents a much safer and more practical alternative. Therefore, an iPSC bank containing triple-homozygous cell lines for the most frequent HLA haplotypes in the population can provide a largely risk-free, “off-the-shelf” solution (Taylor et al., 2005).
The establishment of such iPSC banks has been recognized as an appealing solution, and mathematical models have been developed to calculate the number of individuals who must be screened in order to identify the triple-homozygous donors for the banks (the discovery sample size) (Gourraud et al., 2012). In populations where this approach has been contemplated, the discovery sample size tends to be very large. For instance, a Brazilian study estimated that in order to identify the 559 triple-homozygous donors who would provide coverage for 95% of that population, one would need to screen nearly 4,000,000 individuals (de Oliveira et al., 2023). Surprisingly, the impact of consanguinity on discovery sample size estimates has yet to be explored, even though consanguinity is known to increase homozygosity in a population. One would therefore expect a highly consanguineous population to be enriched for triple-homozygous donors, such that the discovery sample size would be relatively small. Saudi Arabia is a highly consanguineous population with well-established HLA frequency data (Alfraih et al., 2021). This presented us with an opportunity to investigate the effect of consanguinity on discovery sample size estimates.
Predicted large iPSC bank size indicates high genetic diversity of the Saudi population
We found that in order to cover 95% of the Saudi population, 581 triple-homozygous iPSC lines are needed (Figure 1; Table S1). Table 1 displays the 30 most frequent haplotypes, their estimated coverage, and the corresponding discovery sample size. In addition, in order to cover 50%, 75%, and 85% of the population, 41, 140, and 251 triple-homozygous iPSC lines would be needed, respectively (Figure 1). This iPSC bank size is large even when compared to that of the highly genetically diverse Brazilian population: 95% coverage of the much larger and highly admixed Brazilian population requires 559 triple-homozygous iPSC lines (de Oliveira et al., 2023). The predicted iPSC bank size for the Saudi population is even more striking when compared to that of the Finnish population, which is also enriched for homozygosity but where fewer than 50 homozygous iPSC lines (two-locus HLA) are required to cover 95% of the population (Clancy et al., 2022).
Figure 1.
Graph representation of the relationship between the number of iPSC lines and the percentage of the target (Saudi) population covered.
The red points correspond to the 41, 140, 251, and 581 triple-homozygous iPSC lines needed to cover 50%, 75%, 85%, and 95% of the population, respectively.
Table 1.
The 30 most frequent HLA haplotypes in the Saudi population, their frequencies, estimated coverage, and the corresponding discovery sample size
| Haplotype | n | Frequency | Cumulative population coverage | Discovery sample size (IC method) | Discovery sample size (standard method) |
|---|---|---|---|---|---|
| A∗02∼B∗50∼DRB1∗07 | 3,257 | 0.035820445 | 0.070357785 | 1,163.2 | 779.4 |
| A∗02∼B∗51∼DRB1∗04 | 1,710 | 0.018817209 | 0.106290035 | 2,214.3 | 2,824.2 |
| A∗26∼B∗08∼DRB1∗03 | 1,256 | 0.01381056 | 0.13221127 | 3,017.0 | 5,243.0 |
| A∗23∼B∗50∼DRB1∗07 | 1,168 | 0.01284866 | 0.155984567 | 3,242.9 | 6,057.4 |
| A∗02∼B∗07∼DRB1∗15 | 1,073 | 0.011789139 | 0.177507021 | 3,534.3 | 7,195.1 |
| A∗01∼B∗41∼DRB1∗07 | 989 | 0.010885212 | 0.197132434 | 3,827.8 | 8,439.7 |
| A∗31∼B∗51∼DRB1∗13 | 897 | 0.009864418 | 0.214712733 | 4,223.9 | 10,276.8 |
| A∗24∼B∗08∼DRB1∗03 | 886 | 0.00973965 | 0.231879733 | 4,278.0 | 10,541.8 |
| A∗68∼B∗08∼DRB1∗03 | 789 | 0.008673389 | 0.247007651 | 4,804.0 | 13,293.0 |
| A∗33∼B∗14∼DRB1∗01 | 684 | 0.007517493 | 0.259997767 | 5,542.6 | 17,695.1 |
| A∗02∼B∗51∼DRB1∗13 | 631 | 0.006935427 | 0.271881844 | 6,007.8 | 20,790.0 |
| A∗02∼B∗51∼DRB1∗15 | 619 | 0.00678727 | 0.28341891 | 6,138.9 | 21,707.5 |
| A∗01∼B∗73∼DRB1∗10 | 582 | 0.006398224 | 0.294210309 | 6,512.2 | 24,427.6 |
| A∗02∼B∗07∼DRB1∗03 | 579 | 0.00635411 | 0.304846275 | 6,557.4 | 24,768.0 |
| A∗11∼B∗52∼DRB1∗15 | 564 | 0.006205572 | 0.315155666 | 6,714.4 | 25,967.9 |
| A∗68∼B∗50∼DRB1∗07 | 562 | 0.006184673 | 0.325353708 | 6,737.1 | 26,143.7 |
| A∗03∼B∗50∼DRB1∗07 | 555 | 0.006106359 | 0.335347563 | 6,823.5 | 26,818.6 |
| A∗33∼B∗58∼DRB1∗03 | 529 | 0.005815561 | 0.344796156 | 7,164.7 | 29,567.6 |
| A∗31∼B∗50∼DRB1∗07 | 512 | 0.005623392 | 0.353868204 | 7,409.5 | 31,623.0 |
| A∗30∼B∗42∼DRB1∗03 | 512 | 0.005620087 | 0.362871732 | 7,413.9 | 31,660.2 |
| A∗24∼B∗35∼DRB1∗11 | 489 | 0.005374934 | 0.371423421 | 7,752.0 | 34,614.2 |
| A∗31∼B∗15∼DRB1∗13 | 479 | 0.005264996 | 0.379744176 | 7,913.9 | 36,074.8 |
| A∗30∼B∗53∼DRB1∗13 | 457 | 0.005026816 | 0.387636778 | 8,288.9 | 39,574.4 |
| A∗24∼B∗15∼DRB1∗13 | 457 | 0.00502551 | 0.395476811 | 8,291.0 | 39,595.0 |
| A∗02∼B∗51∼DRB1∗16 | 430 | 0.004726393 | 0.402804117 | 8,815.7 | 44,765.2 |
| A∗68∼B∗07∼DRB1∗04 | 426 | 0.004685168 | 0.410023416 | 8,893.3 | 45,556.4 |
| A∗02∼B∗58∼DRB1∗03 | 423 | 0.004675942 | 0.417184728 | 8,910.9 | 45,736.4 |
| A∗30∼B∗13∼DRB1∗07 | 423 | 0.004654791 | 0.424270214 | 8,951.4 | 46,153.0 |
| A∗68∼B∗51∼DRB1∗11 | 414 | 0.004557838 | 0.431166128 | 9,141.8 | 48,137.4 |
| A∗01∼B∗52∼DRB1∗15 | 413 | 0.004550811 | 0.438009959 | 9,155.9 | 48,286.1 |
Complete list can be found in Table S1. n denotes the number of individuals carrying the corresponding haplotype in the cohort (Alfraih et al., 2021). IC stands for inbreeding coefficient method.
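The cumulative coverage column of Table 1 can be checked with a short calculation. The sketch below is illustrative only: it assumes that a recipient is covered when at least one of their two haplotypes is among the banked ones and that the two haplotypes are drawn independently, which gives a coverage of 1 − (1 − Σf)² for summed haplotype frequency Σf. This formula is inferred from the tabulated values rather than stated in the text, and the function names are hypothetical.

```python
# Frequencies of the five most frequent haplotypes, copied from Table 1.
TABLE1_FREQS = [0.035820445, 0.018817209, 0.01381056, 0.01284866, 0.011789139]


def cumulative_coverage(freqs):
    """Coverage after banking the first k haplotypes, for k = 1..len(freqs).

    Assumption (inferred from Table 1, not stated by the authors): a recipient
    is covered if at least one of two independently drawn haplotypes is banked.
    """
    covered = []
    total = 0.0
    for f in freqs:
        total += f  # summed frequency of the banked haplotypes
        covered.append(1.0 - (1.0 - total) ** 2)
    return covered


def lines_needed(freqs, target):
    """Smallest number of lines reaching the target coverage, or None."""
    for k, c in enumerate(cumulative_coverage(freqs), start=1):
        if c >= target:
            return k
    return None


if __name__ == "__main__":
    for f, c in zip(TABLE1_FREQS, cumulative_coverage(TABLE1_FREQS)):
        print(f"frequency={f:.9f}  cumulative coverage={c:.9f}")
    print("lines needed for 15% coverage:", lines_needed(TABLE1_FREQS, 0.15))
```

Applied to the complete haplotype list in Table S1, the same helper would return the bank sizes for the 50%, 75%, 85%, and 95% targets, provided the assumption above holds for the full table.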
Consanguinity predicts a remarkably small discovery sample size
Using the standard method, which assumes random mating, we estimate that 9,184,050 individuals would have to be screened in order to identify the 581 triple-homozygous haplotypes mentioned earlier (Table S1). However, when the high average inbreeding coefficient of the population is taken into account using the inbreeding coefficient method, the discovery sample size drops dramatically to just 126,272. Table S1 shows the discovery sample sizes predicted by the two methods (standard vs. inbreeding coefficient). The validity of the inbreeding coefficient method was confirmed using recently published genotype data of more than 60,000 Saudi individuals (Abouelhoda et al., in press). This dataset contains 625 triple homozygotes, which we used as the truth set (observed) to compare the performance of the two methods. As shown in Table S2, the inbreeding coefficient method predicted the frequency of triple homozygotes significantly more accurately (smaller deviation from the observed value; p < 0.0001) than the standard method. Figure S1 illustrates the distribution of paired differences in absolute percent deviation from the observed values between the standard method and the inbreeding coefficient-corrected method (a positive value indicates that the standard method had the larger percent deviation).
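The two estimates can be illustrated with the per-haplotype values in Table 1. The sketch below is a hedged reconstruction, not the authors' stated formulas: the tabulated standard estimates are consistent with 1/f² (the reciprocal of the homozygote frequency under random mating), and the tabulated inbreeding coefficient estimates are consistent with 1/(F·f) with F ≈ 0.024. Both the functional forms and the value of F are inferred from the table, and the names `F_ASSUMED`, `standard_size`, and `ic_size` are hypothetical.

```python
import math

# Inbreeding coefficient back-calculated from Table 1 (an assumption, not a
# value quoted in the text).
F_ASSUMED = 0.024

# (frequency, IC estimate, standard estimate) for three rows of Table 1.
TABLE1_ROWS = [
    (0.035820445, 1163.2, 779.4),
    (0.018817209, 2214.3, 2824.2),
    (0.004550811, 9155.9, 48286.1),
]


def standard_size(f):
    """Expected individuals screened per homozygote under random mating."""
    return 1.0 / f ** 2


def ic_size(f, inbreeding=F_ASSUMED):
    """Expected individuals screened per homozygote, inferred IC form."""
    return 1.0 / (inbreeding * f)


# Totals reported in the text for the 581 haplotypes.
TOTAL_STANDARD = 9_184_050
TOTAL_IC = 126_272
fold_reduction = TOTAL_STANDARD / TOTAL_IC

if __name__ == "__main__":
    for f, ic_table, std_table in TABLE1_ROWS:
        print(f"f={f:.9f}  IC={ic_size(f):.1f} (table {ic_table})  "
              f"standard={standard_size(f):.1f} (table {std_table})")
    print(f"fold reduction in total: {fold_reduction:.1f} "
          f"(log10 = {math.log10(fold_reduction):.2f})")
```

The ratio of the two reported totals is roughly 73, which is slightly less than two orders of magnitude.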
Relevance of Saudi iPSC bank to other populations
We investigated the coverage of the 581 triple-homozygous HLA haplotypes in other populations, namely, the Japanese and German populations, chosen for the availability of large sample sizes and their apparent distinctness from the Saudi population. The striking differences between these two populations and the Saudi population can be seen through two revealing observations. First, of the 581 triple-homozygous HLA haplotypes, only 200 and 181 haplotypes were present in the German and Japanese populations, respectively (Tables S3). Second, the HLA-A∗03-B∗07-DRB1∗15 haplotype, which is the second most common overlapping HLA haplotype in the German population, is 40 times more common in Germany than in Saudi Arabia. Similarly, the most common overlapping HLA haplotype in Japan, HLA-A∗24-B∗52-DRB1∗15, is around 41 times more frequent in Japan than in Saudi Arabia. Despite the clear distinctness between these populations and the Saudi one, we found that the Saudi triple-homozygous HLA haplotypes shared with each population still cover more than half of that population, specifically 59.9% and 50.1% of the German and Japanese populations, respectively.
Conclusion
With the growing interest in iPSCs in the field of regenerative medicine, the creation of a bank consisting of iPSC lines representative of the population provides a more effective and scalable approach compared to autologous derivation for transplantation. This is especially true when using triple-homozygous donors for the three major HLA loci (HLA-A, HLA-B, and HLA-DRB1) to provide maximum coverage of the population as they can serve as “universal donors.” In this study, we estimated the size of such a bank of iPSC lines in Saudi Arabia based on existing databases on HLA frequencies in the population. In addition, we were able to determine how the discovery sample size in this population is influenced by its uniquely high rate of consanguinity.
After estimating the number of iPSC lines required to cover 95% of the Saudi population, we found that the cell bank size was comparable to that of the Brazilian population, which is regarded as highly genetically diverse. A total of 581 iPSC lines correspond to a Saudi population coverage of 95%, and 559 iPSC lines provide a similar coverage in the Brazilian population (de Oliveira et al., 2023). Considering the much larger size of the Brazilian population, this suggests an even more impressive degree of genetic diversity in the Saudi population. This may seem surprising, as it is commonly assumed that the high rates of consanguinity in this population should lead to increased homogeneity. However, this simplistic view disregards the growing evidence that indigenous Arabs are a very ancient people who have accumulated a myriad of different genetic variations over their rich history (Almarri et al., 2021; Mineta et al., 2021). In fact, they rank second only to Africans as the world’s most genetically diverse population (Razali et al., 2021).
Another characteristic feature of the Saudi population is its high rate of consanguinity, among the highest in the world. This has led to a high prevalence and incidence of a large number of autosomal recessive (AR) diseases (Monies et al., 2019). The Finnish population also has a high rate of AR diseases, which may lead to the erroneous assumption that the genetic landscape of the two populations follows the same dynamics. However, the underlying mechanisms are inherently different. While the Saudi population’s high AR disease rate is caused by its consanguineous nature, the Finnish population is characterized by a strong bottleneck effect, leading to a high carrier rate for only a small number of AR diseases (Norio, 2003). In fact, when comparing the coverage provided by a given number of iPSC lines between the two populations, the Finnish population requires a significantly smaller cell bank size, which is consistent with its low genetic diversity due to the aforementioned bottleneck effect. Interestingly, our comparison of the Saudi population with the Japanese and German populations shows that despite the differences in HLA haplotype frequencies, the Saudi iPSC bank can still be greatly beneficial to other populations.
One major issue when considering the establishment of an iPSC bank is determining the number of individuals to be screened in order to identify the required triple-homozygous HLA haplotypes. This discovery sample size has been estimated in other populations using the standard method, which assumes random mating between individuals in the population. However, this approach would clearly be inaccurate in highly consanguineous populations, such as that of Saudi Arabia. To address this, we used a different method, which takes into consideration the high inbreeding coefficient of the local population (Abouelhoda et al., 2016). As a result, we were able to show that a discovery sample size that is nearly two orders of magnitude smaller is sufficient and more faithfully reflects the enriched homozygosity caused by consanguinity. Importantly, the accuracy of our prediction was confirmed by comparing our results with the observed genotype frequencies in a large Saudi cohort. Moreover, this reduced discovery sample size makes the establishment of an iPSC bank even more appealing for policymakers as a viable approach to realize the benefits of regenerative medicine.
Acknowledgments
Declaration of interests
The authors declare no competing interests.
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.stemcr.2024.06.004.
References
- Abouelhoda M., Sobahy T., El-Kalioby M., Patel N., Shamseldin H., Monies D., Al-Tassan N., Ramzan K., Imtiaz F., Shaheen R., Alkuraya F.S. Clinical genomics can facilitate countrywide estimation of autosomal recessive disease burden. Genet. Med. 2016;18:1244–1249. doi: 10.1038/gim.2016.37.
- Alfraih F., Alawwami M., Aljurf M., Alhumaidan H., Alsaedi H., El Fakih R., Alotaibi B., Rasheed W., Bernas S.N., Massalski C., et al. High-resolution HLA allele and haplotype frequencies of the Saudi Arabian population based on 45,457 individuals and corresponding stem cell donor matching probabilities. Hum. Immunol. 2021;82:97–102. doi: 10.1016/j.humimm.2020.12.006.
- Almarri M.A., Haber M., Lootah R.A., Hallast P., Al Turki S., Martin H.C., Xue Y., Tyler-Smith C. The genomic history of the Middle East. Cell. 2021;184:4612–4625.e14. doi: 10.1016/j.cell.2021.07.013.
- Clancy J., Hyvärinen K., Ritari J., Wahlfors T., Partanen J., Koskela S. Blood donor biobank and HLA imputation as a resource for HLA homozygous cells for therapeutic and research use. Stem Cell Res. Ther. 2022;13:502. doi: 10.1186/s13287-022-03182-7.
- Cossu G., Birchall M., Brown T., De Coppi P., Culme-Seymour E., Gibbon S., Hitchcock J., Mason C., Montgomery J., Morris S., et al. Lancet Commission: Stem cells and regenerative medicine. Lancet. 2018;391:883–910. doi: 10.1016/S0140-6736(17)31366-1.
- de Oliveira M.L.M., Tura B.R., Leite M.M., Dos Santos E.J.M., Pôrto L.C., Pereira L.V., de Carvalho A.C.C. Creating an HLA-homozygous iPS cell bank for the Brazilian population: Challenges and opportunities. Stem Cell Reports. 2023;18:1905–1912. doi: 10.1016/j.stemcr.2023.09.001.
- Gourraud P.-A., Gilson L., Girard M., Peschanski M. The role of human leukocyte antigen matching in the development of multiethnic “haplobank” of induced pluripotent stem cell lines. Stem Cell. 2012;30:180–186. doi: 10.1002/stem.772.
- Mandai M., Watanabe A., Kurimoto Y., Hirami Y., Morinaga C., Daimon T., Fujihara M., Akimaru H., Sakai N., Shibata Y., et al. Autologous induced stem-cell–derived retinal cells for macular degeneration. N. Engl. J. Med. 2017;376:1038–1046. doi: 10.1056/NEJMoa1608368.
- Mineta K., Goto K., Gojobori T., Alkuraya F.S. Population structure of indigenous inhabitants of Arabia. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009210.
- Monies D., Abouelhoda M., Assoum M., Moghrabi N., Rafiullah R., Almontashiri N., Alowain M., Alzaidan H., Alsayed M., Subhani S., et al. Lessons learned from large-scale, first-tier clinical exome sequencing in a highly consanguineous population. Am. J. Hum. Genet. 2019;104:1182–1201. doi: 10.1016/j.ajhg.2019.04.011.
- Norio R. Finnish Disease Heritage II: population prehistory and genetic roots of Finns. Hum. Genet. 2003;112:457–469. doi: 10.1007/s00439-002-0876-2.
- Razali R.M., Rodriguez-Flores J., Ghorbani M., Naeem H., Aamer W., Aliyev E., Jubran A., Qatar Genome Program Research Consortium, Clark A.G., Fakhro K.A., et al. Thousands of Qatari genomes inform human migration history and improve imputation of Arab haplotypes. Nat. Commun. 2021;12:5929. doi: 10.1038/s41467-021-25287-y.
- Schweitzer J.S., Song B., Herrington T.M., Park T.-Y., Lee N., Ko S., Jeon J., Cha Y., Kim K., Li Q., et al. Personalized iPSC-derived dopamine progenitor cells for Parkinson’s disease. N. Engl. J. Med. 2020;382:1926–1932. doi: 10.1056/NEJMoa1915872.
- Simpson A., Hewitt A.W., Fairfax K.A. Universal cell donor lines: A review of the current research. Stem Cell Reports. 2023;18:2038–2046. doi: 10.1016/j.stemcr.2023.09.010.
- Taylor C.J., Bolton E.M., Pocock S., Sharples L.D., Pedersen R.A., Bradley J.A. Banking on human embryonic stem cells: estimating the number of donor cell lines needed for HLA matching. Lancet. 2005;366:2019–2025. doi: 10.1016/S0140-6736(05)67813-0.