A major challenge of studies with human pluripotent stem cells (hPSCs) is their intrinsic variability. Inspired by the success of isogenic strains of model organisms, a recent initiative proposed to overcome this challenge through the widespread adoption of reference hPSC lines1. Genomic and functional characterization of nine induced hPSCs (iPSCs) suggested KOLF2.1J iPSCs as an ideal reference due to their good pluripotency and differentiation performance, high gene editing efficiency and low burden of deleterious genetic variants after the correction of a heterozygous 19bp deletion in ARID2, a gene associated with a neurodevelopmental disorder (OMIM #617808)1.
KOLF2.1J iPSCs and gene-edited derivatives have been quickly adopted by the neurodegeneration field, and many labs are starting to report their findings. We also embraced the initiative and started working with them. However, our routine genome integrity tests uncovered heterozygous copy number variants (CNVs) in KOLF2.1J iPSCs that may influence neurological disease modeling.
To reach our conclusion, we first ran an Illumina Infinium GSA-24v3 BeadChip SNP array on three independent KOLF2.1J iPSC stocks and an NGN2 transgenic KOLF2.1J line obtained from The Jackson Laboratory. Using PennCNV for high-resolution CNV detection (CNV calls with ≥20 supporting SNPs, ≥100,000 bp in length and <0.5 segmental duplication track coverage), we recurrently detected five heterozygous CNVs in all KOLF2.1J iPSC lines and confirmed their presence by visual review of the signal intensity (Log2 Ratio) and B allele frequency (BAF) of the SNPs (Fig S1A).
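The three filtering thresholds quoted above can be written as a single predicate. The following is a minimal sketch, not the authors' pipeline and not part of PennCNV: the `CnvCall` record and its field names are hypothetical, and it assumes each call has already been annotated with its SNP count and the fraction of its length covered by the segmental duplication track.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is illustrative.
MIN_SNPS = 20
MIN_LENGTH_BP = 100_000
MAX_SEGDUP_FRACTION = 0.5


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical record for one annotated CNV call."""
    chrom: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    num_snps: int           # number of array SNPs supporting the call
    segdup_fraction: float  # fraction of the call overlapping segmental duplications

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passes_filters(call: CnvCall) -> bool:
    """Return True if the call meets all three thresholds."""
    return (
        call.num_snps >= MIN_SNPS
        and call.length >= MIN_LENGTH_BP
        and call.segdup_fraction < MAX_SEGDUP_FRACTION
    )


# Made-up calls on placeholder chromosomes, not values from the study.
calls = [
    CnvCall("chrA", 1_000_000, 1_250_000, 45, 0.10),  # meets all thresholds
    CnvCall("chrB", 5_000_000, 5_050_000, 30, 0.00),  # shorter than 100 kb
    CnvCall("chrC", 2_000_000, 2_400_000, 12, 0.05),  # fewer than 20 SNPs
    CnvCall("chrD", 7_000_000, 7_300_000, 60, 0.80),  # mostly segmental duplication
]
kept = [c for c in calls if passes_filters(c)]
```

Calls that pass such a filter would still need the visual review of Log2 Ratio and BAF described above, since the thresholds only reduce, and do not eliminate, false positives.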
We then asked whether these five CNVs were already present in the published KOLF2.1J whole genome sequencing (WGS) data1. Notably, the structural analysis of the published short-read WGS data was performed with an algorithm that considers discordant read-pairs and split-reads for CNV calling1, which can result in false negatives if aberrant read pairs fail to pass the quality control filters. To increase the sensitivity of CNV detection from short-read WGS data, it is instead recommended to combine these algorithms with ones that consider read depth2. Thus, we downloaded the published KOLF2.1J iPSC WGS data from ADWB1 for re-analysis. After aligning to the reference genome with minimap2 (v2.17-r941) in paired-end short read mode, we applied in-house software to calculate the coverage Log2 Ratios of the five CNV regions. We also called alternate SNPs and obtained their BAF using DeepVariant and BCFtools view (v1.9) after filtering by variant quality score >30 and read depth >10. This analysis confirmed that the five CNVs identified by the SNP array are inherent to the KOLF2.1J iPSCs and were missed in the original genome sequencing analysis (Fig S1B).
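The two read-depth signals used in this re-analysis can be illustrated with a short sketch. This is not the in-house software mentioned above, whose details are not given; the function names and the choice of a copy-neutral baseline depth are assumptions made for illustration. Only the two filter thresholds (quality >30, depth >10) come from the text.

```python
import math


def log2_ratio(region_depth: float, baseline_depth: float) -> float:
    """Log2 of mean coverage in a region relative to a copy-neutral baseline.

    With a diploid baseline, a heterozygous deletion (1 copy) gives
    log2(1/2) = -1 and a single-copy gain (3 copies) gives log2(3/2).
    """
    if region_depth <= 0 or baseline_depth <= 0:
        raise ValueError("depths must be positive")
    return math.log2(region_depth / baseline_depth)


def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a SNP that carry the alternate (B) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def keep_variant(qual: float, depth: int,
                 min_qual: float = 30.0, min_depth: int = 10) -> bool:
    """Apply the strict thresholds from the text: quality >30 and depth >10."""
    return qual > min_qual and depth > min_depth


# Idealized depths for a 30x genome (illustrative numbers only).
deletion_signal = log2_ratio(15.0, 30.0)     # one copy lost
duplication_signal = log2_ratio(45.0, 30.0)  # one copy gained
dup_baf = b_allele_frequency(ref_reads=30, alt_reads=15)
```

In a heterozygous deletion, heterozygous SNPs disappear and BAF values collapse toward 0 or 1, whereas in a single-copy duplication they shift from 0.5 toward roughly 1/3 and 2/3. Reading both signals together is what makes depth-based evidence complementary to read-pair and split-read callers.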
To further determine the functional consequences, we defined the CNV breakpoints at base resolution by analyzing the WGS alignment file with the DELLY structural variant (SV) caller and filtering the results by region, SV type and a minimum length of 100kb. This analysis revealed that the chr3p14 duplication CNV is larger than predicted by the SNP array (Fig S1A, B). Among the five CNVs identified, three span coding regions. The chr3p14 CNV duplicates FHIT and PTPRB, but the functional consequences are unclear since the extra copy might be a tandem duplication or be located elsewhere in the genome. The heterozygous chr6p22 CNV deletes the entire DTNBP1 gene and 11 out of the 18 coding exons of JARID2 (Fig S1C), and the chr9q33 CNV deletes exon 20 of ASTN2 and its antisense gene (ASTN2-AS1) (Fig S1D), all of which are predicted to cause pathogenic haploinsufficiency. DTNBP1 encodes Dysbindin, a protein involved in lysosome-related organelle biogenesis and associated with increased susceptibility for schizophrenia and Hermansky-Pudlak syndrome 7 (OMIM #614076)3. JARID2 is a gene highly intolerant to genetic disruption (pLI=1, pLOF o/e=0.09, pHaplo=0.99, pTriplo=0.99 and pHI=0.59)4,5. As a regulatory subunit of the Polycomb Repressive Complex 2 (PRC2), JARID2 is critical for embryonic development and tissue homeostasis. Notably, CNVs that lead to JARID2 haploinsufficiency cause a syndrome of neurodevelopmental delay with variable intellectual disability (OMIM #620098)6. The genetic constraint and disease associations of JARID2 are reminiscent of ARID2 (pLI=1, pLOF o/e=0.04, pHaplo=0.99, pTriplo=0.98 and pHI=0.62), in which a heterozygous 19bp deletion was corrected to generate the KOLF2.1J iPSC line1. ASTN2 encodes a protein involved in the trafficking and degradation of neuronal cell adhesion molecules important for the regulation of neuronal migration and synaptic development. Similar to JARID2, mutational constraint metrics indicate that disruptive variants in ASTN2 are rare in human genomes (pLI=1, pHaplo=0.88, pTriplo=0.672 and pHI=0.95)4,5 and constitute a risk factor for autism and other neurodevelopmental disorders7.
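The post-filtering of SV calls by region, SV type and minimum length described at the start of this paragraph can be sketched as follows. This is an illustrative helper, not DELLY's interface: the `StructuralVariant` record, the function names and the example coordinates are all hypothetical, and the sketch assumes the calls have already been parsed from the caller's output into chromosome, start, end and type (for example from the `SVTYPE` and `END` fields of VCF records).

```python
from typing import List, NamedTuple


class StructuralVariant(NamedTuple):
    """Hypothetical parsed SV call."""
    chrom: str
    start: int
    end: int
    sv_type: str  # e.g. "DEL" or "DUP"


def overlaps(sv: StructuralVariant, chrom: str,
             region_start: int, region_end: int) -> bool:
    """True if the SV shares at least one base with the query region."""
    return (sv.chrom == chrom
            and sv.start <= region_end
            and sv.end >= region_start)


def select_svs(svs: List[StructuralVariant], chrom: str,
               region_start: int, region_end: int,
               sv_type: str, min_length: int = 100_000) -> List[StructuralVariant]:
    """Keep SVs of the given type that overlap the region and are long enough."""
    return [
        sv for sv in svs
        if overlaps(sv, chrom, region_start, region_end)
        and sv.sv_type == sv_type
        and (sv.end - sv.start) >= min_length
    ]


# Made-up calls on a placeholder chromosome, not coordinates from the study.
example_calls = [
    StructuralVariant("chrA", 1_200_000, 1_650_000, "DEL"),  # large deletion in region
    StructuralVariant("chrA", 1_300_000, 1_300_500, "DEL"),  # too short
    StructuralVariant("chrA", 1_250_000, 1_700_000, "DUP"),  # wrong type
    StructuralVariant("chrB", 1_200_000, 1_650_000, "DEL"),  # wrong chromosome
]
candidates = select_svs(example_calls, "chrA", 1_000_000, 2_000_000, "DEL")
```

Seeding the query region from the array-based CNV coordinates and then reading the breakpoints from the surviving SV call is one way the two data types can be reconciled, which is how a CNV can turn out larger than the array predicted.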
Given the relevance of JARID2, DTNBP1 and ASTN2 haploinsufficiencies for neural phenotypes derived from KOLF2.1Js, we further validated these CNVs with locus-specific methods. Remarkably, qPCR analysis of gDNA showed half the dosage of the three genes in KOLF2.1Js (Fig S1C,D), and Sanger sequencing of PCR products spanning the breakpoint junctions confirmed the coordinates, with only a 1bp discrepancy for the chr6p22 CNV (Fig S1E). Furthermore, this analysis revealed that the chr9q33 CNV likely arose from an Alu/Alu-mediated genomic rearrangement (Fig S1F). In agreement, data retrieved from a recent proteomic study showed that KOLF2.1J iPSCs and derived neural progenitors express half the levels of ASTN2, JARID2 and DTNBP1 compared to a control hPSC line (Fig S1G)8.
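As a worked illustration of how a half dosage appears in gDNA qPCR, the sketch below uses the common 2^(-ΔΔCt) calculation. The text does not state which quantification method was used, so this is an assumption; it further assumes about 100% amplification efficiency and a two-copy reference locus, and the Ct values are invented.

```python
def relative_dosage(ct_target_sample: float, ct_ref_sample: float,
                    ct_target_control: float, ct_ref_control: float) -> float:
    """Relative gene dosage by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference locus), computed per DNA sample
    ddCt = dCt(test sample) - dCt(control sample)
    Assumes each PCR cycle doubles the product (about 100% efficiency).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Invented Ct values: the target amplifies one cycle later in the test
# sample than in the control, relative to the reference locus.
dosage = relative_dosage(
    ct_target_sample=26.0, ct_ref_sample=24.0,
    ct_target_control=25.0, ct_ref_control=24.0,
)
estimated_copies = 2 * dosage  # control assumed to carry two copies
```

A one-cycle delay therefore corresponds to a relative dosage of 0.5, that is, one copy instead of two, which is the signature expected for a heterozygous deletion.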
Finally, we reasoned that if these CNVs were already present in the healthy 55–59 year-old donor from which KOLF2.1Js were derived, they would be deleterious neither for the donor nor for iPSC performance. However, the analysis of publicly available SNP genotyping of the donor’s fibroblast line (HPSI0114pf-kolf) only detected the chr18q22 duplication CNV (Fig S1H). In addition, there is no evidence for the chr6p22 and chr9q33 deletion CNVs in the human pangenome reference sequence9. These data collectively suggest that the healthy donor is unlikely to carry the chr6p22 and chr9q33 CNVs, which likely arose in vitro during the KOLF2.1J iPSC generation process.
Although the functional significance of chr6p22 and chr9q33 for neurological disease modeling with KOLF2.1J remains to be determined, our findings are important to guide the interpretation of data derived from KOLF2.1J neural models and to advocate for the use of multiple hPSC lines in research, as also recommended by ISSCR standards10.
Acknowledgments
We thank members of the iNDI initiative for their open discussion, suggestions and encouragement to publish our findings, in line with their transparency policy. We also thank Dr. Skarnes for kindly sharing KOLF2.1J and NGN2-KOLF2.1J iPSCs with us early on. This work was supported by NIH/NINDS R01NS119699 and NIH/NICHD R21HD107592 (to N.A.), by the CZI Collaborative Pairs Award (to R.A.N. and E.J.B.) and by technical consultation from the IDDRC Biostatistics and Data Science core (NIH/NICHD HD105354).
Footnotes
Declaration of Interests
The authors declare no competing interests.
References
1. Pantazis CB, Yang A, Lara E, McDonough JA, Blauwendraat C, Peng L, Oguro H, Kanaujiya J, Zou J, Sebesta D, et al. (2022). A reference human induced pluripotent stem cell line for large-scale collaborative studies. Cell Stem Cell 29, 1685–1702 e1622.
2. Ahsan MU, Liu Q, Perdomo JE, Fang L, and Wang K (2023). A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 20, 1143–1158.
3. Mullin AP, Gokhale A, Larimore J, and Faundez V (2011). Cell biology of the BLOC-1 complex subunit dysbindin, a schizophrenia susceptibility gene. Mol Neurobiol 44, 53–64.
4. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443.
5. Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM, Ulirsch J, et al. (2022). A cross-disorder dosage sensitivity map of the human genome. Cell 185, 3041–3055 e3025.
6. Verberne EA, Goh S, England J, van Ginkel M, Rafael-Croes L, Maas S, Polstra A, Zarate YA, Bosanko KA, Pechter KB, et al. (2021). JARID2 haploinsufficiency is associated with a clinically distinct neurodevelopmental syndrome. Genet Med 23, 374–383.
7. Lionel AC, Tammimies K, Vaags AK, Rosenfeld JA, Ahn JW, Merico D, Noor A, Runke CK, Pillalamarri VK, Carter MT, et al. (2014). Disruption of the ASTN2/TRIM32 locus at 9q33.1 is a risk factor in males for autism spectrum disorders, ADHD and other neurodevelopmental phenotypes. Hum Mol Genet 23, 2752–2768.
8. Nam KH, and Ordureau A (2022). Quantitative proteome remodeling characterization of two human reference pluripotent stem cell lines during neurogenesis and cardiomyogenesis. Proteomics 22, e2100246.
9. Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, et al. (2023). A draft human pangenome reference. Nature 617, 312–324.
10. Ludwig TE, Andrews PW, Barbaric I, Benvenisty N, Bhattacharyya A, Crook JM, Daheron LM, Draper JS, Healy LE, Huch M, et al. (2023). ISSCR standards for the use of human stem cells in basic research. Stem Cell Reports 18, 1744–1752.
