Abstract
A major challenge in deciphering the complex genetic landscape of polycystic ovary syndrome (PCOS) lies in the limited understanding of how susceptibility loci drive molecular mechanisms across diverse phenotypes. To address this, we integrated molecular and epigenomic annotations from proposed causal cell types and employed a deep learning (DL) framework to predict cell type–specific regulatory effects of PCOS-risk variants. Our analysis revealed that these variants affect key transcription factor–binding sites, including NR4A1/2, NHLH2, FOXA1, and WT1, which regulate gonadotropin signaling, folliculogenesis, and steroidogenesis across brain and endocrine cell types. The DL model, which showed strong concordance with reporter assay data, identified enhancer-disrupting activity in ∼20% of risk variants. Notably, many of these variants disrupt transcription factors involved in androgen-mediated signaling, providing molecular insights into hyperandrogenemia in PCOS. Variants prioritized by the model were more pleiotropic and exerted stronger regulatory effects on gene expression compared with other risk variants. Using the IRX3-FTO locus as a case study, we demonstrate how regulatory disruptions in tissues such as the fetal brain, pancreas, adipocytes, and endothelial cells may link obesity-associated mechanisms to PCOS pathogenesis via neuronal development, metabolic dysfunction, and impaired folliculogenesis. Collectively, our findings highlight the utility of integrating DL models with epigenomic data to uncover disease-relevant variants, reveal cross-tissue regulatory effects, and refine mechanistic understanding of PCOS.
Keywords: polycystic ovary syndrome (PCOS), regulatory genomics, enhancer variants, deep learning, artificial intelligence, disease-causal noncoding variants
Polycystic ovary syndrome (PCOS) is a multifactorial endocrine disorder characterized by abnormal LH:FSH (luteinizing hormone:follicle-stimulating hormone) ratios and elevated androgen levels, leading to anovulation, polycystic ovaries, and various hyperandrogenism-related comorbidities [1-3]. The reproductive abnormalities in PCOS stem from disruptions in the hypothalamic–pituitary–gonadal (HPG) axis, which also contributes to other conditions like oligomenorrhea, ovarian insufficiency, infertility, hyper- and hypogonadism, and endometriosis [4]. This overlap in clinical features complicates PCOS diagnosis, prompting the establishment of multiple diagnostic criteria by the National Institutes of Health (NIH), Rotterdam, and the Androgen Excess and PCOS Society. Based on a consensus, diagnosis relies on clear indications of hyperandrogenism, polycystic ovarian morphology, and ovulatory dysfunction [3, 5].
Decades of research on molecular mechanisms underlying the disease have identified impaired folliculogenesis and enhanced steroidogenesis in theca and granulosa cells (GCs) as key contributors to PCOS development [6]. These pathways are spatiotemporally regulated by LH and FSH, secreted by the pituitary gland in response to hypothalamic gonadotropin-releasing hormone (GnRH) stimulation [7]. Moreover, PCOS often coincides with hyperinsulinemia [8], though the molecular origins of the association between the 2 are still being investigated. Hyperinsulinemia worsens hyperandrogenism by affecting adrenal androgen production and reducing sex hormone–binding globulin (SHBG) levels in the liver [9] and may also contribute to metabolic comorbidities such as obesity, type 2 diabetes, and liver dysfunction [3, 10].
Twin studies indicate a strong genetic component to PCOS pathogenicity [11], involving impaired regulation of the HPG axis and hyperandrogenemia [12]. Polymorphisms in coding regions of genes expressing kisspeptin (Kiss1, an upstream regulator of GnRH) [13], GnRH receptor [14], anti-Müllerian hormone (AMH) [15] and its receptor, AMHR2 [16], as well as the FSH receptor [17], have been associated with disrupted hormonal signaling in patients with PCOS. However, with the exception of AMH and AMHR2, there is limited functional evidence directly linking coding variants to the PCOS phenotype [15, 16]. On the other hand, PCOS genome-wide association studies (GWASs) across diverse populations have identified multiple noncoding variants in novel disease loci, implicating several genes, such as IRF1, THADA, FTO, etc. [12], whose relevance in PCOS etiology remains to be established. Only a handful of studies have validated the regulatory activity of noncoding variants and their target genes with mechanistic insights into disease pathogenicity. For instance, functional assays at the FSHB locus have confirmed that the variant rs11031006, located within a distal enhancer, modulates FSHB expression by disrupting the binding of steroidogenic factor 1 (SF1) [18], and rs28441318 and rs10117455 have been shown to regulate DENND1A expression [19], which influences androgen levels [20]. Furthermore, studies investigating the functions of risk locus-associated genes have offered valuable insights into their potential contribution to PCOS pathophysiology [12]. For instance, ERBB4 and GATA4 regulate folliculogenesis [21, 22], ZNF217 regulates androgen production in theca cells [23], and HMGA2 promotes GC proliferation [24]. Interestingly, some genes exhibit pleiotropy depending on cell type context; HMGA2 also regulates adipogenesis [25], and FSH influences bone density and adipose mass [4]. 
Understanding how PCOS-associated regulatory variants collectively shape gene regulation via complex genomic and epigenomic interactions across diverse cellular contexts and manifest as disease phenotypes and comorbidities served as the motivation for our study [26].
We performed a functional assessment of PCOS susceptibility loci by integrating epigenomic data, functional assays, and a deep learning (DL)-based approach to predict regulatory single nucleotide variants (SNVs) across 11 disease-associated cell types. We further investigated their potential influence on the molecular mechanisms underlying PCOS etiology. This approach facilitated the identification of key transcription factors (TFs) involved in folliculogenesis, androgen-mediated signaling, and ovarian development, whose binding sites are predicted to be disrupted by regulatory variants. Using the well-characterized regulatory locus of FTO, which harbors distal enhancers targeting IRX3, we demonstrate how DL models combined with prior knowledge of key PCOS TFs can effectively prioritize regulatory variants.
Materials and methods
PCOS susceptibility loci
The PCOS GWAS summary statistics were obtained from the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS catalog (accessed in October 2024, Table S1 [27]). Effect and other alleles for each SNV were identified based on information reported in the original GWAS publication or inferred from the reported odds ratios. All analyses were conducted using the coordinates and datasets of the GRCh38 reference genome (patch release 14).
Identification and enrichment analysis of TF-binding site
Input regions for motif identification were defined by extending variant sites by 30 bp on each side. We used the command-line tool Find Individual Motif Occurrences (FIMO) [28] to scan vertebrate TF motifs from the JASPAR [29] and HOCOMOCO [30] databases along the sequences, applying a P-value threshold of 10⁻⁵. To identify TFs enriched in the loci of pcosSNVs, we generated a background set of SNVs by extracting all variants from the 1000 Genomes Project within a 50 kb flanking region of each pcosSNV. After excluding the pcosSNVs themselves and removing duplicates, this resulted in a nonredundant background set of ∼71 000 control SNVs.
To avoid redundant counting of overlapping binding sites detected by FIMO, the best-scoring binding site among those overlapping by >50% was considered the true binding site. To calculate the fold enrichment for a TF T among predicted regulatory SNVs compared with control SNVs, we used the following:
$$\mathrm{FE}(T) = \frac{T_P / N_{\mathrm{positive}}}{T_N / N_{\mathrm{control}}}$$
Here, $T_P$ represents the number of binding sites of T among predicted regulatory SNVs, and $T_N$ represents the total number among control SNVs. $N_{\mathrm{positive}}$ and $N_{\mathrm{control}}$ denote the total number of distinct binding sites (ie, not exceeding 50% overlap) among the predicted regulatory SNVs and control SNVs, respectively.
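Reading the definitions above as a ratio of two proportions, the overlap filtering and fold-enrichment steps can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the greedy best-score-first selection, and the choice to measure overlap relative to the shorter site are all assumptions, and the counts in the example are invented.

```python
def overlap_fraction(a, b):
    """Overlap of two half-open (start, end) sites, relative to the shorter one (assumption)."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return inter / min(a[1] - a[0], b[1] - b[0])


def keep_best_sites(sites, max_overlap=0.5):
    """Greedily keep the best-scoring site among sites overlapping by more than max_overlap.

    sites: iterable of (start, end, score) tuples on one sequence.
    """
    kept = []
    for site in sorted(sites, key=lambda s: -s[2]):  # best score first
        if all(overlap_fraction(site[:2], k[:2]) <= max_overlap for k in kept):
            kept.append(site)
    return sorted(kept)


def fold_enrichment(t_pos, n_pos, t_ctrl, n_ctrl):
    """(t_pos / n_pos) / (t_ctrl / n_ctrl), computed as a single ratio of products."""
    if n_pos <= 0 or n_ctrl <= 0:
        raise ValueError("total site counts must be positive")
    if t_ctrl == 0:
        raise ValueError("enrichment is undefined when the TF has no control sites")
    return (t_pos * n_ctrl) / (n_pos * t_ctrl)


# Invented example: two sites overlap by 60%, so only the higher-scoring one survives.
sites = [(0, 10, 5.0), (4, 14, 8.0), (20, 30, 3.0)]
distinct = keep_best_sites(sites)

# Invented counts: 12 of 400 sites at regulatory SNVs vs 90 of 9000 at control SNVs.
fe = fold_enrichment(t_pos=12, n_pos=400, t_ctrl=90, n_ctrl=9000)
```

Rewriting the ratio as a quotient of integer products is algebraically identical to dividing the two proportions and avoids intermediate rounding.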
Differential enrichment of transcription factor–binding sites (TFBSs) between the metabolic and reproductive subtypes was assessed using a hypergeometric test, with normalized counts of a TF overlapping variants of the reproductive subtype analyzed against the normalized counts of the same TF overlapping variants of the metabolic subtype.
To identify allele-specific changes by pcosSNVs, the variant sites were scanned with reference and alternate alleles to identify gain and loss of motifs, and changes in motif scores were used to assess affinity differences between reference and alternate alleles.
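The reference-versus-alternate comparison can be illustrated with a toy log-odds scorer. This is a simplified stand-in for FIMO, not its algorithm: it scores the forward strand only, assumes a uniform 0.25 background, and uses an invented 3-bp motif. All names are hypothetical.

```python
from math import log2


def window_score(pwm, window):
    """Log-odds score of one window against a uniform 0.25 background."""
    return sum(log2(col[base] / 0.25) for col, base in zip(pwm, window))


def best_overlapping_score(pwm, seq, var_idx):
    """Best score over all motif-length windows that cover position var_idx.

    Assumes len(seq) >= len(pwm); otherwise there is no window to score.
    """
    width = len(pwm)
    first = max(0, var_idx - width + 1)
    last = min(var_idx, len(seq) - width)
    return max(window_score(pwm, seq[s:s + width]) for s in range(first, last + 1))


def allelic_delta(pwm, ref_seq, var_idx, alt_base):
    """Alternate-minus-reference change in the best motif score at the variant."""
    alt_seq = ref_seq[:var_idx] + alt_base + ref_seq[var_idx + 1:]
    return best_overlapping_score(pwm, alt_seq, var_idx) - best_overlapping_score(
        pwm, ref_seq, var_idx
    )


# Invented motif that prefers "TGA"; probabilities per position sum to 1.
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

# G>C at index 3 breaks the central position of the "TGA" match.
delta = allelic_delta(toy_pwm, "CCTGACC", var_idx=3, alt_base="C")
```

A negative delta corresponds to a weakened or lost motif for the alternate allele, and a positive delta to a strengthened or gained one.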
Cell type–specific DL models
We used a 2-phase TREDNet model developed in our lab for cell type–specific enhancer prediction [31]. The first phase of the model was pretrained on 4560 genomic and epigenomic profiles, which included DNase I hypersensitive sites (DHS), assay for transposase-accessible chromatin using sequencing (ATAC-seq), histone chromatin immunoprecipitation sequencing (ChIP-seq), and TF ChIP-seq peaks from Encyclopedia of DNA Elements (ENCODE) v4 [32]. The second phase was fine-tuned to predict cell type–specific enhancers using training datasets described below. Chromosomes 8 and 9 were held out for testing, chromosome 6 was used for validation, and other autosomal chromosomes were used to build the second-phase model.
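The chromosome-level split described above can be written down as a small lookup. The `chrN` naming and the function name are assumptions made for illustration. Regions outside the autosomes are left unassigned because the text only mentions autosomal chromosomes for model building.

```python
TEST_CHROMS = {"chr8", "chr9"}
VALIDATION_CHROMS = {"chr6"}
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def assign_split(chrom):
    """Return 'test', 'validation', 'train', or None for non-autosomal sequences."""
    if chrom in TEST_CHROMS:
        return "test"
    if chrom in VALIDATION_CHROMS:
        return "validation"
    if chrom in AUTOSOMES:
        return "train"
    return None


train_chroms = sorted(c for c in AUTOSOMES if assign_split(c) == "train")
```

Splitting by whole chromosome rather than by random region keeps overlapping or neighbouring 2 kb fragments from landing in both the training and the held-out sets.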
Open chromatin (DHS or ATAC-seq) and H3K27ac profiles for the causal cell types were downloaded from ENCODE [32] (Table S2 [27]). Positive datasets were defined as 2 kb regions centered on DHS or ATAC-seq peaks overlapping with H3K27ac (or H3K4me1 in fetal brain) peaks of each cell type, excluding coding sequences, promoter proximal regions (<2 kb from transcription start sites [TSS]) and ENCODE blacklisted regions [33]. A 10-fold control dataset was generated for each cell type using randomly sampled 2 kb fragments of the genome, excluding the positive dataset of that cell type and blacklisted regions.
Each 2 kb fragment received an enhancer probability score. Active enhancers were predicted at a 10% false positive rate (FPR) with a 1:10 positive-to-control ratio. Variant effects were assessed by scoring 2 kb regions centered on each variant for reference and alternate alleles. A significant enhancer activity change was defined as an alternate/reference score ratio >1.2 or <0.8.
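The ratio rule in the last sentence above can be expressed as a short helper. The function name and the returned labels are hypothetical, and the scores in the example are invented; only the 1.2 and 0.8 cut-offs come from the text.

```python
def classify_ratio(ref_score, alt_score, upper=1.2, lower=0.8):
    """Label a variant by its alternate/reference enhancer-score ratio."""
    if ref_score <= 0:
        raise ValueError("reference score must be positive to form a ratio")
    ratio = alt_score / ref_score
    if ratio > upper:
        return "increase"
    if ratio < lower:
        return "decrease"
    return "no_change"


# Invented enhancer probability scores for three variants (reference, alternate).
examples = [(0.50, 0.70), (0.50, 0.30), (0.50, 0.55)]
labels = [classify_ratio(ref, alt) for ref, alt in examples]
```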
GWAS trait enrichment
Summary statistics for 25 649 traits were downloaded from the NHGRI-EBI GWAS catalog. Linkage disequilibrium (LD) variants for each GWAS SNV were identified using PLINK (v1.9 [34]) with an r² threshold of ≥0.8. Traits with at least 1000 combined GWAS and LD variants were retained for downstream enrichment analysis in reproductive and metabolic SNV categories.
Data and tools
The H3K27ac peaks for KGN cells and adipocytes were sourced from literature [35, 36]. The KGN wig file was converted to NarrowPeak format using UCSC BigWig tools [37] and MACS peak-calling software [38].
Motif logos were retrieved from HOCOMOCO database [30]. Functional enrichment of risk loci was performed using Genomic Regions Enrichment of Annotations Tool (GREAT) using whole genome as background [39]. The element–gene association setting was selected as “2-nearest genes” within 100 kb. Gene set enrichment of putative target genes of pcosSNVs was performed using gProfiler [40]. Protein interaction networks and their enriched pathways were obtained from STRING database [41].
Evolutionary conservation of genomic regions was measured by their extent of overlap with phastCons elements conserved across 30 primates (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/phastConsElements30way.txt.gz).
Results
A majority of PCOS-risk SNVs lie in regulatory regions and are enriched in neuroendocrine cell types
To conduct a comprehensive analysis of the regulatory features of PCOS-risk loci, we identified 85 SNVs from 12 studies (Table S1 [27]). Briefly, these consist of 11 GWASs conducted for PCOS, with cohort sizes ranging from a few hundred to over ten thousand participants across Han Chinese [42-44], European [45-48], African American and European American [49], Estonian and Finnish [50], and Korean populations [51, 52]. Diagnoses were based on NIH or Rotterdam criteria, ICD codes, or self-reported data. In addition, a twelfth study performed a cross-trait meta-analysis between PCOS and glycemic traits and identified novel loci shared between type 2 diabetes, body mass index (BMI), and PCOS, including the FTO locus [53].
This set was expanded to 1472 SNVs, referred to herein as pcosSNVs, by including variants in LD with GWAS-identified variants across all superpopulations (African, American, South and East Asian, and European) with an r² ≥ 0.8, obtained from SNiPA [54]. After including variants in LD, adjacent variants within a 100 kb window were merged into 50 loci, and named based on the nearest gene and/or previously associated genes in the literature (Fig. 1A, Table S3 [27]). Most of these variants are located within intronic regions, with the highest density observed in the DENND1A and AOPEP loci (Fig. 1A). To explore the biological significance of pcosSNVs, we performed functional enrichment analysis using GREAT (Materials and methods). GREAT explicitly models the regulatory domain architecture of the genome, linking distal regulatory elements to putative target genes based on biologically informed association rules [39]. In addition to the expected hormone-mediated signaling pathway, the PCOS-risk loci showed significant enrichment for processes related to cell development, differentiation, and apoptosis (hypergeometric, P = 10⁻³; Fig. 1B, Table S4 [27]), suggesting their involvement in the molecular mechanisms governing the 5 developmental stages of folliculogenesis in oocytes and GCs [55]. These loci were also enriched for immune-related processes, including leukocyte differentiation and hematopoiesis, involving SNVs near IRF1/IL5, ZBTB16, ERBB2, MED1, and other genes (Table S4 [27]), underscoring the emerging recognition of immune dysregulation in PCOS [3].
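The locus-merging step can be sketched as a single pass over position-sorted variants. The interpretation used here is an assumption: a variant joins the current locus when it lies within 100 kb of the previous variant on the same chromosome, so loci can chain beyond 100 kb in total span. The function name and coordinates are invented.

```python
def merge_into_loci(variants, max_gap=100_000):
    """Group (chrom, pos) variants into loci of neighbours at most max_gap bp apart.

    Returns a list of (chrom, start, end, n_variants) tuples.
    """
    loci = []
    for chrom, pos in sorted(variants):
        same_locus = (
            loci
            and loci[-1][0] == chrom
            and pos - loci[-1][2] <= max_gap
        )
        if same_locus:
            loci[-1][2] = pos      # extend the current locus
            loci[-1][3] += 1
        else:
            loci.append([chrom, pos, pos, 1])
    return [tuple(locus) for locus in loci]


# Invented coordinates: the first two variants merge, the others stand alone.
toy_variants = [("chr1", 400_000), ("chr1", 1_000), ("chr2", 10_000), ("chr1", 50_000)]
toy_loci = merge_into_loci(toy_variants)
```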
Figure 1.
(A) PCOS susceptibility loci and their distribution in noncoding regions, (B) functional enrichment of susceptibility loci, (C) fold enrichment of PCOS eVariants in GTEx cell types (reported eVariants with enrichment binomial, P < .01).
We then assigned target genes to the pcosSNVs using the ENCODE-rE2G model (https://github.com/EngreitzLab/ENCODE_rE2G) that predicts enhancer–gene interactions across various cell types by integrating enhancer activity, 3D chromatin interactions, and DNase I hypersensitivity maps. Applying a threshold corresponding to 70% recall, derived from the model's CRISPR-validated benchmark dataset, we identified 128 target genes linked to pcosSNVs across 323 cell types (Table S5 [27]), many of which have not been previously associated with PCOS. The most pleiotropic loci, each connected to >10 target genes, were located near IRF1 and ERBB3, underscoring extensive distal enhancer regulation (Table S5 [27]). Notably, rs2706385 and rs2706386, situated upstream of the IRF1 promoter, were predicted to regulate RAD50 across multiple cell types, including those representing developmental lineages (Table S5 [27]). This enhancer–gene pair was further supported by ChIA-PET–derived chromatin interactions observed in the WTC11 developmental cell line (ENCODE accession ENCSR543HLV). Mouse knockout studies have demonstrated a critical role for RAD50 in follicle development [56]. In addition, functional enrichment analysis of the 128 target genes revealed significant overrepresentation of genes in this locus within the JAK–STAT signaling pathway, including IL3/4/5/9 and CSF2, which are known STAT pathway components (Table S4 [27]). Interestingly, JAK–STAT signaling has been implicated in the primordial-to-follicular phase transition during folliculogenesis [55]. Together, these findings suggest that certain pcosSNVs reside within pleiotropic enhancers that may contribute to PCOS pathophysiology through distinct molecular mechanisms.
We next investigated the functional impact of pcosSNVs through their association with changes in gene expression characterized by the GTEx consortium [57]. Among 1472 pcosSNVs, 832 colocalized with cis-eQTLs, termed eVariants (Table S6 [27]). Two-thirds of these eVariants are shared across multiple cell types (we use the term cell types herein synonymously with tissues defined in GTEx), meaning they influence gene expression in multiple cell types. In contrast, the remaining eVariants, such as those in the FSHR and ERBB4 loci, are cell type specific (Fig. S1a [27]). The number of target genes scaled almost linearly with the number of affected cell types (Spearman correlation: 0.71, Fig. S1b [27]), suggesting that shared eQTLs may contribute to distinct cell type–specific regulatory networks by regulating different genes in different cell types. Notably, eVariants in the locus of the GATA4 gene were linked to 39 genes across 49 cell types (Fig. S1b, Table S7 [27]). Four percent of eVariants targeted long noncoding RNAs (Table S6 [27]), suggesting their role as potential trans-eQTLs [58]. Meanwhile, the remaining 640 pcosSNVs that were not identified as GTEx eVariants belong to 18 susceptibility loci, including those of CNTNAP5, ASIC2, and CDH1 (Table S7 [27]), likely due to low target gene expression or restricted function in specific cell states or developmental stages in the dynamic transcription landscape that is not captured in bulk tissue analysis [57, 59]. These pcosSNVs may play key spatiotemporal roles in mediating GnRH response in the hypothalamus, pituitary regulation, and follicular phase progression in PCOS [59].
We also examined the enrichment of PCOS eVariants across GTEx cell types. Compared with a randomly selected set of 832 eQTLs (excluding PCOS eVariants), PCOS eVariants were found to be enriched in brain cell types, endocrine cell types such as the ovary and adrenal gland, as well as target cell types of circulating reproductive hormones like the breast and prostate (binomial, P < .001, Fig. 1C). Since many PCOS eVariants are shared across cell types, they likely influence regulatory networks by mediating interactions between cell type–specific and ubiquitous TFs, thereby invoking cell type–specific regulatory pathways that contribute to distinct phenotypic outcomes in different cellular contexts [60]. For example, PPARG, a susceptibility locus, plays a central role in regulating lipid metabolism, adipocyte differentiation, gluconeogenesis, folliculogenesis, and steroidogenesis through multiple TFs that are critical regulators of these biological processes (www.kegg.jp/pathway/map=map03320) [61, 62]. This suggests that causal variants within this locus may contribute to distinct phenotypic outcomes through pleiotropic effects across multiple cell types. Given these complexities, investigating disease-causal variants and their cell type–specific effects on downstream signaling pathways may help elucidate the mechanisms underlying the diverse phenotypic manifestations of PCOS.
A DL model for prioritizing regulatory variants across causal cell types
The challenge of characterizing the cell type–specific impact of thousands of susceptibility variants in complex traits and diseases has led to the development of computational approaches for inferring regulatory variants. Deep learning models have been particularly effective in predicting variant effects on gene regulation by integrating diverse cell type–specific epigenomic features [63, 64]. We previously developed a convolutional neural network–based DL model, TREDNet, which can predict the effects of noncoding variants on enhancer activity [31]. This 2-phase DL model has been applied successfully to prioritize regulatory variants in type 2 diabetes and autism [31, 65]. Building on this, we applied TREDNet to investigate the regulatory mechanisms underlying PCOS.
We adapted TREDNet to predict allele-specific enhancer activity of pcosSNVs across causal cell types implicated in PCOS. Putative cell type–specific enhancers used to train the model were defined as accessible chromatin regions (identified by DNase-seq or ATAC-seq) that overlapped with H3K27ac peaks. Accordingly, we sought to identify all relevant cell types with available epigenomic data. The primary pathogenic cell types implicated in PCOS include theca and GCs, ovary, adrenal gland, liver, pancreas, hypothalamus, and pituitary, which collectively regulate folliculogenesis through signaling pathways that modulate androgen, estrogen, SHBG, and insulin levels [3]. In addition, adipocytes play a key role in driving insulin resistance, another hallmark of PCOS [3].
Epigenomic datasets for ovary, adrenal gland, liver, adipocyte, and pancreas were obtained from ENCODE. In the absence of primary human GC data, H3K27ac peaks from KGN cells were used as a proxy. To capture broader regulatory features, we additionally incorporated brain microvascular endothelial cells (BMECs), mammary epithelial cells, and human umbilical vein endothelial cells (HUVECs) as GC proxies based on (1) similarity of their H3K27ac profiles to those of KGN cells (Jaccard similarity index), (2) availability of chromatin accessibility profiles, and (3) availability of chromatin contact maps for downstream target gene analyses (Table S8 [27]). Similarly, in the absence of pituitary and hypothalamic epigenomic profiles, we included the fetal brain. We also incorporated WTC11, a developmental cell line, to capture regulatory variants active during early development, as fetal development has been implicated in PCOS onset later in life [66]. Cell type–specific epigenomic data used to train the models are provided in Table S2 [27]. The DL models demonstrated robust performance, achieving an area under the receiver operating characteristic curve ranging from 0.9 to 0.98 and an area under the precision-recall curve ranging from 0.54 to 0.84 across the 11 cell types (Fig. 2A).
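As an illustration of the first criterion, a base-pair-level Jaccard index between two peak sets can be computed as below. How the index was actually computed is not stated here, so this base-pair definition (intersection length over union length) is an assumption. The sketch handles intervals on a single chromosome, and the peak coordinates are invented.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge_intervals(intervals))


def jaccard_bp(peaks_a, peaks_b):
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    union = covered_bp(list(peaks_a) + list(peaks_b))
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|.
    intersection = covered_bp(peaks_a) + covered_bp(peaks_b) - union
    return intersection / union


# Invented peak sets on one chromosome.
kgn_like = [(0, 100), (200, 300)]
candidate = [(50, 150), (250, 350)]
similarity = jaccard_bp(kgn_like, candidate)
```

For genome-wide peak files, the same calculation would be run per chromosome and the intersection and union lengths summed before taking the ratio.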
Figure 2.
(A) ROC and PRC curves of 11 cell type–specific TREDNet models. (B) A comparison of fold change (alternate/reference allele) in TREDNet scores between all variants and those exhibiting significant change in enhancer activity in MPRA, using Mann–Whitney test. (C) Fraction of SNVs overlapping with phastCons elements conserved across 30 primates. (D) TFs enriched among reSNVs compared with control SNVs (hypergeometric, P < .01). ns: P > .05, *P ≤ .05, **P ≤ .01, ***P ≤ .001.
To evaluate TREDNet's ability to predict regulatory variants, we examined the correlation between TREDNet-predicted differences in allele-specific enhancer activity and those determined through a massively parallel reporter assay (MPRA) in the developing human brain and stem cell–derived adipocytes using our model trained on fetal brains and adipocytes [67, 68]. We compared allele-specific TREDNet scores across all assayed alleles and those showing significant changes in reporter activity (Materials and methods) and observed a significantly higher fold change in TREDNet scores for the latter group (Mann–Whitney test, P = .001 for adipocytes and P = 10⁻⁵ for fetal brain, Fig. 2B). These findings underscore the effectiveness of TREDNet in predicting regulatory variants across diverse cell types.
Next, we evaluated the impact of pcosSNVs within active regulatory regions across the 11 selected cell types. Across summary statistics from 10 studies, the reported effect and other alleles for tag SNVs corresponded to either the GRCh38 reference or alternate allele (Table S1 [27]). However, 6 tag SNVs from 2 studies [51, 52] were exceptions. The allele frequencies reported for these variants were highly inconsistent with those observed in reference populations, and the designated effect and other alleles did not correspond to either the major or minor alleles in any population. To avoid introducing ambiguity into our DL model, we excluded these 6 pcosSNVs and their LD proxies (41 variants in total) from the scoring analysis. Consequently, the remaining 1431 pcosSNVs were evaluated for allele-specific effects, with both reference and alternate alleles scored across all cell type–specific models (Materials and methods).
For pcosSNVs located in active regulatory regions marked by H3K4me1, H3K27ac, or DNase/ATAC-seq, we classified strengthening alleles as those with scores below the threshold (determined at 10% FDR) for the reference allele and above for the alternate allele, while damaging alleles followed the opposite criterion. Applying this approach, we identified 302 pcosSNVs with predicted allelic differences in activity, termed reSNVs (Table S9 [27]). These reSNVs were significantly enriched in conserved elements compared with both pcosSNVs and 13 million common SNVs from the 1000 Genomes catalog (binomial, P = .0002 and P = 10⁻⁹, respectively, Fig. 2C).
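The threshold-crossing rule for strengthening and damaging alleles can be sketched as follows. The function name, the "unchanged" label, and the handling of a score exactly at the threshold (treated as active) are assumptions, and the scores and threshold in the example are invented.

```python
def classify_allele_effect(ref_score, alt_score, threshold):
    """Classify a variant by whether its alleles fall on opposite sides of the threshold."""
    ref_active = ref_score >= threshold
    alt_active = alt_score >= threshold
    if alt_active and not ref_active:
        return "strengthening"   # only the alternate allele scores as an enhancer
    if ref_active and not alt_active:
        return "damaging"        # only the reference allele scores as an enhancer
    return "unchanged"           # both alleles on the same side of the threshold


# Invented scores with an invented cell type-specific threshold of 0.6.
calls = [
    classify_allele_effect(0.45, 0.72, threshold=0.6),
    classify_allele_effect(0.81, 0.40, threshold=0.6),
    classify_allele_effect(0.70, 0.90, threshold=0.6),
]
```

Under this rule a variant that changes the score substantially but leaves both alleles above (or both below) the threshold is not counted, which is why the third example is labelled unchanged.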
To assess the regulatory impact of reSNVs, we evaluated the enrichment of host TFBSs. For each TF, enrichment was calculated by comparing the density of its binding motifs overlapping reSNVs against a background set of control SNVs. The background consisted of a specifically curated set of 71 000 nonoverlapping SNVs within a 100 kb window centered on pcosSNVs (Materials and methods). This localized background enabled us to investigate the regulation of target genes within the context of PCOS-specific biological processes, particularly for ubiquitously expressed genes. Several TFs showed significant enrichment at reSNV loci, including FOXA1, a pioneer factor in estrogen and androgen signaling [69]; LHX4, involved in pituitary development [70]; NHLH2, associated with GnRH signaling [71]; WT1, a regulator of GC proliferation [72]; PLAG1, involved in oocyte reserve maintenance [73]; and NR4A1, which regulates steroidogenesis [74] (hypergeometric, P < 10⁻², Fig. 2D). Notably, we observed a 2.7-fold enrichment of PPARG-binding sites, a significant finding given PPARG's role as a known susceptibility locus for PCOS. We also found enrichment of TFs associated with neuronal signaling, such as TBX21, POU6F1, and NKX6.2. While not previously linked to PCOS, these TFs represent promising candidates for involvement in neuroendocrine regulation. These findings highlight the capacity of our model to identify transcriptional regulators with potential functional roles in the diverse phenotypic manifestations of PCOS.
Next, we evaluated the allele-specific effects (ASE) of these reSNVs on chromatin accessibility (ATAC-seq and DNase-seq) and TF binding (ChIP-seq) in corresponding cell types available from the UDACHA and ADASTRA databases [75, 76]. Our analysis revealed that, compared with pcosSNVs, reSNVs are over 3-fold more enriched for ASE in ATAC-seq (hypergeometric, P = 6 × 10⁻¹⁶) and over 4-fold more enriched for ASE in DNase-seq (P = 2.8 × 10⁻³⁶), whereas ASE in TF binding was observed exclusively for reSNVs for CTCF, ERG, HNF4A, and RELA (Table S10 [27]). Finally, we examined the overlap of reSNVs with histone acetylation QTLs (haQTLs) and chromatin accessibility QTLs (caQTLs) identified in induced pluripotent stem cells [77]. We observed a nominal enrichment of reSNVs compared with pcosSNVs (1.9-fold for haQTLs, P = 3.4 × 10⁻⁸; 1.2-fold for caQTLs, P = 3.1 × 10⁻⁴), further highlighting fetal origins in PCOS pathogenesis. Collectively, these findings suggest that PCOS-associated regulatory variants may influence disease risk through multiple mechanisms, such as modulating TFBSs, histone modifications, and chromatin accessibility, and that integrative analyses will enhance the identification of putative disease-causal variants.
Our DL-based approach identified 20% of the pcosSNVs as putative regulatory variants, effectively narrowing down disease risk variants in loci such as ERBB4, LHCGR, and MC4R (Fig. S2 [27]). For example, among 51 variants in the ERBB4 locus, we identified an enhancer-damaging G-to-A substitution at rs79230362 in HUVECs (Table S9 [27]). This variant is in LD with the GWAS SNV rs113168128 and is predicted to disrupt the binding site of the ELK1:SREBF2 motif complex (Fig. S3 [27]). Given the established role of SREBF2 in steroidogenesis [78] and the highly cell type–specific expression of ELK1 in GCs (Fig. S3 [27]), this variant likely affects the expression of ERBB4, a key regulator of the oocyte microenvironment during folliculogenesis [21]. Similarly, in the MC4R locus, we predicted an enhancer-damaging T-to-C substitution at rs17773430 in WTC11 cells (Table S9 [27]). MC4R is a critical component of the melanocortin pathway and a well-established obesity susceptibility gene that is also linked to PCOS. Knockout studies of MC4R in mice result in both obesity and infertility phenotypes, highlighting shared regulatory architectures underlying these conditions [79]. rs17773430 is predicted to disrupt the binding site of TBX2/TBXT, TFs responsible for the development of the hypothalamus–pituitary axis [80]. Given that reduced MC4R levels are associated with lower LH levels [81], this variant likely contributes to PCOS etiology through its impact on the HPG axis.
On the other hand, multiple reSNVs were identified in the loci of DENND1A, FTO, and MAPRE1 (Fig. S4 [27]). The significant overlap of reSNVs in the DENND1A and MAPRE1 loci with active regulatory regions in the fetal brain and WTC11 suggests their potential role in disease manifestation during early development. Notably, a reSNV in the MAPRE1 locus, rs187178, was validated as a regulatory variant in the fetal brain and colocalizes with an eQTL for the neighboring gene DNMT3B, which regulates dynamic methylation transitions during folliculogenesis [55]. Interestingly, the reference and alternate alleles at rs187178, along with the risk allele A at its tag SNV rs853854, display marked heterogeneity in their distribution across populations. This variability presents a compelling case study for investigating population-specific PCOS risk arising from the MAPRE1/DNMT3B locus and may provide important insights into differences in disease susceptibility.
In total, we identified 12 reSNVs that have been experimentally validated as enhancer-disrupting variants in adipocytes and fetal brain through MPRA studies (Table S11 [27]) [67, 68]. Of note, epigenomic data from fetal brain used by the DL model failed to capture the regulatory impact of pathogenic variants in the FSHB locus, including rs10835638 and rs11031006, which have been experimentally shown to reduce FSHB expression restricted to the pituitary gland [18]. This underscores the necessity of incorporating additional, relevant cell types for a more comprehensive study of the regulatory landscape of PCOS, when experimental characterization of chromatin marks becomes available for these cell types.
The diverse phenotypic comorbidities associated with PCOS typically manifest as either metabolic or reproductive abnormalities. We leveraged the reSNVs to investigate potential differences between these subtypes. We first categorized tag SNVs into metabolic or reproductive groups based on subtype features, namely, elevated AMH, LH, and SHBG levels for reproductive, and high DHEAS (proxy for hyperandrogenism), BMI, insulin, or glucose levels for metabolic subtype, as described in prior studies [45, 82]. Based on the sub-phenotype associations of GWAS susceptibility variants, we classified 15 loci as metabolic and 16 as reproductive (Table S12 [27]). To elucidate the regulatory mechanisms distinguishing the 2 subtypes, we first examined the differential enrichment of TFBSs in each subtype (Fig. S5A [27], Materials and methods). The reproductive reSNVs showed significant enrichment for TFBSs of immune-related TFs such as IRF3 and STAT1/2, which have been implicated in follicle maturation [55] as well as KLF15, which regulates androgen production and whose expression is modulated by insulin [83] (Fisher's exact, P < .05). Notably, binding sites of the immune TFs IRF3/8 and STAT1/2 were found to colocalize exclusively with reSNVs of the reproductive subtype. In contrast, reSNVs of the metabolic subtype were enriched for binding sites of ZNF384, whose role in PCOS or metabolic dysregulation has not yet been characterized.
Furthermore, the 2 subtypes are enriched for variants from distinct GWAS trait groups (Fig. S5B [27], Materials and methods). Specifically, the metabolic subtype was enriched for traits that correlate with the phenotype, such as BMI, cholesterol levels, SHBG levels, and diabetes (Fisher's exact, P < 10⁻³). In contrast, reSNVs of the reproductive subtype were enriched for traits such as vitamin D levels and blood pressure (P < 10⁻³). This analysis is preliminary and limited to a small set of reSNVs; further GWAS in larger cohorts that identify more loci will help capture additional regulatory insights into the 2 subtypes, which is an important step toward precision therapies.
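The subtype comparisons above rest on Fisher's exact test applied to 2 × 2 contingency tables. As a minimal sketch of that calculation (not the authors' implementation, which is described in Materials and methods), the Python snippet below computes a one-sided enrichment P value from the hypergeometric distribution. The function name and the counts are hypothetical and serve only as an illustration; the sidedness of the test is also an assumption.

```python
from math import comb


def fisher_exact_enrichment(a, b, c, d):
    """One-sided Fisher's exact P value for enrichment in a 2 x 2 table.

    Table layout (all counts are non-negative integers):
                      with TFBS   without TFBS
        subtype 1         a            b
        subtype 2         c            d

    Returns P(X >= a), where X follows the hypergeometric distribution
    obtained by holding the row and column totals fixed.
    """
    row1 = a + b            # reSNVs in subtype 1
    col1 = a + c            # reSNVs overlapping the TFBS
    total = a + b + c + d
    k_max = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, k_max + 1)
    )
    return favorable / comb(total, col1)


# Hypothetical counts, for illustration only (not taken from Table S12).
p_value = fisher_exact_enrichment(8, 12, 1, 19)
print(f"one-sided P = {p_value:.4f}")
```

With so few reSNVs per subtype, an exact test of this kind is preferable to a chi-square approximation, which becomes unreliable when expected cell counts are small.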
reSNVs have a stronger impact on expression of target genes and are more likely to exert pleiotropic effects across multiple cell types
We further explored the functional impact of reSNVs by examining their association with gene expression using eQTL data from GTEx. Analyzing 206 reSNVs that colocalize with eQTLs (Table S9 [27]), hereafter referred to as reQTLs, we observed that reQTLs are active in a significantly higher proportion of GTEx cell types compared with otherSNVs (ie, SNVs not prioritized by TREDNet in causal cell types; Fig. 3A, Mann–Whitney, P = 1.39 × 10⁻³). Within these enriched cell types, reQTLs were associated with significantly greater changes in gene expression relative to otherSNVs, as measured by normalized effect sizes from GTEx (Fig. 3B, Mann–Whitney, P = 1.16 × 10⁻¹¹).
Figure 3.
Regulatory impact of reSNVs prioritized by TREDNet. (A) Comparison of the number of GTEx cell types impacted by reSNVs vs other SNVs. (B) Absolute normalized effect size of reSNVs vs other SNVs. (C) Genomic overlap of an intronic reSNV (rs1784692) at the ZBTB16 locus with epigenomic features from cell types where it exhibits predicted allele-specific activity. The affected androgen receptor (AR) motif is shown below. (D) Functional enrichment of biological processes in the ZBTB16 protein interaction network (STRING database). The plot shows the top 10 terms (FDR < .001), with enrichment strength calculated as log10 (observed/expected). ns: P > .05, *P ≤ .05, **P ≤ .01, ***P ≤ .001.
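The group comparisons in Fig. 3A and 3B use the Mann–Whitney test. The sketch below shows how such a comparison can be computed from two lists of values, using midranks for ties and a normal approximation without continuity correction. It is an illustration under stated assumptions rather than the analysis code used in the study: the function name and the example values are hypothetical, and a two-sided alternative is assumed.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, remembering the group of each value (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2            # mean of ranks i+1 .. j
        t = j - i                            # size of this tie group
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                           # every value is identical
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))         # two-sided tail probability


# Hypothetical numbers of tissues per variant, for illustration only.
reqtl_tissues = [12, 30, 25, 41, 18, 36]
other_tissues = [3, 9, 14, 5, 20, 7]
u_stat, p_value = mann_whitney_u(reqtl_tissues, other_tissues)
print(f"U = {u_stat}, P = {p_value:.4f}")
```

A rank-based test suits this setting because neither tissue counts nor absolute effect sizes are expected to be normally distributed.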
The most significant downregulatory effect was observed at the RAB5B–SUOX–RPS26 locus, where reQTLs were linked to reduced expression of RPS26 in multiple cell types, including the ovary, hypothalamus, and liver. RPS26 is a ubiquitously expressed ribosomal protein whose downregulation in the ovaries impairs oocyte growth and leads to premature ovarian failure [84], a hallmark of PCOS. Notably, the T-to-C substitution (C being the minor allele) at the reQTL rs3741499 within this locus, which is in LD with the European tag SNV rs705702, where the minor allele G confers risk (Table S1 [27]), is associated with a strong negative effect on RPS26 expression (Fig. S6 [27]). This substitution is predicted to function as an enhancer-damaging variant in 7 cell types, likely through disrupted binding of PROX1 (Table S9 [27]), a PCOS-risk gene involved in lymphatic vessel formation around oocytes [85], suggesting a plausible mechanism for impaired oocyte maturation.
We further investigated pleiotropy by examining the ZBTB16 locus. Although no eQTLs colocalize with reSNVs in this locus, they were predicted to exert strong differential enhancer activity across multiple cell types (Table S9 [27]). Notably, rs1784692, located in an intron of ZBTB16, demonstrated the highest predicted enhancer-strengthening effect in the pancreas, adipocytes, WTC11, and liver (Table S9 [27]). The T→C polymorphism, in which C represents the protective allele (Table S1 [27]), is predicted to enhance AR binding (Fig. 3C). This suggests that the risk allele T may impair downstream androgen signaling and potentially link this locus to cell type–specific androgen-response functions, such as insulin secretion in the pancreas [86] and regulation of adipocyte differentiation [87]. ZBTB16 was identified as a genome-wide significant locus in European cohorts [88] and has been demonstrated to be an androgen-responsive gene [89]. While a mechanistic link between the molecular function of ZBTB16 and PCOS disease etiology has not yet been established, its protein interaction network is enriched for components of androgen signaling (Fig. 3D). Collectively, these observations suggest that ZBTB16 may represent a susceptibility gene involved in androgen-mediated regulatory pathways disrupted in PCOS, potentially through allele-specific effects of rs1784692 on AR binding as one of the underlying regulatory mechanisms.
In conclusion, reSNVs prioritized by TREDNet offer valuable insights into disease-associated regulatory mechanisms and highlight the potential role of risk genes hitherto uncharacterized in PCOS etiology.
The FTO locus demonstrates disruption of an androgen-mediated network pleiotropy
The regulatory locus within the intronic region of FTO is a well-established susceptibility locus for obesity and type 2 diabetes. Although it has not been identified as a genome-wide susceptibility locus for PCOS, independent case–control studies have reported a significant association of this locus with patients with PCOS [90]. It was included in our analysis based on its significant shared association with PCOS and type 2 diabetes identified through genome-wide cross-trait analyses [53]. The prevailing consensus is that the effects of the FTO locus on PCOS are mediated through BMI [12].
Interestingly, obesity-associated noncoding regions within this locus have been experimentally shown to form long-range chromatin interactions that enable distal regulation of IRX3 [91, 92]. Subsequently, this locus was hypothesized to have broader pleiotropic effects across different cell types due to variations in the expression of IRX3, which may influence multiple biological pathways [88]. Notably, the PCOS susceptibility variants localize in the genomic region regulating IRX3 (chr16:53731249-54975288) [93], suggesting that IRX3 is likely the target gene of the PCOS susceptibility locus as well (Fig. S7 [27]). To address the association of this locus with PCOS, we focused on a previous study that identified IRX3 and another gene in this susceptibility locus, IRX5, as key regulators of folliculogenesis in GCs [90]. These findings suggest that IRX3/IRX5 may contribute to the manifestation of PCOS symptoms in women with obesity or type 2 diabetes. Accordingly, we investigated the regulatory features of this locus in PCOS-relevant cell types. Furthermore, leveraging chromatin contact maps from granulosa-like cells, BMECs, and HUVECs, we examined how reSNVs within this locus might contribute to both impaired folliculogenesis and metabolic dysfunction through dysregulated IRX3/IRX5 activity across multiple cell types.
We identified 12 reSNVs exhibiting significant fold changes across 9 cell types (Fig. S8 [27]). Among these, 3 variants (rs1421085, rs11642015, and rs9940128) have been validated by MPRA studies to show allelic changes in enhancer activity in mouse preadipocyte and/or neuronal cell lines [92], further supporting the predictive accuracy of TREDNet in identifying regulatory variants. Interestingly, we predicted that the T-to-C substitution at rs1421085 additionally strengthens enhancer activity in BMEC, a granulosa-like cell line, by potentially disrupting the binding site of ONECUT2 (Fig. 4A, Table S9 [27]), a suppressor of androgen receptor signaling that was recently identified as a marker of follicle growth [94, 95].
Figure 4.
reSNVs in FTO locus exhibiting significant fold change in TREDNet-predicted enhancer activity. (A) Overlap of reSNVs with active regulatory regions of pathogenic cell types. (B) Intact Hi-C map of chromatin interactions from reSNVs in FTO locus in HUVEC (doi:10.17989/ENCSR788FB).
In addition to rs1421085, the G-to-A substitution at rs9940128 was predicted to be an enhancer-damaging variant in BMECs and HUVECs and was found to localize within regions forming chromatin contacts with the promoters of IRX3 and IRX5 in HUVECs (Fig. 4B). Another variant within this locus, rs7193144, was predicted to exhibit nominal allele-specific regulatory differences in KGN cells, granulosa-like cell lines, BMECs, and HUVECs (Table S9 [27]). Notably, this variant was also predicted to modulate the binding site of the AR and to display allele-specific regulatory activity in the pancreas and adipocytes, making it a compelling candidate for mediating the pleiotropic effects of IRX3/IRX5 dysregulation across these cell types through disrupted androgen signaling.
The allelic effects of variants in this locus may also impact IRX3/IRX5-mediated functions in hypothalamic neurons (Fig. S7 [27]), as demonstrated in mice [92]. In this regard, we predicted rs3751812, which is located within binding sites of T-box family TFs, to be a regulatory variant in fetal brain (Fig. 4A). Members of the T-box family play a critical role in the commitment of hypothalamus and pituitary lineages from neuronal precursors [80, 96]. However, given the short temporal window of expression of these TFs in neuronal development, inferring causal mechanisms remains challenging. This highlights the necessity of using epigenomic datasets across different developmental time points for a comprehensive investigation.
We also identified another reSNV within this locus, rs8050136, where the C-to-A substitution is predicted to be an enhancer-strengthening variant in the pancreas and liver (Fig. 4A). This variant colocalizes with an eQTL for IRX3 in the pancreas, where IRX3 regulates the conversion of β cells to ε cells, directly linking it to type 2 diabetes [97]. Notably, rs8050136 is also predicted to disrupt the binding site of ONECUT1, a TF essential for pancreatic development (Fig. 4A). However, no allelic differences were predicted in KGN or related granulosa-like cell types, suggesting that this variant is unlikely to have direct consequences on PCOS pathophysiology.
Discussion
Our limited understanding of the regulatory landscape of PCOS stems from its complex genetic architecture, which presents with heterogeneous phenotypes across different cell types, individuals, and populations. This complexity has necessitated evolving diagnostic criteria as our knowledge of the underlying pathophysiology expands. Several key questions remain unresolved, including the genetic and molecular origins of reproductive and metabolic dysfunction, the role of androgens and other hormones in regulatory pathways, and the inheritance patterns affecting both males and females. To date, GWAS have identified 50 genomic loci associated with PCOS across diverse populations. The functional significance of genes such as ERBB4, PPARG, DENND1A, and IRX3 has been well established [20, 21, 62, 90]. In the case of PPARG, this understanding has even led to the development of agonists such as rosiglitazone, which improve follicular function [98], underscoring how advances in elucidating the molecular mechanisms by which risk loci and genes contribute to PCOS pathophysiology can inform the development of effective, targeted therapies. Ongoing progress in whole-genome and exome sequencing continues to reveal novel loci, further expanding the landscape of PCOS genetics and emphasizing the need for deeper exploration of the core regulatory mechanisms that drive its pathophysiology.
Leveraging extensive genetic and epigenetic data, we sought to identify key mechanisms linking PCOS susceptibility loci to disease etiology using TREDNet. TREDNet can be applied to identify allele-specific regulatory activity of variants located within cis-regulatory elements across diverse cell types. This framework can be readily extended to detect regulatory disruptions underlying other complex traits and disorders, as demonstrated in previous studies on type 2 diabetes and autism [31, 65]. By training TREDNet on cis-regulatory elements specific to pathogenic cell types relevant to a given trait, it can be used to systematically characterize the cell type–specific regulatory impact of variants in noncoding regions. We trained the model on putative enhancers from 11 cell types of neuroendocrine, developmental, epithelial, and endothelial origins, and found that reSNVs prioritized by our model are significantly enriched for TFBSs associated with folliculogenesis, including those of WT1, NHLH2, and FOXA1. Notably, reSNVs also show enrichment for the binding sites of PPARG, which itself is a PCOS-risk gene. Thirteen reSNVs are predicted to overlap PPARG-binding sites (Table S9 [27]). These findings highlight the importance of dissecting the underlying gene regulatory networks, where disruptions at specific nodes (genes), edges (regulatory interactions), or both, as exemplified by PPARG and its target sites, may generate diverse molecular outcomes contributing to the heterogeneity of PCOS severity and phenotypic presentation. Our results also highlight the need for further characterization of TFs, especially those involved in neuronal signaling, such as TBX21 and LHX4, along with their interactions with hormonal receptors, to gain deeper insights into cis- and trans-regulatory mechanisms disrupted in PCOS pathophysiology.
The established role of the HPG axis [99] in regulating circulating reproductive hormone levels highlights the hypothalamus, pituitary, adrenal gland, and ovarian granulosa and theca cells as key mediators of PCOS pathophysiology. However, PCOS manifestations extend beyond the neuroendocrine system, impacting peripheral tissues such as the pancreas, adipocytes, liver, and heart [3, 12]. Past studies have demonstrated neuronal dysregulation of androgen signaling in the development of reproductive and metabolic phenotypes of PCOS [3]. Given the widespread expression of AR, disruptions in circulating androgen levels impact peripheral cell types expressing AR and may have broader consequences, such as insulin resistance and altered adipogenesis, independent of classical reproductive symptoms like oligomenorrhea [100]. Moreover, regulatory variants often exert extensive pleiotropic effects [101, 102]. Therefore, PCOS-associated causal variants may contribute to disease pathophysiology through both variant pleiotropy and disrupted neuroendocrine signaling. Within this framework, we discuss our findings on predicted regulatory variants in the FTO locus, highlighting potential cell type–specific mechanisms by which altered androgen signaling and regulatory variant pleiotropy may drive PCOS pathogenesis.
The identification of multiple reSNVs at several susceptibility loci suggests regulatory mechanisms in which 1 gene is regulated by multiple enhancers, so that the expression of a target gene can be influenced by >1 variant [92, 103]. For example, 2 distinct variants in the FSHB locus, rs10835638 and rs11031006, alter FSHB expression, ultimately contributing to infertility [18]. These variants may occur in different individuals, leading to distinct, individual-specific phenotypes depending on the cell type–specific networks they modulate in a pleiotropic manner. In addition, the potential pleiotropic impact of disease-associated variants in nonpathogenic cell types is often buffered by robust regulatory networks, preventing overt disease manifestation. This suggests that assessing polygenic risk scores may be necessary to fully understand their contribution to disease susceptibility. Given that a few predicted regulatory variants in the FTO locus have high risk-allele frequencies (>0.4), which far exceed the prevalence of PCOS, it is evident that the disease phenotypes emerge from the cumulative effects of multiple dysregulated genes and pathways. Further investigations into polygenic interactions and gene-environment influences will be essential to expand our understanding of the complexity of PCOS.
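The allele-frequency argument above can be made concrete with a short calculation. The sketch below converts a risk-allele frequency into expected genotype fractions; Hardy–Weinberg equilibrium is an assumption introduced here for illustration and is not stated in the text, and the function name is hypothetical.

```python
def genotype_fractions(risk_allele_freq):
    """Expected genotype fractions under Hardy-Weinberg equilibrium (assumed)."""
    if not 0.0 <= risk_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = risk_allele_freq
    q = 1.0 - p
    return {
        "homozygous_risk": p * p,      # two copies of the risk allele
        "heterozygous": 2 * p * q,     # one copy
        "non_carrier": q * q,          # no copies
    }


# Threshold quoted in the text for FTO-locus variants.
fractions = genotype_fractions(0.4)
carriers = fractions["homozygous_risk"] + fractions["heterozygous"]
print(f"carriers of at least one risk allele: {carriers:.2f}")
```

Under this assumption, a risk-allele frequency of 0.4 corresponds to roughly 64% of individuals carrying at least one copy. A single variant this common cannot be sufficient to cause disease on its own, which is consistent with a cumulative, polygenic model.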
The susceptibility loci of PCOS implicate genes such as ZBTB16, AOPEP, THADA, and CCDC91 (Fig. 1A), which are ubiquitously expressed, raising the question of how disease-specific variants selectively affect certain cell types. At the molecular level, follicle progression involves signaling pathways, like TGFβ, Hippo, Wnt, and mTOR, which regulate fundamental processes such as cell proliferation, differentiation, and apoptosis [7]. Why, then, do complex diseases manifest in only a subset of susceptible cell types? In the case of ZBTB16, we predicted that rs1784692 strengthens enhancer activity by increasing the binding affinity of AR, thereby implicating ZBTB16 in downstream pathways of androgen signaling. This suggests that perturbations in disease-relevant TF interactions, specific to causal cell types, disrupt molecular networks in a way that surpasses compensatory mechanisms in other cell types, thereby making certain cells uniquely vulnerable. Consequently, TFs act as primary responders to disease-associated alterations, preceding the genes they regulate, and may therefore serve as more informative markers of disease susceptibility than the genes themselves.
Although our framework advances variant-to-function interpretation to better understand the regulatory landscape of PCOS, it remains constrained by the availability of comprehensive multi-omic datasets required to dissect converging disease mechanisms. First, our analyses used variants in LD blocks without stratifying by superpopulation, which may overlook ancestry-specific phenotypes that exhibit substantial variation in clinical presentation [104] and are important for understanding how genetic background influences disease risk and therapeutic response. Addressing this limitation will require large-scale association studies in diverse cohorts to enable robust, population-informed genetic associations. Second, our analysis of regulatory variants was limited to those occurring within putative enhancers. However, variants can impact gene regulation beyond enhancer activity. Variants located in silencers or insulators may disrupt distal enhancer interactions, as observed with IRX3, emphasizing the need for Hi-C data from pathogenic cell types to resolve target genes not identifiable through eQTL analysis. Moreover, trans-regulatory effects of risk variants, whether through TFs encoded by susceptibility loci (PROX1, PPARG, and IRF1) or noncoding RNAs that contribute to epigenomic regulation of gene expression, also warrant systematic investigation. Third, disease susceptibility may arise at TFBSs independent of sequence variation: gene expression can be modulated by epigenetic modifications at regulatory elements, such as hypo- or hypermethylation, histone acetylation, or other chromatin alterations [105]. Fourth, large-scale genomic changes, such as copy number variants or insertions/deletions (indels), may result in the gain or loss of TFBSs, potentially leading to ectopic gene expression [106].
Therefore, both epigenetic markers of disease susceptibility and genetic changes beyond SNVs should be considered when analyzing the molecular basis of disease risk. Lastly, a more comprehensive understanding of gene regulatory networks requires integrating epigenomic datasets from key pathogenic cell types—such as the pituitary gland, granulosa, and theca cells, and potentially, the hypothalamus—across follicular phases to map the spatiotemporal regulation of genes involved in steroidogenesis and folliculogenesis. Despite the hypothalamus's central role in the HPG axis, regulatory networks mediated by GnRH signaling remain poorly understood. Disruptions in this pathway may explain the involvement of risk genes such as CNTNAP5, ASIC2, and CUX2, potentially linking PCOS to prevalent mental health disorders [3]. Incorporating these datasets can enable the development of more inclusive deep learning models capable of predicting regulatory activity changes beyond enhancer disruptions, offering deeper insights into PCOS pathophysiology.
Conclusions
For a multifactorial disorder like PCOS, we are still far from accurately predicting an individual's disease severity or susceptibility based on their genetic variants. Disease severity likely reflects the cumulative effect of multiple risk alleles carried by an individual. Accurate estimation of this genetic burden requires identification of causal PCOS variants from among GWAS signals, determination of their tissue-specific effect sizes, and delineation of the mechanisms through which comorbidities and cardiometabolic factors modulate the phenotype. Together, these efforts are essential for developing robust polygenic risk scores that integrate the effects of both common and rare variants for clinical risk prediction.
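In its simplest additive form, a polygenic risk score is a weighted sum of risk-allele dosages across variants. The sketch below illustrates that form only; the variant labels, weights, and function name are hypothetical placeholders rather than estimates from this study, and treating missing genotypes as zero dosage is a simplification.

```python
def polygenic_risk_score(dosages, weights):
    """Additive score: sum over variants of effect weight x risk-allele dosage.

    dosages: variant ID -> number of risk alleles carried (0, 1, or 2)
    weights: variant ID -> per-allele effect size (for example, a log odds ratio)
    Variants absent from `dosages` are treated as dosage 0 (a simplification).
    """
    score = 0.0
    for variant, weight in weights.items():
        dosage = dosages.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage for {variant}: {dosage}")
        score += weight * dosage
    return score


# Hypothetical per-allele weights and one individual's genotypes.
weights = {"variant_A": 0.10, "variant_B": 0.25, "variant_C": -0.05}
dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
score = polygenic_risk_score(dosages, weights)
print(f"score = {score:.2f}")
```

In such a model, tissue-specific effect sizes of the kind discussed above would enter as separate weight tables per tissue, which is why their accurate estimation matters for the final score.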
A central challenge in this endeavor lies in estimating the effect sizes of noncoding variants, which constitute over 90% of disease-associated loci. These variants act through context-dependent regulatory mechanisms that remain poorly understood without systematic genetic and epigenetic characterization in relevant cell types. Nevertheless, integrative analyses such as the one presented here, along with emerging computational and experimental frameworks for regulatory annotation and variant-to-function mapping, are progressively illuminating these mechanisms. Scaling such approaches to large GWAS and whole-genome or exome sequencing datasets will be crucial for translating genetic discoveries into individualized disease risk assessment and therapeutic insight.
Acknowledgments
This work utilized the computational resources of the NIH HPC Biowulf cluster. This research was supported by the Division of Intramural Research of the NIH. The contributions of the NIH author(s) are considered Works of the US Government. The findings and conclusions presented in this paper are those of the author(s) and do not necessarily reflect the views of the NIH or the US Department of Health and Human Services.
Abbreviations
- AMH
anti-Müllerian hormone
- BMECs
brain microvascular endothelial cells
- BMI
body mass index
- caQTLs
chromatin accessibility QTLs
- DL
deep learning
- ENCODE
Encyclopedia of DNA Elements
- FSH
follicle-stimulating hormone
- GCs
granulosa cells
- GnRH
gonadotropin-releasing hormone
- GREAT
Genomic Regions Enrichment of Annotations Tool
- GWASs
genome-wide association studies
- haQTLs
histone acetylation QTLs
- HPG axis
hypothalamic–pituitary–gonadal axis
- HUVECs
human umbilical vein endothelial cells
- LD
linkage disequilibrium
- LH
luteinizing hormone
- MPRA
massively parallel reporter assay
- NIH
National Institutes of Health
- PCOS
polycystic ovary syndrome
- SHBG
sex hormone–binding globulin
- SNVs
single nucleotide variants
- TFs
transcription factors
- TFBS
transcription factor binding site
Contributor Information
Jaya Srivastava, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA.
Ivan Ovcharenko, Email: ovcharen@nih.gov, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA.
Funding
This research was funded by National Institutes of Health (grant number 1-ZIA-LM200881-12 [to I.O.]).
Author contributions
J.S. performed the computational analysis, analyzed the data, and prepared figures and tables. I.O. supervised the study. J.S. and I.O. wrote the manuscript.
Disclosures
The authors have nothing to disclose.
Data availability
Please see the section “Data and tools” for the details. The DL models for 11 cell types trained in the study are available on Zenodo [27]. Original data generated and analyzed during this study are included in this published article or in the data repositories listed in Ref. [27].
References
- 1. Joham AE, Norman RJ, Stener-Victorin E, et al. Polycystic ovary syndrome. Lancet Diabetes Endocrinol. 2022;10(9):668‐680. [DOI] [PubMed] [Google Scholar]
- 2. Palomba S, Piltonen TT, Giudice LC. Endometrial function in women with polycystic ovary syndrome: a comprehensive review. Hum Reprod Update. 2021;27(3):584‐618. [DOI] [PubMed] [Google Scholar]
- 3. Stener-Victorin E, Teede H, Norman RJ, et al. Polycystic ovary syndrome. Nat Rev Dis Primers. 2024;10(1):1‐23. [DOI] [PubMed] [Google Scholar]
- 4. Zaidi M, Yuen T, Kim SM. Pituitary crosstalk with bone, adipose tissue and brain. Nat Rev Endocrinol. 2023;19(12):708‐721. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. McCartney CR, Marshall JC. Polycystic ovary syndrome. N Engl J Med. 2016;375(1):54‐64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Abedel-Majed MA, Romereim SM, Davis JS, Cupp AS. Perturbations in lineage specification of granulosa and theca cells may alter corpus luteum formation and function. Front Endocrinol (Lausanne). 2019;10:832. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Wang K, Li Y. Signaling pathways and targeted therapeutic strategies for polycystic ovary syndrome. Front Endocrinol (Lausanne). 2023;14:1191759. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Houston EJ, Templeman NM. Reappraising the relationship between hyperinsulinemia and insulin resistance in PCOS. J Endocrinol. 2025;265(2):e240269. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Liao B, Qiao J, Pang Y. Central regulation of PCOS: abnormal neuronal-reproductive-metabolic circuits in PCOS pathophysiology. Front Endocrinol (Lausanne). 2021;12:667422. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Zhang CH, Liu XY, Wang J. Essential role of granulosa cell glucose and lipid metabolism on oocytes and the potential metabolic imbalance in polycystic ovary syndrome. Int J Mol Sci. 2023;24(22):16247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Vink JM, Sadrzadeh S, Lambalk CB, Boomsma DI. Heritability of polycystic ovary syndrome in a Dutch twin-family study. J Clin Endocrinol Metab. 2006;91(6):2100‐2104. [DOI] [PubMed] [Google Scholar]
- 12. Dapas M, Dunaif A. Deconstructing a syndrome: genomic insights into PCOS causal mechanisms and classification. Endocr Rev. 2022;43(6):927‐965. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Zheng Z, Ma Y, Shi Y, Xu Y. Associations between kisspeptin hormone level and its genetic polymorphisms with polycystic ovary syndrome. Int J Gynaecol Obstet. 2026;172(1):383‐395. [DOI] [PubMed] [Google Scholar]
- 14. Caburet S, Fruchter RB, Legois B, Fellous M, Shalev S, Veitia RA. A homozygous mutation of GNRHR in a familial case diagnosed with polycystic ovary syndrome. Eur J Endocrinol. 2017;176(5):K9‐K14. [DOI] [PubMed] [Google Scholar]
- 15. Gorsic LK, Kosova G, Werstein B, et al. Pathogenic anti-Müllerian hormone variants in polycystic ovary syndrome. J Clin Endocrinol Metab. 2017;102(8):2862‐2872. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Gorsic LK, Dapas M, Legro RS, Hayes MG, Urbanek M. Functional genetic variation in the anti-Müllerian hormone pathway in women with polycystic ovary syndrome. J Clin Endocrinol Metab. 2019;104(7):2855‐2874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Du J, Zhang W, Guo L, et al. Two FSHR variants, haplotypes and meta-analysis in Chinese women with premature ovarian failure and polycystic ovary syndrome. Mol Genet Metab. 2010;100(3):292‐295. [DOI] [PubMed] [Google Scholar]
- 18. Bohaczuk SC, Thackray VG, Shen J, Skowronska-Krawczyk D, Mellon PL. FSHB transcription is regulated by a novel 5′ distal enhancer with a fertility-associated single nucleotide polymorphism. Endocrinology. 2021;162(1):bqaa181. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Sankaranarayanan L, Brewer KJ, Morrow S, et al. Gene regulatory activity associated with polycystic ovary syndrome revealed DENND1A-dependent testosterone production. Nat Commun. 2025;16(1):7697. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. McAllister JM, Modi B, Miller BA, et al. Overexpression of a DENND1A isoform produces a polycystic ovary syndrome theca phenotype. Proc Natl Acad Sci U S A. 2014;111(15):E1519‐E1527. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Veikkolainen V, Ali N, Doroszko M, et al. Erbb4 regulates the oocyte microenvironment during folliculogenesis. Hum Mol Genet. 2020;29(17):2813‐2830. [DOI] [PubMed] [Google Scholar]
- 22. LaVoie HA. The GATA-keepers of ovarian development and folliculogenesis. Biol Reprod. 2014;91(2):38. [DOI] [PubMed] [Google Scholar]
- 23. Waterbury JS, Teves ME, Gaynor A, et al. The PCOS GWAS candidate gene ZNF217 influences theca cell expression of DENND1A.V2, CYP17A1, and androgen production. J Endocr Soc. 2022;6(7):bvac078. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Li M, Zhao H, Zhao SG, et al. The HMGA2-IMP2 pathway promotes granulosa cell proliferation in polycystic ovary syndrome. J Clin Endocrinol Metab. 2019;104(4):1049‐1059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Xi Y, Shen W, Ma L, et al. HMGA2 promotes adipogenesis by activating C/EBPβ-mediated expression of PPARγ. Biochem Biophys Res Commun. 2016;472(4):617‐623. [DOI] [PubMed] [Google Scholar]
- 26. Boix CA, James BT, Park YP, Meuleman W, Kellis M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature. 2021;590(7845):300‐307. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Srivastava J. 2025. Regulatory risk loci link disrupted androgen response to pathophysiology of polycystic ovary syndrome. Version v2. Zenodo. 10.5281/zenodo.17654414 [DOI]
- 28. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017‐1018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2024;52(D1):D174‐D182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Vorontsov IE, Eliseeva IA, Zinkevich A, et al. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res. 2024;52(D1):D154‐D163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Hudaiberdiev S, Taylor DL, Song W, et al. Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits. Proc Natl Acad Sci U S A. 2023;120(35):e2206612120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Luo Y, Hitz BC, Gabdank I, et al. New developments on the encyclopedia of DNA elements (ENCODE) data portal. Nucleic Acids Res. 2020;48(D1):D882‐D889. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Amemiya HM, Kundaje A, Boyle AP. The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep. 2019;9(1):9354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559‐575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Hazell Pickering S, Abdelhalim M, Collas P, Briand N. Alternative isoform expression of key thermogenic genes in human beige adipocytes. Front Endocrinol (Lausanne). 2024;15:1395750. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Weis-Banke SE, Lerdrup M, Kleine-Kohlbrecher D, et al. Mutant FOXL2C134W hijacks SMAD4 and SMAD2/3 to drive adult granulosa cell tumors. Cancer Res. 2020;80(17):3466‐3479. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26(17):2204‐2207. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Zhang Y, Liu T, Meyer CA, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. McLean CY, Bristor D, Hiller M, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28(5):495‐501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Raudvere U, Kolberg L, Kuzmin I, et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47(W1):W191‐W198.
- 41. Szklarczyk D, Kirsch R, Koutrouli M, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638‐D646.
- 42. Chen ZJ, Zhao H, He L, et al. Genome-wide association study identifies susceptibility loci for polycystic ovary syndrome on chromosome 2p16.3, 2p21 and 9q33.3. Nat Genet. 2011;43(1):55‐59.
- 43. Shi Y, Zhao H, Shi Y, et al. Genome-wide association study identifies eight new risk loci for polycystic ovary syndrome. Nat Genet. 2012;44(9):1020‐1025.
- 44. Yan J, Tian Y, Gao X, et al. A genome-wide association study identifies FSHR rs2300441 associated with follicle-stimulating hormone levels. Clin Genet. 2020;97(6):869‐877.
- 45. Dapas M, Lin FTJ, Nadkarni GN, et al. Distinct subtypes of polycystic ovary syndrome with novel genetic associations: an unsupervised, phenotypic clustering analysis. PLoS Med. 2020;17(6):e1003132.
- 46. Hayes MG, Urbanek M, Ehrmann DA, et al. Genome-wide association of polycystic ovary syndrome implicates alterations in gonadotropin secretion in European ancestry populations. Nat Commun. 2015;6(1):7502.
- 47. Day FR, Hinds DA, Tung JY, et al. Causal mechanisms and balancing selection inferred from genetic associations with polycystic ovary syndrome. Nat Commun. 2015;6(1):8464.
- 48. Day F, Karaderi T, Jones MR, et al. Large-scale genome-wide meta-analysis of polycystic ovary syndrome suggests shared genetic architecture for different diagnosis criteria. PLoS Genet. 2018;14(12):e1007813.
- 49. Zhang Y, Ho K, Keaton JM, et al. A genome-wide association study of polycystic ovary syndrome identified from electronic health records. Am J Obstet Gynecol. 2020;223(4):559.e1‐559.e21.
- 50. Tyrmi JS, Arffman RK, Pujol-Gualdo N, et al. Leveraging Northern European population history: novel low-frequency variants for polycystic ovary syndrome. Hum Reprod. 2022;37(2):352‐365.
- 51. Kim SH, Liu M, Jin HS, Park S. High genetic risk scores of ASIC2, MACROD2, CHRM3, and C2orf83 genetic variants associated with polycystic ovary syndrome impair insulin sensitivity and interact with energy intake in Korean women. Gynecol Obstet Invest. 2019;84(3):225‐236.
- 52. Lee H, Oh JY, Sung YA, et al. Genome-wide association study identified new susceptibility loci for polycystic ovary syndrome. Hum Reprod. 2015;30(3):723‐731.
- 53. Liu Q, Tang B, Zhu Z, et al. A genome-wide cross-trait analysis identifies shared loci and causal relationships of type 2 diabetes and glycaemic traits with polycystic ovary syndrome. Diabetologia. 2022;65(9):1483‐1494.
- 54. Arnold M, Raffler J, Pfeufer A, Suhre K, Kastenmüller G. SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics. 2015;31(8):1334‐1336.
- 55. Zhang Y, Yan Z, Qin Q, et al. Transcriptome landscape of human folliculogenesis reveals oocyte and granulosa cell interactions. Mol Cell. 2018;72(6):1021‐1034.e4.
- 56. Inagaki A, Roset R, Petrini JHJ. Functions of the MRE11 complex in the development and maintenance of oocytes. Chromosoma. 2016;125(1):151‐162.
- 57. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature. 2017;550(7675):204‐213.
- 58. Bai Y, Dai X, Harrison AP, Chen M. RNA regulatory networks in animals and plants: a long noncoding RNA perspective. Brief Funct Genomics. 2015;14(2):91‐101.
- 59. Fairfax BP, Makino S, Radhakrishnan J, et al. Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles. Nat Genet. 2012;44(5):502‐510.
- 60. Sonawane AR, Platig J, Fagny M, et al. Understanding tissue-specific gene regulation. Cell Rep. 2017;21(4):1077‐1088.
- 61. Han L, Shen WJ, Bittner S, Kraemer FB, Azhar S. PPARs: regulators of metabolism and as therapeutic targets in cardiovascular disease. Part II: PPAR-β/δ and PPAR-γ. Future Cardiol. 2017;13(3):279‐296.
- 62. Kim J, Bagchi IC, Bagchi MK. Control of ovulation in mice by progesterone receptor-regulated gene networks. Mol Hum Reprod. 2009;15(12):821‐828.
- 63. Zhou J, Theesfeld CL, Yao K, Chen KM, Wong AK, Troyanskaya OG. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet. 2018;50(8):1171‐1179.
- 64. Avsec Ž, Agarwal V, Visentin D, et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods. 2021;18(10):1196‐1203.
- 65. Li S, Hannenhalli S, Ovcharenko I. De novo human brain enhancers created by single-nucleotide mutations. Sci Adv. 2023;9(7):eadd2911.
- 66. Hartanti MD, Rosario R, Hummitzsch K, et al. Could perturbed fetal development of the ovary contribute to the development of polycystic ovary syndrome in later life? PLoS One. 2020;15(2):e0229351.
- 67. Joslin AC, Sobreira DR, Hansen GT, et al. A functional genomics pipeline identifies pleiotropy and cross-tissue effects within obesity-associated GWAS loci. Nat Commun. 2021;12(1):5253.
- 68. Deng C, Whalen S, Steyert M, et al. Massively parallel characterization of regulatory elements in the developing human cortex. Science. 2024;384(6698):eadh0559.
- 69. Lupien M, Eeckhoute J, Meyer CA, et al. Foxa1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell. 2008;132(6):958‐970.
- 70. Mullen RD, Colvin SC, Hunter CS, et al. Roles of the LHX3 and LHX4 LIM-homeodomain factors in pituitary development. Mol Cell Endocrinol. 2007;265-266:190‐195.
- 71. Topaloglu AK, Simsek E, Kocher MA, et al. Inactivating NHLH2 variants cause idiopathic hypogonadotropic hypogonadism and obesity in humans. Hum Genet. 2022;141(2):295‐304.
- 72. Cen C, Chen M, Zhou J, et al. Inactivation of Wt1 causes pre-granulosa cell to steroidogenic cell transformation and defect of ovary development. Biol Reprod. 2020;103(1):60‐69.
- 73. Li X, Wu X, Zhang H, et al. Analysis of single-cell RNA sequencing in human oocytes with diminished ovarian reserve uncovers mitochondrial dysregulation and translation deficiency. Reprod Biol Endocrinol. 2024;22(1):146.
- 74. Zhang Y, Federation AJ, Kim S, et al. Targeting nuclear receptor NR4A1-dependent adipocyte progenitor quiescence promotes metabolic adaptation to obesity. J Clin Invest. 2018;128(11):4898‐4911.
- 75. Abramov S, Boytsov A, Bykova D, et al. Landscape of allele-specific transcription factor binding in the human genome. Nat Commun. 2021;12(1):2751.
- 76. Buyan A, Meshcheryakov G, Safronov V, et al. Statistical framework for calling allelic imbalance in high-throughput sequencing data. Nat Commun. 2025;16(1):1739.
- 77. Arthur TD, Nguyen JP, Henson BA, et al. Multiomic QTL mapping reveals phenotypic complexity of GWAS loci and prioritizes putative causal variants. Cell Genom. 2025;5(3):100775.
- 78. Nakanishi T, Tanaka R, Tonai S, et al. LH induces de novo cholesterol biosynthesis via SREBP activation in granulosa cells during ovulation in female mice. Endocrinology. 2021;162(11):bqab166.
- 79. Talbi R, Stincic TL, Ferrari K, et al. POMC neurons control fertility through differential signaling of MC4R in kisspeptin neurons. eLife. 2025;13:RP100722.
- 80. Pontecorvi M, Goding CR, Richardson WD, Kessaris N. Expression of Tbx2 and Tbx3 in the developing hypothalamic-pituitary axis. Gene Expr Patterns. 2008;8(6):411‐417.
- 81. Villa PA, Ruggiero-Ruff RE, Jamieson BB, Campbell RE, Coss D. Obesity alters POMC and kisspeptin neuron cross talk leading to reduced luteinizing hormone in male mice. J Neurosci. 2024;44(28):e0222242024.
- 82. van der Ham K, Moolhuijsen LME, Brewer K, et al. Clustering identifies subtypes with different phenotypic characteristics in women with polycystic ovary syndrome. J Clin Endocrinol Metab. 2024;109(12):3096‐3107.
- 83. Du X, Rosenfield RL, Qin K. KLF15 is a transcriptional regulator of the human 17β-hydroxysteroid dehydrogenase type 5 gene. A potential link between regulation of testosterone production and fat stores in women. J Clin Endocrinol Metab. 2009;94(7):2594‐2601.
- 84. Liu XM, Yan MQ, Ji SY, et al. Loss of oocyte Rps26 in mice arrests oocyte growth and causes premature ovarian failure. Cell Death Dis. 2018;9(12):1‐15.
- 85. Svingen T, François M, Wilhelm D, Koopman P. Three-dimensional imaging of Prox1-EGFP transgenic mouse gonads reveals divergent modes of lymphangiogenesis in the testis and ovary. PLoS One. 2012;7(12):e52620.
- 86. Karagiannopoulos A, Westholm E, Ofori JK, Cowan E, Esguerra JLS, Eliasson L. Glucocorticoid-mediated induction of ZBTB16 affects insulin secretion in human islets and EndoC-βH1 β-cells. iScience. 2023;26(5):106555.
- 87. Sun L, Ji S, Xie X, et al. Deciphering the interaction between Twist1 and PPARγ during adipocyte differentiation. Cell Death Dis. 2023;14(11):764.
- 88. Brynedal B, Choi J, Raj T, et al. Large-scale trans-eQTLs affect hundreds of transcripts and mediate patterns of transcriptional co-regulation. Am J Hum Genet. 2017;100(4):581‐591.
- 89. Tidwell A, Zhu J, Battiola T, Welt CK. Phenotypes associated with polycystic ovary syndrome risk variants. J Endocr Soc. 2024;9(1):bvae219.
- 90. Fu A, Koth ML, Brown RM, et al. IRX3 and IRX5 collaborate during ovary development and follicle formation to establish responsive granulosa cells in the adult mouse. Biol Reprod. 2020;103(3):620‐629.
- 91. Smemo S, Tena JJ, Kim KH, et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507(7492):371‐375.
- 92. Sobreira DR, Joslin AC, Zhang Q, et al. Extensive pleiotropism and allelic heterogeneity mediate metabolic effects of IRX3 and IRX5. Science. 2021;372(6546):1085‐1091.
- 93. Claussnitzer M, Dankel SN, Kim KH, et al. FTO obesity variant circuitry and adipocyte browning in humans. N Engl J Med. 2015;373(10):895‐907.
- 94. Rotinen M, You S, Yang J, et al. ONECUT2 is a targetable master regulator of lethal prostate cancer that suppresses the androgen axis. Nat Med. 2018;24(12):1887‐1898.
- 95. Zhao ZH, Meng TG, Gao F, et al. Spatiotemporal and single-cell atlases to dissect cell lineage differentiation and regional specific cell types in mouse ovary morphogenesis. Commun Biol. 2025;8(1):849.
- 96. Lamolet B, Pulichino AM, Lamonerie T, et al. A pituitary cell-restricted T box factor, Tpit, activates POMC transcription in cooperation with Pitx homeoproteins. Cell. 2001;104(6):849‐859.
- 97. Ragvin A, Moro E, Fredman D, et al. Long-range gene regulation links genomic type 2 diabetes and obesity risk regions to HHEX, SOX4, and IRX3. Proc Natl Acad Sci U S A. 2010;107(2):775‐780.
- 98. Selva DM, Hammond GL. Peroxisome-proliferator receptor gamma represses hepatic sex hormone-binding globulin expression. Endocrinology. 2009;150(5):2183‐2189.
- 99. Dai R, Sun Y. Altered GnRH neuron-glia networks close to interface of polycystic ovary syndrome: molecular mechanism and clinical perspectives. Life Sci. 2025;361:123318.
- 100. Sanchez-Garrido MA, Tena-Sempere M. Metabolic dysfunction in polycystic ovary syndrome: pathogenic role of androgen excess and potential therapeutic strategies. Mol Metab. 2020;35:100937.
- 101. Sakaue S, Kanai M, Tanigawa Y, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53(10):1415‐1424.
- 102. Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet. 2016;48(7):709‐717.
- 103. Corradin O, Saiakhova A, Akhtar-Zaidi B, et al. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 2014;24(1):1‐13.
- 104. Sendur SN, Yildiz BO. Influence of ethnicity on different aspects of polycystic ovary syndrome: a systematic review. Reprod Biomed Online. 2021;42(4):799‐818.
- 105. Skinner MK. Epigenetic biomarkers for disease susceptibility and preventative medicine. Cell Metab. 2024;36(2):263‐277.
- 106. Klopocki E, Mundlos S. Copy-number variations, noncoding sequences, and human phenotypes. Annu Rev Genomics Hum Genet. 2011;12(1):53‐72.
Associated Data
Data Citations
- Srivastava J. 2025. Regulatory risk loci link disrupted androgen response to pathophysiology of polycystic ovary syndrome. Version v2. Zenodo. 10.5281/zenodo.17654414
Data Availability Statement
See the “Data and tools” section for details. The DL models for the 11 cell types trained in this study are available on Zenodo [27]. Original data generated and analyzed during this study are included in this published article or in the data repositories listed in Ref. [27].




