Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

medRxiv logoLink to medRxiv
[Preprint]. 2024 Apr 22:2024.04.22.24306092. [Version 1] doi: 10.1101/2024.04.22.24306092

Enhancing Genetic Association Power in Endometriosis through Unsupervised Clustering of Clinical Subtypes Identified from Electronic Health Records

Lindsay Guare, Leigh Ann Humphrey, Margaret Rush, Meredith Pollie, Yuan Luo, Chunhua Weng, Wei-Qi Wei, Leah Kottyan, Gail Jarvik, Noemie Elhadad; Penn Medicine Biobank; Regeneron Genetics Center, Krina Zondervan, Stacey Missmer, Marijana Vujkovic, Digna Velez-Edwards, Suneeta Senapati, Shefali Setia-Verma
PMCID: PMC11071578  PMID: 38712122

Abstract

Background

Endometriosis affects 10% of reproductive-age women, and yet, it goes undiagnosed for 3.6 years on average after symptoms onset. Despite large GWAS meta-analyses (N > 750,000), only a few dozen causal loci have been identified. We hypothesized that the challenges in identifying causal genes for endometriosis stem from heterogeneity across clinical and biological factors underlying endometriosis diagnosis.

Methods

We extracted known endometriosis risk factors, symptoms, and concomitant conditions from the Penn Medicine Biobank (PMBB) and performed unsupervised spectral clustering on 4,078 women with endometriosis. The 5 clusters were characterized by utilizing additional electronic health record (EHR) variables, such as endometriosis-related comorbidities and confirmed surgical phenotypes. From four EHR-linked genetic datasets, PMBB, eMERGE, AOU, and UKBB, we extracted lead variants and tag variants 39 known endometriosis loci for association testing. We meta-analyzed ancestry-stratified case/control tests for each locus and cluster in addition to a positive control (Total N endometriosis cases = 10,108).

Results

We have designated the five subtype clusters as pain comorbidities, uterine disorders, pregnancy complications, cardiometabolic comorbidities, and EHR-asymptomatic based on enriched features from each group. One locus, RNLS , surpassed the genome-wide significant threshold in the positive control. Thirteen more loci reached a Bonferroni threshold of 1.3 x 10 -3 (0.05 / 39) in the positive control. The cluster-stratified tests yielded more significant associations than the positive control for anywhere from 5 to 15 loci depending on the cluster. Bonferroni significant loci were identified for four out of five clusters, including WNT4 and GREB1 for the uterine disorders cluster, RNLS for the cardiometabolic cluster, FSHB for the pregnancy complications cluster, and SYNE1 and CDKN2B-AS1 for the EHR-asymptomatic cluster. This study enhances our understanding of the clinical presentation patterns of endometriosis subtypes, showcasing the innovative approach employed to investigate this complex disease.

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES