Author manuscript; available in PMC: 2025 Feb 1.
Published in final edited form as: Hypertension. 2023 Oct 30;81(2):264–272. doi: 10.1161/HYPERTENSIONAHA.123.21053

Preeclampsia Prediction Using Machine Learning and Polygenic Risk Scores from Clinical and Genetic Risk Factors in Early and Late Pregnancy

Vesela P Kovacheva 1,*, Braden W Eberhard 1,*, Raphael Y Cohen 1,2, Matthew Maher 3, Richa Saxena 3,4, Kathryn J Gray 3,5
PMCID: PMC10842389  NIHMSID: NIHMS1937858  PMID: 37901968

Abstract

Background

Preeclampsia, a pregnancy-specific condition associated with new-onset hypertension after 20 weeks gestation, is a leading cause of maternal and neonatal morbidity and mortality. Predictive tools to understand which individuals are most at risk are needed.

Methods

We identified a cohort of N=1,125 pregnant individuals who delivered between 05/2015–05/2022 at Mass General Brigham hospitals with available electronic health record (EHR) data and linked genetic data. Using clinical EHR data and systolic blood pressure polygenic risk scores (SBP PRS) derived from a large genome-wide association study, we developed machine learning (xgboost) and logistic regression models to predict preeclampsia risk.

Results

Pregnant individuals with an SBP PRS in the top quartile had higher blood pressures throughout pregnancy compared to patients within the lowest quartile SBP PRS. In the first trimester, the most predictive model was xgboost, with an area under the curve (AUC) of 0.74. In late pregnancy, with data obtained up to the delivery admission, the best performing model was xgboost using clinical variables, which achieved an AUC of 0.91. Adding the SBP PRS to the models did not improve the performance significantly based on DeLong’s test comparing the AUC of models with and without the polygenic score.

Conclusions

Integrating clinical factors into predictive models can inform personalized preeclampsia risk and achieve higher predictive power than the current practice. In the future, personalized tools can be implemented to identify high-risk patients for preventative therapies and timely intervention to improve adverse maternal and neonatal outcomes.

Keywords: preeclampsia, polygenic risk scores, machine learning, pregnancy


INTRODUCTION

Preeclampsia, defined as new onset of elevated blood pressure after 20 weeks gestation, is a leading cause of maternal and neonatal morbidity and mortality worldwide.1 Preeclampsia affects 2–8% of all pregnancies2 and contributes to 26% of maternal deaths worldwide and 15% of preterm births3. In the US, preeclampsia incidence is increasing and results in significant healthcare utilization.2 Maternal complications include end-organ damage, eclamptic seizures, and death. Fetal/neonatal complications include growth restriction and iatrogenic preterm birth. Timely diagnosis and treatment can reduce the risk for severe maternal and neonatal morbidity by 72–89%.4,5

Current clinical practice for patients at risk for preeclampsia is focused on close surveillance, early detection, and prompt management.6 Pregnant patients’ risk for preeclampsia is assessed at the first prenatal visit and, in those at high risk, prophylaxis with low-dose aspirin and close blood pressure monitoring is recommended. Currently, high-risk individuals are identified based on clinical factors, including pre-existing hypertension, obesity, pregestational diabetes, and prior preeclampsia; however, this approach fails to identify 46–60% of pregnancies that develop preeclampsia.7,8 Improved tools to understand each individual’s personalized disease risk have the potential to markedly improve pregnancy care and clinical outcomes. Machine learning methods, based on implicitly learning relationships in large datasets, allow for precise outcome prognostication and may improve preeclampsia prediction. Recent machine learning studies on hypertensive disorders of pregnancy9 and preeclampsia8,10 risk demonstrate the potential of these methods to generate highly accurate predictions.11 However, models published to date have low predictive power in early pregnancy; in addition, a significant number of patients – especially nulliparous patients without clinical risk factors – develop preeclampsia and, thus, fail to be identified by current models.

While preeclampsia has substantial heritability based both on maternal and fetal factors,12 the specific genetic factors contributing to risk are just beginning to be identified, as detailed in recent genome-wide association studies (GWAS).13–15 Importantly, in the largest published maternal preeclampsia GWAS, the top hits were all loci previously implicated in essential hypertension risk. In addition, several studies have demonstrated that the overall genetic architecture of maternal preeclampsia overlaps with the genetics of both systolic and diastolic blood pressure, as well as body mass index (BMI).13–16 Given that essential hypertension is a known clinical risk factor for preeclampsia,17,18 and genetic predisposition to hypertension is associated with increased preeclampsia risk, we hypothesized that a machine learning model incorporating both clinical risk factors and a hypertension genetic risk score (i.e., polygenic risk score, PRS, generated from GWAS summary statistics) could improve preeclampsia risk prediction for pregnant individuals. As PRS are associated with disease risk independent of other clinical and environmental risk factors, all factors can be combined additively in a single model.19

In this study, we utilize a rich database derived from the electronic health record (EHR) of patients who have had a pregnancy in our healthcare system linked with genetic data from the biobank. We investigate the relative importance of different clinical risk factors and polygenic risk scores in the first trimester, as well as late pregnancy (prior to the delivery admission), to predict preeclampsia.

METHODS

The summary data that support the findings of this study are available from the corresponding author upon reasonable request and based on institutional guidelines.

Population

This study was approved by the Mass General Brigham Institutional Review Board, protocol # 2020P002859, with a waiver of patient consent. Pregnant patients were selected based on documentation of pregnancy greater than 20 weeks gestation and associated billing codes for cesarean or vaginal delivery. We included all available patients from May 2015 to May 2022 with genetic data available in the Mass General Brigham Biobank and analyzed each pregnancy independently. These dates were chosen as May 2015 is when our institution implemented electronic health records across all outpatient offices and inpatient sites. All data (including sociodemographic, clinical diagnoses, laboratory, vital signs, and genotyping) were obtained and analyzed using our machine learning platform, which extracts, transforms, and harmonizes data from multiple sources.20 Preeclampsia diagnosis was based on the established American College of Obstetricians and Gynecologists guidelines.6 All preeclampsia cases (N=87) were further validated by an experienced clinician.

Genotyping and Imputation

Genome-wide genotyping for each patient was obtained from the Mass General Brigham Biobank,21 a prospective biobank launched in 2010 that contains genotyping data, samples, and questionnaires with ongoing links to the EHR. This effort is continuing, with 129,000 patients enrolled and more than 56,000 genotyped. Genotyping was performed using one of two Illumina single nucleotide polymorphism (SNP) Arrays: the MultiEthnic Genotyping Array (containing >1.6M SNPs) or the Global Screening Array (containing > 575K SNPs). Imputation was performed using the TOPMed Imputation Server.

Polygenic Risk Scores

As systolic blood pressure (SBP) is the trait with the highest genetic correlation with maternal preeclampsia genetics and has the highest predictive power for future hypertensive disorders and cardiovascular disease,22–24 we created an SBP PRS using the open-source PRS-CS tool.25 PRS-CS computes SNP effect sizes by high-dimensional Bayesian regression using GWAS summary statistics and a linkage disequilibrium reference panel. While maternal preeclampsia GWAS data is available,15 the GWAS remain underpowered, and the polygenic risk scores developed using these results were weaker compared to SBP PRS.23,24 To increase the likelihood of improving prediction using genetic risk factors, we selected the largest blood pressure GWAS meta-analysis to date, with over 1 million individuals,17 and used a European linkage disequilibrium reference panel with 1.1 million variants derived from samples from the 1000 Genomes Project to create an SBP PRS in our study population. We categorized the PRS into quartiles of risk ranging from lowest to highest genetic risk: <25%, 25–49%, 50–75%, and >75%. In all models, we used the continuous numeric polygenic scores. We adjusted all models in which the SBP PRS was used by the first ten principal components of ancestry (PCAs) to account for population structure.
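As a rough illustration of the scoring and binning described above (and not of the PRS-CS pipeline itself), a polygenic score can be viewed as a weighted sum of allele dosages that is then cut into quartiles of the study cohort. The function names, weights, and dosages below are hypothetical, and the boundary handling simply mirrors the stated bins (<25%, 25–49%, 50–75%, >75%).

```python
from statistics import quantiles


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0 to 2 per variant) and per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def quartile_labels(scores):
    """Label each score 1 (lowest genetic risk) to 4 (highest) using cohort cut points."""
    # Cut points at the 25th, 50th, and 75th percentiles of the cohort itself.
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for s in scores:
        if s < q1:
            labels.append(1)   # <25%
        elif s < q2:
            labels.append(2)   # 25-49%
        elif s <= q3:
            labels.append(3)   # 50-75%
        else:
            labels.append(4)   # >75%
    return labels


# Hypothetical example: three variants for one person, then a toy cohort of scores.
example_score = polygenic_score([2, 1, 0], [0.1, -0.2, 0.3])
example_labels = quartile_labels([1, 2, 3, 4, 5, 6, 7, 8])
```

The quartile labels are only used for descriptive comparisons here; as stated above, the models themselves take the continuous score.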

Machine Learning and Logistic Regression Predictive Models

Our machine learning platform, which utilizes Python 3.9 (scikit-learn library), was used for the development of predictive models.20 We selected established clinical risk factors known to be associated with preeclampsia risk in published studies and guidelines.6–8,10,26,27 For the predictive models, we created datasets in which only data obtained up to the selected time point was included to minimize the risk of data leakage. When adding the SBP PRS to the models, we considered the PRS as an independent predictor and adjusted by the first ten PCAs. To predict preeclampsia risk, we developed logistic regression models, which perform well with binary outcomes, and xgboost machine learning models, which have high interpretability and perform well in structured data from EHR.28,29 To assess the discrimination performance of the models, receiver operating characteristic (ROC) curves were developed, and the area under the curve (AUC), accuracy, sensitivity, specificity, and precision were calculated.
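The discrimination metrics named above can be sketched in a few lines. This is a standard-library illustration only, not the platform's code: the AUC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control, and the 0.5 decision threshold is an assumption, since the text does not state one.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and precision at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
    }
```

With a low-prevalence outcome such as preeclampsia (7.8% in this cohort), accuracy alone can look high for a model that rarely predicts a case, which is one reason sensitivity and precision are reported alongside it.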

Statistical Analyses and Definitions

For the analyses, we used all available EHR data from before conception to up to 6 weeks postpartum. Variables were treated as parametric or non-parametric according to their distribution; continuous parametric variables were expressed as mean ± SD, and nonparametric variables as the median with interquartile range (IQR). Significance was determined using the Student’s t-test and one-way ANOVA for parametric variables, the Kruskal-Wallis rank sum test for non-parametric variables, and Fisher’s exact or Chi-squared test for categorical variables. Pairwise comparisons were performed with Bonferroni adjustment. We used DeLong’s test to compare model performance based on their AUC. A p-value of less than 0.05 was considered significant.
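For readers unfamiliar with DeLong's test, the following is a compact standard-library sketch of the usual procedure for comparing two correlated AUCs computed on the same pregnancies. It is an illustration under stated assumptions, not the code used in this study: the function names are hypothetical, the test is two-sided, and it uses the normal approximation.

```python
import math


def _psi(x, y):
    """Pairwise kernel: 1 if the case outranks the control, 0.5 on ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(cases, controls):
    """Return the AUC and the per-case and per-control structural components."""
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, two-sided p-value) for two models scored on the same samples."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two cases and two controls")
    auc_a, v10_a, v01_a = _components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    # Variance of the AUC difference, combining case and control components.
    var = (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(pos)
    var += (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(neg)
    diff = auc_a - auc_b
    if var <= 0.0:
        if diff == 0.0:
            return auc_a, auc_b, 1.0  # identical rankings: no evidence of a difference
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = diff / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return auc_a, auc_b, p_value
```

Because both models are evaluated on the same patients, the covariance terms matter: ignoring them would treat the two AUCs as independent and generally overstate the variance of their difference.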

RESULTS

Patient Characteristics

Of 105,673 pregnancies recorded in our healthcare system after May 2015, genotyping data were available for 1,125 pregnancies (828 unique patients), all of whom were included in the study. The patient population was multi-ancestry, with 32.7% of patients self-identifying as non-White. Of the 1,125 pregnancies, 87 had a clinical diagnosis of preeclampsia (7.8%). Patients with preeclampsia were older and more likely to be nulliparous (Table 1). Patients who self-identified as Black or Hispanic were more likely to have hypertension and were more likely to develop preeclampsia (p<0.01). In addition, patients with any hypertensive disorder, including preeclampsia, chronic, or gestational hypertension, were more likely to have a family history of chronic hypertension and preeclampsia compared to normotensive patients (p<0.01). Patients with preeclampsia delivered before the 37th week of gestation more often as compared to patients who were normotensive or who had chronic or gestational hypertension. As expected, patients with preeclampsia had the highest systolic and diastolic blood pressure during pregnancy compared to those with chronic or gestational hypertension, and normotension (p<0.01).

Table 1.

Pregnant patient clinical characteristics.

| Clinical variables | Preeclampsia (n=87) | Chronic and gestational hypertension (n=95) | Normotensive (n=943) | P-value |
| --- | --- | --- | --- | --- |
| Maternal age at delivery, y | 32.9 (29.5 – 36.4) | 34.4 (30.5 – 37.8) | 33.5 (30.5 – 36.3) | 0.27 |
| Self-reported race | | | | |
| White | 60 (69%) | 56 (67%) | 691 (72%) | 0.8 |
| Black | 13 (15%) | 14 (17%) | 61 (6%) | < 0.01# |
| Asian | 3 (3%) | 1 (1%) | 64 (7%) | 0.57 |
| Native American | 0 (0%) | 0 (0%) | 2 (0%) | 0.82 |
| Other | 12 (14%) | 16 (17%) | 149 (16%) | 0.87 |
| Self-reported ethnicity | | | | |
| Hispanic | 3 (3%) | 4 (5%) | 41 (4%) | 0.91 |
| Non-Hispanic | 84 (97%) | 80 (95%) | 913 (96%) | 1 |
| Hospital | | | | |
| Tertiary | 84 (97%) | 82 (98%) | 886 (93%) | 0.87 |
| Community | 3 (3%) | 2 (2%) | 68 (7%) | 0.13 |
| Gravidity | 2.0 (1.0 – 3.0) | 2.0 (2.0 – 4.0) | 2.0 (1.0 – 3.0) | 0.15 |
| Parity | 1.0 (0.2 – 2.0) | 1.0 (0.5 – 2.0) | 1.0 (1.0 – 2.0) | 0.35 |
| Gestational age at delivery, weeks | 37.1 (35.3 – 38.3) | 38.3 (37.0 – 39.1) | 39.3 (38.6 – 40.1) | < 0.01§# |
| Gestational age at preeclampsia diagnosis, weeks | 34.1 (28.1 – 37.4) | N/A | N/A | N/A |
| Last BMI before pregnancy, kg/m2 | 29.3 (23.8 – 34.3) | 28.7 (24.5 – 34.3) | 24.9 (22.0 – 29.2) | < 0.01# |
| BMI at delivery, kg/m2 | 32.9 (28.6 – 37.9) | 33.1 (30.0 – 37.5) | 29.62 (26.3 – 33.2) | < 0.01# |
| Maximal SBP during pregnancy, mmHg | 151 (142 – 160) | 146 (135 – 154) | 128 (120 – 136) | < 0.01§# |
| Maximal DBP during pregnancy, mmHg | 93 (87 – 99) | 91 (84 – 95) | 80 (74 – 84) | < 0.01§# |
| Family history of chronic hypertension | 39 (45%) | 55 (65%) | 409 (43%) | 0.012 |
| Family history of preeclampsia | 1 (1%) | 4 (5%) | 11 (1%) | 0.028 |

Median (IQR) for continuous variables; n (%) for categorical variables.

P-value based on one-way ANOVA or Chi-squared test for all groups. Subsequent pairwise comparisons using a Bonferroni-adjusted P-value <0.02 were significant for the following comparisons:

# preeclampsia and normotensive

§ preeclampsia and chronic and gestational hypertension

chronic and gestational hypertension and normotensive

Abbreviations: SBP, systolic blood pressure; DBP, diastolic blood pressure; BMI, body mass index

Polygenic Risk Scores and Maternal Blood Pressure

Patients with SBP PRS in the highest quartile had higher maximal systolic and diastolic blood pressure during pregnancy compared to patients with the lowest quartile SBP PRS (Table S1). As SBP PRS was developed using a European population, we performed a sensitivity analysis applying SBP PRS only in the subset of the population that self-identified as White. This sensitivity analysis demonstrated similar findings (Fig. S1) and additionally identified that patients with the highest PRS had a higher incidence of chronic hypertension. Also, patients with any hypertension diagnosis (gestational, chronic, or preeclampsia) had higher SBP PRS compared to normotensive patients (Fig. S2).

Models to Predict Preeclampsia

We sought to predict patient preeclampsia risk using clinical and genetic data (Fig. S3) at two time points: early in pregnancy, at the first prenatal visit, and late in pregnancy, prior to the delivery admission. If a patient had a preeclampsia diagnosis or delivered before the time point, any data after that event were excluded to minimize data leakage. Because relationships between predictors may not be linear, we developed both logistic regression and nonlinear machine learning models. Subsequently, we investigated if the addition of SBP PRS improved the predictive power of the respective model and evaluated the predictive power of each model using only clinical, only genetic, or both genetic and clinical variables, respectively (Table S2).
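The censoring rule described above can be sketched as a simple filter. This is a hypothetical illustration: the `Observation` record, the use of gestational days, and the choice to keep data recorded on the cutoff or event day itself are assumptions, not details taken from the study platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    gestational_day: int  # day of pregnancy on which the value was recorded
    name: str             # e.g. "sbp" (hypothetical variable name)
    value: float


def observations_for_timepoint(
    observations: List[Observation],
    cutoff_day: int,
    diagnosis_day: Optional[int] = None,
    delivery_day: Optional[int] = None,
) -> List[Observation]:
    """Keep data up to the prediction time point or an earlier diagnosis/delivery."""
    # The effective limit is the earliest of the cutoff and any known event.
    events = [d for d in (diagnosis_day, delivery_day) if d is not None]
    limit = min([cutoff_day] + events)
    # Assumption: same-day data are kept; anything after the limit is dropped.
    return [o for o in observations if o.gestational_day <= limit]


# Hypothetical usage: early-pregnancy model with a 14-week (day 98) cutoff.
records = [
    Observation(50, "sbp", 112.0),
    Observation(98, "sbp", 118.0),
    Observation(120, "sbp", 131.0),
    Observation(200, "sbp", 148.0),
]
early = observations_for_timepoint(records, cutoff_day=98)
```

Applying the filter before any feature engineering keeps later measurements, such as blood pressures taken after diagnosis, from leaking into the predictors.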

In early pregnancy, patients are screened for preeclampsia risk based on the presence of established clinical risk factors. We used these risk factors (Table S3) to develop predictive models. The relationship between all variables is shown in Fig 1A. The clinical logistic regression model, which was developed using only clinical variables available up to 14 weeks gestation, had an AUC of 0.71 (Table S2). We also created a separate genetic logistic regression model using only SBP PRS, adjusted for the PCAs; this model had a weak predictive power, AUC 0.62. Adding the SBP PRS to clinical risk factors in a combined logistic regression model resulted in an improved AUC of 0.72; however, there was no significant difference between the model with only clinical factors and the model with clinical and genetic factors (p = 0.08). As machine learning allows for the incorporation of multiple variables with complex relationships, we developed a clinical xgboost model, which had an AUC of 0.74 (Fig 1B). In this case, adding the SBP PRS did not result in statistically different predictive abilities when comparing the AUC of the xgboost models with and without SBP PRS (p=0.11). The most predictive variables in the model (determined using the Shapley interpretability method) were blood pressure, maternal age, and history of preeclampsia in a prior pregnancy (Fig. 1C). The AUC plots for all models are presented in Fig. S4.

Fig. 1.

Correlation matrix (A) and preeclampsia predictive model development (B) in early pregnancy, before 14 weeks gestation. (C) SHapley Additive exPlanations (SHAP) plot of the top variables contributing to the xgboost output in early pregnancy. The horizontal position of each point indicates the impact of the feature on the model’s prediction. Red, high feature value; blue, low feature value.

Abbreviations: SBP, systolic blood pressure; DBP, diastolic blood pressure; BMI, body mass index
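To make the Shapley attribution behind the SHAP plot more concrete, the sketch below computes exact Shapley values by brute force for a toy two-feature risk function. Everything here is hypothetical: the toy function, its coefficients, and the feature names are invented for illustration, and the enumeration over all feature orderings only shows the underlying definition rather than how SHAP is computed for an xgboost model in practice.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: mean marginal contribution of each feature over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)          # start from the reference patient
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its observed value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_risk(features):
    """Hypothetical risk score with an interaction term; not a fitted model."""
    excess_sbp = features["sbp"] - 110
    return 0.02 * excess_sbp + 0.5 * features["prior_pe"] + 0.01 * excess_sbp * features["prior_pe"]


patient = {"sbp": 140, "prior_pe": 1}
reference = {"sbp": 110, "prior_pe": 0}
contributions = shapley_values(toy_risk, patient, reference)
```

The contributions sum to the difference between the patient's score and the reference score, which is the property that lets each point in a SHAP plot be read as that feature's share of the prediction.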

By the time of delivery, more clinical information becomes available from scheduled outpatient prenatal visits (Table S4), which become more frequent during the 3rd trimester (Fig. 2A). The late pregnancy models were generated using clinical information available prior to (but not after) the admission associated with preeclampsia diagnosis (Table S2). In late pregnancy, the logistic regression model had an AUC of 0.84 and performed better than the logistic early pregnancy model (p = 0.001). Similarly, the machine learning model using clinical risk factors had the best performance of all, AUC 0.91 (Fig. 2B) (p=0.03 compared to the clinical logistic model). At this timepoint, the addition of the SBP PRS to the clinical risk factors in the logistic regression and machine learning models did not significantly improve the performance. In the best performing model, the most predictive variables (determined using the Shapley interpretability method) were blood pressure, body mass index, uric acid level, and past medical history of renal disease (Fig. 2C).

Fig. 2.

Correlation matrix (A) and preeclampsia predictive model development (B) in late pregnancy, prior to the delivery admission. (C) SHapley Additive exPlanations (SHAP) plot of the top variables contributing to the xgboost output in late pregnancy. The horizontal position of each point indicates the impact of the feature on the model’s prediction. Red, high feature value; blue, low feature value.

Abbreviations: SBP, systolic blood pressure; DBP, diastolic blood pressure; BMI, body mass index.

DISCUSSION

This study examines the ability of machine learning and logistic regression models developed based on electronic health records (EHR) and genetic data to accurately predict preeclampsia. Our results demonstrate that, in a multi-ethnic cohort, systolic blood pressure polygenic risk scores (SBP PRS) correlate with systolic and diastolic blood pressure during pregnancy, as well as with diagnoses of gestational and chronic hypertension. In early and late pregnancy, the models created using clinical risk factors were more predictive than those based on genetic risk factors; the addition of SBP PRS did not improve the predictive power of any of the models. In both early and late pregnancy, machine learning models performed better than logistic regression models; xgboost in late pregnancy was the most predictive.

In line with prior studies,22–24 we demonstrate that SBP PRS is associated with clinically measured blood pressure and risk of hypertensive disorders. The heritability of hypertensive disorders based on PRS is well established, and recent studies have demonstrated that these findings also translate to hypertensive disorders of pregnancy.18,23,24 Several recent studies of preeclampsia and blood pressure PRS have shown a strong disease correlation in patients with higher PRS.18,23,24 The maximal blood pressure measured during pregnancy was elevated in the group with the top 25% SBP PRS. We were not able to find a significant relationship between SBP PRS and preeclampsia, chronic hypertension, hypertensive disorders of pregnancy, anti-hypertensive medication use, or family history. The global P-value did not reach statistical significance in our cohort, likely due to our small sample size. In addition, SBP is only one risk factor for preeclampsia, and future studies using preeclampsia-specific or multi-trait PRS may improve the predictive capacity of polygenic scores. Also, as the current SBP PRS was generated from a White population, future studies utilizing multi-ethnic PRSs are likely to provide additional insight.

When using only the SBP PRS, both statistical and machine learning models have low predictive power. The relationships between SBP PRSs and outcomes are non-linear; individuals with an SBP PRS in the top 2.5% have a disproportionately higher risk of disease and adverse outcomes than those in the lowest 2.5%.22 We anticipated that machine learning approaches, with their ability to capture complex, nonlinear relationships, would achieve higher predictive power. However, the low overall and inferior performance of the xgboost model is likely due to the small number of variables included in those models and the weak association with the outcome leading to overfitting on the training data.

Efforts to predict preeclampsia in early pregnancy have been longstanding. While multiple other authors used curated datasets,26,30 only clinical variables,8,10,26,27 or a combination of clinical data and biomarkers,30,31 our study adds novelty in exploring the contribution of SBP PRS and adds to the growing evidence that demonstrates improved predictive capacity using machine learning compared to traditional statistical approaches.10,27,31 In early pregnancy, we demonstrate good predictive power of the logistic regression model, which is similar to or better than other studies.7,8 In order to avoid overfitting of our small sample, we selected only variables known to be associated with a heightened risk of preeclampsia rather than using all available variables from the EHR; the first approach has previously demonstrated better performance.10 Other models have achieved higher predictive power than ours;32,33 however, those included biomarkers that are not routinely measured in our hospital network. As PRS may complement clinical predictors, especially in early pregnancy, we explored the role of genetic factors. Once an individual is genotyped, PRS for additional traits can be incorporated into risk prediction strategies that may both improve the prediction of other adverse pregnancy outcomes as well as long-term cardiometabolic disease risk. Two recent studies demonstrated that the addition of SBP PRS or preeclampsia PRS improves the predictive power of a statistical model based on pre-pregnancy co-morbidities; in those studies, vital signs and laboratory values were not available.23,24 We find that the addition of the genetic risk factors to the clinical factors in the early pregnancy logistic model did not significantly increase the predictive power. This may be due to the small sample size; future studies in larger cohorts are needed to further explore the contribution of genetic factors.

The best-performing model in early pregnancy was xgboost and, similarly, others have demonstrated the power of this type of machine learning model in early pregnancy to achieve accurate predictions. 8,10 To incorporate information about the rate of change in variables like blood pressure and BMI in time-series analyses, we included data routinely recorded at scheduled office visits. This approach has been demonstrated to improve predictive power. 8,10 As early pregnancy screening and prevention of preeclampsia can improve maternal and neonatal outcomes by 70–89%,4,5 integrating this type of model into clinical practice has potential for a high impact on patient care.

Similar to early pregnancy, the xgboost model in late pregnancy had higher predictive power than the logistic regression model, demonstrating the superiority of the machine learning approach, as also seen in prior work. 8 The strongest predictors for preeclampsia were blood pressure, history of renal disease, and uric acid levels, which have been shown by others as well.8,10 Integrating this type of model into clinical practice at the time of triage or diagnostic uncertainty will allow for more accurate and personalized prediction, as well as for referral of high-risk patients to maternal-fetal medicine specialists and consideration of delivery at a tertiary care center.34

When combining SBP PRS with clinical factors in the model in late pregnancy, we found no improvement in risk prediction comparable to others’ prior results.22 In addition to considerations in early pregnancy, in late pregnancy, when more clinical information is available, clinical factors are likely to have a greater weight relative to the genetic factors, especially given that current PRS only account for a small portion of the heritability of the pregnancy traits under consideration. In the future, as better genetic tools and larger datasets become available, the incorporation of PRS may have substantially more impact on the predictive power of the models.

Our study has several strengths, including detailed data for all patients from multiple visits with a low level of missingness, recent data collected in the past seven years, when the most current clinical guidelines were implemented, 6 and data from both tertiary and community hospitals within our large healthcare system.

Our study has several limitations. We had a small cohort of patients, and data about aspirin prophylaxis and family history was limited; however, accurate predictions using a dataset of similar size have been previously achieved.10 Biomarkers have demonstrated utility in preeclampsia risk stratification;35 these tests are not part of our current clinical practice, and future studies may further investigate preeclampsia predictors. To avoid the risk of overfitting, we limited the types of analyses we performed; for example, we were not able to investigate the prediction of specific preeclampsia subtypes, which may have different gene expression patterns.36 In addition, some of the variables were based on billing codes, which may be inaccurate and not reflect disease severity. To overcome this limitation for the preeclampsia phenotype, we developed our own algorithm using the current standard of care and manually validated the cases. We used SBP PRS developed in a White population, which may not have optimally assessed risk in our multiethnic cohort. Our study population is from a large healthcare system in the Northeast and may not be representative of the entire US. The SBP PRS we selected was developed from the largest GWAS to date, which was performed in White patients; currently, large multi-ethnic GWAS are lacking, which is a well-recognized limitation in the field.37 Similarly, we were not able to externally validate this model as most large genetic biobanks lack detailed pregnancy information. We plan for validation in external datasets in the future as more data becomes available.

PERSPECTIVES

Here, we demonstrate that models using clinical data in early and late pregnancy have high predictive power and can accurately predict individual preeclampsia risk. In addition, SBP PRS correlate with risk factors for preeclampsia. Since machine learning models using clinical data available from routine visits had the highest predictive power, these types of models have potential for incorporation into clinical practice as a longitudinal tool within the EHR. In this way, the risk predictions can be made available to the treating physician, who can then advise about prophylactic and therapeutic options, as well as refer for maternal-fetal medicine consultation. As more pregnancy data in multi-ancestry cohorts becomes available, such strategies can be expanded.

NOVELTY AND RELEVANCE.

What Is New?

  • Preeclampsia is a leading cause of maternal and fetal morbidity and mortality, and predictive tools to understand which pregnant individuals are most at risk are lacking.

  • We investigated the role of both clinical and genetic factors in predicting the preeclampsia risk.

What Is Relevant?

  • The most accurate predictions of preeclampsia in early and late pregnancy were achieved using the machine learning model xgboost using clinical risk factors.

  • The addition of polygenic risk scores for systolic blood pressure to the predictive models for preeclampsia based on clinical risk factors did not improve the performance of those tools; the polygenic risk scores for systolic blood pressure did correlate with patient blood pressures throughout pregnancy.

Clinical/Pathophysiological Implications

  • Integration of predictive models into the clinical care of pregnant individuals can aid the clinician in developing an individualized care plan for each patient with appropriate therapeutic and monitoring strategies.

  • Using machine learning and logistic regression, we demonstrate that the most significant contributing variable to the predictions of preeclampsia in all models was blood pressure.

ACKNOWLEDGEMENT:

We acknowledge the assistance of Jonathan Merrill and John Rigoni with data generation for this study.

SOURCES OF FUNDING:

KJG reports funding from NIH/NHLBI grants K08 HL146963, K08 HL146963–02S1, R01 HL163234, R03HL162756, and a PJP Grant from the Preeclampsia Foundation. VPK reports funding from grants, NIH/NHLBI K08 HL161326–01A1, the Foundation for Anesthesia Education and Research (FAER), Anesthesia Patient Safety Foundation (APSF), Partners Innovation, Brigham Research Institute, Connors Center IGNITE Award, and Brigham Ignite Innovation Award. RS reports funding from NIH/NHLBI grant R01 HL163234 and a PJP Grant from the Preeclampsia Foundation.

NON-STANDARD ABBREVIATIONS AND ACRONYMS

AUC: area under the receiver operator curve
BMI: body mass index
DBP: diastolic blood pressure
GWAS: genome-wide association study
IUGR: intrauterine growth restriction
PCA: principal components of ancestry
PRS: polygenic risk score
SBP: systolic blood pressure
SGA: small for gestational age
SNP: single-nucleotide polymorphism
XGB: xgboost

Footnotes

CONFLICTS OF INTEREST: KJG has served as a consultant to Illumina Inc., Aetion, Roche, and BillionToOne. VPK reports consulting fees from Avania CRO and patent #WO2021119593A1 for control of a therapeutic delivery system assigned to the Mass General Brigham.

REFERENCES

  • 1. Saleem S, McClure EM, Goudar SS, Patel A, Esamai F, Garces A, Chomba E, Althabe F, Moore J, Kodkany B, et al. A prospective study of maternal, fetal and neonatal deaths in low- and middle-income countries. Bull World Health Organ. 2014;92:605–612. doi: 10.2471/BLT.13.127464
  • 2. Ananth CV, Keyes KM, Wapner RJ. Pre-eclampsia rates in the United States, 1980–2010: age-period-cohort analysis. BMJ. 2013;347:f6564. doi: 10.1136/bmj.f6564
  • 3. Ying W, Catov JM, Ouyang P. Hypertensive Disorders of Pregnancy and Future Maternal Cardiovascular Risk. J Am Heart Assoc. 2018;7:e009382. doi: 10.1161/JAHA.118.009382
  • 4. Gupta M, Greene N, Kilpatrick SJ. Timely treatment of severe maternal hypertension and reduction in severe maternal morbidity. Pregnancy Hypertens. 2018;14:55–58. doi: 10.1016/j.preghy.2018.07.010
  • 5. Wright D, Rolnik DL, Syngelaki A, de Paco Matallana C, Machuca M, de Alvarado M, Mastrodima S, Tan MY, Shearing S, Persico N, et al. Aspirin for Evidence-Based Preeclampsia Prevention trial: effect of aspirin on length of stay in the neonatal intensive care unit. Am J Obstet Gynecol. 2018;218:612 e611–612 e616. doi: 10.1016/j.ajog.2018.02.014
  • 6. American College of Obstetricians and Gynecologists’ Committee on Practice, Bulletins-Obstetrics. Gestational Hypertension and Preeclampsia: ACOG Practice Bulletin, Number 222. Obstet Gynecol. 2020;135:e237–e260. doi: 10.1097/AOG.0000000000003891
  • 7. Wright D, Syngelaki A, Akolekar R, Poon LC, Nicolaides KH. Competing risks model in screening for preeclampsia by maternal characteristics and medical history. Am J Obstet Gynecol. 2015;213:62 e61–62 e10. doi: 10.1016/j.ajog.2015.02.018
  • 8. Li S, Wang Z, Vieira LA, Zheutlin AB, Ru B, Schadt E, Wang P, Copperman AB, Stone JL, Gross SJ, et al. Improving preeclampsia risk prediction by modeling pregnancy trajectories from routinely collected electronic medical record data. NPJ Digit Med. 2022;5:68. doi: 10.1038/s41746-022-00612-x
  • 9. Mello G, Parretti E, Ognibene A, Mecacci F, Cioni R, Scarselli G, Messeri G. Prediction of the development of pregnancy-induced hypertensive disorders in high-risk pregnant women by artificial neural networks. Clin Chem Lab Med. 2001;39:801–805. doi: 10.1515/CCLM.2001.132
  • 10. Maric I, Tsur A, Aghaeepour N, Montanari A, Stevenson DK, Shaw GM, Winn VD. Early prediction of preeclampsia via machine learning. Am J Obstet Gynecol MFM. 2020;2:100100. doi: 10.1016/j.ajogmf.2020.100100
  • 11. Li YX, Shen XP, Yang C, Cao ZZ, Du R, Yu MD, Wang JP, Wang M. Novel electronic health records applied for prediction of pre-eclampsia: Machine-learning algorithms. Pregnancy Hypertens. 2021;26:102–109. doi: 10.1016/j.preghy.2021.10.006
  • 12. Lie RT, Rasmussen S, Brunborg H, Gjessing HK, Lie-Nielsen E, Irgens LM. Fetal and maternal contributions to risk of pre-eclampsia: population based study. BMJ. 1998;316:1343–1347. doi: 10.1136/bmj.316.7141.1343
  • 13. Gray KJ, Kovacheva VP, Mirzakhani H, Bjonnes AC, Almoguera B, Wilson ML, Ingles SA, Lockwood CJ, Hakonarson H, McElrath TF, et al. Risk of pre-eclampsia in patients with a maternal genetic predisposition to common medical conditions: a case-control study. BJOG. 2021;128:55–65. doi: 10.1111/1471-0528.16441
  • 14. Gray KJ, Kovacheva VP, Mirzakhani H, Bjonnes AC, Almoguera B, DeWan AT, Triche EW, Saftlas AF, Hoh J, Bodian DL, et al. Gene-Centric Analysis of Preeclampsia Identifies Maternal Association at PLEKHG1. Hypertension. 2018;72:408–416. doi: 10.1161/HYPERTENSIONAHA.117.10688
  • 15. Steinthorsdottir V, McGinnis R, Williams NO, Stefansdottir L, Thorleifsson G, Shooter S, Fadista J, Sigurdsson JK, Auro KM, Berezina G, et al. Genetic predisposition to hypertension is associated with preeclampsia in European and Central Asian women. Nat Commun. 2020;11:5976. doi: 10.1038/s41467-020-19733-6
  • 16. Honigberg MC, Chaffin M, Aragam K, Bhatt DL, Wood MJ, Sarma AA, Scott NS, Peloso GM, Natarajan P. Genetic Variation in Cardiometabolic Traits and Medication Targets and the Risk of Hypertensive Disorders of Pregnancy. Circulation. 2020;142:711–713. doi: 10.1161/CIRCULATIONAHA.120.047936
  • 17. Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, Ntritsos G, Dimou N, Cabrera CP, Karaman I, et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat Genet. 2018;50:1412–1425. doi: 10.1038/s41588-018-0205-x
  • 18. Kivioja A, Toivonen E, Tyrmi J, Ruotsalainen S, Ripatti S, Huhtala H, Jaaskelainen T, Heinonen S, Kajantie E, Kere J, et al. Increased Risk of Preeclampsia in Women With a Genetic Predisposition to Elevated Blood Pressure. Hypertension. 2022;79:2008–2015. doi: 10.1161/HYPERTENSIONAHA.122.18996
  • 19. Giontella A, Sjogren M, Lotta LA, Overton JD, Baras A, Regeneron Genetics C, Minuz P, Fava C, Melander O. Clinical Evaluation of the Polygenetic Background of Blood Pressure in the Population-Based Setting. Hypertension. 2021;77:169–177. doi: 10.1161/HYPERTENSIONAHA.120.15449
  • 20. Cohen RY, Kovacheva VP. A Methodology for a Scalable, Collaborative, and Resource-Efficient Platform, MERLIN, to Facilitate Healthcare AI Research. IEEE J Biomed Health Inform. 2023;27:3014–3025. doi: 10.1109/JBHI.2023.3259395
  • 21. Gainer VS, Cagan A, Castro VM, Duey S, Ghosh B, Goodson AP, Goryachev S, Metta R, Wang TD, Wattanasin N, et al. The Biobank Portal for Partners Personalized Medicine: A Query Tool for Working with Consented Biobank Samples, Genotypes, and Phenotypes Using i2b2. J Pers Med. 2016;6. doi: 10.3390/jpm6010011
  • 22. Vaura F, Kauko A, Suvila K, Havulinna AS, Mars N, Salomaa V, FinnGen, Cheng S, Niiranen T. Polygenic Risk Scores Predict Hypertension Onset and Cardiovascular Risk. Hypertension. 2021;77:1119–1127. doi: 10.1161/HYPERTENSIONAHA.120.16471
  • 23. Nurkkala J, Kauko A, FinnGen, Laivuori H, Saarela T, Tyrmi JS, Vaura F, Cheng S, Bello NA, Aittokallio J, et al. Associations of polygenic risk scores for preeclampsia and blood pressure with hypertensive disorders of pregnancy. J Hypertens. 2023;41:380–387. doi: 10.1097/HJH.0000000000003336
  • 24. Honigberg MC, Truong B, Khan RR, Xiao B, Bhatta L, Vy HMT, Guerrero RF, Schuermans A, Selvaraj MS, Patel AP, et al. Polygenic prediction of preeclampsia and gestational hypertension. Nat Med. 2023;29:1540–1549. doi: 10.1038/s41591-023-02374-9
  • 25. Ge T, Chen CY, Ni Y, Feng YA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun. 2019;10:1776. doi: 10.1038/s41467-019-09718-5
  • 26. Sandstrom A, Snowden JM, Bottai M, Stephansson O, Wikstrom AK. Routinely collected antenatal data for longitudinal prediction of preeclampsia in nulliparous women: a population-based study. Sci Rep. 2021;11:17973. doi: 10.1038/s41598-021-97465-3
  • 27. Jhee JH, Lee S, Park Y, Lee SE, Kim YA, Kang SW, Kwon JY, Park JT. Prediction model development of late-onset preeclampsia using machine learning-based methods. PLoS One. 2019;14:e0221202. doi: 10.1371/journal.pone.0221202
  • 28. Wu J, Li Y, Ma Y. Comparison of XGBoost and the Neural Network model on the class-balanced datasets. Paper/Poster presented at: 2021 IEEE 3rd International Conference on Frontiers Technology of Information and Computer (ICFTIC); 12–14 Nov. 2021.
  • 29. SKLearn Documentation. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedGroupKFold.html. 2022. Accessed Dec 6, 2022.
  • 30. North RA, McCowan LM, Dekker GA, Poston L, Chan EH, Stewart AW, Black MA, Taylor RS, Walker JJ, Baker PN, et al. Clinical risk prediction for pre-eclampsia in nulliparous women: development of model in international prospective cohort. BMJ. 2011;342:d1875. doi: 10.1136/bmj.d1875
  • 31. Schmidt LJ, Rieger O, Neznansky M, Hackeloer M, Droge LA, Henrich W, Higgins D, Verlohren S. A machine-learning-based algorithm improves prediction of preeclampsia-associated adverse outcomes. Am J Obstet Gynecol. 2022;227:77 e71–77 e30. doi: 10.1016/j.ajog.2022.01.026
  • 32. Wright D, Tan MY, O’Gorman N, Poon LC, Syngelaki A, Wright A, Nicolaides KH. Predictive performance of the competing risk model in screening for preeclampsia. Am J Obstet Gynecol. 2019;220:199 e191–199 e113. doi: 10.1016/j.ajog.2018.11.1087
  • 33. Park FJ, Leung CH, Poon LC, Williams PF, Rothwell SJ, Hyett JA. Clinical evaluation of a first trimester algorithm predicting the risk of hypertensive disease of pregnancy. Aust N Z J Obstet Gynaecol. 2013;53:532–539. doi: 10.1111/ajo.12126
  • 34. Chappell LC, Cluver CA, Kingdom J, Tong S. Pre-eclampsia. Lancet. 2021;398:341–354. doi: 10.1016/S0140-6736(20)32335-7
  • 35. Thadhani R, Lemoine E, Rana S, Costantine MM, Calsavara VF, Boggess K, Wylie BJ, Simas TAM, Louis JM, Espinoza J, et al. Circulating Angiogenic Factor Levels in Hypertensive Disorders of Pregnancy. NEJM Evidence. 2022;1:EVIDoa2200161. doi: 10.1056/EVIDoa2200161
  • 36. Benton SJ, Leavey K, Grynspan D, Cox BJ, Bainbridge SA. The clinical heterogeneity of preeclampsia is related to both placental gene expression and placental histopathology. Am J Obstet Gynecol. 2018;219:604 e601–604 e625. doi: 10.1016/j.ajog.2018.09.036
  • 37. Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, Periyasamy S, Lam M, Iyegbe C, Strawbridge RJ, Brick L, et al. Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell. 2019;179:589–603. doi: 10.1016/j.cell.2019.08.051
