Abstract
Maternal morbidity and mortality continue to rise, and pre-eclampsia is a major driver of this burden1. Yet the ability to assess underlying pathophysiology before clinical presentation to enable identification of pregnancies at risk remains elusive. Here we demonstrate the ability of plasma cell-free RNA (cfRNA) to reveal patterns of normal pregnancy progression and determine the risk of developing pre-eclampsia months before clinical presentation. Our results centre on comprehensive transcriptome data from eight independent prospectively collected cohorts comprising 1,840 racially diverse pregnancies and retrospective analysis of 2,539 banked plasma samples. The pre-eclampsia data include 524 samples (72 cases and 452 non-cases) from two diverse independent cohorts collected 14.5 weeks (s.d., 4.5 weeks) before delivery. We show that cfRNA signatures from a single blood draw can track pregnancy progression at the placental, maternal and fetal levels and can robustly predict pre-eclampsia, with a sensitivity of 75% and a positive predictive value of 32.3% (s.d., 3%), which is superior to the state-of-the-art method2. cfRNA signatures of normal pregnancy progression and pre-eclampsia are independent of clinical factors, such as maternal age, body mass index and race, which cumulatively account for less than 1% of model variance. Further, the cfRNA signature for pre-eclampsia contains gene features linked to biological processes implicated in the underlying pathophysiology of pre-eclampsia.
Subject terms: Gene expression, Predictive markers
Expression signatures from cell-free RNA of pregnant women can be used to reveal normal biology of pregnancy and predict development of pre-eclampsia.
Main
The period from conception to delivery represents the most rapid growth and development in an individual’s life. The ability to support this development requires dramatic and poorly understood alterations in maternal physiology. Research into human pregnancy has clear ethical constraints, and the unique character of human gestation has limited deeper understanding of the physiology and pathophysiology of pregnancy3. Haemochorial placentation is found among many mammalian species; however, in humans, it involves a unique degree of trophoblastic invasion4,5, and because pre-eclampsia occurs predominantly in humans, conventional animal models are of limited value6,7. Pre-eclampsia, a condition marked by maternal endothelial dysfunction and associated new-onset maternal hypertension, complicates up to 1 in 12 pregnancies and is a significant cause of maternal morbidity and higher lifetime risk of cardiovascular disease1.
Here we demonstrate the ability of cfRNA transcripts to establish the normative responses of both maternal and fetal tissues characteristic of normal pregnancy progression. By implication, deviation from normative cfRNA expression patterns should allow the prediction of impending pathology before its presentation. We demonstrate the use of cfRNA to characterize women at risk of pre-eclampsia months before diagnosis. Notably, the cfRNA profiles identify risk solely through molecular mechanisms common to pre-eclampsia and are therefore exclusive of clinical variables such as race, body mass index (BMI), maternal comorbidities and/or obstetrical history.
In this study, we gather the largest and most diverse dataset of maternal transcriptomes to date. Samples were drawn from eight prospectively collected cohorts that provided n = 2,539 plasma samples from n = 1,840 pregnancies for women of multiple ethnicities, nationalities, geographic locations and socioeconomic contexts, while covering a range of gestational ages (Fig. 1a). The broad sociodemographic spectrum of our data (Table 1 and Supplementary Table 1) enabled us to test the applicability of maternal transcriptomes at one gestational time point. A detailed description of each cohort and the methodology is available in the Supplementary Information.
Table 1. Blood draw and pregnancy count, breakdown of ethnicity and race, and clinical factors.
Cohort | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Blood draws (n) | 201 | 385 | 69 | 186 | 353 | 793 | 140 | 412 |
Pregnancies (n) | 197 | 219 | 68 | 186 | 352 | 592 | 120 | 106 |
% Asian | 10.7 | 10.0 | 1.5 | 10.2 | 0.0 | 0.5 | 0.0 | 0.0 |
% Black | 18.3 | 4.6 | 0.0 | 25.3 | 45.2 | 48.5 | 100.0 | 0.0 |
% Hispanic | 24.4 | 17.8 | 14.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
% White | 40.1 | 56.6 | 83.8 | 61.3 | 54.8 | 44.3 | 0.0 | 100.0 |
% Unknown or multiracial | 6.6 | 11.0 | 0.0 | 3.2 | 0.0 | 6.8 | 0.0 | 0.0 |
Gestational age at blood draw (weeks) | 12.0–27.9 | 5.6–38.2 | 8.9–28.1 | 12.2–23.8 | 16.9–26.8 | 4.9–40.2 | 8.0–38.7 | 11.4–34.8 |
BMI (kg m−2)* | 28.1 ± 7.4 | 26.9 ± 6.2 | 33.3 ± 9.0 | 26.4 ± 6.2 | 28.6 ± 8.2 | 28.9 ± 7.6 | 24.5 ± 5.1 | 25.4 ± 6.1 |
Maternal age (years)* | 32.4 ± 5.7 | 30.1 ± 5.1 | 29.8 ± 5.2 | 32.7 ± 5.4 | 26.5 ± 5.7 | 24.0 ± 4.5 | 28.8 ± 6.3 | 30.5 ± 4.7 |
*Variation shown as s.d.
RNA signal independent of clinical factors
Ultrasound-based gestational age has long been used as a surrogate measure of pregnancy progression. Here, we show that a cfRNA signature is as accurate a measure of gestational age while also providing insights into the biology of pregnancy progression. As a first step to develop a machine learning model, we divided our data from all full-term pregnancies without complications into a training set (n = 1,908 samples) and a test set (n = 474 samples), stratified by gestational age so that all age strata were represented proportionally. Before modelling, we standardized the means of gene counts across all cohorts (Methods and Extended Data Fig. 5). A Lasso linear model was fitted to predict gestational age in the training set, with a test set performance of a mean absolute error of 14.7 days (Fig. 1b, Extended Data Fig. 6 and Supplementary Data 1), referencing to first-trimester fetal ultrasound biometry. Overall, the error of our model is equivalent to that of second-trimester ultrasound and superior to that with third-trimester ultrasound8, and could provide an alternative dating procedure for women who start prenatal care later in pregnancy.
Next, we explored whether inclusion of clinical variables altered model performance. By analysis of variance (ANOVA), we showed that the model was driven almost entirely by information from the cfRNA transcripts, with BMI, maternal age and race accounting for less than 1% of variance (Fig. 1c). Rebuilding the gestational age model including maternal race, BMI and age provided no improvement in accuracy (0.07 days, not significant by bootstrap test).
Fetal signatures in maternal circulation
As the cfRNA signatures for gestational age demonstrated a dynamic change in transcripts as pregnancy progresses, we then explored whether transcripts found in the maternal circulation during pregnancy could be linked to their tissue of origin. Specifically, we sought to ascertain whether the molecular status of the placenta, fetal organs and/or maternal tissues (cervix and/or uterus) could be assessed by examining cfRNA profiles. While fetal cells are known to pass into the maternal circulation9,10, individual transcripts from the fetus or fetal cell types are relatively rare in maternal plasma; thus, we investigated these signals by analysing gene sets from Gene Ontology11 or the Molecular Signatures Database12,13. Using longitudinal data from cohort H covering 93 women sampled four times during pregnancy (Supplementary Information), we first confirmed that we could identify pregnancy-related sets such as those for gonadotropin and oestrogen pathways (Extended Data Fig. 1) and that the signal from the gestational age model increased with gestational age as did signal from the placenta (Fig. 2a, b and Methods). We show that hundreds of independently identified gene sets in maternal blood mirror the maternal and fetal physiological changes expected during pregnancy. Specifically, using single-cell RNA-seq data from adult and fetal organs (Supplementary Table 2), we were able to confirm changes in fetal gene sets, including those involved in fetal heart development, in maternal blood (Fig. 2c). Furthermore, the cfRNA profiles reflect expected changes in maternal tissues, such as the uterus and cervix, with progressively increasing expression of collagen and extracellular matrix gene sets14 (Fig. 2d). Extended Data Fig. 2 shows additional examples of fetal gene sets, including those of nephron progenitor cells, for which expression becomes less abundant with gestational age in accordance with a decrease in the nephrogenic zone width15,16, and those in the gastrointestinal tract, where the oesophagus develops early with associated gene expression decreasing later, whereas gene expression associated with the small intestine shows a steady increase17.
To test whether the identified gene sets were uniquely associated with pregnancy progression, we next compared the observed gestational age collection time labels to a set of randomly permuted collection time labels. This comparison verified that all selected gene sets were associated with pregnancy progression (Extended Data Fig. 3). The directional signals could be confirmed in three independent cohorts (n = 351 women) for which longitudinal data were available (Fig. 2e–h). In all cases, the slopes for the gestational age coefficients were significantly different from 0 at the 0.05 significance level. In total, we tested 793 gene sets from single-cell analyses12,13, comprising 384 gene sets from adult and 409 gene sets from fetal tissues. Of these, 129 gene sets (55 fetal) were significantly correlated with gestational age, of which 99 gene sets (40 fetal) showed increased signal and 30 gene sets (15 fetal) showed decreased signal as a function of gestational age at collection in cohort H, and were confirmed in at least two other cohorts with longitudinally sampled individuals (Supplementary Data 2). As changes in these predefined gene sets were only significant in the context of gestational age across at least three cohorts with longitudinal information, we present here a non-invasive window into maternal–fetal development from a maternal blood sample.
Early prediction of pre-eclampsia
Having established that cfRNA profiles can reveal and characterize molecular changes in the maternal–placental–fetal unit over gestation, we reasoned that disruption of these pathways might identify women at risk for adverse pregnancy outcomes such as pre-eclampsia.
We evaluated the ability of cfRNA signatures in maternal blood, during the second trimester (16–27 weeks), to predict the development of pre-eclampsia. Maternal blood draws occurred, on average, 14.5 weeks (s.d., 4.5 weeks) before delivery (Fig. 3a); in contrast to work by Munchel et al.18, in which plasma was collected at the time of diagnosis, the blood draws in our analysis were taken at gestational ages at which women were asymptomatic. A case–control study with 72 cases of pre-eclampsia and 452 non-cases selected from two independent cohorts (cohorts A and E) was performed (Supplementary Information). Cohort E included 31 controls with chronic hypertension and 19 controls with gestational hypertension, and both cohorts included spontaneous preterm birth samples along with the normotensive term controls. Pre-eclampsia was defined by criteria consistent with those from the 2013 Task Force on Hypertension in Pregnancy (ACOG 2013), and each case was adjudicated by two board-certified physicians. As before, a cohort correction was applied before modelling.
Two-sided Spearman correlation tests identified signatures that separated the cases and controls; in each round of cross-validation, we retained features with an adjusted P value below 0.05 (Methods) and consistently identified seven genes: CLDN7, PAPPA2, SNORD14A, PLEKHH1, MAGEA10, TLE6 and FABP1 (Fig. 3b).
Four of the genes selected for modelling have functions relevant to pre-eclampsia or placental development. PAPPA2, encoding pregnancy-associated plasma protein 2, is expressed in the placenta19, specifically in trophoblast cells. It has previously been linked to the development of pre-eclampsia and has been associated with inhibition of trophoblast migration, invasion and tube formation20,21. Claudin 7 (CLDN7) is involved in tight cell junction formation and blastocyst implantation; in healthy pregnancies, expression of CLDN7 is reduced in response to oestrogen at the time of implantation22,23. Similarly, TLE6 has also been linked to preimplantation and early embryonic lethality24. Fatty acid-binding protein 1 (FABP1) was first purified from human cytotrophoblasts and is known to be highly expressed in the fetal liver; it is critical for fatty acid uptake and transport25 and is upregulated threefold when cytotrophoblasts differentiate to syncytiotrophoblasts at implantation26. The other three genes that make up the pre-eclampsia cfRNA signature (SNORD14A, PLEKHH1 and MAGEA10) have been associated with pre-eclampsia through bioinformatic analyses, although their function is less well understood27,28. Two of the identified genes, PAPPA2 and FABP1, were also identified in the gestational age model and highlight the imbalance in cfRNA signatures between pregnancy progression and pathology.
On the basis of these identified gene features, a logistic regression model in a leave-one-out cross-validation set-up was used to estimate the probability of pre-eclampsia. This model framework was chosen on the basis of learning curve analyses (Methods and Extended Data Fig. 7). At a sensitivity of 75%, our cfRNA model achieved a positive predictive value (PPV) of 32.3% (s.d., 3%) given a prevalence of pre-eclampsia of 13.7% in our study, superior to PPVs reported from current clinical state-of-the-art models, which are driven largely by maternal factors2; the area under the curve (AUC) for the model was 0.82 (95% confidence interval, ±0.06; Fig. 3c). Consistent with our findings with the gestational age model, inclusion of clinical variables (maternal BMI, age and race) had no effect on performance, as the classifier assigns zero weight to these clinical variables and they explain <1% of the variance based on ANOVA. The lack of contribution to cfRNA profiles from clinical factors highlights the generalizability of these profiles to diverse populations.
When comparing gestational age at delivery between test-positive and test-negative individuals, a significant shift was found in the timing of delivery, with the test-positive population delivering earlier during gestation (P < 2 × 10^−7; Fig. 3d). A positive test correctly identified 73% of individuals destined to have a medically indicated preterm birth over 3 months in advance of the onset of clinical symptoms or delivery.
To further understand molecular signature changes and how they might reflect the pathophysiology driving pre-eclampsia, we performed pathway analysis. The top upregulated pathways were dominated by structural cell functions, including placental blood vessel development, artery morphogenesis and embryonic placental development (Extended Data Fig. 4a), while the majority of downregulated pathways were related to immune pathways (Extended Data Fig. 4b). Both the upregulated and downregulated gene sets aligned with the accepted mechanism of pathogenesis for pre-eclampsia29.
In cohort E, the non-case group contained both normotensive women (n = 263) and women with chronic (n = 31) or gestational (n = 19) hypertension. Genes identified through comparison of the groups with chronic or gestational hypertension with the normotensive group showed no overlap with genes significant for pre-eclampsia (two-sided Spearman correlation test, P < 0.05). Additionally, no genes were differentially expressed in the chronic or gestational hypertensive groups when compared with the normotensive group. While others have published studies designed to determine the effect of hypertension more generally on gene expression (e.g., Zeller et al.30), here, we demonstrate that the signal for pre-eclampsia is specific to hypertension driven by a placental disorder and the signature is independent of signals associated with chronic hypertension. Clinically, it can be quite challenging to differentiate superimposed pre-eclampsia in women with pre-existing hypertension from exacerbation of baseline chronic hypertension. This difference is important, as one requires delivery for cure while the other usually does not.
As pre-eclampsia and spontaneous preterm birth are theorized to have some overlapping molecular pathways31,32, we tested whether excluding non-case samples with deliveries before gestational week 37 (n = 85) would affect test prediction. Removal of spontaneous preterm delivery samples did not alter the performance of the model (AUC = 0.79; 95% confidence interval, ±0.06), suggesting that inclusion of spontaneous preterm birth samples in the non-case group does not affect the pre-eclampsia classifier.
We report a standalone molecular predictor that has the potential to be an early detector of pre-eclampsia with a PPV of 32% that is based entirely on transcripts and is exclusive of clinical variables. This predictor contrasts with state-of-the-art methods, which are dependent on clinical factors and achieve a PPV of 4.4%2.
Discussion
While other studies have looked at circulating biomarkers, a recent comprehensive review33 concluded that more data early in pregnancy are needed to support clinical value. Here, we reveal the ability of cfRNA transcripts to provide comprehensive molecular profiles of pregnancy progression by including signals from the placenta and the fetus. We have shown that novel transcript signatures from a single blood sample can (1) accurately track pregnancy progression independently of clinical factors and (2) reliably identify women at risk of developing pre-eclampsia months before presentation of the disease. Given the large sample size and diversity in our study population, it is noteworthy that race has a negligible effect on the expression patterns of gestational age estimates and pre-eclampsia risk evaluation. These findings allow for the development of personalized assessments for pregnancy.
Equally important, our work allows for the assessment of maternal risk independently of clinical factors, such as race, that are fraught with bias. The inclusion of race in clinical assessments results in miscalculation of patient risk and underdiagnoses34–36. While we acknowledge that, within specific subpopulations, the prevalence of complications such as pre-eclampsia may be higher, the evaluation of cfRNA transcripts directly exposes the developing pathophysiology. Further research will be needed to identify drivers of the identified pathophysiological pathways; the focus on molecular mechanisms allows stratification of risk without the need for enrichment of ‘pretest’ probabilities based on maternal sociodemographic characteristics. Further, an understanding of the maternal–fetal–placental transcriptome also represents a vehicle by which comprehension of the biological underpinnings of maternal–fetal development can be improved and provides novel insights into interactions across the maternal–fetal dyad. This holds the promise of precision therapeutic interventions that can target molecular subtypes of pre-eclampsia and preterm birth.
Improvement in maternal outcomes has been limited by the inability to access pregnancy tissues and a lack of understanding of the specific molecular phenotypes that identify those at risk before onset of symptoms. Our findings can now be leveraged to more accurately provide information on future maternal and fetal health and disease. Thus, our approach opens new therapeutic windows to effectively decrease maternal and neonatal morbidity and mortality.
Methods
The Mirvie RNA technology
cfRNA isolation
Plasma samples received on dry ice from our collaborators were stored at –80 °C until further processing. Total circulating nucleic acid was extracted from plasma ranging in volume from ~215 µl to 1 ml, using a column-based commercially available extraction kit, following the manufacturer’s instructions (Plasma/Serum Circulating and Exosomal RNA purification kit, Norgen, 42800).
Following extraction, cfDNA was digested using Baseline-ZERO DNase (Epicentre) and the remaining cfRNA was purified using an RNA Clean and Concentrator-5 kit (Zymo, R1016) or an RNeasy MinElute Cleanup kit (Qiagen, 74204).
RT–qPCR assay
We performed quantitative PCR with reverse transcription (RT–qPCR) analysis to assess the relative amount of cfRNA extracted from each sample. We measured and compared the threshold cycle (Ct) values from each RNA sample using a three-colour multiplex qPCR assay from the TaqPath 1-Step Multiplex Master Mix kit (ThermoFisher Scientific, A28526) and a QuantStudio 5 system. We also measured the Ct values for an endogenous housekeeping gene (ACTB; ThermoFisher Scientific, 4351368).
cfRNA library preparation
cfRNA libraries were prepared using the SMARTer Stranded Total RNAseq-Pico Input Mammalian kit (Takara, 634418) following the manufacturer’s instructions, except that we did not use ribo depletion. Library quality was assessed by RT–qPCR following the method described for assessing RNA measurements and fragment analysis on a Fragment Analyzer 5300 (Agilent Technologies).
Enrichment and sequencing
Libraries were normalized before pooling for target capture. We used a SureSelect Target Enrichment kit (Agilent Technologies, 5190-8645) and followed the manufacturer’s instructions for hybrid capture. Samples were quantified, and 50-bp, paired-end sequencing was performed on a Novaseq S2. Between 96 and 144 samples were pooled and sequenced per sequencing run.
Analysis for outliers
qPCR of ACTB as well as MultiQC sequencing metrics were monitored to eliminate sample outliers before performing gene expression analyses. Individual samples more than 3 s.d. from the mean were removed as outliers. A total of 193 of 2,732 samples (7.1%) were removed following this filtering.
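The 3 s.d. filter can be sketched as follows. This is a minimal illustration, assuming a single quality metric per sample and the sample standard deviation; the function name `filter_outliers` and the example Ct values are hypothetical and are not taken from the study's code.

```python
from statistics import mean, stdev

def filter_outliers(values, n_sd=3.0):
    """Return indices of samples lying within n_sd standard deviations of the mean."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption of this sketch)
    if sd == 0:
        # No spread at all: nothing can be flagged as an outlier.
        return list(range(len(values)))
    return [i for i, v in enumerate(values) if abs(v - mu) <= n_sd * sd]

# Hypothetical ACTB Ct values: twenty typical samples and one extreme sample.
ct_values = [25.0] * 20 + [40.0]
kept = filter_outliers(ct_values)
```

In practice the same filter would be applied to each monitored metric in turn, and a sample failing any one of them would be dropped.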
Read processing
Reads were processed following a similar protocol to that reported in Ngo et al.37. Briefly, raw sequencing reads were trimmed using trimmomatic38 and then mapped to hg38 using the STAR aligner39. After removing duplicates using Picard tools, gene counts were generated with htseq40.
Cohort correction and feature normalization
For each gene, its relationship to total counts per sample was measured and corrected using linear model residuals. Extended Data Fig. 5a, b illustrates this correction for the gene ACTB.
We also sought to correct the genes such that each cohort had the same mean value for each gene. However, the cohorts came from different parts of the gestational age spectrum. Therefore, only cohort effects orthogonal to the gestational age effect were corrected. This is shown in Extended Data Fig. 5c, d for the gene CAPN6. Each cohort was given its own colour.
Cohort E (bright yellow) had unusually low counts for its gestational age range before correction, and this effect was removed by correction.
Using principal-component analysis (PCA) to compress the high-dimensional space of all genes, the correction could be seen to clarify the separation of samples by gestational age as indicated by the colour gradient (Extended Data Fig. 5e, f).
Linear correction algorithm
1. In the training data, remove the effect of the variable(s) of interest (e.g., gestational age) using linear model residuals.
2. In this corrected training data, learn the correction required for the variables to be corrected for (e.g., cohort).
3. Apply that learned correction model to the raw training and testing data; its residuals are the corrected data.
Note: the correction was learned entirely in the training data and the variable of interest in the testing data was never used, negating the possibility of a data leak.
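The three steps can be sketched for a single gene as follows. This is a simplified illustration, assuming one variable of interest (gestational age) and one nuisance variable (cohort) modelled as a per-cohort offset; the function names and the toy data are hypothetical and do not come from the mirmodels package.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def learn_cohort_effects(train_expr, train_ga, train_cohort):
    # Step 1: remove the gestational age effect in the training data.
    intercept, slope = fit_line(train_ga, train_expr)
    resid = [y - (intercept + slope * g) for y, g in zip(train_expr, train_ga)]
    # Step 2: learn per-cohort offsets from the age-corrected residuals.
    return {
        c: mean(r for r, ci in zip(resid, train_cohort) if ci == c)
        for c in set(train_cohort)
    }

def apply_correction(expr, cohort, effects):
    # Step 3: subtract the learned offset from the raw values.
    # Gestational age is never used here, so test labels cannot leak.
    return [y - effects[c] for y, c in zip(expr, cohort)]

# Toy data: expression = 2 * gestational age, plus +5 in cohort A and -5 in cohort B.
ga = [10, 20, 30, 10, 20, 30]
cohort = ["A", "A", "A", "B", "B", "B"]
expr = [25, 45, 65, 15, 35, 55]
effects = learn_cohort_effects(expr, ga, cohort)
corrected_train = apply_correction(expr, cohort, effects)
corrected_test = apply_correction([47], ["A"], effects)
```

Because the offsets are estimated only after the gestational age trend has been removed, a cohort sampled mostly late in pregnancy is not penalized for its genuinely higher or lower expression.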
Lasso linear model for gestational age prediction and ANOVA
The Lasso model used in the gestational age model had its parameters chosen via 10-fold cross-validation in the training set. The largest cross-validation score within one standard error of the best cross-validation score was chosen (Breiman strategy). We limited our feature space by excluding pseudogenes and non-coding genes and by retaining only genes with median expression greater than zero, leaving a total of 13,208 features to evaluate. A final Lasso model with the selected penalty was then trained on the whole training set and evaluated in the test set. This was all done with the glmnet R package using the cv.glmnet() function.
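The selection rule described above resembles the one-standard-error rule, which cv.glmnet() reports as lambda.1se; whether the authors used exactly that value is an assumption here. The sketch below shows the rule on hypothetical cross-validation results, with cross-validation error (lower is better) standing in for the score.

```python
def one_se_rule(lambdas, cv_mean, cv_se):
    """Pick the largest penalty whose mean CV error is within one
    standard error of the lowest mean CV error."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lambdas[i] for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    return max(eligible)

# Hypothetical penalties with mean absolute errors (days) and standard errors.
lambdas = [0.01, 0.1, 1.0, 10.0]
cv_mean = [15.0, 14.5, 14.9, 18.0]
cv_se = [0.5, 0.5, 0.5, 0.5]
chosen = one_se_rule(lambdas, cv_mean, cv_se)
```

Choosing the largest eligible penalty favours a sparser model than the error-minimizing one, at a cost in error that is within the noise of the cross-validation estimate.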
The model uses 674 of the available gene features (Supplementary Data 1), although this includes a long tail of features with low contribution. We tested performance for the 50 most informative features from the model and obtained a mean absolute error of 15.4 days. The continued reduction in error as we reached our complete training set of n = 1,908 samples indicated that model learning was not exhausted and that additional samples would have increased performance (Extended Data Fig. 6). Notably, as seen in Extended Data Fig. 6, the similar performance in cross-validation and on the independent held-out test data indicated that the model was not overfit with the 674 gene features. To determine how far the model could be extrapolated, a final model was built using all data; this gave a mean absolute error of 13 days across the entire dataset.
Gestational age learning curve
The main gestational age modelling was done with an 80/20 train/test split. To assess model performance after decreasing amounts of training data, one can repeat analyses with 70/30 splits, 60/40 splits and so on (doing so repeatedly with different random splits to quantify uncertainty). In this way, one builds a learning curve (Extended Data Fig. 6) with different training set sizes on the x axis and model performance on the y axis.
Gestational age model without cohort correction
For this approach, we selected all samples from healthy pregnancies and split the dataset into a training set (80% of data) and a test set (20% of data), in which samples were stratified by cohort. Samples that did not pass quality-control filtering based on basic sequencing metrics had been previously excluded from analysis. We trained a Lasso model to predict the gestational age at collection for each sample using the mean absolute error as an optimization metric and 10-fold cross-validation in the training set. We used all genes with mean log2(counts per million (CPM) + 1) > 1 (12,921 genes) plus a set of sequencing metrics as features for training. Modelling was performed in log2(CPM + 1) space, and all data were centred and scaled before modelling using the training set statistics. This led to a model with a mean absolute error of 15.9 days in the withheld test set using 487 transcriptomic features. We then selected the top 53 features of this model and retrained the Lasso using the same approach described above, achieving a mean absolute error of 16.6 days in the held-out test set.
Gene set enrichment analysis
Gene set enrichment analysis (GSEA)11,41 was done with the fast GSEA algorithm42 using Bioconductor’s fgsea package43. Gene sets were compiled from the Molecular Signatures Database (MSigDB)11,12 using the CRAN msigdbr v7.2 API and directly from c8.all.v7.3.symbols.gmt. We focused on two collections of gene sets: the Gene Ontology (GO) subcollection of the ontology gene sets, C5:GO, and the cell type signature gene sets, C8 v7.3. Genes were ranked on the basis of their shrunken log-transformed fold change values and associated Wald test P values obtained from analysis of differential expression using Bioconductor’s DESeq2 (ref. 44), represented as –log10(P value) × shrunkenLFC. GSEA was carried out on 372 samples from cohort H collected from 93 women with healthy pregnancies over four draw intervals during pregnancy: 11.4–14 weeks, 18–21 weeks, 22.8–27.8 weeks and 29.2–34.8 weeks. Shrunken log-transformed fold change values and corresponding P values were obtained from all six pairwise contrasts between the four draws. We used 102 fetal gene sets that were significantly enriched (Benjamini–Hochberg adjusted P < 0.01) in at least one pairwise comparison (Supplementary Table 2) in downstream analyses, including analysis of plasma transcriptome partitioning and set-specific longitudinal trends.
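The ranking statistic, –log10(P value) × shrunkenLFC, can be computed as in the sketch below. The gene labels and values are hypothetical, and the functions are illustrative helpers rather than part of fgsea or DESeq2.

```python
import math

def rank_metric(shrunken_lfc, p_value):
    """Signed ranking statistic: -log10(P) scaled by the shrunken log fold change."""
    # Note: p_value must be > 0; a P value of exactly zero would need special handling.
    return -math.log10(p_value) * shrunken_lfc

def rank_genes(stats):
    """stats maps gene -> (shrunken_lfc, p_value); returns genes from highest to lowest metric."""
    return sorted(stats, key=lambda g: rank_metric(*stats[g]), reverse=True)

# Hypothetical differential expression results for three genes.
stats = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (-1.0, 0.01),
    "GENE_C": (0.5, 0.1),
}
ranked = rank_genes(stats)
```

The sign of the fold change sets the direction and the P value sets the magnitude, so strongly and significantly upregulated genes sit at the top of the list and strongly downregulated genes at the bottom.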
Using a GO collection of gene sets, we validated our approach and identified seven pregnancy-related sets that were significantly enriched in the comparison between early- and late-pregnancy samples (Extended Data Fig. 1). Three gene sets in the gonadotropin and oestrogen pathways exhibited significant changes consistent with known physiology45.
Evaluating changes in plasma transcriptome partitioning
The plasma transcriptome can be phenomenologically viewed as being partitioned into characteristic sets of genes. We assessed this partitioning in each cfRNA sample by converting raw gene counts to CPM and summing CPM over all genes in each of the sets. The resulting cumulative CPM score, which is a relative measure of the abundance of each gene set in the overall transcriptome, was used to directly compare gene sets across collection time points. Cumulative CPM scores for all gene sets significantly enriched between collections 1 and 4 were calculated for every cfRNA sample. The scores for each sample were regressed onto the recorded gestational age (in weeks) using a linear model. Gene sets with an adjusted P value for the gestational age coefficient <0.01 were considered as having a significant (positive or negative) trend in their relative abundance. The association of these trends with the time component in the data was further verified by scrambling the temporal structure and re-examining the trends along the original time variable. For each mother, we also evaluated the monotonicity of the cumulative CPM score function along the collection times. Because there are 24 possible permutations of order for the four collection times and only one of those permutations allows for a monotonic upward trend (with one for a downward trend), we were able to analytically assess the significance of the observed number of monotonic trends among 93 mothers using a chi-squared test.
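The cumulative CPM score and the monotonicity check can be sketched as follows. This is a minimal illustration with hypothetical gene names and counts; the expected count under the null assumes that each of the 24 orderings of four distinct scores is equally likely, as described above.

```python
def cumulative_cpm(counts, gene_set):
    """Sum of counts per million over the genes in gene_set for one sample."""
    total = sum(counts.values())
    return sum(c * 1e6 / total for g, c in counts.items() if g in gene_set)

def trend(scores):
    """Classify scores ordered by collection time as 'up', 'down' or 'none'."""
    pairs = list(zip(scores, scores[1:]))
    if all(a < b for a, b in pairs):
        return "up"
    if all(a > b for a, b in pairs):
        return "down"
    return "none"

def expected_monotonic(n_mothers, n_orderings=24):
    """Expected number of mothers showing one given monotonic direction by chance."""
    return n_mothers / n_orderings

# Hypothetical raw counts for one sample and a two-gene set.
counts = {"g1": 10, "g2": 30, "g3": 60}
score = cumulative_cpm(counts, {"g1", "g2"})
null_expectation = expected_monotonic(93)
```

The observed numbers of upward and downward trends among the 93 mothers can then be compared with this chance expectation in a chi-squared test.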
Pre-eclampsia analysis and learning curve
Confidence intervals (CIs) for AUC, sensitivity, specificity and PPV were all estimated by bootstrapping. PPV was calculated as PPV = (sensitivity × prevalence)/((sensitivity × prevalence) + ((1 – specificity) × (1 – prevalence))).
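The PPV formula translates directly into code. In the sketch below, the sensitivity and prevalence are the values reported in the main text, whereas the specificity of 0.75 is a hypothetical input chosen for illustration and is not a value stated in this paper.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sensitivity and prevalence as reported; specificity is an illustrative assumption.
example = ppv(sensitivity=0.75, specificity=0.75, prevalence=0.137)
```

Writing PPV as a function of prevalence makes explicit that the same classifier would yield a different PPV in a population with a different rate of pre-eclampsia.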
To build the learning curve (Extended Data Fig. 7), we increased the size of the training set going from two- to ninefold cross-validation with a constant model: logistic regression with gene features chosen by Spearman correlation tests with an adjusted P-value threshold of 0.05. The point on the right connected to the learning curve via a dashed line is the leave-one-out cross-validation result shown in the main text.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-021-04249-w.
Acknowledgements
We thank all women who donated blood samples and made this study possible. This research was conducted using specimens and data collected, stored and managed by INSIGHT, LIFECODES, The Women’s Health Tissue, Pregnancy Outcomes and Community Health (POUCH), Prenatal Exposures and Preeclampsia Prevention (PEPP), Global Alliance to Prevent Prematurity and Stillbirth (GAPPS), Pemba Pregnancy and Newborn Discovery Cohort (PPNDC) and Roskilde biorepositories. We thank the Precia Group for introducing and coordinating with key study collaborators. Samples from the INSIGHT study were collected with support from Tommy’s Charity (no. 1060508), the National Institute for Health Research (NIHR) Biomedical Research Centre (BRC) based at Guy’s and St Thomas’ National Health Service Foundation Trust, the Rosetrees Trust (charity no. 298582) (M303-CD1) and an NIHR Doctoral Research Fellowship (DRF-2013-06-171) to N.L. Hezelgrave. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. Research reported in this publication was supported by UI BioShare, the enterprise biospecimen management system supported by the University of Iowa’s Carver College of Medicine, Holden Comprehensive Cancer Center and Institute for Clinical and Translational Science.
Author contributions
M. Rasmussen, M.L., E.N., M.J., S.R.Q. and T.F.M. conceptualized and designed the study with input from the remaining authors. N.M.S., D.E.C., L.E., J.D.M., A.D., C.B.-G., M.K.S., S.D., S.M. Ame, S.M. Ali, M.A., D.J.G.-B., L.S., J.A.L., D.A.S., S.S., R.M.T., J.M.R., E.H., C.H. and T.F.M. provided samples and data to the study, curated the collection and obtained approvals for use in this study where required. M. Rasmussen, M. Reddy, J.C.-S. and E.N. designed laboratory protocols; all laboratory experiments were carried out by M. Reddy, T.B., M.T. and J.L. M. Rasmussen, R.N., J.C.-S., A.K., F.S., E.P.S.G., M.D., E.N. and S.R.Q. conceptualized computational analyses; R.N., J.C.-S., A.K., F.S. and E.P.S.G. implemented and reviewed code. M. Rasmussen, M.J., M.A.E. and T.F.M. drafted the manuscript with critical input from all authors.
Data availability
Data are available with a signed data use agreement to protect identifiable data; please contact research@mirvie.com.
Code availability
Code is available as three packages in the following repositories: mirmisc, 10.5281/zenodo.5604683; mirmodels, 10.5281/zenodo.5593282; and mirr, 10.5281/zenodo.5593280.
Competing interests
M. Rasmussen, M. Reddy, R.N., J.C.-S., A.K., T.B., F.S., M.T., E.P.S.G., J.L., M.L., E.N., M.J., M.A.E., M.D., S.R.Q. and T.M. have an equity interest in Mirvie. All cohort contributors were compensated for sample collection and/or shipping. T.M. serves on the scientific advisory board for Mirvie, NxPrenatal, Momenta Pharmaceuticals and Hoffmann–La Roche. M. Rasmussen, M. Reddy, R.N., J.C.-S., A.K., T.B., F.S., M.T., E.P.S.G., J.L., M.L., E.N., M.J., M.A.E., S.R.Q., M.K.S. and D.A.S. are inventors on patent applications (US20170145509A1, US9937182B2 and EP2954324A1) that cover the detection, diagnosis or treatment of pregnancy complications.
Footnotes
Peer review information Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Morten Rasmussen, Email: morten@mirvie.com
Michal A. Elovitz, Email: elovitz@pennmedicine.upenn.edu
Thomas F. McElrath, Email: tmcelrath@bwh.harvard.edu
Extended data is available for this paper at 10.1038/s41586-021-04249-w.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-021-04249-w.
References
- 1. Rich-Edwards JW, Fraser A, Lawlor DA, Catov JM. Pregnancy characteristics and women’s future cardiovascular health: an underused opportunity to improve women’s health? Epidemiol. Rev. 2014;36:57–70. doi: 10.1093/epirev/mxt006.
- 2. Tan MY, et al. Screening for pre-eclampsia by maternal factors and biomarkers at 11–13 weeks’ gestation: first-trimester PE screening. Ultrasound Obstet. Gynecol. 2018;52:186–195. doi: 10.1002/uog.19112.
- 3. Marinić M, Lynch VJ. Relaxed constraint and functional divergence of the progesterone receptor (PGR) in the human stem-lineage. PLoS Genet. 2020;16:e1008666. doi: 10.1371/journal.pgen.1008666.
- 4. Robillard P-Y, Dekker GA, Hulsey TC. Evolutionary adaptations to pre-eclampsia/eclampsia in humans: low fecundability rate, loss of oestrus, prohibitions of incest and systematic polyandry. Am. J. Reprod. Immunol. 2002;47:104–111. doi: 10.1034/j.1600-0897.2002.1o043.x.
- 5. McCarthy FP, Kingdom JC, Kenny LC, Walsh SK. Animal models of preeclampsia; uses and limitations. Placenta. 2011;32:413–419. doi: 10.1016/j.placenta.2011.03.010.
- 6. Chez RA. Nonhuman primate models of toxemia of pregnancy. Perspect. Nephrol. Hypertens. 1976;5:421–424.
- 7. Malassiné A, Frendo JL, Evain-Brion D. A comparison of placental development and endocrine functions between the human and mouse model. Hum. Reprod. Update. 2003;9:531–539. doi: 10.1093/humupd/dmg043.
- 8. Skupski DW, et al. Estimating gestational age from ultrasound fetal biometrics. Obstet Gynecol. 2017;130:433–441. doi: 10.1097/AOG.0000000000002137.
- 9. Khosrotehrani K, Johnson KL, Cha DH, Salomon RN, Bianchi DW. Transfer of fetal cells with multilineage potential to maternal tissue. JAMA. 2004;292:75–80. doi: 10.1001/jama.292.1.75.
- 10. Kahn DA, Baltimore D. Pregnancy induces a fetal antigen-specific maternal T regulatory cell response that contributes to tolerance. Proc. Natl Acad. Sci. USA. 2010;107:9299–9304. doi: 10.1073/pnas.1003909107.
- 11. Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 12. Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- 13. Liberzon A, et al. Molecular Signatures Database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–1740. doi: 10.1093/bioinformatics/btr260.
- 14. Shi J-W, et al. Collagen at the maternal-fetal interface in human pregnancy. Int. J. Biol. Sci. 2020;16:2220–2234. doi: 10.7150/ijbs.45586.
- 15. Menon R, et al. Single-cell analysis of progenitor cell dynamics and lineage specification in the human fetal kidney. Development. 2018;145:dev164038. doi: 10.1242/dev.164038.
- 16. Ryan D, et al. Development of the human fetal kidney from mid to late gestation in male and female infants. EBioMedicine. 2018;27:275–283. doi: 10.1016/j.ebiom.2017.12.016.
- 17. Gao S, et al. Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing. Nat. Cell Biol. 2018;20:721–734. doi: 10.1038/s41556-018-0105-4.
- 18. Munchel S, et al. Circulating transcripts in maternal blood reflect a molecular signature of early-onset preeclampsia. Sci. Transl. Med. 2020;12:eaaz0131. doi: 10.1126/scitranslmed.aaz0131.
- 19. Uhlén M, et al. Tissue-based map of the human proteome. Science. 2015;347:1260419. doi: 10.1126/science.1260419.
- 20. Kramer AW, Lamale-Smith LM, Winn VD. Differential expression of human placental PAPP-A2 over gestation and in preeclampsia. Placenta. 2016;37:19–25. doi: 10.1016/j.placenta.2015.11.004.
- 21. Chen X, et al. The potential role of pregnancy-associated plasma protein-A2 in angiogenesis and development of preeclampsia. Hypertens. Res. 2019;42:970–980. doi: 10.1038/s41440-019-0224-8.
- 22. Poon CE, Madawala RJ, Day ML, Murphy CR. Claudin 7 is reduced in uterine epithelial cells during early pregnancy in the rat. Histochem. Cell Biol. 2013;139:583–593. doi: 10.1007/s00418-012-1052-y.
- 23. Schumann S, Buck VU, Classen-Linke I, Wennemuth G, Grümmer R. Claudin-3, claudin-7, and claudin-10 show different distribution patterns during decidualization and trophoblast invasion in mouse and human. Histochem. Cell Biol. 2015;144:571–585. doi: 10.1007/s00418-015-1361-z.
- 24. Alazami AM, et al. TLE6 mutation causes the earliest known human embryonic lethality. Genome Biol. 2015;16:240. doi: 10.1186/s13059-015-0792-0.
- 25. Wang G, Bonkovsky HL, de Lemos A, Burczynski FJ. Recent insights into the biological functions of liver fatty acid binding protein 1. J. Lipid Res. 2020;56:2238–2247. doi: 10.1194/jlr.R056705.
- 26. Cunningham P, McDermott L. Long chain PUFA transport in human term placenta. J. Nutr. 2009;139:636–639. doi: 10.3945/jn.108.098608.
- 27. Ren, Z. et al. Distinct molecular processes in placentae involved in two major subtypes of preeclampsia. Preprint at bioRxiv 10.1101/787796 (2019).
- 28. Gormley M, et al. Preeclampsia: novel insights from global RNA profiling of trophoblast subpopulations. Am. J. Obstet. Gynecol. 2017;217:200.e1–200.e17. doi: 10.1016/j.ajog.2017.03.017.
- 29. Redman CW, Sargent IL. Latest advances in understanding preeclampsia. Science. 2005;308:1592–1594. doi: 10.1126/science.1111726.
- 30. Zeller T, et al. Transcriptome-wide analysis identifies novel associations with blood pressure. Hypertension. 2017;70:743–750. doi: 10.1161/HYPERTENSIONAHA.117.09458.
- 31. Challis JR, et al. Inflammation and pregnancy. Reprod. Sci. 2009;16:206–215. doi: 10.1177/1933719108329095.
- 32. Raghupathy R, Kalinka J. Cytokine imbalance in pregnancy complications and its modulation. Front. Biosci. 2008;13:985–994. doi: 10.2741/2737.
- 33. Carbone IF, et al. Circulating nucleic acids in maternal plasma and serum in pregnancy complications: are they really useful in clinical practice? A systematic review. Mol. Diagn. Ther. 2020;24:409–431. doi: 10.1007/s40291-020-00468-5.
- 34. Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N. Engl. J. Med. 2020;383:874–882. doi: 10.1056/NEJMms2004740.
- 35. Delgado C, et al. Reassessing the inclusion of race in diagnosing kidney diseases: an interim report from the NKF-ASN Task Force. J. Am. Soc. Nephrol. 2021;32:1305–1317. doi: 10.1681/ASN.2021010039.
- 36. Grobman, W. A. et al. Prediction of vaginal birth after cesarean delivery in term gestations: a calculator without race and ethnicity. Am. J. Obstet. Gynecol. 10.1016/j.ajog.2021.05.021 (2021).
- 37. Ngo TTM, et al. Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science. 2018;360:1133–1136. doi: 10.1126/science.aar3819.
- 38. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 39. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 40. Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
- 41. Mootha VK, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003;34:267–273. doi: 10.1038/ng1180.
- 42. Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv 10.1101/060012 (2016).
- 43. Cre, A. S. Fast gene set enrichment analysis. 10.18129/B9.BIOC.FGSEA (Bioconductor, 2017).
- 44. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 45. Tal, R. & Taylor, H. S. Endocrinology of pregnancy. Endotext www.endotext.org (MDText.com, 2021).