Author manuscript; available in PMC: 2023 Jan 30.
Published in final edited form as: Hypertension. 2010 Oct 1;56(4):741–749. doi: 10.1161/HYPERTENSIONAHA.110.157297

Robust Early Pregnancy Prediction of Later Preeclampsia Using Metabolomic Biomarkers

Louise C Kenny 1,✉,#, David I Broadhurst 2,#, Warwick Dunn 3, Marie Brown 4, Robyn A North 5, Lesley McCowan 6, Claire Roberts 7, Garth JS Cooper 8, Douglas B Kell 9, Philip N Baker, on behalf of the Screening for Pregnancy Endpoints Consortium 10
PMCID: PMC7614124  EMSID: EMS163201  PMID: 20837882

Abstract

Preeclampsia is a pregnancy-specific syndrome that causes substantial maternal and fetal morbidity and mortality. The etiology is incompletely understood, and there is no clinically useful screening test. Current metabolomic technologies have allowed the establishment of metabolic signatures of preeclampsia in early pregnancy. Here, a 2-phase discovery/validation metabolic profiling study was performed. In the discovery phase, a nested case-control study was designed, using samples obtained at 15±1 weeks’ gestation from 60 women who subsequently developed preeclampsia and 60 controls taking part in the prospective Screening for Pregnancy Endpoints cohort study. Controls were proportionally population matched for age, ethnicity, and body mass index at booking. Plasma samples were analyzed using ultra performance liquid chromatography-mass spectrometry. A multivariate predictive model combining 14 metabolites gave an odds ratio for developing preeclampsia of 36 (95% CI: 12 to 108), with an area under the receiver operator characteristic curve of 0.94. These findings were then validated using an independent case-control study on plasma obtained at 15±1 weeks from 39 women who subsequently developed preeclampsia and 40 similarly matched controls from a participating center in a different country. The same 14 metabolites produced an odds ratio of 23 (95% CI: 7 to 73) with an area under receiver operator characteristic curve of 0.92. The finding of a consistent discriminatory metabolite signature in early pregnancy plasma preceding the onset of preeclampsia offers insight into disease pathogenesis and offers the tantalizing promise of a robust presymptomatic screening test. (Hypertension. 2010;56:741-749.)

Keywords: preeclampsia, metabolomics, biomarkers, screening, hypertension


Preeclampsia (PE) affects 5% of nulliparous pregnancies and globally afflicts ≈4 million women annually. It remains a leading cause of maternal death throughout the world and is responsible for significant baby morbidity and mortality.1 Furthermore, PE has healthcare implications for the women later in life with an increased risk of hypertension, coronary artery disease, stroke, and type 2 diabetes mellitus.2

Although the precise etiology of the disease is unclear, accumulating evidence suggests that the disease results from complex interaction between a poorly perfused placenta, because of defective remodeling of the uteroplacental arteries in early pregnancy, and a maternal response to placental-derived triggers, which results in widespread vascular endothelial cell dysfunction.1,3,4

Widespread plasma alterations precede the clinical onset of PE, and there is intense interest in the identification of predictive biomarkers.5 Numerous candidate biomarkers have been proposed for prediction of disease, including placental hormones, angiogenic factors, and lipids.3,6–8 To date, none (nor any combination) has emerged with the requisite specificity and sensitivity to be of clinical use.5 Consequently, clinicians are unable to offer either targeted surveillance or potential preventative therapies to those nulliparous women at greatest risk.

Metabolic profiling is a powerful strategy for investigating the low molecular weight (bio)chemicals (metabolites) present in the metabolome of a cell, tissue, or organism.9 Its position as the final downstream product of gene expression enables the provision of a high-resolution multifactorial phenotypic signature of disease etiology, manifestation, or pathophysiology.10–12

We previously reported results of an anonymous metabolomic screen of plasma from women with established PE.13,14 Subsequently, we identified highly discriminatory metabolites that effectively distinguished cases with PE from matched controls. We, therefore, sought to take a similar metabolomics approach for the detection and development of predictive early pregnancy biomarkers for PE.

A significant issue limiting the discovery of biomarkers in general is the availability of adequate numbers of quality samples from patients with well-characterized phenotypes, where disease prevalence is low (≈5% in PE). This is particularly the case when searching for predictive biomarkers early in pregnancy at a time remote from disease presentation. In the present study, the women were participants in the multinational Screening for Pregnancy Endpoints (SCOPE) Study (www.scopestudy.net), a prospective cohort study of healthy nulliparous women. These samples are extremely well curated, accompanied by comprehensive metadata, and are well matched to avoid potential sources of bias.15 We performed 2 independent nested case-control studies within the SCOPE cohort, using samples for the discovery and validation phases from 2 different study centers. First, in a biomarker discovery study, plasma samples obtained at 15±1 weeks from 60 women who subsequently developed PE and 60 proportionally matched controls were analyzed using ultra performance liquid chromatography-mass spectrometry (UPLC-MS). The resulting metabolic profiles were investigated using a combination of both univariate and multivariate statistics. A univariate screen was performed to reduce the many thousand metabolite features detected by UPLC-MS down to several hundred that showed any biological variance, thus reducing the multivariate biomarker search space. Multivariate statistics were then used to investigate the underlying correlation between the remaining metabolites and to discover a multifactorial metabolite signature for PE. This signature was then validated using an independent nested case-control study on plasma obtained at 15±1 weeks from 39 different women within the SCOPE cohort who subsequently developed PE and 40 proportionally matched controls.

Methods

Participants and Specimens

The SCOPE Study is a prospective cohort study with the main aim of developing accurate screening methods for later pregnancy complications, including PE (ACTRN12607000551493). Full ethical approval has been obtained, and all of the patients gave written informed consent. Healthy nulliparous women with a singleton pregnancy were recruited between 14 and 16 weeks and tracked throughout pregnancy. For further details of the study population, please see the online Data Supplement at http://hyper.ahajournals.org.

In the discovery phase of our investigation, we performed a nested case-control study within the initial 1628 recruits in Auckland, New Zealand, of whom pregnancy outcome was known in 1608 (98.8%). Sixty-seven women (4.2%) developed PE, and 1021 (63.5%) had uncomplicated pregnancies. The remainder had other pregnancy complications. Sixty women who developed PE were proportionally population matched for age, ethnicity, and body mass index to 60 controls who had uncomplicated pregnancies. The study was limited to 120 samples to guarantee optimal measurement reproducibility from the UPLC-MS systems.16

In the validation phase of our investigation, we performed a nested case-control study within the initial 596 recruits in Adelaide, Australia, of whom pregnancy outcome was known in 595 (99.8%). Forty-six women (7.7%) developed PE, and 267 (44.9%) had uncomplicated pregnancies. The remainder had other pregnancy complications. Thirty-nine women who developed PE were proportionally population matched for age, ethnicity, and body mass index to 40 controls who had uncomplicated pregnancies.

PE was defined as a blood pressure ≥140/90 mmHg after 20 weeks’ gestation (but before the onset of labor) or in the postnatal period, with either proteinuria (24-hour urinary protein ≥300 mg, spot urine protein:creatinine ratio ≥30 mg/mmol, or urine dipstick ≥++) and/or evidence of multiorgan complications.17

Venipuncture was performed at 15±1 weeks’ gestation in non-fasting patients, and plasma samples were collected into BD EDTA-Vacutainer tubes, placed on ice and centrifuged at 2400g at 4°C according to a standardized protocol. Plasma was stored in aliquots at –80°C. The collection and storage conditions were identical for cases and controls, with the time between collection and storage being 2.07 (SD 0.90) and 2.02 (SD 0.96) hours, respectively (P=0.78).

Reagents, Sample Preparation, and Mass Spectral Analysis

All of the chemicals and reagents used were of Analytic Reagent or high-performance liquid chromatography grade and purchased from Sigma-Aldrich or ThermoFisher Scientific. Plasma samples were allowed to thaw on ice for 3 hours, vortex mixed to provide a homogeneous sample, and deproteinized. A total of 450 μL of methanol (high-performance liquid chromatography grade) was added to 150 μL of plasma followed by vortex mixing (15 seconds, full speed) and centrifugation (15 minutes, 11 337 g). Three 170-μL aliquots of the supernatant were transferred to separate 2 mL tubes and lyophilized (HETO VR MAXI vacuum centrifuge attached to a Thermo Svart RVT 4104 refrigerated vapor trap, Thermo Life Sciences). Quality control (QC) samples were obtained by pooling 50-μL aliquots from each plasma sample prepared. This was defined as the pooled QC sample, and 150-μL aliquots were deproteinized as described above.

Deproteinized samples were prepared for UPLC-MS analysis by reconstitution in 70 μL of high-performance liquid chromatography grade water followed by vortex mixing (15 seconds), centrifugation (11 337 g, 15 minutes), and transfer to vials. Samples were analyzed by an Acquity UPLC (Waters Corp) coupled to a hybrid LTQ-Orbitrap mass spectrometry system (Thermo Fisher Scientific) operating in electrospray ionization mode. Samples were analyzed in batches of 120 samples, with an instrument maintenance step at the end of each batch involving mass spectrometer ion source and liquid chromatography column cleaning. For each analytic batch a number of pooled QC samples were included to provide quality assurance. The first 10 injections were pooled QC samples (to equilibrate the analytic system), and then every fifth injection was a pooled QC sample. For each of the analytic experiments (discovery/validation), sample preparation order was randomized from sample picking and rerandomized before sample analysis to ensure no systematic biases (eg, analysis order correlates with sample preparation order). The samples were also blinded to the analytic scientists to avoid any subjective bias. The discovery and validation analyses were performed 3 months apart, such that the 2 studies can be considered independent both in terms of sample population and chemical analysis.
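The injection scheme described above can be illustrated with a short Python sketch. It is not the authors' software: the function name `build_run_order`, the sample labels, and the reading of "every fifth injection" as four study samples followed by one pooled QC are assumptions made for illustration.

```python
import random

def build_run_order(sample_ids, n_lead_qc=10, qc_every=5, seed=0):
    """Return an injection list: lead-in pooled QCs, then a QC as every fifth injection."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)                 # randomized analysis order
    run = ["QC"] * n_lead_qc              # conditioning injections to equilibrate the system
    for i, sample_id in enumerate(shuffled, start=1):
        run.append(sample_id)
        if i % (qc_every - 1) == 0:       # assumption: 4 study samples, then 1 pooled QC
            run.append("QC")
    return run

# Hypothetical labels for one batch of 120 study samples
order = build_run_order([f"S{i:03d}" for i in range(1, 121)])
```

Under these assumptions a batch of 120 study samples yields 160 injections, 40 of which are pooled QC samples.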

Raw profile data were deconvolved into a peak table using XCMS software.18 Data were then subjected to strict quality assurance procedures so that statistical analysis was only performed on reproducible data. For full details of all of the methods pertaining to sample preparation, UPLC-MS analysis, and quality assurance, please see the online Data Supplement at http://hyper.ahajournals.org.

Statistical Analysis

Comparisons of clinical data between cases and controls were performed using the Student t test, Mann–Whitney test, χ2 test or Fisher exact test, as appropriate (SAS system 9.1).

Discovery Phase

For each metabolite peak reproducibly detected in the discovery phase study, the null hypothesis that the means of the case and control sample populations were equal was tested using either the Mann–Whitney test or Student t test, depending on data normality. The critical P value for significance was set to 0.05. No correction for multiple comparisons was performed at this point, because the aim was to reduce the many thousands of detected features down to a subset of potentially “information-rich” peaks while keeping the number of probable false negatives (type II error) to a minimum. False-positive candidate biomarkers are removed during the cross-validation of multivariate analysis and subsequent modeling of the validation data set.
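The uncorrected univariate screen can be sketched as follows. This is a simplified illustration, not the authors' Matlab code: it applies only the Mann–Whitney test (the study chose between Mann–Whitney and Student t tests by normality), uses a normal approximation without a tie correction to the variance, and the peak names and intensities are hypothetical.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney p-value via the normal approximation (midranks for ties)."""
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0       # average of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))

def screen_peaks(peaks, alpha=0.05):
    """Keep peak names with an uncorrected p-value below alpha; None marks a missing value."""
    kept = []
    for name, (cases, controls) in peaks.items():
        cases = [v for v in cases if v is not None]        # missing values are ignored
        controls = [v for v in controls if v is not None]
        if mann_whitney_p(cases, controls) < alpha:
            kept.append(name)
    return kept

# Hypothetical intensities for two peaks: (case values, control values)
peaks = {
    "peak_A": ([11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [1, 2, 3, 4, 5, 6, 7, 8, 9, None]),
    "peak_B": ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]),
}
selected = screen_peaks(peaks)
```

Because no multiple-comparison correction is applied, the screen is deliberately permissive, in line with the stated aim of minimizing false negatives at this stage.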

To uncover multivariate latent structure in the data, which, in turn, helps assess the combinatorial predictive ability of the candidate biomarkers, the significant peaks were combined into a single multivariate discriminant model using partial least-squares discriminant analysis (PLS-DA).19–21 The optimal number of latent factors used in the PLS-DA model was selected using stratified 5-fold cross-validation and model quality assessed using the standard R2 and Q2 measures,19 where R2, the squared correlation coefficient between the dependent variable and the PLS-DA prediction, measures “goodness of fit” (a value between 0 and 1, where 1 is a perfect correlation) using all of the available data to build a given PLS-DA model. Q2 provides a measure of “goodness of prediction” and is the averaged correlation coefficient between the dependent variable and the PLS-DA predictions for the 5 holdout data sets generated during the cross-validation process.
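The R2 and Q2 measures can be illustrated with a minimal single-latent-factor PLS model. This sketch makes several assumptions: the helper names are hypothetical, the data are synthetic, class labels are coded 1 (case) and 0 (control), and Q2 is computed as the squared correlation averaged over the five holdout folds, which is one reading of the description above rather than a confirmed implementation detail.

```python
import random

def mean(v):
    return sum(v) / len(v)

def sq_corr(y, yhat):
    """Squared Pearson correlation between observed and predicted values."""
    my, mh = mean(y), mean(yhat)
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, yhat))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mh) ** 2 for b in yhat)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def fit_pls1(X, y):
    """One-latent-factor PLS: returns (column means, y mean, weight vector, regression coefficient)."""
    p = len(X[0])
    mx = [mean([row[j] for row in X]) for j in range(p)]
    my = mean(y)
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(len(X))) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    t = [sum(r[j] * w[j] for j in range(p)) for r in Xc]      # latent scores
    q = sum(a * b for a, b in zip(t, yc)) / (sum(v * v for v in t) or 1.0)
    return mx, my, w, q

def predict_pls1(model, X):
    mx, my, w, q = model
    return [my + q * sum((row[j] - mx[j]) * w[j] for j in range(len(w))) for row in X]

def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds, keeping the case/control ratio similar in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def r2_q2(X, y, k=5):
    r2 = sq_corr(y, predict_pls1(fit_pls1(X, y), X))          # fit on all data
    q2_folds = []
    for hold in stratified_folds(y, k):
        held = set(hold)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_pls1([X[i] for i in train], [y[i] for i in train])
        q2_folds.append(sq_corr([y[i] for i in hold], predict_pls1(model, [X[i] for i in hold])))
    return r2, mean(q2_folds)

# Synthetic data: 20 "cases" and 20 "controls", one informative and one noise feature
rng = random.Random(1)
X = [[rng.gauss(1.0, 0.5), rng.gauss(0.0, 1.0)] for _ in range(20)] + \
    [[rng.gauss(-1.0, 0.5), rng.gauss(0.0, 1.0)] for _ in range(20)]
y = [1.0] * 20 + [0.0] * 20
r2, q2 = r2_q2(X, y)
```

A large gap between R2 and Q2 would suggest overfitting; in the study the two values were close for every reported model.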

Further validation was performed to check the robustness of the final PLS-DA model by comparing the R2 value to a reference distribution of all of the possible models using permutation testing (N=1000) following the standard protocol for metabolomic studies.22 Here a reference R2 distribution is obtained by calculating all of the possible PLS-DA models under random reassignment of the case/control labels for each measured metabolic profile. If the correctly labeled model’s R2 value is close to the center of the reference distribution, then the model performs no better than a randomly assigned model and is, therefore, invalid. For all of the PLS-DA models described here, the associated reference distribution plots are provided, from which the probability of the candidate model occurring at random can be estimated. In addition, for each PLS-DA model, a receiver operator characteristic (ROC) curve was determined so that an accurate assessment of discriminatory ability could be made.
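The permutation test can be sketched generically. In the sketch below, `statistic` stands in for "refit the model and return its R2"; to keep the example short and self-contained, a toy statistic (squared difference in group means) is used instead, and all names and data are hypothetical.

```python
import random

def permutation_p_value(statistic, scores, labels, n_perm=1000, seed=0):
    """Estimate how often a label-shuffled statistic matches or exceeds the observed one."""
    rng = random.Random(seed)
    observed = statistic(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                       # random reassignment of case/control labels
        if statistic(scores, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so the estimate is never exactly 0
    return observed, (hits + 1) / (n_perm + 1)

def mean_gap_sq(scores, labels):
    """Toy statistic: squared difference between the case mean and the control mean."""
    g1 = [s for s, l in zip(scores, labels) if l == 1]
    g0 = [s for s, l in zip(scores, labels) if l == 0]
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) ** 2

scores = list(range(10)) + list(range(20, 30))      # hypothetical model scores
labels = [0] * 10 + [1] * 10
observed, p_value = permutation_p_value(mean_gap_sq, scores, labels)
```

With 1000 permutations the smallest reportable value is 1/1001, which corresponds to the "<0.001" bound quoted for the models in this study.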

As a preprocessing step to remove any structured noise in the data set, direct orthogonal signal correction23 was performed using a single correction factor and a tolerance setting of 1×10⁻³. All of the peak data were scaled to unit variance before multivariate analysis.19,24
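The unit-variance scaling step can be shown in a few lines. This sketch covers only the scaling, not the direct orthogonal signal correction; it also mean-centers each peak, which is a common convention but an assumption here, and the function name and intensities are hypothetical.

```python
import statistics

def autoscale(columns):
    """Center each peak (column) and divide by its sample SD so every peak has unit variance."""
    scaled = []
    for col in columns:
        mu = statistics.mean(col)
        sd = statistics.stdev(col)
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return scaled

# Two hypothetical peaks measured on very different intensity scales
peaks = [[10.0, 12.0, 14.0, 16.0], [1000.0, 3000.0, 2000.0, 4000.0]]
scaled = autoscale(peaks)
```

After scaling, an abundant peak no longer dominates the multivariate model simply because its raw intensities are larger.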

For identification of UPLC-MS–related peaks, the accurate mass for each peak was searched against the Manchester Metabolomics Database25 constructed with information from both the Human Metabolome Database (http://www.hmdb.ca/) and Lipidmaps (http://www.lipidmaps.org/). A metabolite name (or names) was reported when the difference between observed and theoretical mass was <5 ppm. Using UPLC-MS, metabolites are often detected multiple times because of chemical adduction, dimerization, multiple charging, isotope peaks, and fragmentation. After removal of duplicate identifications, a list of unique metabolites was compiled. Definitive identifications were reported only for metabolites with retention time errors <10 seconds and an accurate mass match <5 ppm. Once identified, the metabolites were grouped into metabolite classes using the Human Metabolome Database “Class” hierarchy.
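The 5 ppm matching criterion amounts to a simple relative-error check. The sketch below assumes a hypothetical in-memory library with made-up compound names and masses, and it ignores adducts, isotopes, and retention time, all of which the real workflow had to handle.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_mass(observed_mz, library, tol_ppm=5.0):
    """Return every library name whose theoretical mass lies within tol_ppm of the observed mass."""
    return [name for name, mz in library.items()
            if abs(ppm_error(observed_mz, mz)) < tol_ppm]

# Hypothetical library entries (not real compounds or masses)
library = {"compound_A": 100.0, "compound_B": 250.0}
hits = match_mass(100.0003, library)      # 3 ppm from compound_A
misses = match_mass(100.001, library)     # 10 ppm from compound_A, outside tolerance
```

Several names can satisfy the tolerance for one peak, which is why some Table 2 entries are reported as "X and/or Y".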

For each named metabolite, an ROC curve was determined to assess each metabolite’s effectiveness as a univariate discriminatory biomarker. In addition, for each metabolite, the optimal unbiased discriminatory decision boundary was estimated using the optimal Youden index method, and then the associated discriminatory odds ratios with 95% CIs were calculated.26,27
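The decision-boundary and odds-ratio calculation can be sketched as below. The Youden index is as named in the text; the confidence interval uses the Woolf (log-scale) method, which is an assumption because the exact interval method is given only by citation. Function names and counts are hypothetical.

```python
import math

def youden_cutoff(cases, controls):
    """Threshold maximizing sensitivity + specificity - 1 (value >= threshold predicts a case)."""
    best_j, best_thr = -1.0, None
    for thr in sorted(set(cases) | set(controls)):
        sens = sum(v >= thr for v in cases) / len(cases)
        spec = sum(v < thr for v in controls) / len(controls)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_thr = j, thr
    return best_thr, best_j

def odds_ratio_ci(tp, fn, fp, tn, z=1.96):
    """Odds ratio with a Woolf 95% CI; 0.5 is added to every cell if any cell is zero."""
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (c + 0.5 for c in (tp, fn, fp, tn))
    odds_ratio = (tp * tn) / (fn * fp)
    se = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)   # standard error of log(OR)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)

# Hypothetical marker values and a hypothetical 2x2 table at the chosen cutoff
threshold, j_stat = youden_cutoff([5, 6, 7, 8], [1, 2, 3, 4])
or_value, ci_low, ci_high = odds_ratio_ci(tp=40, fn=20, fp=10, tn=50)
```

A Woolf interval is symmetric on the log scale, so the odds ratio is the geometric mean of its two limits; wide intervals such as those in Table 2 reflect small cell counts.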

Validation Phase

The identified metabolites found to be significant in the discovery phase study were matched to the metabolite peaks detected in the validation study. If a match was found, then the metabolite was univariately assessed as a potential biomarker using the same protocol as for the discovery stage. A PLS-DA model was constructed to assess the multivariate discriminatory ability of the validation peaks.

Finally, we searched for an optimal multivariate discriminatory model drawn from the named metabolites observed in both the discovery and validation studies. A genetic algorithm-based search program was used to obtain the subset of metabolites that produced an effective predictive rule for the onset of PE. This search method has been shown to be very successful in previous studies.9,28–32 In this algorithm, a set of candidate solutions evolves over time toward an optimal state. The evolution is driven by computational techniques inspired by evolutionary biology. In our algorithm, each candidate solution (subset of metabolites) is assessed by building 2 independent linear discriminant analysis models, one modeling the discovery data and the other modeling the validation data. A candidate’s fitness is proportional to the sum of the root mean square error of prediction of these 2 models. Once the optimal subset of metabolites was found, its predictive ability was assessed using PLS-DA and the Hotelling T2 test.33 Assessment was performed independently for the discovery and validation data.
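A genetic algorithm for subset selection can be sketched in outline. This is not the in-house program: the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), the parameter values, and the toy error function are all assumptions. In the study the error would be the summed root mean square error of prediction of the two linear discriminant models; here a stand-in counts mismatches against a hypothetical target subset.

```python
import random

def ga_select_subset(n_features, error_fn, pop_size=30, generations=40,
                     mutation_rate=0.05, seed=0):
    """Evolve boolean masks over features, minimizing error_fn(mask)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    history = []                                     # best error seen in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=error_fn)
        history.append(error_fn(ranked[0]))
        next_pop = [ranked[0][:]]                    # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            a = min(rng.sample(ranked, 3), key=error_fn)     # tournament selection
            b = min(rng.sample(ranked, 3), key=error_fn)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]   # uniform crossover
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=error_fn)
    history.append(error_fn(best))
    return best, history

# Toy error: number of positions that differ from a hypothetical "true" 14-of-34 subset
target = [i < 14 for i in range(34)]

def toy_error(mask):
    return sum(m != t for m, t in zip(mask, target))

best_mask, history = ga_select_subset(34, toy_error)
```

Because the best candidate is always retained, the best error can never increase from one generation to the next; whether the search reaches the global optimum depends on the error surface and the settings.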

All of the statistical analyses were carried out using the Matlab scripting language (http://www.mathworks.com/). All of the univariate algorithms were implemented such that any missing values were ignored. All of the multivariate algorithms were implemented such that missing values were imputed using the nearest-neighbor method.34 The Genetic Algorithm search program was written in house.28 Scripts are available on request.
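Nearest-neighbor imputation can be illustrated with a single-neighbor sketch in Python (the study itself used Matlab). The number of neighbors, the distance measure (mean squared difference over shared observed peaks), and the function name are assumptions; the cited method may differ in these details.

```python
def impute_nearest(rows):
    """Fill None entries in each sample from the closest sample that has those values."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        best, best_d = None, float("inf")
        for k, other in enumerate(rows):
            if k == i or any(other[j] is None for j in missing):
                continue                              # a donor must have the needed values
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            d = sum((row[j] - other[j]) ** 2 for j in shared) / len(shared)
            if d < best_d:
                best, best_d = other, d
        if best is not None:
            for j in missing:
                filled[i][j] = best[j]
    return filled

# Hypothetical samples (rows) by peaks (columns); the second sample lacks its third peak
samples = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [9.0, 9.0, 9.0]]
completed = impute_nearest(samples)
```

The original rows are left unmodified, and a value stays missing if no other sample can supply it.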

Results

Discovery Phase

Maternal characteristics and pregnancy outcome in the women with PE and controls are shown in Table 1. After quality assurance, preprocessing, and univariate screening (see Methods section), the UPLC-MS analysis revealed 457 information-rich metabolite peaks. PLS-DA was performed. The resulting model had an R2 of 0.76, Q2 of 0.68, and area under the ROC curve (AUC) of 0.99. Model selection was performed using 5-fold cross-validation, and the final model was further validated using permutation testing (see Methods section). The final model used a single latent factor and the probability of this model randomly occurring was <0.001. Figure 1 shows the PLS-DA scores plot and the permutation testing.

Table 1. Characteristics and Pregnancy Outcome of Women Who Later Developed PE and Controls.

| Variables | Auckland: Preeclampsia (n=60) | Auckland: Controls (n=60) | Auckland: P | Adelaide: Preeclampsia (n=39) | Adelaide: Controls (n=40) | Adelaide: P |
|---|---|---|---|---|---|---|
| Maternal characteristics | | | | | | |
| Age, y | 30.2 (4.9) | 30.4 (4.7) | 0.79 | 22.0 (4.8) | 23.2 (5.3) | 0.30 |
| Ethnicity: White | 46 (77%) | 52 (87%) | 0.16 | 39 (100%) | 39 (97.5%) | 1.0 |
| Ethnicity: Other | 14 (23%) | 8 (13%) | | 0 (0%) | 1 (2.5%) | |
| At 15 weeks’ gestation | | | | | | |
| Body mass index, kg/m2 | 27.3 (4.9) | 26.0 (3.9) | 0.12 | 27.5 (6.2) | 26.7 (4.6) | 0.48 |
| Systolic blood pressure, mm Hg | 115 (11) | 107 (12) | 0.0003 | 113 (11) | 108 (10) | 0.05 |
| Diastolic blood pressure, mm Hg | 72 (9) | 63 (9) | <0.0001 | 67 (7) | 65 (7) | 0.17 |
| Current smoker | 1 (1.7%) | 4 (6.7%) | 0.36 | 11 (28.2%) | 12 (30%) | 0.86 |
| Gestation at blood sampling, wk | 15.0 (0.9) | 15.0 (0.8) | 0.59 | 15.2 (0.7) | 15.0 (0.7) | 0.19 |
| Pregnancy outcome | | | | | | |
| Systolic blood pressure (highest recorded), mm Hg | 156 (15) | 119 (9) | <0.0001 | 158 (10) | 124 (8) | <0.0001 |
| Diastolic blood pressure (highest recorded), mm Hg | 103 (8) | 74 (9) | <0.0001 | 99 (10) | 74 (7) | <0.0001 |
| Proteinuria* | 54 (90%) | | | 32 (82%) | | |
| Protein:creatinine ratio, mg/mmol | 70 (42, 117) | | | 52 (26, 172) | | |
| n (protein:creatinine ratio) | 53 | | | 38 | | |
| 24-h proteinuria, g | 0.6 (0.4, 1.2) | | | 0.7 (0.2, 2.2) | | |
| n (24-h proteinuria) | 42 | | | 14 | | |
| Severe preeclampsia: Severe hypertension | 20 (33.3%) | | | 6 (15.4%) | | |
| Severe preeclampsia: Thrombocytopenia | 7 (11.7%) | | | 2 (5%) | | |
| Severe preeclampsia: Liver involvement | 12 (20.0%) | | | 11 (28%) | | |
| Severe preeclampsia: Renal involvement | 7 (11.7%) | | | 2 (5%) | | |
| Severe preeclampsia: Imminent eclampsia | 4 (6.7%) | | | 2 (5%) | | |
| Gestation at delivery, wk | 37.5 (2.8) | 40.1 (1.1) | <0.0001 | 38.1 (2.3) | 40.0 (1.3) | <0.0001 |
| Preterm delivery, <37 wk | 21 (35%) | | | 8 (21%) | | |
| Birth weight, g | 2925 (753) | 3628 (415) | <0.0001 | 3057 (784) | 3583 (391) | 0.0004 |
| Customized birth weight centile | 40 (11, 70) | 50 (35, 75) | 0.02 | 40 (9, 76) | 47 (36, 67) | 0.24 |
| Small for gestational age | 15 (25%) | | | 10 (25.6%) | | |

Values are mean (SD), median (interquartile range), or n (%).

* Data are defined as dipstick ≥2+, protein:creatinine ratio ≥30 mg/mmol, or 24-hour urinary protein ≥0.3 g/24 hours.

Figure 1. The scores plot for a PLS-DA model using the optimal number of latent vectors (n=1) for data taken from the “discovery” nested case-control study (yellow indicates preeclampsia; blue, controls). Model construction was performed using 5-fold cross-validation resulting in an R2 of 0.76 and Q2 of 0.68. The R2 distribution plot shows that the chosen model’s R2 value is significantly distant from the H0 randomly classified permutation distribution (n=1000); thus, the probability of the presented model randomly occurring is <0.001. The partial least-squares (PLS) score can be considered as the weighted linear combination of the “information-rich” peaks that best discriminates between the preeclampsia and control samples. The AUC was 0.99.

Of the 457 candidate biomarker metabolite peaks detected by the UPLC-MS, 70 were successfully identified chemically as known metabolites, of which 45 were “unique” in the sense of being defined molecular entities (Table 2). When grouped into metabolite classes (based on the Human Metabolome Database), 11 clear classes emerged. These were amino acids, carbohydrates, carnitines, eicosanoids, fatty acids, keto or hydroxy acids, lipids, phospholipids, porphyrins, phosphatidylserines, and steroids.

Table 2. Metabolites Identified in Discovery and Validation Phases.

| Metabolite identified as | Metabolite class | Auckland: P | Auckland: AUC | Auckland: Odds ratio (95% CI) | Auckland: Up/down in PE? | Adelaide: P | Adelaide: AUC | Adelaide: Odds ratio (95% CI) | Adelaide: Up/down in PE? | Final rule? |
|---|---|---|---|---|---|---|---|---|---|---|
| Isobutyrylglycine and/or N-butyrylglycine | Acyl glycines | 0.05 | 0.64 | 2.0 (0.9 to 4.1) | Up | | | | | |
| Taurine | Amino acids | 0.01 | 0.65 | 3.4 (1.4 to 7.8) | Up | | | | | |
| 5-Hydroxytryptophan | Amino acids | 0.01 | 0.67 | 23.8 (3.0 to 187.3) | Down | 0.833 | 0.61 | 2.4 (0.8 to 7.1) | Down | |
| Urea | Amino ketones | 0.01 | 0.66 | 2.9 (1.3 to 6.3) | Down | 0.949 | 0.59 | 1.8 (0.8 to 4.5) | Down | |
| 12-Ketodeoxycholic acid* | Bile acids | 0.02 | 0.67 | 2.6 (1.3 to 5.6) | Up | 0.715 | 0.58 | 3.6 (0.9 to 14.4) | Up | |
| Monosaccharide(s) | Carbohydrates | 0.01 | 0.71 | 6.1 (2.5 to 15.0) | Up | 0.097 | 0.65 | 2.8 (1.1 to 7.4) | Up | |
| Sedoheptulose | Carbohydrates | 0.02 | 0.67 | 3.6 (1.5 to 8.4) | Down | | | | | |
| Palmitoylcarnitine | Carnitines | 0.001 | 0.71 | 3.8 (1.7 to 8.2) | Up | 0.244 | 0.63 | 3.4 (1.1 to 10.6) | Up | |
| Stearoylcarnitine | Carnitines | 0.006 | 0.69 | 3.3 (1.5 to 7.4) | Up | 0.610 | 0.61 | 2.7 (1.0 to 7.5) | Up | |
| Decanoylcarnitine | Carnitines | 0.007 | 0.68 | 3.1 (1.4 to 6.9) | Up | 0.624 | 0.59 | 1.6 (0.4 to 6.1) | Up | |
| Octanoylcarnitine | Carnitines | 0.01 | 0.7 | 3.0 (1.4 to 6.5) | Up | 0.494 | 0.61 | 1.9 (0.7 to 5.3) | Up | |
| Acetylcarnitine | Carnitines | 0.02 | 0.66 | 2.3 (1.1 to 5.0) | Up | 0.207 | 0.65 | 4.7 (1.2 to 18.3) | Up | |
| Dodecanoylcarnitine | Carnitines | 0.05 | 0.69 | 3.2 (1.2 to 8.8) | Up | 0.349 | 0.63 | 4.6 (0.9 to 23.5) | Up | |
| Methylglutaric acid and/or adipic acid* | Dicarboxylic acid | 0.01 | 0.64 | 2.6 (1.2 to 5.9) | Down | 0.010 | 0.72 | 3.8 (1.4 to 10.0) | Down | |
| 8,11,14-Eicosatrienoic acid | Eicosanoids | 0.003 | 0.64 | 8.7 (2.5 to 29.9) | Up | 0.144 | 0.64 | 2.1 (0.8 to 5.3) | Up | |
| 20-Carboxyleukotriene B4 | Eicosanoids | 0.005 | 0.69 | 3.1 (1.5 to 6.6) | Up | 0.268 | 0.64 | 2.1 (0.8 to 5.0) | Up | |
| Eicosapentaenoic acid and/or retinoic acid | Eicosanoids and/or retinoids | 0.03 | 0.61 | 3.2 (1.3 to 7.7) | Up | | | | | |
| Isovaleric acid and/or valeric acid | Fatty acids | 0.007 | 0.68 | 3.8 (1.7 to 8.6) | Up | | | | | |
| Oleic acid | Fatty acids | 0.007 | 0.68 | 3.1 (1.4 to 6.7) | Up | 0.276 | 0.63 | 2.0 (0.8 to 4.8) | Up | |
| Linoleic acid | Fatty acids | 0.01 | 0.66 | 3.5 (1.6 to 7.9) | Up | 0.441 | 0.60 | 2.3 (0.8 to 6.5) | Up | |
| Docosahexaenoic acid and/or docosatriynoic acid | Fatty acids | 0.01 | 0.66 | 5.6 (1.9 to 16.3) | Up | 0.204 | 0.65 | 2.8 (1.0 to 8.0) | Up | |
| Hydroxy-octadecanoic acid and/or oxo-octadecanoic acid | Fatty acids | 0.01 | 0.66 | 3.5 (1.4 to 8.4) | Up | 0.498 | 0.58 | 2.0 (0.6 to 6.6) | Up | |
| Hexadecanoic acid | Fatty acids | 0.02 | 0.67 | 7.5 (2.1 to 27.3) | Up | 0.317 | 0.62 | 2.0 (0.8 to 5.2) | Up | |
| Eicosatetraenoic acid | Fatty acids | 0.02 | 0.67 | 3.1 (1.4 to 7.1) | Up | 0.244 | 0.63 | 4.1 (1.0 to 16.3) | Up | |
| Octadecanoic acid | Fatty acids | 0.02 | 0.67 | 3.0 (1.4 to 6.5) | Up | 0.133 | 0.64 | 2.1 (0.8 to 5.3) | Up | |
| Docosahexaenoic acid | Fatty acids | 0.02 | 0.67 | 2.6 (1.2 to 5.9) | Up | | | | | |
| γ-Butyrolactone and/or oxolan-3-one | Fatty acids and/or ketones | 0.0004 | 0.72 | 4.3 (1.8 to 10.0) | Up | 0.513 | 0.60 | 1.6 (0.6 to 4.1) | Up | |
| 2-Oxovaleric acid and/or oxo-methylbutanoic acid | Fatty acids or keto acids | 0.03 | 0.66 | 2.6 (1.2 to 5.4) | Up | 0.010 | 0.72 | 4.7 (1.8 to 12.3) | Up | |
| 3-Hydroxybutanoic acid and/or 2-hydroxybutanoic acid | Keto or hydroxy FA | 0.002 | 0.71 | 5.1 (1.9 to 13.8) | Up | 0.459 | 0.61 | 1.8 (0.7 to 4.7) | Up | |
| Oxo-tetradecanoic acid and/or hydroxytetradecenoic acid* | Keto or hydroxy FA | 0.006 | 0.72 | 3.6 (1.5 to 8.8) | Up | | | | | |
| Acetoacetic acid | Keto or hydroxy FA | 0.01 | 0.67 | 2.9 (1.3 to 6.4) | Up | 0.069 | 0.70 | 4.2 (1.6 to 11.1) | Up | |
| Oxoheptanoic acid | Keto or hydroxy FA | 0.02 | 0.66 | 2.4 (1.1 to 5.3) | Up | | | | | |
| Di-(heptadecadienoyl)-eicosanoyl-sn-glycerol* | Lipids | 0.002 | 0.66 | 3.5 (1.5 to 8.0) | Down | 0.170 | 0.65 | 2.78 (1.2 to 6.9) | Down | |
| Hexadecenoyl-eicosatetraenoyl-sn-glycerol* | Lipids | 0.01 | 0.69 | 3.0 (1.4 to 6.9) | Up | 0.035 | 0.69 | 2.8 (1.1 to 6.9) | Up | |
| Di-(octadecadienoyl)-sn-glycerol* | Lipids | 0.05 | 0.65 | 2.2 (1.0 to 4.5) | Up | 0.007 | 0.73 | 5.6 (2.1 to 14.6) | Up | |
| Octadecenoyl-hexadecanoyl-sn-glycero-3-phosphoserine* | Phosphatidylserines | 0.01 | 0.64 | 3.6 (1.4 to 9.0) | Down | 0.883 | 0.58 | 1.7 (0.7 to 4.1) | Down | |
| Octadecenoyl-sn-glycero-3-phosphoserine* | Phosphatidylserines | 0.02 | 0.65 | 2.8 (1.2 to 6.1) | Up | 0.494 | 0.61 | 1.9 (0.8 to 4.6) | Up | |
| Dioctanoyl-sn-glycero-3-phosphocholine* | Phospholipids | 0.01 | 0.67 | 3.0 (1.4 to 6.3) | Up | 0.605 | 0.60 | 2.5 (0.9 to 7.2) | Up | |
| Sphingosine 1-phosphate | Phospholipids | 0.01 | 0.68 | 3.3 (1.5 to 7.2) | Up | 0.037 | 0.69 | 4.2 (1.6 to 11.1) | Up | |
| Sphinganine 1-phosphate | Phospholipids | 0.03 | 0.66 | 2.6 (1.3 to 5.6) | Up | 0.939 | 0.59 | 1.8 (0.7 to 4.5) | Up | |
| Bilirubin | Porphyrins | 0.006 | 0.68 | 3.2 (1.5 to 6.9) | Up | | | | | |
| Biliverdin | Porphyrins | 0.01 | 0.67 | 3.1 (1.4 to 6.8) | Up | | | | | |
| Heme | Porphyrins | 0.02 | 0.63 | 2.9 (1.3 to 6.8) | Up | | | | | |
| Vitamin D3 derivatives | Steroids or steroid derivatives | 0.002 | 0.69 | 6.2 (2.3 to 16.4) | Up | 0.153 | 0.63 | 2.8 (1.0 to 7.4) | Up | |
| Steroid and/or etiocholan-3-α-ol-17-one 3-glucuronide* | Steroids or steroid derivatives | 0.01 | 0.68 | 2.5 (1.2 to 5.2) | Up | 0.979 | 0.58 | 1.4 (0.6 to 3.5) | Up | |
* Metabolite identification included other similar metabolites of the same class.

A PLS-DA was performed using only the 45 named metabolites (1 latent factor). This produced a predictive model with R2 of 0.58, Q2 of 0.57, and AUC of 0.96 (Figure S1, available in the online Data Supplement at http://hyper.ahajournals.org). This proved to be only a slight reduction of diagnostic performance when compared with the full 457-peak model.

Validation Phase

The maternal characteristics and pregnancy outcome in the women with PE and controls are shown in Table 1. Of the 45 significant metabolites named in the discovery study, 34 were also detected in the validation study. All of these metabolites showed similar changes in peak response (29 were raised in patients who went on to develop PE; 5 were lowered). A PLS-DA model using the 34 metabolites (1 latent factor) proved to be predictive, with R2 of 0.57, Q2 of 0.53, and AUC of 0.95 (Figure S2).

Metabolite Signature of PE

Finally, data from both studies were mined using a genetic algorithm-based search program to find the subset of named metabolites that produced the most robust predictive general model. The Genetic Algorithm chose 14 metabolites (Table 2). Figure 2 shows the PLS-DA model predictions using these metabolites for both the discovery study and the validation study. For the discovery data, the 14-metabolite model had an R2 of 0.54, Q2 of 0.52, an AUC of 0.94, and an optimal odds ratio of 36 (95% CI: 12 to 108). For the validation data, the 14-metabolite model had an R2 of 0.43, Q2 of 0.39, an AUC of 0.92, and an optimal odds ratio of 23 (95% CI: 7 to 73). Permutation testing showed that the probability of both of these models randomly occurring was <0.001 (Figure S3). The combined effect of the 14 metabolites was also tested using the Hotelling T2 statistic. For the discovery study data, this produced a P value of 2×10⁻⁶, and for the validation study data, a P value of 0.006. The P values were obviously affected by the differing sample sizes (discovery n=120; validation n=79).

Figure 2. The PLS-DA model predictions for the final 14-metabolite signature found by the genetic algorithm search program (C indicates controls, blue circles; PE, preeclampsia, yellow squares). a, Model predictions for the discovery phase data; R2=0.54, Q2=0.52, an AUC of 0.94, an optimal odds ratio of 36 (95% CI: 12 to 108), and Hotelling T2 P=2×10⁻⁶. b, Model predictions for the validation data; R2=0.43, Q2=0.39, an AUC of 0.92, an optimal odds ratio of 23 (95% CI: 7 to 73), and Hotelling T2 P=2×10⁻³.

Discussion

PE is a complex syndrome with multiple biological pathways contributing to its etiology. We have, therefore, taken a holistic and data-driven systems biology approach to identify a metabolic signature in plasma that is predictive of subsequent PE.35

We identified 40 organic molecules to be significantly elevated and 5 that were reduced in plasma at 14 to 16 weeks’ gestation from healthy nulliparous women who later developed PE, as compared with matched controls composed of women who had uneventful pregnancies. During the discovery phase, we showed that there is clear multifactorial disruption of plasma because of onset of PE (Figure 1). The 45 identified molecules, whose molecular weights ranged between 60.06 and 883.42, were sufficiently well characterized to enable their allocation into 5 broad functional categories, as detailed in Table 2. A thorough discussion of the biological significance of this metabolic fingerprint is outside the scope of this article. However, we note that there appears to be significant overlap between these markers and what is already well known about the pathogenesis of this disease.

Using robust data mining and modeling techniques, and using an independent validation cohort, we have shown that a combination of 14 metabolites representing the latent systems-wide interaction in the metabolome is sufficient to produce a robust predictive model with AUC of >0.9 (Figure 2). For both the discovery and validation studies, each individual metabolite in this panel is not highly significant; however, when these metabolites are combined into a single multifactorial model, the power of such data-driven technology proves its worth.

From the 14-metabolite ROC curves (Figure 2) we can also determine potential screening performance. At a 10% false-positive rate, the estimated detection rates of subsequent PE are 77% for the discovery data and 73% for the validation data. Conversely, for a detection rate of 90%, the estimated false-positive rates would be 21% and 24%, respectively. The predictive power of the 14-metabolite rule compares highly favorably with that of other proposed first-trimester screening tests, including those based on first-trimester levels of placental hormones, such as placental protein 13 and pregnancy-associated plasma protein A. In a longitudinal study by Akolekar et al,36 the AUCs for placental protein 13 alone and pregnancy-associated plasma protein A alone are 0.818 and 0.872, respectively. For the 2 markers combined, the AUC is 0.878. The comparative values for our 14-metabolite rule in the discovery and validation sets are 0.94 and 0.92, respectively. Similarly, our 14-metabolite rule compares favorably with the predictive power of early pregnancy maternal levels of angiogenic factors. In a longitudinal study by Kusanovic et al,37 the AUCs for placental growth factor alone and for the ratio of placental growth factor:soluble endoglin are 0.647 and 0.662, respectively. Poon et al38 have generated first-trimester predictive models combining pregnancy-associated plasma protein A and placental growth factor with maternal characteristics. For early onset PE, their model shows excellent (if not yet validated) predictive power: at a 5% false-positive rate, it produces a detection rate of 93%. However, for late-onset PE, the equivalent detection rate is only 36%. At the same 5% false-positive rate, our metabolite model (early and late PE combined) produces detection rates of 71% (discovery) and 68% (validation). It is expected that the detection rates of our model will increase significantly when combined with maternal characteristics. One potential limitation of this study is the lack of ethnic variation in the validation cohort. However, ongoing work in a larger cohort containing women from different ethnic groups will further validate the model presented here.
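The screening figures quoted above are read off a ROC curve at a fixed operating point. As an illustration only, the following minimal Python sketch shows how a detection rate at a chosen false-positive rate, the false-positive rate needed for a chosen detection rate, and the AUC could be computed from a list of model scores and outcomes. The function names and the toy values are hypothetical; they are not the study data or the software used by the authors.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (fpr, tpr) for every distinct score threshold, starting at (0, 0)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    labels, scores = labels[order], scores[order]
    tp = np.cumsum(labels)    # true positives if everything up to here is called positive
    fp = np.cumsum(~labels)   # false positives under the same rule
    # Keep only the last index of each block of tied scores.
    last = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
    tpr = np.r_[0.0, tp[last] / labels.sum()]
    fpr = np.r_[0.0, fp[last] / (~labels).sum()]
    return fpr, tpr

def detection_rate_at_fpr(labels, scores, max_fpr):
    """Highest detection rate (sensitivity) achievable without exceeding max_fpr."""
    fpr, tpr = roc_points(labels, scores)
    return tpr[fpr <= max_fpr].max()

def fpr_at_detection_rate(labels, scores, min_tpr):
    """Lowest false-positive rate at which the detection rate reaches min_tpr."""
    fpr, tpr = roc_points(labels, scores)
    return fpr[tpr >= min_tpr].min()

def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(labels, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy values for illustration only (1 = case, 0 = control); NOT study data.
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

print(detection_rate_at_fpr(toy_labels, toy_scores, max_fpr=0.2))
print(fpr_at_detection_rate(toy_labels, toy_scores, min_tpr=0.9))
print(auc(toy_labels, toy_scores))
```

With real cohort sizes the ROC curve is a step function, so the rate reported at a nominal 5% or 10% false-positive rate depends on whether the nearest achievable threshold is taken or the curve is interpolated; the sketch takes the nearest threshold that does not exceed the stated limit.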

Perspectives

The present study is one of the most detailed metabolic screens performed in any human disease to date. The finding of discriminatory metabolites in early pregnancy plasma preceding PE offers insight into disease pathogenesis and the potential for early prediction. Most importantly, ongoing metabolomics work with a larger prospective cohort of healthy nulliparous women offers the prospect of combining demographic details and clinical data with metabolite measurements. These additional data will potentially improve the sensitivity and specificity of the final algorithm for the prediction of PE as early as 15 weeks’ gestation and also provide further validation of the work presented here. A predictive rule at 15 weeks’ gestation will have a significant impact on clinical care, allowing scarce resources to be concentrated on those at greatest risk. As an early indicator of PE, such a test will also present a platform for developing therapeutic interventions that could minimize the likelihood of serious complications later in pregnancy, significantly reducing morbidity and mortality rates.

Supplementary Material

Supplementary file

Sources of Funding

SCOPE is funded by the New Enterprise Research Fund, Foundation for Research Science and Technology; Health Research Council; and Evelyn Bond Fund, Auckland District Health Board Charitable Trust (New Zealand); Premier’s Science and Research Fund, South Australian Government (Australia); and Health Research Board (Ireland). L.C.K. is a Science Foundation Ireland Principal Investigator (08/IN.1/B2083) and a Health Research Board Ireland Clinician Scientist (CSA/2007/2). The metabolomic discovery programme is funded by the Wellcome Trust and by Science Foundation Ireland.

Footnotes

Disclosures

None.

Contributor Information

Louise C. Kenny, Anu Research Centre, Department of Obstetrics and Gynaecology, University College Cork, Cork University Maternity Hospital, Cork, Ireland

David I. Broadhurst, Anu Research Centre, Department of Obstetrics and Gynaecology, University College Cork, Cork University Maternity Hospital, Cork, Ireland; School of Chemistry, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, United Kingdom

Warwick Dunn, School of Chemistry and Manchester Centre for Integrative Systems Biology, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, United Kingdom

Marie Brown, School of Chemistry, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, United Kingdom

Robyn A. North, Division of Reproduction and Endocrinology, St Thomas Hospital, King’s College London, London, United Kingdom

Lesley McCowan, Department of Obstetrics and Gynaecology, Faculty of Medicine and Health Sciences, University of Auckland, Auckland, New Zealand

Claire Roberts, Research Centre for Reproductive Health, Robinson Institute, School of Paediatrics and Reproductive Health, University of Adelaide, Adelaide, Australia

Garth J.S. Cooper, Department of Obstetrics and Gynaecology, School of Biological Sciences, University of Auckland, Auckland, New Zealand

Douglas B. Kell, School of Chemistry, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, United Kingdom

Philip N. Baker, Department of Obstetrics and Gynecology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, Canada

References

  • 1. Sibai B, Dekker G, Kupferminc M. Pre-eclampsia. Lancet. 2005;365:785–799. doi: 10.1016/S0140-6736(05)17987-2.
  • 2. Bellamy L, Casas JP, Hingorani AD, Williams DJ. Pre-eclampsia and risk of cardiovascular disease and cancer in later life: systematic review and meta-analysis. BMJ. 2007;335:974. doi: 10.1136/bmj.39335.385301.BE.
  • 3. Levine RJ, Maynard SE, Qian C, Lim KH, England LJ, Yu KF, Schisterman EF, Thadhani R, Sachs BP, Epstein FH, Sibai BM, et al. Circulating angiogenic factors and the risk of preeclampsia. N Engl J Med. 2004;350:672–683. doi: 10.1056/NEJMoa031884.
  • 4. Redman CW, Sargent IL. Latest advances in understanding preeclampsia. Science. 2005;308:1592–1594. doi: 10.1126/science.1111726.
  • 5. Meads CA, Cnossen JS, Meher S, Juarez-Garcia A, ter Riet G, Duley L, Roberts TE, Mol BW, van der Post JA, Leeflang MM, Barton PM, et al. Methods of prediction and prevention of pre-eclampsia: Systematic reviews of accuracy and effectiveness literature with economic modelling. Health Technol Assess. 2008;12(iii–iv):1–270. doi: 10.3310/hta12060.
  • 6. Dugoff L, Hobbins JC, Malone FD, Vidaver J, Sullivan L, Canick JA, Lambert-Messerlian GM, Porter TF, Luthy DA, Comstock CH, Saade G, et al. Quad screen as a predictor of adverse pregnancy outcome. Obstet Gynecol. 2005;106:260–267. doi: 10.1097/01.AOG.0000172419.37410.eb.
  • 7. Enquobahrie DA, Williams MA, Butler CL, Frederick IO, Miller RS, Luthy DA. Maternal plasma lipid concentrations in early pregnancy and risk of preeclampsia. Am J Hypertens. 2004;17:574–581. doi: 10.1016/j.amjhyper.2004.03.666.
  • 8. Levine RJ, Lam C, Qian C, Yu KF, Maynard SE, Sachs BP, Sibai BM, Epstein FH, Romero R, Thadhani R, Karumanchi SA. Soluble endoglin and other circulating antiangiogenic factors in preeclampsia. N Engl J Med. 2006;355:992–1005. doi: 10.1056/NEJMoa055352.
  • 9. Goodacre R, Kell DB. In: Metabolic Profiling: Its Role in Biomarker Discovery and Gene Function. Harrigan GG, Goodacre R, editors. Boston, MA: Kluwer Academic Publishers; 2003. Evolutionary computation for the interpretation of metabolome data; pp. 239–256.
  • 10. Sreekumar E, Issac A, Nair S, Hariharan R, Janki MB, Arathy DS, Regu R, Mathew T, Anoop M, Niyas KP, Pillai MR. Genetic characterization of 2006-2008 isolates of chikungunya virus from Kerala, South India, by whole genome sequence analysis. Virus Genes. 2010;40:14–27. doi: 10.1007/s11262-009-0411-9.
  • 11. Oresic M, Simell S, Sysi-Aho M, Nanto-Salonen K, Seppanen-Laakso T, Parikka V, Katajamaa M, Hekkala A, Mattila I, Keskinen P, Yetukuri L, et al. Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes. J Exp Med. 2008;205:2975–2984. doi: 10.1084/jem.20081800.
  • 12. Dunn WB, Broadhurst D, Brown M, Baker PN, Redman CW, Kenny LC, Kell DB. Metabolic profiling of serum using ultra performance liquid chromatography and the LTQ-Orbitrap mass spectrometry system. J Chromatogr B Analyt Technol Biomed Life Sci. 2008;871:288–298. doi: 10.1016/j.jchromb.2008.03.021.
  • 13. Kenny L, Dunn W, Ellis D, Myers J, Baker P, Consortium G, Kell D. Novel biomarkers for pre-eclampsia detected using metabolomics and machine learning. Metabolomics. 2005;1:227–234.
  • 14. Kenny LC, Broadhurst D, Brown M, Dunn WB, Redman CW, Kell DB, Baker PN. Detection and identification of novel metabolomic biomarkers in preeclampsia. Reprod Sci. 2008;15:591–597. doi: 10.1177/1933719108316908.
  • 15. Broadhurst DI, Kell DB. Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics. 2006;2:171–196.
  • 16. Zelena E, Dunn WB, Broadhurst D, Francis-McIntyre S, Carroll KM, Begley P, O’Hagan S, Knowles JD, Halsall A, Wilson ID, Kell DB. Development of a robust and repeatable UPLC-MS method for the long-term metabolomic study of human serum. Anal Chem. 2009;81:1357–1364. doi: 10.1021/ac8019366.
  • 17. Brown MA, Hague WM, Higgins J, Lowe S, McCowan L, Oats J, Peek MJ, Rowan JA, Walters BN. The detection, investigation and management of hypertension in pregnancy: Full consensus statement. Aust NZ J Obstet Gynaecol. 2000;40:139–155. doi: 10.1111/j.1479-828x.2000.tb01137.x.
  • 18. Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G. XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem. 2006;78:779–787. doi: 10.1021/ac051437y.
  • 19. Eriksson L, Johansson E, Kettaneh-Wold N, Wold S. Multi-and Megavariate Data Analysis: Principles and Applications. Umeå, Sweden: Umetrics Academy; 2001.
  • 20. Wold H. In: Perspectives in Probability and Statistics, Papers in Honour of M S Bartlett. Gani J, editor. London, United Kingdom: Academic Press; 1975. Soft modelling by latent variables: the non-linear iterative partial least squares (NIPALS) approach; pp. 117–142.
  • 21. Wold S, Trygg J, Berglund A, Antti H. Some recent developments in PLS modeling. Chemometr Intell Lab Syst. 2001;58:131–150.
  • 22. Westerhuis JA, Hoefsloot HCJ, Smit S, Vis DJ, Smilde AK, van Velzen EJJ, van Duijnhoven JPM, van Dorsten FA. Assessment of PLSDA cross validation. Metabolomics. 2008;4:81–89.
  • 23. Westerhuis JA, de Jong S, Smilde AK. Direct orthogonal signal correction. Chemometr Intell Lab Syst. 2001;56:13–25.
  • 24. van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ. Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genomics. 2006;7:142. doi: 10.1186/1471-2164-7-142.
  • 25. Brown M, Dunn WB, Dobson P, Patel Y, Winder CL, Francis-McIntyre S, Begley P, Carroll K, Broadhurst D, Tseng A, Swainston N, et al. Mass spectrometry tools and metabolite-specific databases for molecular identification in metabolomics. Analyst. 2009;134:1322–1332. doi: 10.1039/b901179j.
  • 26. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–35. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
  • 27. Perkins NJ, Schisterman EF. The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol. 2006;163:670–675. doi: 10.1093/aje/kwj063.
  • 28. Broadhurst D, Goodacre R, Jones A, Rowland JJ, Kell DB. Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry. Analytica Chimica Acta. 1997;348:71–86.
  • 29. Cavill R, Keun HC, Holmes E, Lindon JC, Nicholson JK, Ebbels TM. Genetic algorithms for simultaneous variable and sample selection in metabonomics. Bioinformatics. 2009;25:112–118. doi: 10.1093/bioinformatics/btn586.
  • 30. Allen J, Davey HM, Broadhurst D, Heald JK, Rowland JJ, Oliver SG, Kell DB. High-throughput classification of yeast mutants for functional genomics using metabolic footprinting. Nat Biotechnol. 2003;21:692–696. doi: 10.1038/nbt823.
  • 31. Jarvis RM, Goodacre R. Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data. Bioinformatics. 2005;21:860–868. doi: 10.1093/bioinformatics/bti102.
  • 32. Kell DB. Metabolomics and machine learning: explanatory analysis of complex metabolome data using genetic programming to produce simple, robust rules. Mol Biol Rep. 2002;29:237–241. doi: 10.1023/a:1020342216314.
  • 33. Krzanowski WJ. Principles of Multivariate Analysis: A User’s Perspective. Oxford University Press; Oxford, United Kingdom: 1988.
  • 34. Speed T. Statistical Analysis of Gene Expression Microarray Data. Chapman and Hall/CRC; New York, NY: 2003.
  • 35. Kell DB, Oliver SG. Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. Bioessays. 2004;26:99–105. doi: 10.1002/bies.10385.
  • 36. Akolekar R, Syngelaki A, Beta J, Kocylowski R, Nicolaides KH. Maternal serum placental protein 13 at 11-13 weeks of gestation in preeclampsia. Prenat Diagn. 2009;29:1103–1108. doi: 10.1002/pd.2375.
  • 37. Kusanovic JP, Romero R, Chaiworapongsa T, Erez O, Mittal P, Vaisbuch E, Mazaki-Tovi S, Gotsch F, Edwin SS, Gomez R, Yeo L, et al. A prospective cohort study of the value of maternal plasma concentrations of angiogenic and anti-angiogenic factors in early pregnancy and midtrimester in the identification of patients destined to develop preeclampsia. J Matern Fetal Neonatal Med. 2009;22:1021–1038. doi: 10.3109/14767050902994754.
  • 38. Poon LC, Kametas NA, Maiz N, Akolekar R, Nicolaides KH. First-trimester prediction of hypertensive disorders in pregnancy. Hypertension. 2009;53:812–818. doi: 10.1161/HYPERTENSIONAHA.108.127977.
