Wellcome Open Research. 2021 Sep 10;5:264. Originally published 2020 Nov 5. [Version 2] doi: 10.12688/wellcomeopenres.16341.2

Metabolomics datasets in the Born in Bradford cohort

Kurt Taylor 1,2,a,#, Nancy McBride 1,2,3,b,#, Neil J Goulding 1,2, Kimberley Burrows 1,2, Dan Mason 4, Lucy Pembrey 5, Tiffany Yang 4, Rafaq Azad 6, John Wright 4,7, Deborah A Lawlor 1,2,3
PMCID: PMC11109709  PMID: 38778888

Version Changes

Revised. Amendments from Version 1

We have made several amendments in light of the reviewer comments. These were mostly minor amendments to improve the clarity of the manuscript for the reader. We have also added details about a second version of the BiB Mass Spectrometry Metabolomics data under section "issues for data users".

Abstract

Metabolomics is the quantification of small molecules, commonly known as metabolites. Collectively, these metabolites and their interactions within a biological system are known as the metabolome. The metabolome is a unique area of study, capturing influences from both genotype and environment. The availability of high-throughput technologies for quantifying large numbers of metabolites, as well as lipids and lipoprotein particles, has enabled detailed investigation of human metabolism in large-scale epidemiological studies. The Born in Bradford (BiB) cohort includes 12,453 women who experienced 13,776 pregnancies recruited between 2007 and 2011, their partners and their offspring. In this data note, we describe the metabolomic data available in BiB, profiled during pregnancy, in cord blood and during early life in the offspring. These include two platforms of metabolomic profiling: nuclear magnetic resonance and mass spectrometry. The maternal measures, taken at 26-28 weeks’ gestation, can provide insight into the metabolome during pregnancy and how it relates to maternal and offspring health. The offspring cord blood measurements provide information on the fetal metabolome. These measures, alongside maternal pregnancy measures, can be used to explore how they may influence outcomes. The infant measures (taken around ages 12 and 24 months) provide a snapshot of the early life metabolome during a key phase of nutrition, environmental exposures, growth, and development. These metabolomic data can be examined alongside the BiB cohort’s extensive phenotype data from questionnaires, medical, educational and social record linkage, and other ‘omics data.

Keywords: Metabolomics, mass spectrometry, nuclear magnetic resonance, pregnancy, mother, offspring health

Introduction

Metabolomics is the quantification of small molecules resulting from metabolic processes. The metabolome is influenced by both genotype and environment, and dynamically responds to environmental influences. Developments in high-throughput technologies have allowed the efficient and accurate quantification of metabolites. This has revolutionised our ability to understand the causes and consequences of variation in human metabolism, and the contribution that multiple metabolites can make to risk prediction, using large-scale epidemiological studies 14 . Lipids and lipoproteins, which are measured in most high-throughput platforms used in epidemiology, are larger than the threshold used to define metabolites (<1.5k Daltons) and are therefore metabolomic traits. For simplicity in this paper we refer to these as metabolites.

Birth cohorts can be useful for exploring prenatal influences on birth and later life outcomes. Recently, studies have shown metabolomic profiling can aid us in our understanding of maternal health during pregnancy 35 and of the influence of in utero exposures on subsequent offspring health 6, 7 . The Born in Bradford (BiB) study is a UK longitudinal birth cohort 8 . Nuclear magnetic resonance (NMR) and mass spectrometry (MS) data are available in BiB including measurements during pregnancy, cord blood and early life in the offspring. MS offers a truly untargeted approach with comprehensive coverage of the metabolome (>1,000 metabolites) due to its high sensitivity. However, MS only provides relative quantification based on peak area in these approaches without comparison to a metabolite reference standard. NMR offers less coverage of the metabolome, but with absolute quantification possible in clinically meaningful units (e.g. mmol/L).

The range of metabolomics data in BiB, coupled with the substantial data obtained through questionnaires, research clinic assessments, linkage to medical records, educational and social records, genome wide (mothers, offspring and a subgroup of fathers) and epigenome wide (mother and offspring) profiling makes BiB a valuable resource for metabolomics research. This data note describes the metabolomics data currently available in BiB - how these were obtained, quantified, utilised, as well as potential future uses, strengths and limitations. Figure 1 provides an illustrative summary of which type of metabolomic data have been collected on which cohort participants and when, up to 2020. Planned further metabolomic data collection is also described (see Using the BiB metabolomic data).

Figure 1. Summary illustration of the Born in Bradford metabolomics data.

NMR, nuclear magnetic resonance; MS, mass spectrometry; EDTA, ethylenediaminetetraacetic acid.

Methods

Ethical approval and consent

Ethical approval for the study was granted by the Bradford National Health Service Research Ethics Committee (ref 06/Q1202/48), and all participants gave written informed consent. The ALL IN sub-study had ethical approval from the London School of Hygiene & Tropical Medicine ethics committee (ref: 5320) and the Bradford Research Ethics committee (ref: 08/H1302/21). Parents (usually the mother) gave informed, written consent to take part in the study.

Cohort

The BiB study is a population-based prospective birth cohort. In total, 12,453 women who experienced 13,776 pregnancies were recruited at their oral glucose tolerance test (OGTT) at approximately 26–28 weeks’ gestation, which was offered to all women booked for delivery at Bradford Royal Infirmary (BRI) (with the exception of those with pre-existing diabetes (N = 70 - 0.5% of BiB pregnancies)). Eligible women had an expected delivery between March 2007 and December 2010. The study is unique because it includes high proportions of White European and South Asian families, all residing in Bradford, UK. Bradford is a city in the North of England with high levels of socioeconomic deprivation, and the cohort was started due to a high prevalence of poor child health in the city 8 . Full details of the study methodology were reported previously 8 . The study website provides more information, including protocols, questionnaires and information on how researchers can access data and a full list of all available data. Mothers and their partners, who were recruited into the study, provided detailed interview questionnaire data, measurements, and biological samples. They also consented to the linkage of their and their child’s data to routine (primary and secondary care) health and education data.

Blood sampling

Maternal overnight-fasted blood was taken during the OGTT and processed and stored at -80°C for further research and analyses. Infant cord blood samples were taken whenever possible (i.e. so long as staff were available, and collection of an umbilical vein sample did not interfere with care of the mother or infant) and immediately processed and stored at -80°C. Samples were taken in a subgroup of offspring in early childhood for a specific project on childhood viral infection 9 . We describe the processes of taking, processing, and storing samples at each time point before moving on to describe the NMR and MS metabolomic profiling.

Pregnancy blood samples. Of the 13,776 pregnancies in the BiB cohort, 11,480 had a fasting blood sample taken during the OGTT (n = 10,574 [92%] between 26–28 weeks’ gestation, with the remaining women being within 11–39 weeks’ gestation). Samples were taken by trained phlebotomists working in the antenatal clinic of the BRI and sent immediately to the hospital laboratory.

Venous blood was collected in GEL tubes to obtain serum and plasma. The following processing steps were undertaken prior to storage at -80°C.

  1) Storage racks were prepared.
  2) Participant details were checked, making sure that both the BiB study ID and hospital number on the specimen bottles matched those on the participant tracking forms.
  3) Tubes were centrifuged at 3500 rpm for 10 minutes at room temperature.
  4) A 1 ml automatic pipette was used to aliquot samples into 1.5 ml aliquots (1–4 aliquots dependent on sample volume).
  5) Vials were labelled with appropriate BiB study labels and the duplicate barcode label was placed in the corresponding space marked on BiB tracking form.
  6) Aliquots were then placed in racks in a -80°C freezer.

All samples were processed within 2.5 hours and then placed in -80°C freezers. There were no freeze-thaw events of the samples prior to their use for the pregnancy metabolomic profiling. Serum samples were used for NMR metabolomic profiling, except for five (0.04%) samples which were plasma. For MS pregnancy metabolomics, ethylenediaminetetraacetic acid (EDTA) (a sample tube anticoagulant) plasma samples were used. Previous work has shown that reproducibility in both serum and plasma is good. As long as the same blood sample procedures are used (as in BiB), either matrix should yield similar results 10 .

Cord blood samples. Venous cord blood samples were all obtained at delivery by the attending midwife at the BRI, following research protocols. Cord blood sampling was not attempted for women delivering outside of the BRI, if the attending midwife was too busy, or if attempting to collect a research cord blood sample would interfere with postnatal care. Samples were refrigerated at 4°C in EDTA tubes until collected by BRI laboratory staff within 12 hours. Samples were then spun, frozen and stored at -80°C. In total, the BiB study collected 9,604 cord blood EDTA plasma samples. There were no freeze-thaw events of the cord blood samples.

Infant blood samples. Infant metabolomics were performed on blood samples that were collected on a subsample of the BiB cohort; those enrolled into the Allergy and Infection Study (ALL IN) 9 . Children enrolled in the BiB cohort, and born on or after 1 March 2008 with a maternal baseline questionnaire were eligible to take part in ALL IN. Mothers were invited to participate in ALL IN one month before their child’s first birthday. A questionnaire was completed by those who consented, and a 5ml venous blood sample was taken from the child, centrifuged, and stored at -80°C. This was repeated one year later to provide questionnaire data and serum from a ~12-month visit (mean age of 14 months, ranging from 9–18 months) and a ~24-month visit (mean age of 26 months, ranging from 23–33 months). Trained community research administrators (CRAs) recruited participants, obtained consent, and collected data, including blood samples, at each visit. They received training in phlebotomy and were assessed by the senior paediatric phlebotomist at the BRI. Ametop cream or Cryogesic spray were used to anaesthetise the venepuncture site. Only two attempts at venepuncture were permitted for each child. There was a fridge in the clinic for storing bloods before transfer to the lab. The blood samples taken on home visits were kept in a cool bag with an ice pack and then taken straight to the laboratory at BRI within 1–2 hours. The times of each step (blood taken, arrived at lab, centrifuged, aliquoted, frozen) were recorded on the blood form and were entered onto a database (so that researchers can check distribution of times if needed). For home or clinic visits outside normal working hours, the CRA who took the blood sample would centrifuge the blood at the lab and leave it in the lab fridge for processing the next day. All infant metabolomics were performed on serum samples. There was a maximum of two freeze-thaw events prior to metabolomics analyses of the infant samples.

Metabolomic datasets in BiB

There are six metabolomics datasets including different populations and timepoints available in BiB. These are described below and summarized in Table 1 and Figure 1. We have divided the methods between the two main platforms (NMR and MS). We describe the methods used to generate each dataset and use flow charts to illustrate how selection was performed.

Table 1. Metabolomics datasets in the BiB cohort separated by platform.

| # | Data source | Brief description |
|---|---|---|
| | Nuclear magnetic resonance | |
| 1 | Pregnancy NMR – Dataset 1 | N = 11,480 pregnancies. Single timepoint using maternal serum taken from a fasted blood sample around 26–28 weeks’ gestation. Of the 11,480, 37% are White British (40% White European) mothers and 44% Pakistani (49% South Asian). |
| 2 | Cord blood NMR – Dataset 2 | N = 7,980 children. Single timepoint using cord blood, EDTA plasma. |
| 3 | Infants NMR (aged 12 or 24 months) – Dataset 3 | N = 2,108 at either 12- or 24-months using serum samples. N = 1,690 at 12 months. N = 1,536 at 24 months. N = 1,118 at both timepoints. |
| | Mass Spectrometry | |
| 4 | Pregnancy MS – Dataset 1a | N = 1,000 pregnancies. Single timepoint using EDTA plasma taken from a fasted blood sample between 26–28 weeks’ gestation. Of the 1,000, 50% are White British and the other 50% are Pakistani ethnicity. |
| 5 | Cord blood MS – Dataset 1b | N = 1,000 children (paired with women from Dataset 1a). Single timepoint using cord blood, EDTA plasma. |
| 6 | Pregnancy MS – Dataset 2 | N = 2,000 pregnancies within a case-cohort design. EDTA plasma sample taken between 26–28 weeks’ gestation. Of the 2,000 women, 47% are White British and 53% are Pakistani. |

NMR, nuclear magnetic resonance; MS, mass spectrometry; EDTA, ethylenediaminetetraacetic acid.

NMR metabolomics

NMR methods. We describe the NMR methods which apply to all the NMR datasets described in Table 1. Profiling of circulating lipids, fatty acids, and metabolites was done by a high-throughput targeted NMR platform (Nightingale Health© (Helsinki, Finland)) at the University of Bristol, providing quantitative information on 227 metabolic traits (including ratios and other traits derived from the quantified NMR spectra) 1 . Details of all 227 traits can be found in the Extended data 11 .

The Nightingale NMR metabolite quantification was achieved through measurements of three molecular windows from each serum/plasma sample. Two of the spectra (LIPO and LMWM windows) were acquired from native serum/plasma and one spectrum from serum/plasma lipid extracts (LIPID window). The NMR spectra were measured using a Bruker AVANCE III spectrometer operating at 600 MHz. Measurements of native serum/plasma samples and serum/plasma lipid extracts were conducted at 37°C and 22°C, respectively.

The NMR spectra were analysed for metabolite quantification (molar concentrations) in an automated fashion. For each metabolite, a ridge regression model was applied for quantification to overcome the problems of heavily overlapping spectral data. In the case of the lipid data, quantification models were calibrated using high-performance liquid chromatography methods, and individually cross-validated against NMR-independent lipid data. Low-molecular-weight metabolites, as well as lipid extract measures, were quantified as mmol/L based on regression modelling calibrated against a set of manually fitted metabolite measures. The calibration data were quantified based on iterative line-shape fitting analysis using PERCH NMR software (PERCH Solutions Ltd., Kuopio, Finland). Quantification could not be directly established for the lipid extract measures due to experimental variation in the lipid extraction protocol. Therefore, the serum/plasma lipid extract measures were scaled to the total cholesterol of a standard serum sample from the LIPO spectrum.

Validation of the NMR platform. Quality control (QC) of the data was undertaken by Nightingale Health© prior to returning metabolite concentrations to BiB. Their QC procedures check various issues related to sample integrity and biomarker quantification. QC reports for the NMR datasets can be found in the Extended data 11 .

We also validated some of the NMR measures by comparing concentrations of fasting glucose, total cholesterol, high-density lipoprotein cholesterol (HDLc), low-density lipoprotein cholesterol (LDLc), and triglycerides from the NMR platform with the same measures from the same samples assessed by clinical chemistry (Figure 2). Clinical chemistry measurements were completed at the BRI laboratory (fasting glucose) or Glasgow Royal Infirmary (lipids). Glucose was measured using a glucose oxidase method that does not cross-react with insulin. Total cholesterol, HDLc and triglycerides were measured following the standard Lipid Research Clinics Protocol using enzymatic reagents. LDLc was estimated from total cholesterol, HDLc and triglycerides (LDLc = [Total cholesterol in mmol/L] – [HDLc in mmol/L] – [Triglycerides in mmol/L ÷ 2.2]). The correlation between fasting glucose measured by clinical chemistry and by NMR was 0.73, and for all four lipids it was between 0.85 and 0.93. The intercepts of the regression lines were close to zero for HDLc, LDLc, and triglycerides, but higher for glucose (1.85) and total cholesterol (1.21). This suggests that the NMR platform systematically underestimates glucose and total cholesterol levels. However, the high correlations, particularly for the lipid measures, are reassuring and suggest that association analyses would have validity. It is evident from Figure 2 that there are outliers for some of the measures, notably for glucose, total cholesterol and triglycerides (Figures 2A, 2B, 2E, respectively). We recommend that researchers using the data consider these potential outliers before commencing analyses. How to deal with outliers will depend on the research question and the preference of the research group undertaking the analyses.
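As a minimal sketch of the calculations described above, the Python snippet below applies the LDLc estimate given in the text and fits a simple least-squares line with the clinical chemistry value as the outcome and the NMR value as the predictor (matching the axes of Figure 2). The function names and the toy values are illustrative assumptions; they are not BiB data and not the authors' code.

```python
from statistics import mean

def estimate_ldlc(total_chol, hdlc, triglycerides):
    """LDLc estimate as written in the text; all inputs in mmol/L."""
    return total_chol - hdlc - triglycerides / 2.2

def pearson_and_line(x, y):
    """Return (Pearson r, slope, intercept) for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return r, slope, intercept

# Hypothetical paired measurements (mmol/L): NMR on x, clinical chemistry on y.
nmr = [3.0, 4.0, 5.0, 6.0]
clinical = [4.0, 5.0, 6.0, 7.0]  # constructed as NMR + 1.0 for illustration
r, slope, intercept = pearson_and_line(nmr, clinical)
```

A positive intercept with a slope near one, as in this constructed example, is the pattern the text interprets as the NMR platform reading lower than clinical chemistry.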
To further test the validity of the NMR measures, we compared associations of maternal early pregnancy body mass index (BMI), treated as an exposure, with fasting glucose and the four lipid measures from clinical chemistry and NMR as the outcomes. We also compared associations between the five metabolic measures (from clinical chemistry and NMR) as exposures, with hypertensive disorder of pregnancy (HDP; either gestational hypertension or pre-eclampsia, defined on the basis of international criteria applied to all measures of blood pressure and proteinuria extracted from clinical records) 12 as the outcome. Associations of BMI with the five outcomes were directionally consistent between clinical chemistry and NMR measurements. However, the NMR associations were weaker (closer to the null) and there were clear differences in magnitudes of association between the two methods for the associations of BMI with glucose and HDLc (Figure 3A). By contrast, results were concordant between the two methods for the associations of metabolites with odds of HDP (Figure 3B). Given the relatively modest correlation of glucose from the Metabolon mass spectrometry analyses with the clinical chemistry levels on the same samples, we explored this further by comparing results from two regression analyses: one of the difference in mean glucose per 1SD higher BMI (glucose as outcome) and one of the odds ratio for HDP per 1SD higher glucose (glucose as exposure).
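The linear associations described above are reported as standard deviation differences in mean metabolite per 1SD higher BMI. The sketch below illustrates that scaling under the assumption that both variables are converted to z-scores before an unadjusted least-squares fit; it covers only the linear (BMI as exposure) case, not the logistic models for HDP. All names and values are hypothetical.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardise values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def sd_difference_per_sd(exposure, outcome):
    """Unadjusted slope of standardised outcome on standardised exposure."""
    zx, zy = zscores(exposure), zscores(outcome)
    sxy = sum(a * b for a, b in zip(zx, zy))
    sxx = sum(a * a for a in zx)
    return sxy / sxx

# Hypothetical data: BMI (kg/m^2) and fasting glucose (mmol/L).
bmi = [20.0, 24.0, 26.0, 30.0, 35.0]
glucose = [4.2, 4.4, 4.5, 4.9, 5.0]
beta = sd_difference_per_sd(bmi, glucose)
```

Because both variables are standardised, the unadjusted estimate is bounded between -1 and 1, which makes estimates from platforms with different units (for example mmol/L and multiples of the median) directly comparable.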

Figure 2.

Comparison of glucose ( 2A), total cholesterol ( 2B), high-density lipoprotein cholesterol ( 2C), low-density lipoprotein cholesterol ( 2D) and triglycerides ( 2E) concentrations between Nightingale Health© nuclear magnetic resonance (NMR) (x-axis) and routine clinical chemistry assays (y-axis) (N= 11,036 to 11,337). R = Pearson correlation coefficient.

Figure 3.

Comparisons of the associations of early pregnancy body mass index (BMI) with fasting glucose and lipids measured by routine clinical chemistry assays, Nightingale Health© nuclear magnetic resonance (NMR) and mass spectrometry (MS, glucose only) (3A), and of fasting glucose and lipids measured by routine clinical chemistry assays, Nightingale Health© NMR and MS (glucose only) with hypertensive disorder of pregnancy (HDP) (3B). Associations in 3A are from unadjusted linear regression and data points show standard deviation differences in mean metabolite per one standard deviation (1SD) higher BMI. Associations in 3B are from unadjusted logistic regression and data points show unadjusted odds ratios for HDP per 1SD higher metabolic trait. Error bars = 95% confidence intervals.

Participant selection and characteristics of those with NMR data. In this subsection, we present flow charts to illustrate selection and inclusions into the NMR participant groups ( Figure 4) and describe participant characteristics for the BiB NMR datasets ( Table 2). All three of the samples of BiB participants with NMR data (maternal pregnancy N = 11,480, offspring cord blood N = 7,980, and offspring 12–24 months N = 2,108) had very similar distributions of maternal age, parity, early pregnancy BMI, residential area deprivation, offspring sex and birth weight to those seen in the whole cohort of 13,776 participants ( Table 2).

Figure 4. Illustrating the flow of participants into the NMR datasets in the Born in Bradford cohort.

Figure 4A shows the maternal pregnancy (Dataset 1: NMR metabolomics at 26–28 weeks’ gestation) and offspring cord blood samples (Dataset 2: NMR metabolomics taken from the umbilical vein shortly after delivery). Figure 4B shows the offspring 12–24 months NMR metabolomic sample (Dataset 3). Abbreviations: NMR, nuclear magnetic resonance; BiB, Born in Bradford; ALL IN, Allergy and Infection study.

Table 2. Participant characteristics for NMR datasets in the BiB cohort.

| Characteristics | Unit / Category | Maternal pregnancy NMR dataset (n=11,480) | Offspring cord blood NMR dataset (n=7,890) | Offspring 12- or 24-months NMR dataset (n=2,108) | BiB cohort (n=13,776) |
|---|---|---|---|---|---|
| Maternal age | Years | 27.3 (5.6) | 27.5 (5.6) | 28.3 (5.7) | 27.3 (5.6) |
| | Missing | 410 (3.6) | 627 (7.9) | 60 (2.9) | 1445 (10.5) |
| Maternal parity | Nulliparous | 4310 (37.5) | 2765 (36.6) | 819 (39.9) | 5101 (37.0) |
| | Multiparous | 6428 (55.9) | 5125 (65.0) | 1233 (58.4) | 7773 (56.4) |
| | Missing | 742 (6.5) | 344 (4.4) | 56 (2.7) | 902 (6.5) |
| Maternal BMI | kg/m² | 26.1 (5.7) | 26.2 (5.7) | 26.2 (5.5) | 26.0 (5.7) |
| | Missing | 2160 (18.8) | 1464 (18.5) | 106 (5.0) | 3281 (23.8) |
| Maternal ethnicity | White British | 4268 (37.2) | 2902 (37.7) | 769 (36.5) | 5055 (37.8) |
| | Pakistani | 4995 (43.5) | 3596 (46.7) | 1048 (49.7) | 6088 (45.5) |
| | Other | 1887 (16.4) | 1206 (15.7) | 291 (13.8) | 2223 (16.6) |
| | Missing | 330 (2.4) | 186 (2.4) | 0 | 410 (3.0) |
| Index of multiple deprivation | Quintile 1 (most deprived) | 6646 (65.9) | 4439 (65.8) | 1400 (66.4) | 7566 (66.4) |
| | Quintile 2 | 1830 (18.2) | 1220 (18.1) | 355 (16.8) | 2052 (18.0) |
| | Quintile 3 | 1124 (11.2) | 778 (11.5) | 248 (11.8) | 1250 (11.0) |
| | Quintile 4 | 306 (3.0) | 187 (2.8) | 70 (3.3) | 334 (2.9) |
| | Quintile 5 (least deprived) | 173 (1.7) | 118 (1.8) | 34 (1.6) | 190 (1.7) |
| | Missing | 1401 (12.2) | 1148 (14.6) | 1 (0.0) | 2384 (17.3) |
| Offspring sex | Male | 5705 (49.7) | 4095 (51.9) | 1065 (50.2) | 6891 (50.0) |
| | Female | 5420 (48.7) | 3795 (48.1) | 1029 (48.1) | 6470 (48.4) |
| | Missing | 355 (3.1) | 3 (0.0) | 14 (0.7) | 415 (3.0) |
| Birth weight | Grams | 3226 (565) | 3266 (522) | 3224 (558) | 3216 (565) |
| | Missing | 356 (3.1) | 4 (0.1) | 14 (0.7) | 416 (3.0) |

Data are mean ± SD or n (%) unless stated. ^ gestational age in weeks presented for columns 1, 2 and 3. Offspring age in weeks presented for column 4.

Abbreviations: NMR, nuclear magnetic resonance; BiB, Born in Bradford; BMI, body mass index; kg, kilogram; IMD, Index of Multiple Deprivation (taken from 2010 national quintiles). There were 9 ethnic groups, of which White British and Pakistani were the main homogeneous groups. The 'Other' ethnicity category comprises: White Other, Mixed-White and Black, Mixed-White and South Asian, Black, Indian, Bangladeshi or Other ethnicity.

Mass spectrometry metabolomics

Mass spectrometry methods. The untargeted MS metabolomics analysis of over 1,000 metabolites was performed at Metabolon, Inc. (Durham, North Carolina, USA). Samples were sent to Metabolon in two separate batches. Dataset 1 was completed in December 2017 and consisted of 1,000 maternal pregnancy samples and 1,000 offspring paired cord blood samples. Dataset 2 was completed in December 2018 and consisted of 2,000 maternal pregnancy samples.

At Metabolon, samples were managed by a laboratory information management system and were kept at -80°C. Recovery standards were added to samples prior to extraction to monitor the extraction process. To remove proteins, dissociate small molecules bound to proteins or trapped in the precipitated protein matrix, and recover chemically diverse metabolites, proteins were precipitated with methanol under vigorous shaking for 2 min (Glen Mills GenoGrinder 2000) followed by centrifugation. The resulting extract was divided into five fractions: two for analysis by two separate reverse phase ultra-high-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) methods with positive ion mode electrospray ionization (ESI), one for analysis by reverse phase UPLC-MS/MS with negative ion mode ESI, one for analysis by hydrophilic interaction liquid chromatography (HILIC)/UPLC-MS/MS with negative ion mode ESI, and one reserved for backup. Samples were placed on a TurboVap® (Zymark) to remove the organic solvent. The sample extracts were stored overnight under nitrogen before preparation for analysis.

The instrument configuration, data acquisition, and metabolite identification and quantitation used by Metabolon have been described previously 13 . To summarise, metabolites were identified by matching the ion features (retention time, molecular weight (m/z), MS fragmentation pattern, preferred adducts, and in-source fragments) in the study samples to a reference library of chemical standard entries. The confidence of this metabolite identification met the most stringent tier 1 criteria defined by Schrimpe-Rutledge et al. 14 . Peaks were quantified using area-under-the-curve of primary MS ions. To adjust for instrument batch effects for each run day, the raw ion counts for each metabolite were divided by the median value for the run day. Missing values were assumed to be the result of falling below the detection sensitivity, and thus were imputed with the minimum detection value for each metabolite.
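The run-day scaling and minimum-value imputation described above can be sketched for a single metabolite as follows. This is an illustration only: the function name and toy values are hypothetical, and the choice to impute after scaling, using the smallest scaled value observed for that metabolite, is an assumption, since the text does not specify the order of the two steps.

```python
from statistics import median

def normalise_by_run_day(raw_counts, run_days):
    """Scale one metabolite's raw ion counts by the run-day median.

    None marks a value that was not detected. Missing values are replaced
    by the smallest scaled value observed for this metabolite (assumption).
    """
    by_day = {}
    for value, day in zip(raw_counts, run_days):
        if value is not None:
            by_day.setdefault(day, []).append(value)
    day_median = {day: median(vals) for day, vals in by_day.items()}

    scaled = [
        None if value is None else value / day_median[day]
        for value, day in zip(raw_counts, run_days)
    ]
    observed_min = min(v for v in scaled if v is not None)
    return [observed_min if v is None else v for v in scaled]

# Hypothetical raw ion counts for one metabolite across two run days.
raw = [100.0, 200.0, 300.0, None, 20.0, 40.0]
days = ["d1", "d1", "d1", "d2", "d2", "d2"]
mom = normalise_by_run_day(raw, days)
```

After scaling, a value of 1.0 corresponds to the median sample on that run day, so values from different run days sit on a common relative scale.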

This process provides relative quantification (i.e. multiples of the median (MoM) for the days run) of >1,000 metabolites in 10 key classes: amino acids, carbohydrates, lipids, nucleotides, microbiota metabolism, carbon metabolism, energy, cofactors & vitamins, xenobiotics, and unidentified metabolites. A list of metabolites defined in each of the datasets can be found in the Extended data 11 .

Validation of the MS platform. Details of the Metabolon QC procedures and data quality for the Metabolon BiB datasets are described in the reports found in the Extended data 11 . In brief, procedures were conducted to: (i) assure that all aspects of the Metabolon process are operating within specifications, (ii) assess the effect of a non-plasma matrix on the Metabolon process and distinguish biological variability from process variability, (iii) assess the contribution to compound signals from the process (using Process Blank) and (iv) segregate contamination sources in the extraction (using Solvent Blank).

As an additional data QC, we explored correlations between MS and both NMR and clinical chemistry fasting glucose measures (glucose is the only common trait we have data on for MS, NMR, and clinical chemistry). Pearson’s correlation coefficient comparing MS to clinical chemistry (0.65) was modest and lower than that for NMR (0.73, see above and Figure 2) and the intercept was 0.11 ( Figure 5A). Correlation between Metabolon and NMR was higher (0.77) and the intercept was 0.10 ( Figure 5B).

Figure 5.

Comparisons of glucose concentrations for Metabolon mass spectrometry (MS) with routine glucose oxidase ( 5A) and Nightingale Health© nuclear magnetic resonance (NMR) ( 5B). R = Pearson correlation coefficient.

Participant selection and characteristics of those with MS data. The flow of participants into the MS datasets is illustrated in Figure 6, and the characteristics of participants included in the two MS datasets, together with characteristics of the whole BiB cohort, are provided in Table 3. Selection processes for both MS datasets mean that we would not expect distributions of characteristics in these to reflect the whole cohort. Only women of either Pakistani or White British ethnic background were included in the MS datasets because, due to cost, we were only able to do this on a subset of the cohort. As these two groups represent ~85% of BiB, it was felt the numbers for any other group would be too small for meaningful analyses. In Dataset 1, 1,000 women were selected on the basis that they had stored fasting plasma, a useable cord blood sample, genome wide data on both mother and offspring and were either of White British or Pakistani origin (Figure 6A). Following these inclusions, 500 women were selected at random from each ethnic group (White British and Pakistani). In Dataset 2, a case-cohort design was used 15, 16 . A case-cohort design consists of a cohort with an over-sampling of all cases. The BiB case-cohort consists of 2,000 women (only pregnancy samples were assayed in Dataset 2). As with Dataset 1, women were selected based on certain characteristics shown in Figure 6B, including that they had not already had Metabolon MS analyses. From those who fulfilled these pre-specified criteria, six groups of cases were selected: women with (a) gestational diabetes; (b) gestational hypertension; (c) pre-eclampsia; (d) preterm birth; (e) congenital anomaly; (f) stillbirth. In total, 801 women had experienced one or more of these conditions. Having selected all cases, these were then replaced into the eligible cohort and a sub-cohort of 1,199 women was randomly selected from the eligible cohort.
Thus, the comparison group in this case-cohort study is representative of the eligible cohort (i.e. the cohort comparison group includes some of the cases in proportions that would reflect the whole cohort). The final BiB case-cohort sample consists of three groups ( Figure 6B): 1) selected as comparison group (N = 1,199), 2) selected as cases only (N = 408), and 3) selected as a case and control (N = 393). The comparison group in any analyses will vary depending on the research question.
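The general logic of a case-cohort selection (take all cases, then draw a random sub-cohort from the whole eligible cohort so that some cases also fall in the comparison group) can be sketched as below. This is a toy illustration under stated assumptions: the identifiers, group labels, sample sizes and seed are hypothetical, and the snippet does not attempt to reproduce the BiB selection or its group counts.

```python
import random

def case_cohort_groups(eligible_ids, case_ids, subcohort_size, seed=0):
    """Label each selected participant by how they entered the sample."""
    rng = random.Random(seed)
    cases = set(case_ids)
    # The sub-cohort is drawn from the whole eligible cohort, cases included.
    subcohort = set(rng.sample(sorted(eligible_ids), subcohort_size))
    groups = {}
    for pid in subcohort | cases:
        if pid in subcohort and pid in cases:
            groups[pid] = "case and subcohort"
        elif pid in subcohort:
            groups[pid] = "subcohort only"
        else:
            groups[pid] = "case only"
    return groups

# Hypothetical eligible cohort of 100 participants with six cases.
eligible = range(1, 101)
cases = [3, 17, 42, 58, 77, 91]
groups = case_cohort_groups(eligible, cases, subcohort_size=30)

# A representative comparison group excludes participants sampled only as cases.
comparison = [pid for pid, label in groups.items() if label != "case only"]
```

Keeping the three labels separate lets an analyst build a comparison group that stays representative of the eligible cohort, which is the property the text emphasises.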

Figure 6. Illustrating the flow of participants into the Metabolon datasets in the Born in Bradford cohort.

Figure 6A shows Dataset 1, which includes 1,000 pregnancies and infants with MS metabolomics during pregnancy (26–28 weeks’ gestation (Dataset 1a)) and in cord blood (Dataset 1b). Figure 6B shows Dataset 2, which includes 2,000 pregnancies (26–28 weeks’ gestation) with MS metabolomics within a case-cohort design. The 801 total cases are split into “cases” and “case-controls” based on how many cases we would expect in a representative cohort (i.e. the case-controls). “Cases” should not be included in a comparator group for any analyses as we want the comparison group to be representative. Abbreviations: MS, mass spectrometry; BiB, Born in Bradford; GWAS, genome wide association study; EDTA, ethylenediaminetetraacetic acid; HDP, hypertensive disorders of pregnancy; GD, gestational diabetes; GHT, gestational hypertension; PE, pre-eclampsia; PTB, preterm birth; CA, congenital anomaly; SB, stillbirth.

Table 3. Participant characteristics of the mass spectrometry datasets in the BiB cohort.

| Characteristics | Category | Dataset 1 (N = 1,000 mother/child pairs) | Dataset 2 case-cohort a (N = 2,000) | Dataset 2 random cohort sample only b (N = 1,199) | BiB cohort (N = 13,776) |
| --- | --- | --- | --- | --- | --- |
| Maternal age | Years | 27.5 (5.7) | 27.5 (5.7) | 26.91 (5.5) | 27.3 (5.6) |
| | Missing | 0 (0) | 0 (0) | 0 (0.0) | 1445 (10.5) |
| Maternal parity | Nulliparous | 359 (37.0) | 745 (37.3) | 433 (37.4) | 5101 (37.0) |
| | Multiparous | 611 (61.1) | 1213 (60.1) | 725 (60.5) | 7773 (56.4) |
| | Missing | 30 (3.0) | 42 (2.1) | 41 (3.4) | 902 (6.5) |
| Maternal BMI (kg/m²) | | 26.7 (6.0) | 26.8 (5.9) | 25.9 (5.4) | 26.0 (5.7) |
| | Missing | 36 (3.6) | 97 (4.9) | 60 (5.0) | 3281 (23.8) |
| Maternal ethnicity | White British | 500 (50.0) | 933 (46.7) | 537 (44.8) | 5055 (37.8) |
| | Pakistani | 500 (50.0) | 1067 (53.4) | 662 (55.2) | 6088 (45.5) |
| | Other | 0 | 0 | 0 | 2223 (16.6) |
| | Missing | 0 | 0 | 0 | 410 (3.0) |
| Index of multiple deprivation | Quintile 1 (most deprived) | 656 (65.6) | 1340 (67.0) | 823 (68.6) | 7566 (66.4) |
| | Quintile 2 | 175 (17.5) | 358 (17.9) | 203 (16.9) | 2052 (18.0) |
| | Quintile 3 | 112 (11.2) | 212 (10.6) | 123 (10.3) | 1250 (11.0) |
| | Quintile 4 | 38 (3.8) | 53 (2.6) | 31 (2.6) | 334 (2.9) |
| | Quintile 5 (least deprived) | 19 (1.9) | 37 (1.8) | 19 (1.6) | 190 (1.7) |
| | Missing | 0 (0) | 0 (0) | 0 (0) | 2384 (17.3) |
| Offspring sex | Male | 512 (51.2) | 1053 (52.7) | 625 (52.1) | 6891 (50.0) |
| | Female | 488 (48.8) | 947 (47.3) | 574 (47.9) | 6470 (48.4) |
| | Missing | 0 (0) | 0 (0) | 0 (0) | 415 (3.0) |
| Offspring birthweight | Grams | 3304 (517) | 3232 (574) | 3318 (486) | 3216 (565) |
| | Missing | 0 (0) | 1 (0) | 0 (0) | 416 (3.0) |

a This column comprises the full case-cohort dataset of 2,000 pregnancies. This includes the 801 selected cases and the 1,199 women in the random cohort.

b This column includes only the 1,199 random cohort to compare to the full case-cohort with the selected cases.

Data are mean (SD) or n (%) unless stated. Abbreviations: BiB, Born in Bradford; BMI, body mass index; kg, kilogram; IMD, Index of Multiple Deprivation (taken from 2010 national quintiles). There were nine ethnic groups, of which White British and Pakistani were the main homogeneous groups. The 'Other' ethnicity category comprises: White Other, Mixed-White and Black, Mixed-White and South Asian, Black, Indian, Bangladeshi or Other ethnicity. Please note that, because of the way participants were selected into the MS datasets, we would not expect characteristics to match those of the whole cohort.

For the MS datasets, researchers are given the option of using the ‘raw’ data from Metabolon or a quantified (scaled) dataset, in which missing data have been imputed and the multiple of median values transformed to standard deviation (z-) scores (by subtracting the sample mean value for each metabolite from the participant value and then dividing by the sample standard deviation for that metabolite). This processing helps overcome the problem of high levels of missing data in metabolomics 17. Participants for MS profiling were sampled on ethnicity: each dataset is split roughly equally between White British and Pakistani women (with slightly more Pakistani women in Dataset 2), whereas around 15% of the whole BiB cohort belongs to neither of these ethnic groups. Aside from this, the characteristics of the MS samples are broadly similar to those of the whole cohort (Table 3).
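The z-score transformation described above can be sketched in a few lines. This is a minimal illustration on made-up values, not the code used to produce the scaled BiB dataset. The function name `z_scores`, the example values and the use of the sample (n - 1) standard deviation are assumptions, and the imputation step is not shown because the text does not specify the method.

```python
import statistics


def z_scores(values):
    """Convert one metabolite's values to z-scores.

    `values` holds that metabolite's multiple-of-median value for each
    participant, with missing values already imputed.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    if sd == 0:
        raise ValueError("metabolite has no variation; z-score undefined")
    return [(v - mean) / sd for v in values]


# Made-up multiple-of-median values for a single metabolite.
mom = [0.8, 1.0, 1.2, 0.9, 1.6]
z = z_scores(mom)
print([round(x, 2) for x in z])
```

After the transformation each metabolite has mean 0 and standard deviation 1 within the sample, so a one-unit difference means one standard deviation for every metabolite.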

Overlap between metabolomics datasets

Having participants in multiple datasets (i.e. maternal pregnancy, offspring cord, offspring 12–24 months) and across the two metabolomic platforms provides scope for unique research opportunities. Figure 7 illustrates the overlap between BiB metabolomic datasets. All numbers are counted at the offspring level: for example, the number with maternal pregnancy metabolite data in any cell refers to the number of offspring whose mother has those samples. There were 11,557 children from 11,480 pregnancies whose mothers had a pregnancy NMR sample. Of these, 6,756 children also had a cord blood sample and 1,981 had at least one measurement from either the 12- or 24-month ALL IN subsample. All the mothers with a pregnancy MS sample (from either the first or second dataset) also have an NMR sample. There were 7,919 children in total with an NMR sample in cord blood, with 1,275 of these also having at least one measure from the 12- or 24-month subsample. Of those with NMR cord blood data, 2,486 had a mother with MS pregnancy data (from either the first or second dataset) and 1,000 have MS cord blood data. There were 2,108 children with at least one NMR measure at either the 12- or 24-month assessment and, of these, 690 have a mother with MS metabolite measures in pregnancy (from either dataset) and 229 have MS cord blood data. Although having prior MS metabolomics was an exclusion criterion for MS Dataset 2 (Figure 6), one mother has MS metabolomics in both datasets, from different pregnancies.

Figure 7. Showing the overlap between the metabolomic datasets in the Born in Bradford cohort presented at the offspring level.

Abbreviations: NMR, Nuclear magnetic resonance; MS, mass spectrometry.

Using the BiB metabolomic data, including a summary of published, ongoing and future research using these data

The current BiB metabolite data have been quantified on blood samples collected during pregnancy, in cord blood at birth and in the offspring at 12 and 24 months. These are critical time periods for life-course research, and the combination of these data with large amounts of genomic, epigenomic, social and health data makes BiB a platform that provides scope for unique research opportunities.

Issues for data users

1. Batch effects: The quantified NMR metabolites that have been measured in BiB are represented in clinically meaningful units, so can be compared to results from other studies. By contrast, the Metabolon MS metabolites are quantified in relative abundance, i.e. in relation to other quantified MS measurements that were run on the same day. MS Dataset 1 and Dataset 2 were obtained ~2 years apart and have been normalized to different references, so are not directly comparable. For example, the value of a specific metabolite from a maternal pregnancy sample in Dataset 1 may differ from the value of the same metabolite in Dataset 2 because they are from different batches. Because of the different selection processes for the two datasets (Dataset 1 is paired pregnancy-offspring cord blood samples and Dataset 2 has a case-cohort sampling frame) it is not possible to normalize them to the same reference. We recommend running analyses separately in each of the two datasets, comparing the results, and then meta-analysing them if appropriate. In principal components analyses, 37 of the 1,000 women with a pregnancy sample in Dataset 1 of the MS data had notably different values to those in the remaining 963 women, which may be a batch effect (see Figure in Extended data 11). This is a new finding and in previous analyses using these data we have not treated these 37 women differently. However, for future analyses we would recommend that researchers consider running analyses with all women and, in a sensitivity analysis, with these 37 women removed.

2. Comparisons with clinical chemistry measurements: We have illustrated above strong correlations between glucose and lipids measured using clinical chemistry and the NMR platform. We found weaker (though directionally consistent) associations of BMI with these outcomes measured using NMR compared to those with clinical chemistry. In a second example, results were consistent between the two methods for the associations of pre-eclampsia with glucose and lipids. Researchers considering using these data might want to check for consistency with associations using the clinical chemistry measurement available in BiB. For the MS data we were only able to explore correlations with glucose and found this to be high between clinical chemistry and MS.

3. Second version of the Metabolon MS data: Due to recent developments, there are now two versions of the Metabolon MS data. We asked Metabolon to undertake a new identification and quantification of the metabolites from the stored chromatograms to see whether it was possible to quantify a specific metabolite that was of interest in an ongoing project, 1-(1-enyl-stearoyl)-2-oleoyl-GPC (P-18:0/18:1). This new identification and quantification was conducted for MS Dataset 1a (the maternal pregnancy samples, not including the infants’ cord blood samples) and all 2,000 samples from MS Dataset 2. As quantification of the spectra is done daily, with each measurement quantified relatively as a multiple of that day’s median value (MoM), the metabolites identified and their relative quantities are not expected to be identical in the two versions of these samples.

For Dataset 1a, the median Spearman’s correlation coefficient across 1,002 overlapping metabolites in the original and the new data is 0.94, with a range from 0.1 to 1.0. Of these metabolites, 861 (86%) have a Spearman’s correlation of >0.9 and 4 (0.4%) a coefficient <0.3. These 4 are leucyl-glycine, adenine, hypotaurine and uracil. The reason for the low correlations for these metabolites is unclear. We will update the data note when this is clarified. In the Extended Data we include the MoM value from each analysis of Dataset 1 for the 1,002 metabolites that are quantified in both versions, together with their correlation coefficients. As different multipliers are used in each analysis, we would not expect the values to be the same in both versions, but high correlations show that between-women differences within each run are similar (i.e., women are ranked similarly in each version), so association analyses should give similar results with either version.

For Dataset 2, the median Spearman’s correlation coefficient across the 1,217 overlapping metabolites is 1, and all coefficients were >0.99 (see Extended Data).
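Two of the points in the list above lend themselves to a short sketch: comparing the two versions of one metabolite by rank correlation (item 3), and pooling estimates obtained separately in Dataset 1 and Dataset 2 (item 1). The code below is illustrative only and uses made-up numbers. The function names are hypothetical, and the fixed-effect inverse-variance method is an assumption, since the text recommends meta-analysis without naming a method.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fixed_effect_meta(estimates):
    """Inverse-variance weighted pooled estimate from (beta, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Made-up MoM values for one metabolite in the original and new versions.
original = [1.0, 2.0, 3.0, 4.0]
new = [0.5, 1.1, 1.4, 2.2]
rho = spearman(original, new)

# Made-up (beta, standard error) pairs from Dataset 1 and Dataset 2.
pooled, pooled_se = fixed_effect_meta([(0.10, 0.05), (0.20, 0.05)])
print(rho, round(pooled, 3), round(pooled_se, 4))
```

In the first example the values differ between versions but the women are ranked identically, which is the situation in which association analyses would be expected to agree. For the pooling step, estimates are only comparable across datasets if they are on a common scale, for example per standard deviation of the metabolite.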

Summary of published research using the BiB metabolomics data

We undertook a collaboration between BiB and the UK Pregnancies Better Eating and Activity Trial (UPBEAT), a randomised controlled trial of obese pregnant women (BMI ≥ 30 kg/m²) 18. We found evidence that maternal pregnancy NMR measures can improve prediction of pregnancy-related disorders 18. In BiB, prediction models combining NMR-derived metabolomics and established risk factors (maternal age, smoking, BMI, ethnicity, and parity) performed better than established risk factors alone for gestational diabetes, hypertensive disorders of pregnancy and small/large for gestational age, but not for preterm birth. We found directionally consistent, but attenuated, results in UPBEAT. The attenuated results in that validation sample may reflect differences between the studies’ participant characteristics, model overfitting in BiB, or both.

In other work, we have also shown that the distributions of most of the NMR metabolic measures differed by ethnicity 19. White European women had higher levels of most lipoprotein subclasses, cholesterol, glycerides and phospholipids, monounsaturated fatty acids, and creatinine but lower levels of glucose, linoleic acid, omega-6 and polyunsaturated fatty acids, and most amino acids, compared with South Asian women. This suggests a more lipidomic pregnancy metabolic profile in White Europeans and a stronger glycemic metabolic profile in South Asian women. Higher BMI and having gestational diabetes were associated with higher levels of several lipoprotein subclasses, triglycerides, and other metabolites in both groups, but with evidence of weaker magnitudes of association for most of these in the South Asian women.

In recent collaborations between the BiB cohort and the Pregnancy Outcome Prediction study (POPs) using Metabolon MS data, we have found evidence that 4-hydroxyglutamate improves prediction of pre-eclampsia compared to clinical risk factors alone 3 and that a ratio of four metabolites (1-(1-enyl-stearoyl)-2-oleoyl-GPC, 1,5-anhydroglucitol, 5α-androstan-3α,17α-diol disulfate and N1,N12-diacetylspermine) together with the sFlt-1:PlGF ratio is a better predictor of fetal growth restriction/small for gestational age than sFlt-1:PlGF alone 4. Initial associations in POPs, a nulliparous, largely White European, affluent cohort from the South East of England, were validated in BiB. As we have outlined, BiB is a cohort of mixed ethnic background, with high levels of deprivation and including both nulliparous and multiparous women. The consistency of associations between POPs and BiB suggests that the prediction accuracy may be widely generalisable and that the metabolites predicting these outcomes may be causally related to them.

Furthermore, combining the MS metabolomics with genomic sequence data has enabled the establishment of metabolomic consequences of loss of functional rare variants in autozygous individuals and the health effects of this loss of function 20 . This has supported the development of the drug lumasiran for a rare kidney disease 21 .

Ongoing and future research

Ongoing work using both the NMR and MS metabolomics data will explore how the pregnancy metabolic environment relates to fetal growth (using repeat ultrasound scan measures and birth weight), preterm delivery, and congenital heart disease. Potential causal effects in these studies will be explored where possible by replication, the use of Mendelian Randomization (MR) and triangulation with other types of data and study designs. In ongoing work, we are using data from both MS datasets to evaluate whether MS-derived metabolomics are better predictors of gestational diabetes, hypertensive disorders of pregnancy, small and large for gestational age and preterm birth, than risk factors alone (with external validation being undertaken in the POPs cohort). By combining both NMR and MS data, we are exploring the relationships between maternal pregnancy metabolites and their offspring cord blood metabolites. To date, there is no published work using the offspring metabolomics data. Researchers can find information on planned follow up data elsewhere, to understand whether these data could be useful to their ongoing or future research 22 .

BiB also contributes to metabolomic studies that are being undertaken by large collaborative efforts. These include the European H2020-funded LifeCycle project 23, in which we are exploring the effects of exposure to maternal hypertensive disorders of pregnancy, gestational diabetes, small and large for gestational age and preterm delivery on the offspring’s subsequent metabolic profile. In the Consortium of Metabolomics Studies (COMETS) 2 there are ongoing projects including trans-ethnic genome-wide association studies (GWAS) and analyses exploring the effects of BMI, smoking, dietary patterns and hypertension on maternal metabolomic profiles.

Discussion and future directions for metabolomic analyses in BiB

In this data note we have described multiple datasets with NMR and MS metabolomic measures in the BiB cohort. The wealth of metabolomics data available in BiB provides opportunities for addressing a range of research questions. In this section, we discuss the strengths and limitations of the data, together with some of our insights for using these data. We also provide information on plans for future measurements of metabolomics in BiB.

A key strength of these datasets is that they are based within a cohort that has very detailed information on 13,776 pregnancies. This includes detailed socioeconomic, education, cognitive, and mental and physical health data. We have OGTT results and fasting pregnancy blood samples on most (83%) of the mothers, genomics (genome-wide and sequence) data and epigenomics data in maternal pregnancy and offspring cord blood. Few studies have pregnancy metabolomics data or OGTT data in numbers of this size. We can look at metabolomics and its role in prediction of adverse pregnancy/perinatal outcomes and health and development in children. BiB has large numbers of South Asian and White European families, residing in a city with high levels of socioeconomic deprivation. The ethnic diversity allows us to try to understand ethnic differences in the developmental origins of disease, for example, why South Asian populations have a higher risk of type 2 diabetes and coronary heart disease. There is also scope to explore how diet could relate to the range of metabolomic measurements that BiB possesses. Further information on dietary variables can be found online in the BiB data dictionary (https://borninbradford.github.io/datadict/).

Having access to two metabolomics platforms (NMR and MS) is beneficial. The NMR platform mostly consists of lipids and lipoproteins, but also provides quantified fatty acids, amino acids, glycolysis metabolites, ketone bodies and glycoprotein acetyl (an inflammatory marker). It provides considerably more information than clinical chemistry measures that are conventionally measured in cohorts (e.g. glucose, total cholesterol, LDLc, HDLc and triglycerides) and at not much more cost (~£20 per sample). As a result, we have been able to obtain these data on large numbers of women in pregnancy, in offspring cord blood and in samples taken from offspring at the 12- and 24-month assessments. By contrast, the MS data cover more of the metabolome, including being able to assess energy metabolism (which might be important in pregnancy) and markers of medications such as paracetamol. However, MS is more expensive (~£80-£200 per sample, depending on how many samples are assayed at a time). By having access to both datasets here, we can have broader coverage of the metabolome 24. There are potential uses for both platforms, ranging from disease prediction to causal analyses using methods such as MR 25. Both platforms have been used in previous GWAS of metabolites 26. As such, BiB could be used to explore whether genetic instruments from GWAS can be related to NMR or MS metabolites in pregnancy.

Access to these unique metabolomic data is a big advantage in BiB. However, we have been unable to validate some findings in external cohorts. For example, the ethnic differences in pregnancy metabolic profiles described above have not been replicated because we could not find other independent studies with relevant data 19. We hope that this data note will encourage other studies to collect similar data in pregnancy, offspring cord blood, and in mothers and offspring postnatally throughout their life-course.

There are some additional important limitations of the data to consider. The impact of these limitations will depend on the research question. All the metabolomics datasets were collected on subsamples ranging from 11,480 with maternal pregnancy NMR samples (83% of the eligible 13,776 participants) to 1,000 (7%) with MS cord blood samples. Smaller sample sizes may be statistically inefficient in some analyses and the selection processes ( Figure 4 and Figure 6) may result in selection bias in some analyses. It is notable and provides some reassurance that, even for the smaller samples, distributions of most characteristics are similar between participant groups with different types of metabolomics at different time points and the whole cohort ( Table 2 and Table 3). As Metabolon MS data have been collected only on White British and Pakistani women it cannot be assumed that analyses with these data would generalize to other ethnic groups. BiB cohort participants were largely recruited at the OGTT (with a small number recruited after that). This was opportunistic as we had no funding for initiating the cohort. After consultation with the community and health care providers, we established that this would be a suitable time to obtain consent, interview pregnant women and collect a fasting blood sample for research. However, it means that we are likely to have missed women who did not attend the OGTT and were not captured later in pregnancy or at delivery, and those who delivered pre-term before they attended their OGTT. We have previously compared BiB participants to non-BiB births occurring between 2007–2011 8 . Summary data from obstetric and delivery records were obtained for 11,761 non-BiB births, which would include some who moved to Bradford shortly before delivery (and would not have been eligible to recruit). The comparison showed a small number of differences. 
BiB participants were less likely to include younger mothers (age 20–24 years) and had a higher proportion of South Asian and nulliparous mothers. There were differences in gestational age and preterm delivery that reflected recruiting BiB participants relatively late in pregnancy 8 . This selection on gestational age may introduce selection bias in some BiB analyses, including those using the metabolomics data described here.

A limitation is that BiB only has pregnancy metabolomics at a single time point and does not have pre-pregnancy measurements. Previous research suggests metabolites change upon becoming pregnant and then revert to pre-pregnancy levels 5 and that they change during pregnancy 27 . Earlier measures would be valuable for prediction of future adverse outcomes to enable earlier antenatal monitoring and intervention.

This data note has focused on metabolomics data that have been quantified by high-throughput commercial platforms (Nightingale Health© NMR and Metabolon MS). On a small subsample of BiB participants (N = 199), NMR urine and MS serum metabolites have been quantified at Imperial College London as part of the HELIX collaboration. HELIX aims to identify the human exposome in pregnancy and childhood. Metabolite measurements were undertaken alongside similar subsamples from five other cohorts (total N = 1,192). In all six cohorts, samples were from children aged between 6–11 years (BiB participants were mean age 6.6 years). In total, 44 urine metabolites (24 semi-quantified) and 188 serum metabolites (56 fully quantified) were measured. We have not described these metabolomics datasets here as the assays are unique to a small subgroup of BiB participants, and any research on these participants is best done together with the other HELIX cohort subgroups, on whom the same metabolomic data, obtained at the same time and using the same methods, are available. Further information about the samples and methods used can be found elsewhere 28.

Up until March 2020, we were undertaking a follow-up of BiB parents and offspring, including collecting further blood samples, with funding available to complete the NMR analyses on offspring and parent serum/plasma collected at this follow-up. However, that follow-up stopped on 16th March 2020 when restrictions on normal life due to the COVID-19 pandemic began in the UK. At the time of submitting this paper we do not know when it will be possible to restart face-to-face data collection or what the best plans would be for further blood sample collection. At the relevant time we will discuss different potential scenarios for completing that planned follow-up with our scientific advisory groups. Whatever the decision, we should have some participants with serum/plasma NMR measures collected ~8–9 years after birth. We are also planning to measure metabolites on the available maternal pregnancy urine samples. Urine metabolites often provide a more accurate measure of dietary intake and medicine use than serum/plasma measures and would be a valuable addition to the existing datasets described here. Any new data will be made available to the wider research community.

Data availability

Underlying data

Scientists are encouraged to make use of the BiB data, which are available through a system of managed open access.

  • Before you contact BiB, please make sure you have read our Guidance for Collaborators. The BiB executive reviews proposals on a monthly basis and we will endeavour to respond to your request as soon as possible. You can find out about all of the different datasets which are available here. If you are unsure whether we have the data that you need, please contact a member of the BiB team (borninbradford@bthft.nhs.uk).

  • Once you have formulated your request please complete the ‘Expression of Interest’ form available here and email the BiB research team ( borninbradford@bthft.nhs.uk).

  • If your request is approved, we will ask you to sign a collaboration agreement; if your request involves biological samples, we will ask you to complete a material transfer agreement.

Extended data

Open Science Framework: Metabolomics data in the Born in Bradford cohort. https://doi.org/10.17605/OSF.IO/YST7N 11 .

This project contains the following extended data:

  • BiB_MS_Dataset1_PCA_Plot.png (Figure showing principal component analysis of dataset 1)

  • MS_Metabolite_Details.xlsx (Lists the names and details of all metabolites assessed by the Metabolon platform in the BiB MS dataset 1a (sheet 1) and BiB MS dataset 2 (sheet 2))

  • MS_Quality_Report_Dataset1.pdf (QC report for MS dataset 1 from Metabolon)

  • MS_Quality_Report_Dataset2.pdf (QC report for MS dataset 2 from Metabolon)

  • NMR_Metabolite_Details.xlsx (Lists the names and units of all metabolic traits assessed by the NMR platform)

  • NMR_Quality_Report_Cord.pdf (Summarizing quality observations in the NMR cord blood dataset)

  • NMR_Quality_Report_Infant.pdf (Summarizing quality observations in the NMR infant dataset)

  • NMR_Quality_Report_Pregnancy.pdf (Summarizing quality observations in the NMR pregnancy dataset)

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Acknowledgements

Our thanks to all the children (and parents), teachers and school staff who were involved in the research, and staff at the Bristol Bioresource Laboratory who manage the BiB biobank, including cataloguing and shipping samples (e.g. to Metabolon) and at the Bristol Metabolomics Research who ran NMR analyses. Thanks also to the Born in Bradford (BiB) Research Assistants for data collection. Born in Bradford is only possible because of the enthusiasm and commitment of the Children and Parents in BiB. We are grateful to all the participants, practitioners and researchers who have made Born in Bradford happen.

Funding Statement

Core funding for BiB has been provided by the Wellcome Trust [101597], a joint grant from the UK Medical Research Council (MRC) and UK Economic and Social Science Research Council (ESRC) [MR/N024397/1], the British Heart Foundation [CS/16/4/32482] and the National Institute for Health Research ARC Yorkshire and Humber [NIHR200166]. Funding for the metabolomics analyses has been provided by the US National Institutes of Health [R01 DK10324], the European Research Council (ERC) under the European Union’s Seventh Framework Programme [FP7/2007-2013] / ERC grant agreement no 669545 and the MRC via the MRC Integrative Epidemiology Unit Programme to D.A.L [MC_UU_00011/6]. The work presented here is also supported by the University of Bristol British Heart Foundation Accelerator Award [AA/18/7/34219]. K.T. is supported by a British Heart Foundation Doctoral Training Program [FS/17/60/33474]. N.M. PhD studentship is funded by the National Institute for Health Research (NIHR) Biomedical Centre at the University Hospitals Bristol NHS Foundation Trust and the University of Bristol. K.T., N.M., N.J.G, K.B. and D.A.L. work in a unit that is supported by the University of Bristol and UK Medical Research Council [MC_UU_00011/6]. D.A.L is a NIHR Senior Investigator [NF-0616-10102]. The ALL IN sub-study was supported by the Wellcome Trust [083521]. Funding was also provided by the National Institute for Health Research (NIHR) under its Collaboration for Applied Health Research and Care (CLAHRC) for Yorkshire and Humber

[version 2; peer review: 1 approved, 3 approved with reservations]

References

  • 1. Würtz P, Kangas AJ, Soininen P, et al.: Quantitative Serum Nuclear Magnetic Resonance Metabolomics in Large-Scale Epidemiology: A Primer on -Omic Technologies. Am J Epidemiol. 2017;186(9):1084–96. doi: 10.1093/aje/kwx016
  • 2. Yu B, Zanetti KA, Temprosa M, et al.: The Consortium of Metabolomics Studies (COMETS): Metabolomics in 47 Prospective Cohort Studies. Am J Epidemiol. 2019;188(6):991–1012. doi: 10.1093/aje/kwz028
  • 3. Sovio U, McBride N, Wood AM, et al.: 4-Hydroxyglutamate is a novel predictor of pre-eclampsia. Int J Epidemiol. 2020;49(1):301–11. doi: 10.1093/ije/dyz098
  • 4. Sovio U, Goulding N, McBride N, et al.: A maternal serum metabolite ratio predicts fetal growth restriction at term. Nat Med. 2020;26(3):348–53. doi: 10.1038/s41591-020-0804-9
  • 5. Wang Q, Würtz P, Auro K, et al.: Metabolic profiling of pregnancy: cross-sectional and longitudinal evidence. BMC Med. 2016;14(1):205. doi: 10.1186/s12916-016-0733-0
  • 6. Santos Ferreira DL, Williams DM, Kangas AJ, et al.: Association of pre-pregnancy body mass index with offspring metabolic profile: Analyses of 3 European prospective birth cohorts. Ma RCW, editor. PLoS Med. 2017;14(8):e1002376. doi: 10.1371/journal.pmed.1002376
  • 7. Würtz P, Wang Q, Niironen M, et al.: Metabolic signatures of birthweight in 18 288 adolescents and adults. Int J Epidemiol. 2016;45(5):1539–50. doi: 10.1093/ije/dyw255
  • 8. Wright J, Small N, Raynor P, et al.: Cohort Profile: The Born in Bradford multi-ethnic family cohort study. Int J Epidemiol. 2013;42(4):978–91. doi: 10.1093/ije/dys112
  • 9. Pembrey L, Waiblinger D, Griffiths P, et al.: Cytomegalovirus, Epstein-Barr virus and varicella zoster virus infection in the first two years of life: a cohort study in Bradford, UK. BMC Infect Dis. 2017;17(1):220. doi: 10.1186/s12879-017-2319-7
  • 10. Yu Z, Kastenmüller G, He Y, et al.: Differences between Human Plasma and Serum Metabolite Profiles. Oresic M, editor. PLoS One. 2011;6(7):e21230. doi: 10.1371/journal.pone.0021230
  • 11. Taylor K: Metabolomics data in the Born in Bradford cohort. 2020. doi: 10.17605/OSF.IO/YST7N
  • 12. Farrar D, Santorelli G, Lawlor DA, et al.: Blood pressure change across pregnancy in white British and Pakistani women: analysis of data from the Born in Bradford cohort. Sci Rep. 2019;9(1):13199. doi: 10.1038/s41598-019-49722-9
  • 13. Evans AM, Bridgewater BR, Liu Q, et al.: High Resolution Mass Spectrometry Improves Data Quantity and Quality as Compared to Unit Mass Resolution Mass Spectrometry in High-Throughput Profiling Metabolomics. Metabolomics. 2014;4:132. doi: 10.4172/2153-0769.1000132
  • 14. Schrimpe-Rutledge AC, Codreanu SG, Sherrod SD, et al.: Untargeted Metabolomics Strategies-Challenges and Emerging Directions. J Am Soc Mass Spectrom. 2016;27(12):1897–1905. doi: 10.1007/s13361-016-1469-y
  • 15. Cologne J, Preston DL, Imai K, et al.: Conventional case-cohort design and analysis for studies of interaction. Int J Epidemiol. 2012;41(4):1174–86. doi: 10.1093/ije/dys102
  • 16. Sharp SJ, Poulaliou M, Thompson SG, et al.: A Review of Published Analyses of Case-Cohort Studies and Recommendations for Future Reporting. PLoS One. 2014;9(6):e101176. doi: 10.1371/journal.pone.0101176
  • 17. Shah J, Brock GN, Gaskins J: BayesMetab: treatment of missing values in metabolomic studies using a Bayesian modeling approach. BMC Bioinformatics. 2019;20(Suppl 24):673. doi: 10.1186/s12859-019-3250-2
  • 18. McBride N, White SL, Farrar D, et al.: Do nuclear magnetic resonance (NMR)-based metabolomics improve the prediction of pregnancy-related disorders? medRxiv. 2020;2020.06.22.20134650. doi: 10.1101/2020.06.22.20134650
  • 19. Taylor K, Santos Ferreira DL, West J, et al.: Differences in Pregnancy Metabolic Profiles and Their Determinants between White European and South Asian Women: Findings from the Born in Bradford Cohort. Metabolites. 2019;9(9):190. doi: 10.3390/metabo9090190
  • 20. Clark DW, Okada Y, Moore KHS, et al.: Associations of autozygosity with a broad range of human phenotypes. Nat Commun. 2019;10(1):4957. doi: 10.1038/s41467-019-12283-6
  • 21. McGregor TL, Hunt KA, Yee E, et al.: Characterising a healthy adult with a rare HAO1 knockout to support a therapeutic strategy for primary hyperoxaluria. Goate A, Weigel D, Ryten M, Blackburn N, editors. Elife. 2020;9:e54363. doi: 10.7554/eLife.54363
  • 22. Bird PK, McEachan RRC, Mon-Williams M, et al.: Growing up in Bradford: protocol for the age 7–11 follow up of the Born in Bradford birth cohort. BMC Public Health. 2019;19(1):939. doi: 10.1186/s12889-019-7222-2
  • 23. Jaddoe VWV, Felix JF, Andersen A-MN, et al.: The LifeCycle Project-EU Child Cohort Network: a federated analysis infrastructure and harmonized data of more than 250,000 children and parents. Eur J Epidemiol. 2020;35(7):709–724. doi: 10.1007/s10654-020-00662-z
  • 24. Dunn WB, Broadhurst DI, Atherton HJ, et al.: Systems level studies of mammalian metabolomes: the roles of mass spectrometry and nuclear magnetic resonance spectroscopy. Chem Soc Rev. 2011;40(1):387–426. doi: 10.1039/b906712b
  • 25. Würtz P, Wang Q, Kangas AJ, et al.: Metabolic Signatures of Adiposity in Young Adults: Mendelian Randomization Analysis and Effects of Weight Change. Sheehan NA, editor. PLoS Med. 2014;11(12):e1001765. doi: 10.1371/journal.pmed.1001765
  • 26. Lotta LA, Pietzner M, Stewart ID, et al.: Cross-platform genetic discovery of small molecule products of metabolism and application to clinical outcomes. bioRxiv. 2020;2020.02.03.932541. doi: 10.1101/2020.02.03.932541
  • 27. Mills HL, Patel N, White SL, et al. : The effect of a lifestyle intervention in obese pregnant women on gestational metabolic profiles: findings from the UK Pregnancies Better Eating and Activity Trial (UPBEAT) randomised controlled trial. BMC Med. 2019;17(1):15. 10.1186/s12916-018-1248-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Lau C-HE, Siskos AP, Maitre L, et al. : Determinants of the urinary and serum metabolome in children from six European populations. BMC Med. 2018;16(1):202. 10.1186/s12916-018-1190-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
Wellcome Open Res. 2024 Oct 28. doi: 10.21956/wellcomeopenres.18489.r107903

Reviewer response for version 2

Simone Zuffa 1

Taylor K., McBride N., et al. describe the collection rationale and methods for metabolomics data in the Born in Bradford (BiB) cohort. Blood samples were obtained from pregnant women, umbilical cords, and infants, and analyzed using NMR and MS. The authors highlight that the metabolomics data generated from this cohort can be analyzed alongside the extensive metadata and other omics data to address significant questions related to pregnancy and health development in both mothers and offspring.

The manuscript is well-structured and provides a clear overview of the collection practices. However, a major limitation of the presented data is that it was collected by commercial companies, and the generated raw data is not available to the research community. The field of computational metabolomics has advanced significantly in the past decade, and many tools (e.g., GNPS 1, SIRIUS 2, MSNovelist 3, MassQL 4) are now available for the reanalysis of metabolomics data, particularly in the realm of untargeted mass spectrometry. Access to the raw mass spectrometry data in open-source formats, rather than just tabular data of extracted and annotated peak areas, could unlock the full potential of this cohort. This would enable investigations into novel classes of molecules, as well as drugs or environmental pollutants, which could be associated with the phenotypic information collected by the authors. I recommend that the authors engage with the companies to obtain the raw data and make it publicly available to researchers.

Regarding other paired omics data available for the samples, it is unclear how many samples (from mothers and infants) have corresponding genomic data. From the Data Dictionary ( https://borninbradford.github.io/datadict/index.html), it also appears that proteomics and glycomics data are available. It would be helpful to include this information in the manuscript so that researchers have a clear overview of all the matching omics data available within the cohort.

Other minor comments

  1. The link to the study website on page 3 is broken.

  2. Add a bit more information on how metabolomics can be targeted or untargeted.

  3. The link “A list of metabolites defined in each of the datasets can be found in the Extended data” on page 9 does not actually provide a list of the metabolites obtained via MS.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Partly

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

untargeted metabolomics, mass spectrometry, data science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

  • 1. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol. 2016;34(8):828–837. 10.1038/nbt.3597 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods. 2019;16(4):299–302. 10.1038/s41592-019-0344-8 [DOI] [PubMed] [Google Scholar]
  • 3. MSNovelist: de novo structure generation from mass spectra. Nat Methods. 2022;19(7):865–870. 10.1038/s41592-022-01486-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. A Universal Language for Finding Mass Spectrometry Data Patterns. bioRxiv. 2022. 10.1101/2022.08.06.503000 [DOI] [Google Scholar]
Wellcome Open Res. 2024 May 21. doi: 10.21956/wellcomeopenres.18489.r79682

Reviewer response for version 2

Sandi Azab 1

Paper Review: Metabolomics Datasets in the Born in Bradford Cohort

Thank you for the opportunity to review this manuscript submission. In this article, the authors describe the metabolomic data available in Born in Bradford (BiB), profiled during pregnancy in mothers, in cord blood and during early life in the offspring. The authors provide an overview of the NMR and MS technologies used and the QC measures taken, as well as of the BiB cohort itself, with some notes on current and future metabolomics studies in BiB.

This manuscript is very well written and clearly explains the different metabolomics datasets in BiB. Thus, a definite strength of this study is its clarity and its overall overview and structure.

However, the following remarks are to be considered:

Major comments:

  • Although the authors mention COMETS in the introduction, they do not discuss the term “Metabolomic epidemiology”, which was introduced by the Metabolomics Society Epidemiology Task Group (which overlaps with the researchers behind COMETS). As the authors discuss outcomes, study design, and other epidemiologic concepts, it is important to refer to this paper for the opportunities and challenges that might arise when “marrying” metabolomics and epidemiology. [Metabolomics (2021) 17:45 https://doi.org/10.1007/s11306-021-01789-0] [Ref 1]

 

  • The ethnic diversity in BiB is a point of strength. It would be valuable to note this in the abstract, and to emphasize in the paper the scarcity of omics data in non-white ethnic populations and the need for ethnic expansion in the era of precision medicine. You can refer to this landmark Lancet series on precision medicine [Lancet Diabetes Endocrinol 2023; 11: 822–35] [Ref 2]

 

  • I realize that the paper was submitted in 2020, so I understand that it might not be feasible for the authors to update their literature review. However, I encourage the authors to revise the following sentence in light of existing birth cohorts with metabolomics data, e.g., the Canadian “NutriGen” birth cohort consortium and the Australian “checkpoint”: “However, we have been unable to validate findings in external cohorts. The work described above cannot be replicated because we cannot find other independent studies with relevant data 19. We hope that this data note will encourage other studies to collect similar data in pregnancy, offspring cord blood, and in mothers and offspring postnatally throughout their life-course.”

J Am Heart Assoc. 2019 [Ref 3];

Sikorski C et al. 2022 [Ref 4];

Azab S. 2023 [Ref 5]

 

  • Please add 1 sentence on the advantage of the case-cohort design; Why was it selected?

Minor comments:

  • When referring to GDM, HDP etc., the authors are advised to use the term “APOs” (adverse pregnancy outcomes), as used in the 2024 American Heart Association Scientific Statement [Ref 6].

  • “Previous work has shown that reproducibility in both serum and plasma is good.” Ref needed.

  • First paragraph in Intro needs several references.

  • “MS offers a truly untargeted approach.” Please rephrase as MS can be used for either targeted or untargeted metabolomics depending on the method.

  • “However, MS only provides relative quantification based on peak area in these approaches without comparison to a metabolite reference standard.” Please rephrase: while most high-throughput untargeted MS platforms provide relative peak areas, known metabolites can be quantified in absolute concentrations (mmol/l) on an MS platform when standards are available. See Shanmuganathan M, Kroezen Z, Gill B, et al. “The maternal serum metabolome by multisegment injection-capillary electrophoresis-mass spectrometry: a high-throughput platform and standardized data workflow for large-scale epidemiological studies” (2021) [Ref 7]. This reference also gives reference values in pregnancy for 50 or so metabolites using CE-MS.

 

  • “A list of metabolites defined in each of the datasets can be found in the Extended data 11.” As far as my skills allowed me, this links to a list of NMR data only and not MS variables. The reader cannot access a list of the 1000 molecules identified in the MS datasets.

 

  • Second version of the Metabolon MS data: it is not clear what changed. Can you clarify?

 

  • “This suggests a more lipidomic pregnancy metabolic profile in White Europeans and a stronger glycemic metabolic profile in South Asian women.” I would rephrase the “more lipidomic profile”. Do you mean more perturbations in the lipidome/lipids in White Europeans?

 

  • “NMR urine and serum MS blood metabolites have been quantified at Imperial College, London, as part of the HELIX collaboration.” Please change to NMR-quantified or identified metabolites; there are no NMR or MS metabolites per se.

 

  • “However, that follow-up stopped on the 16th March 2020 when restrictions on normal life due to the COVID-19 pandemic began in the UK. At the time of submitting this paper we do not know when face-to-face data collection will be possible to start again and what the best plans would be for further blood sample collection. At the relevant time we will discuss different potential scenarios for completing that planned follow-up with our scientific advisory groups.” It seems this passage needs updating: is there a plan forward? Otherwise, delete it.

Thank you,

Sandi Azab, MSc, RPh, PhD

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Partly

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Metabolomics; Bioanalytical chemistry; Mass spectrometry; Metabolomic epidemiology; DOhaD; birth cohorts; Cardiovascular disease

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

  • 1. A strategy for advancing for population-based scientific discovery using the metabolome: the establishment of the Metabolomics Society Metabolomic Epidemiology Task Group. Metabolomics. 2021;17(5):45. 10.1007/s11306-021-01789-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Lancet Diabetes & Endocrinology Commission on the Definition and Diagnosis of Clinical Obesity. Lancet Diabetes Endocrinol. 2023;11(4):226–228. 10.1016/S2213-8587(23)00058-X [DOI] [PubMed] [Google Scholar]
  • 3. A Cross-Cohort Study Examining the Associations of Metabolomic Profile and Subclinical Atherosclerosis in Children and Their Parents: The Child Health CheckPoint Study and Avon Longitudinal Study of Parents and Children. J Am Heart Assoc. 2019;8(14):e011852. 10.1161/JAHA.118.011852 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Serum metabolomic signatures of gestational diabetes in South Asian and white European women. BMJ Open Diabetes Res Care. 2022;10(2). 10.1136/bmjdrc-2021-002733 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Early sex-dependent differences in metabolic profiles of overweight and adiposity in young children: a cross-sectional analysis. BMC Med. 2023;21(1):176. 10.1186/s12916-023-02886-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Opportunities in the Postpartum Period to Reduce Cardiovascular Disease Risk After Adverse Pregnancy Outcomes: A Scientific Statement From the American Heart Association. Circulation. 2024;149(7):e330–e346. 10.1161/CIR.0000000000001212 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. The maternal serum metabolome by multisegment injection-capillary electrophoresis-mass spectrometry: a high-throughput platform and standardized data workflow for large-scale epidemiological studies. Nat Protoc. 2021;16(4):1966–1994. 10.1038/s41596-020-00475-0 [DOI] [PubMed] [Google Scholar]
Wellcome Open Res. 2021 Oct 4. doi: 10.21956/wellcomeopenres.18489.r45817

Reviewer response for version 2

Ruifang Li-Gao 1

Thanks to the authors for the revisions and great work. I have no further comments to make.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Metabolomics, genetics, twin/family study

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2021 Apr 20. doi: 10.21956/wellcomeopenres.17967.r43230

Reviewer response for version 1

Mengna Huang 1

Taylor et al. provided a comprehensive review of the metabolomic data currently available in the Born in Bradford (BiB) birth cohort, including those generated from NMR and MS platforms at different time points from 26–28 weeks gestation to ~24 months of age in the children. Details of the sample collection procedures and metabolomic profiling were described, including validation of some of the data using other assays. Overall, all relevant information was very clearly presented in this data note.

Below are some comments that may help the authors improve the manuscript:

  1. Page 4 – the total number of pregnancies in BiB was greater than the number of women enrolled in the study, and the recruitment period spanned over four years. In Figures 4A and 6, it was mostly the number of pregnancies that was presented – were these singleton births? Were there women who contributed samples from more than one pregnancy? Same question for cord blood. In Figure 4B, were children with NMR metabolomics all from independent families? Such information will have implications for statistical analysis and should be clarified.

  2. Page 4 – under “Pregnancy blood samples”, step 4 indicated that 2 vials were used, while step 6 says 3 aliquots. This seems inconsistent.

  3. Page 7 – for validation of the MS platform, did the authors conduct regression analysis with the MS glucose measure in association with BMI and HDP? This would of course be in a smaller sample size, but it would be interesting to see the results.

  4. Page 10 – for the MS dataset, apart from the raw and scaled data sets from Metabolon, did the authors perform additional data processing (e.g. different imputation approach or log transformation)? There would usually be some metabolites with high missing percentages where imputing with the minimum (which I believe is what Metabolon usually does for imputation) would not be entirely appropriate for subsequent analysis. Is there any recommendation in terms of excluding them from subsequent analysis or applying a different analysis method?

  5. Figure 7 – there should not be overlap between MS pregnancy dataset 1a and MS pregnancy dataset 2 according to descriptions on the bottom of Page 9?

  6. Page 14 – the authors mentioned that differences by ethnicity in distributions of NMR data were observed; I wonder if differences in diet play a role here. It is not quite clear from the main text whether BiB collected dietary data.

  7. The article provided a good overview of the data available in pregnancy other than the metabolomics data. How much and what kind of data would be available during early life for the children? From the description of HELIX on Page 16, it seemed like BiB children were followed up to a median age of 6.6 years, were all children followed-up? Or just those in the ALL IN sub-study?

  8. Is there a plan to generate MS metabolomic data in the 12- and 24- months infant samples? Such data may complement the existing NMR data, and be important for studies like ALL IN.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

epidemiology, metabolomics, integrative omics, asthma

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Wellcome Open Res. 2021 Sep 6.
Kurt Taylor 1

1. Page 4 – the total number of pregnancies in BiB was greater than the number of women enrolled in the study, and the recruitment period spanned over four years. In Figures 4A and 6, it was mostly the number of pregnancies that was presented – were these singleton births? Were there women who contributed samples from more than one pregnancy? Same question for cord blood. In Figure 4B, were children with NMR metabolomics all from independent families? Such information will have implications for statistical analysis and should be clarified.

The reviewer makes a useful point, and we agree that having this information in the manuscript will be useful for readers and prospective users of the data. We have now added this information to Figures 4 and 6, which show that women with more than one pregnancy had one removed at random and those with multiple pregnancies were excluded.

2. Page 4 – under “Pregnancy blood samples”, step 4 indicated that 2 vials were used, while step 6 says 3 aliquots. This seems inconsistent.

We thank the reviewer for pointing out this inconsistency, which we have now corrected by using only the term aliquot and clarifying the numbers:

“1) Storage racks were prepared.

2) Participant details were checked, making sure that both the BiB study ID and hospital number on the specimen bottles matched those on the participant tracking forms.

3) Tubes were centrifuged at 3500 rpm for 10 minutes at room temperature.

4) A 1 ml automatic pipette was used to aliquot samples into 1.5 ml aliquots (1-4 aliquots dependent on sample volume).

5) Vials were labelled with appropriate BiB study labels and the duplicate barcode label was placed in the corresponding space marked on BiB tracking form.

6) Aliquots were then placed in racks in a -80°C freezer.”

3. Page 7 – for validation of the MS platform, did the authors conduct regression analysis with the MS glucose measure in association with BMI and HDP? This would of course be in a smaller sample size, but it would be interesting to see the results.

We thank the reviewer for pointing this out. We agree that it would be useful to include this. Please see response 2, to reviewer 1 above.

4. Page 10 – for the MS dataset, apart from the raw and scaled data sets from Metabolon, did the authors perform additional data processing (e.g. different imputation approach or log transformation)? There would usually be some metabolites with high missing percentages where imputing with the minimum (which I believe is what Metabolon usually does for imputation) would not be entirely appropriate for subsequent analysis. Is there any recommendation in terms of excluding them from subsequent analysis or applying a different analysis method?

We agree with the reviewer. However, the data are provided by Metabolon following extensive quality control. We cannot give recommendations for analyses because it depends on the research question of the group using the data. For example, in some of our analyses we remove participants where there is little between person variation in a particular metabolite. We also convert xenobiotics where there is substantial missing data to binary variables (indicating exposure to e.g. a particular medication).
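
As a rough illustration of the two options mentioned in this exchange, the sketch below handles missing values for one metabolite at a time in plain Python. The 50% missingness cut-off, the function name and the example values are assumptions made for illustration only; they are not choices documented for BiB or Metabolon. The minimum-value imputation simply follows the reviewer's description of what Metabolon usually does.

```python
from typing import List, Optional

MISSING_CUTOFF = 0.5  # hypothetical threshold; set according to the research question


def handle_missing(values: List[Optional[float]],
                   cutoff: float = MISSING_CUTOFF) -> List[float]:
    """Recode a metabolite as a binary detected/not-detected indicator when
    missingness is high; otherwise impute with the minimum observed value."""
    if not values:
        return []
    observed = [v for v in values if v is not None]
    missing_fraction = 1 - len(observed) / len(values)
    if not observed or missing_fraction > cutoff:
        # e.g. a xenobiotic: 1.0 = detected (exposed), 0.0 = not detected
        return [0.0 if v is None else 1.0 for v in values]
    floor = min(observed)
    return [floor if v is None else v for v in values]


# Toy values, not BiB data
xenobiotic = [None, None, 3.2, None, None]  # 80% missing -> binary indicator
endogenous = [1.4, 0.9, None, 1.1, 1.3]     # 20% missing -> minimum imputation
print(handle_missing(xenobiotic))
print(handle_missing(endogenous))
```

Whether a single cut-off suits every metabolite is a judgement for the analyst, which is consistent with the authors' point that the choice depends on the research question.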

5. Figure 7 – there should not be overlap between MS pregnancy dataset 1a and MS pregnancy dataset 2 according to descriptions on the bottom of Page 9?

We thank the reviewer for identifying this mistake. We have amended the figure to show the overlap of ‘0’.

6. Page 14 – the authors mentioned that differences by ethnicity in distributions of NMR data were observed; I wonder if differences in diet play a role here. It is not quite clear from the main text whether BiB collected dietary data.

For clarity, we have added the following to the discussion section:

There is also scope to explore how diet could relate to the range of metabolomic measurements that BiB possesses. Further information on dietary variables can be found online in the BiB data dictionary https://borninbradford.github.io/datadict/.

7. The article provided a good overview of the data available in pregnancy other than the metabolomics data. How much and what kind of data would be available during early life for the children? From the description of HELIX on Page 16, it seemed like BiB children were followed up to a median age of 6.6 years, were all children followed-up? Or just those in the ALL IN sub-study?

For clarity, we have added a sentence in the “ongoing and future research” section:

To date, there is no published work using the offspring metabolomics data. Researchers can find information on planned follow up data elsewhere, to understand whether these data could be useful to their ongoing or future research ( https://doi.org/10.1186/s12889-019-7222-2 ).

8. Is there a plan to generate MS metabolomic data in the 12- and 24- months infant samples? Such data may complement the existing NMR data, and be important for studies like ALL IN.

Currently there are no plans or funds to do this. Researchers who are interested in funding analyses on any available biosamples in BiB can submit an expression of interest to do so ( https://borninbradford.nhs.uk/research/how-to-access-data/). Because biosamples are finite these requests are assessed on several criteria, including the importance of the science, whether any existing assay results (including the metabolomic data described here) provide relevant data, whether other collaborators are already running similar assays and the volume required compared to current available sample volume.

-----------------------------------

ADDITIONAL SECTION TO THE REVISED DATANOTE

Since submitting the data note, new versions of the two Metabolon MS datasets have been derived. We have added a new section called “second version of the Metabolon MS data” in the “issues for data users” section. We describe why we obtained these new versions and some analyses of how they relate to the original versions.

Wellcome Open Res. 2021 Feb 8. doi: 10.21956/wellcomeopenres.17967.r42218

Reviewer response for version 1

Ruifang Li-Gao 1

Taylor and colleagues described the metabolomics data profile in the Born in Bradford (BiB) cohort. Metabolomics data were generated from two different approaches, i.e. NMR-based targeted metabolomics by the Nightingale platform and MS-based untargeted metabolomics by the Metabolon platform. To address varied research questions, there are six metabolomics datasets including different sub-populations and time points of the BiB cohort. Overall, the datasets are clearly presented in a useable and accessible format. The amount of detail on data generation and study protocol is sufficient. I have several minor questions for the authors:

  1. Figure 2 shows the comparisons of measurements from the NMR-based platform and clinical chemistry. In general, the correlations are high, between 0.73 and 0.93. However, in each measurement, there are “outliers” that deviate from the diagonal. Are those measurements due to technical issues or participants’ biological characteristics? Do the authors recommend that users of the datasets remove those outliers?

  2. Figure 5 shows the comparisons of fasting glucose measures between the MS-based platform and clinical chemistry. The correlation is modest (0.65). Is it possible to verify the direction and magnitude of associations using the Metabolon measurements, as the authors did for the NMR-based measurements shown in Figure 3?

  3. In Figure 6B bottom part, “n=1,199 controls, n=408 cases, n=393 case-controls”. It is not very obvious where the numbers 408 and 393 come from. Is it possible to explain this?

  4. Figure 7 shows that the overlap between MS Pregnancy Dataset 1a and MS Pregnancy Dataset 2 is 1000. However, in Figure 6B, the authors mentioned that 1,186 were removed due to missing data or selection for the previous Metabolon sample. If the previous 1000 were already removed, how can the overlap be 1000?

  5. On Page 13, the authors mentioned that 37 out of 1000 women in MS Pregnancy Dataset 1 had different values than the other 963 women. Could the difference be due to certain disease status?

  6. On Page 14, the authors suggested that “a more lipidomic response to pregnancy in White Europeans and a stronger glycaemic response in South Asian women.” Since the cohort did not have measurements before pregnancy, it is not accurate to use “response” here.

  7. On Page 16, the authors mentioned that they did not have the chance to collect blood samples before pregnancy, but previous research showed that “metabolites change upon becoming pregnant and then revert to pre-pregnancy levels [5] and that they change during pregnancy [24].” Would the study consider collecting blood samples after pregnancy, as a proxy for before pregnancy?

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

metabolomics, genetics, twin/family study

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Wellcome Open Res. 2021 Sep 6.
Kurt Taylor 1

1. Figure 2 shows the comparisons of measurements from the NMR-based platform and clinical chemistry. In general, the correlations are high, between 0.73 and 0.93. However, in each measurement, there are “outliers” that deviate from the diagonal. Are those measurements due to technical issues or participants’ biological characteristics? Do the authors recommend that users of the datasets remove those outliers?

This is a useful question and one that we cannot answer for certain. We have acknowledged the potential outliers within the revised manuscript and recommend that researchers carry out their own investigations into these, which will depend on their research question:

“It is evident from Figure 2 that there are outliers for some of the measures, notably for glucose, total cholesterol and triglycerides (Figures 2A, 2B, 2E, respectively). We would recommend for researchers using the data to consider these potential outliers before commencing analyses. Determining how to deal with outliers will depend on the research question and the personal preference of the research group undertaking analyses.”

2. Figure 5 shows the comparisons of fasting glucose measures between the MS-based platform and clinical chemistry. The correlation is modest (0.65). Is it possible to verify the direction and magnitude of associations using the Metabolon measurements, as the authors did for the NMR-based measurements shown in Figure 3?

We thank the reviewer for pointing this out. We agree that it would be useful to include this. We have included these associations in Figure 3 to avoid having too many figures. We have also commented on the associations in text:

“Given the relatively modest correlation of glucose from the Metabolon mass spec analyses with the clinical chemistry levels on the same samples, we explored this further comparing results from two regression analyses – one of the differences in mean glucose per 1SD higher BMI (glucose as outcome) and one of the odds ratio for HDP per 1SD higher glucose (glucose as exposure).”
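
The first of the two regressions described in this quotation, the difference in mean glucose per 1 SD higher BMI, can be sketched with the standard library alone. This is a minimal, unadjusted illustration on made-up numbers; the function name and toy values are assumptions and the authors' actual models and covariates are not reproduced here. The second analysis (odds ratio for HDP per 1 SD higher glucose) would need a logistic regression and is not shown.

```python
from statistics import mean, stdev
from typing import List


def slope_per_sd(exposure: List[float], outcome: List[float]) -> float:
    """Ordinary least squares slope of outcome on exposure, rescaled so that it
    expresses the difference in mean outcome per 1 SD higher exposure."""
    mx, my = mean(exposure), mean(outcome)
    sxx = sum((x - mx) ** 2 for x in exposure)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exposure, outcome))
    slope_per_unit = sxy / sxx                 # difference per 1 unit of exposure
    return slope_per_unit * stdev(exposure)    # difference per 1 SD of exposure


# Toy data in which glucose rises by exactly 0.05 units per BMI unit
bmi = [20.0, 24.0, 28.0, 32.0, 36.0]
glucose = [4.0 + 0.05 * b for b in bmi]
print(round(slope_per_sd(bmi, glucose), 4))
```

Running the same function once with the clinical chemistry glucose values and once with the Metabolon values as the outcome would give two per-SD estimates whose direction can be compared; because Metabolon values are relative rather than absolute, the outcome would probably also need standardising before magnitudes are compared.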

3. In Figure 6B bottom part, “n=1,199 controls, n=408 cases, n=393 case-controls”. It is not very obvious where the numbers 408 and 393 come from. Is it possible to explain this?

We have added the following to the Figure 6B legend:

“The 801 cases are split into “cases” and “case-controls” based on how many cases we would expect in a representative cohort (i.e. the case-controls). “Cases” should not be included in a comparator group for any analyses as we want the comparison group to be representative.”
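
One way to read this legend is sketched below using only the counts quoted from Figure 6B. The group labels and the interpretation that the comparator group consists of the controls plus the “case-controls” are assumptions drawn from the wording of the legend, not a confirmed description of the authors' analysis code.

```python
from collections import Counter

# Counts quoted from the Figure 6B legend
group_counts = Counter({"control": 1199, "case": 408, "case-control": 393})

# Assumed reading: "case-controls" are the cases expected in a representative
# cohort, so they count both as cases and as comparator-group members.
comparator_groups = {"control", "case-control"}
case_groups = {"case", "case-control"}

n_comparator = sum(n for g, n in group_counts.items() if g in comparator_groups)
n_cases = sum(n for g, n in group_counts.items() if g in case_groups)
n_total = sum(group_counts.values())

print(n_comparator, n_cases, n_total)  # 1592 801 2000
```

Under this reading the 408 “cases” are excluded from the comparator group so that it stays representative, as the legend requires.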

4. Figure 7 shows that the overlap between MS Pregnancy Dataset 1a and MS Pregnancy Dataset 2 is 1000. However, in Figure 6B, the authors mentioned that 1,186 were removed due to missing data or selection for the previous Metabolon sample. If the previous 1000 were already removed, how can the overlap be 1000?

We thank the reviewer for identifying this mistake. We have amended the figure to show the overlap of ‘0’.

4. On Page 13, the authors mentioned that 37 out of 1000 women in MS Pregnancy Dataset 1 had different values than the other 963 women. Could the difference be due to certain disease status?

The reviewer makes a useful point. This was our initial thought when we first discovered the unusual values. We checked the samples against all the phenotype data and did not find anything to suggest that the differences were being caused by disease status, or any exposure such as smoking or BMI. We think that it is a batch effect, as mentioned within the manuscript:

“In principal components analyses, 37 of the 1,000 women with a pregnancy sample in Dataset 1 of MS data had notably different values to those in the remaining 963 women, which may be a batch effect (see Figure in Extended data [11]). This is a new finding and in previous analyses using these data we have not treated these 37 women differently. However, for future analyses we would recommend researchers consider running analyses with all women and in a sensitivity analysis with these 37 women removed.”
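The recommendation above can be sketched in Python as running the same estimator twice, once on all women and once with the flagged women removed. Everything in this sketch is an assumption made for illustration: the function name, the identifiers, and the use of a simple mean as the estimator. It is not the authors' analysis code.

```python
from statistics import fmean


def run_with_sensitivity(values_by_id, flagged_ids, estimator=fmean):
    """Apply one estimator to all participants and again without flagged ones."""
    flagged = set(flagged_ids)
    all_values = list(values_by_id.values())
    # Keep only participants who were not flagged (e.g. a suspected batch effect).
    kept_values = [v for pid, v in values_by_id.items() if pid not in flagged]
    return {
        "all": estimator(all_values),
        "excluding_flagged": estimator(kept_values),
        "n_all": len(all_values),
        "n_kept": len(kept_values),
    }


# Hypothetical toy measurements keyed by a made-up participant ID.
toy = {"w1": 1.0, "w2": 2.0, "w3": 9.0}
result = run_with_sensitivity(toy, flagged_ids=["w3"])
```

Comparing the two estimates shows how much a result depends on the flagged samples; in practice the estimator would be whatever model the research question calls for.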

5. On Page 14, the authors suggested “a more lipidomic response to pregnancy in White Europeans and a stronger glycaemic response in South Asian women.” Since the cohort did not have measurements before pregnancy, it is not accurate to use “response” here.

We agree with the reviewer. We have reworded this sentence:

This suggests a more lipidomic pregnancy metabolic profile in White Europeans and a stronger glycemic metabolic profile in South Asian women.

6. On Page 16, the authors mentioned that they did not have the chance to collect blood samples before pregnancy, but previous research showed that “metabolites change upon becoming pregnant and then revert to pre-pregnancy levels [5] and that they change during pregnancy [24].” Would the study consider collecting blood samples after pregnancy, as a proxy for pre-pregnancy levels?

The reviewer makes an interesting point. At the time that the COVID pandemic began, a follow-up of BiB participants was underway. The main focus was on the index children, with data largely collected in primary schools (including blood samples). However, some parents were recruited, and it is intended to run the NMR metabolites on those. The sample size will likely be smaller than anticipated.  We have noted this in the ‘Discussion and future directions for metabolomic analyses in BiB’ section and note there that we will update this data note regularly with any new metabolomics analyses.

----------------------------------

ADDITIONAL SECTION TO THE REVISED DATANOTE

Since submitting the data note, new versions of the two Metabolon MS datasets have been derived. We have added a new section called “second version of the Metabolon MS data” in the “issues for data users” section. We describe why we obtained these new versions and some analyses of how they relate to the original versions.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Taylor K: Metabolomics data in the Born in Bradford cohort. 2020. 10.17605/OSF.IO/YST7N

    Data Availability Statement

    Underlying data

    Scientists are encouraged to make use of the BiB data, which are available through a system of managed open access.

    • Before you contact BiB, please make sure you have read our Guidance for Collaborators. The BiB executive reviews proposals on a monthly basis and we will endeavour to respond to your request as soon as possible. You can find out about all of the different datasets which are available here. If you are unsure if we have the data that you need, please contact a member of the BiB team (borninbradford@bthft.nhs.uk).

    • Once you have formulated your request, please complete the ‘Expression of Interest’ form available here and email the BiB research team (borninbradford@bthft.nhs.uk).

    • If your request is approved, we will ask you to sign a collaboration agreement; if your request involves biological samples, we will ask you to complete a material transfer agreement.

    Extended data

    Open Science Framework: Metabolomics data in the Born in Bradford cohort. https://doi.org/10.17605/OSF.IO/YST7N [11].

    This project contains the following extended data:

    • BiB_MS_Dataset1_PCA_Plot.png (Figure showing principal component analysis of dataset 1)

    • MS_Metabolite_Details.xlsx (Lists the names and details of all metabolites assessed by the Metabolon platform in the BiB MS dataset 1a (sheet 1) and BiB MS dataset 2 (sheet 2))

    • MS_Quality_Report_Dataset1.pdf (QC report for MS dataset 1 from Metabolon)

    • MS_Quality_Report_Dataset2.pdf (QC report for MS dataset 2 from Metabolon)

    • NMR_Metabolite_Details.xlsx (Lists the names and units of all metabolic traits assessed by the NMR platform)

    • NMR_Quality_Report_Cord.pdf (Summarizing quality observations in the NMR cord blood dataset)

    • NMR_Quality_Report_Infant.pdf (Summarizing quality observations in the NMR infant dataset)

    • NMR_Quality_Report_Pregnancy.pdf (Summarizing quality observations in the NMR pregnancy dataset)

    Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).


    Articles from Wellcome Open Research are provided here courtesy of The Wellcome Trust
